## What Change `Processor.OnEmit` methods to accept a record pointer. ## Why Fixes https://github.com/open-telemetry/opentelemetry-go/issues/5219 This would be specification compliant according to discussions around https://github.com/open-telemetry/opentelemetry-specification/pull/4067 This is inline of how processors Go span processors works and how log processors work in other languages. If the performance (an additional heap allocation during log processing) would occur to be a significant problem for some users, we have at few possible solutions: 1. Utilize PGO which may also lead to decreasing heap allocations (sources: https://landontclipp.github.io/blog/2023/08/25/profile-guided-optimizations-in-go/#devirtualization, https://andrewwphillips.github.io/blog/pgo.html). Currently it does not but I expect it may change in future. 2. Improve the Go compilers escape analysis (related to previous point) 3. introduce new "chaining processor" which can be also desirable in other languages ## Benchstat `old` is from `main`. `new` is from current branch. `new-pgo` is from current branch with PGO optimization. I first run benchmarks to generate a CPU profile using `go test -run=^$ -bench=. -count=10 -cpuprofile default.pgo` and then I rerun the tests with PGO. Currently, the number of heap allocations is the same. ``` goos: linux goarch: amd64 pkg: go.opentelemetry.io/otel/sdk/log cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz │ old.txt │ new.txt │ new-pgo.txt │ │ sec/op │ sec/op vs base │ sec/op vs base │ BatchProcessorOnEmit-16 402.7n ± 18% 382.0n ± 7% ~ (p=0.247 n=10) 376.7n ± 14% ~ (p=0.210 n=10) Processor/Simple-16 350.9n ± 9% 782.5n ± 6% +123.00% (p=0.000 n=10) 755.1n ± 5% +115.19% (p=0.000 n=10) Processor/Batch-16 1.333µ ± 15% 1.497µ ± 11% +12.27% (p=0.000 n=10) 1.528µ ± 8% +14.63% (p=0.000 n=10) Processor/SetTimestampSimple-16 329.5n ± 15% 711.6n ± 4% +115.93% (p=0.000 n=10) 721.9n ± 5% +119.04% (p=0.000 n=10) Processor/SetTimestampBatch-16 1.163µ ± 2% 1.524µ ± 3% +31.03% (p=0.000 n=10) 1.461µ ± 5% +25.57% (p=0.000 n=10) Processor/AddAttributesSimple-16 408.7n ± 3% 810.1n ± 4% +98.21% (p=0.000 n=10) 830.1n ± 4% +103.11% (p=0.000 n=10) Processor/AddAttributesBatch-16 1.270µ ± 2% 1.623µ ± 4% +27.71% (p=0.000 n=10) 1.597µ ± 7% +25.66% (p=0.000 n=10) Processor/SetAttributesSimple-16 436.2n ± 10% 796.1n ± 3% +82.50% (p=0.000 n=10) 817.6n ± 4% +87.43% (p=0.000 n=10) Processor/SetAttributesBatch-16 1.202µ ± 2% 1.552µ ± 2% +29.06% (p=0.000 n=10) 1.659µ ± 11% +37.96% (p=0.000 n=10) LoggerNewRecord/5_attributes-16 366.6n ± 3% 363.7n ± 7% ~ (p=0.952 n=10) 426.2n ± 7% +16.27% (p=0.000 n=10) LoggerNewRecord/10_attributes-16 1.711µ ± 2% 1.909µ ± 18% +11.57% (p=0.000 n=10) 2.077µ ± 10% +21.39% (p=0.000 n=10) LoggerProviderLogger-16 650.1n ± 4% 690.1n ± 8% +6.15% (p=0.019 n=10) 737.6n ± 13% +13.47% (p=0.004 n=10) WalkAttributes/1_attributes-16 5.264n ± 12% 5.510n ± 8% ~ (p=0.812 n=10) 5.865n ± 5% +11.41% (p=0.011 n=10) WalkAttributes/10_attributes-16 5.440n ± 8% 5.881n ± 7% +8.12% (p=0.004 n=10) 6.104n ± 7% +12.21% (p=0.005 n=10) WalkAttributes/100_attributes-16 5.403n ± 9% 5.894n ± 9% +9.10% (p=0.029 n=10) 5.783n ± 6% ~ (p=0.052 n=10) WalkAttributes/1000_attributes-16 5.196n ± 4% 5.860n ± 8% +12.79% (p=0.000 n=10) 5.981n ± 13% +15.13% (p=0.002 n=10) SetAddAttributes/SetAttributes-16 181.2n ± 14% 208.1n ± 12% +14.85% (p=0.005 n=10) 209.9n ± 11% +15.87% (p=0.007 n=10) SetAddAttributes/AddAttributes-16 156.7n ± 14% 161.1n ± 16% ~ (p=0.190 n=10) 165.5n ± 15% ~ (p=0.315 n=10) SimpleProcessorOnEmit-16 11.775n ± 10% 9.027n ± 17% -23.33% (p=0.000 n=10) 9.389n ± 18% -20.26% (p=0.002 n=10) geomean 169.1n 209.6n +23.98% 215.5n +27.48% │ old.txt │ new.txt │ new-pgo.txt │ │ B/s │ B/s vs base │ B/s vs base │ BatchProcessorOnEmit-16 1004.39Mi ± 15% 79.88Mi ± 7% -92.05% (p=0.000 n=10) 81.06Mi ± 12% -91.93% (p=0.000 n=10) │ old.txt │ new.txt │ new-pgo.txt │ │ B/op │ B/op vs base │ B/op vs base │ BatchProcessorOnEmit-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ Processor/Simple-16 0.0 ± 0% 417.0 ± 0% ? (p=0.000 n=10) 417.0 ± 0% ? (p=0.000 n=10) Processor/Batch-16 621.5 ± 2% 1057.5 ± 1% +70.15% (p=0.000 n=10) 1064.5 ± 1% +71.28% (p=0.000 n=10) Processor/SetTimestampSimple-16 0.0 ± 0% 417.0 ± 0% ? (p=0.000 n=10) 418.0 ± 0% ? (p=0.000 n=10) Processor/SetTimestampBatch-16 626.5 ± 3% 1049.5 ± 1% +67.52% (p=0.000 n=10) 1057.5 ± 2% +68.79% (p=0.000 n=10) Processor/AddAttributesSimple-16 0.0 ± 0% 417.0 ± 0% ? (p=0.000 n=10) 417.0 ± 0% ? (p=0.000 n=10) Processor/AddAttributesBatch-16 616.5 ± 3% 1053.0 ± 2% +70.80% (p=0.000 n=10) 1048.5 ± 2% +70.07% (p=0.000 n=10) Processor/SetAttributesSimple-16 48.00 ± 0% 466.00 ± 0% +870.83% (p=0.000 n=10) 466.00 ± 0% +870.83% (p=0.000 n=10) Processor/SetAttributesBatch-16 648.0 ± 3% 1089.5 ± 1% +68.13% (p=0.000 n=10) 1087.5 ± 4% +67.82% (p=0.000 n=10) LoggerNewRecord/5_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ LoggerNewRecord/10_attributes-16 610.0 ± 0% 610.0 ± 0% ~ (p=1.000 n=10) ¹ 610.0 ± 0% ~ (p=1.000 n=10) ¹ LoggerProviderLogger-16 354.5 ± 6% 368.0 ± 7% ~ (p=0.288 n=10) 391.0 ± 29% ~ (p=0.239 n=10) WalkAttributes/1_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/10_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/100_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/1000_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ SetAddAttributes/SetAttributes-16 48.00 ± 0% 48.00 ± 0% ~ (p=1.000 n=10) ¹ 48.00 ± 0% ~ (p=1.000 n=10) ¹ SetAddAttributes/AddAttributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ SimpleProcessorOnEmit-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² ? ² ? ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ old.txt │ new.txt │ new-pgo.txt │ │ allocs/op │ allocs/op vs base │ allocs/op vs base │ BatchProcessorOnEmit-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ Processor/Simple-16 0.000 ± 0% 1.000 ± 0% ? (p=0.000 n=10) 1.000 ± 0% ? (p=0.000 n=10) Processor/Batch-16 0.000 ± 0% 1.000 ± 0% ? (p=0.000 n=10) 1.000 ± 0% ? (p=0.000 n=10) Processor/SetTimestampSimple-16 0.000 ± 0% 1.000 ± 0% ? (p=0.000 n=10) 1.000 ± 0% ? (p=0.000 n=10) Processor/SetTimestampBatch-16 0.000 ± 0% 1.000 ± 0% ? (p=0.000 n=10) 1.000 ± 0% ? (p=0.000 n=10) Processor/AddAttributesSimple-16 0.000 ± 0% 1.000 ± 0% ? (p=0.000 n=10) 1.000 ± 0% ? (p=0.000 n=10) Processor/AddAttributesBatch-16 0.000 ± 0% 1.000 ± 0% ? (p=0.000 n=10) 1.000 ± 0% ? (p=0.000 n=10) Processor/SetAttributesSimple-16 1.000 ± 0% 2.000 ± 0% +100.00% (p=0.000 n=10) 2.000 ± 0% +100.00% (p=0.000 n=10) Processor/SetAttributesBatch-16 1.000 ± 0% 2.000 ± 0% +100.00% (p=0.000 n=10) 2.000 ± 0% +100.00% (p=0.000 n=10) LoggerNewRecord/5_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ LoggerNewRecord/10_attributes-16 4.000 ± 0% 4.000 ± 0% ~ (p=1.000 n=10) ¹ 4.000 ± 0% ~ (p=1.000 n=10) ¹ LoggerProviderLogger-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ 1.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/1_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/10_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/100_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ WalkAttributes/1000_attributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ SetAddAttributes/SetAttributes-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ 1.000 ± 0% ~ (p=1.000 n=10) ¹ SetAddAttributes/AddAttributes-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ SimpleProcessorOnEmit-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ 0.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² ? ² ? ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
8.2 KiB
Logs SDK
Abstract
go.opentelemetry.io/otel/sdk/log
provides Logs SDK compliant with the
specification.
The prototype was created in #4955.
Background
The goal is to design the exported API of the SDK would have low performance overhead. Most importantly, have a design that reduces the amount of heap allocations and even make it possible to have a zero-allocation implementation. Eliminating the amount of heap allocations reduces the GC pressure which can produce some of the largest improvements in performance.1
The main and recommended use case is to configure the SDK to use an OTLP exporter with a batch processor.2 Therefore, the implementation aims to be high-performant in this scenario. Some users that require high throughput may also want to use e.g. an user_events, LLTng or ETW exporter with a simple processor. Users may also want to use OTLP File or Standard Output exporter in order to emit logs to standard output/error or files.
Modules structure
The SDK is published as a single go.opentelemetry.io/otel/sdk/log
Go module.
The exporters are going to be published as following Go modules:
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp
go.opentelemetry.io/otel/exporters/stdout/stdoutlog
LoggerProvider
The LoggerProvider
is implemented as LoggerProvider
struct in provider.go.
LogRecord limits
The LogRecord limits can be configured using following options:
func WithAttributeCountLimit(limit int) LoggerProviderOption
func WithAttributeValueLengthLimit(limit int) LoggerProviderOption
The limits can be also configured using the OTEL_LOGRECORD_*
environment variables as
defined by the specification.
Processor
The LogRecordProcessor
is defined as Processor
interface in processor.go.
The user set processors for the LoggerProvider
using
func WithProcessor(processor Processor) LoggerProviderOption
.
The user can configure custom processors and decorate built-in processors.
The specification may add new operations to the LogRecordProcessor. If it happens, CONTRIBUTING.md describes how the SDK can be extended in a backwards-compatible way.
SimpleProcessor
The Simple processor
is implemented as SimpleProcessor
struct in simple.go.
BatchProcessor
The Batching processor
is implemented as BatchProcessor
struct in batch.go.
The Batcher
can be also configured using the OTEL_BLRP_*
environment variables as
defined by the specification.
Exporter
The LogRecordExporter
is defined as Exporter
interface in exporter.go.
The slice passed to Export
must not be retained by the implementation
(like e.g. io.Writer
)
so that the caller can reuse the passed slice
(e.g. using sync.Pool
)
to avoid heap allocations on each call.
The specification may add new operations to the LogRecordExporter. If it happens, CONTRIBUTING.md describes how the SDK can be extended in a backwards-compatible way.
Record
The ReadWriteLogRecord
is defined as Record
struct in record.go.
The Record
is designed similarly to log.Record
in order to reduce the number of heap allocations when processing attributes.
The SDK does not have have an additional definition of ReadableLogRecord as the specification does not say that the exporters must not be able to modify the log records. It simply requires them to be able to read the log records. Having less abstractions reduces the API surface and makes the design simpler.
Benchmarking
The benchmarks are supposed to test end-to-end scenarios and avoid I/O that could affect the stability of the results.
The benchmark results can be found in the prototype.
Rejected alternatives
Represent both LogRecordProcessor and LogRecordExporter as Expoter
Because the LogRecordProcessor
and the LogRecordProcessor
abstractions are so similar, there was a proposal to unify them under
single Expoter
interface.3
However, introducing a Processor
interface makes it easier
to create custom processor decorators4
and makes the design more aligned with the specification.
Embed log.Record
Because Record
and log.Record
are very similar, there was a proposal to embed log.Record
in Record
definition.
log.Record
supports only adding attributes.
In the SDK, we also need to be able to modify the attributes (e.g. removal)
provided via API.
Moreover it is safer to have these abstraction decoupled. E.g. there can be a need for some fields that can be set via API and cannot be modified by the processors.
Processor.OnEmit to accept Record values
There was a proposal to make the Processor's OnEmit
to accept a Record value instead of a pointer to reduce allocations
as well as to have design similar to slog.Handler
.
There have been long discussions within the OpenTelemetry Specification SIG5 about whether such a design would comply with the specification. The summary was that the current processor design flaws are present in other languages as well. Therefore, it would be favorable to introduce new processing concepts (e.g. chaining processors) in the specification that would coexist with the current "mutable" processor design.
The performance disadvantages caused by using a pointer (which at the time of writing causes an additional heap allocation) may be mitigated by future versions of the Go compiler, thanks to improved escape analysis and profile-guided optimization (PGO)6.
On the other hand, Processor's Enabled
is fine to accept
a Record value as the processors should not mutate the passed
parameters.