
Logs Bridge API

Abstract

go.opentelemetry.io/otel/log provides the Logs Bridge API.

The prototype was created in #4725.

Background

The key challenge is to create a performant API that is compliant with the specification and has an intuitive, user-friendly design. Performance is seen as one of the most important characteristics of logging libraries in Go.

Design

This proposed design aims to:

  • be specification compliant,
  • be similar to Trace and Metrics API,
  • take advantage of both OpenTelemetry and slog experience to achieve acceptable performance.

Module structure

The API is published as a single go.opentelemetry.io/otel/log Go module.

The module name is compliant with Artifact Naming and the package structure is the same as for Trace API and Metrics API.

The Go module consists of the following packages:

  • go.opentelemetry.io/otel/log
  • go.opentelemetry.io/otel/log/embedded
  • go.opentelemetry.io/otel/log/noop

Rejected alternative:

LoggerProvider

The LoggerProvider abstraction is defined as LoggerProvider interface in provider.go.

The specification may add new operations to LoggerProvider. The interface may have methods added without a package major version bump. Therefore, it embeds embedded.LoggerProvider to inform API implementation authors about this non-standard API evolution. This approach is already used in the Trace API and Metrics API.

LoggerProvider.Logger

The Logger method implements the Get a Logger operation.

The required name parameter is accepted as a string method argument.

The LoggerOption options are defined to support optional parameters.

Implementation requirements:

  • The specification requires the method to be safe to be called concurrently.

  • The method should use some default name if the passed name is empty, in order to meet the specification's SDK requirement to return a working logger when an invalid name is passed, as well as to resemble the behavior of getting tracers and meters.
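
A sketch of the empty-name fallback described above; the default name used here is illustrative, not the one an actual SDK uses:

```go
package main

import "fmt"

// defaultLoggerName is an illustrative default; an SDK would pick its own.
const defaultLoggerName = "go.opentelemetry.io/otel/log"

// loggerName sketches how an implementation could normalize the name
// passed to LoggerProvider.Logger so that an empty (invalid) name
// still yields a working logger.
func loggerName(name string) string {
	if name == "" {
		// A real implementation should also log that an invalid
		// name was provided, per the specification.
		return defaultLoggerName
	}
	return name
}

func main() {
	fmt.Println(loggerName(""))               // falls back to the default
	fmt.Println(loggerName("github.com/app")) // kept as-is
}
```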

Logger can be extended by adding new LoggerOption options and adding new exported fields to the LoggerConfig struct. This design is already used in Trace API for getting tracers and in Metrics API for getting meters.

Rejected alternative:

Logger

The Logger abstraction is defined as Logger interface in logger.go.

The specification may add new operations to Logger. The interface may have methods added without a package major version bump. Therefore, it embeds embedded.Logger to inform API implementation authors about this non-standard API evolution. This approach is already used in the Trace API and Metrics API.

Logger.Emit

The Emit method implements the Emit a LogRecord operation.

Context associated with the LogRecord is accepted as a context.Context method argument.

Calls to Emit are supposed to be on the hot path. Therefore, in order to reduce the number of heap allocations, the LogRecord abstraction is defined as a Record struct in record.go.

Timestamp is accessed using the following methods:

func (r *Record) Timestamp() time.Time
func (r *Record) SetTimestamp(t time.Time)

ObservedTimestamp is accessed using the following methods:

func (r *Record) ObservedTimestamp() time.Time
func (r *Record) SetObservedTimestamp(t time.Time)

SeverityNumber is accessed using the following methods:

func (r *Record) Severity() Severity
func (r *Record) SetSeverity(s Severity)

The Severity type is defined in severity.go. The constants are based on the Displaying Severity recommendation. Additionally, Severity[Level] constants are defined to make the API more readable and user-friendly.

SeverityText is accessed using the following methods:

func (r *Record) SeverityText() string
func (r *Record) SetSeverityText(s string)

Body is accessed using the following methods:

func (r *Record) Body() Value
func (r *Record) SetBody(v Value)

Log record attributes are accessed using the following methods:

func (r *Record) WalkAttributes(f func(KeyValue) bool)
func (r *Record) AddAttributes(attrs ...KeyValue)

Record has an AttributesLen method that returns the number of attributes, to allow slice preallocation when converting records to a different representation:

func (r *Record) AttributesLen() int

The record's attributes design and implementation are based on slog.Record. This allows high-performance access and manipulation of the attributes while keeping the API user-friendly. It relieves users from implementing their own optimizations to reduce the number of allocations when passing attributes.
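
The slog.Record technique can be sketched as follows: the struct inlines a small fixed-size array for the first few attributes, so the common case incurs no heap allocation, and only the overflow spills into a slice. All names and sizes below are illustrative, not the actual implementation:

```go
package main

import "fmt"

// KeyValue is a simplified stand-in for log.KeyValue.
type KeyValue struct {
	Key   string
	Value string
}

const nInline = 5 // slog.Record also inlines 5 attributes

// Record inlines the first few attributes to avoid heap allocation
// on the common path, mirroring the slog.Record layout.
type Record struct {
	front  [nInline]KeyValue // inlined storage, no allocation
	nFront int               // number of entries used in front
	back   []KeyValue        // overflow, heap allocated only when needed
}

func (r *Record) AddAttributes(attrs ...KeyValue) {
	var i int
	for i = 0; i < len(attrs) && r.nFront < nInline; i++ {
		r.front[r.nFront] = attrs[i]
		r.nFront++
	}
	r.back = append(r.back, attrs[i:]...)
}

func (r *Record) AttributesLen() int { return r.nFront + len(r.back) }

// WalkAttributes calls f on each attribute; f returning false stops the walk.
func (r *Record) WalkAttributes(f func(KeyValue) bool) {
	for _, kv := range r.front[:r.nFront] {
		if !f(kv) {
			return
		}
	}
	for _, kv := range r.back {
		if !f(kv) {
			return
		}
	}
}

func main() {
	var r Record
	r.AddAttributes(KeyValue{"a", "1"}, KeyValue{"b", "2"})
	// Preallocate using AttributesLen before converting the record.
	out := make([]KeyValue, 0, r.AttributesLen())
	r.WalkAttributes(func(kv KeyValue) bool {
		out = append(out, kv)
		return true
	})
	fmt.Println(len(out))
}
```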

The abstractions described in the specification are defined in keyvalue.go.

Value represents any. KeyValue represents a key (string) to value (any) pair.

Kind is an enumeration used for specifying the underlying value type:

  • KindEmpty is used for an empty (zero) value.
  • KindBool is used for a boolean value.
  • KindFloat64 is used for a double-precision floating point (IEEE 754-1985) value.
  • KindInt64 is used for a signed integer value.
  • KindString is used for a string value.
  • KindBytes is used for a slice of bytes (in spec: a byte array).
  • KindSlice is used for a slice of values (in spec: an array (a list) of any values).
  • KindMap is used for a slice of key-value pairs (in spec: map<string, any>).

These types are defined in go.opentelemetry.io/otel/log package as they are tightly coupled with the API and different from common attributes.

The internal implementation of Value is based on slog.Value and the API is mostly inspired by attribute.Value. The benchmarks [1] show that the implementation is more performant than attribute.Value.

The value accessors (func (v Value) As[Kind] methods) must not panic, as it would violate the specification:

API methods MUST NOT throw unhandled exceptions when used incorrectly by end users. The API and SDK SHOULD provide safe defaults for missing or invalid arguments. [...] Whenever the library suppresses an error that would otherwise have been exposed to the user, the library SHOULD log the error using language-specific conventions.

Therefore, the value accessors should return a zero value and log an error when a bad accessor is called.
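
A sketch of such a non-panicking accessor, using simplified stand-ins for Kind and Value (illustrative, not the actual implementation):

```go
package main

import "fmt"

// Kind enumerates the possible underlying value types.
type Kind int

const (
	KindEmpty Kind = iota
	KindInt64
	KindString
)

// Value is a simplified stand-in for log.Value.
type Value struct {
	kind Kind
	num  int64
	str  string
}

func Int64Value(v int64) Value   { return Value{kind: KindInt64, num: v} }
func StringValue(v string) Value { return Value{kind: KindString, str: v} }

// AsString returns the string value. Per the specification, a bad
// accessor call must not panic: it should log an error (elided here)
// and return the zero value instead.
func (v Value) AsString() string {
	if v.kind != KindString {
		// In a real implementation: log the error using
		// language-specific conventions.
		return ""
	}
	return v.str
}

func main() {
	v := Int64Value(42)
	fmt.Printf("%q\n", v.AsString()) // bad accessor: zero value, no panic
}
```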

The Severity, Kind, Value, and KeyValue types may implement the fmt.Stringer interface. However, this is not needed for the first stable release, and the String methods can be added later.

The caller must not subsequently mutate the record passed to Emit. This would allow the implementation to not clone the record, but simply retain, modify or discard it. The implementation may still choose to clone the record or copy its attributes if it needs to retain or modify it, e.g. in case of asynchronous processing to eliminate the possibility of data races, because the user can technically reuse the record and add new attributes after the call (even when the documentation says that the caller must not do it).

Implementation requirements:

Emit can be extended by adding new exported fields to the Record struct.

Rejected alternatives:

Logger.Enabled

The Enabled method implements the Enabled operation.

Context associated with the LogRecord is accepted as a context.Context method argument.

Calls to Enabled are supposed to be on the hot path, and the list of arguments may be extended in the future. Therefore, in order to reduce the number of heap allocations and make it possible to handle new arguments, Enabled accepts an EnabledParameters struct, defined in logger.go, as the second method argument.

The EnabledParameters getters return values using the (value, ok) idiom in order to indicate whether the values were actually set by the caller or are unspecified.
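
A sketch of the (value, ok) idiom for one optional parameter; the field and method names here are illustrative:

```go
package main

import "fmt"

// Severity mirrors the log.Severity concept.
type Severity int

const SeverityInfo Severity = 9

// EnabledParameters carries optional arguments to Enabled. New fields
// can be added later without breaking the Enabled method signature.
type EnabledParameters struct {
	severity    Severity
	severitySet bool
}

// SetSeverity records the caller-provided severity.
func (p *EnabledParameters) SetSeverity(s Severity) {
	p.severity = s
	p.severitySet = true
}

// Severity returns the severity and whether it was set by the caller,
// following the (value, ok) idiom.
func (p EnabledParameters) Severity() (Severity, bool) {
	return p.severity, p.severitySet
}

func main() {
	var p EnabledParameters
	if _, ok := p.Severity(); !ok {
		fmt.Println("severity unspecified")
	}
	p.SetSeverity(SeverityInfo)
	s, ok := p.Severity()
	fmt.Println(s, ok)
}
```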

noop package

The go.opentelemetry.io/otel/log/noop package provides the Logs Bridge API No-Op Implementation.

Trace context correlation

The bridge implementation should do its best to pass the ctx containing the trace context from the caller so that it can later be passed via Logger.Emit.

It is not expected that users (caller or bridge implementation) reconstruct a context.Context. Reconstructing a context.Context with trace.ContextWithSpanContext and trace.NewSpanContext would usually involve more memory allocations.

Logging libraries that have recording methods accepting a context.Context, such as slog, logrus, and zerolog, make passing the trace context trivial.

However, some libraries do not accept a context.Context in their recording methods. Structured logging libraries, such as logr and zap, offer passing any type as a log attribute/field. Therefore, their bridge implementations can define a "special" log attribute/field that will be used to capture the trace context.

The prototype has bridge implementations that handle trace context correlation efficiently.

Benchmarking

The benchmarks take inspiration from slog, because for the Go team it was also critical to create an API that would be fast and interoperable with existing logging packages [2][3].

The benchmark results can be found in the prototype.

Rejected alternatives

Reuse slog

The API must not be coupled to slog, nor any other logging library.

The API needs to evolve orthogonally to slog.

slog is not compliant with the Logs Bridge API, and we cannot expect the Go team to make slog compliant with it.

The interoperability can be achieved using a log bridge.

You can read more about OpenTelemetry Logs design on opentelemetry.io.

Record as interface

Record is defined as a struct for the following reasons.

Log record is a value object without any behavior. It is used as data input for Logger methods.

The log record resembles the instrument config structs like metric.Float64CounterConfig.

Using a struct instead of an interface improves performance: for example, indirect calls are less optimized, and the use of interfaces tends to increase heap allocations [3].

Options as parameter to Logger.Emit

One of the initial ideas was to have:

type Logger interface {
	embedded.Logger
	Emit(ctx context.Context, options ...RecordOption)
}

The main reason was that this design would be similar to the Meter API for creating instruments.

However, passing a Record directly, instead of using options, is more performant as it reduces heap allocations [4].

Another advantage of passing a Record is that the API would not need functions like NewRecord(options...), which would be used by the SDK and not by users.

Finally, the definition is similar to slog.Handler.Handle, which was designed to provide optimization opportunities [2].

Passing record as pointer to Logger.Emit

So far, the benchmarks do not show differences that would favor passing the record via pointer (nor the reverse).

Passing via value feels safer for the following reasons.

The user would not be able to pass nil, which reduces the possibility of a nil pointer dereference.

It should reduce the possibility of a heap allocation.

It follows the design of slog.Handler.

It follows one of Google's Go Style Decisions to prefer passing values.

Passing struct as parameter to LoggerProvider.Logger

Similarly to Logger.Emit, we could have something like:

type LoggerProvider interface {
	embedded.LoggerProvider
	Logger(name string, config LoggerConfig) Logger
}

The drawback of this idea would be that this would be a different design from Trace and Metrics API.

The performance of acquiring a logger is not as critical as the performance of emitting a log record. While a single HTTP/RPC handler could write hundreds of logs, it should not create a new logger for each log entry. The bridge implementation should reuse loggers whenever possible.

Logger.WithAttributes

We could add WithAttributes to the Logger interface. Then Record could be a simple struct with only exported fields. The idea was that the SDK would implement the performance improvements instead of doing it in the API. This would allow having different optimization strategies.

During the analysis [5], it turned out that the main problem with this proposal is that a variadic slice passed to an interface method is always heap allocated.

Moreover, the logger returned by WithAttributes was allocated on the heap.

Lastly, the proposal was not specification compliant.

Record attributes as slice

One of the proposals [6] was to have Record as a simple struct:

type Record struct {
	Timestamp         time.Time
	ObservedTimestamp time.Time
	Severity          Severity
	SeverityText      string
	Body              Value
	Attributes        []KeyValue
}

The bridge implementations could use sync.Pool for reducing the number of allocations when passing attributes.

The benchmark results were better.

In such a design, most bridges would have a sync.Pool to reduce the number of heap allocations. However, sync.Pool does not work correctly with API implementations that take ownership of the record (e.g. implementations that do not copy records for asynchronous processing). The current design, even in the case of an improper API implementation, has a lower chance of encountering a bug, as most bridges would create a record, pass it, and forget about it.
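
A sketch of what such a bridge-side sync.Pool would look like, and where the ownership problem bites (all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Record is a flat record as in the rejected slice-based proposal.
type Record struct {
	Body       string
	Attributes []string
}

// recordPool amortizes record allocations across log calls, as a
// bridge would have to do under the rejected design.
var recordPool = sync.Pool{New: func() any { return new(Record) }}

var lastBody string // captures the last emitted body, for illustration

// emit stands in for Logger.Emit.
func emit(r *Record) { lastBody = r.Body }

func log(body string, attrs ...string) {
	r := recordPool.Get().(*Record)
	r.Body = body
	r.Attributes = append(r.Attributes[:0], attrs...) // reuse backing array
	emit(r)
	// This Put is only safe if emit did NOT retain r. An implementation
	// doing asynchronous processing would still hold r, and reusing it
	// here becomes a use-after-free style bug.
	recordPool.Put(r)
}

func main() {
	log("hello", "k=v")
	log("world")
	fmt.Println(lastBody)
}
```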

For reference, here is the reason why slog does not use a sync.Pool [3] as well:

We can use a sync pool for records though we decided not to. You can, but it's a bad idea for us. Why? Because users have control of Records. Handler writers can get their hands on a record and we'd have to ask them to free it or try to free it magically at some point. But either way, they could get themselves in trouble by freeing it twice or holding on to one after they free it. That's a use-after-free bug and that's why zerolog was problematic for us. zerolog, as part of its speed, exposes a pool-allocated value to users. If you use zerolog the normal way, that you'll see in all the examples, you will never encounter a problem. But if you do something a little out of the ordinary you can get use-after-free bugs and we just didn't want to put that in the standard library.

Therefore, we decided not to follow the proposal, as it is less user friendly (users and bridges would have to use e.g. a sync.Pool to reduce the number of heap allocations), less safe (more prone to use-after-free bugs and race conditions), and the benchmark differences were not significant.

Use any instead of defining Value

The Logs Data Model defines Body to be any. One could propose to define Body (and attribute values) as any instead of defining a new type (Value).

First of all, the any type defined in the specification is not the same as any (interface{}) in Go.

Moreover, using any as a field would decrease performance [7].

Notice that it will still be possible to add the following kind and factories in a backwards compatible way:

const KindMap Kind

func AnyValue(value any) KeyValue

func Any(key string, value any) KeyValue

However, currently, it would not be specification compliant.

Severity type encapsulating number and text

We could combine severity into a single field defining a type:

type Severity struct {
	Number SeverityNumber
	Text string
}

However, the Logs Data Model defines them as independent fields. It should be more user-friendly to have them separated. Especially with getter and setter methods, setting one value when the other is already set would be unpleasant.

Reuse attribute package

It was tempting to reuse the existing go.opentelemetry.io/otel/attribute package (https://pkg.go.dev/go.opentelemetry.io/otel/attribute) for defining log attributes and body.

However, this would be wrong because the log attribute definition is different from the common attribute definition.

Moreover, there is nothing indicating that the body definition has anything in common with a common attribute value.

Therefore, we define new types representing the abstract types defined in the Logs Data Model.

Mix receiver types for Record

Methods of slog.Record have different receiver types.

In the log/slog GitHub issue, we can only find that the reason is [8]:

some receiver of Record struct is by value

Passing Records by value means they incur no heap allocation. That improves performance overall, even though they are copied.

However, the benchmarks do not show any noticeable differences [9].

The compiler is smart enough to not make a heap allocation for any of these methods; the use of a pointer receiver does not by itself cause a heap allocation. From the Go FAQ [10]:

In the current compilers, if a variable has its address taken, that variable is a candidate for allocation on the heap. However, a basic escape analysis recognizes some cases when such variables will not live past the return from the function and can reside on the stack.

The Understanding Allocations: the Stack and the Heap presentation by Jacob Walker describes the escape analysis with details.

Moreover, also from the Go FAQ [10]:

Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.

Therefore, even if we use a value receiver, a very large value may still be heap allocated.

Both Go Code Review Comments and Google's Go Style Decisions highly recommend making the methods for a type either all pointer methods or all value methods. Google's Go Style Decisions even goes further and says:

There is a lot of misinformation about whether passing a value or a pointer to a function can affect performance. The compiler can choose to pass pointers to values on the stack as well as copying values on the stack, but these considerations should not outweigh the readability and correctness of the code in most circumstances. When the performance does matter, it is important to profile both approaches with a realistic benchmark before deciding that one approach outperforms the other.

Because the benchmarks [9] do not show any performance difference and the general recommendation is to not mix receiver types, we decided to use pointer receivers for all Record methods.

Add XYZ method to Logger

The Logger does not have methods like Enabled, SetSeverity, etc., as the Bridge API needs to follow (be compliant with) the specification.

Moreover, the Bridge API is intended to be used to implement bridges. Applications should not use it directly. The applications should use logging packages such as slog, logrus, zap, zerolog, logr.

Rename KeyValue to Attr

There was a proposal to rename KeyValue to Attr (or Attribute) [11]. New developers may not intuitively know that log.KeyValue is an attribute in OpenTelemetry parlance.

During the discussion we agreed to keep the KeyValue name.

The type is used in two semantics:

  • as a log attribute
  • as a map item

As for the map item semantics, this type is a key-value pair, not an attribute. Naming the type Attr would convey a semantic meaning that would not be correct for a map.

We expect that most of the Bridge API users will be OpenTelemetry contributors. We plan to implement bridges for the most popular logging libraries ourselves. Given we will all have the context needed to disambiguate these overlapping names, developers' confusion should not be an issue.

For bridges not developed by us, developers will likely look at our existing bridges for inspiration. Our correct use of these types will be a reference to them.

Lastly, we could consider a design defining both types: KeyValue and Attr. However, in this approach we would need to have factory functions for both types. It would make the API surface unnecessarily big, and we might even have problems naming the functions.