1
0
mirror of https://github.com/open-telemetry/opentelemetry-go.git synced 2025-03-17 20:57:51 +02:00
Tyler Yahn e016a78c9f
Fix attribute value truncation (#5997)
Fix #5996

### Correctness

From the [OTel
specification](88bffeac48/specification/common/README.md (attribute-limits)):

> - set an attribute value length limit such that for each attribute
value:
> - if it is a string, if it exceeds that limit (counting any character
in it as 1), SDKs MUST truncate that value, so that its length is at
most equal to the limit...

Our current implementation truncates on number of bytes not characters.

Unit tests are added/updated to validate this fix and prevent
regressions.

### Performance

```
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
                        │ commit-b6264913(old).txt │       commit-54c61ac2(new).txt        │
                        │          sec/op          │    sec/op      vs base                │
Truncate/Unlimited-8                  1.2300n ± 7%   0.8757n ±  3%  -28.80% (p=0.000 n=10)
Truncate/Zero-8                        2.341n ± 2%    1.550n ±  9%  -33.77% (p=0.000 n=10)
Truncate/Short-8                     31.6800n ± 3%   0.9960n ±  4%  -96.86% (p=0.000 n=10)
Truncate/ASCII-8                       8.821n ± 1%    3.567n ±  3%  -59.57% (p=0.000 n=10)
Truncate/ValidUTF-8-8                 11.960n ± 1%    7.163n ±  1%  -40.10% (p=0.000 n=10)
Truncate/InvalidUTF-8-8                56.35n ± 0%    37.34n ± 18%  -33.74% (p=0.000 n=10)
Truncate/MixedUTF-8-8                  81.83n ± 1%    50.00n ±  1%  -38.90% (p=0.000 n=10)
geomean                                12.37n         4.865n        -60.68%

                        │ commit-b6264913(old).txt │      commit-54c61ac2(new).txt       │
                        │           B/op           │    B/op     vs base                 │
Truncate/Unlimited-8                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-8                       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-8                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-8                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-8                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-8               16.00 ± 0%     16.00 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-8                 32.00 ± 0%     32.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                          ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                        │ commit-b6264913(old).txt │      commit-54c61ac2(new).txt       │
                        │        allocs/op         │ allocs/op   vs base                 │
Truncate/Unlimited-8                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-8                       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-8                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-8                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-8                 0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-8               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-8                 1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                          ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

#### Values shorter than limit

This is the default code path. Most attribute values will be shorter
than the default 128 limit that users will not modify.

The current code, `safeTruncate` requires a full iteration of the value
to determine it is valid and under the limit.

The replacement, `truncate`, first checks if the number of bytes in the
value are less than or equal to the limit (which guarantees the number
of characters are less than or equal to the limit) and returns
immediately. This will mean that invalid encoding less than the limit is
not changed, which meets the specification requirements.

#### Values longer than the limit

For values who's number of bytes exceeds the limit, they are iterated
only once with the replacement, `truncate`.

In comparison, the current code, `safeTruncate`, can iterate the string
up to three separate times when the string contains invalid characters.
2024-11-26 09:41:55 +01:00
..
2024-03-26 20:13:54 +01:00

SDK

PkgGoDev