mirror of https://github.com/open-telemetry/opentelemetry-go.git synced 2026-06-03 18:35:08 +02:00

Check context prior to delaying retry in OTLP exporters (#7678)

Fix #7673

[Issue being
addressed](https://github.com/open-telemetry/opentelemetry-go/issues/7673#issuecomment-3618325229):

> 1.
[`fn`](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry_test.go#L163-L165)
is
[called](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry.go#L87)
> 2. It [returns an
error](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry_test.go#L165)
> 3. The code [checks if the error is
retryable](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry.go#L92),
it [always
is](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry_test.go#L149)
> 4. [Time delay is
checked](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry.go#L97-L108)
>    - [Max elapsed
time](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry_test.go#L156-L157)
is 10 ms
>    - Initial [delay is
1ms](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry_test.go#L151)
>    - Delay is determined to be 1ms
>    - The program proceeds to waiting
> 5. [Wait is
called](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry.go#L110-L112)
> 6. The [wait select statement is
evaluated](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry.go#L127-L138)
>    - On slow systems both `case`s are true
>       -
[Non-deterministically](https://go.dev/ref/spec#:~:text=If%20one%20or,communications%20can%20proceed.)
the [timer channel
`case`](https://github.com/open-telemetry/opentelemetry-go/blob/1bc9713ac6dc8cbe2fd04fd6dc716d316059eb90/exporters/otlp/otlplog/otlploggrpc/internal/retry/retry.go#L137)
is selected
>       - The retry function is re-run and a second iteration is recorded,
causing the failure
>    - On fast systems only the context cancel is true
>       - The retry stops here with only `1` execution

Do not rely on the non-deterministic `select` statement to catch an ended
context before waiting for a retry delay. Explicitly check the context
before entering the wait.
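
The non-determinism described above can be reproduced in isolation. The following is a minimal sketch, not code from this repository: `pick` and `countPicks` are hypothetical names, and two closed channels stand in for a canceled context and an expired timer so that both `case`s are always ready.

```go
package main

import "fmt"

// pick receives from whichever ready channel select chooses.
// When both channels are ready, the Go spec says one case is
// chosen by uniform pseudo-random selection.
func pick(done, timer <-chan struct{}) string {
	select {
	case <-done:
		return "done"
	case <-timer:
		return "timer"
	}
}

// countPicks runs pick n times with both channels already ready
// and reports how often each case was selected.
func countPicks(n int) (done, timer int) {
	doneCh := make(chan struct{})
	timerCh := make(chan struct{})
	close(doneCh)  // stands in for a canceled context
	close(timerCh) // stands in for an expired timer
	for i := 0; i < n; i++ {
		if pick(doneCh, timerCh) == "done" {
			done++
		} else {
			timer++
		}
	}
	return done, timer
}

func main() {
	done, timer := countPicks(1000)
	// The split varies from run to run.
	fmt.Printf("done=%d timer=%d\n", done, timer)
}
```

Because either case may win, code that must observe cancellation first has to check the context before reaching the `select`.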

This resolves the flaky test and ensures that, in normal operation,
requests with a canceled context end without waiting for any
additional delay.
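
As an illustration of that behavior, here is a simplified sketch of a retry loop with the explicit check. It is not the exporters' implementation: `retry` and `runCanceled` are hypothetical helpers, and back-off, throttling, and max-elapsed-time handling are left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry calls fn until it succeeds or ctx ends (hypothetical helper).
func retry(ctx context.Context, delay time.Duration, fn func(context.Context) error) error {
	for {
		err := fn(ctx)
		if err == nil {
			return nil
		}
		// Explicit check: if ctx already ended, stop before waiting.
		if ctxErr := ctx.Err(); ctxErr != nil {
			return fmt.Errorf("%w: %w", ctxErr, err)
		}
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return fmt.Errorf("%w: %w", ctx.Err(), err)
		case <-timer.C:
			// Delay elapsed; try again.
		}
	}
}

// runCanceled cancels the context during the first attempt and
// reports how many attempts ran and the resulting error.
func runCanceled() (calls int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	errFail := errors.New("export failed")
	err = retry(ctx, 0, func(context.Context) error {
		calls++
		cancel()
		return errFail
	})
	return calls, err
}

func main() {
	calls, err := runCanceled()
	fmt.Println(calls, err) // prints "1 context canceled: export failed"
}
```

With the check in place the attempt count is always 1, even with a zero delay where the timer and `ctx.Done()` could both be ready at the `select`.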
Tyler Yahn
2025-12-07 10:11:36 -08:00
committed by GitHub
parent 61765e78a6
commit d03b03395d
7 changed files with 35 additions and 0 deletions
@@ -94,6 +94,11 @@ func (c Config) RequestFunc(evaluate EvaluateFunc) RequestFunc {
 				return err
 			}
 
+			// Check if context is canceled before attempting to wait and retry.
+			if ctx.Err() != nil {
+				return fmt.Errorf("%w: %w", ctx.Err(), err)
+			}
+
 			if maxElapsedTime != 0 && time.Since(startTime) > maxElapsedTime {
 				return fmt.Errorf("max retry time elapsed: %w", err)
 			}
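
The added check wraps two errors with `%w` in a single `fmt.Errorf` call, which requires Go 1.20 or later. A small sketch of what callers can then observe follows; `wrapBoth` and `matchesBoth` are hypothetical names used only for illustration, and `errExport` stands in for whatever error the export attempt returned.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// wrapBoth mirrors the wrapping used in the hunk above: the context
// error comes first, followed by the last export error.
func wrapBoth(ctxErr, err error) error {
	return fmt.Errorf("%w: %w", ctxErr, err)
}

// matchesBoth reports whether errors.Is sees both wrapped errors.
func matchesBoth() bool {
	errExport := errors.New("export failed") // hypothetical export error
	wrapped := wrapBoth(context.Canceled, errExport)
	return errors.Is(wrapped, context.Canceled) && errors.Is(wrapped, errExport)
}

func main() {
	fmt.Println(matchesBoth()) // prints "true"
}
```

Callers can therefore test for `context.Canceled` or `context.DeadlineExceeded` with `errors.Is` while still having access to the underlying export error.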