mirror of https://github.com/open-telemetry/opentelemetry-go.git
synced 2025-11-25 22:41:46 +02:00
Metrics stdout export pipeline (#265)

* Add MetricAggregator.Merge() implementations
* Update from feedback
* Type
* Ckpt
* Ckpt
* Add push controller
* Ckpt
* Add aggregator interfaces, stdout encoder
* Modify basic main.go
* Main is working
* Batch stdout output
* Sum update
* Rename stdout
* Add stateless/stateful Batcher options
* Undo a for-loop in the example, remove a done TODO
* Update imports
* Add note
* Rename defaultkeys
* Support variable label encoder to speed OpenMetrics/Statsd export
* Lint
* Doc
* Precommit/lint
* Simplify Aggregator API
* Record->Identifier
* Remove export.Record a.k.a. Identifier
* Checkpoint
* Propagate errors to the SDK, remove a bunch of 'TODO warn'
* Checkpoint
* Introduce export.Labels
* Comments in export/metric.go
* Comment
* More merge
* More doc
* Complete example
* Lint fixes
* Add a testable example
* Lint
* Let Export return an error
* Add a basic stdout exporter test
* Add measure test; fix aggregator APIs
* Use JSON numbers, not strings
* Test stdout exporter error
* Add a test for the call to RangeTest
* Add error handler API to improve correctness test; return errors from RecordOne
* Undo the previous -- do not expose errors
* Add simple selector variations, test
* Repair examples
* Test push controller error handling
* Add SDK label encoder tests
* Add a defaultkeys batcher test
* Add an ungrouped batcher test
* Lint new tests
* Respond to krnowak's feedback
* Undo comment
* Use concrete receivers for export records and labels, since the constructors return structs not pointers
* Bug fix for stateful batchers; clone an aggregator for long-term storage
* Remove TODO addressed in #318
* Add errors to all aggregator interfaces
* Handle ErrNoLastValue case in stdout exporter
* Move aggregator API into sdk/export/metric/aggregator
* Update all aggregator exported-method comments
* Document the aggregator APIs
* More aggregator comments
* Add multiple updates to the ungrouped test
* Fixes for feedback from Gustavo and Liz
* Producer->CheckpointSet; add FinishedCollection
* Process takes an export.Record
* ReadCheckpoint->CheckpointSet
* EncodeLabels->Encode
* Format a better inconsistent type error; add more aggregator API tests
* More RangeTest test coverage
* Make benbjohnson/clock a test-only dependency
* Handle ErrNoLastValue in stress_test
This commit is contained in:
committed by rghetia
parent c3d5b7b16d
commit 9878f3b700
@@ -13,48 +13,157 @@
 // limitations under the License.

 /*
+Package metric implements the OpenTelemetry metric.Meter API. The SDK
+supports configurable metrics export behavior through a collection of
+export interfaces that support various export strategies, described below.
-Package metric implements the OpenTelemetry `Meter` API. The SDK
-supports configurable metrics export behavior through a
-`export.MetricBatcher` API. Most metrics behavior is controlled
-by the `MetricBatcher`, including:
+
+The metric.Meter API consists of methods for constructing each of the
+basic kinds of metric instrument. There are six types of instrument
+available to the end user, comprised of three basic kinds of metric
+instrument (Counter, Gauge, Measure) crossed with two kinds of number
+(int64, float64).
-1. Selecting the concrete type of aggregation to use
-2. Receiving exported data during SDK.Collect()
+The API assists the SDK by consolidating the variety of metric instruments
+into a narrower interface, allowing the SDK to avoid repetition of
+boilerplate. The API and SDK are separated such that an event reaching
+the SDK has a uniform structure: an instrument, a label set, and a
+numerical value.
-The call to SDK.Collect() initiates collection. The SDK calls the
-`MetricBatcher` for each current record, asking the aggregator to
-export itself. Aggregators, found in `./aggregators`, are responsible
-for receiving updates and exporting their current state.
+
+To this end, the API uses a core.Number type to represent either an int64
+or a float64, depending on the instrument's definition. A single
+implementation interface is used for instruments, metric.InstrumentImpl,
+and a single implementation interface is used for handles,
+metric.HandleImpl.
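The int64/float64 duality above can be sketched as a 64-bit union type. This is an illustrative stand-in for (not the actual implementation of) the SDK's core.Number; all names here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Number is a hypothetical sketch of a 64-bit union that can hold either
// an int64 or a float64, in the spirit of the core.Number type described
// above. The instrument's definition (not modeled here) determines which
// kind a given Number holds.
type Number uint64

func NewInt64Number(i int64) Number     { return Number(i) }
func NewFloat64Number(f float64) Number { return Number(math.Float64bits(f)) }

func (n Number) AsInt64() int64     { return int64(n) }
func (n Number) AsFloat64() float64 { return math.Float64frombits(uint64(n)) }

func main() {
	fmt.Println(NewInt64Number(-7).AsInt64())      // -7
	fmt.Println(NewFloat64Number(1.5).AsFloat64()) // 1.5
}
```

Storing both kinds in the same 64 bits lets a single instrument implementation handle either number kind without boxing.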
-The SDK.Collect() API should be called by an exporter. During the
-call to Collect(), the exporter receives calls in a single-threaded
-context. No locking is required because the SDK.Collect() call
-prevents concurrency.
+There are three entry points for events in the Metrics API: via instrument
+handles, via direct instrument calls, and via BatchRecord. The SDK is
+designed with handles as the primary entry point; the other two entry
+points are implemented in terms of short-lived handles. For example, the
+implementation of a direct call allocates a handle, operates on the
+handle, and releases the handle. Similarly, the implementation of
+RecordBatch uses a short-lived handle for each measurement in the batch.
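The short-lived-handle pattern can be sketched as follows. Every name here (instrument, handle, acquireHandle, directRecord) is hypothetical, not the SDK's API.

```go
package main

import "fmt"

// Hypothetical sketch of a direct instrument call implemented via a
// short-lived handle, as described above.
type handle struct{ sum *int64 }

func (h handle) record(v int64) { *h.sum += v }

type instrument struct{ sums map[string]*int64 }

// acquireHandle finds or creates the record for a label set.
func (in *instrument) acquireHandle(labels string) handle {
	if _, ok := in.sums[labels]; !ok {
		in.sums[labels] = new(int64)
	}
	return handle{sum: in.sums[labels]}
}

// release is where a real SDK would drop a reference count; a no-op here.
func (in *instrument) release(h handle) {}

// directRecord allocates a handle, operates on it, and releases it,
// mirroring how direct calls reduce to the handle entry point.
func (in *instrument) directRecord(labels string, v int64) {
	h := in.acquireHandle(labels)
	h.record(v)
	in.release(h)
}

func main() {
	in := &instrument{sums: map[string]*int64{}}
	in.directRecord("service=a", 5)
	in.directRecord("service=a", 3)
	fmt.Println(*in.sums["service=a"]) // 8
}
```

Funneling all three entry points through handles means the SDK's record-management logic exists in exactly one place.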
-The SDK uses lock-free algorithms to maintain its internal state.
-There are three central data structures at work:
+Internal Structure
-1. A sync.Map maps unique (InstrumentID, LabelSet) to records
-2. A "primary" atomic list of records
-3. A "reclaim" atomic list of records
+
+The SDK is designed with minimal use of locking, to avoid adding
+contention for user-level code. For each handle, whether it is held by
+user-level code or a short-lived device, there exists an internal record
+managed by the SDK. Each internal record corresponds to a specific
+instrument and label set combination.
-Collection is oriented around epochs. The SDK internally has a
-notion of the "current" epoch, which is incremented each time
-Collect() is called. Records contain two atomic counter values,
-the epoch in which it was last modified and the epoch in which it
-was last collected. Records may be garbage collected when the
-epoch in which they were last updated is less than the epoch in
-which they were last collected.
+
+A sync.Map maintains the mapping of current instruments and label sets to
+internal records. To create a new handle, the SDK consults the Map to
+locate an existing record; otherwise it constructs a new record. The SDK
+maintains a count of the number of references to each record, ensuring
+that records are not reclaimed from the Map while they are still active
+from the user's perspective.
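A minimal sketch of this lookup path, assuming a simplified record holding only a reference count and a sum; the key and record shapes are illustrative, not the SDK's internals.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical sketch: a sync.Map keyed by (instrument, label set) with a
// reference count per record guarding reclamation.
type mapKey struct{ instrument, labels string }

type record struct {
	refcount int64
	sum      int64
}

var records sync.Map // mapKey -> *record

// acquire returns the record for the key, creating it if absent, and
// increments its reference count so a collection pass will not reclaim it.
func acquire(k mapKey) *record {
	if v, ok := records.Load(k); ok {
		r := v.(*record)
		atomic.AddInt64(&r.refcount, 1)
		return r
	}
	r := &record{refcount: 1}
	if v, loaded := records.LoadOrStore(k, r); loaded {
		// Another goroutine won the race to insert; use its record.
		r = v.(*record)
		atomic.AddInt64(&r.refcount, 1)
	}
	return r
}

func release(r *record) { atomic.AddInt64(&r.refcount, -1) }

func main() {
	k := mapKey{instrument: "requests", labels: "method=GET"}
	r1 := acquire(k)
	r2 := acquire(k) // one record per instrument + label set
	fmt.Println(r1 == r2, atomic.LoadInt64(&r1.refcount)) // true 2
	release(r2)
	release(r1)
}
```

LoadOrStore resolves the insert race without a lock, which is why this path stays contention-free for user-level code.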
-Collect() performs a record-by-record scan of all active records
-and exports their current state, before incrementing the current
-epoch. Collection events happen at a point in time during
-`Collect()`, but all records are not collected in the same instant.
+Metric collection is performed via a single-threaded call to Collect that
+sweeps through all records in the SDK, checkpointing their state. When a
+record is discovered that has no references and has not been updated since
+the prior collection pass, it is marked for reclamation and removed from
+the Map. There exists, at this moment, a race condition since another
+goroutine could, in the same instant, obtain a reference to the handle.
+
+The SDK is designed to tolerate this sort of race condition, in the name
+of reducing lock contention. It is possible for more than one record with
+identical instrument and label set to exist simultaneously, though only
+one can be linked from the Map at a time. To avoid lost updates, the SDK
+maintains two additional linked lists of records, one managed by the
+collection code path and one managed by the instrumentation code path.
+
+The SDK maintains a current epoch number, corresponding to the number of
+completed collections. Each record contains the last epoch during which
+it was collected and updated. These variables allow the collection code
+path to detect stale records while allowing the instrumentation code path
+to detect potential reclamations. When the instrumentation code path
+detects a potential reclamation, it adds itself to the second linked list,
+where records are saved from reclamation.
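The epoch bookkeeping can be sketched as below; the field and function names are hypothetical, and the real SDK uses atomic counters where this sketch uses plain fields.

```go
package main

import "fmt"

// Hypothetical sketch of the epoch bookkeeping described above. A record
// tracks the epoch of its last update and of its last collection; a record
// whose updates are all older than its last collection is a reclamation
// candidate.
type epochRecord struct {
	updateEpoch  int64 // epoch of the most recent metric event
	collectEpoch int64 // epoch when the record was last collected
}

// stale reports whether the record saw no updates since it was last
// collected, making it a candidate for reclamation.
func stale(r *epochRecord) bool {
	return r.updateEpoch < r.collectEpoch
}

func main() {
	epoch := int64(1)
	r := &epochRecord{}
	r.updateEpoch = epoch  // a metric event arrives during epoch 1
	r.collectEpoch = epoch // first collection pass sees the update
	epoch++
	fmt.Println(stale(r)) // false: updated in the epoch just collected
	r.collectEpoch = epoch // second pass: no updates during epoch 2
	epoch++
	fmt.Println(stale(r)) // true: safe to mark for reclamation
}
```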
+Each record has an associated aggregator, which maintains the current
+state resulting from all metric events since its last checkpoint.
+Aggregators may be lock-free or they may use locking, but they should
+expect to be called concurrently. Because of the tolerated race condition
+described above, aggregators must be capable of merging with another
+aggregator of the same type.
+
+Export Pipeline
+
+While the SDK serves to maintain a current set of records and
+coordinate collection, the behavior of a metrics export pipeline is
+configured through the export types in
+go.opentelemetry.io/otel/sdk/export/metric. It is important to keep
+in mind the context these interfaces are called from. There are two
+contexts: instrumentation context, in which a user-level goroutine
+enters the SDK, resulting in a new record; and collection context, in
+which a system-level thread performs a collection pass through the
+SDK.
+
+Descriptor is a struct that describes the metric instrument to the
+export pipeline, containing the name, recommended aggregation keys,
+units, description, metric kind (counter, gauge, or measure), number
+kind (int64 or float64), and whether the instrument has alternate
+semantics or not (i.e., monotonic=false counter, monotonic=true gauge,
+absolute=false measure). A Descriptor accompanies metric data as it
+passes through the export pipeline.
+
+The AggregationSelector interface supports choosing the method of
+aggregation to apply to a particular instrument. Given the
+Descriptor, its AggregatorFor method returns an implementation of
+Aggregator. If this interface returns nil, the metric will be
+disabled. The aggregator should be matched to the capabilities of the
+exporter. Selecting the aggregator for counter and gauge instruments
+is relatively straightforward, but for measure instruments there are
+numerous choices with different cost and quality tradeoffs.
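One possible shape for such a selector is sketched below; the kind names, Descriptor fields, and aggregator implementations are illustrative stand-ins for the SDK's types.

```go
package main

import "fmt"

// Hypothetical sketch of an AggregationSelector: choose an aggregation
// strategy per instrument, given its descriptor.
type Kind int

const (
	CounterKind Kind = iota
	GaugeKind
	MeasureKind
)

type Descriptor struct {
	Name string
	Kind Kind
}

type Aggregator interface{ Strategy() string }

type sumAgg struct{}
type lastValueAgg struct{}
type minMaxSumCountAgg struct{}

func (sumAgg) Strategy() string            { return "sum" }
func (lastValueAgg) Strategy() string      { return "lastvalue" }
func (minMaxSumCountAgg) Strategy() string { return "minmaxsumcount" }

// AggregatorFor picks an aggregation per instrument kind; returning nil
// would disable the metric, as the text notes. Counters and gauges have
// obvious choices; measures involve cost/quality tradeoffs.
func AggregatorFor(d *Descriptor) Aggregator {
	switch d.Kind {
	case CounterKind:
		return sumAgg{}
	case GaugeKind:
		return lastValueAgg{}
	case MeasureKind:
		return minMaxSumCountAgg{}
	}
	return nil // unknown kind: metric disabled
}

func main() {
	d := &Descriptor{Name: "latency", Kind: MeasureKind}
	fmt.Println(AggregatorFor(d).Strategy()) // minmaxsumcount
}
```

An exporter-specific selector could substitute, say, a sketch or histogram strategy for measures without touching the SDK.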
+Aggregator is an interface which implements a concrete strategy for
+aggregating metric updates. Several Aggregator implementations are
+provided by the SDK. Aggregators may be lock-free or use locking,
+depending on their structure and semantics. Aggregators implement an
+Update method, called in instrumentation context, to receive a single
+metric event. Aggregators implement a Checkpoint method, called in
+collection context, to save a checkpoint of the current state.
+Aggregators implement a Merge method, also called in collection
+context, that combines state from two aggregators into one. Each SDK
+record has an associated aggregator.
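The Update/Checkpoint/Merge contract can be sketched with a lock-free counter; this mirrors the shape described above but is not the SDK's aggregator code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical lock-free counter aggregator.
type counterAggregator struct {
	current    int64 // written concurrently in instrumentation context
	checkpoint int64 // read and written only in collection context
}

// Update receives a single metric event (instrumentation context).
func (c *counterAggregator) Update(v int64) {
	atomic.AddInt64(&c.current, v)
}

// Checkpoint atomically captures and resets the accumulated value
// (collection context).
func (c *counterAggregator) Checkpoint() {
	c.checkpoint = atomic.SwapInt64(&c.current, 0)
}

// Merge folds another aggregator's checkpointed state into this one,
// needed when duplicate records exist for the same instrument + labels.
func (c *counterAggregator) Merge(o *counterAggregator) {
	c.checkpoint += o.checkpoint
}

func main() {
	a, b := &counterAggregator{}, &counterAggregator{}
	a.Update(3)
	b.Update(4)
	a.Checkpoint()
	b.Checkpoint()
	a.Merge(b)
	fmt.Println(a.checkpoint) // 7
}
```

The atomic swap gives Checkpoint a consistent snapshot without blocking concurrent Updates, which simply land in the next interval.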
+Batcher is an interface which sits between the SDK and an exporter.
+The Batcher embeds an AggregationSelector, used by the SDK to assign
+new Aggregators. The Batcher supports a Process() API for submitting
+checkpointed aggregators to the batcher, and a CheckpointSet() API
+for producing a complete checkpoint for the exporter. Two default
+Batcher implementations are provided: the "defaultkeys" Batcher groups
+aggregate metrics by their recommended Descriptor.Keys(), while the
+"ungrouped" Batcher aggregates metrics at full dimensionality.
+
+LabelEncoder is an optional optimization that allows an exporter to
+provide the serialization logic for labels. This avoids duplicate
+serialization of labels, once as a unique key in the SDK (or
+Batcher) and once in the exporter.
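One plausible encoding strategy is sketched below, assuming a simple key=value,... format; the format is illustrative, and a real encoder would target a wire format such as OpenMetrics or statsd.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical label encoder producing a unique encoding for a label set.
func Encode(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // canonical order makes equal label sets encode equally
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(Encode(map[string]string{"b": "2", "a": "1"})) // a=1,b=2
}
```

Because the encoding is canonical, the same string can serve both as the SDK's map key and as the exporter's serialized form, which is exactly the duplicate work LabelEncoder eliminates.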
+CheckpointSet is an interface between the Batcher and the Exporter.
+After completing a collection pass, the Batcher.CheckpointSet() method
+returns a CheckpointSet, which the Exporter uses to iterate over all
+the updated metrics.
+
+Record is a struct containing the state of an individual exported
+metric. This is the result of one collection interval for one
+instrument and one label set.
+
+Labels is a struct containing an ordered set of labels, the
+corresponding unique encoding, and the encoder that produced it.
+
+Exporter is the final stage of an export pipeline. It is called with
+a CheckpointSet capable of enumerating all the updated metrics.
+
+Controller is not an export interface per se, but it orchestrates the
+export pipeline. For example, a "push" controller will establish a
+periodic timer to regularly collect and export metrics. A "pull"
+controller will await a pull request before initiating metric
+collection. Either way, the job of the controller is to call the SDK
+Collect() method, then read the checkpoint, then invoke the exporter.
+Controllers are expected to implement the public metric.MeterProvider
+API, meaning they can be installed as the global Meter provider.
-The purpose of the two lists: the primary list is appended-to when
-new handles are created and atomically cleared during collect. The
-reclaim list is used as a second chance, in case there is a race
-between looking up a record and record deletion.
 */
-package metric
+package metric // import "go.opentelemetry.io/otel/sdk/metric"