mirror of https://github.com/open-telemetry/opentelemetry-go.git
synced 2025-11-25 22:41:46 +02:00
Metrics stdout export pipeline (#265)

* Add MetricAggregator.Merge() implementations
* Update from feedback
* Type
* Ckpt
* Ckpt
* Add push controller
* Ckpt
* Add aggregator interfaces, stdout encoder
* Modify basic main.go
* Main is working
* Batch stdout output
* Sum update
* Rename stdout
* Add stateless/stateful Batcher options
* Undo a for-loop in the example, remove a done TODO
* Update imports
* Add note
* Rename defaultkeys
* Support variable label encoder to speed OpenMetrics/Statsd export
* Lint
* Doc
* Precommit/lint
* Simplify Aggregator API
* Record->Identifier
* Remove export.Record a.k.a. Identifier
* Checkpoint
* Propagate errors to the SDK, remove a bunch of 'TODO warn'
* Checkpoint
* Introduce export.Labels
* Comments in export/metric.go
* Comment
* More merge
* More doc
* Complete example
* Lint fixes
* Add a testable example
* Lint
* Let Export return an error
* Add a basic stdout exporter test
* Add measure test; fix aggregator APIs
* Use JSON numbers, not strings
* Test stdout exporter error
* Add a test for the call to RangeTest
* Add error handler API to improve correctness test; return errors from RecordOne
* Undo the previous -- do not expose errors
* Add simple selector variations, test
* Repair examples
* Test push controller error handling
* Add SDK label encoder tests
* Add a defaultkeys batcher test
* Add an ungrouped batcher test
* Lint new tests
* Respond to krnowak's feedback
* Undo comment
* Use concrete receivers for export records and labels, since the constructors return structs not pointers
* Bug fix for stateful batchers; clone an aggregator for long-term storage
* Remove TODO addressed in #318
* Add errors to all aggregator interfaces
* Handle ErrNoLastValue case in stdout exporter
* Move aggregator API into sdk/export/metric/aggregator
* Update all aggregator exported-method comments
* Document the aggregator APIs
* More aggregator comments
* Add multiple updates to the ungrouped test
* Fixes for feedback from Gustavo and Liz
* Producer->CheckpointSet; add FinishedCollection
* Process takes an export.Record
* ReadCheckpoint->CheckpointSet
* EncodeLabels->Encode
* Format a better inconsistent type error; add more aggregator API tests
* More RangeTest test coverage
* Make benbjohnson/clock a test-only dependency
* Handle ErrNoLastValue in stress_test
This commit is contained in:
committed by rghetia
parent c3d5b7b16d
commit 9878f3b700
@@ -13,48 +13,157 @@
 // limitations under the License.

 /*
+Package metric implements the OpenTelemetry metric.Meter API. The SDK
+supports configurable metrics export behavior through a collection of
+export interfaces that support various export strategies, described below.
-Package metric implements the OpenTelemetry `Meter` API. The SDK
-supports configurable metrics export behavior through a
-`export.MetricBatcher` API. Most metrics behavior is controlled
-by the `MetricBatcher`, including:
+
+The metric.Meter API consists of methods for constructing each of the
+basic kinds of metric instrument. There are six types of instrument
+available to the end user, comprised of three basic kinds of metric
+instrument (Counter, Gauge, Measure) crossed with two kinds of number
+(int64, float64).
-1. Selecting the concrete type of aggregation to use
-2. Receiving exported data during SDK.Collect()
+The API assists the SDK by consolidating the variety of metric instruments
+into a narrower interface, allowing the SDK to avoid repetition of
+boilerplate. The API and SDK are separated such that an event reaching
+the SDK has a uniform structure: an instrument, a label set, and a
+numerical value.
-The call to SDK.Collect() initiates collection. The SDK calls the
-`MetricBatcher` for each current record, asking the aggregator to
-export itself. Aggregators, found in `./aggregators`, are responsible
-for receiving updates and exporting their current state.
+
+To this end, the API uses a core.Number type to represent either an int64
+or a float64, depending on the instrument's definition. A single
+implementation interface is used for instruments, metric.InstrumentImpl,
+and a single implementation interface is used for handles,
+metric.HandleImpl.
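The int64/float64 duality above can be sketched as a 64-bit union type. This is an illustrative stand-in for (not the actual implementation of) the SDK's core.Number; all names here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Number is a hypothetical sketch of a 64-bit union that can hold either
// an int64 or a float64, in the spirit of the core.Number type described
// above. The instrument's definition (not modeled here) determines which
// kind a given Number holds.
type Number uint64

func NewInt64Number(i int64) Number     { return Number(i) }
func NewFloat64Number(f float64) Number { return Number(math.Float64bits(f)) }

func (n Number) AsInt64() int64     { return int64(n) }
func (n Number) AsFloat64() float64 { return math.Float64frombits(uint64(n)) }

func main() {
	fmt.Println(NewInt64Number(-7).AsInt64())      // -7
	fmt.Println(NewFloat64Number(1.5).AsFloat64()) // 1.5
}
```

Storing both kinds in the same 64 bits lets a single instrument implementation handle either number kind without boxing.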
-The SDK.Collect() API should be called by an exporter. During the
-call to Collect(), the exporter receives calls in a single-threaded
-context. No locking is required because the SDK.Collect() call
-prevents concurrency.
+There are three entry points for events in the Metrics API: via instrument
+handles, via direct instrument calls, and via BatchRecord. The SDK is
+designed with handles as the primary entry point; the other two entry
+points are implemented in terms of short-lived handles. For example, the
+implementation of a direct call allocates a handle, operates on the
+handle, and releases the handle. Similarly, the implementation of
+RecordBatch uses a short-lived handle for each measurement in the batch.
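The short-lived-handle pattern can be sketched as follows. Every name here (instrument, handle, acquireHandle, directRecord) is hypothetical, not the SDK's API.

```go
package main

import "fmt"

// Hypothetical sketch of a direct instrument call implemented via a
// short-lived handle, as described above.
type handle struct{ sum *int64 }

func (h handle) record(v int64) { *h.sum += v }

type instrument struct{ sums map[string]*int64 }

// acquireHandle finds or creates the record for a label set.
func (in *instrument) acquireHandle(labels string) handle {
	if _, ok := in.sums[labels]; !ok {
		in.sums[labels] = new(int64)
	}
	return handle{sum: in.sums[labels]}
}

// release is where a real SDK would drop a reference count; a no-op here.
func (in *instrument) release(h handle) {}

// directRecord allocates a handle, operates on it, and releases it,
// mirroring how direct calls reduce to the handle entry point.
func (in *instrument) directRecord(labels string, v int64) {
	h := in.acquireHandle(labels)
	h.record(v)
	in.release(h)
}

func main() {
	in := &instrument{sums: map[string]*int64{}}
	in.directRecord("service=a", 5)
	in.directRecord("service=a", 3)
	fmt.Println(*in.sums["service=a"]) // 8
}
```

Funneling all three entry points through handles means the SDK's record-management logic exists in exactly one place.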
-The SDK uses lock-free algorithms to maintain its internal state.
-There are three central data structures at work:
+Internal Structure
-1. A sync.Map maps unique (InstrumentID, LabelSet) to records
-2. A "primary" atomic list of records
-3. A "reclaim" atomic list of records
+
+The SDK is designed with minimal use of locking, to avoid adding
+contention for user-level code. For each handle, whether it is held by
+user-level code or a short-lived device, there exists an internal record
+managed by the SDK. Each internal record corresponds to a specific
+instrument and label set combination.
-Collection is oriented around epochs. The SDK internally has a
-notion of the "current" epoch, which is incremented each time
-Collect() is called. Records contain two atomic counter values,
-the epoch in which it was last modified and the epoch in which it
-was last collected. Records may be garbage collected when the
-epoch in which they were last updated is less than the epoch in
-which they were last collected.
+
+A sync.Map maintains the mapping of current instruments and label sets to
+internal records. To create a new handle, the SDK consults the Map to
+locate an existing record; otherwise it constructs a new record. The SDK
+maintains a count of the number of references to each record, ensuring
+that records are not reclaimed from the Map while they are still active
+from the user's perspective.
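A minimal sketch of this lookup path, assuming a simplified record holding only a reference count and a sum; the key and record shapes are illustrative, not the SDK's internals.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical sketch: a sync.Map keyed by (instrument, label set) with a
// reference count per record guarding reclamation.
type mapKey struct{ instrument, labels string }

type record struct {
	refcount int64
	sum      int64
}

var records sync.Map // mapKey -> *record

// acquire returns the record for the key, creating it if absent, and
// increments its reference count so a collection pass will not reclaim it.
func acquire(k mapKey) *record {
	if v, ok := records.Load(k); ok {
		r := v.(*record)
		atomic.AddInt64(&r.refcount, 1)
		return r
	}
	r := &record{refcount: 1}
	if v, loaded := records.LoadOrStore(k, r); loaded {
		// Another goroutine won the race to insert; use its record.
		r = v.(*record)
		atomic.AddInt64(&r.refcount, 1)
	}
	return r
}

func release(r *record) { atomic.AddInt64(&r.refcount, -1) }

func main() {
	k := mapKey{instrument: "requests", labels: "method=GET"}
	r1 := acquire(k)
	r2 := acquire(k) // one record per instrument + label set
	fmt.Println(r1 == r2, atomic.LoadInt64(&r1.refcount)) // true 2
	release(r2)
	release(r1)
}
```

LoadOrStore resolves the insert race without a lock, which is why this path stays contention-free for user-level code.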
-Collect() performs a record-by-record scan of all active records
-and exports their current state, before incrementing the current
-epoch. Collection events happen at a point in time during
-`Collect()`, but all records are not collected in the same instant.
+Metric collection is performed via a single-threaded call to Collect that
+sweeps through all records in the SDK, checkpointing their state. When a
+record is discovered that has no references and has not been updated since
+the prior collection pass, it is marked for reclamation and removed from
+the Map. There exists, at this moment, a race condition since another
+goroutine could, in the same instant, obtain a reference to the handle.
+
+The SDK is designed to tolerate this sort of race condition, in the name
+of reducing lock contention. It is possible for more than one record with
+identical instrument and label set to exist simultaneously, though only
+one can be linked from the Map at a time. To avoid lost updates, the SDK
+maintains two additional linked lists of records, one managed by the
+collection code path and one managed by the instrumentation code path.
+
+The SDK maintains a current epoch number, corresponding to the number of
+completed collections. Each record contains the last epoch during which
+it was collected and updated. These variables allow the collection code
+path to detect stale records while allowing the instrumentation code path
+to detect potential reclamations. When the instrumentation code path
+detects a potential reclamation, it adds itself to the second linked list,
+where records are saved from reclamation.
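The epoch bookkeeping can be sketched as below; the field and function names are hypothetical, and the real SDK uses atomic counters where this sketch uses plain fields.

```go
package main

import "fmt"

// Hypothetical sketch of the epoch bookkeeping described above. A record
// tracks the epoch of its last update and of its last collection; a record
// whose updates are all older than its last collection is a reclamation
// candidate.
type epochRecord struct {
	updateEpoch  int64 // epoch of the most recent metric event
	collectEpoch int64 // epoch when the record was last collected
}

// stale reports whether the record saw no updates since it was last
// collected, making it a candidate for reclamation.
func stale(r *epochRecord) bool {
	return r.updateEpoch < r.collectEpoch
}

func main() {
	epoch := int64(1)
	r := &epochRecord{}
	r.updateEpoch = epoch  // a metric event arrives during epoch 1
	r.collectEpoch = epoch // first collection pass sees the update
	epoch++
	fmt.Println(stale(r)) // false: updated in the epoch just collected
	r.collectEpoch = epoch // second pass: no updates during epoch 2
	epoch++
	fmt.Println(stale(r)) // true: safe to mark for reclamation
}
```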
+Each record has an associated aggregator, which maintains the current
+state resulting from all metric events since its last checkpoint.
+Aggregators may be lock-free or they may use locking, but they should
+expect to be called concurrently. Because of the tolerated race condition
+described above, aggregators must be capable of merging with another
+aggregator of the same type.
+
+Export Pipeline
+
+While the SDK serves to maintain a current set of records and
+coordinate collection, the behavior of a metrics export pipeline is
+configured through the export types in
+go.opentelemetry.io/otel/sdk/export/metric. It is important to keep
+in mind the context these interfaces are called from. There are two
+contexts: instrumentation context, in which a user-level goroutine
+enters the SDK, resulting in a new record; and collection context, in
+which a system-level thread performs a collection pass through the
+SDK.
+
+Descriptor is a struct that describes the metric instrument to the
+export pipeline, containing the name, recommended aggregation keys,
+units, description, metric kind (counter, gauge, or measure), number
+kind (int64 or float64), and whether the instrument has alternate
+semantics or not (i.e., monotonic=false counter, monotonic=true gauge,
+absolute=false measure). A Descriptor accompanies metric data as it
+passes through the export pipeline.
+
+The AggregationSelector interface supports choosing the method of
+aggregation to apply to a particular instrument. Given the
+Descriptor, its AggregatorFor method returns an implementation of
+Aggregator. If this interface returns nil, the metric will be
+disabled. The aggregator should be matched to the capabilities of the
+exporter. Selecting the aggregator for counter and gauge instruments
+is relatively straightforward, but for measure instruments there are
+numerous choices with different cost and quality tradeoffs.
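One possible shape for such a selector is sketched below; the kind names, Descriptor fields, and aggregator implementations are illustrative stand-ins for the SDK's types.

```go
package main

import "fmt"

// Hypothetical sketch of an AggregationSelector: choose an aggregation
// strategy per instrument, given its descriptor.
type Kind int

const (
	CounterKind Kind = iota
	GaugeKind
	MeasureKind
)

type Descriptor struct {
	Name string
	Kind Kind
}

type Aggregator interface{ Strategy() string }

type sumAgg struct{}
type lastValueAgg struct{}
type minMaxSumCountAgg struct{}

func (sumAgg) Strategy() string            { return "sum" }
func (lastValueAgg) Strategy() string      { return "lastvalue" }
func (minMaxSumCountAgg) Strategy() string { return "minmaxsumcount" }

// AggregatorFor picks an aggregation per instrument kind; returning nil
// would disable the metric, as the text notes. Counters and gauges have
// obvious choices; measures involve cost/quality tradeoffs.
func AggregatorFor(d *Descriptor) Aggregator {
	switch d.Kind {
	case CounterKind:
		return sumAgg{}
	case GaugeKind:
		return lastValueAgg{}
	case MeasureKind:
		return minMaxSumCountAgg{}
	}
	return nil // unknown kind: metric disabled
}

func main() {
	d := &Descriptor{Name: "latency", Kind: MeasureKind}
	fmt.Println(AggregatorFor(d).Strategy()) // minmaxsumcount
}
```

An exporter-specific selector could substitute, say, a sketch or histogram strategy for measures without touching the SDK.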
+Aggregator is an interface which implements a concrete strategy for
+aggregating metric updates. Several Aggregator implementations are
+provided by the SDK. Aggregators may be lock-free or use locking,
+depending on their structure and semantics. Aggregators implement an
+Update method, called in instrumentation context, to receive a single
+metric event. Aggregators implement a Checkpoint method, called in
+collection context, to save a checkpoint of the current state.
+Aggregators implement a Merge method, also called in collection
+context, that combines state from two aggregators into one. Each SDK
+record has an associated aggregator.
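The Update/Checkpoint/Merge contract can be sketched with a lock-free counter; this mirrors the shape described above but is not the SDK's aggregator code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical lock-free counter aggregator.
type counterAggregator struct {
	current    int64 // written concurrently in instrumentation context
	checkpoint int64 // read and written only in collection context
}

// Update receives a single metric event (instrumentation context).
func (c *counterAggregator) Update(v int64) {
	atomic.AddInt64(&c.current, v)
}

// Checkpoint atomically captures and resets the accumulated value
// (collection context).
func (c *counterAggregator) Checkpoint() {
	c.checkpoint = atomic.SwapInt64(&c.current, 0)
}

// Merge folds another aggregator's checkpointed state into this one,
// needed when duplicate records exist for the same instrument + labels.
func (c *counterAggregator) Merge(o *counterAggregator) {
	c.checkpoint += o.checkpoint
}

func main() {
	a, b := &counterAggregator{}, &counterAggregator{}
	a.Update(3)
	b.Update(4)
	a.Checkpoint()
	b.Checkpoint()
	a.Merge(b)
	fmt.Println(a.checkpoint) // 7
}
```

The atomic swap gives Checkpoint a consistent snapshot without blocking concurrent Updates, which simply land in the next interval.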
+Batcher is an interface which sits between the SDK and an exporter.
+The Batcher embeds an AggregationSelector, used by the SDK to assign
+new Aggregators. The Batcher supports a Process() API for submitting
+checkpointed aggregators to the batcher, and a CheckpointSet() API
+for producing a complete checkpoint for the exporter. Two default
+Batcher implementations are provided: the "defaultkeys" Batcher groups
+aggregate metrics by their recommended Descriptor.Keys(), while the
+"ungrouped" Batcher aggregates metrics at full dimensionality.
+
+LabelEncoder is an optional optimization that allows an exporter to
+provide the serialization logic for labels. This avoids duplicate
+serialization of labels, once as a unique key in the SDK (or
+Batcher) and once in the exporter.
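One plausible encoding strategy is sketched below, assuming a simple key=value,... format; the format is illustrative, and a real encoder would target a wire format such as OpenMetrics or statsd.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical label encoder producing a unique encoding for a label set.
func Encode(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // canonical order makes equal label sets encode equally
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(Encode(map[string]string{"b": "2", "a": "1"})) // a=1,b=2
}
```

Because the encoding is canonical, the same string can serve both as the SDK's map key and as the exporter's serialized form, which is exactly the duplicate work LabelEncoder eliminates.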
+CheckpointSet is an interface between the Batcher and the Exporter.
+After completing a collection pass, the Batcher.CheckpointSet() method
+returns a CheckpointSet, which the Exporter uses to iterate over all
+the updated metrics.
+
+Record is a struct containing the state of an individual exported
+metric. This is the result of one collection interval for one
+instrument and one label set.
+
+Labels is a struct containing an ordered set of labels, the
+corresponding unique encoding, and the encoder that produced it.
+
+Exporter is the final stage of an export pipeline. It is called with
+a CheckpointSet capable of enumerating all the updated metrics.
+
+Controller is not an export interface per se, but it orchestrates the
+export pipeline. For example, a "push" controller will establish a
+periodic timer to regularly collect and export metrics. A "pull"
+controller will await a pull request before initiating metric
+collection. Either way, the job of the controller is to call the SDK
+Collect() method, then read the checkpoint, then invoke the exporter.
+Controllers are expected to implement the public metric.MeterProvider
+API, meaning they can be installed as the global Meter provider.
-The purpose of the two lists: the primary list is appended-to when
-new handles are created and atomically cleared during collect. The
-reclaim list is used as a second chance, in case there is a race
-between looking up a record and record deletion.
 */
-package metric
+package metric // import "go.opentelemetry.io/otel/sdk/metric"