
Metrics stdout export pipeline (#265)

* Add MetricAggregator.Merge() implementations

* Update from feedback

* Type

* Ckpt

* Ckpt

* Add push controller

* Ckpt

* Add aggregator interfaces, stdout encoder

* Modify basic main.go

* Main is working

* Batch stdout output

* Sum update

* Rename stdout

* Add stateless/stateful Batcher options

* Undo a for-loop in the example, remove a done TODO

* Update imports

* Add note

* Rename defaultkeys

* Support variable label encoder to speed OpenMetrics/Statsd export

* Lint

* Doc

* Precommit/lint

* Simplify Aggregator API

* Record->Identifier

* Remove export.Record a.k.a. Identifier

* Checkpoint

* Propagate errors to the SDK, remove a bunch of 'TODO warn'

* Checkpoint

* Introduce export.Labels

* Comments in export/metric.go

* Comment

* More merge

* More doc

* Complete example

* Lint fixes

* Add a testable example

* Lint

* Let Export return an error

* add a basic stdout exporter test

* Add measure test; fix aggregator APIs

* Use JSON numbers, not strings

* Test stdout exporter error

* Add a test for the call to RangeTest

* Add error handler API to improve correctness test; return errors from RecordOne

* Undo the previous -- do not expose errors

* Add simple selector variations, test

* Repair examples

* Test push controller error handling

* Add SDK label encoder tests

* Add a defaultkeys batcher test

* Add an ungrouped batcher test

* Lint new tests

* Respond to krnowak's feedback

* Undo comment

* Use concrete receivers for export records and labels, since the constructors return structs not pointers

* Bug fix for stateful batchers; clone an aggregator for long term storage

* Remove TODO addressed in #318

* Add errors to all aggregator interfaces

* Handle ErrNoLastValue case in stdout exporter

* Move aggregator API into sdk/export/metric/aggregator

* Update all aggregator exported-method comments

* Document the aggregator APIs

* More aggregator comments

* Add multiple updates to the ungrouped test

* Fixes for feedback from Gustavo and Liz

* Producer->CheckpointSet; add FinishedCollection

* Process takes an export.Record

* ReadCheckpoint->CheckpointSet

* EncodeLabels->Encode

* Format a better inconsistent type error; add more aggregator API tests

* More RangeTest test coverage

* Make benbjohnson/clock a test-only dependency

* Handle ErrNoLastValue in stress_test
Joshua MacDonald
2019-11-15 13:01:20 -08:00
committed by rghetia
parent c3d5b7b16d
commit 9878f3b700
48 changed files with 3312 additions and 491 deletions


// limitations under the License.
/*
Package metric implements the OpenTelemetry metric.Meter API. The SDK
supports configurable metrics export behavior through a collection of
export interfaces that support various export strategies, described below.

The metric.Meter API consists of methods for constructing each of the
basic kinds of metric instrument. There are six types of instrument
available to the end user: three basic kinds of metric instrument
(Counter, Gauge, Measure) crossed with two kinds of number (int64,
float64).

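As a quick illustration of that cross product, the six varieties can be
enumerated; the printed names follow the API's naming style, but the
snippet itself is illustrative rather than the metric package:

    package main

    import "fmt"

    func main() {
        // The six instrument types are the cross product of the three
        // instrument kinds and the two number kinds.
        kinds := []string{"Counter", "Gauge", "Measure"}
        numbers := []string{"Int64", "Float64"}
        for _, k := range kinds {
            for _, n := range numbers {
                fmt.Println(n + k) // e.g. "Int64Counter", "Float64Measure"
            }
        }
    }
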
The API assists the SDK by consolidating the variety of metric instruments
into a narrower interface, allowing the SDK to avoid repetition of
boilerplate. The API and SDK are separated such that an event reaching
the SDK has a uniform structure: an instrument, a label set, and a
numerical value.

To this end, the API uses a core.Number type to represent either an int64
or a float64, depending on the instrument's definition. A single
implementation interface is used for instruments, metric.InstrumentImpl,
and a single implementation interface is used for handles,
metric.HandleImpl.

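A minimal sketch of the idea behind core.Number, assuming a bit-packing
scheme (the real type differs in detail; the number kind travels
separately, e.g. in a Descriptor):

    package main

    import (
        "fmt"
        "math"
    )

    // number packs either an int64 or a float64 into one 64-bit word,
    // in the spirit of core.Number; this sketch is not the actual type.
    type number uint64

    func newInt64Number(i int64) number     { return number(i) }
    func newFloat64Number(f float64) number { return number(math.Float64bits(f)) }

    func (n number) asInt64() int64     { return int64(n) }
    func (n number) asFloat64() float64 { return math.Float64frombits(uint64(n)) }

    func main() {
        fmt.Println(newInt64Number(42).asInt64())      // 42
        fmt.Println(newFloat64Number(0.5).asFloat64()) // 0.5
    }
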
There are three entry points for events in the Metrics API: via instrument
handles, via direct instrument calls, and via RecordBatch. The SDK is
designed with handles as the primary entry point; the other two entry
points are implemented in terms of short-lived handles. For example, the
implementation of a direct call allocates a handle, operates on the
handle, and releases the handle. Similarly, the implementation of
RecordBatch uses a short-lived handle for each measurement in the batch.

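A sketch of that layering with hypothetical names (the real SDK's meter,
record, and handle types are more involved):

    package main

    import "fmt"

    // Minimal stand-ins for the SDK's internal record and handle;
    // all names are hypothetical.
    type record struct{ sum int64 }

    type handle struct{ rec *record }

    func (h handle) recordOne(v int64) { h.rec.sum += v }
    func (h handle) release()          {} // would drop a reference count

    type meter struct{ recs map[string]*record }

    // acquireHandle finds or creates the record for an
    // (instrument, label set) pair, here reduced to a string key.
    func (m *meter) acquireHandle(key string) handle {
        r, ok := m.recs[key]
        if !ok {
            r = &record{}
            m.recs[key] = r
        }
        return handle{rec: r}
    }

    // directRecord is the "direct call" entry point, implemented as a
    // short-lived handle: acquire, operate, release.
    func (m *meter) directRecord(key string, v int64) {
        h := m.acquireHandle(key)
        defer h.release()
        h.recordOne(v)
    }

    func main() {
        m := &meter{recs: map[string]*record{}}
        m.directRecord("requests|service=api", 1)
        m.directRecord("requests|service=api", 2)
        fmt.Println(m.recs["requests|service=api"].sum) // 3
    }
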
Internal Structure

The SDK is designed with minimal use of locking, to avoid adding
contention for user-level code. For each handle, whether it is held by
user-level code or a short-lived device, there exists an internal record
managed by the SDK. Each internal record corresponds to a specific
instrument and label set combination.

A sync.Map maintains the mapping of current instruments and label sets to
internal records. To create a new handle, the SDK consults the Map to
locate an existing record; otherwise it constructs a new record. The SDK
maintains a count of the number of references to each record, ensuring
that records are not reclaimed from the Map while they are still active
from the user's perspective.

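The find-or-create-and-reference pattern, condensed (hypothetical names;
the reclamation races described below are ignored here):

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    // mapped pairs an aggregation state with a reference count,
    // standing in for the SDK's internal record.
    type mapped struct {
        refs int64 // handles currently referencing this record
        sum  int64 // illustrative aggregation state
    }

    var current sync.Map // map[string]*mapped, keyed by instrument+labels

    // acquire finds or creates the record for key and takes a reference.
    func acquire(key string) *mapped {
        if v, ok := current.Load(key); ok {
            r := v.(*mapped)
            atomic.AddInt64(&r.refs, 1)
            return r
        }
        r := &mapped{refs: 1}
        if v, loaded := current.LoadOrStore(key, r); loaded {
            r = v.(*mapped) // lost the race; use the stored record
            atomic.AddInt64(&r.refs, 1)
        }
        return r
    }

    func release(r *mapped) { atomic.AddInt64(&r.refs, -1) }

    func main() {
        r := acquire("hits|host=a")
        atomic.AddInt64(&r.sum, 1)
        release(r)
        fmt.Println(atomic.LoadInt64(&r.sum)) // 1
    }
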
Metric collection is performed via a single-threaded call to Collect that
sweeps through all records in the SDK, checkpointing their state. When a
record is discovered that has no references and has not been updated since
the prior collection pass, it is marked for reclamation and removed from
the Map. There exists, at this moment, a race condition since another
goroutine could, in the same instant, obtain a reference to the handle.

The SDK is designed to tolerate this sort of race condition, in the name
of reducing lock contention. It is possible for more than one record with
an identical instrument and label set to exist simultaneously, though only
one can be linked from the Map at a time. To avoid lost updates, the SDK
maintains two additional linked lists of records, one managed by the
collection code path and one managed by the instrumentation code path:
the "primary" list is appended to as new records are created and is
atomically cleared during collection, while the "reclaim" list gives a
record a second chance when a lookup races with its deletion.

The SDK maintains a current epoch number, corresponding to the number of
completed collections. Each record contains the last epoch during which
it was collected and updated. These variables allow the collection code
path to detect stale records while allowing the instrumentation code path
to detect potential reclamations. When the instrumentation code path
detects a potential reclamation, it adds the record to the reclaim list,
where records are saved from reclamation.

Each record has an associated aggregator, which maintains the current
state resulting from all metric events since its last checkpoint.
Aggregators may be lock-free or they may use locking, but they should
expect to be called concurrently. Because of the tolerated race condition
described above, aggregators must be capable of merging with another
aggregator of the same type.

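A sketch of the epoch arithmetic (hypothetical names; the real
reclamation path also involves the linked lists described above):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // epochRecord carries the two per-record epoch values described
    // above, plus a reference count; all names are illustrative.
    type epochRecord struct {
        refs           int64
        updateEpoch    int64 // epoch of the most recent metric event
        collectedEpoch int64 // epoch of the most recent collection
    }

    var currentEpoch int64 = 1

    // update runs on the instrumentation path.
    func (r *epochRecord) update() {
        atomic.StoreInt64(&r.updateEpoch, atomic.LoadInt64(&currentEpoch))
    }

    // collect sweeps all records once (single-threaded), dropping
    // records with no references and no update since the previous
    // pass, then advances the epoch.
    func collect(records []*epochRecord) (live []*epochRecord) {
        epoch := atomic.LoadInt64(&currentEpoch)
        for _, r := range records {
            if atomic.LoadInt64(&r.refs) == 0 &&
                atomic.LoadInt64(&r.updateEpoch) < epoch {
                continue // reclaim: the record is stale
            }
            atomic.StoreInt64(&r.collectedEpoch, epoch)
            live = append(live, r)
        }
        atomic.AddInt64(&currentEpoch, 1)
        return live
    }

    func main() {
        recs := []*epochRecord{{}}
        recs[0].update()
        recs = collect(recs)   // kept: updated during the current epoch
        recs = collect(recs)   // reclaimed: idle since the last pass
        fmt.Println(len(recs)) // 0
    }
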
Export Pipeline

While the SDK serves to maintain a current set of records and
coordinate collection, the behavior of a metrics export pipeline is
configured through the export types in
go.opentelemetry.io/otel/sdk/export/metric. It is important to keep
in mind the context these interfaces are called from. There are two
contexts: instrumentation context, in which a user-level goroutine
enters the SDK, resulting in a new record, and collection context,
in which a system-level thread performs a collection pass through
the SDK.

Descriptor is a struct that describes the metric instrument to the
export pipeline, containing the name, recommended aggregation keys,
units, description, metric kind (counter, gauge, or measure), number
kind (int64 or float64), and whether the instrument has alternate
semantics or not (i.e., monotonic=false counter, monotonic=true gauge,
absolute=false measure). A Descriptor accompanies metric data as it
passes through the export pipeline.
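
A struct along these lines conveys the shape; it paraphrases the listed
fields and is not the actual definition from the export package:

    package export

    // descriptor paraphrases the fields listed above; the real
    // Descriptor in go.opentelemetry.io/otel/sdk/export/metric
    // differs in detail.
    type descriptor struct {
        name        string   // instrument name
        keys        []string // recommended aggregation keys
        description string
        unit        string
        metricKind  int  // counter, gauge, or measure
        numberKind  int  // int64 or float64
        alternate   bool // e.g. monotonic=false counter, absolute=false measure
    }
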
The AggregationSelector interface supports choosing the method of
aggregation to apply to a particular instrument. Given the
Descriptor, its AggregatorFor method returns an implementation of
Aggregator. If this method returns nil, the metric is disabled. The
aggregator should be matched to the capabilities of the exporter.
Selecting the aggregator for counter and gauge instruments is
relatively straightforward, but for measure instruments there are
numerous choices with different cost and quality tradeoffs.

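A toy AggregationSelector might switch on the instrument kind; the
Aggregator interface and implementations here are stand-ins (reusing the
descriptor sketch above) and are reused by the sketches below:

    package export

    // Aggregator is a stand-in for the real interface; its full
    // method set appears in the next sketch.
    type Aggregator interface {
        Update(value float64)
    }

    type sumAgg struct{ sum float64 }

    func (a *sumAgg) Update(v float64) { a.sum += v }

    type lastValueAgg struct{ last float64 }

    func (a *lastValueAgg) Update(v float64) { a.last = v }

    const (
        counterKind = iota
        gaugeKind
        measureKind
    )

    // aggregatorFor chooses an aggregation strategy per instrument;
    // returning nil disables the metric.
    func aggregatorFor(d descriptor) Aggregator {
        switch d.metricKind {
        case counterKind:
            return &sumAgg{}
        case gaugeKind:
            return &lastValueAgg{}
        case measureKind:
            return &lastValueAgg{} // a histogram or sketch is typical here
        default:
            return nil
        }
    }
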
Aggregator is an interface which implements a concrete strategy for
aggregating metric updates. Several Aggregator implementations are
provided by the SDK. Aggregators may be lock-free or use locking,
depending on their structure and semantics. Aggregators implement an
Update method, called in instrumentation context, to receive a single
metric event. Aggregators implement a Checkpoint method, called in
collection context, to save a checkpoint of the current state.
Aggregators implement a Merge method, also called in collection
context, that combines state from two aggregators into one. Each SDK
record has an associated aggregator.
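
For instance, a lock-free sum aggregator with those three methods might
look like the following sketch (not the SDK's implementation):

    package aggregator

    import "sync/atomic"

    // counter is a lock-free sum aggregator sketch. Update is called
    // in instrumentation context; Checkpoint and Merge in collection
    // context.
    type counter struct {
        current    int64 // updated concurrently by instrumentation
        checkpoint int64 // read by the exporter after Checkpoint
    }

    // Update adds one metric event to the current state.
    func (c *counter) Update(value int64) {
        atomic.AddInt64(&c.current, value)
    }

    // Checkpoint atomically captures and resets the current state.
    func (c *counter) Checkpoint() {
        c.checkpoint = atomic.SwapInt64(&c.current, 0)
    }

    // Merge folds another aggregator's checkpoint into this one, as
    // required when duplicate records exist for the same instrument
    // and label set.
    func (c *counter) Merge(other *counter) {
        c.checkpoint += other.checkpoint
    }
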
Batcher is an interface which sits between the SDK and an exporter.
The Batcher embeds an AggregationSelector, used by the SDK to assign
new Aggregators. The Batcher supports a Process() API for submitting
checkpointed aggregators to the batcher, and a CheckpointSet() API
for producing a complete checkpoint for the exporter. Two default
Batcher implementations are provided: the "defaultkeys" Batcher
groups aggregate metrics by their recommended Descriptor.Keys(),
while the "ungrouped" Batcher aggregates metrics at full
dimensionality. (A combined sketch of the Batcher, CheckpointSet,
and Exporter roles appears after the Exporter paragraph below.)

LabelEncoder is an optional optimization that allows an exporter to
provide the serialization logic for labels. This allows avoiding
duplicate serialization of labels, once as a unique key in the SDK (or
Batcher) and once in the exporter.
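
For example, a simple encoder can serialize labels in a canonical order
so that the encoded form doubles as a unique map key (a sketch; the real
LabelEncoder operates on the SDK's label type):

    package main

    import (
        "fmt"
        "sort"
        "strings"
    )

    // encode serializes labels in a fixed, sorted order so the
    // encoded form can serve as a map key in the SDK or Batcher.
    func encode(labels map[string]string) string {
        keys := make([]string, 0, len(labels))
        for k := range labels {
            keys = append(keys, k)
        }
        sort.Strings(keys)
        var b strings.Builder
        for i, k := range keys {
            if i > 0 {
                b.WriteByte(',')
            }
            fmt.Fprintf(&b, "%s=%s", k, labels[k])
        }
        return b.String()
    }

    func main() {
        fmt.Println(encode(map[string]string{"host": "a", "service": "api"}))
        // Output: host=a,service=api
    }
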
CheckpointSet is an interface between the Batcher and the Exporter.
After completing a collection pass, the Batcher.CheckpointSet()
method returns a CheckpointSet, which the Exporter uses to iterate
over all the updated metrics.

Record is a struct containing the state of an individual exported
metric. This is the result of one collection pass for one instrument
and one label set.

Labels is a struct containing an ordered set of labels, the
corresponding unique encoding, and the encoder that produced it.

Exporter is the final stage of an export pipeline. It is called with
a CheckpointSet capable of enumerating all the updated metrics.
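
Tying the Batcher, CheckpointSet, and Exporter roles together, here is
the combined sketch promised above (hypothetical signatures, reusing the
descriptor and Aggregator stand-ins):

    package export

    import "fmt"

    // batcher sits between the SDK and an exporter: it selects
    // aggregators for new records, receives each checkpointed record
    // during collection, and afterward produces a complete checkpoint.
    type batcher interface {
        AggregatorFor(d descriptor) Aggregator // the embedded selector role

        // Process is called once per updated record during Collect().
        Process(d descriptor, encodedLabels string, agg Aggregator) error

        // CheckpointSet returns the completed batch for the exporter.
        CheckpointSet() checkpointSet
    }

    // checkpointSet lets the exporter enumerate every updated metric.
    type checkpointSet interface {
        ForEach(f func(d descriptor, encodedLabels string, agg Aggregator))
    }

    // lineExporter is an illustrative Exporter: one line per updated
    // (instrument, label set) record in the checkpoint.
    type lineExporter struct{}

    func (lineExporter) Export(set checkpointSet) error {
        set.ForEach(func(d descriptor, encodedLabels string, agg Aggregator) {
            // A real exporter inspects the aggregator's concrete type
            // (sum, last value, ...) rather than printing it directly.
            fmt.Printf("%s{%s} %+v\n", d.name, encodedLabels, agg)
        })
        return nil
    }
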
Controller is not an export interface per se, but it orchestrates the
export pipeline. For example, a "push" controller will establish a
periodic timer to regularly collect and export metrics. A "pull"
controller will await a pull request before initiating metric
collection. Either way, the job of the controller is to call the SDK
Collect() method, then read the checkpoint, then invoke the exporter.
Controllers are expected to implement the public metric.MeterProvider
API, meaning they can be installed as the global Meter provider.
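
A push controller reduced to its essentials (hypothetical collect and
export hooks; the real controller also implements the Meter provider
API and routes export errors to an error handler):

    package main

    import (
        "fmt"
        "time"
    )

    // push runs the collect-then-export cycle on a fixed period until
    // stop is closed, which is the job the text assigns the controller.
    func push(period time.Duration, collect func(), export func() error, stop <-chan struct{}) {
        ticker := time.NewTicker(period)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                collect() // SDK.Collect(): checkpoint all records
                if err := export(); err != nil {
                    fmt.Println("export error:", err)
                }
            case <-stop:
                return
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go push(10*time.Millisecond,
            func() { fmt.Println("collect") },
            func() error { fmt.Println("export"); return nil },
            stop)
        time.Sleep(35 * time.Millisecond)
        close(stop)
    }
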
*/
package metric // import "go.opentelemetry.io/otel/sdk/metric"