
Abstract Data Storage away from Metric objects, introduce Swappable Data Stores, and support Multiprocess Pre-fork Servers #95

Merged · 18 commits · Apr 24, 2019

Commits on Mar 4, 2019

  1. Remove deprecation notice from 2 years ago

    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (d85f7dc)
  2. Remove support for Ruby 1.9.3 and update current versions

    Ruby < 2.0 was EOL'd over 3 years ago, and we want to use kwargs,
    so we're dropping support for it.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (457d674)
  3. Add keyword arguments to the API where it makes sense

    This should make it cleaner and more obvious to call into these methods.
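
    For illustration, a sketch of the resulting call style (the `docstring:` and
    `labels:` keywords reflect the API as it evolves over this PR; treat the
    exact names here as indicative rather than final):

    ```ruby
    require 'prometheus/client'

    # Keyword arguments make the intent of each parameter explicit at the call site.
    counter = Prometheus::Client::Counter.new(
      :service_requests_total,
      docstring: 'Total requests handled, by service',
      labels: [:service]
    )
    counter.increment(by: 1, labels: { service: 'billing' })
    ```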
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (160c25e)
  4. Require declaration of labels when creating metrics

    Instead of allowing the first observation / setting of a metric to have
    any labels, and then validating that any further observations match those,
    we now require declaring what labels will be specified at the time of
    instantiating the metric.
    
    The Validator now only needs to verify that the provided labels match the
    expectation, without needing to keep track of the first observation. For
    now, it's still caching the validations, for performance, but that should
    disappear in a soon-to-come commit.
    
    `base_labels` now works slightly differently. Instead of being a "base"
    value that is set when creating the metric and then combined with the
    "other" labels at export time, it's now called `preset_values`, and it
    gets merged into the labels stored with every observation.
    
    This means the storage that holds the values will always have all the labels,
    including the common ones. This makes it possible to offer a method like the
    `labels()` one recommended by the client library best practices, which gives
    a "view" of a metric with some labels pre-applied. That method will be added
    in a future commit, once we have a central data store.
    
    Finally, this needed a little adaptation of the Collector middleware
    that comes bundled in. This Collector automatically tracks metrics with
    some pre-defined labels, but allows the user to define which labels
    they want, on a per-request basis, by specifying a `proc` that will generate
    the labels given a Rack `env`.
    
    Given this design, we can't know what labels the custom `proc`s will
    return at the time of initializing the metric, and we don't have an `env`
    to pass to them, so the interface of these procs has changed: they now
    need to be able to handle an empty `env` and still return a `hash` with
    all the right keys.
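
    As a sketch of the new contract (metric and label names illustrative):

    ```ruby
    # Labels must now be declared when the metric is instantiated...
    histogram = Prometheus::Client::Histogram.new(
      :request_duration_seconds,
      docstring: 'Request duration',
      labels: [:path, :method]
    )

    # ...and every observation must provide exactly those labels.
    histogram.observe(0.42, labels: { path: '/health', method: 'get' })

    # Observing with an undeclared label now fails validation immediately:
    # histogram.observe(0.42, labels: { path: '/health', verb: 'get' })  # raises
    ```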
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (1a7235f)
  5. Remove quantile calculation from Summary

    Expose only `sum` and `count` instead, with no quantiles / percentiles.
    
    The main reason to do this is that this whole change is based on the idea
    of having a "Store" interface that can be shared between Ruby processes.
    
    The `quantile` gem doesn't play well with this, since we'd need to store
    instances of `Quantile::Estimator`, which is a complex data structure, and
    tricky to share between Ruby processes.
    
    Moreover, individual Summaries on different processes cannot be aggregated,
    so all processes would actually have to share one instance of this class,
    which makes it extremely tricky to do, particularly performantly.
    
    Even though this is a loss of functionality, this puts the Ruby client
    on par with other client libraries, like the Python one, which also only
    offers sum and count without quantiles.
    
    Also, this is actually more compliant with the Client Library best practices:
    - Summary is ENCOURAGED to offer quantiles as exports
    - MUST allow not having quantiles, since just count and sum is quite useful
    - This MUST be the default
    
    The original client didn't comply with the last 2 rules, whereas this one
    does, just like the Python client.
    
    And quantiles, while seemingly the point of summaries, are encouraged but
    not required.
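
    The resulting behaviour, sketched (values illustrative):

    ```ruby
    summary = Prometheus::Client::Summary.new(
      :api_latency_seconds,
      docstring: 'API call latency'
    )
    summary.observe(0.25)
    summary.observe(0.75)

    # Only the running totals are exposed -- no quantiles.
    summary.get  # => e.g. { "count" => 2.0, "sum" => 1.0 }
    ```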
    
    I'm not discarding the idea of adding quantiles back. I have ideas on how
    this could be done, but it'd probably be pretty expensive, as a single
    Estimator instance would have to be marshalled / unmarshalled into bytes
    for every call to `observe`, and it'd have a hard requirement of having
    some sort of low-cardinality Process ID as a label (possible with Unicorn)
    to avoid aggregation.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (ce6d44d)
  6. Add concurrent-ruby gem

    The reason we need this gem is to be able to provide a good concurrency
    story in our "Data Stores".
    
    We want the data stores to be thread-safe so that metrics can use them
    easily, and to hide that complexity so the store can do whatever is most
    efficient, without metrics making assumptions about how it works.
    
    However, we also need metrics to be able to update multiple values at once,
    so in some cases (most notably, histograms) the store does need to provide
    a synchronization method.
    
    Since the normal `set` / `increment` methods already synchronize on a Mutex,
    wrapping them in another Mutex block means calling `Mutex#synchronize` twice.
    That would be fine with a reentrant lock, but Ruby Mutexes are not reentrant.
    
    `concurrent-ruby` provides a reentrant lock. It's a read-write lock, not
    a Mutex, but by always grabbing write locks, it's functionally the same.
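
    A minimal sketch of the pattern (assuming `concurrent-ruby`'s
    `ReentrantReadWriteLock`, always taking the write side):

    ```ruby
    require 'concurrent'

    lock = Concurrent::ReentrantReadWriteLock.new

    # Unlike a plain Ruby Mutex, re-acquiring from the same thread won't
    # deadlock, so a store's outer `synchronize` can safely wrap the locking
    # already done by its own `set` / `increment`.
    lock.with_write_lock do
      lock.with_write_lock do
        # update multiple values atomically
      end
    end
    ```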
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (b787e37)
  7. Abstract away storage of Metric data into DataStore objects

    Currently, each Metric object stores its data (labels and values) in a Hash
    object internal to itself. This change introduces the concept of a Data Store
    that is external to the metrics themselves, which holds all the data.
    
    The main reason to do this is that by having a standardized and simple
    interface that metrics use to access the store, we abstract away the details
    of storing the data from the specific needs of each metric.
    
    This allows us to then simply swap around the stores based on the needs
    of different circumstances, with no changes to the rest of the client. The
    main use case for this is solving the "pre-fork" server scenario, where
    multiple processes need to share the metric storage, to be able to report
    coherent numbers to Prometheus.
    
    This change provides one store, a synchronized Hash, which has much the same
    characteristics as the existing storage, plus some Mutex overhead. In a
    single-process scenario, this should be the fastest and safest way to operate
    in a multi-threaded environment. As such, it is the default store.
    
    Future commits will introduce new ones, for specific scenarios, and also each
    consumer of the Prometheus Client can make their own and simply swap them for
    the built-in ones, if they have specific needs.
    
    The interface and requirements of Stores are specified in detail in a README.md
    file in the `client/data_stores` directory. This is the documentation that must
    be used by anyone wishing to create their own store.
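
    As a rough sketch of the shape of that interface (the README is the
    authoritative contract; this toy store is neither thread-safe nor
    multi-process safe):

    ```ruby
    # A minimal Hash-backed store sketch, following the method names used by
    # the built-in stores.
    class MyToyStore
      def for_metric(metric_name, metric_type:, metric_settings: {})
        MetricStore.new
      end

      class MetricStore
        def initialize
          @values = Hash.new { |hash, key| hash[key] = 0.0 }
        end

        # Real stores must make this an actual synchronization point.
        def synchronize
          yield
        end

        def set(labels:, val:)
          @values[labels] = val.to_f
        end

        def increment(labels:, by: 1)
          @values[labels] += by
        end

        def get(labels:)
          @values[labels]
        end

        def all_values
          @values.dup
        end
      end
    end
    ```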
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (a951c22)
  8. Add DataStores::SingleThreaded

    This is the simplest possible store we can have, which simply has a hash
    for each metric, and no synchronization code whatsoever.
    
    It can only be used in single-threaded, single-process scenarios, but for
    those, it should be the fastest option.
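
    Swapping it in is a one-line configuration change (a sketch of the
    intended usage):

    ```ruby
    Prometheus::Client.config.data_store =
      Prometheus::Client::DataStores::SingleThreaded.new
    ```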
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (3178104)
  9. Add DataStores::DirectFileStore

    This store keeps all its data in files, one file per process and per metric.
    These files have binary data mapping keys to Floats, and they're read/written
    at the specific offsets of these Floats.
    
    This is generally the recommended store to use to deal with pre-fork servers
    and other "multi-process" scenarios, at least until we crack `mmap`.
    
    This seems to be, for now, the fastest way to safely share data between
    all the processes. Because we never actually `fsync`, we never actually
    touch the disk, and FS caching makes this extremely fast: almost as fast
    as the `mmap`-based approach, without all the added risk or the burden of
    the C extensions.
    
    Each process gets its own file for each metric. When a Prometheus scrape
    is received in one of the processes, it finds all the files for a metric,
    reads their values and aggregates them. We use `flock` to guarantee
    consistency.
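
    Configuring it, sketched (the directory is illustrative; it must be shared
    by all the processes being aggregated):

    ```ruby
    Prometheus::Client.config.data_store =
      Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_files')
    ```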
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (5917ad8)
  10. Change text_spec.rb to be more like an integration test

    The existing spec on `Formats::Text` is based on mocking a registry with
    fake metrics and values inside, which means if anything changes in the
    interface for metrics, the test will not catch it.
    
    And there's no test validating that interface.
    
    This test is more realistic, and it actually catches the kind of bugs
    we introduced in the process of this refactor.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (403b31a)
  11. Update Histogram to return buckets as strings

    In the Text export format, the `le` label is reported as a string.
    
    Moreover, some of our stores may coerce Float label values into Strings.
    This is fine, since Labelsets should be a hash of symbols to strings.
    It also makes sense since one of those "buckets" will be "+Inf".
    
    It's also somewhat unavoidable, since neither the Store nor the Metric
    should be coercing those back into Floats, so we now report the buckets
    as strings instead of Floats.
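
    Illustratively (values made up), bucket keys now come back as Strings:

    ```ruby
    histogram.get(labels: { path: '/' })
    # => e.g. { "0.5" => 2.0, "2.5" => 4.0, "+Inf" => 5.0 }
    # keys are Strings, matching the `le` label in the text format
    ```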
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (ac9a514)
  12. Accumulate Histogram Buckets on Export, not Observe

    In a Histogram, each bucket's count also includes the counts of all
    buckets smaller than it.
    
    The original code for Histograms would handle this at `observe` time,
    by incrementing all buckets bigger than the observed sample.
    
    Since calls to the Data Store may be expensive, and also all these
    increments need to be done atomically to guarantee the export doesn't
    have inconsistent values, this can have a big effect on performance.
    
    With this change, at `observe` time we only increase the bucket the
    sample falls in, plus the `sum`, and do the accumulation when reading
    the Histogram. This makes observations about 30% to 40% faster with a
    "uniform" distribution of samples; a sample set skewed towards smaller
    buckets should show a much larger performance gain.
    
    With this change, we also remove the `count` key from the result of
    calling `get`. This key is redundant (it's the same value as `+Inf`),
    rendering it as `count` is the job of the Text Exporter already, and
    `get` shouldn't be returning it.
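
    A sketch of the read-time accumulation (method name hypothetical):

    ```ruby
    # Turn per-bucket counts into cumulative counts at export time.
    # `raw` maps bucket upper bounds (sorted ascending) to observation counts.
    def accumulate_buckets(raw)
      running_total = 0.0
      raw.each_with_object({}) do |(upper_bound, count), result|
        running_total += count
        result[upper_bound] = running_total
      end
    end

    accumulate_buckets("0.5" => 2.0, "1" => 1.0, "+Inf" => 2.0)
    # => { "0.5" => 2.0, "1" => 3.0, "+Inf" => 5.0 }
    ```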
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (190f770)
  13. Improve performance of Histograms

    Merging the `le` key into the `base_label_set` is where most time is
    spent, surprisingly.
    
    Also somewhat surprisingly, `dup` + setting a key is faster than
    `.merge`.
    
    This version is less pretty, but makes `observe` significantly faster.
    
    For some stores (the ones that serialize the hash into a string), we
    could simply set the key without the `dup`, which makes this method
    about 3x faster. However, that would require calling `dup` in all the
    Hash-based stores (or serializing the Hash into a string), which would
    make THEM significantly slower.
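
    The optimization in question, roughly (variable names hypothetical):

    ```ruby
    # Before: `merge` allocates and rehashes on every observation.
    label_set = base_label_set.merge(le: bucket)

    # After: `dup` plus a single key assignment benchmarks faster here.
    label_set = base_label_set.dup
    label_set[:le] = bucket
    ```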
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (9d39313)
  14. Add Metric#with_labels method

    `with_labels` returns a metric object, augmented with some pre-set labels.
    This is useful for reducing repetition when a certain part of the code is
    passing the same label over and over again, but the label can have
    different values in different parts of the codebase (so passing in
    `preset_labels` when declaring the metric is not appropriate).
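
    Typical usage, sketched (label names illustrative):

    ```ruby
    counter = Prometheus::Client::Counter.new(
      :service_requests_total,
      docstring: 'Total requests',
      labels: [:service, :code]
    )

    # Pre-apply the label this part of the codebase always uses...
    billing_requests = counter.with_labels(service: 'billing')

    # ...so call sites only pass what actually varies.
    billing_requests.increment(labels: { code: '200' })
    ```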
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (c1d6126)
  15. Short-circuit label validation on Metric observations

    Every time a metric gets observed / incremented, we are composing the
    labels hash, merging the labels that were just passed in as part of the
    observation with the metric's `preset_labels`, and validating that the
    labelset is valid.
    
    If all labels are pre-set already, however (either as `preset_labels` when
    declaring the metric, or by use of `with_labels`), we can validate when
    the labels are pre-set, and then skip the validation on every observation,
    which can lead to some decent time savings if a metric is observed often.
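
    A hedged sketch of the short-circuit (instance variables and helpers here
    are hypothetical, not the actual implementation):

    ```ruby
    def increment(by: 1, labels: {})
      label_set =
        if labels.empty? && @all_labels_preset
          preset_labels                           # validated once, up front
        else
          validate!(preset_labels.merge(labels))  # validated on every call
        end
      @store.increment(labels: label_set, by: by)
    end
    ```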
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (3862c1c)
  16. Rename confusing methods in LabelSetValidator

    LabelSetValidator has methods `valid?` and `validate`. It's not very clear
    what these do or what the difference between them is, and the name `valid?`
    would imply (by convention) that it returns a Boolean, rather than raising
    an exception on invalid input.

    Renamed these to `validate_symbols!` and `validate_labelset!`, to make it
    clearer what each of them does. The `!` also follows the usual Ruby
    convention of "do X, and if that doesn't work, raise".
    
    This commit also starts drawing the distinction between an array of `labels`,
    and a hash of label keys and values (which we call a `labelset`). The term
    `labels` is used interchangeably for both concepts throughout the code.
    This commit doesn't fix all instances, but it's a step in that direction.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (2347e2a)
  17. Add Performance Benchmarks

    These benchmarks are useful for knowing what kind of performance to expect
    from metrics in scenarios where consumers may be sensitive to performance,
    and also for helping teams that need to build their own stores to benchmark
    and test them.
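
    A minimal example of the kind of measurement this enables, using the
    standard `benchmark` library (iteration count and metric illustrative):

    ```ruby
    require 'benchmark'
    require 'prometheus/client'

    counter = Prometheus::Client::Counter.new(
      :bench_total,
      docstring: 'Benchmark counter',
      labels: [:label]
    )

    puts Benchmark.measure {
      100_000.times { counter.increment(labels: { label: 'a' }) }
    }
    ```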
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (5cae3b4)

Commits on Mar 12, 2019

  1. Improve wording around example data store implementation

    Signed-off-by: Chris Sinjakli <chris@gocardless.com>
    Chris Sinjakli committed Mar 12, 2019 (5dce3e5)