
Abstract Data Storage away from Metric objects, introduce Swappable Data Stores, and support Multiprocess Pre-fork Servers #95

Merged · 18 commits · Apr 24, 2019

Commits on Mar 4, 2019

  1. Remove deprecation notice from 2 years ago

    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (d85f7dc)
  2. Remove support for Ruby 1.9.3 and update current versions

    Ruby < 2.0 was EOL'd over 3 years ago, and we want to use kwargs,
    so we're dropping support for it.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (457d674)
  3. Add keyword arguments to the API where it makes sense

    This should make it cleaner and more obvious to call into these methods.
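
    For illustration, a sketch of the resulting call style (the `docstring:` and
    `labels:` keywords reflect the API as it evolves over this PR; treat the
    exact names here as indicative rather than final):

    ```ruby
    require 'prometheus/client'

    # Keyword arguments make the intent of each parameter explicit at the call site.
    counter = Prometheus::Client::Counter.new(
      :service_requests_total,
      docstring: 'Total requests handled, by service',
      labels: [:service]
    )
    counter.increment(by: 1, labels: { service: 'billing' })
    ```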
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (160c25e)
  4. Require declaration of labels when creating metrics

    Instead of allowing the first observation / setting of a metric to have
    any labels, and then validating that any further observations match those,
    we now require declaring what labels will be specified at the time of
    instantiating the metric.
    
    The Validator now only needs to verify that the provided labels match the
    expectation, without needing to keep track of the first observation. For
    now, it's still caching the validations, for performance, but that should
    disappear in a soon-to-come commit.
    
    `base_labels` now works slightly differently. Instead of being a "base"
    value that is set when creating the metric and then combined with the
    "other" labels at export time, it's now called `preset_values`, and it
    gets merged into the labels stored with every observation.
    
    This means the storage that holds the values will always have all the labels,
    including the common ones. This makes it possible to offer a method like the
    `labels()` one recommended by the client library best practices, which gives
    a "view" of a metric with some labels pre-applied. That method will be added
    in a future commit, once we have a central data store.
    
    Finally, this needed a little adaptation of the Collector middleware
    that comes bundled in. This Collector automatically tracks metrics with
    some pre-defined labels, but allows the user to define which labels
    they want, on a per-request basis, by specifying a `proc` that will generate
    the labels given a Rack `env`.
    
    Given this design, we can't know what labels the custom `proc`s will
    return at the time of initializing the metric, and we don't have an `env`
    to pass to them, so the interface of these procs has changed: they now
    need to be able to handle an empty `env` and still return a `hash` with
    all the right keys.
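
    As a sketch of the new contract (metric and label names illustrative):

    ```ruby
    # Labels must now be declared when the metric is instantiated...
    histogram = Prometheus::Client::Histogram.new(
      :request_duration_seconds,
      docstring: 'Request duration',
      labels: [:path, :method]
    )

    # ...and every observation must provide exactly those labels.
    histogram.observe(0.42, labels: { path: '/health', method: 'get' })

    # Observing with an undeclared label now fails validation immediately:
    # histogram.observe(0.42, labels: { path: '/health', verb: 'get' })  # raises
    ```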
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (1a7235f)
  5. Remove quantile calculation from Summary

    Expose only `sum` and `count` instead, with no quantiles / percentiles.
    
    The main reason to do this is that this whole change is based on the idea
    of having a "Store" interface that can be shared between Ruby processes.
    
    The `quantile` gem doesn't play well with this, since we'd need to store
    instances of `Quantile::Estimator`, which is a complex data structure, and
    tricky to share between Ruby processes.
    
    Moreover, individual Summaries on different processes cannot be aggregated,
    so all processes would actually have to share one instance of this class,
    which makes it extremely tricky to do, particularly performantly.
    
    Even though this is a loss of functionality, this puts the Ruby client
    on par with other client libraries, like the Python one, which also only
    offers sum and count without quantiles.
    
    Also, this is actually more compliant with the Client Library best practices:
    - Summary is ENCOURAGED to offer quantiles as exports
    - MUST allow not having quantiles, since just count and sum is quite useful
    - This MUST be the default
    
    The original client didn't comply with the last 2 rules, whereas this one
    does, just like the Python client.
    
    And quantiles, while seemingly the point of summaries, are encouraged but
    not required.
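
    The resulting behaviour, sketched (values illustrative):

    ```ruby
    summary = Prometheus::Client::Summary.new(
      :api_latency_seconds,
      docstring: 'API call latency'
    )
    summary.observe(0.25)
    summary.observe(0.75)

    # Only the running totals are exposed -- no quantiles.
    summary.get  # => e.g. { "count" => 2.0, "sum" => 1.0 }
    ```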
    
    I'm not discarding the idea of adding quantiles back. I have ideas on how
    this could be done, but it'd probably be pretty expensive, as a single
    Estimator instance would have to be marshalled / unmarshalled into bytes
    for every call to `observe`, and it'd have a hard requirement of having
    some sort of low-cardinality Process ID as a label (possible with Unicorn)
    to avoid aggregation.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (ce6d44d)
  6. Add concurrent-ruby gem

    The reason we need this gem is to be able to provide a good concurrency
    story in our "Data Stores".
    
    We want the data stores to be thread-safe so that metrics can use them
    easily, and to hide that complexity so the store can do whatever is most
    efficient, without metrics making assumptions about how it works.
    
    However, we also need metrics to be able to update multiple values at once,
    so in some cases (most notably, histograms) the store does need to provide
    a synchronization method.
    
    Since the normal `set` / `increment` methods already synchronize on a Mutex,
    wrapping them in another Mutex block means calling `Mutex#synchronize` twice.
    That would be fine with a reentrant lock, but Ruby Mutexes are not reentrant.
    
    `concurrent-ruby` provides a reentrant lock. It's a read-write lock, not
    a Mutex, but by always grabbing write locks, it's functionally the same.
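
    A minimal sketch of the pattern (assuming `concurrent-ruby`'s
    `ReentrantReadWriteLock`, always taking the write side):

    ```ruby
    require 'concurrent'

    lock = Concurrent::ReentrantReadWriteLock.new

    # Unlike a plain Ruby Mutex, re-acquiring from the same thread won't
    # deadlock, so a store's outer `synchronize` can safely wrap the locking
    # already done by its own `set` / `increment`.
    lock.with_write_lock do
      lock.with_write_lock do
        # update multiple values atomically
      end
    end
    ```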
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (b787e37)
  7. Abstract away storage of Metric data into DataStore objects

    Currently, each Metric object stores its data (labels and values) in a Hash
    object internal to itself. This change introduces the concept of a Data Store
    that is external to the metrics themselves, which holds all the data.
    
    The main reason to do this is that by having a standardized and simple
    interface that metrics use to access the store, we abstract away the details
    of storing the data from the specific needs of each metric.
    
    This allows us to then simply swap around the stores based on the needs
    of different circumstances, with no changes to the rest of the client. The
    main use case for this is solving the "pre-fork" server scenario, where
    multiple processes need to share the metric storage, to be able to report
    coherent numbers to Prometheus.
    
    This change provides one store, a synchronized Hash, which has much the same
    characteristics as the existing storage, plus some Mutex overhead. In a
    single-process scenario, this should be the fastest and safest way to operate
    in a multi-threaded environment. As such, it is the default store.
    
    Future commits will introduce new ones, for specific scenarios, and also each
    consumer of the Prometheus Client can make their own and simply swap them for
    the built-in ones, if they have specific needs.
    
    The interface and requirements of Stores are specified in detail in a README.md
    file in the `client/data_stores` directory. This is the documentation that must
    be used by anyone wishing to create their own store.
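
    As a rough sketch of the shape of that interface (the README is the
    authoritative contract; this toy store is neither thread-safe nor
    multi-process safe):

    ```ruby
    # A minimal Hash-backed store sketch, following the method names used by
    # the built-in stores.
    class MyToyStore
      def for_metric(metric_name, metric_type:, metric_settings: {})
        MetricStore.new
      end

      class MetricStore
        def initialize
          @values = Hash.new { |hash, key| hash[key] = 0.0 }
        end

        # Real stores must make this an actual synchronization point.
        def synchronize
          yield
        end

        def set(labels:, val:)
          @values[labels] = val.to_f
        end

        def increment(labels:, by: 1)
          @values[labels] += by
        end

        def get(labels:)
          @values[labels]
        end

        def all_values
          @values.dup
        end
      end
    end
    ```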
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (a951c22)
  8. Add DataStores::SingleThreaded

    This is the simplest possible store we can have, which simply has a hash
    for each metric, and no synchronization code whatsoever.
    
    It can only be used in single-threaded, single-process scenarios, but for
    those, it should be the fastest option.
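
    Swapping it in is a one-line configuration change (a sketch of the
    intended usage):

    ```ruby
    Prometheus::Client.config.data_store =
      Prometheus::Client::DataStores::SingleThreaded.new
    ```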
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (3178104)
  9. Add DataStores::DirectFileStore

    This store keeps all its data in files, one file per process and per metric.
    These files have binary data mapping keys to Floats, and they're read/written
    at the specific offsets of these Floats.
    
    This is generally the recommended store to use to deal with pre-fork servers
    and other "multi-process" scenarios, at least until we crack `mmap`.
    
    This seems to be, for now, the fastest way to safely share data between
    all the processes. Because we never actually `fsync`, we never actually
    touch the disk, and FS caching makes this extremely fast: almost as fast
    as the `mmap`-based approach, without all the added risk or the burden of
    the C extensions.
    
    Each process gets its own file for each metric. When a Prometheus scrape
    is received in one of the processes, it finds all the files for a metric,
    reads their values and aggregates them. We use `flock` to guarantee
    consistency.
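
    Configuring it, sketched (the directory is illustrative; it must be shared
    by all the processes being aggregated):

    ```ruby
    Prometheus::Client.config.data_store =
      Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_files')
    ```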
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (5917ad8)
  10. Change text_spec.rb to be more like an integration test

    The existing spec on `Formats::Text` is based on mocking a registry with
    fake metrics and values inside, which means if anything changes in the
    interface for metrics, the test will not catch it.
    
    And there's no test validating that interface.
    
    This test is more realistic, and it actually catches the kind of bugs
    we introduced in the process of this refactor.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (403b31a)
  11. Update Histogram to return buckets as strings

    In the Text export format, the `le` label is reported as a string.
    
    Moreover, some of our stores may coerce Float label values into Strings.
    This is fine, since Labelsets should be a hash of symbols to strings.
    It also makes sense since one of those "buckets" will be "+Inf".
    
    It's also somewhat unavoidable, since neither the Store nor the Metric
    should be coercing those back into Floats, so we now report the buckets
    as strings instead of Floats.
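
    Illustratively (values made up), bucket keys now come back as Strings:

    ```ruby
    histogram.get(labels: { path: '/' })
    # => e.g. { "0.5" => 2.0, "2.5" => 4.0, "+Inf" => 5.0 }
    # keys are Strings, matching the `le` label in the text format
    ```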
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (ac9a514)
  12. Accumulate Histogram Buckets on Export, not Observe

    In a Histogram, each bucket's count also includes the counts of all
    buckets smaller than it.
    
    The original code for Histograms would handle this at `observe` time,
    by incrementing all buckets bigger than the observed sample.
    
    Since calls to the Data Store may be expensive, and also all these
    increments need to be done atomically to guarantee the export doesn't
    have inconsistent values, this can have a big effect on performance.
    
    With this change, at `observe` time we only increase the bucket the
    sample falls in, plus the `sum`, and do the accumulation when reading
    the Histogram. This makes observations about 30% to 40% faster with a
    "uniform" distribution of samples; a sample set skewed towards smaller
    buckets should show a much larger performance gain.
    
    With this change, we also remove the `count` key from the result of
    calling `get`. This key is redundant (it's the same value as `+Inf`),
    rendering it as `count` is the job of the Text Exporter already, and
    `get` shouldn't be returning it.
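
    A sketch of the read-time accumulation (method name hypothetical):

    ```ruby
    # Turn per-bucket counts into cumulative counts at export time.
    # `raw` maps bucket upper bounds (sorted ascending) to observation counts.
    def accumulate_buckets(raw)
      running_total = 0.0
      raw.each_with_object({}) do |(upper_bound, count), result|
        running_total += count
        result[upper_bound] = running_total
      end
    end

    accumulate_buckets("0.5" => 2.0, "1" => 1.0, "+Inf" => 2.0)
    # => { "0.5" => 2.0, "1" => 3.0, "+Inf" => 5.0 }
    ```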
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (190f770)
  13. Improve performance of Histograms

    Merging the `le` key into the `base_label_set` is where most time is
    spent, surprisingly.
    
    Also somewhat surprisingly, `dup` + setting a key is faster than
    `.merge`.
    
    This version is less pretty, but makes `observe` significantly faster.
    
    For some stores (the ones that serialize the hash into a string), we
    could simply set the key without the `dup`, which makes this method
    about 3x faster. However, that would require calling `dup` in all the
    Hash-based stores (or serializing the Hash into a string), which would
    make THEM significantly slower.
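
    The optimization in question, roughly (variable names hypothetical):

    ```ruby
    # Before: `merge` allocates and rehashes on every observation.
    label_set = base_label_set.merge(le: bucket)

    # After: `dup` plus a single key assignment benchmarks faster here.
    label_set = base_label_set.dup
    label_set[:le] = bucket
    ```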
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (9d39313)
  14. Add Metric#with_labels method

    `with_labels` returns a metric object, augmented with some pre-set labels.
    This is useful for reducing repetition when a certain part of the code is
    passing the same label over and over again, but the label can have
    different values in different parts of the codebase (so passing in
    `preset_labels` when declaring the metric is not appropriate).
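
    Typical usage, sketched (label names illustrative):

    ```ruby
    counter = Prometheus::Client::Counter.new(
      :service_requests_total,
      docstring: 'Total requests',
      labels: [:service, :code]
    )

    # Pre-apply the label this part of the codebase always uses...
    billing_requests = counter.with_labels(service: 'billing')

    # ...so call sites only pass what actually varies.
    billing_requests.increment(labels: { code: '200' })
    ```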
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (c1d6126)
  15. Short-circuit label validation on Metric observations

    Every time a metric gets observed / incremented, we are composing the
    labels hash, merging the labels that were just passed in as part of the
    observation with the metric's `preset_labels`, and validating that the
    labelset is valid.
    
    If all labels are pre-set already, however (either as `preset_labels` when
    declaring the metric, or by use of `with_labels`), we can validate when
    the labels are pre-set, and then skip the validation on every observation,
    which can lead to some decent time savings if a metric is observed often.
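
    A hedged sketch of the short-circuit (instance variables and helpers here
    are hypothetical, not the actual implementation):

    ```ruby
    def increment(by: 1, labels: {})
      label_set =
        if labels.empty? && @all_labels_preset
          preset_labels                           # validated once, up front
        else
          validate!(preset_labels.merge(labels))  # validated on every call
        end
      @store.increment(labels: label_set, by: by)
    end
    ```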
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (3862c1c)
  16. Rename confusing methods in LabelSetValidator

    LabelSetValidator has methods `valid?` and `validate`. It's not very clear
    what these do or what the difference between them is, and the name `valid?`
    would imply (by convention) that it returns a Boolean, rather than raising
    an exception on invalid input.

    Renamed these to `validate_symbols!` and `validate_labelset!`, to make it
    clearer what each of them does. The `!` also follows the usual Ruby
    convention of "do X, and if that doesn't work, raise".
    
    This commit also starts drawing the distinction between an array of `labels`,
    and a hash of label keys and values (which we call a `labelset`). The term
    `labels` is used interchangeably for both concepts throughout the code.
    This commit doesn't fix all instances, but it's a step in that direction.
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (2347e2a)
  17. Add Performance Benchmarks

    These benchmarks are useful for knowing what kind of performance to expect
    from metrics in scenarios where consumers may be sensitive to performance,
    and also for helping teams that need to build their own stores to benchmark
    and test them.
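
    A minimal example of the kind of measurement this enables, using the
    standard `benchmark` library (iteration count and metric illustrative):

    ```ruby
    require 'benchmark'
    require 'prometheus/client'

    counter = Prometheus::Client::Counter.new(
      :bench_total,
      docstring: 'Benchmark counter',
      labels: [:label]
    )

    puts Benchmark.measure {
      100_000.times { counter.increment(labels: { label: 'a' }) }
    }
    ```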
    
    Signed-off-by: Daniel Magliola <danielmagliola@gocardless.com>
    Daniel Magliola committed Mar 4, 2019 (5cae3b4)

Commits on Mar 12, 2019

  1. Improve wording around example data store implementation

    Signed-off-by: Chris Sinjakli <chris@gocardless.com>
    Chris Sinjakli committed Mar 12, 2019 (5dce3e5)