
Consumption metering RFC #2884

Merged: 1 commit into main, Jan 16, 2023
Conversation

@kelvich (Contributor) commented Nov 22, 2022

@kelvich mentioned this pull request on Nov 28, 2022
@petuhovskiy (Member) left a comment

There are a lot of good points in the RFC and in the discussions. I think this RFC is a good starting point, but we could split it into several smaller RFCs on independent topics and move our comments with possible approaches there.

Topics for these RFCs can be:

1. Consumption events calculation

It seems that everyone mostly agrees on the following scheme:

  • All services can produce "consumption events" at any time
  • We collect all events in a single place
  • End-user costs are calculated based on these events

We can further discuss how to calculate these events in each service, how often to report them, and what they should contain. For example, there was a discussion about what "synthetic storage size" is and how to calculate it; that could be settled in this separate RFC.
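To make the discussion more concrete, here is a minimal sketch of what such a consumption event could look like, written in Rust with serde and chrono; the struct and field names are purely illustrative assumptions, not something defined in the RFC.

```rust
use serde::{Deserialize, Serialize};

/// Illustrative shape of a consumption event (field names are assumptions).
/// Requires chrono with its "serde" feature for timestamp (de)serialization.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ConsumptionEvent {
    /// Which service produced the event: pageserver, proxy, autoscaler-agent, ...
    pub source: String,
    /// The metered quantity, e.g. "synthetic_storage_size" or "data_transfer".
    pub metric: String,
    /// Tenant the usage is attributed to.
    pub tenant_id: String,
    /// Measured value in the metric's unit (bytes, byte-hours, ...).
    pub value: u64,
    /// When the measurement was taken, so the collector can aggregate per hour/day.
    pub recorded_at: chrono::DateTime<chrono::Utc>,
    /// Idempotency key so downstream consumers can drop duplicates.
    pub idempotency_key: String,
}
```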

2. Events collection

This RFC can cover which solution we should use to collect events from all our services, e.g. whether to use a pull or push model, or to adopt a ready-made solution like Vector.
It can also cover the event format, e.g. JSON, protobuf, or something else.

Our collection solution should:

  • not lose events, and retry if one of the services is down
  • not lose events on restart, which likely means keeping a persistent on-disk buffer of events
  • be simple to use and configure, so that it is easy to test
  • make it easy to enable collection from new services and disable it for old, stopped services
  • provide exactly-once event delivery, e.g. by attaching a UUID to each event or batch (sketched after this list)
  • have reasonable event delivery latency
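As a rough illustration of the retry and exactly-once points above, here is a hedged Rust sketch of a reporter pushing one batch of events with a per-batch UUID. The endpoint path, payload shape, and crate choices (reqwest with its json feature, uuid, tokio, serde_json, anyhow) are all assumptions, and it reuses the ConsumptionEvent sketch from the previous section.

```rust
use std::time::Duration;
use uuid::Uuid;

/// Push one batch of events to the collector, retrying with backoff.
/// The endpoint and payload shape are assumptions, not a defined API.
async fn push_batch(
    client: &reqwest::Client,
    endpoint: &str,
    events: &[ConsumptionEvent],
) -> anyhow::Result<()> {
    // One UUID per batch: the receiver can discard a batch it has already seen,
    // turning at-least-once delivery (retries) into effectively exactly-once.
    let batch_id = Uuid::new_v4();
    let body = serde_json::json!({ "batch_id": batch_id, "events": events });

    let mut delay = Duration::from_secs(1);
    for _ in 0..5 {
        match client.post(endpoint).json(&body).send().await {
            Ok(resp) if resp.status().is_success() => return Ok(()),
            // Collector is down or returned an error: back off and retry.
            _ => {
                tokio::time::sleep(delay).await;
                delay *= 2;
            }
        }
    }
    // A real agent would keep the batch in a persistent on-disk buffer here
    // instead of giving up, so events also survive restarts.
    anyhow::bail!("failed to deliver batch {batch_id} after retries")
}
```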

3. Events storage

There is an argument that collecting all events in a single place is bad for scalability; we can discuss that in this RFC. From my point of view, we can easily scale by sharding events by tenant_id. Large companies use event sourcing and it works for them, so it should work for us too.

We can discuss where we should store events. Requirements for the storage:

  • should be able to store a lot of events
  • it should be easy to push new events
  • events should be durably stored for some time
  • events should be easy to query/consume at any time

Kafka sounds like a good solution for this, but we can discuss other options too.
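To make the sharding point concrete, here is a small sketch using the rdkafka crate (an assumption, as are the topic name and broker address): keying each message by tenant_id means Kafka's default partitioner keeps a given tenant's events in a single partition, so the stream shards naturally.

```rust
use std::time::Duration;

use rdkafka::config::ClientConfig;
use rdkafka::producer::{FutureProducer, FutureRecord};

/// Build a producer; broker address and timeout values are illustrative.
fn make_producer() -> anyhow::Result<FutureProducer> {
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "kafka:9092")
        .set("message.timeout.ms", "5000")
        .create()?;
    Ok(producer)
}

/// Publish one serialized event, keyed by tenant_id so all of a tenant's events
/// land in the same partition (ordered, and trivially sharded across brokers).
async fn publish(producer: &FutureProducer, tenant_id: &str, payload: &str) -> anyhow::Result<()> {
    let record = FutureRecord::to("usage-events").key(tenant_id).payload(payload);
    producer
        .send(record, Duration::from_secs(5))
        .await
        .map_err(|(err, _msg)| anyhow::anyhow!("kafka delivery failed: {err}"))?;
    Ok(())
}
```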

4. Further processing

We can discuss how to process events to calculate costs, and the pipelines for pushing events to other services.



I mostly agree with the proposed solution for the first iteration. I see the overall scheme as follows:

  ┌──────────────┬────────────────┐
  │  pageserver  │                │
  └──────────────┘      ┌─────────▼──────────┐        ┌───────┐       ┌─────────────┐
                        │                    │        │       │       │             │
  ┌──────────────┐      │ POST /usage-events ├────────► Kafka ├──────►│   console   │
  │    proxy     ├──────►  (control-plane)   │        │       │       │             │
  └──────────────┘      │                    │        └───────┘       └─────────────┘
                        └─────────▲──────────┘
┌──────────────────┐              │
│ autoscaler-agent ├──────────────┘
└──────────────────┘
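Purely to illustrate the ingestion edge in the diagram, here is a minimal Rust/axum sketch of the POST /usage-events handler; the real control plane may be a different service and language entirely, and the Kafka hand-off is only hinted at in a comment.

```rust
use axum::{http::StatusCode, routing::post, Json, Router};

/// Accept a batch of usage events over HTTP and acknowledge it.
/// In the diagram above, this is where events would be handed to Kafka;
/// acknowledging only after the broker confirms lets callers retry safely.
async fn usage_events(Json(events): Json<Vec<serde_json::Value>>) -> StatusCode {
    tracing::info!("accepted {} usage events", events.len());
    // TODO (sketch): forward `events` to the Kafka producer shown earlier.
    StatusCode::ACCEPTED
}

/// Route table for the ingestion endpoint.
fn router() -> Router {
    Router::new().route("/usage-events", post(usage_events))
}
```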

@vadim2404 (Contributor) commented:

@petuhovskiy, I have seen several implementations like the one you propose, with one difference:
the HTTP endpoint was replaced with a local log file that was pushed to Kafka directly via filebeat (https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html).

The single point of failure (the HTTP endpoint) is replaced with filebeat, which supports retries and continuation from a previously committed state.

@kelvich (Contributor, Author) commented Dec 1, 2022

We had a call with Metronome, and it seems they can't do complicated aggregates on their side. So if we bill for gigabyte-hours, we should send one event per hour and they can sum() them. But we can't send events twice as often, because they would still interpret each one as a per-hour event, ignoring the timestamps and doubling the price. So we have to be more precise with our events and handle some aggregation on our side.

With that in mind, I can see a few other possible pipelines. Let's assume we always want to end up with aggregated (1-hour or 1-day) usage events stored long-term on our side (a sketch of that hourly bucketing follows the list below).

a) POST /usage-events (control plane) -> Kafka -> postgres table with 1h or 1-day aggregated events
b) POST /usage-events (control plane) -> postgres table with the last hour/day of unaggregated events -(move data with dbt and truncate the source table)-> postgres table with 1h or 1-day aggregated events
c) the same, but with ClickHouse in the middle and at the end
d) POST to Vector.dev -> Vector downsamples events -> something that can accept HTTP JSON and put it into postgres (control plane or PostgREST) -> postgres table with 1h or 1-day aggregated events
e) POST to Vector.dev -> Vector downsamples events -> ClickHouse table with 1h or 1-day aggregated events
f) the two previous variants, but with Vector.dev polling Prometheus endpoints instead of waiting for POSTs
g) a periodic job that queries VictoriaMetrics for the last hour of usage and puts the results into a postgres table with 1h or 1-day aggregated events
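Whichever of these pipelines we pick, the "aggregate on our side" step boils down to bucketing raw events per (tenant, metric, hour). A hedged Rust sketch of that bucketing follows; the tuple layout of the raw events is an assumption, and sum() only fits counter-like metrics, while a gauge such as storage size would need max or a time-weighted average instead.

```rust
use std::collections::HashMap;

use chrono::{DateTime, Timelike, Utc};

/// Collapse raw events into one value per (tenant, metric, hour), so the
/// billing service only ever sees hourly events it can safely sum().
/// Raw events are (tenant_id, metric, value, recorded_at); layout is illustrative.
fn aggregate_hourly(
    events: &[(String, String, u64, DateTime<Utc>)],
) -> HashMap<(String, String, DateTime<Utc>), u64> {
    let mut buckets: HashMap<(String, String, DateTime<Utc>), u64> = HashMap::new();
    for (tenant, metric, value, at) in events {
        // Truncate the timestamp to the start of its hour.
        let hour = at
            .with_minute(0)
            .and_then(|t| t.with_second(0))
            .and_then(|t| t.with_nanosecond(0))
            .expect("zeroing minutes/seconds/nanos is always valid");
        // sum() is right for counters; gauges (e.g. storage size) would need
        // a different reduction here, such as max or a time-weighted average.
        *buckets
            .entry((tenant.clone(), metric.clone(), hour))
            .or_insert(0) += *value;
    }
    buckets
}
```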

@chaporgin (Member) commented Dec 1, 2022

I personally like d) as an intermediate option. Vector could run a pipeline of HTTP server source -> log-to-metric transform -> aggregate transform (aggregating the needed metrics at the needed granularity) -> HTTP sink -> console -> billing services.

The pipeline sources -> multiple Vector instances per cell -> Kafka -> a single Postgres table for the whole cloud -> some accounting engine seems like a bright future with the following properties: it does not lose events, it tolerates repeated events, it can bill multiple events on a per-hour basis, and it keeps costs low by sending only the needed events to the accounting engine.
Vector:
It gives flexibility in sources and accepts quite a few formats; HTTP and Prometheus are easy to implement on the storage nodes. Push does not require storage node discovery, so the proposed HTTP or statsd seems easier to maintain.
It gives batching, which increases throughput when working with Kafka.
It does not lose events (or loses very few): Vector retries them.
Kafka:
With Kafka Streams we can aggregate and bill on a per-hour basis, reducing the volume of events (which we are going to pay for). If the events have an idempotency field (timestamp + hostname), we can probably drop duplicate events on the Kafka side with log compaction, so at-least-once delivery seems OK for us.
Everything can then be put into Postgres with a Kafka PostgreSQL sink.
Postgres table:
Having the events in a Postgres table would allow us to enrich them in the background with client id and other data, then send them to an external system (like Metronome or others) or to a home-grown one (which does not exist yet and is not rocket science to build, but we would need to build it first), and add taxes etc. We could also show usage from that table to users, while bills come from some external system.

But this is a lot to build.

c) ClickHouse sounds great for showing historical data and downsampling it automatically, but it has weak ingestion guarantees, so I would not use it as a primary data store; as a secondary one for graphs etc. it is fine.
d) seems fast to build, but it might have a scalability problem. If we run more than one Vector instance accepting and aggregating events, each GB-hour of storage gets billed twice. We could add another layer of single-instance Vector post-aggregating the events for each hour; however, I doubt that would be a consistent solution, since it still allows billing an hour twice if events arrive at that post-aggregator far apart.
Having one instance handle all the traffic (a 15k RPS load) would probably work if we give it enough CPU and memory; that is not a big deal for Vector. I see this scheme as an intermediate one and thought it would work for sure, but we would need to rework the ingest pipeline later; we probably do not want to live with a single instance for long, as long as we do not want to add retries to the event emitters.
f) needs discovery of the storage services; I would try to avoid that.
g) I am not sure we have any durability guarantees on this metrics storage; losing metrics is probably not what we want, so I would not rely heavily on VictoriaMetrics as a reliable store for producing billing information.

So I would vote for scheme a) with minor additions (see above), and I would start from the back: deliver data to some accounting engine (which is Orb; it probably has the needed aggregations) with POST /usage via HTTP. That would let us start with paying customers. Then add Vector, just to bring in Kafka later. Then we could add a temporary Kafka -> Vector -> POST /usage step just to reduce the number of events sent to Orb. And then add a Postgres table to store the data ourselves. That would allow us to move step by step to our own solution while already having a fully working billing solution, because building those pipelines from scratch would put us, I would say, person-months away from billing customers.

@lubennikovaav marked this pull request as ready for review on January 16, 2023, 16:37
@lubennikovaav (Contributor) commented Jan 16, 2023

I have resolved all the conversations so that the PR can be merged.
If any discussion needs to be revived, please open a follow-up issue.

@lubennikovaav self-requested a review on January 16, 2023, 16:48
@lubennikovaav enabled auto-merge (rebase) on January 16, 2023, 16:52
@lubennikovaav merged commit 431e464 into main on January 16, 2023
@lubennikovaav deleted the metering_rfc branch on January 16, 2023, 17:16