
Collect & visualise spark-stats API usage based on access tokens #206

Closed
5 of 7 tasks
bajtos opened this issue Dec 2, 2024 · 37 comments · Fixed by filecoin-station/spark-stats-request-metrics#1

@bajtos

bajtos commented Dec 2, 2024

We started handing out API keys to people who signed up with their email for the Spark Data offering. Now we would like to understand which users are using the Spark API and how frequently.

Docs which the users see: https://filspark.com/api-docs

Notes:

  • We are not enforcing the access token yet; anonymous usage is allowed. We want to track anonymous requests too.
  • The API is behind the Cloudflare cache; some requests are served from the cache and never hit our spark-stats service. We want to include the cached requests in the data.
  • Our API docs used to be slightly wrong. We need to look for the API key in the api-key header and the api-key query-string parameter, in addition to the standard Authorization header in Bearer format.
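
A handler that honors all three token locations could extract them along these lines. This is a hedged sketch only: the function name and the precedence order (Bearer header first) are assumptions, not the actual spark-stats code.

```javascript
// Sketch: find the API token in any of the three places the docs allow.
// Precedence (Authorization: Bearer first) is an assumption for illustration.
function getApiToken (request) {
  const auth = request.headers.get('Authorization')
  if (auth && auth.startsWith('Bearer ')) {
    return auth.slice('Bearer '.length)
  }
  const headerKey = request.headers.get('api-key')
  if (headerKey) return headerKey
  const queryKey = new URL(request.url).searchParams.get('api-key')
  if (queryKey) return queryKey
  return null // anonymous request; we still want to track it
}
```

Since the token is not enforced yet, a null result would be recorded as anonymous usage rather than rejected.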

TODO:

Update our internal spark-stats consumers to send an access token:

bajtos mentioned this issue Dec 2, 2024
@pyropy

pyropy commented Dec 4, 2024

Are we currently using any tools for monitoring and collecting logs? Also, what service is responsible for managing API keys and setting rate limits?

@juliangruber

Logs are being collected in Papertrail, which I believe Miro gave you access to.

I believe at the moment API keys are only parsed in Cloudflare, @bajtos correct me if I'm wrong.

@pyropy

pyropy commented Dec 4, 2024

> Logs are being collected in Papertrail, which I believe Miro gave you access to.
>
> I believe at the moment API keys are only parsed in Cloudflare, @bajtos correct me if I'm wrong.

I don't have access to Papertrail yet.

@pyropy

pyropy commented Dec 5, 2024

I believe if we want to track both cached and non-cached requests, we'd have to use Cloudflare Workers. In that case we'd have two options:

  • Push logs directly from the Cloudflare worker to Papertrail
  • Log on the Cloudflare worker and set up Cloudflare Logpush to forward the worker logs over to Papertrail

@juliangruber

Instead of pushing logs to Papertrail, where we would then parse them (I assume), can we make the Cloudflare worker submit information about the request into a DB, via an HTTP API that we create? It could also feed this data into InfluxDB for time-series processing.

@pyropy

pyropy commented Dec 5, 2024

> Instead of pushing logs to Papertrail, where we would then parse them (I assume), can we make the Cloudflare worker submit information about the request into a DB, via an HTTP API that we create? It could also feed this data into InfluxDB for time-series processing.

Yeah, I would also prefer something like that over Papertrail. It will be much easier to aggregate and visualize the data.

@juliangruber

Cool, let's either add a route to spark-api then which persists this information in Postgres, or write it to InfluxDB. If both are viable options to you, best create a small document or GitHub comment describing your decision-making process (pros, cons, etc.).

@bajtos

bajtos commented Dec 5, 2024

Can we put the new route in spark-stats, please? We are measuring spark-stats API usage; nothing from spark-api is involved in this.

@bajtos

bajtos commented Dec 5, 2024

Writing from Cloudflare worker directly to InfluxDB may be the simplest option, as we don't need to write any new REST API 💪🏻
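
Writing straight to InfluxDB from the worker could look roughly like this: format one line-protocol record per request and POST it to InfluxDB's v2 write endpoint. The measurement, tag, and field names, as well as the env bindings, are illustrative assumptions, not the deployed configuration.

```javascript
// Build an InfluxDB line-protocol record: measurement,tags fields timestamp
function toLineProtocol (measurement, tags, fields, timestampMs) {
  // Escape commas, equals signs, and spaces per the line-protocol rules
  const esc = (s) => String(s).replace(/[,= ]/g, (c) => '\\' + c)
  const tagPart = Object.entries(tags)
    .map(([k, v]) => `,${esc(k)}=${esc(v)}`)
    .join('')
  const fieldPart = Object.entries(fields)
    .map(([k, v]) => `${esc(k)}=${typeof v === 'number' ? v : JSON.stringify(v)}`)
    .join(',')
  return `${measurement}${tagPart} ${fieldPart} ${timestampMs}`
}

// Called from the worker's fetch handler after the response is ready.
// The env bindings (INFLUXDB_URL, INFLUXDB_ORG, ...) are hypothetical names.
async function reportRequest (env, request, response) {
  const line = toLineProtocol(
    'spark_stats_requests',
    {
      endpoint: new URL(request.url).pathname,
      cache_status: response.headers.get('CF-Cache-Status') || 'UNKNOWN'
    },
    { count: 1 },
    Date.now()
  )
  await fetch(
    `${env.INFLUXDB_URL}/api/v2/write?org=${env.INFLUXDB_ORG}&bucket=${env.INFLUXDB_BUCKET}&precision=ms`,
    {
      method: 'POST',
      headers: { Authorization: `Token ${env.INFLUXDB_TOKEN}` },
      body: line
    }
  )
}
```

Recording the CF-Cache-Status header as a tag would also answer the "include cached requests" requirement, since the worker sees hits and misses alike.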

@juliangruber

Good point, yes should be in spark-stats

@juliangruber

> Writing from Cloudflare worker directly to InfluxDB may be the simplest option, as we don't need to write any new REST API 💪🏻

It also has the advantage of automatic retention policies

@pyropy

pyropy commented Dec 6, 2024

Do we already have InfluxDB deployed? If so I'd appreciate access to the instance 🙏🏻

@juliangruber

Invite sent 👌

@pyropy

pyropy commented Dec 9, 2024

I have deployed a Cloudflare worker that should report data to InfluxDB. I have yet to configure it properly.

I am proposing adding another CNAME record (i.e. stats-test) on the filspark.com domain that will point to the spark-stats API, and setting up the Cloudflare worker on that route.

After we make sure that the Cloudflare worker is working correctly, we can set up another CNAME record (i.e. stats-api) on the filspark.com domain and point it to the spark-stats API, while configuring the Cloudflare worker route to the original CNAME (stats) so all the traffic is routed through the worker.

@juliangruber

> I am proposing adding another CNAME record (i.e. stats-test) on the filspark.com domain that will point to the spark-stats API, and setting up the Cloudflare worker on that route.

This would be a kind of staging environment, right? We can make requests against that endpoint ourselves and verify they end up in InfluxDB correctly.

As long as the Cloudflare worker doesn't negatively affect requests, I don't see an immediate problem with just using it in prod. Can we just make all requests to api.filspark.com go through your worker?

@pyropy

pyropy commented Dec 9, 2024

Yeah, it's kind of a separate environment, but in this case I didn't use Cloudflare worker environments. Generally it shouldn't affect request performance, as metrics are reported after the request has been executed.

I have created a repository which is itself a fork of another repository I used as a starter template. In the end I didn't use much of the code inside that repository. I am wondering: should we fork my repository into the organization, or could I create a new repository and push the code? Given that it's MIT-licensed, I guess there's no legal problem even if we copy the code into a new repo?

@pyropy

pyropy commented Dec 10, 2024

@juliangruber I've tried adding new CNAME records, but it seems like there's some issue with TLS certificates (I get error 525 when trying to send a request). Are the certificates managed by Cloudflare, or is that done outside of Cloudflare?

@juliangruber

> Generally it shouldn't affect request performance as metrics are reported after the request has been executed.

Ok, let's just attach the worker to the production route then (with manual monitoring that nothing goes wrong), for simplicity.

> I have created a repository which is itself a fork of another repository I used as a starter template. In the end I didn't use much of the code inside that repository. I am wondering: should we fork my repository into the organization, or could I create a new repository and push the code? Given that it's MIT-licensed, I guess there's no legal problem even if we copy the code into a new repo?

Are you able to create new repos in the Filecoin-station org? I think we can then transfer your repository into it. Yes we could just copy the code over, but forks are convenient because you see where the code came from, and that makes it easier to pull in upstream changes.

@juliangruber

> @juliangruber I've tried adding new CNAME records, but it seems like there's some issue with TLS certificates (I get error 525 when trying to send a request). Are the certificates managed by Cloudflare, or is that done outside of Cloudflare?

Can we use the existing domain name instead?

@pyropy

pyropy commented Dec 10, 2024

> @juliangruber I've tried adding new CNAME records, but it seems like there's some issue with TLS certificates (I get error 525 when trying to send a request). Are the certificates managed by Cloudflare, or is that done outside of Cloudflare?
>
> Can we use the existing domain name instead?

Yes, we could do that; I'm only worried that we're going to miss the cache. I switched the worker over to stats.filspark.com for a few minutes this morning and it reported a cache status of DYNAMIC.

(Screenshot from 2024-12-10: the worker reporting cache status DYNAMIC)

@juliangruber

> I'm only worried that we're going to miss the cache

Does this mean the API cache will be inactive with workers, or that it will be invalidated for every worker update?

@pyropy

pyropy commented Dec 10, 2024

> I'm only worried that we're going to miss the cache
>
> Does this mean the API cache will be inactive with workers, or that it will be invalidated for every worker update?

If the spark-stats server remains proxied behind Cloudflare (as it currently is), responses should still be cached (hence why I wanted to add a new CNAME record). The other option would be to cache responses directly in the worker and implement custom cache behavior.

@juliangruber

I don't understand. We want spark-stats to stay proxied, so that we can use Cloudflare's caching behavior. We want that proxy to also submit usage data to the API, through a Cloudflare worker. What am I missing?

@pyropy

pyropy commented Dec 10, 2024

The missing piece is that a Cloudflare worker and the proxy cannot reside on the same route. What I'm suggesting is that we add a new route, stats-api.filspark.com, pointing to the spark-stats server, and have the Cloudflare worker take over the old route, stats.filspark.com.

(Diagram: the Cloudflare worker sitting in front of the proxy)

@juliangruber

Got it, I wasn't aware that the worker needs to sit in front of the proxy. This makes sense then. Your graphic helped clarify this 👏 I will take a look at the certificate setup and get back to you.

@juliangruber

The problem is that the Fly.io deployment requires the hostname to be stats.filspark.com. I will look into it there.

@juliangruber

stats-api isn't the most descriptive name to me; what do you think about stats-backend?

Thinking about this more, are you sure this is the right way to go about things? I.e., is it a documented pattern to put the worker in front of the proxy? Or are there alternatives, like proxying from inside the worker?

@juliangruber

FYI, https://stats-api.filspark.com works now

@pyropy

pyropy commented Dec 10, 2024

> stats-api isn't the most descriptive name to me; what do you think about stats-backend?
>
> Thinking about this more, are you sure this is the right way to go about things? I.e., is it a documented pattern to put the worker in front of the proxy? Or are there alternatives, like proxying from inside the worker?

We could proxy inside the worker, sure. I guess it would be more appropriate and would also require a lot fewer DNS resolutions. I'll see how to use the basic Cloudflare cache setup with workers.

@pyropy

pyropy commented Dec 10, 2024

@juliangruber I have added caching inside the Cloudflare worker here. With that, I guess there's no need for additional CNAME records, so I'm going to delete them. I've yet to add tests and update the docs, but it has been deployed to Cloudflare and seems to be working.
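
For reference, caching inside a worker with Cloudflare's Cache API looks roughly like this. This is a sketch under assumed names (ORIGIN, isCacheable, handleRequest), not the code from the linked repository.

```javascript
// Assumed origin hostname (it appears later in this thread).
const ORIGIN = 'https://spark-stats.fly.dev'

// Cache only successful GET responses; everything else passes through.
function isCacheable (method, status) {
  return method === 'GET' && status === 200
}

// Worker handler sketch; caches.default exists only in the Workers runtime.
async function handleRequest (request, ctx) {
  const cache = caches.default
  const cached = await cache.match(request)
  if (cached) return cached

  const url = new URL(request.url)
  const response = await fetch(ORIGIN + url.pathname + url.search)
  if (isCacheable(request.method, response.status)) {
    // Store a copy without delaying the client's response
    ctx.waitUntil(cache.put(request, response.clone()))
  }
  return response
}
```

As the later comments note, letting a plain fetch() run through Cloudflare's cache turned out to be the simpler route than hand-rolled cache behavior like this.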

@bajtos

bajtos commented Dec 10, 2024

Implementing a proxy inside a worker feels rather suboptimal to me. Doesn't Cloudflare CDN offer any hooks we can observe from a worker? In the future, we will want to validate the access token and reject requests with missing or invalid access tokens.

(EDITED) ^^^ That's no longer relevant based on what I learned while writing this comment. ^^^

Here is some documentation that may be relevant:

Auth with headers:

Using Cloudflare cache from workers:

Quoting from https://developers.cloudflare.com/workers/reference/how-the-cache-works/

> Conceptually, there are two ways to interact with Cloudflare’s Cache using a Worker:
>
> Call to fetch() in a Workers script. Requests proxied through Cloudflare are cached even without Workers according to a zone’s default or configured behavior (for example, static assets like files ending in .jpg are cached by default). Workers can further customize this behavior by:
>
>   • Setting Cloudflare cache rules (that is, operating on the cf object of a request).

If I am understanding this topic correctly, then we should have the following architecture:

[API client] --> [Cloudflare worker calling fetch()] --> [Fly.io app]

In other words:

  • The API client fetches https://stats.filspark.com/{endpoint}
  • Our Cloudflare Worker is invoked and it fetches https://spark-stats.fly.dev/{endpoint}
  • Behind the scenes, this second fetch may be served from Cloudflare's cache (that's how I understand Cloudflare's architecture)
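
The steps above could be sketched as a worker that rewrites the hostname and lets a single fetch() run through Cloudflare's cache. The cf options (cacheEverything, cacheTtl) and the reporting hook are illustrative assumptions, not a confirmed configuration.

```javascript
// Rewrite the public URL to the Fly.io origin, keeping path and query.
function buildOriginUrl (requestUrl) {
  const url = new URL(requestUrl)
  url.hostname = 'spark-stats.fly.dev'
  return url.toString()
}

// Worker sketch: the fetch() below can be served from Cloudflare's cache;
// the cf options customize that behavior. In the real Workers runtime this
// object would be the module's default export.
const worker = {
  async fetch (request, env, ctx) {
    const response = await fetch(buildOriginUrl(request.url), {
      method: request.method,
      headers: request.headers,
      cf: { cacheEverything: true, cacheTtl: 300 }
    })
    // A usage-reporting hook (e.g. a write to InfluxDB) could run here via
    // ctx.waitUntil() so it never delays the client.
    return response
  }
}
```

With this shape the worker sees every request, cached or not, which is what the usage-tracking requirement needs.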

@pyropy

pyropy commented Dec 10, 2024

@bajtos I think you might be right about this. To be honest I was confused by this 👇🏻

> Call to fetch() in a Workers script. Requests proxied through Cloudflare are cached even without Workers according to a zone’s default or configured behavior (for example, static assets like files ending in .jpg are cached by default). Workers can further customize this behavior by...

As I understood it, a request is cached if the server is proxied behind Cloudflare, hence my asking for another CNAME record that we could put our server behind.

@pyropy

pyropy commented Dec 17, 2024

I have added the token to a data source on https://spacemeridian.grafana.net/ that reads the spark-stats API. Are the dashboards built there are using that data source also published on https://grafana.filstation.app/?

If that's the case and I were to set the API token directly on the dashboard, I am afraid we would risk leaking it.

@juliangruber

> Are the dashboards built there are using that data source also published on https://grafana.filstation.app/?

I don't understand this sentence

@pyropy

pyropy commented Dec 17, 2024

> Are the dashboards built there are using that data source also published on https://grafana.filstation.app/?
>
> I don't understand this sentence

Woah, I made a mistake. 🤦

I wanted to ask if dashboards built on https://spacemeridian.grafana.net/ are published on https://grafana.filstation.app/ and use data sources defined in https://spacemeridian.grafana.net/?

@bajtos

bajtos commented Dec 17, 2024

> I wanted to ask if dashboards built on https://spacemeridian.grafana.net/ are published on https://grafana.filstation.app/ and use data sources defined in https://spacemeridian.grafana.net/?

No, these two Grafana instances are completely independent. If we want to use the same datasource from both instances, then we need to configure it twice.

I believe we don't need to expose API usage data publicly yet (if ever), so there is no need to touch https://grafana.filstation.app/.

@pyropy

pyropy commented Dec 17, 2024

All requests to spark-stats are now proxied via the Cloudflare worker, and the collected data is visualized on a Grafana dashboard. Since we're not going to set up an API key on https://grafana.filstation.app/, I'm going to close this issue.

pyropy closed this as completed Dec 17, 2024