Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: metrics #1051

Merged
merged 8 commits into from
Jan 11, 2022
Merged

fix: metrics #1051

merged 8 commits into from
Jan 11, 2022

Conversation

alanshaw
Copy link
Contributor

@alanshaw alanshaw commented Jan 10, 2022

  • They should not be generated on-the-fly
    • It restricts us to 30s per query (pgrest heroku dyno)
    • Can cause us problems with prom agent reading them (it times out)
    • ...needs to cache or expose to perf issues
  • They should be calculated from the replica
    • So they don't impact performance of the production DB

This PR:

  • Stores the current value for each metric in the DB
  • Adds a cron job for re-calculating metrics every 10 minutes
    • Connects to the replica DB to calculate stats
    • Writes to a new table metric
  • Updates the /metrics endpoint to query the metric table for the latest value

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Jan 10, 2022

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: cf49f10
Status: ✅  Deploy successful!
Preview URL: https://a9eae64a.nft-storage-1at.pages.dev

View logs

@alanshaw alanshaw marked this pull request as ready for review January 10, 2022 18:02
Copy link
Contributor

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

Minor details commented inline

@@ -14,6 +14,7 @@ export async function dbTypesCmd() {

try {
await dbSqlCmd({ cargo: true, testing: true })
await delay(2000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was getting a socket hangup error. I think because the new postgrest version restarts itself when changes are made. It just needs a second or two to come back.

)
return { totals: Object.assign({}, ...totals) }
}
const UPLOAD_TYPES = ['Car', 'Blob', 'Multipart', 'Remote', 'Nft']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be great to have these in a single place. Currently, we have them at list here and in the cron code. This might be error prone in the case of a new PinService, or type of upload as we can forget to update the other side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already do this in the cron package:

import { DBClient } from '../../../api/src/utils/db-client.js'

So I think should be ok to import them...

packages/cron/src/jobs/metrics.js Outdated Show resolved Hide resolved
await pg.query(UPDATE_METRIC, [metric.name, metric.value])
}

if (error) throw error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call that we update what we can and only error in the end. We still need to create an issue to integrate these errors with an alert system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this in the pins job

- name: Heartbeat
if: ${{ success() }}
run: ./packages/tools/cli.js heartbeat --token ${{ secrets.OPSGENIE_KEY }} --name cron-nft-pins

Is that what we want @hugomrdias?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I think that's it. Not sure if we need any config in the expected opsgenie heartbeat timings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants