
Feat: Created daily_node_metrics table and received/stored station_id #188

Merged: 8 commits merged into filecoin-station:main from station_id on Apr 30, 2024

Conversation

@PatrickNercessian (Contributor) commented Apr 12, 2024

DRAFT: DO NOT MERGE

Links:
space-meridian/roadmap#96

@juliangruber juliangruber marked this pull request as draft April 15, 2024 07:03
INSERT INTO daily_node_metrics (station_id, metric_date)
VALUES ($1, now()::date)
ON CONFLICT (station_id, metric_date) DO NOTHING
`, [m.station_id]) // TODO: when we add more fields, we should update the ON CONFLICT clause to update the fields
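A minimal sketch of what that TODO could look like once more fields exist, assuming a hypothetical accepted_measurement_count column (the pgClient name is also illustrative, none of this is part of the PR):

await pgClient.query(`
  INSERT INTO daily_node_metrics (station_id, metric_date, accepted_measurement_count)
  VALUES ($1, now()::date, 1)
  ON CONFLICT (station_id, metric_date) DO UPDATE
    SET accepted_measurement_count = daily_node_metrics.accepted_measurement_count + 1
`, [m.station_id])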
Member:
I have a concern about the storage required by this DB schema design. We will store each station ID string in many duplicates (one copy each day).

When I implemented "daily participants," I decided to create a lookup table mapping participant addresses to 32-bit integers and then store those integers in the daily_participants table using a foreign-key constraint.

See #133

The key insight is that for most queries, we don't care what the exact participant address (or station ID) is. We want to count the number of unique addresses/IDs, and that can be achieved by counting the unique FK references.

The downside is that maintaining such a lookup table efficiently requires complex code (see mapParticipantsToIds() in my PR). OTOH, I think it should be fairly easy to generalise that function to support both participant addresses and station IDs by accepting a few more parameters like table name.

WDYT?
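For concreteness, a rough sketch of the lookup-table design described above, with made-up table and column names (the real implementation lives in #133):

// CREATE TABLE stations (
//   id SERIAL PRIMARY KEY,            -- 4-byte integer used as the FK reference
//   station_id TEXT NOT NULL UNIQUE   -- the long ID string, stored only once
// );
// CREATE TABLE daily_node_metrics (
//   station INTEGER NOT NULL REFERENCES stations(id),
//   metric_date DATE NOT NULL,
//   UNIQUE (station, metric_date)
// );
// Counting unique stations per day only touches the integer column:
const { rows } = await pgClient.query(`
  SELECT metric_date, COUNT(DISTINCT station) AS station_count
  FROM daily_node_metrics
  GROUP BY metric_date
`)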

@PatrickNercessian (author):

I think this makes sense! Playing devil's advocate, though, with some napkin math:

If we're averaging about 2000 station nodes per day, and each row is, let's say, 250B, that's 174MB per year in total. A lookup table for station_id (and maybe deployment_type, CPU architecture, etc.) would save us roughly 60-100 bytes per row? So we'd save 55MB per year.

Even if we 15x the number of daily nodes, we're still looking at <1GB per year of data accumulation in savings. Is it worth the relatively complex code logic?

It's possible I missed some detail here or in the math though.

Member:

Yeah, I was also wondering whether this optimisation is worth the complexity.

We already have 12-20k Station nodes. I think it's realistic to grow the network to 100k nodes by the end of this year.

IIUC, Ed25519 has 32-byte public keys, serialised as 64 hex characters, while a foreign-key reference needs only 4 bytes. That saves 64 - 4 = 60 bytes per node per day, which at 100k nodes adds up to 365 * 60 * 100k bytes ≈ 2.19 GB per year.

I think we can afford to store an extra 2 GB per module per year in our DB.

In the past, when we almost ran out of disk space, it was because our DB needed more than 500 GB. That's a long way to go :)

In that light, I propose to keep your current design.

@juliangruber WDYT?

Member:

It would be nice to implement input validation in preprocess and reject measurements where station_id has more than MAX characters, where MAX can be something like 100. That will prevent a DoS attack vector, where a malicious client would submit measurements with very long id strings.

@PatrickNercessian (author):

It should be a constant 64, no? Any reason to accept anything below or above that?

Member:

> It should be a constant 64, no? Any reason to accept anything below or above that?

Let's double-check the string length of public keys produced by Ed25519. If it's always 64 chars, then I agree with you: make the validation strict and accept only strings with this exact length 👍🏻
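For what it's worth, a quick illustrative Node.js check (not from the codebase) that a raw Ed25519 public key serialises to 64 hex characters; note the 88-character regex later in this thread suggests Zinnia's station IDs use a longer encoding than a raw hex public key:

import { generateKeyPairSync } from 'node:crypto'

const { publicKey } = generateKeyPairSync('ed25519')
// The raw public key is the last 32 bytes of the SPKI DER encoding
const raw = publicKey.export({ type: 'spki', format: 'der' }).subarray(-32)
console.log(raw.toString('hex').length) // 64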

Member:

IMO, we should implement the same validation in zinniad, to reject an invalid Station ID right at startup. We should also change the hardcoded station id in zinnia to use a valid string. It doesn't have to be a real public key, but it should pass the validation in spark-evaluate. For example, 64 zero characters (000(...)000) or something similar to Ethereum's null address 0x000(...)00dead. The goal is to use an address that's easy to spot and filter out from production data.
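A tiny illustrative sketch of such a placeholder (the constant name and value are made up here; the value actually adopted is in filecoin-station/zinnia#530):

// 64 hex chars, all zeros except a recognisable suffix -- easy to filter out
const PLACEHOLDER_STATION_ID = '0'.repeat(60) + 'dead'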

@bajtos (Member) commented Apr 29, 2024:

@PatrickNercessian Zinnia is using a valid address now (see filecoin-station/zinnia#530).

Remaining tasks from this comment thread:

(1)

> implement the same validation in zinniad, to reject an invalid Station ID right at startup

Would you like to contribute this change yourself? If not, then please open an issue in https://github.com/filecoin-station/zinnia/issues

(2)

> It would be nice to implement input validation in preprocess and reject measurements where station_id has more than MAX characters, where MAX can be something like 100. That will prevent a DoS attack vector, where a malicious client would submit measurements with very long id strings.

> It should be a constant 64, no? Any reason to accept anything below or above that?

+1 to making the validation as strict as possible

You implemented this validation in spark-api, which is a great start.

I think we should implement it in spark-evaluate, too. As we envision the Meridian architecture, anybody should eventually be allowed to commit measurements directly to the smart contract, bypassing our spark-api service.

Here is our current validation function:

// assumed imports for context (spark-evaluate uses node:assert and ethers)
import assert from 'node:assert'
import { ethers } from 'ethers'

assert(
  typeof measurement === 'object' && measurement !== null,
  'object required'
)
assert(ethers.isAddress(measurement.participantAddress), 'valid participant address required')
assert(typeof measurement.inet_group === 'string', 'valid inet group required')
assert(typeof measurement.finished_at === 'number', 'field `finished_at` must be set to a number')

Let's implement the new validation step in a different pull request so that we can land this PR ASAP and start collecting the new stats.

@PatrickNercessian (author):

For (1), sure I can definitely work on this in Zinnia!

For (2), this was already done here:

if (measurement.stationId) {
  assert(
    typeof measurement.stationId === 'string' &&
      measurement.stationId.match(/^[0-9a-fA-F]{88}$/),
    'stationId must be a hex string with 88 characters'
  )
}

Or am I misunderstanding you?

Member:

> For (2), this was already done

Oh, I missed that bit. All is good! 👍🏻

> For (1), sure I can definitely work on this in Zinnia!

Great! 💪🏻

@bajtos (Member) left a review:

Great start!

Review threads (resolved): lib/preprocess.js, test/helpers/test-data.js, test/platform-stats.test.js, lib/public-stats.js
@PatrickNercessian PatrickNercessian changed the title DRAFT: Created daily_node_metrics table and received/stored station_id Feat: Created daily_node_metrics table and received/stored station_id Apr 15, 2024
@PatrickNercessian PatrickNercessian marked this pull request as ready for review April 17, 2024 22:34
@bajtos (Member) left a review:

Getting close 👍🏻

Review threads (resolved): test/platform-stats.test.js, migrations/006.do.daily-node-metrics.sql
@bajtos (Member) left a review:

Getting close!

Review threads (resolved): lib/platform-stats.js
@bajtos (Member) left a review:

Almost there! 👏🏻

Review threads (resolved): test/platform-stats.test.js, lib/preprocess.js
@bajtos bajtos closed this Apr 29, 2024
@bajtos bajtos reopened this Apr 29, 2024
@bajtos (Member) left a review:

LGTM 👏🏻

@bajtos (Member) commented Apr 29, 2024:

@juliangruber could you please help us to get this pull request across the finish line? The CI build fails in the dry-run step because GLIF_TOKEN is not set from the GHA secrets. I clicked on the button approving GHA to execute the CI for this pull request after @PatrickNercessian pushed more changes today.

See https://github.com/filecoin-station/spark-evaluate/actions/runs/8880495010/job/24384374423?pr=188#step:7:14

@juliangruber juliangruber merged commit 05da282 into filecoin-station:main Apr 30, 2024
5 checks passed
@PatrickNercessian PatrickNercessian deleted the station_id branch April 30, 2024 15:14