
Feat: Created daily_node_metrics table and received/stored station_id #188

Merged: 8 commits merged into filecoin-station:main from station_id on Apr 30, 2024

Conversation

@PatrickNercessian (Contributor) commented Apr 12, 2024

DRAFT: DO NOT MERGE

Links:
space-meridian/roadmap#96

@juliangruber juliangruber marked this pull request as draft April 15, 2024 07:03
INSERT INTO daily_node_metrics (station_id, metric_date)
VALUES ($1, now()::date)
ON CONFLICT (station_id, metric_date) DO NOTHING
`, [m.station_id]) // TODO: when we add more fields, we should update the ON CONFLICT clause to update the fields
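A minimal sketch of what that TODO could look like once more fields exist, assuming a hypothetical accepted_measurement_count column (the pgClient name is also illustrative, none of this is part of the PR):

await pgClient.query(`
  INSERT INTO daily_node_metrics (station_id, metric_date, accepted_measurement_count)
  VALUES ($1, now()::date, 1)
  ON CONFLICT (station_id, metric_date) DO UPDATE
    SET accepted_measurement_count = daily_node_metrics.accepted_measurement_count + 1
`, [m.station_id])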
Member:
I have a concern about the storage required by this DB schema design. We will store each station ID string in many duplicates (one copy each day).

When I implemented "daily participants," I decided to create a lookup table mapping participant addresses to 32-bit integers and then store those integers in the daily_participants table using a foreign-key constraint.

See #133

The key insight is that for most queries, we don't care what the exact participant address (or station ID) is. We want to count the number of unique addresses/IDs, and that can be achieved by counting the unique FK references.

The downside is that maintaining such a lookup table efficiently requires complex code (see mapParticipantsToIds() in my PR). OTOH, I think it should be fairly easy to generalise that function to support both participant addresses and station IDs by accepting a few more parameters like table name.

WDYT?
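For concreteness, a rough sketch of the lookup-table design described above, with made-up table and column names (the real implementation lives in #133):

// CREATE TABLE stations (
//   id SERIAL PRIMARY KEY,            -- 4-byte integer used as the FK reference
//   station_id TEXT NOT NULL UNIQUE   -- the long ID string, stored only once
// );
// CREATE TABLE daily_node_metrics (
//   station INTEGER NOT NULL REFERENCES stations(id),
//   metric_date DATE NOT NULL,
//   UNIQUE (station, metric_date)
// );
// Counting unique stations per day only touches the integer column:
const { rows } = await pgClient.query(`
  SELECT metric_date, COUNT(DISTINCT station) AS station_count
  FROM daily_node_metrics
  GROUP BY metric_date
`)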

@PatrickNercessian (author):

I think this makes sense! Playing devil's advocate, though, with some napkin math:

If we're averaging about 2000 station nodes per day, and each row is, let's say, 250B, that's 174MB per year in total. A lookup table for station_id (and maybe deployment_type, CPU architecture, etc.) would save us roughly 60-100 bytes per row? So we'd save 55MB per year.

Even if we 15x the number of daily nodes, we're still looking at <1GB per year of data accumulation in savings. Is it worth the relatively complex code logic?

It's possible I missed some detail here or in the math though.

Member:

Yeah, I was also wondering whether this optimisation is worth the complexity.

We already have 12-20k Station nodes. I think it's realistic to grow the network to 100k nodes by the end of this year.

IIUC, Ed25519 has 32-byte public keys, serialised as 64 hex characters, while a foreign-key reference needs only 4 bytes. That saves 64 - 4 = 60 bytes per node per day, which at 100k nodes adds up to 365 * 60 * 100k bytes ≈ 2.19 GB per year.

I think we can afford to store an extra 2 GB per module per year in our DB.

In the past, when we almost ran out of disk space, it was because our DB needed more than 500 GB. That's a long way to go :)

In that light, I propose to keep your current design.

@juliangruber WDYT?

Member:

It would be nice to implement input validation in preprocess and reject measurements where station_id has more than MAX characters, where MAX can be something like 100. That will prevent a DoS attack vector, where a malicious client would submit measurements with very long id strings.

@PatrickNercessian (author):

It should be a constant 64, no? Any reason to accept anything below or above that?

Member:

> It should be a constant 64, no? Any reason to accept anything below or above that?

Let's double-check the string length of public keys produced by Ed25519. If it's always 64 chars, then I agree with you: make the validation strict and accept only strings with this exact length 👍🏻
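For what it's worth, a quick illustrative Node.js check (not from the codebase) that a raw Ed25519 public key serialises to 64 hex characters; note the 88-character regex later in this thread suggests Zinnia's station IDs use a longer encoding than a raw hex public key:

import { generateKeyPairSync } from 'node:crypto'

const { publicKey } = generateKeyPairSync('ed25519')
// The raw public key is the last 32 bytes of the SPKI DER encoding
const raw = publicKey.export({ type: 'spki', format: 'der' }).subarray(-32)
console.log(raw.toString('hex').length) // 64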

Member:

IMO, we should implement the same validation in zinniad, to reject an invalid Station ID right at startup. We should also change the hardcoded station id in zinnia to use a valid string. It doesn't have to be a real public key, but it should pass the validation in spark-evaluate. For example, 64 zero characters (000(...)000) or something similar to Ethereum's null address 0x000(...)00dead. The goal is to use an address that's easy to spot and filter out from production data.
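A tiny illustrative sketch of such a placeholder (the constant name and value are made up here; the value actually adopted is in filecoin-station/zinnia#530):

// 64 hex chars, all zeros except a recognisable suffix -- easy to filter out
const PLACEHOLDER_STATION_ID = '0'.repeat(60) + 'dead'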

@bajtos (Member) commented Apr 29, 2024:

@PatrickNercessian Zinnia is using a valid address now (see filecoin-station/zinnia#530).

Remaining tasks from this comment thread:

(1)

> implement the same validation in zinniad, to reject an invalid Station ID right at startup

Would you like to contribute this change yourself? If not, then please open an issue in https://github.com/filecoin-station/zinnia/issues

(2)

> It would be nice to implement input validation in preprocess and reject measurements where station_id has more than MAX characters, where MAX can be something like 100. That will prevent a DoS attack vector, where a malicious client would submit measurements with very long id strings.

> It should be a constant 64, no? Any reason to accept anything below or above that?

+1 to making the validation as strict as possible

You implemented this validation in spark-api, which is a great start.

I think we should implement it in spark-evaluate, too. As we envision the Meridian architecture, anybody should eventually be allowed to commit measurements directly to the smart contract, bypassing our spark-api service.

Here is our current validation function:

// assumed imports for context (spark-evaluate uses node:assert and ethers)
import assert from 'node:assert'
import { ethers } from 'ethers'

assert(
  typeof measurement === 'object' && measurement !== null,
  'object required'
)
assert(ethers.isAddress(measurement.participantAddress), 'valid participant address required')
assert(typeof measurement.inet_group === 'string', 'valid inet group required')
assert(typeof measurement.finished_at === 'number', 'field `finished_at` must be set to a number')

Let's implement the new validation step in a different pull request so that we can land this PR ASAP and start collecting the new stats.

@PatrickNercessian (author):

For (1), sure I can definitely work on this in Zinnia!

For (2), this was already done here:

if (measurement.stationId) {
  assert(
    typeof measurement.stationId === 'string' &&
      measurement.stationId.match(/^[0-9a-fA-F]{88}$/),
    'stationId must be a hex string with 88 characters'
  )
}

Or am I misunderstanding you?

Member:

> For (2), this was already done

Oh, I missed that bit. All is good! 👍🏻

> For (1), sure I can definitely work on this in Zinnia!

Great! 💪🏻

@bajtos (Member) left a review:

Great start!

Review threads (resolved): lib/preprocess.js, test/helpers/test-data.js, test/platform-stats.test.js, lib/public-stats.js
@PatrickNercessian PatrickNercessian changed the title DRAFT: Created daily_node_metrics table and received/stored station_id Feat: Created daily_node_metrics table and received/stored station_id Apr 15, 2024
@PatrickNercessian PatrickNercessian marked this pull request as ready for review April 17, 2024 22:34
@bajtos (Member) left a review:

Getting close 👍🏻

Review threads (resolved): test/platform-stats.test.js, migrations/006.do.daily-node-metrics.sql
@bajtos (Member) left a review:

Getting close!

Review threads (resolved): lib/platform-stats.js
@bajtos (Member) left a review:

Almost there! 👏🏻

Review threads (resolved): test/platform-stats.test.js, lib/preprocess.js
@bajtos bajtos closed this Apr 29, 2024
@bajtos bajtos reopened this Apr 29, 2024
@bajtos (Member) left a review:

LGTM 👏🏻

@bajtos (Member) commented Apr 29, 2024:

@juliangruber could you please help us to get this pull request across the finish line? The CI build fails in the dry-run step because GLIF_TOKEN is not set from the GHA secrets. I clicked on the button approving GHA to execute the CI for this pull request after @PatrickNercessian pushed more changes today.

See https://github.com/filecoin-station/spark-evaluate/actions/runs/8880495010/job/24384374423?pr=188#step:7:14

@juliangruber juliangruber merged commit 05da282 into filecoin-station:main Apr 30, 2024
5 checks passed
@PatrickNercessian PatrickNercessian deleted the station_id branch April 30, 2024 15:14