feat: report daily participants to public stats #133

Merged (3 commits) on Jan 18, 2024

@bajtos (Member) commented on Jan 17, 2024

Create a new `daily_participants` table in spark_stats to keep track of daily participants. The number of participants per day can then be computed as:

```sql
SELECT day::TEXT, COUNT(DISTINCT participant_id) as count
FROM daily_participants GROUP BY day;
```

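This table also allows us to calculate monthly participants correctly, counting each participant only once per month (summing the daily counts would count a participant who was active on several days multiple times):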

```sql
SELECT
  date_trunc('month', day)::DATE::TEXT as month,
  COUNT(DISTINCT participant_id) as count
FROM daily_participants
GROUP BY month;
```
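To see why the monthly count needs its own `COUNT(DISTINCT ...)` instead of adding up daily counts, here is a small in-memory illustration. The rows are made-up sample data shaped like `daily_participants (day, participant_id)`, and `countDistinctBy` is a hypothetical helper that mimics `COUNT(DISTINCT participant_id) ... GROUP BY <key>`; it is not code from this PR.

```javascript
// Made-up sample rows shaped like daily_participants (day, participant_id).
const rows = [
  { day: '2024-01-16', participant_id: 1 },
  { day: '2024-01-16', participant_id: 2 },
  { day: '2024-01-17', participant_id: 1 },
  { day: '2024-01-17', participant_id: 3 }
]

// Count distinct participant_id values per group key, like
// COUNT(DISTINCT participant_id) ... GROUP BY <key>.
function countDistinctBy (rows, keyOf) {
  const groups = new Map()
  for (const row of rows) {
    const key = keyOf(row)
    if (!groups.has(key)) groups.set(key, new Set())
    groups.get(key).add(row.participant_id)
  }
  return Object.fromEntries([...groups].map(([key, ids]) => [key, ids.size]))
}

const daily = countDistinctBy(rows, r => r.day)
// Mimics date_trunc('month', day)::DATE::TEXT for 'YYYY-MM-DD' strings.
const monthly = countDistinctBy(rows, r => r.day.slice(0, 7) + '-01')

console.log(daily) // { '2024-01-16': 2, '2024-01-17': 2 }
console.log(monthly) // { '2024-01-01': 3 }
```

Participant 1 was active on both days, so the daily counts add up to 4 while the month has only 3 distinct participants.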

TODO:

  • add tests

Links:

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@bajtos requested a review from @juliangruber on January 17, 2024 16:13

@juliangruber (Member) left a comment

...otherwise LGTM!

```javascript
const updateDailyParticipants = async (pgClient, participants) => {
  debug('Updating daily participants (%s seen)', participants.length)
  for (const participantAddress of participants) {
```

@juliangruber (Member) commented on Jan 17, 2024

Performance-wise, are we OK with running 2 SQL queries for every participant, in series? And if it fails midway, will we end up with inconsistent data? Playing devil's advocate here.
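On the second question: a common way to make a group of writes all-or-nothing is to run them inside a single transaction, so a failure midway rolls everything back. The thread does not say whether the merged code does this. The sketch below assumes a node-postgres-style client exposing `await client.query(sql)`; `withTransaction` is a hypothetical helper name, not part of the PR.

```javascript
// Hypothetical helper: run `fn` inside one transaction on one client, so that
// either all of its writes are committed or none of them are.
// Assumes a node-postgres-style client. With a pg Pool, check out a dedicated
// client via `await pool.connect()` first, because BEGIN/COMMIT/ROLLBACK must
// run on the same connection as the statements in between.
async function withTransaction (pgClient, fn) {
  await pgClient.query('BEGIN')
  try {
    const result = await fn(pgClient)
    await pgClient.query('COMMIT')
    return result
  } catch (err) {
    // Undo every write made since BEGIN, then let the caller see the error.
    await pgClient.query('ROLLBACK')
    throw err
  }
}
```

For example, `await withTransaction(pgClient, tx => updateDailyParticipants(tx, participants))` would leave no partial data behind if one of the per-participant queries failed. This addresses consistency only; it does not reduce the number of queries.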

@bajtos (Member Author) replied

Great questions! 💯

I'll think about this tomorrow.

@bajtos (Member Author) replied

Yeah, I was so focused on storage efficiency and the querying side that I completely neglected the writing side.

With ~2k participants per round, my current implementation would run ~4k SQL queries (2 per participant), one after another. That would take far too long to complete.

Thanks for flagging this early! 😍

@bajtos (Member Author) replied

Cool, I found a neat trick: we can use `SELECT UNNEST($1::TEXT[])` to build a query that accepts a JavaScript array as a single parameter and converts its items into result rows.
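A minimal sketch of that trick on its own, assuming a node-postgres-style client (node-postgres serializes a JavaScript array parameter as a Postgres array). `expandToRows` is a hypothetical helper used only to show the round trip: one query and one parameter, no matter how many items the array holds.

```javascript
// Hypothetical helper (not from the PR): expand a JS array into result rows
// with a single query. Assumes a node-postgres-style client.
async function expandToRows (pgClient, items) {
  const { rows } = await pgClient.query(
    'SELECT UNNEST($1::TEXT[]) AS item',
    // A single parameter: the whole JS array is sent as one Postgres TEXT[].
    [items]
  )
  // Postgres returns one row per array element, e.g. { item: 'f1a' }.
  return rows.map(row => row.item)
}
```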

Then I can change `INSERT ... VALUES` to `INSERT ... SELECT` to leverage this mechanism.
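Putting the two ideas together, the per-participant loop could become two queries per round in total. This is a sketch under assumptions, not the code from 7a2d14d: the schema (a `participants` table mapping `participant_address` to `id`, plus `daily_participants (day, participant_id)` with a primary key on both columns) and the function name `updateDailyParticipantsBatched` are hypothetical.

```javascript
// Sketch of a batched write path: two queries in total, however many participants.
// ASSUMED schema (not shown in the PR thread):
//   participants (id SERIAL PRIMARY KEY, participant_address TEXT UNIQUE NOT NULL)
//   daily_participants (day DATE, participant_id INT REFERENCES participants (id),
//                       PRIMARY KEY (day, participant_id))
async function updateDailyParticipantsBatched (pgClient, participants) {
  // Drop duplicate addresses before sending them to Postgres.
  const addresses = [...new Set(participants)]

  // 1) INSERT ... SELECT: register every address not seen before.
  await pgClient.query(`
    INSERT INTO participants (participant_address)
    SELECT UNNEST($1::TEXT[])
    ON CONFLICT (participant_address) DO NOTHING
  `, [addresses])

  // 2) INSERT ... SELECT: record today's activity (date in the session's time
  //    zone) for every address in this batch.
  await pgClient.query(`
    INSERT INTO daily_participants (day, participant_id)
    SELECT now()::DATE, id
    FROM participants
    WHERE participant_address = ANY($1::TEXT[])
    ON CONFLICT DO NOTHING
  `, [addresses])
}
```

The `ON CONFLICT ... DO NOTHING` clauses make the function safe to re-run: reporting the same participant twice on the same day leaves a single row.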

The changes were a bit more involved (see 7a2d14d), but I expect the performance to be excellent.

@bajtos requested a review from @juliangruber on January 18, 2024 08:56
@bajtos marked this pull request as ready for review on January 18, 2024 08:59

@juliangruber (Member) left a comment

Great work 😍

@bajtos merged commit 045ff21 into main on Jan 18, 2024
5 checks passed
@bajtos deleted the feat-daily-participants branch on January 18, 2024 12:36