fix: use correct index for the conclusion feed. #561
Conversation
We found a few bugs in the conclusion feed implementation on the event service. Fixes will follow; however, this change updates the test to make the bugs obvious.

Bugs:
1. Indexes are duplicated
2. Unsigned event indexes are always zero
3. Off-by-one bug in the bounds of the highwater mark
The previous code would zip the parsed events with the all_blocks iterator. However, the all_blocks iterator was longer than the parsed events, so the delivered values became misaligned with the events, causing the index of conclusion feed events to be incorrect. Now we filter the all_blocks iterator down to a single value per event.
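As a rough illustration of the fix (with a hypothetical `BlockRow` type and field names, not the real ones), collapsing `all_blocks` to one `(event, delivered)` entry per event before zipping keeps the zip aligned, assuming rows for the same event are adjacent in the query result:

```rust
/// Hypothetical row type standing in for the real all_blocks rows.
struct BlockRow {
    event_cid: String,
    delivered: i64,
}

/// Collapse the block rows to a single delivered value per event so that
/// zipping with the parsed events stays aligned. Assumes rows for the same
/// event are adjacent, as they would be in an ordered query result.
fn delivered_per_event(all_blocks: &[BlockRow]) -> Vec<(String, i64)> {
    let mut out: Vec<(String, i64)> = Vec::new();
    for row in all_blocks {
        let is_new_event = out
            .last()
            .map(|(cid, _)| cid != &row.event_cid)
            .unwrap_or(true);
        if is_new_event {
            out.push((row.event_cid.clone(), row.delivered));
        }
    }
    out
}
```

With the collapsed list, the zip yields exactly one delivered value per parsed event.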
The previous code had a typo that always used 0 as the delivered/index for unsigned init events. This is now fixed.
The sqlite db uses 1-based highwater_mark values, meaning it compares delivered >= highwater_mark rather than delivered > highwater_mark. The conclusion feed doesn't expose the highwater mark directly; instead it returns the index and expects clients to resend the max index they have seen, which means we want an exclusive comparison. We achieve this by adding one to the highwater_mark.
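A minimal sketch of that adjustment (illustrative only; `events_after` is not a function in the codebase): bumping the client-supplied index by one turns the store's inclusive `>=` comparison into an effectively exclusive one.

```rust
/// Return the delivered indexes strictly after the maximum index the client
/// has already seen, given a store that compares with a one-based,
/// inclusive highwater mark (delivered >= highwater_mark).
fn events_after(delivered: &[i64], client_max_index: i64) -> Vec<i64> {
    let highwater_mark = client_max_index + 1; // turn >= into an effective >
    delivered
        .iter()
        .copied()
        .filter(|d| *d >= highwater_mark)
        .collect()
}

fn main() {
    // A client that has seen index 2 gets only index 3 back; without the +1
    // it would receive index 2 again as the first row of the next batch.
    assert_eq!(events_after(&[1, 2, 3], 2), vec![3]);
}
```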
```diff
@@ -403,7 +403,7 @@ impl EventService {
                     .map_err(|e| {
                         Error::new_app(anyhow::anyhow!("Failed to serialize IPLD data: {}", e))
                     })?,
-                index: 0,
+                index: delivered as u64,
```
!!
Tests should have caught this
No test included an unsigned init event. That is why I added it to the generated data.
Previously a global static delivered counter was used to assign and track delivered values. However, this makes testing hard: when multiple tests run, they affect each other's delivered values. This change moves the counter state into EventAccess, and everywhere a pool was used to access events it is replaced with an Arc<EventAccess>. This means we can have truly isolated delivered counters.
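A hedged sketch of the shape of that refactor (field and method names here are illustrative, not the actual EventAccess API): the counter lives inside the access type and is shared through an Arc, so each test constructs its own isolated instance.

```rust
use std::sync::{
    atomic::{AtomicI64, Ordering},
    Arc,
};

/// Illustrative stand-in for the access type: the delivered counter is owned
/// by the instance instead of living in a global static.
struct EventAccess {
    delivered_counter: AtomicI64,
    // ... the real type also holds the db pool, etc.
}

impl EventAccess {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            delivered_counter: AtomicI64::new(0),
        })
    }

    /// Hand out the next delivered value for this instance only.
    fn next_delivered(&self) -> i64 {
        self.delivered_counter.fetch_add(1, Ordering::SeqCst) + 1
    }
}

fn main() {
    // Two independent instances no longer influence each other's counters.
    let a = EventAccess::new();
    let b = EventAccess::new();
    assert_eq!(a.next_delivered(), 1);
    assert_eq!(a.next_delivered(), 2);
    assert_eq!(b.next_delivered(), 1);
}
```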
We have an access-type pattern for structs that contain the logic to read specific tables in the db. They were called CeramicOne{Table}; with this change they are renamed to {Table}Access. The CeramicOne prefix was meaningless, and the table name alone was not enough to distinguish the type from other types related to the same entities.
```diff
@@ -370,17 +371,30 @@ impl CeramicOneEvent {
             all_blocks.iter().map(|row| row.block.clone()).collect(),
         )
         .await?;
 
+        // We need to match up the delivered index with each event. However all_blocks contains an
```
We are keeping multiple copies of the data in memory; can we move this logic into the SQL query itself?
```rust
let max_highwater = sqlx::query_as::<_, DeliveredEventBlockRow>(
    r#"
    SELECT
        event_id,
        MAX(delivered) AS max_delivered,
        block_data
    FROM
        events_table
    WHERE
        delivered > ? -- Only fetch events with a delivered value greater than the highwater mark
    GROUP BY
        event_id
    ORDER BY
        max_delivered ASC
    LIMIT ?
    "#,
)
```
What do you think? Is this a better approach?
Then we can call:
```rust
.bind(highwater)
.bind(limit)
.fetch_all(pool.reader())
.await?
.into_iter()
.map(|row| {
    // Decode the CAR file from the block data
    ....
    // Add the parsed event to the list
    ....
    // Update the max highwater mark
})
.max()
.unwrap_or(highwater); // Use the input highwater if no new events are found
```
The challenge is that we need events, but we have to construct each event from the list of blocks that make it up. So the all_blocks query needs to join with the blocks table in order to get all the data. This PR doesn't change how much memory is needed (we were previously keeping a copy of the data in memory). I think optimizing this query should be future work (if we find it's a cause of slowdown).
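To make the constraint concrete, here is a hedged sketch (hypothetical row type and field names, not the actual query result) of why one row per event isn't enough: every block row belonging to an event has to be kept and grouped back together before the event can be decoded from its CAR data.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape for the joined events/blocks query result.
struct BlockRow {
    delivered: i64,
    event_cid: String,
    car_bytes: Vec<u8>,
}

/// Group the block rows back into per-event lists, ordered by delivered,
/// so each event can be reconstructed from all of its blocks.
fn group_blocks_by_event(all_blocks: Vec<BlockRow>) -> BTreeMap<(i64, String), Vec<Vec<u8>>> {
    let mut events: BTreeMap<(i64, String), Vec<Vec<u8>>> = BTreeMap::new();
    for row in all_blocks {
        events
            .entry((row.delivered, row.event_cid))
            .or_default()
            .push(row.car_bytes);
    }
    events
}
```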
```diff
@@ -33,7 +33,9 @@ impl ConclusionFeed for EventService {
         limit: i64,
     ) -> anyhow::Result<Vec<ConclusionEvent>> {
         let raw_events = self
             .fetch_events_since_highwater_mark(highwater_mark, limit)
+            // TODO: Can we make highwater_marks zero based?
```
Is this the correct behavior? If we want to fetch events since a highwater mark, we would get events since highwater_mark + 1.
Yes, without this change you always get back the last row of the previous batch as the first row of the next batch.
If you look at the code for how the highwater mark is constructed, you can see it already adds one to it. That is what I mean by saying the values are one-based. Hence the TODO to investigate later whether we can simplify things and make everything zero-based.
Thank you for the detailed commit messages and explanations! Reviewing this code commit by commit was fun.
LGTM!
The first commit in this change updates the test to expose all three bugs. The following commits each fix one of the bugs.
It turns out the delivered insertion was not test-friendly. I added two more refactor commits: one makes the delivered counter isolated per EventService, and another renames some types whose names had become less clear because of other refactors.