Re-run tasks for partially materialized blobs #618

sandreae · 2024-06-14T09:07:02Z

Before materializing a blob to the file system, the blob task checks whether a file already exists at the expected path. If one does, the task aborts without completing. This check exists because blob tasks can be triggered multiple times and we only want the work to complete once.

This is fine except when a node is shut down or crashes while a blob task is running. In that case the file is only partially written to disk, and even when the task is picked up again on the next restart, it will not be retried because of the check described above.

This PR fixes the issue by introducing a check which ensures that the file at the blob path has the expected length. If it does not, the task assumes the previous run did not complete and continues to materialize the blob at that path.
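For readers skimming the thread, here is a minimal sketch of the length check described above. It is not the code from this PR: the function names `needs_materialization` and `materialize_blob` are hypothetical, and it assumes the expected blob length is known before writing and that the whole blob is available as a byte slice.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not from the PR): returns `true` when the blob at
/// `path` still needs to be written, i.e. the file is missing or its length
/// differs from `expected_len`.
fn needs_materialization(path: &Path, expected_len: u64) -> io::Result<bool> {
    match fs::metadata(path) {
        // A file exists: only treat it as complete if the length matches.
        Ok(meta) => Ok(meta.len() != expected_len),
        // No file yet: the blob was never materialized.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(true),
        // Any other I/O error is passed on to the caller.
        Err(err) => Err(err),
    }
}

/// Hypothetical task body: writes `bytes` to `path` unless a complete file is
/// already there. Returns `true` if the file was (re-)written.
fn materialize_blob(path: &Path, bytes: &[u8]) -> io::Result<bool> {
    if !needs_materialization(path, bytes.len() as u64)? {
        // Complete file already present, so the task was done before.
        return Ok(false);
    }
    // `fs::write` creates the file or replaces its contents, so a partial
    // file left behind by a crash is overwritten.
    fs::write(path, bytes)?;
    Ok(true)
}
```

Calling `materialize_blob` twice with the same bytes writes only once, while a truncated file left at the path is written again. Note that a length comparison alone cannot detect a file of the right size with wrong contents.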

closes: #617

📋 Checklist

Add tests that cover your changes
Add this PR to the Unreleased section in CHANGELOG.md
Link this PR to any issues it closes
New files contain a SPDX license header

sandreae · 2024-06-14T09:19:09Z

Some clippy errors unrelated to this PR....

adzialocha · 2024-06-14T09:21:39Z

> Some clippy errors unrelated to this PR....

Probably a new Rust version!

* Make clippy happy
* Revert "Make clippy happy" This reverts commit e250ccd.
* Try fmt and clippy again
* Add clippy suggestions
* Allow setting path to config file via env args (p2panda#611)
* Enable passing path to config file via env args
* Remove println
* Update comment
* Remove unwanted file
* Update CHANGELOG
* Accept domain name and ip addresses for peers (p2panda#612)
* Accept String for relay and direct peer addresses in config
* Use ToSocketAddress to handle ip and domain name addresses
* Clippy
* fmt
* Update CHANGELOG
* Update example config.toml
* Prepare CHANGELOG for release
* 0.7.2
* Fix: query for child relations fails when relation list empty (p2panda#614)
* Add test get_child_document_ids test case for document with empty relation list
* Account for null values when relation lists are empty
* Update test comment
* Update CHANGELOG
* 0.7.3
* Re-run tasks for partially materialized blobs (p2panda#618)
* Check materialized blob file is complete before aborting task
* Add test
* fmt
* Update CHANGELOG
* Clippy
* Correct cmp logic
* Remove double comment
---------
Co-authored-by: adz <x12@adz.garden>
* Fix: include all logs from target schema id during replication (p2panda#620)
* Include tombstoned documents when calculating local log heights
* Clippy
* Update CHANGELOG
* Make clippy happy
* Bump rust gh action to v1 and define toolchain version
* Introduce `PeerAddress` struct for improved address resolution patterns (p2panda#621)
* Introduce PeerAddress struct with socket and multiaddr resolution methods
* Don't pop of p2p protocol from relay address as it isn't there
* fmt
* Update CHANGELOG
* Cache socket addresses
* Remove Multiaddr from PeerAddress
* Remove serde traits from PeerAddress
* Add doc string to PeerAddress
* Rename methods
* Re-apply unhandled operations during startup of materializer service (p2panda#623)
* Store method to get all un-indexed operation ids
* Pick up un-indexed operations when starting materializer service, add a test
* Add entry to CHANGELOG.md
* Increase `max_pending_connections_*` (p2panda#628)
* Increase max pending connections
* Update CHANGELOG
* Dial all configured known relay and direct node addresses on schedule (p2panda#622)
* Poll all known peer addresses
* Update PeerAddress method name
* Update CHANGELOG
* WIP: poll known peers
* Check if a direct node was identified (and add comments)
* Don't dial direct node address on startup, rely on scheduler
* More comments
* Remove unused import
* fmt
* Doc strings for EventLoop struct
* Clippy
* 0.7.4
* Minor CHANGELOG.md formatting change
* Fix: handle connection ids greater than 9 in `Peer` impl of `Human` trait (p2panda#634)
* Handle connection ids greater than 9 in peer Human impl
* Clippy
* Update CHANGELOG
* Bump `libp2p` to version `0.53.2` (p2panda#631)
* Bump libp2p to version 0.53.2
* We don't need to listen on tcp port when in relay mode
* Listening on relay circuit no longer sometimes fails
* Remove tcp feature requirement from libp2p
* Refactor connection_keep_alive method
* Clippy
* Remove unnecessary connection_keep_alive method from peers behaviour
* Add CHANGELOG.md entry
---------
Co-authored-by: adz <x12@adz.garden>
* Move relay connection logic into main event loop (p2panda#632)
* Bump `libp2p` to version `0.53.2` (p2panda#631)
* Bump libp2p to version 0.53.2
* We don't need to listen on tcp port when in relay mode
* Listening on relay circuit no longer sometimes fails
* Remove tcp feature requirement from libp2p
* Refactor connection_keep_alive method
* Clippy
* Remove unnecessary connection_keep_alive method from peers behaviour
* Add CHANGELOG.md entry
---------
Co-authored-by: adz <x12@adz.garden>
* Move network service relay initialization into main event loop
* Clippy
* Add DCUTR event debug logging to swarm
* Change log message
* Adjust connection limits
* Even nicer log messages
* Helper to print or info log depending on log level
* Listening on relay circuit no longer sometimes fails
---------
Co-authored-by: adz <x12@adz.garden>
* Support private net with pre-shared key (p2panda#635)
* Swarm listens on both TCP and QUIC addresses
* Support both QUIC and TCP protocols
* TCP port_reuse should be false
* Establish a private net over TCP when psk provided in NetworkConfig
* Initiate swarm with private net when psk provided in config
* Update CHANGELOG
* Doc string fix
* Don't need to differentiate between transports when detecting port
* Update README
* Fix README formatting
* Update example config file
* Check if blob file exists before deleting it from fs (p2panda#636)
* Check if blob file exists before deleting it from fs
* Add entry to CHANGELOG.md
* Inconsistent blob storage warning was wrongly shown (p2panda#638)
* Inconsistent blob storage warning was wrongly shown
* Add entry to CHANGELOG.md
* Minor config.toml cleanup
* Safely handle missing document when retrieving document view from store (p2panda#637)
* Return None when document was deleted
* Add entry to CHANGELOG.md
* Introduce API to subscribe to peer connection events (p2panda#625)
* Introduce API to subscribe to peer connection events
* Add entry to CHANGELOG.md
* 0.8.0
* Also bump version in aquadoggo_cli, add note about that in RELEASE.md
* Adjust level of replication session and document materialization logs (p2panda#639)
* Remove relay and direct peer poll attempt logging
* Change document creation/update/delete logging to info level
* Lower level of replication session logs to debug
* Update CHANGELOG
* Remove incorrectly commit file
* Lower logging level for replication finished message
* Fix logging logic error in reducer
* Improve GraphQL re-build error
* Update README.md
* Expose NodeEvent to public API (p2panda#643)
* Expose NodeEvent to public API
* Add entry to CHANGELOG.md
---------
Co-authored-by: adz <x12@adz.garden>
Co-authored-by: Sam Andreae <contact@samandreae.com>
Co-authored-by: adz <adzialocha@users.noreply.github.com>

sandreae added 4 commits June 14, 2024 09:42

Check materialized blob file is complete before aborting task

067ee1d

Add test

f5a2677

fmt

d22e0e3

Update CHANGELOG

cc445e4

sandreae changed the title ~~Blob task replay fix experiment~~ Re-run tasks for partially materialized blobs Jun 14, 2024

sandreae added 2 commits June 14, 2024 10:11

Clippy

ff864e0

Correct cmp logic

8a43f0c

sandreae requested a review from adzialocha June 14, 2024 09:13

Remove double comment

913ea5f

adzialocha approved these changes Jun 14, 2024

adzialocha merged commit 4b11f2b into main Jun 14, 2024
7 of 8 checks passed

adzialocha deleted the blob-task-replay-fix-experiment branch June 14, 2024 09:21
