Re-run tasks for partially materialized blobs #618

Merged: adzialocha merged 7 commits into main from blob-task-replay-fix-experiment on Jun 14, 2024

@sandreae sandreae commented Jun 14, 2024

Before materializing a blob to the file system, the blob task checks whether a file already exists at the expected path. If one does, the task aborts without materializing anything. The check exists because blob tasks can be triggered multiple times and we only want the blob to be written once.

This works except when a node is shut down or crashes while a blob task is running. In that case the file is only partially written to disk, and even when the task is picked up again on the next restart it is not retried, because the existence check finds the partial file and aborts.

This PR fixes the issue by adding a check that the file at the blob path has the expected length. If it does not, the task assumes the previous run did not complete and materializes the blob at that path again.
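The length check described above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the code merged in this PR: the names `BlobFileState`, `blob_file_state` and `materialize_blob` are hypothetical, and the sketch assumes the expected blob length is known before writing.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// State of the file at a blob path (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum BlobFileState {
    /// No file at the path: the blob still has to be materialized.
    Missing,
    /// A file exists but its length differs from the expected one, so an
    /// earlier run was most likely interrupted.
    Incomplete,
    /// A file of the expected length exists: the task can abort.
    Complete,
}

/// Compare the on-disk length with the expected blob length.
pub fn blob_file_state(path: &Path, expected_len: u64) -> io::Result<BlobFileState> {
    match fs::metadata(path) {
        Ok(meta) if meta.len() == expected_len => Ok(BlobFileState::Complete),
        Ok(_) => Ok(BlobFileState::Incomplete),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(BlobFileState::Missing),
        Err(err) => Err(err),
    }
}

/// Write the blob unless a complete copy is already present.
/// Returns `true` if the file was written or re-written.
pub fn materialize_blob(path: &Path, bytes: &[u8]) -> io::Result<bool> {
    if blob_file_state(path, bytes.len() as u64)? == BlobFileState::Complete {
        return Ok(false);
    }
    // `fs::write` creates the file or truncates an existing one, so a partial
    // file left over from an earlier run is replaced entirely.
    fs::write(path, bytes)?;
    Ok(true)
}
```

Calling `materialize_blob` twice with the same bytes writes the file the first time and returns `false` the second time, which keeps the task idempotent. A length comparison alone cannot detect a file of the right size with wrong contents; that trade-off is an assumption of this sketch.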

closes: #617

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@sandreae sandreae changed the title from "Blob task replay fix experiment" to "Re-run tasks for partially materialized blobs" Jun 14, 2024
@sandreae sandreae requested a review from adzialocha June 14, 2024 09:13
@sandreae

Some clippy errors unrelated to this PR....

@adzialocha

> Some clippy errors unrelated to this PR....

Probably a new Rust version!

@adzialocha adzialocha merged commit 4b11f2b into main Jun 14, 2024
7 of 8 checks passed
@adzialocha adzialocha deleted the blob-task-replay-fix-experiment branch June 14, 2024 09:21
jmanm added a commit to jmanm/aquadoggo that referenced this pull request Jul 13, 2024
* Make clippy happy

* Revert "Make clippy happy"

This reverts commit e250ccd.

* Try fmt and clippy again

* Add clippy suggestions

* Allow setting path to config file via env args (p2panda#611)

* Enable passing path to config file via env args

* Remove println

* Update comment

* Remove unwanted file

* Update CHANGELOG

* Accept domain name and ip addresses for peers (p2panda#612)

* Accept String for relay and direct peer addresses in config

* Use ToSocketAddrs to handle IP and domain name addresses

* Clippy

* fmt

* Update CHANGELOG

* Update example config.toml

* Prepare CHANGELOG for release

* 0.7.2

* Fix: query for child relations fails when relation list empty (p2panda#614)

* Add get_child_document_ids test case for document with empty relation list

* Account for null values when relation lists are empty

* Update test comment

* Update CHANGELOG

* 0.7.3

* Re-run tasks for partially materialized blobs (p2panda#618)

* Check materialized blob file is complete before aborting task

* Add test

* fmt

* Update CHANGELOG

* Clippy

* Correct cmp logic

* Remove double comment

---------

Co-authored-by: adz <x12@adz.garden>

* Fix: include all logs from target schema id during replication (p2panda#620)

* Include tombstoned documents when calculating local log heights

* Clippy

* Update CHANGELOG

* Make clippy happy

* Bump rust gh action to v1 and define toolchain version

* Introduce `PeerAddress` struct for improved address resolution patterns (p2panda#621)

* Introduce PeerAddress struct with socket and multiaddr resolution methods

* Don't pop off p2p protocol from relay address as it isn't there

* fmt

* Update CHANGELOG

* Cache socket addresses

* Remove Multiaddr from PeerAddress

* Remove serde traits from PeerAddress

* Add doc string to PeerAddress

* Rename methods

* Re-apply unhandled operations during startup of materializer service (p2panda#623)

* Store method to get all un-indexed operation ids

* Pick up un-indexed operations when starting materializer service, add a test

* Add entry to CHANGELOG.md

* Increase `max_pending_connections_*` (p2panda#628)

* Increase max pending connections

* Update CHANGELOG

* Dial all configured known relay and direct node addresses on schedule (p2panda#622)

* Poll all known peer addresses

* Update PeerAddress method name

* Update CHANGELOG

* WIP: poll known peers

* Check if a direct node was identified (and add comments)

* Don't dial direct node address on startup, rely on scheduler

* More comments

* Remove unused import

* fmt

* Doc strings for EventLoop struct

* Clippy

* 0.7.4

* Minor CHANGELOG.md formatting change

* Fix: handle connection ids greater than 9 in `Peer` impl of `Human` trait (p2panda#634)

* Handle connection ids greater than 9 in peer Human impl

* Clippy

* Update CHANGELOG

* Bump `libp2p` to version `0.53.2` (p2panda#631)

* Bump libp2p to version 0.53.2

* We don't need to listen on tcp port when in relay mode

* Listening on relay circuit no longer sometimes fails

* Remove tcp feature requirement from libp2p

* Refactor connection_keep_alive method

* Clippy

* Remove unnecessary connection_keep_alive method from peers behaviour

* Add CHANGELOG.md entry

---------

Co-authored-by: adz <x12@adz.garden>

* Move relay connection logic into main event loop (p2panda#632)

* Move network service relay initialization into main event loop

* Clippy

* Add DCUTR event debug logging to swarm

* Change log message

* Adjust connection limits

* Even nicer log messages

* Helper to print or info log depending on log level

* Listening on relay circuit no longer sometimes fails

---------

Co-authored-by: adz <x12@adz.garden>

* Support private net with pre-shared key (p2panda#635)

* Swarm listens on both TCP and QUIC addresses

* Support both QUIC and TCP protocols

* TCP port_reuse should be false

* Establish a private net over TCP when psk provided in NetworkConfig

* Initiate swarm with private net when psk provided in config

* Update CHANGELOG

* Doc string fix

* Don't need to differentiate between transports when detecting port

* Update README

* Fix README formatting

* Update example config file

* Check if blob file exists before deleting it from fs (p2panda#636)

* Check if blob file exists before deleting it from fs

* Add entry to CHANGELOG.md

* Inconsistent blob storage warning was wrongly shown (p2panda#638)

* Inconsistent blob storage warning was wrongly shown

* Add entry to CHANGELOG.md

* Minor config.toml cleanup

* Safely handle missing document when retrieving document view from store (p2panda#637)

* Return None when document was deleted

* Add entry to CHANGELOG.md

* Introduce API to subscribe to peer connection events (p2panda#625)

* Introduce API to subscribe to peer connection events

* Add entry to CHANGELOG.md

* 0.8.0

* Also bump version in aquadoggo_cli, add note about that in RELEASE.md

* Adjust level of replication session and document materialization logs (p2panda#639)

* Remove relay and direct peer poll attempt logging

* Change document creation/update/delete logging to info level

* Lower level of replication session logs to debug

* Update CHANGELOG

* Remove incorrectly committed file

* Lower logging level for replication finished message

* Fix logging logic error in reducer

* Improve GraphQL re-build error

* Update README.md

* Expose NodeEvent to public API (p2panda#643)

* Expose NodeEvent to public API

* Add entry to CHANGELOG.md

---------

Co-authored-by: adz <x12@adz.garden>
Co-authored-by: Sam Andreae <contact@samandreae.com>
Co-authored-by: adz <adzialocha@users.noreply.github.com>
Linked issue: blob task not re-materializing blobs which were only partially written to disc