This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Fix disappearing outputs from long running tasks #3573

Merged
merged 1 commit into from
Apr 11, 2023


marosmars
Contributor

With external storage enabled, a task meeting the following conditions lost some of its outputs when it finally completed:

  1. The task is long running
  2. Its output is big (externalized)
  3. Its output grows over time
  4. This causes multiple externalize / internalize executions
  5. ... for example, a join task collecting the outputs of all forked tasks

The issue was caused by the following sequence:

  1. On the Nth execution of a task (such as the one described above)
  2. The task internalized its intermediate output from external storage
  3. The task was executed and updated its in-memory output to the current value
  4. The task tried to externalize the new version of its output
  5. ... but while doing so, outputPayload (the last externalized value) was combined with outputData (the current, in-memory value) in a way that let outputPayload overwrite the latest values
  6. As a result, the newly calculated outputs were lost
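
The merge-order problem described in the steps above can be illustrated with a minimal sketch. It is not the actual Conductor code: the class `OutputMergeSketch`, both method names, and the `results` key are hypothetical and exist only for illustration. It assumes both values are plain maps and that combining them amounts to a `putAll`, so whichever map is applied last wins on shared keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names do not come from the Conductor codebase.
public class OutputMergeSketch {

    // Problematic order: the stale externalized payload is applied last,
    // so it overwrites newer in-memory values stored under the same key.
    static Map<String, Object> mergePayloadWins(
            Map<String, Object> outputPayload, Map<String, Object> outputData) {
        Map<String, Object> merged = new HashMap<>(outputData);
        merged.putAll(outputPayload);
        return merged;
    }

    // Safer order: the current in-memory data is applied last,
    // so the latest values take precedence over the stale payload.
    static Map<String, Object> mergeDataWins(
            Map<String, Object> outputPayload, Map<String, Object> outputData) {
        Map<String, Object> merged = new HashMap<>(outputPayload);
        merged.putAll(outputData);
        return merged;
    }

    public static void main(String[] args) {
        // Last externalized value: only the first forked task had reported.
        Map<String, Object> outputPayload = Map.of("results", List.of("a"));
        // Current in-memory value: a second forked task has reported since.
        Map<String, Object> outputData = Map.of("results", List.of("a", "b"));

        System.out.println("payload wins: " + mergePayloadWins(outputPayload, outputData));
        System.out.println("data wins:    " + mergeDataWins(outputPayload, outputData));
    }
}
```

In this sketch the first merge keeps the stale single-element list, while the second keeps the newer two-element list. That matches the symptom reported here, where outputs calculated after the last externalization disappeared.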

Pull Request type

  • Bugfix
  • Feature
  • Refactoring (no functional changes, no api changes)
  • Build related changes (Please run ./gradlew generateLock saveLock to refresh dependencies)
  • WHOSUSING.md
  • Other (please describe):

NOTE: Please remember to run ./gradlew spotlessApply to fix any format violations.

Changes in this PR

Describe the new behavior from this PR, and why it's needed
Issue #

Alternatives considered

Describe alternative implementation you have considered

Signed-off-by: Maros Marsalek <mmarsalek@frinx.io>
@marosmars
Contributor Author

@v1r3n Hey, we found this issue when running with external storage enabled. Please take a look; it's just a one-line change.

@v1r3n v1r3n merged commit 3d9b27f into Netflix:main Apr 11, 2023
@marosmars marosmars deleted the fixMissingOutputs branch April 11, 2023 16:26