Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safeguards to prevent older attempts from generating metrics output in Scala Tool #1324

Merged
merged 2 commits into from
Sep 6, 2024

Conversation

parthosa
Copy link
Collaborator

@parthosa parthosa commented Aug 30, 2024

Fixes #1319. This PR adds safeguards to ensure that when users provide event logs for multiple attempts of the same app, only the newest attempt will generate metrics output.

Approach:

  • Changed allApps from Concurrent Linked Queue to Concurrent Hash Map of App ID -> QualificationSummaryInfo
  • Add AppSubscriber class to manage application attempt IDs and thread-safe updates.
    • Introduces methods for setting and validating attempt IDs.
    • withSafeValidAttempt(): Accepts a function parameter f, which is executed if the current attempt ID is valid (i.e., greater than the existing one).
    • Commit specific to this change - link
  • Progress Bar and Status Report are updated to skip previous attempts.

Tests

  • Added unit test and event logs with multiple attempts
  • Currently many UTs were processing multiple event logs with same App IDs:
    • Updated unit tests to ensure unique App IDs are provided.

@parthosa parthosa added bug Something isn't working core_tools Scope the core module (scala) labels Aug 30, 2024
@parthosa parthosa self-assigned this Aug 30, 2024
… output

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
* Adding synchronization on the reporting and autotuner level

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

* Fix failing scala unit tests by providing unique app IDs in test (#53)

* Fix UTs

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

* Ignore test for event logs with same app Id and attempt Id

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

---------

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>

---------

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Co-authored-by: Ahmed Hussein (amahussein) <a@ahussein.me>
@parthosa parthosa marked this pull request as ready for review September 3, 2024 16:00
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amahussein amahussein merged commit 4f2d6e0 into NVIDIA:dev Sep 6, 2024
14 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1319 branch October 9, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Eventlogs with application re-runs can add duplicate entries to qualification_summary.csv
3 participants