
[FEA] Improve performance of core module #367

Open
5 of 13 tasks
amahussein opened this issue Jun 5, 2023 · 1 comment
Labels: core_tools (Scope the core module (scala)), feature request (New feature or request), reliability

Comments

@amahussein
Collaborator

amahussein commented Jun 5, 2023

Is your feature request related to a problem? Please describe.

It would be nice to improve the performance of the Qualification/Profiling tools.
The tools can easily run out of memory because the implementation keeps everything in memory and only dumps the report at the end. This design dates from the early development phases, when the tools were generating formatted output or picking statistics across different applications.
This is probably no longer needed, as we can offload some of the cross-app analysis to the user-tools wrapper. Furthermore, we can separate the cross-app module so that it consumes the raw data. This way we don't have to keep everything in memory.
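To illustrate the idea of not keeping everything in memory, here is a minimal sketch, assuming one compact JSON event per line as in Spark event logs. `StreamingSketch` and its methods are hypothetical (not part of the tools), and a regex stands in for a real JSON parser. It keeps only a small running aggregate per event type and writes per-app raw rows that a separate cross-app step could consume later.

```scala
import java.io.{File, PrintWriter}
import scala.collection.mutable
import scala.io.Source

// Hypothetical sketch (not the tools' code): stream an event log line by line
// and keep only small running aggregates instead of every parsed event.
object StreamingSketch {
  // Assumes compact JSON lines such as {"Event":"SparkListenerJobStart",...}.
  // A regex stands in for a real JSON parser purely for illustration.
  private val EventName = "\"Event\":\"([^\"]+)\"".r

  // Counts events per type; memory grows with the number of distinct
  // event types, not with the size of the log.
  def countEvents(lines: Iterator[String]): Map[String, Long] = {
    val counts = mutable.HashMap.empty[String, Long]
    lines.foreach { line =>
      EventName.findFirstMatchIn(line).foreach { m =>
        val name = m.group(1)
        counts(name) = counts.getOrElse(name, 0L) + 1L
      }
    }
    counts.toMap
  }

  // Writes per-app raw rows (sorted by event name) so that a separate
  // cross-app step can consume them later.
  def writeRaw(appId: String, counts: Map[String, Long], out: File): Unit = {
    val pw = new PrintWriter(out)
    try {
      pw.println("appId,event,count")
      counts.toSeq.sortBy(_._1).foreach { case (event, n) => pw.println(s"$appId,$event,$n") }
    } finally pw.close()
  }

  // Usage: StreamingSketch <eventlog> <appId> <output.csv>
  def main(args: Array[String]): Unit = {
    val src = Source.fromFile(args(0))
    try writeRaw(args(1), countEvents(src.getLines()), new File(args(2)))
    finally src.close()
  }
}
```

Because `getLines()` is an iterator, only the current line and the aggregate map are live at any time; the cross-app work then reads the small CSV files instead of the in-memory state of every app.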

Currently, we don't have a performance profiler that lists the memory/CPU consumption of individual code blocks.
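As a stopgap until we have a proper profiler, a hypothetical helper like the one below could wrap suspect code blocks and report wall time, CPU time, bytes allocated by the current thread, and GC activity. It is a sketch under assumptions: `BlockProfiler` is not part of the tools, per-thread allocation counts rely on the HotSpot-specific `com.sun.management.ThreadMXBean` (reported as -1 when unavailable), and the GC totals are JVM-wide, so they also include collections triggered by other threads.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

// Hypothetical helper (not part of the tools): measures one code block.
object BlockProfiler {
  final case class Sample(
      label: String,
      wallNanos: Long,
      cpuNanos: Long,       // current thread only; -1 if CPU timing is disabled
      allocatedBytes: Long, // current thread only; -1 if the JVM cannot report it
      gcCount: Long,        // JVM-wide, includes collections caused by other threads
      gcMillis: Long)

  private val threads = ManagementFactory.getThreadMXBean()

  // Per-thread allocation counters are a HotSpot extension (com.sun.management).
  private val hotspot: Option[com.sun.management.ThreadMXBean] = threads match {
    case t: com.sun.management.ThreadMXBean => Some(t)
    case _                                  => None
  }

  private def allocatedSoFar(): Long =
    hotspot.map(_.getThreadAllocatedBytes(Thread.currentThread().getId())).getOrElse(-1L)

  // Sums collection count and accumulated collection time over all collectors.
  private def gcTotals(): (Long, Long) = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans().asScala.toSeq
    (beans.map(b => math.max(b.getCollectionCount(), 0L)).sum,
     beans.map(b => math.max(b.getCollectionTime(), 0L)).sum)
  }

  def profile[T](label: String)(block: => T): (T, Sample) = {
    val (gcCount0, gcMillis0) = gcTotals()
    val alloc0 = allocatedSoFar()
    val cpu0 = threads.getCurrentThreadCpuTime()
    val wall0 = System.nanoTime()
    val result = block
    val wall1 = System.nanoTime()
    val cpu1 = threads.getCurrentThreadCpuTime()
    val alloc1 = allocatedSoFar()
    val (gcCount1, gcMillis1) = gcTotals()
    val cpu = if (cpu0 < 0 || cpu1 < 0) -1L else cpu1 - cpu0
    val allocated = if (alloc0 < 0 || alloc1 < 0) -1L else alloc1 - alloc0
    (result, Sample(label, wall1 - wall0, cpu, allocated, gcCount1 - gcCount0, gcMillis1 - gcMillis0))
  }
}
```

Used as `BlockProfiler.profile("aggregation") { ... }`, it gives a rough per-block breakdown. It only observes the calling thread, so work handed off to other threads is not counted in the CPU and allocation figures.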

This issue is filed to track reports related to performance and possible areas of improvement.

Describe the solution you'd like

We need to:

  • Break down the time/memory consumption to identify the bottlenecks.
    • IMHO (@amahussein) the tools are memory-bound and it is unlikely that the CPU is highly utilized.
    • For large eventlogs, the tools might trigger frequent Full GCs, which are characterized by long pause times.
  • Enumerate areas that can be improved by multithreading.
  • Provide a set of configurations for the CLI that improve the execution of the tools. For example:
    • if the tools run on a CSP, we should recommend that users configure clusters with high memory capacity.
    • for JVM commands, recommend the ideal heap size and GC configuration.
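For the multithreading item, one candidate is analyzing several event logs concurrently. This is a minimal sketch, assuming each app can be analyzed independently; `ParallelApps.processAll` and its `analyze` callback are hypothetical. Because the tools appear to be memory-bound, the fixed pool size also caps how many apps are held in memory at the same time.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}
import scala.jdk.CollectionConverters._

// Hypothetical sketch: analyze several event logs concurrently on a fixed-size pool.
object ParallelApps {
  // `analyze` is a placeholder for per-app work; it must not share mutable state
  // across apps. At most `threads` apps are processed (and held in memory) at once.
  def processAll[R](logs: Seq[String], threads: Int)(analyze: String => R): Seq[R] = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val tasks = logs.map(log => new Callable[R] { def call(): R = analyze(log) })
      // invokeAll blocks until all tasks complete; results keep the input order.
      // Future.get rethrows a task failure wrapped in an ExecutionException.
      pool.invokeAll(tasks.asJava).asScala.map(_.get()).toSeq
    } finally {
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }
}
```

With a memory-bound workload, the pool size would likely need to be tuned together with the JVM heap size: more threads only help if the heap can hold that many apps' state without triggering Full GCs.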

Tasks

  1. bug core_tools reliability
    amahussein
  2. bug core_tools
  3. bug core_tools
    nartal1
  4. core_tools feature request
    amahussein bilalbari
  5. bug core_tools
    amahussein
  6. 3 of 3
    bug core_tools
    amahussein
  7. 2 of 5
    core_tools feature request
    amahussein
@amahussein added the labels feature request, ? - Needs Triage, core_tools, and reliability on Jun 5, 2023
@amahussein changed the title from "[FEA] Improve performance of the tools" to "[FEA] Improve performance of core module" on Jun 5, 2023
@amahussein self-assigned this on Mar 13, 2024
@amahussein
Collaborator Author

Analysis.scala

  • jobAndStageMetricsAggregation:

    • consumes 22% of the Profiling tool's total CPU time for a single app.
    • accounts for 8% of total memory allocations, roughly 10GB.
