Fix potential problems and AQE updates in Qual tool #1021

Merged: 3 commits, May 20, 2024

amahussein
Collaborator

Signed-off-by: Ahmed Hussein (amahussein) a@ahussein.me

Fixes #1019

What this PR fixes:

  • Fixes a bug introduced by "Refactor TaskEnd to be accessible by Q/P tools" #1000, where the Potential Problems column of the generated CSV file sql_duration_and_executor_cpu_time_percent.csv would be empty.
  • Fixes an old bug in the Qual tool caused by processing SQL plans as they are created (on "SQLExecutionStart" and AQEUpdate events). This bug left some duplicate records in the SQL-to-problematic

Changes:

  • In order to fix the Old Bug caused by AQEs:
    • Avoid processing SQLPlans by the event handler. Instead, process the plan after all the eventlogs are processed.
    • This implies removing QualificationAppInfo.processSQLPlan since the logic is the same as AppSQLPlanAnalyzer.processSQLPlanMetrics()
    • Fix the implementation of RunningQualificationApp to make sure that QualificationAppInfo.processSQLPlan is called before aggregateStats(). Otherwise, the RunningQualificationApp would have empty dataSources/problematics/writeDataFormats
  • In order to fix the empty column:
    • Broke the logic of AppSQLPlanAnalyzer.processSQLPlanMetrics() into visitNode() that is separate from the main loop.
    • Created QualSQLPlanAnalyzer, which extends AppSQLPlanAnalyzer and overrides visitNode() so that it can update the WriteDataFormats.
  • Misc changes:
    • Changed some data structure types to preserve insertion order or key order.
    • For unit-tests: Updated the order of the unsupported execs in the expected Qualification Execs.
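
The AQE-related change above can be illustrated with a minimal, self-contained sketch. It is not the tool's actual code: the event case classes, `DeferredPlanSketch`, and the "problem" detection are hypothetical stand-ins. It assumes that each AQE update re-sends a full plan for the same SQL ID, so analyzing plans inside the event handler can record the same problem more than once, whereas analyzing only the final plan after all events are read does not.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the event-log events named in the PR.
sealed trait Event { def sqlId: Long; def planNodes: Seq[String] }
final case class SQLExecutionStart(sqlId: Long, planNodes: Seq[String]) extends Event
final case class AQEUpdate(sqlId: Long, planNodes: Seq[String]) extends Event

object DeferredPlanSketch {
  // Assumption for illustration: a node is "problematic" if its name contains "UDF".
  private def problemsIn(nodes: Seq[String]): Seq[String] =
    nodes.filter(_.contains("UDF"))

  // Old behaviour (sketch): analyze every plan as soon as its event arrives.
  def eager(events: Seq[Event]): Seq[(Long, String)] =
    events.flatMap(e => problemsIn(e.planNodes).map(p => (e.sqlId, p)))

  // New behaviour (sketch): while reading events, only remember the latest
  // plan per SQL ID; analyze each final plan once at the end.
  def deferred(events: Seq[Event]): Seq[(Long, String)] = {
    val latestPlan = mutable.LinkedHashMap.empty[Long, Seq[String]]
    events.foreach(e => latestPlan(e.sqlId) = e.planNodes)
    latestPlan.toSeq.flatMap { case (id, nodes) =>
      problemsIn(nodes).map(p => (id, p))
    }
  }
}
```

With one `SQLExecutionStart` and one `AQEUpdate` for the same SQL ID, both containing a `"Project UDF"` node, `eager` reports the problem twice while `deferred` reports it once.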

@amahussein amahussein added the bug (Something isn't working) and core_tools (Scope the core module (scala)) labels May 17, 2024
@amahussein amahussein self-assigned this May 17, 2024
@@ -196,7 +196,7 @@ class ApplicationInfo(
   processEvents()

   // Process SQL Plan Metrics after all events are processed
-  val planMetricProcessor: AppSQLPlanAnalyzer = AppSQLPlanAnalyzer.processSQLPlan(this)
+  val planMetricProcessor: AppSQLPlanAnalyzer = AppSQLPlanAnalyzer(this)
Collaborator

why is appIndex not passed in here?

Collaborator Author

Changed the code to pass appIndex to make it more readable.
The reason it was not passing appIndex before is that it is handled by AppSQLPlanAnalyzer.apply().

    val sqlAnalyzer = app match {
      case qApp: QualificationAppInfo =>
        new QualSQLPlanAnalyzer(qApp, appIndex)
      case pApp: ApplicationInfo =>
        new AppSQLPlanAnalyzer(pApp, pApp.index)
    }
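
For readers unfamiliar with the pattern, here is a self-contained sketch of a companion `apply()` that picks the analyzer subclass, combined with the overridable `visitNode()` hook mentioned in the PR description. The class names mirror the snippet above, but the `AppBase` trait, the constructors, the default `appIndex`, and the `visitNode` bodies are simplified assumptions, not the project's real definitions.

```scala
import scala.collection.mutable

// Simplified stand-ins; the real classes carry far more state.
sealed trait AppBase
final class ApplicationInfo(val index: Int) extends AppBase
final class QualificationAppInfo extends AppBase

class AppSQLPlanAnalyzer(val app: AppBase, val appIndex: Int) {
  val visited = mutable.ArrayBuffer.empty[String]

  // Per-node hook, kept separate from the main loop so subclasses can extend it.
  protected def visitNode(nodeName: String): Unit = visited += nodeName

  def processSQLPlanMetrics(nodes: Seq[String]): Unit = nodes.foreach(visitNode)
}

class QualSQLPlanAnalyzer(qApp: QualificationAppInfo, idx: Int)
    extends AppSQLPlanAnalyzer(qApp, idx) {
  val writeDataFormats = mutable.LinkedHashSet.empty[String]

  override protected def visitNode(nodeName: String): Unit = {
    super.visitNode(nodeName)
    // Assumption: write nodes are identified by a "Write " name prefix.
    if (nodeName.startsWith("Write ")) {
      writeDataFormats += nodeName.stripPrefix("Write ")
    }
  }
}

object AppSQLPlanAnalyzer {
  // Factory: callers may omit appIndex; an ApplicationInfo supplies its own index.
  def apply(app: AppBase, appIndex: Int = 1): AppSQLPlanAnalyzer = app match {
    case qApp: QualificationAppInfo => new QualSQLPlanAnalyzer(qApp, appIndex)
    case pApp: ApplicationInfo => new AppSQLPlanAnalyzer(pApp, pApp.index)
  }
}
```

In this sketch, `AppSQLPlanAnalyzer(new ApplicationInfo(7))` yields an analyzer whose `appIndex` is 7 even though the caller passed no index, which is the behaviour the reply above describes.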

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Collaborator Author

@amahussein amahussein left a comment

Thanks @tgravescs
I addressed the comments.

@amahussein amahussein requested a review from tgravescs May 20, 2024 17:58
private val sqlPlanNodeIdToStageIds: mutable.HashMap[(Long, Long), Set[Int]] =
mutable.HashMap.empty[(Long, Long), Set[Int]]
// A map between (SQL ID, Node ID) and the set of stage IDs
// TODO: The Qualification should use this map instead of building a new set for each exec.
Collaborator

Does this have any performance impact and do we have a tracking issue or need one?

Collaborator Author

@amahussein amahussein May 20, 2024

There is indeed a memory overhead because more objects are allocated in memory. This overhead was introduced in #1000, but that was the price of combining the two tools.
Depending on priorities, we will address the redundant information stored in QualificationAppInfo.

  • We do have [FEA] Improve performance of core module #367 as an umbrella for performance issues.
  • The incremental refactor changes the code frequently. I find that filing an issue for each possible improvement would be cumbersome and would create a flood of overlapping issues. So, unless there is a bug, I mark possible improvements as TODOs that we can revisit later depending on priorities.
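
As context for the TODO discussed in this thread, a minimal sketch of reusing one shared map from (SQL ID, Node ID) to stage IDs rather than building a new set for each exec. Only the map's name and type come from the diff; the `StageLookupSketch` class and its methods are hypothetical and are not part of the tool.

```scala
import scala.collection.mutable

// Hypothetical holder; only the map's name and type are taken from the diff.
class StageLookupSketch {
  private val sqlPlanNodeIdToStageIds: mutable.HashMap[(Long, Long), Set[Int]] =
    mutable.HashMap.empty[(Long, Long), Set[Int]]

  // Record the stages for a node, merging with anything already stored.
  def addStages(sqlId: Long, nodeId: Long, stageIds: Set[Int]): Unit = {
    val key = (sqlId, nodeId)
    val existing = sqlPlanNodeIdToStageIds.getOrElse(key, Set.empty[Int])
    sqlPlanNodeIdToStageIds(key) = existing ++ stageIds
  }

  // Look up the stored set instead of rebuilding one per exec.
  def stagesFor(sqlId: Long, nodeId: Long): Set[Int] =
    sqlPlanNodeIdToStageIds.getOrElse((sqlId, nodeId), Set.empty[Int])
}
```

The trade-off raised in the review still applies: the map keeps one entry per (SQL ID, Node ID) pair alive for the lifetime of the app object, so the lookup saves repeated work at the cost of memory.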

*
* It has the following effect on the visitor object:
* 1- It updates the sqlIsDsOrRDD argument to True when the visited node is an RDD or Dataset.
* 2- If the SLID is an RDD, the potentialProblems is cleared because once SQL is marked as RDD,
Collaborator

nit: typo. Should this be 'SQLID'?

tgravescs
tgravescs previously approved these changes May 20, 2024
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
@amahussein amahussein merged commit aaf7dc8 into NVIDIA:dev May 20, 2024
15 checks passed