Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add end-to-end behavioural tests for the python CLI #1313

Merged
merged 15 commits into from
Sep 10, 2024

Conversation

parthosa
Copy link
Collaborator

@parthosa parthosa commented Aug 23, 2024

Fixes #970.

This PR includes end-to-end behavioural test for the spark_rapids CLI. A high level directory structure and more details on usage and debugging are provided at docs.

Setup

From the <repo_root>/user_tools directory, run the following command to install the required dependencies:

pip install behave
# or
pip install .[tests]

Running Tests

Tests can be run using 'behave' cmd or using 'tox' cmd.

Basic Usage:

behave <options>
# or
tox -e behave -- <options>

Run All Tests:

behave
# or
tox -e behave

Skip building the Tools jar during setup.

behave -D build_jar=false   # Skip building the Tools jar during setup (default: true)
# or
tox -e behave -- -D build_jar=false

More details are provided in the docs.

Notes

  1. Most of the steps are straight forward. Below are the implementation details for some key steps:
    1. "<cli>" is not installed
      • Create a mock CLI cmd and prepend to PATH
    2. Qualification tool JAR crashes
      • Mimic crashing by starting a thread that watches for 'QualificationMain' and kills it
    3. HDFS is "running"
      • Download hadoop binaries and setup a single node HDFS
  2. Impact on Github Actions runtime:
    • Previously runtime: ~10min.
    • After including these tests: ~12min
  3. Behave Official Docs: https://behave.readthedocs.io/en/stable/tutorial.html

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels Aug 23, 2024
@parthosa parthosa self-assigned this Aug 23, 2024
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa marked this pull request as ready for review August 23, 2024 23:30
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa !

I get a failing test error while running on python 3.8 test_id_ELP_0002

It was expecting Qualification. Raised an error in phase [Execution], but instead the log did not have it.

      2024-08-26 11:00:37,116 INFO rapids.tools.qualification.stats: Merging dataframes to get stats...
      2024-08-26 11:00:37,116 INFO rapids.tools.qualification.stats: Preprocessing dataframes...
      2024-08-26 11:00:37,116 ERROR rapids.tools.qualification: Failed to generate the statistics report: Can only use .str accessor with string values!

user_tools/docs/tools_e2e_tests.md Outdated Show resolved Hide resolved
user_tools/docs/tools_e2e_tests.md Show resolved Hide resolved
user_tools/tox.ini Outdated Show resolved Hide resolved
user_tools/tox.ini Outdated Show resolved Hide resolved
user_tools/tox.ini Outdated Show resolved Hide resolved
user_tools/docs/tools_e2e_tests.md Show resolved Hide resolved
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I haven't looked at this all yet but I saw we are adding more event logs. I really don't like the idea of checking in more static event logs. I guess if we don't have much options to run a few locally, I'm ok with a couple but I don't want that to grow. The test_event_log_1 and test_event_log_2 should be more descriptive. Also all should be zstd compressed unless we are explicilty testing without it.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa
Copy link
Collaborator Author

Also all should be zstd compressed unless we are explicilty testing without it

Thanks @tgravescs. Compressed all event logs to zstd format.

The test_event_log_1 and test_event_log_2 should be more descriptive

Renamed to valid_eventlog_1.zstd

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
amahussein
amahussein previously approved these changes Aug 28, 2024
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa !

The first checkpoint LGTME.
We will discuss offline the steps that follow this PR.

cindyyuanjiang
cindyyuanjiang previously approved these changes Aug 29, 2024
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa for this feature!

nartal1
nartal1 previously approved these changes Aug 29, 2024
Copy link
Collaborator

@nartal1 nartal1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa !

@tgravescs
Copy link
Collaborator

tgravescs commented Aug 29, 2024

so valid_event_log_1 and 2 aren't exactly what I meant. what is in that event log? I saw:

| platform | eventlog              |
| onprem   | valid_eventlog_1.zstd |
| dataproc | valid_eventlog_1.zstd |
      

So why not put onprem, or yarn eventlog, dataproc event log... what code was run in that to generate it?
If I need to generate a new version of these eventlogs how do I do it?

user_tools/docs/tools_e2e_tests.md Outdated Show resolved Hide resolved
user_tools/docs/tools_e2e_tests.md Show resolved Hide resolved
user_tools/docs/tools_e2e_tests.md Outdated Show resolved Hide resolved
- Add E2E_TOOLS prefix
- Fail if HDFS already running
- Add verbose output
- Move util functions to class

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa
Copy link
Collaborator Author

parthosa commented Sep 5, 2024

@tgravescs

So why not put onprem, or yarn eventlog, dataproc event log... what code was run in that to generate it?

  • Renamed the event log to join_agg_on_yarn_eventlog.zstd
  • It is a simple query that reads/creates 3 DFs of 100k rows with random values and does a join -> aggregation -> join.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments but overall looking very good

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa
This is going to be a very important feature for the tools.

Copy link
Collaborator

@nartal1 nartal1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa!

@parthosa parthosa merged commit 71601e1 into NVIDIA:dev Sep 10, 2024
14 checks passed
@parthosa parthosa deleted the spark-rapids-tools-e2e-test-behave branch September 10, 2024 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] Improve E2E testing for tools
5 participants