test: Enable Comet shuffle in Spark SQL tests #210

Merged
merged 5 commits into from
Mar 26, 2024


sunchao
Member

@sunchao sunchao commented Mar 16, 2024

Which issue does this PR close?

Closes #195.

Rationale for this change

Currently the Spark SQL tests do not enable Comet shuffle, so they miss a lot of test coverage. This PR enables shuffle for them, using the default shuffle mechanism for now (Comet native shuffle).

I had to disable a few tests that are failing, probably due to bugs in Comet itself. Once this is merged, I'll create separate GitHub issues to track them.

What changes are included in this PR?

Enable Comet shuffle for Spark SQL tests.

How are these changes tested?

@codecov-commenter

codecov-commenter commented Mar 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.32%. Comparing base (8aab44c) to head (04c4aed).
Report is 6 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #210      +/-   ##
============================================
- Coverage     33.41%   33.32%   -0.09%     
- Complexity      768      769       +1     
============================================
  Files           107      107              
  Lines         36329    37037     +708     
  Branches       7935     8106     +171     
============================================
+ Hits          12138    12342     +204     
- Misses        21643    22098     +455     
- Partials       2548     2597      +49     
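The percentages in the report follow directly from the counts in the table: Codecov reports coverage as hits divided by the sum of hits, misses, and partials, so partially covered lines do not count as covered. A small Scala check of that arithmetic is sketched below; `CoverageCounts` and `CoverageCheck` are illustrative names, not part of any Codecov API.

```scala
// Illustrative only: recomputes the Codecov percentages from the raw counts
// in the table above. Partial lines are counted in the total, not as hits.
final case class CoverageCounts(hits: Int, misses: Int, partials: Int) {
  // Every tracked line is exactly one of hit, miss, or partial.
  def lines: Int = hits + misses + partials

  // Coverage in percent: hits / (hits + misses + partials).
  def percent: Double = 100.0 * hits / lines
}

object CoverageCheck {
  val base = CoverageCounts(hits = 12138, misses = 21643, partials = 2548)
  val head = CoverageCounts(hits = 12342, misses = 22098, partials = 2597)

  // Change in coverage between base and head, in percentage points.
  def delta: Double = head.percent - base.percent
}
```

Here `base.lines` and `head.lines` reproduce the 36329 and 37037 line totals, and rounding `percent` to two decimals gives the 33.41% and 33.32% shown above, a drop of about 0.09 percentage points.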


@sunchao sunchao marked this pull request as ready for review March 20, 2024 05:13
@sunchao
Member Author

sunchao commented Mar 25, 2024

cc @viirya @advancedxy this PR is ready now

@@ -1238,6 +1392,9 @@ index ed2e309fa07..3767d4e7ca4 100644
+ conf
+ .set("spark.comet.exec.enabled", "true")
+ .set("spark.comet.exec.all.enabled", "true")
+ .set("spark.shuffle.manager",
+ "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
+ .set("spark.comet.exec.shuffle.enabled", "true")
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Comet columnar shuffle? I think we are prioritizing that as the first-class shuffle support.

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still pending the work to make it the default. Once it becomes the default, all the tests will automatically switch to it.

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense.
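The diff under discussion sets four properties on the test `SparkConf`. As a rough illustration, the sketch below collects the same keys and values into a plain Scala `Map` so they can be checked without a Spark dependency. The `CometTestConf` object and `enablesCometShuffle` helper are hypothetical names, not part of Comet or Spark, and the check only verifies that every setting from the diff is present; it does not model how Comet interprets them.

```scala
// Hypothetical helper, not part of Comet: the four settings from the diff,
// kept as plain strings so they can be inspected without Spark.
object CometTestConf {
  val ShuffleManagerClass: String =
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"

  val settings: Map[String, String] = Map(
    "spark.comet.exec.enabled" -> "true",
    "spark.comet.exec.all.enabled" -> "true",
    // Replaces Spark's default sort-based shuffle manager.
    "spark.shuffle.manager" -> ShuffleManagerClass,
    "spark.comet.exec.shuffle.enabled" -> "true"
  )

  // True only if `conf` contains every key above with the same value.
  def enablesCometShuffle(conf: Map[String, String]): Boolean =
    settings.forall { case (key, value) => conf.get(key).contains(value) }
}
```

With Spark on the classpath, the same map could be applied in one call with `new SparkConf().setAll(CometTestConf.settings)`. Keeping the settings together makes it easy to assert that none of them is dropped when the test setup changes.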

@advancedxy
Contributor

Other changes LGTM. My only open question is whether we should enable columnar shuffle by default.


- test("SPARK-35886: PromotePrecision should be subexpr replaced") {
+ test("SPARK-35886: PromotePrecision should be subexpr replaced",
+ IgnoreComet("TODO: fix Comet for this test")) {
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be easily forgotten. Can we create a ticket for it?


test("SPARK-38204: flatMapGroupsWithState should require StatefulOpClusteredDistribution " +
- "from children - without initial state") {
+ "from children - without initial state", IgnoreComet("TODO: fix Comet for this test")) {
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.

test("SPARK-38204: flatMapGroupsWithState should require ClusteredDistribution " +
- "from children if the query starts from checkpoint in 3.2.x - without initial state") {
+ "from children if the query starts from checkpoint in 3.2.x - without initial state",
+ IgnoreComet("TODO: fix Comet for this test")) {
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

test("SPARK-38204: flatMapGroupsWithState should require ClusteredDistribution " +
- "from children if the query starts from checkpoint in prior to 3.2") {
+ "from children if the query starts from checkpoint in prior to 3.2",
+ IgnoreComet("TODO: fix Comet for this test")) {
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.
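The disabled tests pass an `IgnoreComet(reason)` tag to `test(...)`, presumably so that they are skipped when Comet is enabled while still appearing in the suite with a reason. The patched Spark test helpers that implement this are not shown on this page, so the dependency-free Scala sketch below models the idea with a hypothetical `MiniSuite` class instead of ScalaTest. Treat its names and behavior as illustrative assumptions, not the actual Comet or Spark code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of tag-based skipping; the real tests use ScalaTest.
sealed trait TestTag
final case class IgnoreComet(reason: String) extends TestTag

final case class TestCase(name: String, tags: Seq[TestTag], body: () => Unit)

final class MiniSuite(cometEnabled: Boolean) {
  private val registered = ArrayBuffer.empty[TestCase]

  // Mirrors the shape of `test(name, tags*) { ... }` used in the diffs above.
  def test(name: String, tags: TestTag*)(body: => Unit): Unit =
    registered += TestCase(name, tags, () => body)

  // Runs each registered test, skipping IgnoreComet-tagged tests when Comet
  // is enabled. Returns the names of the tests that ran and that were skipped.
  def run(): (Seq[String], Seq[String]) = {
    val (skipped, toRun) = registered.toList.partition { t =>
      cometEnabled && t.tags.exists(_.isInstanceOf[IgnoreComet])
    }
    toRun.foreach(t => t.body())
    (toRun.map(_.name), skipped.map(_.name))
  }
}
```

In this model, a test registered on `new MiniSuite(cometEnabled = true)` with `IgnoreComet("TODO: fix Comet for this test")` is reported as skipped and its body never runs, while the same test on a suite with `cometEnabled = false` runs normally.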


case class Result(key: Long, count: Int)

+// TODO: fix Comet to enable this suite
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does this whole suite fail?

Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I suggested creating tickets for the failing tests.

@viirya
Member

viirya commented Mar 26, 2024

I created #231 to track these failed tests.

@viirya viirya merged commit ce63ff8 into apache:main Mar 26, 2024
28 checks passed
@viirya
Member

viirya commented Mar 26, 2024

Merged. Thanks.

wangyum pushed a commit to wangyum/datafusion-comet that referenced this pull request Mar 28, 2024
* test: Enable Comet shuffle in Spark SQL tests

* disable some tests

* disable another test

* update

* update
snmvaughan pushed a commit to snmvaughan/arrow-datafusion-comet that referenced this pull request Apr 4, 2024
* test: Enable Comet shuffle in Spark SQL tests

* disable some tests

* disable another test

* update

* update
Development

Successfully merging this pull request may close these issues.

Enable shuffle in Spark SQL tests
4 participants