Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support code coverage report with single version jar [skip ci] #4030

Merged
merged 4 commits into from
Dec 6, 2021

Conversation

zhanga5
Copy link
Contributor

@zhanga5 zhanga5 commented Nov 4, 2021

and re-enable code coverage report from premerge job

Signed-off-by: Alex Zhang alex4zhang@gmail.com

Test passed on test PR pxLi#226

@zhanga5 zhanga5 linked an issue Nov 4, 2021 that may be closed by this pull request
mkdir -p target/jacoco_classes/
FILE=$(ls dist/target/rapids-4-spark_2.12-*.jar | grep -v test | xargs readlink -f)
pushd target/jacoco_classes/
jar xf $FILE com org rapids spark3xx-common "spark${JACOCO_SPARK_VER:-301}/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about jacoco, but with the layout being different and code being under spark3xx-common, jacoco is still able to understand that?

Copy link
Contributor Author

@zhanga5 zhanga5 Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, there're some warnings like following

[WARN] Execution data for class org/apache/spark/sql/rapids/GpuBitwiseNot does not match.
[WARN] Execution data for class com/nvidia/spark/rapids/shims/v2/ShimDataSourceRDD does not match.
[WARN] Execution data for class org/apache/spark/sql/rapids/GpuStringRepeat does not match.
...

Main report was still be generated properly for com.nvidia.spark.rapids, org.apache.spark.sql.rapids.execution, com.nvidia.spark.rapids.shuffle etc. Please refer to sample coverage report http://(deleted) for more details. Any suggestions to fix above warnings are welcome

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Execution data for class ..." typically means that the checksum for the .class that the data was collected for didn't match the .class provided when generating the report. It would be nice for us to dig in a bit and understand if we have an actual mismatch, or if something else is happening, like the path is getting included in that checksum somehow.

Copy link
Contributor Author

@zhanga5 zhanga5 Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more testing found only udf related class reported mismatch. All others can be resolved with clean environment build (i.e, spark-premerge-build.sh mvn_verify). Is there any special handling for udf?

New coverage report is updated from http://(deleted)

...
[INFO] Loading execution data file /home/alexzhang/dev/github.com/nvidia/spark-rapids/./udf-compiler/target/spark301/jacoco.exec.
[INFO] Loading execution data file /home/alexzhang/dev/github.com/nvidia/spark-rapids/./sql-plugin/target/spark301/jacoco.exec.
[INFO] Loading execution data file /home/alexzhang/dev/github.com/nvidia/spark-rapids/./integration_tests/target/jacoco.exec.
[INFO] Loading execution data file /home/alexzhang/dev/github.com/nvidia/spark-rapids/./tests/target/spark301/jacoco.exec.
[INFO] Loading execution data file /home/alexzhang/dev/github.com/nvidia/spark-rapids/./shims/target/spark301/jacoco.exec.
[INFO] Loading execution data file /home/alexzhang/dev/github.com/nvidia/spark-rapids/./shims/spark301/target/jacoco.exec.
[WARN] Some classes do not match with execution data.
[WARN] For report generation the same class files must be used as at runtime.
[WARN] Execution data for class com/nvidia/spark/udf/Instruction does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Instruction$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/GpuScalaUDFLogical does not match.
[WARN] Execution data for class com/nvidia/spark/udf/State$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/CFG$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$LocalDateTime$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$DateTimeFormatter does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$StringBuilder$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$DateTimeFormatter$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/CatalystExpressionBuilder$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/CatalystExpressionBuilder does not match.
[WARN] Execution data for class com/nvidia/spark/udf/BB does not match.
[WARN] Execution data for class com/nvidia/spark/udf/CFG does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$ArrayBuffer does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$ClassTag does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$StringBuilder does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$ArrayBuffer$ does not match.
[WARN] Execution data for class com/nvidia/spark/udf/LambdaReflection does not match.
[WARN] Execution data for class com/nvidia/spark/udf/State does not match.
[WARN] Execution data for class com/nvidia/spark/udf/LogicalPlanRules$$anonfun$replacePartialFunc$1 does not match.
[WARN] Execution data for class com/nvidia/spark/udf/Repr$LocalDateTime does not match.
[WARN] Execution data for class com/nvidia/spark/udf/LambdaReflection$ does not match.
[INFO] Analyzing 3964 classes.
...

Copy link
Collaborator

@tgravescs tgravescs Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any reports from before it was shut off to compare to? Unfortunately the blossom-jenkins premerge jobs have expired.

for udf classes I don't know any packaging differences there, is the output here saying tests reference com/nvidia/spark/udf/Instruction but can't find that in jar? Something in the classloader part must be happening different there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine with me, but I would like to have a follow on issue to move the tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @revans2 , in #3932, we plan to remove the submodule tests, as my understanding. Which way should we use?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlowe #3932 is supposed to migrate tests under the sql-plugin sub-project. That is going to mess with code coverage because one will be shaded and the other will not be. So now we need to decide if we need to collect code coverage for unit tests or not. I am inclined to say that we skip it for now unless someone else has a better solution on how to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's not worry about coverage for the unshaded code in unit tests for now. Almost all of our tests are integration tests, and we can tackle coverage for unit tests in a followup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaryShen2008 then what we have now is fine, and if/when someone starts to complain about the issues described above we can turn off code coverage for the unit tests.

@pxLi pxLi added the build Related to CI / CD or cleanly building label Nov 8, 2021
zhanga5 and others added 3 commits November 26, 2021 10:15
and re-enable code coverage report from premerge job

Signed-off-by: Alex Zhang <alex4zhang@gmail.com>
Extract the class files from rapids-4-spark-udf jar
to match the Jacoco.exec report
Because they'll be modified in aggregator's shade

Signed-off-by: gashen <gashen@nvidia.com>
for Jacoco report by using rapids-4-spark-udf jar

Signed-off-by: Alex Zhang <alex4zhang@gmail.com>
Signed-off-by: Alex Zhang <alex4zhang@gmail.com>
@zhanga5 zhanga5 changed the title Support code coverage report with single version jar Support code coverage report with single version jar [skip ci] Dec 3, 2021
@zhanga5
Copy link
Contributor Author

zhanga5 commented Dec 3, 2021

build

@zhanga5 zhanga5 merged commit 6a54ab6 into NVIDIA:branch-21.12 Dec 6, 2021
@zhanga5 zhanga5 deleted the issue-3776 branch December 6, 2021 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] JaCoCo code coverage long term strategy
6 participants