Enhance tika document parsing tests #13618

Merged
reta merged 7 commits into opensearch-project:main on May 16, 2024

finnegancarroll
Contributor

@finnegancarroll finnegancarroll commented May 9, 2024

Description

Enhance the Tika document parsing tests by validating parsed output against the output of the current Tika version.
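
One way to picture this kind of validation is a checksum comparison: hash the text extracted from each sample file and compare it with a stored value (the commits in this PR mention a sha1 map). The following is only a minimal sketch under that assumption. The class name `ParsedOutputChecksumSketch`, its method names, and the file-name-to-hash map layout are hypothetical and are not taken from the PR or from TikaDocTests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical helper, not the PR's code: compares parsed text with a stored SHA-1.
class ParsedOutputChecksumSketch {

    /** Returns the lowercase hex SHA-1 digest of the text, encoded as UTF-8. */
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Character.forDigit does not depend on the default locale.
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** True only if a checksum is recorded for fileName and it matches parsedText. */
    static boolean matchesExpected(Map<String, String> expected, String fileName, String parsedText) {
        String recorded = expected.get(fileName);
        return recorded != null && recorded.equals(sha1Hex(parsedText));
    }
}
```

A test built this way fails as soon as a parser upgrade changes the extracted text of any sample, which is stricter than only checking that parsing does not throw.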

Related Issues

Resolves "Improve the validation on TikaDocTests #12887"

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented May 9, 2024

❌ Gradle check result for 7ee6045: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@finnegancarroll finnegancarroll marked this pull request as draft May 10, 2024 00:00
@finnegancarroll finnegancarroll changed the title from "Draft: Tika tests" to "Tika tests" May 10, 2024
Contributor

❌ Gradle check result for 810f3a9: FAILURE

@finnegancarroll
Contributor Author

Gradle check is failing due to an unrelated flaky test: #11979

Contributor

❌ Gradle check result for 810f3a9: FAILURE

Contributor

❌ Gradle check result for 3bd9469: FAILURE

Contributor

❌ Gradle check result for 2dc3fcf: FAILURE

@finnegancarroll
Contributor Author

Known flaky test: #13600

Contributor

❌ Gradle check result for 3fcc4bc: FAILURE

Contributor

❌ Gradle check result for 9ae651e: FAILURE

Contributor

❌ Gradle check result for ef62853: FAILURE

@reta
Collaborator

reta commented May 15, 2024

❌ Gradle check result for ef62853: FAILURE

Needs #13673

@reta
Collaborator

reta commented May 15, 2024

@finnegancarroll we sadly have a pretty flaky test suite now; for example, this combination fails for me:

./gradlew ':plugins:ingest-attachment:test' --tests "org.opensearch.ingest.attachment.TikaDocTests.testParseSamples" -Dtests.seed=98D53194946B5C85 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Asia/Istanbul

Please let ./gradlew :plugins:ingest-attachment:check run for a couple of hours, to make sure the test suite is stable, thank you.

Signed-off-by: Carroll <carrofin@amazon.com>
Contributor

✅ Gradle check result for f0cc854: SUCCESS

codecov bot commented May 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.56%. Comparing base (b15cb0c) to head (f0cc854).
Report is 286 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13618      +/-   ##
============================================
+ Coverage     71.42%   71.56%   +0.14%     
- Complexity    59978    61201    +1223     
============================================
  Files          4985     5059      +74     
  Lines        282275   287522    +5247     
  Branches      40946    41646     +700     
============================================
+ Hits         201603   205759    +4156     
- Misses        63999    64777     +778     
- Partials      16673    16986     +313     

@finnegancarroll
Contributor Author

Removed strict checksum validation for some additional files with locale-dependent parsing. Ran the suite for a couple of hours, and with every locale returned by Locale.getAvailableLocales(), to ensure no flaky cases remain.
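
A locale sweep of this kind could look roughly like the sketch below: run the same parse under every locale from `Locale.getAvailableLocales()` and record the locales whose output differs from a fixed baseline. This is only an illustration under stated assumptions. `LocaleSweepSketch`, `localesWithDifferentOutput`, and the `Supplier<String>` standing in for the parse step are hypothetical names, and the actual test run may instead have relied on the test framework's `-Dtests.locale` setting shown earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical helper, not the PR's code: finds locales under which output changes.
class LocaleSweepSketch {

    /**
     * Runs parse once under the baseline locale, then once per available locale,
     * and returns the locales whose output differs from the baseline output.
     */
    static List<Locale> localesWithDifferentOutput(Supplier<String> parse, Locale baseline) {
        Locale original = Locale.getDefault();
        List<Locale> differing = new ArrayList<>();
        try {
            Locale.setDefault(baseline);
            String expected = parse.get();
            for (Locale locale : Locale.getAvailableLocales()) {
                Locale.setDefault(locale);
                if (!expected.equals(parse.get())) {
                    differing.add(locale);
                }
            }
        } finally {
            // Locale.setDefault is JVM-wide, so always restore the previous value.
            Locale.setDefault(original);
        }
        return differing;
    }
}
```

Files that show up in the returned list are candidates for the relaxed handling described above (no strict checksum), while files with stable output can keep strict validation. Because the default locale is global to the JVM, a sweep like this should not run concurrently with other tests.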

Member

@dblock dblock left a comment

This looks better than what we have, @reta any objections?

@reta
Collaborator

reta commented May 16, 2024

This looks better than what we have, @reta any objections?

It really does, no objections @dblock, just double checking no flakiness is going to be introduced

@dblock
Member

dblock commented May 16, 2024

This looks better than what we have, @reta any objections?

It really does, no objections @dblock, just double checking no flakiness is going to be introduced

Thanks. All yours to merge.

@reta reta added the backport 2.x label (Backport to 2.x branch) May 16, 2024
@reta reta merged commit f217270 into opensearch-project:main May 16, 2024
32 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 16, 2024
* Update tika document parsing bwc tests.

Signed-off-by: Carroll <carrofin@amazon.com>

* Skip sample tika files which do not parse consistently.

Signed-off-by: Carroll <carrofin@amazon.com>

* Formatting for spotlessJavaCheck.

Signed-off-by: Carroll <carrofin@amazon.com>

* Use fixed locale for consistent tika parsing.

Signed-off-by: Carroll <carrofin@amazon.com>

* Move sha1 map to .checksums file.

Signed-off-by: Carroll <carrofin@amazon.com>

* For locale dependant files do not verify contents with hash.

Signed-off-by: Carroll <carrofin@amazon.com>

* Remove strict checksum validation for additional locale dependant files.

Signed-off-by: Carroll <carrofin@amazon.com>

---------

Signed-off-by: Carroll <carrofin@amazon.com>
(cherry picked from commit f217270)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request May 16, 2024
* Update tika document parsing bwc tests.

* Skip sample tika files which do not parse consistently.

* Formatting for spotlessJavaCheck.

* Use fixed locale for consistent tika parsing.

* Move sha1 map to .checksums file.

* For locale dependant files do not verify contents with hash.

* Remove strict checksum validation for additional locale dependant files.

---------

(cherry picked from commit f217270)

Signed-off-by: Carroll <carrofin@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request May 17, 2024
parv0201 pushed a commit to parv0201/OpenSearch that referenced this pull request Jun 10, 2024
Labels
backport 2.x (Backport to 2.x branch), skip-changelog
4 participants