Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][ci] Detect test thread leaks in CI builds and add tooling for resource leak investigation #21450

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Oct 26, 2023

Motivation

It should be ensured in each test that resources created by the test are properly cleaned up. Failing to do so can lead to memory leaks and, in some instances, unnecessary CPU consumption. These issues can, in turn, slow down test execution, increase Pulsar CI build durations, and cause flakiness. A significant source of memory leaks in Pulsar tests stems from thread leaks. This PR aims to add reporting and tooling to detect thread leaks in tests.

Issues / requirements:

Proposed Solutions:

  • Enhance thread leak detection so that it becomes useful and has less false positives.
  • Introduce warnings in Pulsar CI for classes causing thread leaks, allowing for early detection.
  • Offer an option in Pulsar CI to collect heap dumps or histograms to identify memory leak sources.

Heap dump tooling for tests:

This PR includes a helper class that enables programmatic creation of heap dumps during tests. A heap dump offers a snapshot of the heap state, which can be invaluable for debugging certain types of issues. Tools such as Eclipse MAT can be used to inspect heap dumps. Eclipse MAT offers a query language to extract summaries. Moreover, the Calcite plugin for Eclipse MAT enables querying and summarizing heap dump information using SQL, which can be extremely handy at times.

Modifications

  1. ThreadLeakDetectorListener Improvements:

    • Refine the detection algorithm to account for waiting threads.
    • Exclude known thread pools not spawned by the tests.
    • Introduce file-based reporting.
  2. Pulsar CI Reporting Enhancements:

    • Leverage files generated by ThreadLeakDetectorListener for improved reporting.
  3. TraceTestResourceCleanupListener Introduction:

    • Track test resource cleanup by generating thread dumps, heap histograms, and heap dumps before the TestNG JVM exits.
    • Add integration to Pulsar CI so that heap dump collection can be selected for manually triggered builds.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari force-pushed the lh-trace-test-resources-cleanup branch from 9b371f4 to b121aae Compare October 29, 2023 00:24
@codecov-commenter
Copy link

Codecov Report

Merging #21450 (b121aae) into master (b010bab) will increase coverage by 36.44%.
Report is 1 commits behind head on master.
The diff coverage is 0.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #21450       +/-   ##
=============================================
+ Coverage     36.78%   73.22%   +36.44%     
- Complexity    12235    32583    +20348     
=============================================
  Files          1713     1890      +177     
  Lines        130758   140357     +9599     
  Branches      14250    15425     +1175     
=============================================
+ Hits          48095   102774    +54679     
+ Misses        76321    29499    -46822     
- Partials       6342     8084     +1742     
Flag Coverage Δ
inttests 24.18% <0.00%> (-0.05%) ⬇️
systests 24.69% <0.00%> (-0.04%) ⬇️
unittests 72.52% <0.00%> (+40.58%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../org/apache/pulsar/common/util/ThreadDumpUtil.java 59.45% <0.00%> (+59.45%) ⬆️

... and 1453 files with indirect coverage changes

@github-actions github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Oct 29, 2023
@apache apache deleted a comment from github-actions bot Oct 29, 2023
@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Oct 29, 2023
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff

Lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants