
Allow to disable ZIP compression & fix bug in overwriting info files #114

Merged · 3 commits · Nov 5, 2022

Conversation

stonemaster
Contributor

This MR fixes two issues seen in a production environment:

  • Big core dumps were getting corrupted. This is fixed by using io::copy instead of the manual read/write logic; it's unclear to me why, because I am not a Rust guru. Big core dumps also turned out to be corrupted when compression was enabled. The new -D flag works around that by storing data as-is (uncompressed) in the resulting ZIP file.
  • The counter variable used to gather information on containers was never incremented, so each iteration overwrote the previous container's information.

timbuchwaldt previously approved these changes Oct 19, 2022
Collaborator

@timbuchwaldt left a comment


Looks good for CI

@No9
Collaborator

No9 commented Oct 19, 2022

Thanks for this PR.
My initial reaction is that if the zip step is corrupting the core dumps, then it should just be turned off rather than hidden behind a flag.

That said I'd like to investigate it further if possible.
Do you know the size of the file that is failing, along with the OS of the host, so I can try to reproduce?

Also, the merge is blocked as cargo fmt needs to be run on the code.

uncompressed. Also rewrite coredump piping code to use io::copy which
prevents core dump corruption in large core dumps.

Signed-off-by: André Stein <andre@deepl.com>
@stonemaster
Contributor Author

Thanks for the comment! I just ran cargo fmt and updated the PR.

Your reaction is completely right. I just thought I wouldn't want to disable it for everybody, because it probably works for some sizes - otherwise I would have expected many more bug reports. In my case it is an 8 GB core dump on an Ubuntu 22.04 machine.

@No9
Collaborator

No9 commented Oct 21, 2022

Thanks for the update.
Let me see if I can reproduce this over the weekend and get back to you.

@stonemaster
Contributor Author

Hi there, are there any updates on getting this reproduced on your end? Let me know if you need more information!

@No9
Collaborator

No9 commented Nov 1, 2022

Haven't had the bandwidth to look at it yet. Too much happening at this time of year.
If you have some bandwidth can you create a large core file (could be just random data) and replace test.core then run the tests to see if it reproduces?
Thanks

@stonemaster
Contributor Author

I followed your idea and ran the tests locally, and I can reproduce a failing unit test with a random core file of 512 MB:

base64 /dev/urandom | head -c 512M >./core-dump-composer/mocks/test.core

Test output:

---- default_scenario stdout ----
The current directory is <...>
Running tests using this PATH: <...>
timeout

ReadDir("./output")
thread 'default_scenario' panicked at 'assertion failed: `(left == right)`
  left: `8`,
 right: `1`', core-dump-composer/tests/default.rs:162:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    default_scenario

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 120.01s

error: test failed, to rerun pass '-p core-dump-composer --test default'

A core file of 256 MB was working fine, BTW.

As a side note: for the tests I always needed to clear the folder core-dump-composer/output, otherwise I was getting some weird hiccup in the test. Probably not a problem for CI, but I just wanted to mention it.

@No9
Collaborator

No9 commented Nov 2, 2022

Excellent thanks for looking into this and providing reproduction steps.
The hiccup is intentional for a failed test. It's good to have the failed artifacts when it breaks, but it should clean up after a successful run - I'll check it when I look into it.

@stonemaster
Contributor Author

Also: if I change the tests to use the new -D flag introduced with my MR, the test cases run through with a 512 MB test.core.

@No9 mentioned this pull request Nov 3, 2022
@No9
Collaborator

No9 commented Nov 4, 2022

Found the problem - the default timeout for the thread is 120 seconds, and zipping 512 MB takes longer than that.
I've bumped the timeout to 600 seconds and the tests now pass with the 512 MB file.
Can you set the default timeout to 600 seconds (10 minutes) as part of this PR, please?
https://github.com/IBM/core-dump-handler/blob/main/core-dump-composer/src/config.rs#L62

Given the impact of compression I think having the option to disable it is a good feature to have so I'm happy to land this PR.

@timbuchwaldt any additional thoughts on the timeout setting change?

stonemaster and others added 2 commits November 4, 2022 10:28
containers.

Signed-off-by: André Stein <andre@deepl.com>
Signed-off-by: André Stein <andre@deepl.com>
@stonemaster
Contributor Author

Okay perfect, thanks for your help! I just pushed a commit that changes the default for the timeout parameter.

@No9
Collaborator

No9 commented Nov 4, 2022

Thanks for the change. Can you apply the sign-off for DCO, please?
Instructions are in this link:
https://github.com/IBM/core-dump-handler/pull/114/checks?check_run_id=9292246336

I'll look to merge this over the weekend.

@stonemaster
Contributor Author

Yes, sorry, done! Thank you a lot for taking care!

Collaborator

@No9 left a comment


Discussed the open questions on the commit, and this LGTM.

@No9 merged commit 30daae8 into IBM:main Nov 5, 2022
@stonemaster
Contributor Author

One question: is there a plan for when this is going to be released in a new minor release? Thanks.

@No9
Collaborator

No9 commented Nov 10, 2022

Hey @stonemaster,
I've made some changes to this PR in order to make it configurable from the chart.
These are all rolled up into #116.
If the changes are OK for you, I'll roll a release.
You should be able to validate that release by checking out the PR and running it, as the image will also be built.
