bestpractices: add crash pre-requisites #285

Closed

gireeshpunathil
Member

This covers the preparation of crash diagnostics.
A few caveats:

  • I am not good at markup, so please advise on the structure.
  • As this will form a baseline for the rest of the docs, a thorough review is requested, of both the format and the content.
    Thanks!

Refs: #254

Yama security policy inhibits a second process from collecting dump,
practically rendering `gcore` unusable.

`setcap cap_sys_ptrace=+ep $(which gdb)`
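
For context on the excerpt above, here is a minimal sketch of how one might inspect the Yama setting before relying on `gcore`. It assumes a Linux host whose kernel exposes `/proc/sys/kernel/yama/ptrace_scope`; the helper name `describe_ptrace_scope` is hypothetical and is not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper: map a Yama ptrace_scope value to its meaning.
describe_ptrace_scope() {
  case "$1" in
    0) echo "classic: any process of the same user may attach" ;;
    1) echo "restricted: only a parent (or designated) process may attach" ;;
    2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
    3) echo "disabled: no process may attach" ;;
    *) echo "unknown value: $1" ;;
  esac
}

# Assumed path; it is absent when the kernel is built without Yama.
scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
  describe_ptrace_scope "$(cat "$scope_file")"
else
  echo "Yama ptrace_scope not available on this system"
fi
```

With a value of 1 or 2, granting `cap_sys_ptrace` to `gdb` as in the excerpt is what lets a second process attach; with a value of 3, no capability helps.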
Member

It's not clear to me if this is to address the problem of gcore not working?

Member Author

@mhdawson - addressed, added a line that explains what we are doing.

@mhdawson
Member

Pushed a few commits to do minor fixup and some suggested changes as well.

A more structural suggestion is that we should organize it into the following format:

Introduction

What you have for the introduction with additional paragraphs for

  • mentioning the key artifacts for exploring crashes (core dumps, diagnostic report?)

Common issues

Cover common issues in generating the key artifacts.

  • Disk space
  • ulimits

Recommended best practice

This section provides specific recommendations for how to configure your systems in advance so that you are ready to investigate crashes.

Ensuring enough disk space

Your existing content on how to size. Maybe talk about cleanup/management of cores here, instead of later on, to avoid space problems?
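
As an illustration of the disk space point, a minimal sketch of a pre-flight check might look like the following. The helper names `available_kb` and `has_room_for_core` are hypothetical, and the 2 GiB requirement is an assumed placeholder, not a sizing recommendation from the PR.

```shell
#!/bin/sh
# Hypothetical helper: available space, in KiB, on the filesystem holding a directory.
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed if at least $2 KiB are free under directory $1.
has_room_for_core() {
  [ "$(available_kb "$1")" -ge "$2" ]
}

# Example: check the current directory against an assumed 2 GiB requirement.
required_kb=$((2 * 1024 * 1024))
if has_room_for_core . "$required_kb"; then
  echo "enough space for a ${required_kb} KiB core"
else
  echo "not enough space for a ${required_kb} KiB core"
fi
```

The directory to check would be wherever cores are actually written, and the requirement would come from the application's expected memory footprint.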

Configuring to ensure core generation

  • ulimits (what you have + would be good to have a recommendation on how to check/configure the hard limits)
  • specific recommendation for core file naming/patterns and how to set them
  • recommendations for cleanup/management of cores that are generated (otherwise the application can run out of disk space)
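
The checks listed above could be sketched roughly as follows. This assumes a shell whose `ulimit` builtin accepts `-S`, `-H` and `-c` (bash and dash do) and, for the last step, a Linux host; the helper name `core_limit_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a core size limit allows full dumps.
core_limit_status() {
  case "$1" in
    unlimited) echo "ok: core size is unlimited" ;;
    0)         echo "disabled: no core file will be written" ;;
    *)         echo "capped: cores may be truncated at limit $1" ;;
  esac
}

# The soft limit is what applies now; the hard limit is its ceiling.
echo "soft limit: $(core_limit_status "$(ulimit -Sc)")"
echo "hard limit: $(core_limit_status "$(ulimit -Hc)")"

# Where the kernel writes cores (Linux only; the path is absent elsewhere).
if [ -r /proc/sys/kernel/core_pattern ]; then
  echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```

Running `ulimit -c unlimited` raises the soft limit for the current shell, but only as far as the hard limit allows.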

Additional information

  • Manual Dump generation

Let me know what you think about this structure/organization.

@gireeshpunathil
Member Author

@mhdawson - I have addressed all of your review comments. PTAL, thanks!

Member

@mhdawson mhdawson left a comment

LGTM

@gireeshpunathil
Member Author

Thanks @mhdawson. Can I get one more review, please?

Member

@mhdawson mhdawson left a comment

LGTM

Contributor

@hekike hekike left a comment

Additionally, you could call out that the user will need the exact same node binary to inspect your core dump. Or they need to bundle it into the core dump.
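
One way to make the "exact same node binary" requirement checkable is to record the binary's identity next to the core. The sketch below assumes `sha256sum` from GNU coreutils is installed; the helper name `binary_fingerprint` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 digest of a file (needs coreutils' sha256sum).
binary_fingerprint() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Record which node binary is in use so the same build can be found later.
node_path=$(command -v node || true)
if [ -n "$node_path" ]; then
  echo "path:    $node_path"
  echo "version: $(node --version)"
  echo "sha256:  $(binary_fingerprint "$node_path")"
else
  echo "node not found on PATH"
fi
```

Comparing the recorded digest with the binary on the analysis host confirms that the two match before the core is opened.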

@gireeshpunathil
Member Author

> Additionally, you could call out that the user will need the exact same node binary to inspect your core dump. Or they need to bundle it into the core dump.

@hekike - can I add that statement in my second part? Here we are talking about the production system, whereas that requirement applies to the dev host where we will be performing the crash analysis, so it makes sense to cover it when we mention the debugging basics. PLMK.

@hekike
Contributor

hekike commented Oct 25, 2019

@gireeshpunathil makes sense!
Thanks for the PR!

@gireeshpunathil gireeshpunathil force-pushed the crashbp1 branch 2 times, most recently from bceb4ce to 3e283aa Compare October 26, 2019 05:34
gireeshpunathil added a commit that referenced this pull request Oct 26, 2019
This covers the preparation of crash diagnostics
Refs: #254

PR-URL: #285
Reviewed-By: Michael Dawson <Michael_Dawson@ca.ibm.com>
Reviewed-By: Peter Marton <pmarton@netflix.com>
@gireeshpunathil
Member Author

landed as 38a577a
