Skip to content
This repository has been archived by the owner on Nov 14, 2023. It is now read-only.

Improve signal handling for df timeout #16

Merged
merged 3 commits into from
Aug 31, 2015
Merged

Improve signal handling for df timeout #16

merged 3 commits into from
Aug 31, 2015

Conversation

talwai
Copy link
Contributor

@talwai talwai commented Aug 31, 2015

PR #14 actually didn't work as expected, exiting with the following in the event of timeouts:

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
github.com/DataDog/gohai/filesystem.getFileSystemInfo(0x0, 0x0, 0x0, 0x0)
    /home/vagrant/workspace/go/src/github.com/DataDog/gohai/filesystem/filesystem.go:49 +0x61a
github.com/DataDog/gohai/filesystem.(*FileSystem).Collect(0x6556d8, 0x0, 0x0, 0x0, 0x0)
    /home/vagrant/workspace/go/src/github.com/DataDog/gohai/filesystem/filesystem_common.go:12 +0x43
main.Collect(0xc20803a3f0, 0x0, 0x0)
    /home/vagrant/workspace/go/src/github.com/DataDog/gohai/gohai.go:32 +0xe4
main.main()
    /home/vagrant/workspace/go/src/github.com/DataDog/gohai/gohai.go:44 +0x1f

goroutine 5 [chan send]:
github.com/DataDog/gohai/filesystem.func·001()
    /home/vagrant/workspace/go/src/github.com/DataDog/gohai/filesystem/filesystem.go:28 +0xb3
created by github.com/DataDog/gohai/filesystem.getFileSystemInfo
    /home/vagrant/workspace/go/src/github.com/DataDog/gohai/filesystem/filesystem.go:32 +0x267

This PR should fix the deadlock issue, by properly handling the signals in select. It also propagates the error message to dd-agent and adds a test for the timeout.

case <-time.After(2 * time.Second):
// Kill the process if it takes too long
if killErr := cmd.Process.Kill(); killErr != nil {
log.Fatal("failed to kill:", killErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to kill the whole process anymore ? Can't we just return the error ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it made sense to ensure that multiple df calls spawned by gohai weren't running at the same time. The change was prompted by a case where df was taking long and successive gohai calls kept spawning new, time-intensive processes. This will kill df before it returns, and not interrupt the main gohai process

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah misread the code. looks good to me!

talwai added a commit that referenced this pull request Aug 31, 2015
Improve signal handling for `df` timeout
@talwai talwai merged commit 663d96f into master Aug 31, 2015
hush-hush pushed a commit that referenced this pull request Apr 13, 2023
Improve signal handling for `df` timeout
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants