Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove unnecessary retries for locked port when there isn't a range t… #8020

Closed
wants to merge 2 commits into from

Conversation

SwampDragons
Copy link
Contributor

@SwampDragons SwampDragons commented Aug 21, 2019

Last night, I started seeing a "too many open files" error in TestListenRangeConfig_Listen when I ran make test locally.

I originally thought it was a side effect of a file descriptor leak, maybe from another test, that was recently created, but I did a bisect and it's failing for me all the way back to when this test and the port lock functionality were added in v1.4.1-dev. So something must have changed on my computer with my ulimit to make this test suddenly start failing -- however, even if it's my computer that changed recently, this looks an awful lot like a real file descriptor leak that I uncovered.

I think the leak is happening inside the retry function. Originally I tried just adding a lock.Close() here: https://github.com/hashicorp/packer/blob/master/common/net/configure_port.go#L93-L95 but that wasn't resolving the issue, and I'm not seeing where else a leak could be occurring in this code pathway.

For now, I've added a ShouldRetry func so we don't keep repeating the try if there are no other valid ports to attempt. @azr I'd appreciate your thoughts on this.

Closes #8035

@azr
Copy link
Contributor

azr commented Aug 23, 2019

Okay, so I've re-read the flock code and to lock it opens a file to try & lock it; but does not close the file if it could not lock, so in our case, for every port in range !

And calling lock.Close was also a good call but in the library it requires to own the lock.

The right fix would have been this one: gofrs/flock#31

I will try to get that PR sorted out based on the codeowner's review

@azr
Copy link
Contributor

azr commented Aug 23, 2019

Okay I added a test/refreshed that PR there gofrs/flock#43

@@ -74,6 +74,19 @@ func (lc ListenRangeConfig) Listen(ctx context.Context) (*Listener, error) {

err := retry.Config{
RetryDelay: func() time.Duration { return 1 * time.Millisecond },
ShouldRetry: func(err error) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One simplification would be to justreturn portRange > 0.

Also if there is no port range may be we could use a linearly growing retry delay instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a time when the port would come available if we waited long enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh... I guess when the previous build completes its boot command.

@SwampDragons
Copy link
Contributor Author

@azr do we still need this now that you've merged a fix to the upstream library?

@azr
Copy link
Contributor

azr commented Sep 16, 2019

I think we should keep retrying - event when portRange == 0. the user might be explicitly needing to retry that port connection.

Therefore I think we do not need that fix as it is fixed in master.

Closing this one; please reopen if you disagree 🙂.

@azr azr closed this Sep 16, 2019
@azr azr deleted the fix_fd_leak_port_lock branch September 16, 2019 08:30
@SwampDragons
Copy link
Contributor Author

I concur!!

@ghost
Copy link

ghost commented Jan 23, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jan 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestListenRangeConfig_Listen consistently fails on Mac OSX
2 participants