This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

store: simplify db locking and functions #2897

Merged
merged 1 commit into from
Jul 14, 2016

Conversation

sgotti
Contributor

@sgotti sgotti commented Jul 10, 2016

Instead of having a file lock to handle inter process locking and a sync.Mutex
to handle locking between multiple goroutines, just create, lock and close a
new file lock at every db.Do function.

Additionally, saving a sqldb instance in the db struct between open and close
doesn't make great sense (and will create data races accessing sqldb, and that
was one of the reasons for the added sync.Mutex) since sqldb will be opened and
closed for every db.Do.

This is a followup of #2391

/cc @yifan-gu @euank

@sgotti sgotti force-pushed the store_simplify_db_locking branch from f310bad to 46e8b0a on Jul 10, 2016 21:39
@sgotti sgotti mentioned this pull request Jul 11, 2016
if err != nil {
	return err
}
defer db.Close()
defer l.Close()
Member

@euank euank Jul 11, 2016


nit: for our own debugging sanity we should log any close errors. And, as before, if this fails we're in a bad state for any concurrent accesses.
Let's just hope this doesn't fail; at least the code is much prettier and, I think/hope, less error prone (since this'll primarily be called by short-lived processes that close their fds on exit anyway)

Contributor Author


That's a problem everywhere in rkt code with pkg/lock, since we always defer Close without checking/returning the error, and from the POSIX manual it looks like the state after a failed close will be undefined, so close shouldn't be retried (we'd need to find a way to test whether the lock was released or not).

For just error reporting, with defer I can use a named return value for error and do something like:

defer func() {
    if cerr := l.Close(); cerr != nil && err == nil {
        err = cerr
    }
}()

I'm just not sure it's worth it, or whether it should be done in all the other rkt places.

Let's just hope this doesn't fail; at least the code is much prettier and, I think/hope, less error prone (since this'll primarily be called by short-lived processes that close their fds on exit anyway)

Yeah, with the exception of the api-service, which opens a store instance at start and uses it for its whole life.

@lucab
Member

lucab commented Jul 12, 2016

Targeted at next release and assigned to @yifan-gu @euank to shepherd this in.

@euank
Member

euank commented Jul 13, 2016

LGTM! Thanks for coming back to this and making it look a lot nicer.

@sgotti
Contributor Author

sgotti commented Jul 13, 2016

@euank Something I just thought about now: when using multiple goroutines, there'll be an increased number of file descriptors, since every goroutine will acquire a new flock when calling db.Do. So, if the default nofile limit for a user is 1024 and 1024 goroutines (in practice fewer, since other fds will already be in use) concurrently call db.Do, some will fail with a "too many open files" error. I'm not sure if this can be a real problem. Currently the only multi-goroutine store user is the api-service.

@iaguis
Member

iaguis commented Jul 13, 2016

@euank Something I just thought about now: when using multiple goroutines, there'll be an increased number of file descriptors, since every goroutine will acquire a new flock when calling db.Do. So, if the default nofile limit for a user is 1024 and 1024 goroutines (in practice fewer, since other fds will already be in use) concurrently call db.Do, some will fail with a "too many open files" error. I'm not sure if this can be a real problem. Currently the only multi-goroutine store user is the api-service.

When the transaction finishes the files get closed, right? I don't think we'll run many concurrent goroutines so it's probably not an issue.

@sgotti
Contributor Author

sgotti commented Jul 13, 2016

When the transaction finishes the files get closed, right? I don't think we'll run many concurrent goroutines so it's probably not an issue.

Yes, it's just when multiple goroutines try to acquire the exclusive lock at the same time.

@euank
Member

euank commented Jul 13, 2016

Even for many concurrent calls, won't there only be 1 fd for the lock and 1 for the db, since new calls will block before opening an fd? So only a couple of fds per store, and there'll typically be only a few stores, yeah?

Regardless, there was already 1 fd per open db handle, so it's at the very least the same order of magnitude.

@sgotti
Contributor Author

sgotti commented Jul 13, 2016

Even for many concurrent calls, won't there only be 1 fd for the lock and 1 for the db, since new calls will block before opening an fd? So only a couple of fds per store, and there'll typically be only a few stores, yeah?

Unfortunately not: to take a flock you have to call the flock syscall on an open fd (the db dir), so every goroutine will allocate an fd and wait for the exclusive lock. So if a goroutine is executing a transaction inside db.Do and N other goroutines call db.Do, you'll have N fds plus the ones ql requires open at the same time.

@euank
Member

euank commented Jul 14, 2016

Ahh, thanks for explaining. I agree with @iaguis that it generally shouldn't be a problem and the code is nicer/cleaner enough I think it makes sense to merge this.

I also manually verified that while running 200 pods (admittedly 1-app ones) and hitting it with the client example, I didn't hit any fd troubles with this code, though I did notice an unrelated fd leak in api-service which @yifan-gu is looking into.

Still LGTM unless someone thinks the fd-usage concerns need more thought/investigation.

@iaguis
Member

iaguis commented Jul 14, 2016

LGTM, I think we can merge it.

@iaguis iaguis merged commit d46ccdf into rkt:master Jul 14, 2016
@lucab lucab unassigned euank and yifan-gu Apr 5, 2017