Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open & writable database connections carried across fork() are automatically discarded in the child #558

Merged
merged 9 commits into from
Sep 18, 2024

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented Sep 15, 2024

Context

In August 2024, Andy Croll opened an issue1 describing sqlite file corruption related to solid queue. After investigation, we were able to reproduce corruption under certain circumstances when forking a process with open sqlite databases.2

SQLite is known to not be fork-safe3, so this was not entirely surprising though it was the first time your author had personally seen corruption in the wild. The corruption became much more likely after the sqlite3-ruby gem improved its memory management with respect to open statements4 in v2.0.0.

Advice from upstream contributors5 is, essentially: don't fork if you have open database connections. Or, if you have forked, don't call sqlite3_close on those connections and thereby leak some amount of memory in the child process. Neither of these options are ideal, see below.

Decisions

  1. Open writable database connections carried across a fork() will automatically be closed in the child process to mitigate the risk of corrupting the database file.
  2. These connections will be incompletely closed ("discarded") which will result in a one-time memory leak in the child process.

First, the gem will register an "after fork" handler via Process._fork that will close any open writable database connections in the child process. This is a best-effort attempt to avoid corruption, but it is not guaranteed to prevent corruption in all cases. Any connections closed by this handler will also emit a warning to let users know what's happening.

Second, the sqlite3-ruby gem will store the ID of the process that opened each database connection. If, when a writable database is closed (either explicitly with Database#close or implicitly via GC or after-fork callback) the current process ID is different from the original process, then we "discard" the connection.

"Discard" here means:

  • The Database object acts "closed", including returning true from #closed?.
  • sqlite3_close_v2 is not called on the object, because it is unsafe to do so per sqlite instructions3. As a result, some memory will be lost permanently (a one-time "memory leak").
  • Open file descriptors associated with the database are closed.
  • Any memory that can be freed safely is recovered.

Note that readonly databases are being treated as "fork safe" and are not affected by any of these changes.

Consequences

The positive consequence is that we remove a potential cause of database corruption for applications that fork with active sqlite database connections.

The negative consequence is that, for each discarded connection, some memory will be permanently lost (leaked) in the child process.

Alternatives considered.

1. Require applications to close database connections before forking.

This is the advice5 given by the upstream maintainers of sqlite, and so was the first thing we tried to implement in Rails in rails/rails#529316. That first simple implementation was not thread safe, however, and in order to make it thread-safe it would be necessary to pause all sqlite database activity, close the open connections, and then fork. At least one Rails core team member was not happy that this would interfere with database connections in the parent, and the complexity of a thread-safe solution seemed high, so this work was paused.

2. Memory arena

Sqlite offers a configuration option to specify custom memory functions for malloc et al. It seems possible that the sqlite3-ruby gem could implement a custom arena that would be used by sqlite so that in a new process, after forking, all the memory underlying the sqlite Ruby objects could be discarded in a single operation.

I think this approach is promising, but complex and risky. Sqlite is a complex library and uses shared memory in addition to the traditional heap. Would throwing away the heap memory (the arena) result in a segfault or other undefined behaviors or corruption? Determining the answer to that question feels expensive in and of itself, and any solution along these lines would not be supported by the sqlite authors. We can explore this space if the memory leak from discarded connections turns out to be a large source of pain.

References

Footnotes

Footnotes

  1. SQLite queue database corruption · Issue #324 · rails/solid_queue

  2. flavorjones/2024-09-13-sqlite-corruption: Temporary repo, reproduction of sqlite database corruption.

  3. How To Corrupt An SQLite Database File: §2.6 Carrying an open database connection across a fork() 2

  4. Always call sqlite3_finalize in deallocate func by haileys · Pull Request #392 · sparklemotion/sqlite3-ruby

  5. SQLite Forum: Correct way of carrying connections over forked processes 2

  6. SQLite3Adapter: Ensure fork-safety by flavorjones · Pull Request #52931 · rails/rails

@flavorjones flavorjones marked this pull request as ready for review September 16, 2024 15:00
@flavorjones flavorjones force-pushed the flavorjones-implement-discard branch 3 times, most recently from 48ea2fa to df07939 Compare September 16, 2024 15:27
@flavorjones
Copy link
Member Author

@jeremy suggested changing Database#close to call Database#discard if the process ID has changed. I really like this idea, and emitting the warning from the gem will help alert people to when they're doing something that sqlite doesn't support. I've added a commit to the PR to do this.

@flavorjones flavorjones force-pushed the flavorjones-implement-discard branch 6 times, most recently from 0c2ce10 to 760d766 Compare September 16, 2024 18:17
@byroot
Copy link
Contributor

byroot commented Sep 16, 2024

suggested changing Database#close to call Database#discard if the process ID has changed. I really like this idea, and emitting the warning from the gem will help alert people to when they're doing something that sqlite doesn't support.

Agreed. That's a great idea. Make it less weird than exposing a method that basically says: leak memory for me please.

@flavorjones flavorjones changed the title Database#discard Database connections carried across fork() will not be fully closed Sep 16, 2024
@flavorjones
Copy link
Member Author

I've rewritten this PR. Diffs from the original:

  • there is no Database#discard method being introduced
  • instead, any time a database is closed (either explicitly via Database#close or implicitly via GC) the current pid is checked against the creating pid and if they are different the connection is "discarded" and a warning is emitted

After conversations with a few folks, I came to the conclusion that a discard method was just confusing and unnecessary.

I'll leave this open for a bit if anyone wants to give feedback.

@flavorjones flavorjones force-pushed the flavorjones-implement-discard branch 3 times, most recently from ef292eb to 8a08461 Compare September 17, 2024 01:50
ext/sqlite3/database.c Outdated Show resolved Hide resolved
ext/sqlite3/database.c Outdated Show resolved Hide resolved
ext/sqlite3/database.c Outdated Show resolved Hide resolved
@flavorjones
Copy link
Member Author

@tenderlove Can I get your 👀 on this before I whack merge

Copy link
Member

@tenderlove tenderlove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, but if I understand it correctly the change doesn't prevent corruption, it just emits a warning if someone calls "close" on a database that happened to be opened in another process. Someone could, for example, fork a process then execute queries in the child process thereby causing corruption.

I don't think we should check the PID for every query. Unsure if it's possible, but could we register a callback with Process._fork for every open database and then issue a warning on fork?

To be clear, I'm fine with the change in this PR. It makes sense.

ext/sqlite3/database.c Outdated Show resolved Hide resolved
@flavorjones
Copy link
Member Author

@tenderlove Thanks for reviewing.

if I understand it correctly the change doesn't prevent corruption

It will prevent corruption for Rails apps (when we also update sqlite3adapter). By not calling sqlite3_close_v2 we avoid corruption in the case demonstrated in https://github.com/flavorjones/2024-09-13-sqlite-corruption.

Someone could, for example, fork a process then execute queries in the child process thereby causing corruption

but yes, it doesn't always prevent corruption, I'm just trying to nail the Rails use case here (which will require a PR to Rails as well).

could we register a callback with Process._fork for every open database and then issue a warning on fork?

Yes, we could absolutely do that -- just to make sure I understand, you want the warning issued in the parent process when the fork happens, yes? I may push it to a separate PR though.

I guess we could also register a callback with Process._fork that would immediately close ("discard") the connection in the child. That might prevent corruption anywhere. Let me play with that.

Sqlite is not fork-safe and closing the database connection in the
child process can lead to corruption.

Emit a warning when this happens so developers know when they're doing
something that's not supported by sqlite.

See adr/2024-09-fork-safety.md for a full explanation.
@flavorjones
Copy link
Member Author

I've pushed one more commit that does some big changes that I'd like a re-review of.

  • Readonly connections are excepted from all special close handling.
  • The gem hooks into Process._fork and automatically closes all open, writable connections after fork.

@tenderlove @byroot Thanks for your feedback so far.

Copy link
Member

@tenderlove tenderlove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work.

ext/sqlite3/database.c Outdated Show resolved Hide resolved
@flavorjones flavorjones changed the title Database connections carried across fork() will not be fully closed Open & writable database connections carried across fork() are automatically discarded in the child Sep 17, 2024
@flavorjones
Copy link
Member Author

Added a commit that ensures Statements related to a discarded database will raise an exception if people try to use them.

@flavorjones flavorjones force-pushed the flavorjones-implement-discard branch 3 times, most recently from 6d705a8 to c213875 Compare September 18, 2024 02:05
@flavorjones flavorjones merged commit e16fb4e into main Sep 18, 2024
130 checks passed
@flavorjones flavorjones deleted the flavorjones-implement-discard branch September 18, 2024 13:08
@flavorjones
Copy link
Member Author

Will cut a release candidate today.

@flavorjones
Copy link
Member Author

@@ -121,6 +135,7 @@ step(VALUE self)

TypedData_Get_Struct(self, sqlite3StmtRuby, &statement_type, ctx);

require_open_db(self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this in the initial review. step is an extremely hot method, do we actually need to do this? I thought SQLite would already raise an exception if the db is somehow closed out from under it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does an IV read which is a bit much. If it was a struct member access I think it would be acceptable.

Copy link
Member Author

@flavorjones flavorjones Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look at other ways to make sure we're protected. This is really just protecting us from segfault if the DB was "discarded" so I can put something in the struct and that should be good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #565

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants