-
-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pgrx install causing postgresql coredump #1263
Fix pgrx install causing postgresql coredump #1263
Conversation
this is because pgrx will overwrite the .so file in place. dlopen will use mmap with MAP_PRIVATE. but if other process is modifying the file. dlopen will use MAP_PRIVATE to open the file and map a read-only memory use mmap(2). usually this memory has copy-on-write. if others is modifying the file. the previously mapped memory should not change. but there is a undefined behavior, from man mmap(2): ``` MAP_PRIVATE: It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region. ``` what actually happens is that the read-only memory in postgresql is modified, and all pointers in the .TEXT segment is mashed up. the call stack looks like ``` #0 0x0000000000731510 in ?? () pgcentralfoundation#1 0x00007f7a94515132 in core::sync::atomic::AtomicUsize::store (self=0x7f7a95110318 <pgrx_pg_sys::submodules::thread_check::ACTIVE_THREAD::h60448dcb81097e92>, val=0, order=core::sync::atomic::Ordering::Relaxed) at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/sync/atomic.rs:2291 pgcentralfoundation#2 0x00007f7a944f574b in pgrx_pg_sys::submodules::thread_check::init_active_thread::clear_in_child () at src/submodules/thread_check.rs:39 pgcentralfoundation#3 0x00007f7a962f8a88 in __run_postfork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=false, lastrun=lastrun@entry=2) at register-atfork.c:187 pgcentralfoundation#4 0x00007f7a962df773 in __libc_fork () at fork.c:109 pgcentralfoundation#5 0x00005555e66ad948 in fork_process () at fork_process.c:61 pgcentralfoundation#6 0x00005555e66a5d48 in StartAutoVacWorker () at autovacuum.c:1543 pgcentralfoundation#7 0x00005555e66c43f7 in StartAutovacuumWorker () at postmaster.c:6155 pgcentralfoundation#8 0x00005555e66c3d21 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5820 pgcentralfoundation#9 <signal handler called> pgcentralfoundation#10 0x00007f7a9630eb84 in __GI___select (nfds=6, readfds=0x7ffc61c2fa40, writefds=0x0, exceptfds=0x0, timeout=0x7ffc61c2f9b0) at ../sysdeps/unix/sysv/linux/select.c:69 pgcentralfoundation#11 0x00005555e66be343 in ServerLoop () at postmaster.c:1950 pgcentralfoundation#12 0x00005555e66bdb0f in PostmasterMain (argc=5, argv=0x5555e8fb5490) at postmaster.c:1631 pgcentralfoundation#13 0x00005555e6560e41 in main (argc=5, argv=0x5555e8fb5490) at main.c:240 ``` the is `pgrx_pg_sys` try to run the postfork hook. but the variable `ACTIVE_THREAD` and the code binary does not in the previous place.
Wow that's cursed. |
Hmm, can you give more exact repro instructions? This sounds like something we should be testing under valgrind, so if it's possible to replicate this consistently, jamming out a script to use this as a regression test would be nice. |
sorry I forget to mention the repro need |
the key point is ensure and try to add some static variable to the pgrx project between <3> and <4> if the .so has the same memory layout with the previous build another repro is to write some simple UDF, and install it. load and run it. or try |
I think it can be reproduced by simply creating any new session, like |
this person copies the .so file. and get crashed on every UDF call. after rebooting PostgreSQL it returns to normal. |
cargo-pgrx/src/command/install.rs
Outdated
@@ -161,10 +161,14 @@ pub(crate) fn install_extension( | |||
}; | |||
dest.push(format!("{}.so", so_name)); | |||
|
|||
if cfg!(target_os = "macos") { | |||
if cfg!(target_os = "macos") || cfg!(target_os = "linux") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me we should just always do it this way, keeping the comments as to why. macos and linux are our two primary target platforms and if they both exhibit similar problems with overwriting the existing .so, I'm sure other platforms do too.
BTW, have you reproduced the problem? I find
|
That is exciting. |
Successful repro. There seems to have been more than just one segfault, even. Great. Excellent. Fantastic. Merging this and I will look into a regression test after we expand cargo-pgrx's capabilities a bit to make such easier. |
The fifth beta release of 0.10.0! * `#[pg_test]`-compatible proptest support in #1258 courtesy of moi * PostgresEq (#1261) and PostgresOrd (#1262) now explicitly require their "base" Rust traits, which may hypothetically break some impls, but in exchange your equality implementations are now presumed to be commutative. * Work on improving/simplifying/optimizing our SPI continues, with @vrmiguel contributing some polish to Datum preparation in #1256 * We have been finding some extra-spicy edge cases in how PGRX interacts with a continuously-operational Postgres installation which might feature extensions being installed and reinstalled lately! @Sasasu fixed at least one core-dump-causing case from `cargo pgrx install` in #1263 ## New Contributors * @vrmiguel made their first contribution in #1256 **Full Changelog**: v0.10.0-beta.3...v0.10.0-beta.4
this is because pgrx will overwrite the .so file in place. dlopen will use MAP_PRIVATE to open the file and map a read-only memory using mmap(2). usually this memory is copy-on-write. if others are modifying the file, the previously mapped memory should not change. but there is an undefined behavior, from `man mmap(2)`: > MAP_PRIVATE: It is unspecified whether changes made to the file after the mmap() > call are visible in the mapped region. what actually happens is that the read-only memory in postgresql is modified, and all pointers in the .text segment can get mashed up. the call stack looks like ``` 0x0000000000731510 in ?? () 0x00007f7a94515132 in core::sync::atomic::AtomicUsize::store (self=0x7f7a95110318 <pgrx_pg_sys::submodules::thread_check::ACTIVE_THREAD::h60448dcb81097e92>, val=0, order=core::sync::atomic::Ordering::Relaxed) at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/sync/atomic.rs:2291 0x00007f7a944f574b in pgrx_pg_sys::submodules::thread_check::init_active_thread::clear_in_child () at src/submodules/thread_check.rs:39 0x00007f7a962f8a88 in __run_postfork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=false, lastrun=lastrun@entry=2) at register-atfork.c:187 0x00007f7a962df773 in __libc_fork () at fork.c:109 0x00005555e66ad948 in fork_process () at fork_process.c:61 0x00005555e66a5d48 in StartAutoVacWorker () at autovacuum.c:1543 0x00005555e66c43f7 in StartAutovacuumWorker () at postmaster.c:6155 0x00005555e66c3d21 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5820 <signal handler called> 0x00007f7a9630eb84 in __GI___select (nfds=6, readfds=0x7ffc61c2fa40, writefds=0x0, exceptfds=0x0, timeout=0x7ffc61c2f9b0) at ../sysdeps/unix/sysv/linux/select.c:69 0x00005555e66be343 in ServerLoop () at postmaster.c:1950 0x00005555e66bdb0f in PostmasterMain (argc=5, argv=0x5555e8fb5490) at postmaster.c:1631 0x00005555e6560e41 in main (argc=5, argv=0x5555e8fb5490) at main.c:240 ``` this is `pgrx_pg_sys` trying to run the postfork hook, but the variable `ACTIVE_THREAD` and the code binary will mismatch.
The fifth beta release of 0.10.0! * `#[pg_test]`-compatible proptest support in pgcentralfoundation#1258 courtesy of moi * PostgresEq (pgcentralfoundation#1261) and PostgresOrd (pgcentralfoundation#1262) now explicitly require their "base" Rust traits, which may hypothetically break some impls, but in exchange your equality implementations are now presumed to be commutative. * Work on improving/simplifying/optimizing our SPI continues, with @vrmiguel contributing some polish to Datum preparation in pgcentralfoundation#1256 * We have been finding some extra-spicy edge cases in how PGRX interacts with a continuously-operational Postgres installation which might feature extensions being installed and reinstalled lately! @Sasasu fixed at least one core-dump-causing case from `cargo pgrx install` in pgcentralfoundation#1263 ## New Contributors * @vrmiguel made their first contribution in pgcentralfoundation#1256 **Full Changelog**: pgcentralfoundation/pgrx@v0.10.0-beta.3...v0.10.0-beta.4
This is the final release of v0.10.0. Thanks everyone for the beta testing, pull requests, issues, and patience. As always, install `cargo-pgrx` with `cargo install cargo-pgrx --locked` and update your extension Cargo.toml files to use the `0.10.0` pgrx dependencies. This release includes support for Postgres 16RC1. Support for the previous betas has been removed. As such, a fresh `cargo pgrx init` is required. ## What's Changed Since v0.10.0-beta.4 * Fix `GetMemoryChunkContext` port by @workingjubilee in #1273 * Better error messages when `pg_config` isn't found. by @eeeebbbbrrrr in #1271 * Make `PostgresHash` also need `Eq` by @workingjubilee in #1264 * Memoize git hash and extension metadata by @levkk in #1274 * move to pg16rc1 by @eeeebbbbrrrr in #1276 * Fix bgworker template up to 0.10.0-beta.4 by @workingjubilee in #1270 ## New Contributors * @levkk made their first contribution in #1274 **Changelog**: v0.10.0-beta.4...v0.10.0 --- v0.10.0's full set of changes throughout the entire beta period are: * Postgres 16beta1 Support by @eeeebbbbrrrr in #1169 * Support building against macOS universal binaries by @clowder in #1166 * list specific versions in feature gates by @eeeebbbbrrrr in #1175 * Fix bug with converting a `pg_sys::Datum` into a `pgrx::Date` by @eeeebbbbrrrr in #1177 * Fix Arrays with leading nulls by @eeeebbbbrrrr in #1180 * Disable hello_versioned_so test by @workingjubilee in #1192 * doc: fix link broken by @yihong0618 in #1181 * fcinfo: fix incorrect length set in unsafe code by @Sasasu in #1190 * update to pg16beta2 support by @eeeebbbbrrrr in #1188 * Array-walking is aligned by @workingjubilee in #1191 * Implement PGRXSharedMemory for Deque by @feikesteenbergen in #1170 * Include security labels header by @daamien in #1189 * Fixes macos-11 tests by @BradyBonnette in #1197 * Pgcentralfoundation updates again by @eeeebbbbrrrr in #1200 * Update version to 0.10.0-beta.0 by @eeeebbbbrrrr in #1201 * Testing help by @eeeebbbbrrrr in #1203 * Type testability cleanup by @eeeebbbbrrrr in #1204 * Try to smartly propagate fs errors by @workingjubilee in #1186 * Fix issue #1209 by @eeeebbbbrrrr in #1210 * Type roundtrip tests by @eeeebbbbrrrr in #1185 * Update version to 0.10.0-beta.1 by @eeeebbbbrrrr in #1213 * Add a workaround for the pg16/homebrew/icu4c situation by @thomcc in #1206 * Add security policy by @johnrballard in #1207 * `AnyNumeric` is no longer backed by Postgres-allocated memory by @eeeebbbbrrrr in #1216 * Modularize pgrx::spi by @workingjubilee in #1219 * Stop SpiClient soundness from regressing by @workingjubilee in #1214 * Add foreign table headers by @workingjubilee in #1226 * Modularize the interior of pgrx-pg-sys by @workingjubilee in #1227 * Initial valgrind support by @thomcc in #1218 * Add support for handling SIGINT and SIGCHLD from bgworker by @JelteF in #1229 * Ignores UI tests for MUSL environments by @BradyBonnette in #1235 * Add a env flag that can be set to skip `#[pg_test]`-generated tests. by @thomcc in #1239 * Fix issue #1076: Properly handle dependency graph of `Result<T, _>` by @eeeebbbbrrrr in #1241 * Cleanup the error when cargo-pgrx version doesn't match Cargo.toml by @eeeebbbbrrrr in #1240 * Add operator and cache related api by @VoVAllen in #1242 * Addresses cargo-pgrx error reporting by @BradyBonnette in #1238 * Update version to 0.10.0-beta.2 by @eeeebbbbrrrr in #1244 * Bump cargo-metadata and clap-cargo by @thomcc in #1246 * Derive Clone for Inet by @JelteF in #1251 * Correct docs for datetime `From` impls by @workingjubilee in #1253 * Only enable line tables for profile.dev by @thomcc in #1249 * Remove references to master branch by @thomcc in #1243 * Ensure bindgen gets all the `cppflags` it needs (on macOS, anyway) by @thomcc in #1247 * update for pg16beta3 support by @eeeebbbbrrrr in #1254 * Update version to 0.10.0-beta.3 by @eeeebbbbrrrr in #1255 * Add proptest support by @workingjubilee in #1258 * Misc reformatting and typo fixes by @workingjubilee in #1260 * spi: simplify (optimize?) Datum preparation by @vrmiguel in #1256 * Assume commutation when deriving PostgresEq by @workingjubilee in #1261 * Demand Ord for PostgresOrd by @workingjubilee in #1262 * Fix pgrx install causing postgresql coredump by @Sasasu in #1263 * Update version to 0.10.0-beta.4 by @workingjubilee in #1267 ## New Contributors * @clowder made their first contribution in #1166 * @yihong0618 made their first contribution in #1181 * @Sasasu made their first contribution in #1190 * @daamien made their first contribution in #1189 * @johnrballard made their first contribution in #1207 * @VoVAllen made their first contribution in #1242 * @vrmiguel made their first contribution in #1256 **Full Changelog**: v0.9.8...v0.10.0
this is because pgrx will overwrite the .so file in place.
dlopen will use MAP_PRIVATE to open the file and map a read-only memory use mmap(2). usually this memory has copy-on-write. if others is modifying the file. the previously mapped memory should not change.
but there is a undefined behavior, from man mmap(2):
what actually happens is that the read-only memory in postgresql is modified, and all pointers in the .TEXT segment is mashed up. the call stack looks like
the is
pgrx_pg_sys
try to run the postfork hook. but the variableACTIVE_THREAD
and the code binary does not in the previous place.