-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent SIGSEGV for non-strings on read string method #284
prevent SIGSEGV for non-strings on read string method #284
Conversation
What's wrong with the |
This is why I would like to not enforce Then, the LTS does not know about this Lint at all… so I had to find a workaround there. The crucial part of this PR are the spurious SIGSEGVs that sometimes occur. If I run it locally, I get these every 10-15 runs. If I run something like
I get
Do you or anyone know how it can pinpoint me to the problem (e.g. line of code)? |
I don't think it's worth the trouble. A couple of
Does that happen only in this PR, but not on |
I get this backtrace on every test, it might be a false positive:
|
Okay, agreed, but should I really remove
I just added a new check, so this should happen on master. This is why I am looking into this because we thought (on Discord) of releasing a new version 😄 . Strangely, similar code runs without SIGSEGVs in Geo Engine. I basically wanted to switch it to this code, once released. You've got a nicer backtrace for your memory checks, could you post your run command? |
…-gdal into no-segfault-mdarray-string
I've merged the other PR into this, so we can focus on the MD stuff. @lnicola You've got a nicer backtrace for your memory checks, could you post your run command? |
I used the same command as you (maybe with a different nightly, The point is, that's reported in GDAL itself, in Unfortunately, I'll be AFK for the next couple of days. EDIT: you can also try ASAN and Valgrind. In my experience, ASAN catches more of the "really bad stuff", Valgrind has some "false" positives too. I tried Valgrind on |
So, all I did add was an additional check that we only use read_as_string when it is actually a string. I've created the following script #!/bin/bash
export RUSTFLAGS=-Zsanitizer=address RUSTDOCFLAGS=-Zsanitizer=address
while [ $? -eq 0 ]; do
sleep 1
cargo +nightly-2022-08-17-x86_64-unknown-linux-gnu test -Zbuild-std --target x86_64-unknown-linux-gnu -- raster::mdarray::tests
done And this gets some SEGV after 10-30 calls:
I still have no clue how this can happen. And I don't know why you get nicer ASAN messages that point to some code 🤷 . |
I let your script run for 20 minutes (maybe around 600 runs) and it didn't fail. Maybe it's something specific to your setup (version of GDAL or other dependencies)? |
Maybe someone else can confirm that. Then, we can merge it. But I am still confused 😄. |
What's weird is that your PR branch also crashed once with SIGSEGV on CI, didn't it? |
Ah, yes, it does:
|
This is weird. I can still reproduce it after |
Well, I have no clue how this PR could lead to that problem. |
Yeah, I'm pretty sure that your PR only causes it because it calls the problematic function a second time. |
TIL I'm able to reproduce the crash in the following test: #[test]
fn test_read_string_array() {
for _ in 0..1000 {
let t1 = std::thread::spawn(|| {
let dataset_options = DatasetOptions {
open_flags: GdalOpenFlags::GDAL_OF_MULTIDIM_RASTER,
allowed_drivers: None,
open_options: None,
sibling_files: None,
};
let dataset =
Dataset::open_ex("fixtures/alldatatypes.nc", dataset_options).unwrap();
let root_group = dataset.root_group().unwrap();
let string_array = root_group
.open_md_array("string_var", CslStringList::new())
.unwrap();
});
let t = std::thread::spawn(|| {
let dataset_options = DatasetOptions {
open_flags: GdalOpenFlags::GDAL_OF_MULTIDIM_RASTER,
allowed_drivers: None,
open_options: None,
sibling_files: None,
};
let dataset =
Dataset::open_ex("fixtures/alldatatypes.nc", dataset_options).unwrap();
});
t.join().unwrap();
t1.join().unwrap();
}
} This might be some threading issue in GDAL & friends. |
@ChristianBeilschmidt if I'm not mistaken, making a copy of (it's fine to commit a duplicate file, git will store it only once) |
Wow, you really caught the problem! 👍 Kudos for that. That seems to do the trick for the CI, I just merged the tests: 6ad91ed But it's good that you opened that issue in the GDAL repo. This is really strange and concerning. |
Not that strange, given that the approach |
What about |
The problem with read as string is that we treat the result as string pointers and dereference them. This happens here: Line 284 in 6ad91ed
I don't see it happen in other places. There could be maybe another issue that you are referring to, but this PR fixes the string pointer issue. |
I'm not saying it doesn't, I'm only saying that the implementation is still unsafe. Try this under ASAN: let string_array = root_group
.open_md_array("uint_var", CslStringList::new())
.unwrap();
let values = string_array.read_as::<u8>(vec![0, 0], vec![1, 2]).unwrap(); This happens because |
Valid point, but I would look at this in a different PR: |
Sure, but let's wait until tomorrow in case anyone else wants to take a look. |
bors r+ |
Build succeeded: |
293: enforce that `GdalError` is `Send` r=jdroenner a=ChristianBeilschmidt - [X] I agree to follow the project's [code of conduct](https://github.com/georust/gdal/blob/master/CODE_OF_CONDUCT.md). - [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users. --- Before #284, `GdalError` was `Send`. The previous PR changed it unknowingly. This PR changes #284 so that `GdalError` is `Send` again. Moreover, it adds a test that ensures that. Co-authored-by: Christian Beilschmidt <christian.beilschmidt@geoengine.de>
CHANGES.md
if knowledge of this change could be valuable to users.I noticed that there is a segmentation fault when calling read_as_string for the MDArray if it is not of type string.
This is why I check for its datatype first.
Moreover, there are new Clippy lints that I had to consider.