ARROW-10804: [Rust] Removed some unsafe code from the parquet crate #8829
@jhorstmann and @Dandandan, I notice from your PRs that you are quite experienced with these types of problems. I am trying to remove this
def_levels_buf.as_mut().map(|ptr| ptr.to_slice_mut()),
rep_levels_buf.as_mut().map(|ptr| ptr.to_slice_mut()),
values_buf.to_slice_mut(),
def_levels,
The error originates from here; the compiler message shows that the value is moved at this point. A clone likely "solves" the issue? Usually, taking the value by `&` in the function also solves this, as it stops the function from taking ownership?
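To illustrate the two options mentioned above (cloning versus taking the argument by reference), here is a minimal sketch. The function names `sum_owned`, `sum_borrowed`, and `demo` are hypothetical and are not taken from the parquet crate; the sketch only shows why a by-value parameter triggers a "use of moved value" error and how borrowing avoids it.

```rust
/// Takes ownership of the buffer: after the call, the caller's
/// variable has been moved and can no longer be used.
fn sum_owned(values: Vec<i16>) -> i32 {
    values.iter().map(|&v| i32::from(v)).sum()
}

/// Borrows the buffer: the caller keeps ownership and can reuse it.
fn sum_borrowed(values: &[i16]) -> i32 {
    values.iter().map(|&v| i32::from(v)).sum()
}

/// Hypothetical driver showing both ways to keep `def_levels` usable.
fn demo() -> (i32, i32, usize) {
    let def_levels: Vec<i16> = vec![1, 0, 1, 1];

    // Option 1: pass a reference, so nothing is moved.
    let borrowed = sum_borrowed(&def_levels);

    // Option 2: clone, so only the copy is moved (costs an allocation).
    let cloned = sum_owned(def_levels.clone());

    // `def_levels` is still valid here. Calling `sum_owned(def_levels)`
    // before this line would fail to compile with error E0382.
    let len = def_levels.len();

    (borrowed, cloned, len)
}
```

Borrowing is usually preferable when the callee only needs to read the data, since a clone copies the whole buffer just to satisfy the borrow checker.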
I will have some time tomorrow to check out the code to see what can be done about it otherwise
@jorgecarleitao Did not find an easy fix yet. What works, though ugly, is using the
Thanks for the help. I tried that, but it is causing a double mutable borrow in my code. Maybe I am doing something wrong. Could you push the change to a branch in your repo and share it with me?
@Dandandan, your tip helped. Thanks a lot! It compiles now. The semantics are still wrong and I need to figure out why.
@jorgecarleitao great to hear! I saw it depends at least on changing the value of
Codecov Report
@@           Coverage Diff           @@
##           master    #8829   +/-   ##
=======================================
  Coverage   76.77%   76.77%
=======================================
  Files         181      181
  Lines       41009    40990    -19
=======================================
- Hits        31485    31472    -13
+ Misses       9524     9518     -6
@nevi-me, I would really appreciate your help here: there are major problems emerging in the parquet crate when I try to remove an unsafe struct. The background of this change is that This is blocking #8796 I verified that in all 4 tests the content passed to
Hey Jorge, I've been out of town, got back this morning. I'm going to catch up on PRs this weekend. I'll have a look at this first.
Hi @jorgecarleitao, I also spent most of today looking into this :( The problem was a bit vs byte issue on
This was hidden away by the way that we compare arrays. Because we only print the first 10 & last 10 values of arrays, the issue wasn't visible, but comparing I also changed the code slightly to rather initialise the buffers with
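As a generic illustration of the class of bug described above (confusing a length in bits with a length in bytes), here is a minimal sketch. It is not the actual fix from this PR; the helper names `bytes_for_bits`, `new_bitmap`, `set_bit`, and `get_bit` are hypothetical, and the least-significant-bit-first layout is an assumption chosen for the example.

```rust
/// Number of bytes needed to store `num_bits` packed bits (rounds up).
fn bytes_for_bits(num_bits: usize) -> usize {
    (num_bits + 7) / 8
}

/// Allocates a zeroed bitmap with one bit per value.
/// Sizing it with `num_values` bytes over-allocates by 8x, while
/// sizing it with `num_values / 8` truncates and can drop the last bits.
fn new_bitmap(num_values: usize) -> Vec<u8> {
    vec![0u8; bytes_for_bits(num_values)]
}

/// Sets bit `i`, counting from the least significant bit of each byte.
fn set_bit(bitmap: &mut [u8], i: usize) {
    bitmap[i / 8] |= 1u8 << (i % 8);
}

/// Reads bit `i` using the same layout as `set_bit`.
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] & (1u8 << (i % 8))) != 0
}
```

A mistake of this kind typically corrupts only part of the data, so it can go unnoticed when a comparison or printout inspects only the first and last few values.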
Some feedback
FWIW I am not sure but this PR may conflict with #8698
@alamb they fortunately don't seem to conflict
LGTM
I would like to propose that we outline and enforce guidelines on the arrow crate implementation with respect to the usage of `unsafe`.
The background of this proposal is PRs #8645 and #8829. In both cases, while addressing an unrelated issue, they hit undefined behavior (UB) due to an incorrect usage of `unsafe` in the code base. This UB was very time-consuming to identify and debug: combined, the two cases accounted for more than 12 hours of my time.
Safety against undefined behavior is the core premise of the Rust language. In many cases, the performance improvements from `unsafe` do not justify the maintenance burden (time to find and fix bugs) or the drop in motivation that comes with handling these bugs (they are painful because they are so difficult to debug). In particular, IMO those 12 hours would have been better spent on other parts of the code if `unsafe` had not been used in the first place, which would likely have translated into faster code or more features.
There are situations where `unsafe` is necessary, and the guidelines outline these cases. However, I also see many uses of `unsafe` that are neither necessary nor properly documented.
The goal of these guidelines is to motivate contributors to the Rust implementation to be conscious of the maintenance cost of `unsafe`, and to outline specific necessary conditions for any new `unsafe` to be introduced in the code base.
Closes #8901 from jorgecarleitao/arrow_unsafe
Lead-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This PR replaces unsafe code with its safe counterpart.
Closes apache#8829 from jorgecarleitao/remove_unsafe
Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Neville Dipale <nevilledips@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>