PartialEq implementation for RangeInclusive is unsound #67194
Comments
cc #45982 Lines 347 to 363 in 883b6aa
|
Interesting! Non-NLL version of the PoC:
```rust
use std::cell::RefCell;
use std::cmp::Ordering;

struct Evil<'a, 'b> {
    values: RefCell<Vec<&'a str>>,
    to_insert: &'b String,
}

impl<'a, 'b> PartialEq for Evil<'a, 'b> {
    fn eq(&self, _other: &Self) -> bool {
        true
    }
}

impl<'a> PartialOrd for Evil<'a, 'a> {
    fn partial_cmp(&self, _other: &Self) -> Option<Ordering> {
        self.values.borrow_mut().push(self.to_insert);
        None
    }
}

fn main() {
    let values;
    {
        let to_insert = String::from("Hello, world!");
        let e;
        {
            e = Evil {
                values: RefCell::new(Vec::new()),
                to_insert: &to_insert,
            };
            let range = &e..=&e;
            let _ = range == range;
        }
        values = e.values;
    }
    println!("{:?}", values.borrow());
}
```
|
Thanks jonas! Okay, it could be the code you linked, because it starts crashing at 1.29.0 and that matches the date of your code snippet. Thanks :) |
I'm almost sure this is due to specialization, because I actually did look in Rust source code for places with exploitable specialization by searching for |
If it helps at all, here's a completely self-contained version, with
#![feature(specialization)]
use core::cmp::Ordering;
use core::fmt;
use core::hash::{Hash, Hasher};
use core::ops::{Bound, Bound::Included, RangeBounds};
#[derive(Clone)]
pub struct MyRangeInclusive<Idx> {
pub(crate) start: Idx,
pub(crate) end: Idx,
pub(crate) is_empty: Option<bool>,
}
impl<T> RangeBounds<T> for MyRangeInclusive<T> {
fn start_bound(&self) -> Bound<&T> {
Included(&self.start)
}
fn end_bound(&self) -> Bound<&T> {
Included(&self.end)
}
}
trait MyRangeInclusiveEquality: Sized {
fn canonicalized_is_empty(range: &MyRangeInclusive<Self>) -> bool;
}
impl<T> MyRangeInclusiveEquality for T {
#[inline]
default fn canonicalized_is_empty(range: &MyRangeInclusive<Self>) -> bool {
range.is_empty.unwrap_or_default()
}
}
impl<T: PartialOrd> MyRangeInclusiveEquality for T {
#[inline]
fn canonicalized_is_empty(range: &MyRangeInclusive<Self>) -> bool {
range.is_empty()
}
}
// Make this one a `default fn`...
impl<Idx: PartialEq> PartialEq for MyRangeInclusive<Idx> {
#[inline]
default fn eq(&self, other: &Self) -> bool {
self.start == other.start
&& self.end == other.end
&& MyRangeInclusiveEquality::canonicalized_is_empty(self)
== MyRangeInclusiveEquality::canonicalized_is_empty(other)
}
}
// And add a second version (with an identical implementation) for PartialOrd.
// The presence of this second impl specifically is what makes it fail the borrow
// checker as opposed to compiling and printing invalid text (though I don't know why.)
impl<Idx: PartialOrd> PartialEq for MyRangeInclusive<Idx> {
#[inline]
fn eq(&self, other: &Self) -> bool {
self.start == other.start
&& self.end == other.end
&& MyRangeInclusiveEquality::canonicalized_is_empty(self)
== MyRangeInclusiveEquality::canonicalized_is_empty(other)
}
}
impl<Idx: Eq> Eq for MyRangeInclusive<Idx> {}
impl<Idx: Hash> Hash for MyRangeInclusive<Idx> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.start.hash(state);
self.end.hash(state);
MyRangeInclusiveEquality::canonicalized_is_empty(self).hash(state);
}
}
impl<Idx> MyRangeInclusive<Idx> {
#[inline]
pub const fn new(start: Idx, end: Idx) -> Self {
Self {
start,
end,
is_empty: None,
}
}
#[inline]
pub const fn start(&self) -> &Idx {
&self.start
}
#[inline]
pub const fn end(&self) -> &Idx {
&self.end
}
#[inline]
pub fn into_inner(self) -> (Idx, Idx) {
(self.start, self.end)
}
}
impl<Idx: fmt::Debug> fmt::Debug for MyRangeInclusive<Idx> {
fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
self.start.fmt(fmt)?;
write!(fmt, "..=")?;
self.end.fmt(fmt)?;
Ok(())
}
}
impl<Idx: PartialOrd<Idx>> MyRangeInclusive<Idx> {
pub fn contains<U>(&self, item: &U) -> bool
where
Idx: PartialOrd<U>,
U: ?Sized + PartialOrd<Idx>,
{
<Self as RangeBounds<Idx>>::contains(self, item)
}
#[inline]
pub fn is_empty(&self) -> bool {
self.is_empty.unwrap_or_else(|| !(self.start <= self.end))
}
#[inline]
pub(crate) fn compute_is_empty(&mut self) {
if self.is_empty.is_none() {
self.is_empty = Some(!(self.start <= self.end));
}
}
}
use core::cell::RefCell;
#[derive(Debug)]
struct Evil<'a, 'b> {
values: RefCell<Vec<&'a str>>,
to_insert: &'b String,
}
impl<'a, 'b> PartialEq for Evil<'a, 'b> {
fn eq(&self, _other: &Self) -> bool {
true
}
}
impl<'a> PartialOrd for Evil<'a, 'a> {
fn partial_cmp(&self, _other: &Self) -> Option<Ordering> {
self.values.borrow_mut().push(self.to_insert);
None
}
}
fn main() {
let e;
let values;
{
let to_insert = String::from("Hello, world!");
e = Evil {
values: RefCell::new(Vec::new()),
to_insert: &to_insert,
};
let range = MyRangeInclusive::new(&e, &e);
let _ = range == range;
values = e.values;
}
println!("{:?}", values.borrow());
}
Edit: added a better comment indicating what the specific code change is that makes it fail the borrow checker as opposed to UB-ing. |
The problem here clearly has to do with the known unsoundness related to specialization. I believe the core specialization is here:
Unfortunately, this specialization violates the always applicable test. In particular, the specializing impl (the second one) is not, well, always applicable. =) In particular the (Note that the extended version of specialization that @aturon described here would permit this specialization, but it would exclude the troublesome |
So... the specialization is unsound as implemented today, but could be made sound. This is probably true of a lot of specializations. |
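For readers following along, here is a minimal sketch of the mechanism the PoC relies on, written so that the borrow checker accepts it. `partial_cmp` is implemented only for `Evil<'a, 'a>`, and its side effect copies `to_insert` into `values`. The helper `collect_after_compare` is a hypothetical name introduced for illustration only. In this safe variant the two lifetimes really are equal, so nothing dangles; the unsoundness discussed above comes from the specialized impl being selected without that equality having been proven.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;

struct Evil<'a, 'b> {
    values: RefCell<Vec<&'a str>>,
    to_insert: &'b String,
}

impl<'a, 'b> PartialEq for Evil<'a, 'b> {
    fn eq(&self, _other: &Self) -> bool {
        true
    }
}

// Applies only when both lifetime parameters are the same lifetime.
impl<'a> PartialOrd for Evil<'a, 'a> {
    fn partial_cmp(&self, _other: &Self) -> Option<Ordering> {
        // Side effect: store a `&'b` reference where `&'a` is expected.
        // This is fine here precisely because the impl requires 'a == 'b.
        self.values.borrow_mut().push(self.to_insert);
        None
    }
}

// Hypothetical helper (not from the thread): calls `partial_cmp` directly,
// so the type checker must prove that the lifetimes are equal.
fn collect_after_compare(to_insert: &String) -> Vec<&str> {
    let e = Evil {
        values: RefCell::new(Vec::new()),
        to_insert,
    };
    let _ = e.partial_cmp(&e);
    e.values.into_inner()
}
```

Calling `collect_after_compare(&s)` returns a vector holding a borrow of `s`, and the returned borrow cannot outlive `s`. The PoC reaches the same `push` through `RangeInclusive`'s `==`, where no such lifetime check stands in the way.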
There's also this impl: Lines 5614 to 5616 in 2c796ee
|
We discussed this in today's @rust-lang/lang meeting on 2020-01-09. A few things:
* We should probably prepare a PR that comments out the unsound specialization.
* It'd be good to get an idea of the performance impact -- perhaps someone can figure out who added it and ping them? (Short on time right now or I'd do it myself)
* We might want to do a more general audit. In general, we'd be looking for specializations that are based on "do one thing for types that implement trait

Specializations that should generally be fine are those that pick out specific types:
```rust
impl<T> Foo for T { }
impl Foo for u32 { }
```
This suggests we may be able to recover performance here if we can replace these specialized impls with various special cases:
```rust
impl<Idx: PartialEq> PartialEq for MyRangeInclusive<Idx>
impl PartialEq for MyRangeInclusive<usize>
impl PartialEq for MyRangeInclusive<isize>
// etc
```
|
T-compiler triage: I don't think this is currently a T-compiler issue. (Obviously if we were to adopt changes to specialization semantics to accommodate this, then that would be T-compiler; but my reading of the comment thread here is that in the immediate term, the right answer is to remove the offending specialization, which I think is a T-libs matter, right?) |
T-compiler triage: P-high, removing nomination. readding T-libs tag. |
@xfix Okay I missed that, although to be fair that impl is really:
```rust
impl PartialEq<Evil<'c, 'd>> for Evil<'a, 'b>
where 'a = 'c, 'b = 'd
```
You could've written even @nikomatsakis Oh, right, no bound relying a public |
I wasn't aware of this kind of problem with specialization. I'm trying to add some more specializations to |
@eddyb right...we do have a plan to fix such cases, but it's not implemented or really universally agreed upon |
Another option is to use a private, blanket implemented marker subtrait ( You could also not blanket impl it, but it would then still be less impls in theory than concrete impls: O(PartialOrd types + Specialization impls) instead of O(PartialOrd types * Specialization impls) |
Just to keep people up to date: We (or I) tried removing the specialization in PR #68280. The CI run for that revealed a regression test failure: This particular specialization has semantic meaning. It is not just a performance optimization, as I understand it. I talked to @alexcrichton about this today, and they told me that the T-libs team would take charge on resolving this. |
@rust-lang/libs this is something we'll likely want to take a look at. There's a lot to digest here but the tl;dr is that there's unsound code via specialization in the standard library. A PR to remove the specialization is failing a regression test. This specialized impl accidentally isn't following our attempted rule of "no specialization if it changes behavior". My personal read on this is that we should simply ignore the failing test (comment it out) and land the soundness fix. However surprising the Do others object to landing #68280 after commenting out the test and perhaps getting a crater run to double-check it doesn't have a massive community impact? |
So that it's easier for people landing here -- the regression test in question is this one: rust/src/libcore/tests/iter.rs Lines 1986 to 1991 in f2e1357
AFAICT, it's testing that the RangeInclusive remains equivalent according to The problem seems to stem from the fact that The reason we need specialization, at least by default, is that you can construct a RangeInclusive from something that doesn't implement The specialization was added in #51622, primarily I think to resolve this discussion. |
I wonder if specialization could be specifically for This likely would block stabilization of |
To be honest, I'm trying to understand the actual goal here, in terms of how the T-libs team expects ranges to behave. If it suffices to implement the specialized version on a set of built-in integer types, and not all |
All tests in libcore do pass with the following diff. Unfortunately, it means that the Hash and PartialEq/Eq impls no longer satisfy the requirement given in https://doc.rust-lang.org/nightly/std/hash/trait.Hash.html#hash-and-eq. It is my opinion that it was almost certainly a mistake to not have a
diff --git a/src/libcore/ops/range.rs b/src/libcore/ops/range.rs
index d38b3516569..54672cd8bf6 100644
--- a/src/libcore/ops/range.rs
+++ b/src/libcore/ops/range.rs
@@ -349,32 +349,13 @@ pub struct RangeInclusive<Idx> {
// accept non-PartialOrd types, also we want the constructor to be const.
}
-trait RangeInclusiveEquality: Sized {
- fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool;
-}
-
-impl<T> RangeInclusiveEquality for T {
- #[inline]
- default fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool {
- range.is_empty.unwrap_or_default()
- }
-}
-
-impl<T: PartialOrd> RangeInclusiveEquality for T {
- #[inline]
- fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool {
- range.is_empty()
- }
-}
-
#[stable(feature = "inclusive_range", since = "1.26.0")]
impl<Idx: PartialEq> PartialEq for RangeInclusive<Idx> {
#[inline]
fn eq(&self, other: &Self) -> bool {
self.start == other.start
&& self.end == other.end
- && RangeInclusiveEquality::canonicalized_is_empty(self)
- == RangeInclusiveEquality::canonicalized_is_empty(other)
+ && ((self.start == self.end) == (other.start == other.end))
}
}
@@ -386,7 +367,6 @@ impl<Idx: Hash> Hash for RangeInclusive<Idx> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.start.hash(state);
self.end.hash(state);
- RangeInclusiveEquality::canonicalized_is_empty(self).hash(state);
}
}
|
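The `Hash`/`Eq` requirement referenced in the comment above (values that compare equal must hash equally) can be checked mechanically. Below is a small sketch under stated assumptions: `ModelRange` and `hash_of` are hypothetical names, the index type is fixed to `u32`, and the struct only models the three-field layout from the thread. It is not the standard library's implementation. The point is that `eq` and `hash` both go through the same canonicalized emptiness flag, so they cannot disagree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical model of the three-field range discussed in the thread.
struct ModelRange {
    start: u32,
    end: u32,
    is_empty: Option<bool>,
}

impl ModelRange {
    // Use the cached flag if present, otherwise compute it from the bounds.
    fn canonical_is_empty(&self) -> bool {
        self.is_empty.unwrap_or_else(|| !(self.start <= self.end))
    }
}

impl PartialEq for ModelRange {
    fn eq(&self, other: &Self) -> bool {
        self.start == other.start
            && self.end == other.end
            && self.canonical_is_empty() == other.canonical_is_empty()
    }
}

impl Eq for ModelRange {}

impl Hash for ModelRange {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the values that `eq` compares.
        self.start.hash(state);
        self.end.hash(state);
        self.canonical_is_empty().hash(state);
    }
}

// Hypothetical helper: hash a value with the standard default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With this model, a range whose flag is still `None` and one whose flag is `Some(false)` compare equal and hash equally, while an exhausted `5..=5` (flag `Some(true)`) stays distinct from a fresh one.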
Thinking about my impl more, it won't work -- we can't verify emptiness without PartialOrd/Step, as the three-field nature of RangeInclusive requires that all three fields are used to determine if we're empty (presuming we have not yet called next()). I believe the property we want is that if That approach essentially states that we only use the is_empty slot for cases where |
I think the following diff achieves that:
diff --git a/src/libcore/iter/range.rs b/src/libcore/iter/range.rs
index eac3c107d22..640ea417207 100644
--- a/src/libcore/iter/range.rs
+++ b/src/libcore/iter/range.rs
@@ -341,10 +341,10 @@ impl<A: Step> Iterator for ops::RangeInclusive<A> {
#[inline]
fn next(&mut self) -> Option<A> {
- self.compute_is_empty();
- if self.is_empty.unwrap_or_default() {
+ if self.is_empty() {
return None;
}
+ self.compute_is_empty();
let is_iterating = self.start < self.end;
self.is_empty = Some(!is_iterating);
Some(if is_iterating {
@@ -369,10 +369,10 @@ impl<A: Step> Iterator for ops::RangeInclusive<A> {
#[inline]
fn nth(&mut self, n: usize) -> Option<A> {
- self.compute_is_empty();
- if self.is_empty.unwrap_or_default() {
+ if self.is_empty() {
return None;
}
+ self.compute_is_empty();
if let Some(plus_n) = self.start.add_usize(n) {
use crate::cmp::Ordering::*;
@@ -402,11 +402,10 @@ impl<A: Step> Iterator for ops::RangeInclusive<A> {
F: FnMut(B, Self::Item) -> R,
R: Try<Ok = B>,
{
- self.compute_is_empty();
-
if self.is_empty() {
return Try::from_ok(init);
}
+ self.compute_is_empty();
let mut accum = init;
@@ -445,10 +444,10 @@ impl<A: Step> Iterator for ops::RangeInclusive<A> {
impl<A: Step> DoubleEndedIterator for ops::RangeInclusive<A> {
#[inline]
fn next_back(&mut self) -> Option<A> {
- self.compute_is_empty();
- if self.is_empty.unwrap_or_default() {
+ if self.is_empty() {
return None;
}
+ self.compute_is_empty();
let is_iterating = self.start < self.end;
self.is_empty = Some(!is_iterating);
Some(if is_iterating {
@@ -461,10 +460,10 @@ impl<A: Step> DoubleEndedIterator for ops::RangeInclusive<A> {
#[inline]
fn nth_back(&mut self, n: usize) -> Option<A> {
- self.compute_is_empty();
- if self.is_empty.unwrap_or_default() {
+ if self.is_empty() {
return None;
}
+ self.compute_is_empty();
if let Some(minus_n) = self.end.sub_usize(n) {
use crate::cmp::Ordering::*;
@@ -494,11 +493,10 @@ impl<A: Step> DoubleEndedIterator for ops::RangeInclusive<A> {
F: FnMut(B, Self::Item) -> R,
R: Try<Ok = B>,
{
- self.compute_is_empty();
-
if self.is_empty() {
return Try::from_ok(init);
}
+ self.compute_is_empty();
let mut accum = init;
diff --git a/src/libcore/ops/range.rs b/src/libcore/ops/range.rs
index d38b3516569..72d66698520 100644
--- a/src/libcore/ops/range.rs
+++ b/src/libcore/ops/range.rs
@@ -349,32 +349,11 @@ pub struct RangeInclusive<Idx> {
// accept non-PartialOrd types, also we want the constructor to be const.
}
-trait RangeInclusiveEquality: Sized {
- fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool;
-}
-
-impl<T> RangeInclusiveEquality for T {
- #[inline]
- default fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool {
- range.is_empty.unwrap_or_default()
- }
-}
-
-impl<T: PartialOrd> RangeInclusiveEquality for T {
- #[inline]
- fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool {
- range.is_empty()
- }
-}
-
#[stable(feature = "inclusive_range", since = "1.26.0")]
impl<Idx: PartialEq> PartialEq for RangeInclusive<Idx> {
#[inline]
fn eq(&self, other: &Self) -> bool {
- self.start == other.start
- && self.end == other.end
- && RangeInclusiveEquality::canonicalized_is_empty(self)
- == RangeInclusiveEquality::canonicalized_is_empty(other)
+ self.start == other.start && self.end == other.end && self.is_empty.unwrap_or_default()
}
}
@@ -386,7 +365,7 @@ impl<Idx: Hash> Hash for RangeInclusive<Idx> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.start.hash(state);
self.end.hash(state);
- RangeInclusiveEquality::canonicalized_is_empty(self).hash(state);
+ self.is_empty.unwrap_or_default().hash(state);
}
}
Every use of This feels like creating a footgun for the future, but should resolve the immediate issue. (It is, however, untested, because I got a weird ICE when building stage1 while trying to test it.) Ultimately, perhaps that means changing the variable to |
What if we change it back to a two-field struct (rust-lang/rfcs#1980) 🤷? |
Let me explain what I mean by saying that the specialization is only needed for integer types.
```rust
pub struct RangeInclusive<Idx> {
    pub(crate) start: Idx,
    pub(crate) end: Idx,
    pub(crate) is_empty: Option<bool>,
    // This field is:
    //  - `None` when next() or next_back() was never called
    //  - `Some(false)` when `start <= end` assuming no overflow
    //  - `Some(true)` otherwise
    // The field cannot be a simple `bool` because the `..=` constructor can
    // accept non-PartialOrd types, also we want the constructor to be const.
}
```
The field
```rust
impl<A: Step> Iterator for ops::RangeInclusive<A>
```
This would pretty much prevent the stabilization of
Alternatively, it's possible to modify
Another way out is to revert to a two-field struct. Then we would avoid the need to do a lazy computation of whether the iterator is empty or not.
So, the fix essentially would be to do this (and of course, you need to import
```diff
-impl<T: PartialOrd> RangeInclusiveEquality for T {
+impl<T: Step> RangeInclusiveEquality for T {
     #[inline]
     fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool {
         range.is_empty()
     }
 }
```
It doesn't fix the issue fully (you can still have incorrect
Also, we can simplify
```rust
impl<T> RangeInclusiveEquality for T {
    #[inline]
    default fn canonicalized_is_empty(range: &RangeInclusive<Self>) -> bool {
        false // the exact value doesn't matter when range is not an iterator
    }
}
```
Easy performance improvement in my opinion. You can even do some more advanced stuff to not even have a pointless constant value to skip stuff like hashing the value returned from this function. Maybe
It kinda could be cool to not have to have |
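To make the three-state `is_empty` field from the comment above concrete, here is a small model of the lazy computation. It is a sketch only: `LazyRange` is a hypothetical type fixed to `u32`, and its `next` mirrors the shape of the iterator code quoted in the diffs in this thread rather than the real `Step`-based implementation.

```rust
// Hypothetical model of a three-field inclusive range over u32.
#[derive(Clone, Debug)]
struct LazyRange {
    start: u32,
    end: u32,
    // None: never iterated; Some(false): known non-empty; Some(true): empty.
    is_empty: Option<bool>,
}

impl LazyRange {
    // The constructor stays `const` because it never compares the bounds.
    const fn new(start: u32, end: u32) -> Self {
        LazyRange { start, end, is_empty: None }
    }

    // Use the cached flag if set, otherwise compute emptiness from the bounds.
    fn is_empty(&self) -> bool {
        self.is_empty.unwrap_or_else(|| !(self.start <= self.end))
    }
}

impl Iterator for LazyRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.is_empty() {
            return None;
        }
        // Yielding the last element cannot advance `start` past `end`
        // (that could overflow), so exhaustion is recorded in the flag.
        let is_iterating = self.start < self.end;
        self.is_empty = Some(!is_iterating);
        Some(if is_iterating {
            let current = self.start;
            self.start += 1;
            current
        } else {
            self.start
        })
    }
}
```

Iterating `LazyRange::new(1, 2)` to the end leaves `start == end == 2` with the flag set to `Some(true)`, whereas a fresh `LazyRange::new(2, 2)` has the same bounds but is not empty. That is why the flag has to take part in equality and hashing.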
I think based on the discussion so far replacing the specialization to be bounded by a new trait that is only implemented for the integer primitives makes sense to me. I think that gets us soundness and is (per what you've said) not breaking for essentially anyone today. It's only breaking if there are (unstable) impls of Step in the wild where the code also wants a |
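A rough sketch of what "bounded by a new trait that is only implemented for the integer primitives" could look like, written for stable Rust without specialization. The names `IntegerIdx` and `impl_integer_idx!` are assumptions made up for this illustration, and the sketch drops the generic fallback impl entirely (stable Rust cannot express both), so it shows only the bound, not the fix that actually landed.

```rust
// Hypothetical marker trait: implemented only for the integer primitives.
trait IntegerIdx: PartialOrd {}

macro_rules! impl_integer_idx {
    ($($t:ty)*) => { $(impl IntegerIdx for $t {})* };
}
impl_integer_idx!(u8 u16 u32 u64 u128 usize i8 i16 i32 i64 i128 isize);

struct MyRangeInclusive<Idx> {
    start: Idx,
    end: Idx,
    is_empty: Option<bool>,
}

impl<Idx: IntegerIdx> MyRangeInclusive<Idx> {
    // Comparing the bounds is only reachable for the listed integer types.
    fn canonicalized_is_empty(&self) -> bool {
        self.is_empty.unwrap_or_else(|| !(self.start <= self.end))
    }
}

impl<Idx: IntegerIdx> PartialEq for MyRangeInclusive<Idx> {
    fn eq(&self, other: &Self) -> bool {
        self.start == other.start
            && self.end == other.end
            && self.canonicalized_is_empty() == other.canonicalized_is_empty()
    }
}
```

Because user types cannot implement the private marker, a type like `Evil` with a side-effecting, lifetime-restricted `PartialOrd` impl never reaches the `start <= end` comparison through this path.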
And there's an open PR to redesign the |
PR #68358 has been merged. Is this issue fixed? |
cc @matthewjasper on that topic |
Also note that we're landing #68835 shortly which fixes the Range inclusive Hash and Eq impls to be the same. |
Implement a feature for a sound specialization subset

This implements a new feature (`min_specialization`) that restricts specialization to a subset that is reasonable for the standard library to use. The plan is to then:

* Update `libcore` and `liballoc` to compile with `min_specialization`.
* Add a lint to forbid use of `feature(specialization)` (and other unsound, type system extending features) in the standard library.
* Fix the soundness issues around `specialization`.
* Remove `min_specialization`

The rest of this is an overview from a comment in this PR

## Basic approach

To enforce this requirement on specializations we take the following approach:

1. Match up the substs for `impl2` so that the implemented trait and self-type match those for `impl1`.
2. Check for any direct use of `'static` in the substs of `impl2`.
3. Check that all of the generic parameters of `impl1` occur at most once in the *unconstrained* substs for `impl2`. A parameter is constrained if its value is completely determined by an associated type projection predicate.
4. Check that all predicates on `impl1` also exist on `impl2` (after matching substs).

## Example

Suppose we have the following always applicable impl:

```rust
impl<T> SpecExtend<T> for std::vec::IntoIter<T> { /* specialized impl */ }
impl<T, I: Iterator<Item=T>> SpecExtend<T> for I { /* default impl */ }
```

We get that the subst for `impl2` are `[T, std::vec::IntoIter<T>]`. `T` is constrained to be `<I as Iterator>::Item`, so we check only `std::vec::IntoIter<T>` for repeated parameters, which it doesn't have. The predicates of `impl1` are only `T: Sized`, which is also a predicate of `impl2`. So this specialization is sound.

## Extensions

Unfortunately not all specializations in the standard library are allowed by this. So there are two extensions to these rules that allow specializing on some traits.

### rustc_specialization_trait

If a trait is always applicable, then it's sound to specialize on it. We check that a trait is always applicable in the same way as impls, except that step 4 is now "all predicates on `impl1` are always applicable". We require that `specialization` or `min_specialization` is enabled to implement these traits.

### rustc_specialization_marker

There are also some specializations on traits with no methods, including the `FusedIterator` trait which is advertised as allowing optimizations. We allow marking marker traits with an unstable attribute that means we ignore them in point 3 of the checks above. This is unsound but we allow it in the short term because it can't cause use after frees with purely safe code in the same way as specializing on trait methods can.

r? @nikomatsakis cc #31844 #67194
See rust-lang/rust#67194 for details
The following code causes UB in stable without `unsafe` (as can be seen on playground). Effects of executing this code are random, for instance, I once got this.