Add lifetimes — refs into StrInput<'i> are bound by &'i #172
(I thought of just dropping |
This is exactly what I wanted for #141 after adding
Agreed. |
Likewise, while it's cute that |
469bd8d represents what it'd look like if we did drop |
Test added in 41224d0 which also demonstrates the reference returned from The same test refuses to compile on master:
error[E0597]: `input2` does not live long enough
--> src/inputs/string_input.rs:360:9
|
358 | unsafe { input2.slice(1, 3) }
| ------ borrow occurs here
359 |
360 | };
| ^ `input2` dropped here while still borrowed
...
363 | }
| - borrowed value needs to live until here |
I'm generally in favor of this change. I didn't begin with this because I was unsure what the final API would look like, and I was afraid it would be hindered by the huge number of lifetimes, but seeing it now, it doesn't seem bad at all. One big question remains, however. Should we drop |
Super good question! I'll give it a shot (might not be until tomorrow) so we can see how much simpler it'd end up in practice. |
@kivikakk Sounds great! Thank you so much for your time. I'm really excited for the future of the project. |
This looks really exciting! I've been thinking about this for a bit and I'm wondering what exactly the input for the parser should be. Here are the few solutions that I have in mind:
|
If the input is an iterator, can we generate the result slice from it (without overhead)? |
@sunng87, we could have a trait that extends |
I'm still not convinced that there is any strong use case of pest outside |
The more I think about this, the more I think just using |
The refactor is complete (226314f). I'll try to make codecov happy. |
@kivikakk, nice work! I'll review the PR then. |
All greens (finally!) 👍 |
Splendid work! 😃 Only a few papercuts in the code, but there seems to be a small performance regression against master in the `pest_grammar` `json` bench.
pest/examples/parens.rs
Outdated
|
||
use pest::inputs::{Input, Position}; | ||
use pest::inputs::Position; |
I guess it would be more appropriate to rename this to `input` now. Maybe even something else.
I'm not sure I see `Position` and `Span` as so related any more. The former does matching operations and transformations; the latter is more of a useful container. I think I'll pull them out of a module, as there's not really one concept that binds them together.
pest/src/inputs/mod.rs
Outdated
@@ -7,14 +7,8 @@ | |||
|
|||
//! A `mod` containing the `Input`-related constructs. |
This will need to be amended to its new name/purpose.
👍
pest/src/inputs/position.rs
Outdated
@@ -4,48 +4,44 @@ | |||
// This Source Code Form is subject to the terms of the Mozilla Public | |||
// License, v. 2.0. If a copy of the MPL was not distributed with this | |||
// file, You can obtain one at http://mozilla.org/MPL/2.0/. | |||
|
|||
#[allow(unused_imports)] use std::ascii::AsciiExt; |
I would remove the attribute until the next version of Rust is stable.
👍
pest/src/inputs/position.rs
Outdated
pub struct Position<'i> { | ||
input: &'i str, | ||
pos: usize, | ||
__phantom: ::std::marker::PhantomData<&'i str>, |
Is this still needed?
No.
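A short sketch of why the marker becomes redundant: once a struct stores a `&'i str` directly, the lifetime parameter is already used by a real field. The struct and function names below are illustrative assumptions, not pest's actual definitions.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Illustrative only: carries `'i` purely through a marker, which is
/// what a struct needs when no field mentions the lifetime.
struct WithMarker<'i> {
    pos: usize,
    _phantom: PhantomData<&'i str>,
}

/// Illustrative only: the `input` field already uses `'i`, so the
/// struct compiles without any marker.
struct WithField<'i> {
    input: &'i str,
    pos: usize,
}

/// Returns the text from `pos` onward, tied to the input's lifetime.
fn rest<'i>(p: &WithField<'i>) -> &'i str {
    &p.input[p.pos..]
}

fn main() {
    let text = "hello";
    let a = WithMarker { pos: 2, _phantom: PhantomData };
    let b = WithField { input: text, pos: 2 };
    // `PhantomData` is zero-sized, so it adds nothing to the layout.
    assert_eq!(size_of::<WithMarker<'static>>(), size_of::<usize>());
    println!("{} {}", a.pos, rest(&b));
}
```

Keeping the marker alongside a real `&'i str` field would still compile; it just has no effect.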
pest/src/inputs/position.rs
Outdated
use super::span; | ||
use super::super::util::hash_str; | ||
|
||
/// A `struct` containing a position that is tied to an `Input` which provides useful methods to |
All documentation mentioning `Input` should be fixed.
👍
pest/src/iterators/flat_pairs.rs
Outdated
start: usize, | ||
end: usize | ||
end: usize, | ||
__phantom: ::std::marker::PhantomData<&'i str> |
Probably not needed.
Not needed.
pest/src/iterators/pair.rs
Outdated
start: usize | ||
input: &'i str, | ||
start: usize, | ||
__phantom: ::std::marker::PhantomData<&'i str> |
Probably not needed.
Not needed.
pest/src/iterators/pairs.rs
Outdated
start: usize, | ||
end: usize | ||
end: usize, | ||
__phantom: ::std::marker::PhantomData<&'i str> |
Not needed.
Not needed.
pest/src/iterators/token_iterator.rs
Outdated
index: usize, | ||
start: usize, | ||
end: usize | ||
end: usize, | ||
__phantom: ::std::marker::PhantomData<&'i str> |
Not needed.
Not needed.
pest/src/prec_climber.rs
Outdated
@@ -146,7 +145,7 @@ impl<R: RuleType> PrecClimber<R> { | |||
/// let primary = |pair| { | |||
/// consume(pair, climber) | |||
/// }; | |||
/// let infix = |lhs: i32, op: Pair<Rule, StringInput>, rhs: i32| { | |||
/// let infix = |lhs: i32, op: Pair<Rule, StrInput>, rhs: i32| { |
This needs to use the new `'i` signature. The best solution would probably be to grep for `Input`.
👍
I do note the performance regression as well; this surprises me, particularly as the net effect is removing a layer of indirection! |
ashqjfkldechrtushxshuocrcsn;jh code coverage |
One thing I'm noticing, drilling down into this, is that The entire diff between master...kivikakk:lifetimes in the |
Oh, there's not a |
I'm pretty mystified by this! Any direction from anyone with a clue would be appreciated. |
The inlining seems particularly weird; if anything, inlining should be easier now that the API is simpler. One particular culprit could be the fact that, unlike In the meantime, I'll try to figure out whether the fat pointer is impacting performance in any possible way. |
I think that the key here would be to add a few microbenchmarks in |
While I haven't looked that much at Rust performance (I have done it a lot for C/C++) I'm not sure why |
@emoon |
Ah right. Still, I find it hard to think that is the issue here, but thanks for clarifying that :) |
Posted it on gitter but on my laptop with
so it might be worth trying the bench again, maybe it was a perf regression in the nightly used at the time to run the benches. |
@Keats Seems to be a regression in rustc. |
This is exciting news! 😄 |
I've dug a little bit. The performance regression of this PR seems to be caused by some extra heap allocations. Simplest way to see the difference is to revert to
vs.
The difference is small, but it's probably the main issue. By checking the code it's very hard to see exactly where this difference might be coming from, since there are no apparent extra allocations. A next step would probably be to try and run this experiment with all the debug information present, provided the results will show a similar difference. |
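One way to check for extra heap allocations without a profiler is to wrap the global allocator and count calls. This is a minimal sketch and not what was used in this thread; `Counting` and `count_allocations` are hypothetical names, and the workload is a stand-in for a real parser call.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Runs `f` and returns how many allocations happened meanwhile.
/// Allocations made by other threads in that window are counted too.
fn count_allocations<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = count_allocations(|| {
        // Stand-in workload; a real comparison would call the parser here.
        let s = String::from("hello world");
        black_box(&s);
    });
    println!("allocations: {}", n);
}
```

Running the same closure on both branches and comparing the counts would give a rough signal; a tool such as callgrind remains the better choice for locating where the allocations come from.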
Looking at callgrind, I get the following as heavy costs for the This basically just confirms what @kivikakk found that The total instruction fetch cost is roughly similar there. Running it for the benchmark only on strings:
#[bench]
fn string(b: &mut Bencher) {
    b.iter(|| {
        // parse_str -> parse on lifetimes branch
        JsonParser::parse_str(Rule::string, r#""hello world""#).unwrap().next().unwrap()
    });
}
All rules seem to allocate a bit more in the lifetimes branch except the |
Could the inlining differences be because a lot of code which used to be generic is now concrete? If I recall correctly, rustc will include the source for all generic code (stuff which was all parameterized around With this commit, all of that is now concrete, and only generic over a lifetime - which no longer requires source to be included. If this is the case, maybe the regressions could be fixed with |
We tried that yesterday, adding
|
I've managed to fix the performance issue. Big, big shoutout to @kivikakk for the amazing work done! 🎉 |
Here's a preliminary attempt at resolving #141. We bind the input `Input` by a lifetime; for `StrInput`, that's the lifetime of the string it references.
It works, happily! If you have `pair: Pair<'i, Rule, StrInput<'i>>`, then `pair.as_str()` correctly returns a `&'i str` (rather than `&str` bound by `pair`'s lifetime). There isn't actually a test case added yet that demonstrates this, but I'd add one if we were merging. (Right now testing against a local library that I'm using pest with.)
There's one problem I haven't resolved and without which this cannot be merged: what to do about `StringInput`? Right now I've added two hacky `transmute` calls just so it'd compile and I could get on with the work ([1], [2]), but this needs to be resolved as it's currently super-unsound.
Thoughts welcome! If this isn't the direction you'd like to go (SO MANY LIFETIME REFERENCES), that's of course understandable; I just wanted to give this a hack, and no hard feelings if you don't merge!
(Fixes #141. Closes #6.)
/cc @sunng87 @dragostis
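The difference between a slice bound to the input and one bound to the wrapper can be sketched in isolation. The `Wrapper` type and `middle` function below are hypothetical stand-ins, not pest's `Pair` or `StrInput`; only the lifetime pattern on `as_str` is the point.

```rust
/// Hypothetical stand-in for a type that borrows the parser input.
struct Wrapper<'i> {
    input: &'i str,
    start: usize,
    end: usize,
}

impl<'i> Wrapper<'i> {
    /// Returns a slice tied to the input's lifetime `'i`,
    /// not to the borrow of `self`.
    fn as_str(&self) -> &'i str {
        &self.input[self.start..self.end]
    }
}

/// The wrapper is dropped at the end of this function, but the
/// returned slice stays valid because it borrows from `input`.
fn middle(input: &str) -> &str {
    let wrapper = Wrapper { input, start: 1, end: 3 };
    wrapper.as_str()
}

fn main() {
    let input = String::from("abcd");
    println!("{}", middle(&input)); // prints "bc"
}
```

If `as_str` were declared as `fn as_str(&self) -> &str`, elision would tie the result to `self`, and `middle` would be rejected for returning a value that references the local `wrapper`.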