Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

permit crate in absolute paths #45477

Closed
nikomatsakis opened this issue Oct 23, 2017 · 19 comments
Closed

permit crate in absolute paths #45477

nikomatsakis opened this issue Oct 23, 2017 · 19 comments
Assignees
Labels
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

As part of #44660, we plan to support absolute paths that begin with crate. The RFC was actually a bit vague about how crate::foo paths work outside of use items. I'm going to go with the interpretation described by @rpjohnst, that one would write ::crate::foo to reference the item foo from outside of a use statement.

This means we want to support use statements like this:

use crate::foo::bar; // equivalent to `use foo::bar` today.

and "absolute" code references like this:

fn baz() {
    ::crate::foo::bar() // equivalent to `::foo::bar` today
}

This version also avoids a parsing ambiguity around tuple structs and the crate visibility modifier:

struct Foo(crate ::crate::Bar)
@nikomatsakis nikomatsakis added E-needs-mentor T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 23, 2017
@power10dan
Copy link

Interesting. Mind if I give this a try? If yes, where should I start to investigate this?

@nikomatsakis
Copy link
Contributor Author

@power10dan

Mind if I give this a try?

I would love it if you did! I started the other day trying to write up mentoring instructions, but I never quite finished. This gist contains what I did so far. Maybe the best way to start is just to read a bit into some of the code mentioned there, to get a feeling for how things work today. You can reach out on gitter with questions, too.

I will try to extend the directions, or perhaps @petrochenkov has thoughts -- he knows this part of the code better than I.

@petrochenkov petrochenkov self-assigned this Oct 25, 2017
@power10dan
Copy link

Got it. It seems that @petrochenkov has self-assigned the issue to him. I did message him on gitter to ask whether he needed help. Nevertheless, I will go through the code base, and if there is anything I find interesting while working on this issue I will be sure to extend the directions.

@power10dan
Copy link

So @petrochenkov is not working on this issue. Can I get assigned to this issue please? I don't think I have permission to self-assign.

@power10dan
Copy link

Sorry for spamming. I think only repo members can assign assignees to the project. So I will just leave a comment that I will be working on this issue :)

@petrochenkov
Copy link
Contributor

@power10dan

only repo members can assign assignees to the project.

Moreover, only repo members can be assigned to the repo issues on GitHub, so I can't add you to the "Assignees" list either.

@power10dan
Copy link

power10dan commented Oct 28, 2017

Hi,

So what's the best way to test the parser that I wrote for this issue? I understand that there is a file called testparser.py within the project directory. However, how can I invoke the testparser.py to test specifically my changes? Do I need to augment testparser.py? Or do I have to write unit tests on a separate file, call the function that I wrote within the unit tests and test it against custom test cases?

@petrochenkov
Copy link
Contributor

@power10dan
You can freely ignore everything in src\grammar including testparser.py, it's not where actual parsing happens, the actual parser is src\libsyntax\parse\parser.rs.
Parser changes can be tested in the same way as other compiler changes, by adding a new test into src\test\parse-fail or src\test\run-pass and running x.py test --stage 1 src/test/parse-fail --test-args my-new-test-name. See https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md for more details.

@petrochenkov
Copy link
Contributor

petrochenkov commented Oct 28, 2017

Mentoring instructions from @nikomatsakis in #45477 (comment) seem to be much more complex than what's needed to do the job.

There are three places that need to be tweaked:

  • is_path_segment_keyword, crate does need to be added to that list
  • parse_path_common should avoid inserting CrateRoot into the path if the path starts with ::crate, and should report a non-fatal error if the path starts with crate (unless it's an import or visibility path).
  • resolve_path should treat Crate exactly as CrateRoot.
  • Extra: search keywords::DollarCrate and keywords::CrateRoot and look for other places that maybe need to be tweaked, e.g. pretty-printing.

Alternative implementation:

  • is_path_segment_keyword, crate does need to be added to that list
  • no changes in the parser, tweak default_to_global to still insert CrateRoot even for crate::... paths.
  • resolve_path should treat two-segment prefix [CrateRoot, Crate] exactly as CrateRoot.
  • Extra: same.

@power10dan
Copy link

@petrochenkov I also feel that the mentor instruction is a bit complex too. My thoughts for implementing this are the following : ( I am working on the use crate :: case)

  1. Use regex to see if the pattern starts with use followed by crate. For example, if the person typed use crate::foo::bar, a regex for this might be ^use\scrate*$, where we test if the expression starts with use followed by a space \s followed by crate and followed by whatever. This is my first thought, and I am not really sure if this is the correct way to go.

  2. For the absolute code reference case, we can come up with a regex to check if the expression ::crate::foo::bar is between a function, and then parse it.

I think my approach is a bit more complex than it needs to be. But I think I will take @petrochenkov 's advice. What do you guys think?

@petrochenkov
Copy link
Contributor

@power10dan
It's probably better to look at the parser in src\libsyntax\parse\parser.rs and try to figure out how it does things in general, in works in terms of tokens, not raw text and regexes.

@nikomatsakis
Copy link
Contributor Author

I agree that my instructions are complex =) they were mostly notes to myself, as I hadn't yet settled in my mind how I thought things should be implemented (at which point I would go back and simplify...). However, I was thinking about his over the weekend, and I thought we ought to clarify something: Precisely what are we aiming to accept?

Today, one can write self::foo and super::foo in either "absolute" or "relative" positions. I think the most naive reading of the RFC, which isn't particularly clear on this point from what I can see, is that crate would also work the same way. (In other words, use crate::foo and fn bar() { crate::foo() } would both be accepted.)

Unfortunately, that leads to a parsing conflict between "crate the visibility modifier" and "crate the absolute path" is unfortunate. This must be resolved somehow. I see the following options:

  • Arbitrarily decide in favor of "crate the visibility modifier".
  • Require a leading :: on crate, so that one writes fn bar() { ::crate::foo() }.

Earlier, I wrote that we ought to do adopt the second approach, and I still think it is the one I would prefer. But it does raise a question: do we also want to accept ::self::foo() and ::super::foo()? I would tend to think yes, since that enables the intuition that ::P works for any path P that can appear after in a use (i.e., ::P and use P name the same item).

I think some part of my hesitation in writing up the instructions was trying to decide how to resolve this discrepancy one way or the other. Whatever we wind up doing, I think that ultimately I think we will need some form of decision to clarify our intentions here.

@petrochenkov
Copy link
Contributor

@nikomatsakis
I expected requiring :: in ::crate::a::b to be just a conservative choice possibly relaxed later to support crate::a::b.

I'm pretty sure disambiguating struct S(crate :: a :: b) in favor of crate (::a::b) would be an incorrect choice.
First, paths in other contexts are always parsed greedily in case of ambiguities.
Second, imagine introducing super as a sugar for pub(super) (seems like a pretty realistic extension), then we will need to disambiguate struct S(super :: a :: b) as well, but it is already valid syntax interpreted as super::a::b, so crate should behave consistently with it.

::self::foo and ::super::foo are currently rejected in AST validation, but if we use the "filesystem" interpretation of paths, then ::self::foo == /./foo == /foo == ::foo != self::foo == ./foo.
We just not allow self/. as a non-starting segment in general (only sometimes in imports - use a::b::{self} == use /a/b/.).
Making ::self valid with the opposite meaning will probably only bring more confusion.

@power10dan
Copy link

power10dan commented Oct 31, 2017

Wait, so I am a bit confused.

Should we adopt the case where we allow ::crate::a ::b and disallow crate::a ::b until future release? Because if that's the case, then I shall modify my parser to look for :: followed by crate instead of the one I was aiming to implement, which is crate followed by ::.

Also for disambiguating for cases such as struct S(crate:: a:: b), my thoughts for implementing this part of the feature is that we should gobble up each word in the expression ( I think there is a built-in function for this), turn it into a token (like 'struct', 'S(crate::a ::b)') , and check if the token starts with "struct." If it is, then we know that we are dealing with structs with some expression S(crate:: foo:bazz). Then, we can just keep parsing the expression into tokens, and check if the inner path ( the expression surrounded by the parenthesis) is crate followed by ::. If it is, then we allow it since it's crate. Otherwise, if it is not, then we know that it isn't crate, and we disambiguate it.

I am not sure if this is the way to go, but this is my understanding of how to use tokens and parser.rs to parse the expression. I ran into similar problems that @nikomatsakis was talking
about in terms of disambiguation, and this is my solution to deal with it. Let me know what you guys think!

@nikomatsakis
Copy link
Contributor Author

@petrochenkov

I expected requiring :: in ::crate::a::b to be just a conservative choice possibly relaxed later to support crate::a::b.

This is also what I am advocating. I agree you make a good case for why crate :: a should be interpreted as a path and not a visibility modifier, but I guess that so long as we steer clear of the ambiguity, it's a moot point.

::self::foo and ::super::foo are currently rejected in AST validation

Right, and I guess we can just leave it that way for now. I would like to revisit this topic but -- as we've said earlier -- I'd rather do so after we've gained experience and can have more informed opinions.


@power10dan

Should we adopt the case where we allow ::crate::a::b and disallow crate::a::b until future release?

This is the plan, but not within a use -- only within a function body. In other words, we expect people to do things like this:

use crate::foo::bar;

fn baz() {
    bar(); // equivalent to next line
    ::crate::foo::bar();
}

Also for disambiguating for cases ...

The parser already has a notion of tokens and so forth -- it sounded to me from your description like you are "reinventing the wheel" a bit. That said, I've still got a few questions myself about the best way to implement this, let me try to write a second comment asking @petrochenkov their opinion, since they know this code better than I.

@nikomatsakis
Copy link
Contributor Author

@petrochenkov

There are three places that need to be tweaked:

Maybe worth spelling out a bit how you see crate::foo paths being represented. It seems like in your first alternative, you would represent a path like the one in this use:

use crate::foo::bar;

with an initial segment consisting of the crate keyword, and you would do the same for the path in this function body:

fn foo() {
    ::crate::foo::bar()
}

Is that correct? At least that is how I interpreted this:

parse_path_common should avoid inserting CrateRoot into the path if the path starts with ::crate

At first, I found that surprising, because we represent the paths use foo::bar and ::foo::bar identically today (i.e., via the call default_to_global()), and so I guess I had expected to store crate::foo::bar as ::crate::foo::bar in a view path. But I see no strong reason for it to be that way, I admit.

(We should also look forward a bit to a world where ::foo::bar represents a path relative to the crate foo -- which I also expect us to implement, albeit under a feature-gate that plays the role of an epoch, and hence determines how to interpret a path like ::foo::bar. It seems less obvious to me how we will handle this with the existing path-segment system. But maybe we should just "leave that for then".)

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Nov 1, 2017

@petrochenkov

Alternative implementation:

In this alternative version, I guess the major downside is that we don't have enough information -- if I understood -- to pretty-print faithfully, right? In particular, it sounds like you were suggesting that the parser would parse in ::crate::foo::bar but emit a path that is identical to ::foo::bar?

That said, I guess one could imagine things like is_global() etc checking for one of two sentinels -- either a leading :: (as today) or a leading crate keyword. This might help preserve the difference.

@petrochenkov
Copy link
Contributor

@nikomatsakis

In this alternative version...

See #45771, no pretty printing is affected.

(Sorry, @power10dan, it was better to just do the implementation than continuing discussion about details speculatively.)

bors added a commit that referenced this issue Nov 21, 2017
@petrochenkov
Copy link
Contributor

Seems like the practice is to close these "intermediate" issues and keep the base issue (#44660) open for tracking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants