New declarative macros, functions and fields not being recognized #91249
Comments
My recollection is that declarative macros use "def-site" hygiene; that is, identifiers defined in the macro are scoped to the macro and cannot be referenced outside the macro. Passing #![feature(decl_macro)]
pub macro generate_class_new($name: ident, $new: ident) {
pub struct $name {
x: i32
}
impl $name {
pub fn $new(x: i32) -> Self {
Self { x }
}
}
}
generate_class_new!(MyTestClass, new);
fn main() {
let instance = MyTestClass::new(3);
} |
It works fine with old declarative macros: https://rust.godbolt.org/z/35oYrPv7E and that's how they're often used, including std. |
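For comparison, here is a minimal sketch of the `macro_rules!` version on stable Rust. With `macro_rules!`, hygiene covers local variables and labels but not item or method names, so a `new` written in the macro body is reachable from the call site. The accessor `x()` is an illustrative addition, not part of the original program.

```rust
// Stable `macro_rules!` counterpart of the program under discussion.
macro_rules! generate_class_new {
    ($name:ident) => {
        pub struct $name {
            x: i32,
        }

        impl $name {
            // `new` is spelled inside the macro body, yet it stays callable
            // from the invocation site: `macro_rules!` applies no hygiene
            // to item or method names.
            pub fn new(x: i32) -> Self {
                Self { x }
            }

            // Illustrative accessor (an assumption, not in the original).
            pub fn x(&self) -> i32 {
                self.x
            }
        }
    };
}

generate_class_new!(MyTestClass);

fn main() {
    let instance = MyTestClass::new(3);
    println!("{}", instance.x());
}
```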
Yes, with macro_rules it's working as intended. But with the new macros 2.0 it doesn't. |
IIRC, part of the reason for adding macros 2.0 was to have def-site hygiene. But I'm not sure, so I'll ask on Zulip. |
Yeah, I wasn't really sure whether it's actually a bug or my mistake, but in my opinion it would be great if it worked, because I think many people, including me, use macros for things like that. Sure, you could use macro_rules in that specific case, but perhaps this could start a little discussion in this context. |
I started a thread on Zulip about this. |
I don't know if there is a better way to implement this, but you could put a $ in front of the identifier: if the identifier is not in the pattern, it has the same *value* as its name and has global scope. |
I'm pretty sure |
Perhaps metavariable expression syntax could be used? This would give us something like I do feel that while the limitations are a bit surprising coming from macros 1.0, they ultimately make hygiene more consistent. |
My impression is that one of the reasons macros 2.0 isn't stable is that their hygiene isn't finished yet. I think the current plan is to make def-site hygiene the default, and add some way of distinguishing when you want different hygiene. However, people haven't decided on what that way should be. Macros 2.0 haven't even been completely designed yet, let alone completely implemented, and it probably doesn't make sense to judge them on their current functionality. |
I don't think anybody intended to judge the macros while they are still unfinished; it is much more about thinking things through and looking for a good solution together. |
I think this issue is basically a question of “what does hygiene mean for a reference through a qualified name?” When a macro expands to a free function definition, (definition site) hygiene is straightforward: bindings defined by the macro are not visible at the use site unless the binding identifier was passed as an argument to the macro. For example, if we adjust the program from the start of this thread to #![feature(decl_macro)]
pub macro generate_struct_new($name: ident) {
pub struct $name {
x: i32
}
pub fn make_struct(x: i32) -> $name {
$name { x }
}
}
generate_struct_new!(MyTestStruct);
fn main() {
let instance = make_struct(3);
} then it should be utterly unsurprising that this program fails to compile under definition site hygiene. But when the declared identifier is a method, things become more complicated. Qualified names are built from multiple identifiers, so it isn’t immediately clear what effect hygiene should have on their scope. Personally, I think that
All this suggests to me that the current behavior is a bug, or at the very least a misfeature. The current behavior does not respect the lexical scope of the program, and therefore it does not respect hygiene. |
An aside on the philosophy of hygienic macros

To elaborate on my previous comment a little more, I think it’s helpful to make a distinction between scope of bindings and the names of exports. Macro hygiene is about the scope of bindings. The idea is that we intuitively think about variable scope lexically: variable uses reference the nearest enclosing definition with the same name, and both “nearest” and “enclosing” are words that describe the syntactic structure of the source program. When we add macros, we want that intuition to still apply, but that’s tricky, because macros can rearrange code in complex ways. “Hygiene” just refers to a system that tracks enough information so that the compiler can figure out which identifiers came from which scopes in the source program and treat them accordingly. Hygiene is not about renaming identifiers, and it’s not about introducing more scoping rules to the language; it’s really about trying to simplify scoping rules by ensuring that macros respect the usual rules of lexical scope. So hygiene is really just a means to an end, and that end is “well-behaved macros”.

Many implementations of hygiene operate by enriching the representation of identifiers. In a language without macros, an identifier is just a symbolic name (i.e. a string), but to preserve hygiene, we need to keep track of extra information about where the identifier came from in the source program. This is the identifier’s “lexical context”, and in Rust’s

But what about modules? Modules export a collection of names, and in a macro-enabled system, we must confront the question of whether modules’ exports correspond to identifiers, enriched with lexical context, or plain symbols, which are just strings. In both Racket and every Scheme implementation I’m aware of, the answer to that question is the latter: exported bindings are identified by plain symbols, not identifiers. 
This means that the rich, “three dimensional” scoping structure of identifiers is effectively “flattened” whenever identifiers cross a module boundary.

Implications

Racket employs the philosophy I outlined above uniformly. There is a firm separation between bindings, which are associated with identifiers, and exports, which are associated with symbolic names. This has a few implications:
All together, this set of decisions appears to result in a fairly predictable, intuitive model of cross-module hygiene.

What Rust does today

Currently, Rust does not take the approach I’ve described so far. Instead, module exports are rich identifiers, not flat symbols. But as I described in my previous comment, I don’t just think that’s a different design decision, I actually think it’s arguably wrong, because it creates situations that don’t respect the usual scoping rules that apply elsewhere in the language. I’d therefore propose that cross-module hygiene be revised to work more like it does in other systems, making exports flat symbols, not rich identifiers. However, I’m not sure what impact this would have on backwards compatibility, as it would be a fairly meaningful change in philosophy, and it’s currently unclear to me how much of this behavior affects |
How would the next example behave? macro define_hello() {
const hello: u32 = 0;
}
define_hello!();
const hello: u32 = 1;
mod inner {
const use_hello: u32 = super::hello;
} In Rust it's pretty deeply ingrained that items are "planted" (or "exported"/"provided" using the terminology above) into modules regardless of their visibility (i.e. |
Regarding this specific issue, it doesn't look like a bug in the current system used by Rust, where resolution of module-relative paths ( (Although implementation for the type-based resolution part in the compiler may be particularly underspecified and buggy compared to others.) |
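To make the macro-free baseline concrete, here is a small sketch on stable Rust of how module-relative paths already reach items regardless of their visibility keyword: a child module can name a private item of its parent through `super::`. The names are illustrative and upper-cased to follow Rust's constant naming convention; no macros or hygiene are involved.

```rust
// Private to the crate root: no `pub` here.
const HELLO: u32 = 1;

mod inner {
    // A child module may refer to private items of its ancestors,
    // so this path resolves even though `HELLO` is not `pub`.
    pub const USE_HELLO: u32 = super::HELLO;
}

fn main() {
    println!("{}", inner::USE_HELLO);
}
```

The open question in the thread is what the same `super::hello` path should resolve to once a macro-generated `hello` with a different hygiene context lives in the same module.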
That's because a crucial part of the macro 2.0 story - a hygiene opt-out (call-site hygiene for declarative macros) - is still unimplemented. Without it, all interactions with the call site have to be done through macro arguments, which is pretty inconvenient, no doubt. |
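The "through macro arguments" route can be sketched on stable Rust with `macro_rules!`, whose local variables are hygienic. This is only an analogy for the macros 2.0 situation, and `bind` and `bound_value` are hypothetical names: an identifier supplied by the caller keeps the caller's context, so the binding it introduces is visible at the call site.

```rust
macro_rules! bind {
    ($name:ident, $value:expr) => {
        // `$name` was written by the caller, so it carries the caller's
        // hygiene context and the binding is visible after the macro call.
        let $name = $value;
    };
}

fn bound_value() -> i32 {
    bind!(y, 5);
    y // resolves to the binding created through the macro argument
}

fn main() {
    println!("{}", bound_value());
}
```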
I assume you meant to include a But, as it happens, Racket actually does have a direct analogue of a “lexically nested module”. In Racket, if you declare a submodule using the Using #lang racket/base
(define-syntax-rule (define-hello)
(define hello 0))
(define-hello)
(define hello 1)
(module+ inner
(define use-hello hello)) This is a perfectly legal program, and However, even in a (module+ inner
(require (only-in (submod "..") hello))
(define use-hello hello)) which explicitly imports
The error is justified, as the enclosing module has no such export named All of this illustrates a pretty clear difference between Racket and Rust, namely that Racket has a very explicit distinction between module imports and variable references, whereas Rust has no such distinction. Rather, modules are never explicitly “opened”, so all bindings in all modules are always implicitly “in scope”, but explicit qualification may be needed to reference them, and some of them are illegal to access from certain contexts because of visibility rules. That difference is fine—obviously not all languages have to work like Racket or Scheme—it just means it’s that much more important to decide what hygiene and lexical scope mean in the context of Rust. Otherwise, after all, the behavior is arbitrary and impossible to really justify beyond “it does what it happens to be programmed to do”. In my opinion, the behavior of OP’s program seems pretty difficult to justify under any interpretation of lexical scope for the reasons I gave in my first comment—the difference compared to a nested module makes the current behavior pretty intensely suspect—but it’s a strong opinion, weakly held. :) I’m open to being convinced otherwise. However, I don’t find this argument convincing at all:
My quibble is that saying something is “affected by hygiene” doesn’t really mean anything, because again, the idea of hygiene is to respect the lexical structure of the program. The whole challenge of pinning down what hygiene means is deciding what the right scoping structure is supposed to be, then designing an algorithm that can match those intuitions. Uniformly hiding identifiers without some principled justification is, in my mind, not really any better than having no hygiene at all and allowing everything to be visible all the time (and in fact I think hiding too much is possibly even less useful than hiding nothing at all). So let’s ignore what Racket does for a moment—I don’t want to make it sound like I think Racket’s semantics are somehow the only right ones—and instead think about how scope works in Rust. For OP’s program, I already gave a Rust example without macros that illustrates how Rust’s scoping and visibility work, and how they’re inconsistent with the scoping of Let’s try and apply the same reasoning to your program: macro define_hello() {
const hello: u32 = 0;
}
define_hello!();
const hello: u32 = 1;
mod inner {
const use_hello: u32 = super::hello;
} I think, intuitively, Rust programmers would expect the use of One way to view this in the context of hygiene would be to say that references to enclosing modules actually are lexical references, and qualification just selects a scope to circumvent any shadowing. So far, this is consistent with what I said in my first comment:
So const hello: u32 = 0;
mod a {
pub macro get_hello() {
super::hello
}
}
mod b {
const hello: u32 = 1;
mod c {
fn foo() -> u32 {
super::super::a::get_hello!()
}
}
} Should Now, it’s totally possible that these rules don’t actually reconcile my previous comment suggesting that module exports be plain symbols. It’s possible that’s somehow incompatible with some scoping rules Rust has. But in that case, it’s still necessary to pin down what references to bindings in modules that don’t lexically enclose the current context means from the perspective of lexical scope, since otherwise, trying to come up with a hygiene system that respects it is hopeless. |
@lexi-lambda's points seem pretty convincing to me, though I'm by no means an expert on the issue. I also really appreciate the references to Racket, as it was my favorite language for many years, even if I never learned how to use it beyond a beginner level. Racket takes its macros about as seriously as Rust takes ownership and lifetimes. In any case, I'm really glad the lang team has been asked to take a look at this. ❤️ |
Might be just my misunderstanding (or I am missing the greater point), but should this compile at all regardless of
|
Yes, it should absolutely compile, as we are not defining the same symbol twice! Or, at least, we aren’t defining the same identifier twice. In a hygienic system, two definitions can coexist with the same symbolic name as long as they have different identifiers (which goes right back to my comment from above about the difference between “flat” symbols and “rich” identifiers). If that weren’t the case, then we’d have lost the key principle of hygienic macros, namely that macros respect lexical scope, since two bindings from two different lexical scopes would conflict with each other, forcing the programmer to worry about such details by manually generating fresh names. That’s not good! But I think your comment is good, because it highlights a crucial point that has not yet been explicitly made in this conversation: hygienic macros are fundamentally at odds with the notion of canonical paths without enriching paths with lexical context. If path segments are flat symbols, two different bindings could easily have the same canonical path. In my mind, there are really only two ways to resolve this tension:
I suspect that, to many people, option 2 sounds much more dire than option 1, but I actually think it’s the better tradeoff. And again, the justification ultimately comes down to stepping back and thinking about what hygiene means, because without doing that, it’s all hopeless. So let’s consider, for a moment, a program like this: fn define_hello() {
const hello: u32 = 0;
} Does this definition of To reiterate: programmers who have not spent much time working with truly hygienic macro systems have gotten used to reasoning about macros in terms of what they expand into, but this is really at odds with what hygiene is all about. Hygiene is about making it possible to reason locally about macros just by looking at the structure of the source code, the same way we reason about functions locally by looking at their source code and assuming they respect lexical scope. That goal must always be the ground truth when talking about hygiene—anything else puts the cart before the horse. |
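The idea that two bindings can share a symbolic name while remaining distinct identifiers can already be observed on stable Rust for local variables, where `macro_rules!` is hygienic. A minimal sketch, with made-up macro and function names:

```rust
macro_rules! define_local_x {
    () => {
        // This `x` belongs to the macro's own hygiene context.
        let x = 100;
        let _ = x;
    };
}

fn caller_x() -> i32 {
    let x = 1;
    define_local_x!();
    // The macro's `x` did not shadow ours: same symbol, different identifier.
    x
}

fn main() {
    println!("{}", caller_x());
}
```

This covers only local variables; whether items and path segments should behave the same way is exactly what the thread is debating.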
Hey, I'm pretty sure jseyfried (and reviewers) did think about what hygiene means when implementing the logic for multi-segment paths :) |
No, it does not. Note that the example I gave there is a function definition, not a module definition. The section of the Rust Reference I linked is quite unambiguous about this:
It would, in my mind, be quite a departure to say that local definitions inside macro bodies have canonical paths even though local definitions inside functions do not (and hygiene means we really want macros to be as “function-like” in their scoping rules as possible). |
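As a small illustration of the point about function-local definitions, the sketch below shows a constant declared inside a function coexisting with a module-level constant of the same name. The names are hypothetical and upper-cased to follow Rust's constant naming convention.

```rust
const HELLO: u32 = 1;

fn define_hello() -> u32 {
    // A local item: it shadows the module-level `HELLO` inside this body
    // only, and cannot be named from outside the function.
    const HELLO: u32 = 0;
    HELLO
}

fn main() {
    println!("{} {}", define_hello(), HELLO);
}
```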
Yep, you are right, I didn't notice it was a function. I still disagree that macros are supposed to be this close to functions; they are supposed to generate pieces of code inserted into arbitrary places after all, including items available to other modules (nested or not). |
Yes, that’s fair—and to be honest, after spending most of the past 6 hours thinking about this and trying out various examples… I think I’m wrong! Because sometimes you really do want such names to be kept distinct, like in a macro like this: macro define_struct($name: ident, $field_name: ident, $make_name: ident, { $($defn: item)* }) {
struct $name { $field_name: i32 }
impl $name {
pub fn secret() -> Self {
$name { $field_name: 42 }
}
$($defn)*
}
fn $make_name() -> $name {
$name::secret()
}
}
define_struct!(S, x, make_s, {
pub fn get_x(self) -> i32 {
self.x
}
});
fn main() {
println!("{}", make_s().get_x());
} In this case, input to the macro is intermingled with macro-generated definitions inside the So I’ll rescind my quibbles for now and think about things more—sorry about the noise. That said, I did find some genuine bugs in the current implementation during my experimentation. For example, I think this program does the wrong thing: mod m {
pub const x: i32 = 0;
}
macro mac_a($mac_b: ident) {
mod m {
pub const x: i32 = 1;
}
macro $mac_b() {
crate::m::x + m::x
}
}
mac_a!(mac_b);
fn main() {
println!("{}", mac_b!());
} I think it should print |
The general logic is that the identifier |
The current behavior, despite being annoying, seems to be the most consistent one: in Rust, all the identifiers / path segments are "rich" (to use @lexi-lambda's terminology) /
So:

The behavior observed is not a bug, even if cumbersome

The behavior observed is indeed cumbersome, and there should be a way to alleviate / palliate it

This is where "there must be a hygiene opt-out" mechanism for such macros becomes relevant (nowadays, I'd say that the opt-out is to use a FWIW, assuming the existence of match! (new) { ($new:ident) => (
pub macro generate_class_new($name: ident) {
pub struct $name {
x: i32
}
impl $name {
pub fn $new(x: i32) -> Self {
Self { x }
}
}
}
)} as a way to solve this identifier issue without polluting all the callsites, and without requiring any extra features. That being said, this one currently does not work (and this could be considered a bug!): Playground |
We discussed this in our lang team meeting today. I've added a reference to this issue from the tracking issue (#39412) and I'm going to remove the lang team nomination. We are not actively working on the design of hygiene for macros 2.0 right now. I think the general sense is that yes, we will need some form of way to have methods that are "publicly exported" from a macro, but the precise design of such a mechanism is precisely what the macros 2.0 work is all about! |
Damn, wish this was the default macro usage. Looking at: pub macro generate_class_new($name: ident) {
pub struct $name {
x: i32
}
impl $name {
pub fn new(x: i32) -> Self {
Self { x }
}
}
} Anyway, is there ever a logical reason to write |
There is for non-pub fn. Macro hygiene can't distinguish between pub and non-pub items, so pub items need to use the same hygiene. |
We could imagine something like: pub macro generate_class_new($name: ident) {
pub struct $name {
x: i32
}
impl $name {
pub /* for macro */ fn new(x: i32) -> Self {
Self { x }
}
}
pub macro $name() {
$name::new(42)
}
} which defines the struct Oftentimes #[doc(hidden)] /* Not part of the public API */
pub fn __new() which is less clean and actually brittle if multiple macros pick the same More generally, macros operate on tokens, not on semantics, so it would be a very surprising thing to have But I do agree that there are cases where we want to disregard hygiene, to opt-out of the stricted |
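The `#[doc(hidden)]` convention mentioned above can be sketched on stable Rust with `macro_rules!`. This only illustrates the naming convention; `generate_class`, `Widget`, `with_default` and `x` are hypothetical names. The helper is hidden from rustdoc and prefixed with `__`, but nothing prevents other code from calling it, which is why the approach is described as less clean.

```rust
macro_rules! generate_class {
    ($name:ident) => {
        pub struct $name {
            x: i32,
        }

        impl $name {
            // Hidden from generated docs and `__`-prefixed by convention;
            // still an ordinary public function as far as the compiler cares.
            #[doc(hidden)]
            pub fn __new(x: i32) -> Self {
                Self { x }
            }

            // Intended entry point that relies on the hidden helper.
            pub fn with_default() -> Self {
                Self::__new(42)
            }

            pub fn x(&self) -> i32 {
                self.x
            }
        }
    };
}

generate_class!(Widget);

fn main() {
    println!("{}", Widget::with_default().x());
}
```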
So what would be best venue to discuss EDIT: Although one question would be universal. What is the maximum granularity of macro hygiene bending?
|
Maybe we can just "reuse" Also errors can just say something like "Can't access non-exported item. Did you forget to add export?" when identifier is requested and present only in the generated code but not exported. pub macro generate_class_new($name: ident) {
pub struct $name {
x: i32
}
impl $name {
pub export fn new(x: i32) -> Self {
Self { x }
}
}
} |
First of all I really like the export idea and that seems intuitive to me 👍 But fwiw I would expect the so for me tldr; I prefer macro something() {
export pub fn hello_world() -> &'static str {
"hello world"
}
} over macro something() {
pub export fn hello_world() -> &'static str {
"hello world"
}
} PS: I am not really involved in compiler development, so I don't know if this is even still being worked on or whether something else entirely was decided; if so, just ignore my comments here :) PPS: If no development is happening on this anymore, and it is still probably a good idea that could be pursued and only needs people to work on it: I will probably have a lot of free time in the next few months and would like to learn a bit more about compiler internals in that time (with the goal of contributing), so I could take a look at this in the future. I don't want to tread on anyone's toes here though, so if someone knows more than me about the current state of things, it would be amazing to get a few pointers in the right direction :) |
Perhaps a method similar to #83527 would be more ideal, like macro something() {
pub fn ${export(hello_world)}() -> &'static str {
"hello world"
}
} Because this directly targets the ident |
I ran into some issues with rustc 1.58.0-nightly (b426445 2021-11-24).
I have the following code:
What I am trying to do here is create a macro that just creates a struct with a given identifier as its name.
When I call MyTestClass::new(3), the compiler complains that this function does not exist:
Running cargo expand produces code that should work in my opinion:
Not sure if I'm doing something wrong here, but it pretty much looks like a bug to me.