-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Lock file RFC #307
RFC: Lock file RFC #307
Conversation
c7cc7f1
to
102d2a1
Compare
102d2a1
to
3ee3eb7
Compare
3ee3eb7
to
8eb4418
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for putting this together. This is a very well-written and detailed RFC.
I basically agree with everything you've written and think this gives Marker's infrastructure a good vision and clear direction. One edge case, which we should discuss as part of this RFC is the specification of lint crate via console arguments...
I would really like to keep the option to somehow specify lint crates via console arguments, as I believe Marker should also support use cases, that want to extract metadata for further processing. For example, Marker could be used to get the definition of structs to generate counterparts for different languages. In this case, a tool would invoke Marker with its own custom lint crate to extract the required information.
Here are my suggestions how to solve this:
- Add a
--no-lock-file
flag or something similar. This can run into the problems, this RFC is trying to solve, when running Marker using this flag. - Add a
--lock-file
flag, that can overwrite the usedMarker.lock
file. Tools would then deliver and use their ownMarker.lock
files. - Remove the option to specify lint crates via console arguments, but allow the specification of an alternative
Marker.toml
file via console arguments. Additionally, a new configuration to theMarker.toml
file is added to specify where the lock file is located (Probably relative to theMarker.toml
file). This configuration will be required if theMarker.toml
file was specified as a console argument.
From these options, I would favor option 3 as it seems to be the most resilient. This could be entirely implemented on the configuration and argument parsing front, which would mean the backend can handle both cases as one.
|
||
# Guide-level explanation | ||
|
||
## What is a lint crate? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand you correctly, that this is part of the proposal? Meaning, that the addition of the Marker.lock
file as proposed here also includes the migration to this type of lint crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, having the code organized similarly to this will be needed for the dispatch crate in the buildspace. We could definitely try to keep the current lint passes interface, but I decided to go with the yak shaving and polishing the things that I didn't like about the current API. I just bundled a bunch of my ideas for the "future Lint and LintPass API" look, make them all work together with the lock file and "dispatch" crate concept, and tried to express it in this prototype of the user guide. Although, some impl considerations also leak in this text, but that's just for the RFC.
When thinking about having a shared lock file I decided to actually use the crate that we have in the buildspace to combine all 3-rd party lint crates, and using inventory
to register lints and lint passes would really help with that. This would remove a bunch of boilerplate and establish a clean-code pattern for organizing lint implementations into modules. That will need to be documented in a book and showcased in the lint crate template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I like the idea, so if you believe that this could work, we should give it a shot :D
## Alternative. Not having `marker_api::lint_crate! {}` | ||
|
||
This macro adds the "dispatch" lint pass glue to the crate where it is defined and thus makes it possible to compile it in isolation via `cargo rustc --crate-type cdylib`. We could remove such requirement, and instead always compile lint crates with the `marker_rustc_driver` implementation where we could have special handling configured via an env var, for example. That special handling would implicitly add the "dispatch" lint pass to the crate's `lib.rs`. | ||
|
||
This won't be easy to implement, but we could look into that in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiling all lint crates with makrer_rustc_driver
could also give us a few other advantages. Specifically, I'm thinking of how useful it would be to use extern crate marker_api
with the same artifact that the driver was compiled with, instead of having it as a normal dependency. If the same artifact is used, we could remove all the ABI compatibility stuff and just call everything directly.
The big thing why I decided against this, was that I wanted the user to just run the normal cargo check
on lint crates, without the need to specify any weird parameters, like the marker_api
artifact.
For me, an important factor for such suggestions is, that a normal cargo check
still passes, even if it's not compiled with Marker's driver. This idea sounds like it would only add code, which might be doable from the driver. For now, it'll be sufficient to have the marker_api::lint_crate! {}
macro. I mean, we already require a similar macro as is. We can investigate such quality of life things in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, I'm thinking of how useful it would be to use extern crate marker_api with the same artifact that the driver was compiled with, instead of having it as a normal dependency. If the same artifact is used, we could remove all the ABI compatibility stuff and just call everything directly.
Actually, it's a bit of magic to me how linking to rustc private crates works, because Rust doesn't have a stable ABI, but it is still possible to slap an extern crate rustc_driver;
. I wonder how this ABI works when rustc_driver
uses its own tracing
crate, and we use our own, for example.
In any case, it works only with #![feature(rustc_private)]
. I don't think it's possible to make extern crate marker_api
work at all. If you know how to implement it, I'd be interested to hear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been some time, since I read into this, but here is the basic breakdown how I remember it:
rustc compiles normal library crates to .rlib
files. These store the types and functions defined in the crate and also define the ABI of them. This means that every crate using a specific rlib
file will have ABI for every type defined in the rlib
file.
The extern crate XYZ
item can be used to declare, that this crate depends on a crate called XYZ
. The interesting part is, that this way of declaring a dependency doesn't require the dependency to be specified inside Cargo.toml
. It basically just says that some rlib
file will be provided to the compiler/linker somehow. This allows the reuse of rlib
files, and therefore allow ABI compatibility.
The simplest example is probably Rust's std
crate. Every toolchain contains a precompiled version of the standard library. The prelude adds a extern crate std
item and the compiler discovers the rlib
file. This is the reason why you never need to compile the standard library locally.
Rustc provides rlib
files for its crates like rustc_hir
but restricts them to the #![feature(rustc_private)]
feature. This is how ABI compatibility is ensured.
Note: I'm not 100% sure if it's the rlib
file or another one, but I'm pretty sure that the general setup and idea is correct.
When I first read into this, there were two problems:
- Marker didn't have any precompiled binaries. The user would always build Marker locally. Storing the rlib file and validating that its the same, the driver was compiled with etc sounded complicated.
- The main reason why I decided against this, was that I wanted to allow users to check or compile lint crates with cargo, like any other crate. Having an
extern crate marker_api
requires that therlib
ofmarker_api
is always made available to the compiler, and I don't know how to permanently add it to therlib
s that rustc discovers. If we figure this out, it might be possible to switch over to this model. We might even be able to redesign a part of the API since crates could bind directly to the driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking a tiny bit more about rlib
s, as it would avoid a lot of boilerplate code and could make driver/adapter development easier. There are two things I considered:
- Since we provide precompiled builds of the driver or build the driver locally, we should have access to the
marker_api.rlib
file used for the driver. We could try to inject it, either in the folders rustc already searches forrlib
files or usingrustflags = ["--extern marker_api=path/to/libmarker_api.rlib"]
in the.cargo/config.toml
file. - We could differentiate between normal cargo and cargo-marker builds.
- During lint crate development, the user would use the
marker_api
version from the dependencies. For example, when runningcargo check
orcargo clippy
. - When running
cargo marker
orcargo maker test-setup
we control the compilation. We would rebuild the dynamic library and injectmarker_api.rlib
then, to make it ABI. This would be made even easier, by the change suggested in the RFC of combining all lint crates into one.
- During lint crate development, the user would use the
I'll think a bit more about this, but wanted to share the thought.
I think it will be quite inconvenient to use marker for that. I'd rather use If you want to have something like that I'd look into adding an option to dump the discovered metadata about the code in some structured format like JSON or protobuf or sqlite. For example, AST and semantic type info could be dumped in a Still, I think having a way to specify lint crates on the CLI is a good idea but for a different reason. I used this when trying marker for the first time. This allows one to just download marker and run it without modifying any files, which is a good first user experience and could be a nice short "Getting started" paragraph in the main README/book. However, having lint crates specified at the CLI level isn't something that I'd expect marker users to ever do except for the "first user experience and trial". This usage pattern isn't usual and as such will also have its own limitations. For the case of CLI-sepcified lints we will intentionally ignore creating the I'll update the RFC with a section for lint crates specified in the CLI params |
Regarding the metadata collection about the code. Check out the rustdoc JSON output format. There is a nice talk about this it here: https://youtu.be/OxQYyg_v3rw?si=pK9WagahfjJhLl0v. The limitation of rustdoc JSON though is that it provides only signature information and bodies aren't available, but that is already quite powerful |
I've started to look into how we could generate documentation for Marker's lints. Using rustdoc JSON was a thing I considered, but didn't investigate it further yet. I'll checkout the link, thank you for the suggestion :D |
Hey, @xFrednet I updated the RFC with the consideration for the CLI-level overrides. Hopefully, this is the last thing before we can merge this RFC. I re-considered your suggestion about I was stuck with various other ideas, designing is hard, and procrastination didn't make it quick, but here is my ultimate design. Hopefully it covers most of the questions at the high level. There are a lot of variables and unknowns here, so I'd rather merge the RFC as a good enough one, and continue iterating on the impl. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new flag, and believe that this is a good RFC to start the implementation on this feature. Thank you for putting this together.
I left a tiny bikeshed commend, but feel like we should merge this version. You can merge it, when you think it's ready.
rfc/0000-lock-file.md
Outdated
- `default/`: the default buildspace that persists the `Cargo.lock` as `Marker.lock` | ||
- `Cargo.toml`: contains a definition of the `buildspace` and the "dispatch" crate. The `[dependencies]` section includes only `marker_lints = "x.y.z"`. The `path` dependencies never get into the `dependencies` here. | ||
- `Cargo.lock`: same as `Marker.lock` | ||
- `ephemeral/`: ephemeral buildspace that doesn't persist any state outside of it. It's used when `--no-marker-lock` is specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bike shed:
I think temp
would be a better name here. Mostly because this is the first time, I've heard the word ephemeral in this context. Either way, this shouldn't block the progress of the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, renamed
Rendered.
This document includes my thinking on #251. The approaches that I considered all have their drawbacks, and the design is definitely not ideal. It's really not easy to design a solution for this, keep it obvious, extensible and performant.
The RFC format was inspired by Rust's RFC, but isn't as thorough as that one just yet. There are so many unknowns, that it's better to keep the RFC more vague and prototype.