Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Lock file RFC #307

Merged
merged 6 commits into from
Dec 31, 2023
Merged

RFC: Lock file RFC #307

merged 6 commits into from
Dec 31, 2023

Conversation

Veetaha
Copy link
Contributor

@Veetaha Veetaha commented Nov 5, 2023

Rendered.

This document includes my thinking on #251. The approaches that I considered all have their drawbacks, and the design is definitely not ideal. It's really not easy to design a solution for this, keep it obvious, extensible and performant.

The RFC format was inspired by Rust's RFC, but isn't as thorough as that one just yet. There are so many unknowns, that it's better to keep the RFC more vague and prototype.

@Veetaha Veetaha force-pushed the feat/lock-file-rfc branch 2 times, most recently from c7cc7f1 to 102d2a1 Compare November 5, 2023 17:37
@Veetaha Veetaha requested a review from xFrednet November 5, 2023 17:37
@Veetaha Veetaha force-pushed the feat/lock-file-rfc branch from 102d2a1 to 3ee3eb7 Compare November 5, 2023 17:43
@Veetaha Veetaha force-pushed the feat/lock-file-rfc branch from 3ee3eb7 to 8eb4418 Compare November 6, 2023 00:59
Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting this together. This is a very well-written and detailed RFC.

I basically agree with everything you've written and think this gives Marker's infrastructure a good vision and clear direction. One edge case, which we should discuss as part of this RFC is the specification of lint crate via console arguments...

I would really like to keep the option to somehow specify lint crates via console arguments, as I believe Marker should also support use cases, that want to extract metadata for further processing. For example, Marker could be used to get the definition of structs to generate counterparts for different languages. In this case, a tool would invoke Marker with its own custom lint crate to extract the required information.

Here are my suggestions how to solve this:

  1. Add a --no-lock-file flag or something similar. This can run into the problems, this RFC is trying to solve, when running Marker using this flag.
  2. Add a --lock-file flag, that can overwrite the used Marker.lock file. Tools would then deliver and use their own Marker.lock files.
  3. Remove the option to specify lint crates via console arguments, but allow the specification of an alternative Marker.toml file via console arguments. Additionally, a new configuration to the Marker.toml file is added to specify where the lock file is located (Probably relative to the Marker.toml file). This configuration will be required if the Marker.toml file was specified as a console argument.

From these options, I would favor option 3 as it seems to be the most resilient. This could be entirely implemented on the configuration and argument parsing front, which would mean the backend can handle both cases as one.


# Guide-level explanation

## What is a lint crate?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand you correctly, that this is part of the proposal? Meaning, that the addition of the Marker.lock file as proposed here also includes the migration to this type of lint crate?

Copy link
Contributor Author

@Veetaha Veetaha Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, having the code organized similarly to this will be needed for the dispatch crate in the buildspace. We could definitely try to keep the current lint passes interface, but I decided to go with the yak shaving and polishing the things that I didn't like about the current API. I just bundled a bunch of my ideas for the "future Lint and LintPass API" look, make them all work together with the lock file and "dispatch" crate concept, and tried to express it in this prototype of the user guide. Although, some impl considerations also leak in this text, but that's just for the RFC.

When thinking about having a shared lock file I decided to actually use the crate that we have in the buildspace to combine all 3-rd party lint crates, and using inventory to register lints and lint passes would really help with that. This would remove a bunch of boilerplate and establish a clean-code pattern for organizing lint implementations into modules. That will need to be documented in a book and showcased in the lint crate template.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I like the idea, so if you believe that this could work, we should give it a shot :D

rfc/0000-lock-file.md Show resolved Hide resolved
rfc/0000-lock-file.md Show resolved Hide resolved
Comment on lines +264 to +268
## Alternative. Not having `marker_api::lint_crate! {}`

This macro adds the "dispatch" lint pass glue to the crate where it is defined and thus makes it possible to compile it in isolation via `cargo rustc --crate-type cdylib`. We could remove such requirement, and instead always compile lint crates with the `marker_rustc_driver` implementation where we could have special handling configured via an env var, for example. That special handling would implicitly add the "dispatch" lint pass to the crate's `lib.rs`.

This won't be easy to implement, but we could look into that in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling all lint crates with makrer_rustc_driver could also give us a few other advantages. Specifically, I'm thinking of how useful it would be to use extern crate marker_api with the same artifact that the driver was compiled with, instead of having it as a normal dependency. If the same artifact is used, we could remove all the ABI compatibility stuff and just call everything directly.

The big thing why I decided against this, was that I wanted the user to just run the normal cargo check on lint crates, without the need to specify any weird parameters, like the marker_api artifact.

For me, an important factor for such suggestions is, that a normal cargo check still passes, even if it's not compiled with Marker's driver. This idea sounds like it would only add code, which might be doable from the driver. For now, it'll be sufficient to have the marker_api::lint_crate! {} macro. I mean, we already require a similar macro as is. We can investigate such quality of life things in the future.

Copy link
Contributor Author

@Veetaha Veetaha Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I'm thinking of how useful it would be to use extern crate marker_api with the same artifact that the driver was compiled with, instead of having it as a normal dependency. If the same artifact is used, we could remove all the ABI compatibility stuff and just call everything directly.

Actually, it's a bit of magic to me how linking to rustc private crates works, because Rust doesn't have a stable ABI, but it is still possible to slap an extern crate rustc_driver;. I wonder how this ABI works when rustc_driver uses its own tracing crate, and we use our own, for example.

In any case, it works only with #![feature(rustc_private)]. I don't think it's possible to make extern crate marker_api work at all. If you know how to implement it, I'd be interested to hear

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been some time, since I read into this, but here is the basic breakdown how I remember it:

rustc compiles normal library crates to .rlib files. These store the types and functions defined in the crate and also define the ABI of them. This means that every crate using a specific rlib file will have ABI for every type defined in the rlib file.

The extern crate XYZ item can be used to declare, that this crate depends on a crate called XYZ. The interesting part is, that this way of declaring a dependency doesn't require the dependency to be specified inside Cargo.toml. It basically just says that some rlib file will be provided to the compiler/linker somehow. This allows the reuse of rlib files, and therefore allow ABI compatibility.

The simplest example is probably Rust's std crate. Every toolchain contains a precompiled version of the standard library. The prelude adds a extern crate std item and the compiler discovers the rlib file. This is the reason why you never need to compile the standard library locally.

Rustc provides rlib files for its crates like rustc_hir but restricts them to the #![feature(rustc_private)] feature. This is how ABI compatibility is ensured.

Note: I'm not 100% sure if it's the rlib file or another one, but I'm pretty sure that the general setup and idea is correct.


When I first read into this, there were two problems:

  1. Marker didn't have any precompiled binaries. The user would always build Marker locally. Storing the rlib file and validating that its the same, the driver was compiled with etc sounded complicated.
  2. The main reason why I decided against this, was that I wanted to allow users to check or compile lint crates with cargo, like any other crate. Having an extern crate marker_api requires that the rlib of marker_api is always made available to the compiler, and I don't know how to permanently add it to the rlibs that rustc discovers. If we figure this out, it might be possible to switch over to this model. We might even be able to redesign a part of the API since crates could bind directly to the driver.

Copy link
Member

@xFrednet xFrednet Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking a tiny bit more about rlibs, as it would avoid a lot of boilerplate code and could make driver/adapter development easier. There are two things I considered:

  1. Since we provide precompiled builds of the driver or build the driver locally, we should have access to the marker_api.rlib file used for the driver. We could try to inject it, either in the folders rustc already searches for rlib files or using rustflags = ["--extern marker_api=path/to/libmarker_api.rlib"] in the .cargo/config.toml file.
  2. We could differentiate between normal cargo and cargo-marker builds.
    • During lint crate development, the user would use the marker_api version from the dependencies. For example, when running cargo check or cargo clippy.
    • When running cargo marker or cargo maker test-setup we control the compilation. We would rebuild the dynamic library and inject marker_api.rlib then, to make it ABI. This would be made even easier, by the change suggested in the RFC of combining all lint crates into one.

I'll think a bit more about this, but wanted to share the thought.

@Veetaha
Copy link
Contributor Author

Veetaha commented Nov 8, 2023

I would really like to keep the option to somehow specify lint crates via console arguments, as I believe Marker should also support use cases, that want to extract metadata for further processing.

I think it will be quite inconvenient to use marker for that. I'd rather use syn or rust analyzer's crates to parse the Rust files even if such a solution would work only at syntax level and provide no semantic info, because in most of the cases this is enough.

If you want to have something like that I'd look into adding an option to dump the discovered metadata about the code in some structured format like JSON or protobuf or sqlite. For example, AST and semantic type info could be dumped in a {crate_name}.json file per each workspace crate. Of course, there should be a bunch of filtering options to let users select a subset of such data if performance and file size starts to become a problem. Maybe even a graphql API could be used for that. I don't know this use case anywhere deep, but I hope this makes sense to you.


Still, I think having a way to specify lint crates on the CLI is a good idea but for a different reason. I used this when trying marker for the first time. This allows one to just download marker and run it without modifying any files, which is a good first user experience and could be a nice short "Getting started" paragraph in the main README/book.

However, having lint crates specified at the CLI level isn't something that I'd expect marker users to ever do except for the "first user experience and trial". This usage pattern isn't usual and as such will also have its own limitations. For the case of CLI-sepcified lints we will intentionally ignore creating the Marker.lock file. I'm not sure how this will look like in the implementation though. Maybe there should be a separate sub-directory for CLI-level lints because cargo always generates a lockfile and it would override the lockfile already present in the buildspace (if there is one) which we should avoid.

I'll update the RFC with a section for lint crates specified in the CLI params

@Veetaha
Copy link
Contributor Author

Veetaha commented Nov 17, 2023

Regarding the metadata collection about the code. Check out the rustdoc JSON output format. There is a nice talk about this it here: https://youtu.be/OxQYyg_v3rw?si=pK9WagahfjJhLl0v.

The limitation of rustdoc JSON though is that it provides only signature information and bodies aren't available, but that is already quite powerful

@xFrednet
Copy link
Member

I've started to look into how we could generate documentation for Marker's lints. Using rustdoc JSON was a thing I considered, but didn't investigate it further yet. I'll checkout the link, thank you for the suggestion :D

@Veetaha Veetaha mentioned this pull request Nov 25, 2023
@Veetaha
Copy link
Contributor Author

Veetaha commented Dec 23, 2023

Hey, @xFrednet I updated the RFC with the consideration for the CLI-level overrides. Hopefully, this is the last thing before we can merge this RFC.

I re-considered your suggestion about --no-lock-file, and adopted it into the design to make a consistent and explicit first-user experience where people may want to skip generating a lock file.

I was stuck with various other ideas, designing is hard, and procrastination didn't make it quick, but here is my ultimate design. Hopefully it covers most of the questions at the high level. There are a lot of variables and unknowns here, so I'd rather merge the RFC as a good enough one, and continue iterating on the impl.

@Veetaha Veetaha requested a review from xFrednet December 23, 2023 18:22
Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new flag, and believe that this is a good RFC to start the implementation on this feature. Thank you for putting this together.

I left a tiny bikeshed commend, but feel like we should merge this version. You can merge it, when you think it's ready.

- `default/`: the default buildspace that persists the `Cargo.lock` as `Marker.lock`
- `Cargo.toml`: contains a definition of the `buildspace` and the "dispatch" crate. The `[dependencies]` section includes only `marker_lints = "x.y.z"`. The `path` dependencies never get into the `dependencies` here.
- `Cargo.lock`: same as `Marker.lock`
- `ephemeral/`: ephemeral buildspace that doesn't persist any state outside of it. It's used when `--no-marker-lock` is specified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bike shed:

I think temp would be a better name here. Mostly because this is the first time, I've heard the word ephemeral in this context. Either way, this shouldn't block the progress of the RFC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, renamed

@xFrednet xFrednet changed the title Lock file RFC RFC: Lock file RFC Dec 26, 2023
@Veetaha Veetaha enabled auto-merge December 31, 2023 19:20
@Veetaha Veetaha added this pull request to the merge queue Dec 31, 2023
Merged via the queue into rust-marker:master with commit 28f1dcc Dec 31, 2023
26 checks passed
@Veetaha Veetaha deleted the feat/lock-file-rfc branch December 31, 2023 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants