Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sqlx-cli): do not clean sqlx during prepare #2786

Merged
merged 2 commits into from
Aug 1, 2024

Conversation

cycraig
Copy link
Contributor

@cycraig cycraig commented Sep 27, 2023

Description

Fix cargo sqlx prepare --workspace running cargo clean -p sqlx unnecessarily.

See #2785

This should be a non-breaking change. It shouldn't affect in-workspace dependencies that happen to be named sqlx either.

How to test or review this PR

  • Checkout this branch and install sqlx-cli.
cargo install -f --path ./sqlx-cli
  • Run cargo sqlx prepare --workspace in a project with crates that use query! macros.

The timing difference is quite significant in a larger workspace, saves ~40 seconds on an incremental build.

BEFORE:

$ time cargo sqlx prepare --database-url "postgresql://postgres:password@localhost/example" --workspace -- --all-targets --all-features
    Cleaning [=============================] 1/1: sqlx, 0 files/folders cleaned 
...
    Finished dev [unoptimized + debuginfo] target(s) in 19.83s
query data written to .sqlx in the workspace root; please check this into version control
make sqlx-prepare  80.33s user 28.84s system 138% cpu 1:18.64 total

AFTER (this PR, note there is no cleaning step):

$ time cargo sqlx prepare --database-url "postgresql://postgres:password@localhost/example" --workspace -- --all-targets --all-features
...
    Finished dev [unoptimized + debuginfo] target(s) in 15.88s
query data written to .sqlx in the workspace root; please check this into version control
make sqlx-prepare  36.71s user 13.13s system 260% cpu 19.155 total

@cycraig
Copy link
Contributor Author

cycraig commented Sep 27, 2023

The unit test is failing because it checks that the sqlx dependency is marked to be cleaned. This seems unintentional, as the original PR for that test mentioned sqlx was intended to be ignored: #1802 (comment)

Edit: not sure how to update the test right now without losing coverage or adding +10,000 useless lines of Cargo.lock, may come back to this later.

@abonander
Copy link
Collaborator

The only reason we clean anything in the first place is it seems like the only surefire way to force a recompile so that the proc macros actually execute. If you have any thoughts there, we could get rid of the cleaning entirely.

@cycraig
Copy link
Contributor Author

cycraig commented Nov 21, 2023

Unfortunately it looks like cargo clean remains the only recommended way to force recompilation, so we're kind of stuck with it: rust-lang/cargo#1101 (comment)

Regarding avoiding cleaning dependencies unnecessarily, I'm still encountering problems with this patch due to feature unification.

E.g. pgvector-rust disables all default features of sqlx, so it shouldn't depend on sqlx-macros. However, when used in a workspace with macros enabled, e.g. sqlx = { version = "0.7", default-features = false, features = ["macros"] }, it will appear as a dependency for pgvector-rust.

cargo tree -i sqlx-macros - essentially what metadata.all_dependents_of does:

sqlx-macros v0.7.2 (proc-macro)
└── sqlx v0.7.2
    ├── example v0.1.0 (/tmp/example)
    └── pgvector v0.3.2
        └── example v0.1.0 (/tmp/example)

This is just expected behaviour; I think we would need a way to isolate the features/dependencies of a particular dependency to avoid it (assuming the clean behaviour is kept)?

Maybe a CLI argument could be added as a manual workaround, e.g. cargo sqlx prepare --no-clean pgvector,other? Not great though.

@abonander
Copy link
Collaborator

Would the fix maybe be just cleaning workspace crates? Or just the current crate?

@cycraig
Copy link
Contributor Author

cycraig commented Nov 21, 2023

Would the fix maybe be just cleaning workspace crates? Or just the current crate?

There may be out-of-workspace dependencies that contain query macros, so I don't think that would work in general.

sqlx-prepare only cleans dependencies for --workspace (to ensure all queries are generated and combined), only the current crate is touched otherwise. I guess technically that misses the case where prepare is run on a non-workspace crate that relies on a dependency with query macros, but it's probably uncommon.

@abonander
Copy link
Collaborator

Ignoring non-workspace crates seems fine to me. Those should be prepared separately.

@cycraig
Copy link
Contributor Author

cycraig commented Nov 28, 2023

Ignoring non-workspace crates seems fine to me. Those should be prepared separately.

Is that possible, to vendor prepared query macros from dependencies? How can query macros from a

  • local dependency (crate/workspace in the same repo or a git submodule, just not in the same workspace),
  • git dependency, or
  • registry dependency (from crates.io or other registry, least common),

be prepared to ensure they're valid against a different database?

I'm all for ignoring (not cleaning) non-workspace crates, but there do seem to be valid, if uncommon, cases where it may be needed. Maybe just hide the behaviour behind a coarse CLI flag? (As opposed to a fine-grained flag suggested in my previous comment).

E.g.

  1. (breaking) --clean-deps: change prepare to ignore cleaning non-workspace dependencies by default, unless this flag is specified.
  2. (non-breaking) --no-clean-deps: keep current clean behaviour in prepare, allow users to disable it with this flag.

In both cases, prepare would still touch files in the workspace to force recompilation.

@abonander
Copy link
Collaborator

@cycraig sorry for such a long delay on getting back to this, I tend to forget about PRs when they get pushed to the second page. Out of sight, out of mind and all that.

What do you think of this:

  • cargo sqlx prepare: clean all workspace crates that depend on sqlx
  • cargo sqlx prepare --all: clean all crates in graph that depend on sqlx

One thought is we could parse dep-info to know when a crate uses the macros. If you call env!() or option_env!() in a crate, it records in the dep-info that you have a dependency on that environment variable. We wouldn't even need to be able to parse the dep-info format in case it changes; we could just search for the environment variable name if it's unique enough. We could start with the crates that depend on sqlx and use this to filter.

It would be possible to trigger recompiles with this directly, e.g. by emitting something like option_env!("SQLX_RECOMPILE_TRIGGER") and setting that to a unique string every time cargo sqlx prepare is run. I remember trying that before but at some point I abandoned that approach, I think because it triggered a rebuild when the compiler was run without it and that got annoying. I also vaguely recall that it wasn't all that reliable.

@cycraig
Copy link
Contributor Author

cycraig commented Jun 29, 2024

What do you think of this:

  • cargo sqlx prepare: clean all workspace crates that depend on sqlx
  • cargo sqlx prepare --all: clean all crates in graph that depend on sqlx

Sure, that's option 1 from #2786 (comment). Rebased and updated the branch to add the --all option to sqlx prepare. I kept the commit to exclude sqlx from the clean even when using --all (which doesn't solve the pgvector clean, but it should still be an improvement for most users).

Edit: I may have misunderstood what you meant by --all, should it instead behave the same as --workspace but also clean dependencies? There are probably cases where having two separate flags would be better. E.g. several independent crates using sqlx macros in a large workspace that compile to different binaries, and are prepared separately. If any one of them needs query macros from an external dependency --all would force them to prepare the whole workspace.

Some points:

  1. Bikeshedding: --all is a bit non-specific, I would expect it to prepare all crates in the workspace like --workspaces does. Some alternatives:
    a. --clean
    b. --clean-deps or --clean-dependencies
    c. --all-deps or --all-dependencies
    d. --deps or --dependencies
    e. --external
    f. Keep --all
  2. --all only works in conjunction with --workspace currently. Without --workspace, prepare only touches the current crate's source files (existing behavior). Just needs some code shifting though, wanted confirmation before going further.
  3. Technically a CLI behavior breaking change. No longer matters since 0.8.0 is planned.

[...] emitting something like option_env!("SQLX_RECOMPILE_TRIGGER") [...] I remember trying that before but at some point I abandoned that approach, I think because it triggered a rebuild when the compiler was run without it and that got annoying. I also vaguely recall that it wasn't all that reliable.

Not sure if that has changed recently, but env! does seem to trigger a recompilation. At least with trivial code which did not work a few years ago: https://users.rust-lang.org/t/should-env-cause-rebuild-is-env-var-changes/18013

Would have to check it works in all cases like in macros, workspaces, and dependencies. Even then I'm unsure whether one can rely on the recompilation triggering, if it's not guaranteed anywhere it could regress in a future version.

I mostly see people using cargo::rerun-if-env-changed in build.rs for that, but asking everyone to copy-paste a build.rs script in every crate/workspace that uses macros wouldn't seem like an improvement.

It definitely triggers another recompilation when the environment variable is unset, and I think some developers may find that worse than sqlx prepare recompiling more than necessary once.

@abonander
Copy link
Collaborator

At the end of the day, my concern is that cargo sqlx prepare should do what most people are going to expect it to do.

From a Cargo perspective, I suppose cargo sqlx prepare should be expected to prepare only the current crate, and cargo sqlx prepare --workspace obviously would do workspace crates. In this way, the mental model could be that it casts the smallest net possible, and then there are flags to widen that net.

But that has a couple problems:

  • The constant friction from having to remember to specify --workspace in more complex projects.
  • The lack of a clear winner for a flag to say "prepare all crates in the graph that might need it".
    • You make a good point that --all is a little too ambiguous, but 5 alternatives and I'm not enamored with any of them except maybe --all-deps.

Instead, we could approach it from the other direction: cargo sqlx prepare casts the broadest net necessary by default, and then we have flags to narrow that scope:

  • cargo sqlx prepare re-runs the macros for all crates in the graph that last ran them in online mode (I have an idea of how to select for this; see below)
  • cargo sqlx prepare --workspace restricts to just workspace crates
    • For most use-cases, this case and the previous case are equivalent.
  • cargo sqlx prepare -p <pkgspec...> to prepare specific crates.

While thinking about this, it occurred to me, "well duh, why don't we use files?" We can create them in the macros if they don't exist and use include_bytes!() in the output (or proc_macro::tracked_path::path() if/when it's stable) to add them to the watch list.

We can get all the selectivity we really need with three include_bytes!() per macro invocation:

  • <target directory>/sqlx/recompile-all
    • Emitted by all macro invocations
  • <target directory>/sqlx/recompile-workspace
    • Emitted by macro invocations from workspace crates only
  • $OUT_DIR/sqlx-recompile
    • Unique path for every crate, but the file will only exist for ones that touch macros.

The contents of these files is irrelevant; I'd probably make it a timestamp. In release mode, they should be optimized out anyway.

What cargo sqlx prepare would then do is update the appropriate file(s) (change its contents and update its mtime) for a given selection:

  • cargo sqlx prepare would touch <target>/sqlx/recompile-all;
  • cargo sqlx prepare --workspace would touch <target>/sqlx/recompile-workspace;
  • cargo sqlx prepare -p <pkgspec>... would touch $OUT_DIR/sqlx-recompile for each matched package;

then run cargo check for the same specification. This should pretty reliably trigger recompilation, and only for the relevant crates. This would also keep incremental compilation artifacts which would otherwise be discarded with the cleaning approach.

What do you think?

@abonander
Copy link
Collaborator

@cycraig have you seen my last comment?

@cycraig
Copy link
Contributor Author

cycraig commented Jul 31, 2024

The constant friction from having to remember to specify --workspace in more complex projects.

The --workspace flag is usually added to a Makefile/CICD pipeline once and forgotten about, so I don't think it's a significant pain point. Such projects typically need -- --all-targets --all-features too, which would not change with the proposed solution.

why don't we use files?

I'm hesitant to encourage another filesystem-based solution given the friction some developers experience with non-standard directories, e.g. #1706, #2395. Just saying there will likely be some teething issues.

We can create them in the macros if they don't exist and use include_bytes!() in the output (or proc_macro::tracked_path::path() if/when it's stable) to add them to the watch list. [...] The contents of these files is irrelevant; I'd probably make it a timestamp. In release mode, they should be optimized out anyway.

It would be nice to benchmark the impact of include_bytes!(), if that's the only implementation option currently. The content may be negligible, but it would be nice to confirm the file handling and including bytes for hundreds to thousands of queries doesn't negatively impact developer experience.

It also sounds a bit like invoked.timestamp. Not sure if there's any extra insight to be gained from looking at how fingerprints work.

https://doc.rust-lang.org/nightly/nightly-rustc/cargo/core/compiler/fingerprint/index.html#fingerprint-files

An invoked.timestamp file whose filesystem mtime is updated every time the Unit is built. This is used for capturing the time when the build starts, to detect if files are changed in the middle of the build. See below for more details.


This should pretty reliably trigger recompilation, and only for the relevant crates. This would also keep incremental compilation artifacts which would otherwise be discarded with the cleaning approach.

Thoughts: avoiding cleaning dependencies was the initial issue in this PR, and doing that with no manual configuration is certainly attractive, assuming the approach works reliably. I'm just a bit risk averse to potentially introducing a "cure worse than the disease" for a speedup of cargo sqlx prepare.

Excluding sqlx from the clean step in sqlx prepare, as done in this PR, is an easy 1-line change that would benefit all sqlx users, with no drawbacks that I know of. That commit could be merged independently (removing the --all flag commit) while the more robust solution to avoid unnecessarily cleaning other dependencies that use sqlx (pgvector etc.) is implemented. Unfortunately I'm not currently in a position to contribute.

@abonander abonander merged commit 4acecfc into launchbadge:main Aug 1, 2024
52 of 64 checks passed
@abonander
Copy link
Collaborator

You make a good point. I've accepted the change as-is.

jrasanen pushed a commit to jrasanen/sqlx that referenced this pull request Oct 14, 2024
* fix(cli): do not clean sqlx during prepare

* feat(cli): only clean dependencies with new --all flag for prepare
jrasanen pushed a commit to jrasanen/sqlx that referenced this pull request Oct 14, 2024
* fix(cli): do not clean sqlx during prepare

* feat(cli): only clean dependencies with new --all flag for prepare
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants