Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

salsa 3.0 #490

Merged
Merged

Conversation

nikomatsakis
Copy link
Member

This branch implements the Salsa 3.0 plan, at least in broad strokes. It's not quite ready to merge, needs more documentation, especially around the use of unsafe. Rather than write an extensive comment here, I'm going to push some documentation commits.

Right now, this doesn't change much except the
behavior in the event that `Eq` is not properly
implemented. In the future, it will enable
the use of references and slices and things.
This is a step towards the goal of keep a pointer
in the structs themselves.
There are 3 call-sites to this function:

* One of them has already marked the outputs
* One of them has no outputs
* The third does need to mark the outputs
In particular, the ingredient and the database
have the same lifetime.
This will be useful later for safety conditions.
The internal API is now based around providing
references to the `TrackedStructValue`.

Documenting the invariants led to one interesting
case, which is that we sometimes verify a tracked
struct as not having changed (and even create
`&`-ref to it!) but then re-execute the function
around it.

We now guarantee that, in this case, the data
does not change, even if it has leaked values.
This is required to ensure soundness.
Add a test case about it.
We need a cheap way to compute field indices.
Just use salsa::Id for the most part.
tracked structs with `'db` carry a pointer
and not an id.
is there a nicer way to do this?!
It will be shared between tracked structs and
interned structs.
Salsa struct is already a grab-bag, best to keep
it to shared functionality.
This will permit GATs so that interned values
can carry lifetimes.
also fix name of a fn in one case
We'll need these for use with tracked functions
Previously tracked structs relied on an
interned ingredient to intern their keys.
But really it has more complex logic than we
need. Simpler to just remove it and duplicate
the basic concept.
this will let us use different packages
but the same struct name from salsa struct
@nikomatsakis
Copy link
Member Author

nikomatsakis commented May 29, 2024 via email

In old code, we converted to a `&'db` when
creating a new tracked struct or interning,
but this value in fact persisted beyond the end
of `'db` (i.e., into the new revision).

We now refactor so that we create the `Foo<'db>`
from a `NonNull<T>` instead of a `&'db T`, and
then only create safe references when users
access fields.

This makes miri happy.
This...seems dated. We have `specify` which is
a more correct and principled version. Not sure
what `set` was meant to be but I don't see any
tests for it so...kill it.
tracked structs only support `'db` lifetimes
This was not obvious to me initially.
Still debating the best structure, so the contents
are rather scattershot.

I may have found a hole, but it's...obscure
and I'm comfortable with it for the time being,
though I think we want to close it eventually.
book/src/tutorial/ir.md Outdated Show resolved Hide resolved
book/src/tutorial/ir.md Outdated Show resolved Hide resolved
book/src/tutorial/ir.md Outdated Show resolved Hide resolved
Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very exciting. I've already ran into an issue once where I hold on to a tracked strut across db versions and Salsa then panicked at runtime. Catching this at compile time would be great. It should also allow us to implement garbage collection under safe assumptions.

book/src/overview.md Show resolved Hide resolved
components/salsa-2022-macros/src/db_lifetime.rs Outdated Show resolved Hide resolved
components/salsa-2022-macros/src/db_lifetime.rs Outdated Show resolved Hide resolved

```rust
#[salsa::tracked]
struct Ast {
struct Ast<'db> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the addition of the db lifetime also allow queries to return data that reference the DB?

One use case that we have is that we need a mapping from AstNode -> Id where Id for example uniquely identifie's a scope, or a symbol in the program.

The challenge we're facing is that our Ast doesn't use Arcs internally, thus cloning a Node always clones the entire sub-tree. Our "work-around" for this is to keep hold to the AST's root (wrapped in an Arc) and store a raw pointer referencing the actual node. This works pretty well, but requires heavy use of Arcs (a lot of cloning). I "think" your changes would allow us to directly store a &'db Expr instead.

If not, then the "work-around" would just be to make the AstNode -> Id map a salsa tracked so that we get access to the db lifetime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did intend to support storing &T references like that, but it's a subtle case, and I've gone back and forth on whether it works with the stacked borrows rules etc.

Suppose you do let f = ast.field(db) in R1 and it yields a &'db Foo (reference to some field of ast) and then you store that in the database as the result of a query (or part of the result). Now say we start a new revision R2 and, in R2, the value of field changes. This means that f (considered as a pointer) still points to the same memory, but the value behind f has changed. There are two challenges: (a) under the stacked borrow rules, it is UB to use f again; (b) should we consider functions that were dependent on f as needing to be re-executed?

I've tried to write an exploration of this question in this comment like 3 times but it keeps getting unwieldy. I think I will defer it for documentation or in-person discussion, it's a good one. I'm not entirely sure if and under what conditions this can be made to be safe at the moment. =)

That said, I also want to mention a feature I've been considering that I think is may help with your use case. The idea would be to make it easy to have a value that carries a memory arena and references into that arena. This is meant to model things like MIR, where we have some data structure that represents a function, and to allow it to go through phases where it is changed and updated, but without requiring everything to be in vectors nor requiring everything to be cloned constantly. I'm not sure the ergonomics exactly but the idea is roughly that you can declare a struct with two lifetimes...

#[in_arena(AstRoot)]
struct Ast<'ast, 'db: 'ast> {
   data: AstData<'ast, 'db>,
   children: Vec<&'ast Ast<'ast, 'db>>,
}

...and the procedural macro will create a type AstRoot that "hides" the first one:

struct AstRoot<'db> {
    arena: Arc<MemoryArena>,
    root: &'static Ast<'static 'static>, // <-- the lifetimes here are obviously lies
}

Later you can do root.open(|r| { .. }) to work with the data. One of the goals is that you can create new, derived values based on the same arena that have different pointers -- so e.g. it should be possible to extra subvalues from the tree. Each of them would carry a reference count to the same base arena.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I also want to mention a feature I've been considering that I think is may help with your use case. The idea would be to make it easy to have a value that carries a memory arena and references into that arena.

Yeah, that sounds very similar to our "work-around", except that it is more flexible and the unsafety is handled by salsa instead of sprinkled through our code.

book/src/plumbing/db_lifetime.md Outdated Show resolved Hide resolved
book/src/plumbing/db_lifetime.md Show resolved Hide resolved
They must have gotten it through salsa's internal mechanisms.
This is important because salsa will provide `&`-references to fields within that remain valid during a revision.
But at the start of a new revision salsa may opt to modify those fields or even free the allocation.
This is safe because users cannot have references to `ts` at the start of a new revision.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm not sure if that's mentioned above. But I think that's because the operations require a &'db mut Db

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Co-authored-by: Micha Reiser <micha@reiser.io>
Co-authored-by: Ryan Cumming <etaoins@gmail.com>
Co-authored-by: David Barsky <me@davidbarsky.com>
@nikomatsakis
Copy link
Member Author

I am debating about how to proceed here. It's taking me far longer to write these docs than I hoped. And they are complicated. I'm somewhat inclined to merge the PR so that people can start trying it out and then continue to write the safety docs in parallel.

But I'm also eager to not overlook the safety docs.

Thoughts from others?

@nikomatsakis
Copy link
Member Author

One thing I don't know 100% is how much the memory safety of the scheme relies on salsa correctly tracking dependencies. I'd prefer if it didn't, because I don't trust users not to poke holes in that system, but I feel like as I talked out the reasoning I found myself wanting to rely on "and we know that the fn will have been re-executed". But then I think I did add some judicious panics in there partly for this reason (i.e., to double check users and panic if they messed things up), I just don't know yet if that's enough.

@nikomatsakis
Copy link
Member Author

OK, I pushed an update to the safety documentation that I think covers all the key details in a readable way. It also identifies what I believe to be a flaw in the current setup -- I think it's possible for users to abuse salsa through leaked structs or nondeterminism and access freed memory. It's not easy to do, and I should make a test (will do later).

There are various ways to close this hole, and in particular I think one of the planned improvements I had in mind (adopting a sharded-slab like structure) would serve for it. But fundamentally we need some way to test if a pointer is still "in bounds".

I'm inclined to land the PR in its current state and work on those improvements as follow-up.

@MichaReiser
Copy link
Contributor

I'm somewhat inclined to merge the PR so that people can start trying it out and then continue to write the safety docs in parallel.

I don't expect to have time to try out the new branch before the end of next week, but having something to play with certainly is nice (although that's also possible by pointing cargo to this PR's revision).

I'm inclined to land the PR in its current state and work on those improvements as follow-up.

I'm supportive of this. I would find separate PR's useful for more focused discussions around specific improvements.

@nikomatsakis
Copy link
Member Author

OK, I'm going to land this change. We can discuss future developments on Zulip. One thing I plan to do is explore using sharded-slab in the implementation, which will make the safety concerns much simpler but (maybe, not clearly) cost a bit of performance. It'd be useful to have measurements at some point.

@nikomatsakis
Copy link
Member Author

That said, I do feel certain that the lifetimes and requirement to use the Update trait etc is good. It is a bit more restrictive but gives us a lot of room for future improvement.

@ematipico
Copy link

I think you should merge and publish this version.

I want to start experimenting, so I can help filing possible issues and help with the developments, if they are accepted.

@nikomatsakis
Copy link
Member Author

I'm going to land it. I'm not inclined to publish though I would be, I think, ok with something like 3.0-alpha or something

@nikomatsakis nikomatsakis added this pull request to the merge queue Jun 17, 2024
@github-merge-queue github-merge-queue bot merged commit 283ccda into salsa-rs:master Jun 17, 2024
10 checks passed
@Y-Nak Y-Nak mentioned this pull request Jun 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants