salsa 3.0 #490

nikomatsakis · 2024-05-25T20:07:52Z

This branch implements the Salsa 3.0 plan, at least in broad strokes. It's not quite ready to merge: it needs more documentation, especially around the use of `unsafe`. Rather than write an extensive comment here, I'm going to push some documentation commits.

Right now, this doesn't change much except the behavior in the event that `Eq` is not properly implemented. In the future, it will enable the use of references and slices and things.

This is a step towards the goal of keeping a pointer in the structs themselves.

There are 3 call-sites to this function:

* One of them has already marked the outputs
* One of them has no outputs
* The third does need to mark the outputs

In particular, the ingredient and the database have the same lifetime. This will be useful later for safety conditions.

The internal API is now based around providing references to the `TrackedStructValue`. Documenting the invariants led to one interesting case, which is that we sometimes verify a tracked struct as not having changed (and even create `&`-ref to it!) but then re-execute the function around it. We now guarantee that, in this case, the data does not change, even if it has leaked values. This is required to ensure soundness. Add a test case about it.

We need a cheap way to compute field indices.

This reverts commit 43b1b8e.

Just use salsa::Id for the most part.

tracked structs with `'db` carry a pointer and not an id.

is there a nicer way to do this?!

It will be shared between tracked structs and interned structs.

Salsa struct is already a grab-bag, best to keep it to shared functionality.

This will permit GATs so that interned values can carry lifetimes.

also fix name of a fn in one case

We'll need these for use with tracked functions

Previously tracked structs relied on an interned ingredient to intern their keys. But really it has more complex logic than we need. Simpler to just remove it and duplicate the basic concept.

this will let us use different packages but the same struct name from salsa struct

nikomatsakis · 2024-05-29T17:47:37Z

I don't understand exactly what you are trying to do -- can you elaborate?

…

On Tue, May 28, 2024, at 1:45 AM, Micha Reiser wrote:

In components/salsa-2022/src/id.rs:

    +    /// Lookup from an `Id` to get an instance of the type.
    +    ///
    +    /// # Panics
    +    ///
    +    /// This fn may panic if the value with this id has not been
    +    /// produced in this revision already (e.g., for a tracked
    +    /// struct, the function will panic if the tracked struct
    +    /// has not yet been created in this revision). Salsa's
    +    /// dependency tracking typically ensures this does not
    +    /// occur, but it is possible for a user to violate this
    +    /// rule.
    +    fn lookup_id(id: Id, db: DB) -> Self;
    +}
    +
    +/// Internal Salsa trait for types that are just a newtype'd [`Id`][].
    +pub trait FromId: AsId + Copy + Eq + Hash + Debug {

This is interesting. We might be doing something wrong but one thing we've been thinking about is that it would be nice if we could e.g. intern an entire `SymbolTable` but still use individual `Symbol`s as salsa ingredients (because we always rebuild the entire symbol table but we want symbol level invalidation). I'm not sure if that's something that the new `FromId` trait would enable (it would probably still require implementing a custom `Ingredient`)

In old code, we converted to a `&'db` when creating a new tracked struct or interning, but this value in fact persisted beyond the end of `'db` (i.e., into the new revision). We now refactor so that we create the `Foo<'db>` from a `NonNull<T>` instead of a `&'db T`, and then only create safe references when users access fields. This makes miri happy.
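The refactoring described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `Store`, `Data`, and `Handle` are hypothetical names, not salsa's types, and the real implementation is considerably more involved. The handle stores a `NonNull` plus a phantom `'db` lifetime, and a safe reference is created only when a field is read.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;
use std::ptr::NonNull;

struct Data {
    name: String,
}

/// Hypothetical stand-in for the database: it owns every allocation.
struct Store {
    items: RefCell<Vec<NonNull<Data>>>,
}

/// Hypothetical stand-in for a `Foo<'db>` handle: a raw pointer plus a
/// phantom lifetime. No `&'db Data` is stored in the handle itself.
#[derive(Clone, Copy)]
struct Handle<'db> {
    ptr: NonNull<Data>,
    _db: PhantomData<&'db Store>,
}

impl Store {
    fn new() -> Self {
        Store { items: RefCell::new(Vec::new()) }
    }

    /// Allocate a value and hand back a handle tied to this borrow of the store.
    fn create<'db>(&'db self, name: &str) -> Handle<'db> {
        let boxed = Box::new(Data { name: name.to_string() });
        let ptr = NonNull::from(Box::leak(boxed));
        self.items.borrow_mut().push(ptr);
        Handle { ptr, _db: PhantomData }
    }

    /// Takes `&mut self`, so the borrow checker guarantees that no
    /// `Handle<'db>` is still alive when the allocations are freed.
    fn new_revision(&mut self) {
        for ptr in self.items.get_mut().drain(..) {
            // SAFETY: `ptr` came from `Box::leak` in `create` and is freed exactly once.
            unsafe { drop(Box::from_raw(ptr.as_ptr())) };
        }
    }

    fn len(&self) -> usize {
        self.items.borrow().len()
    }
}

impl Drop for Store {
    fn drop(&mut self) {
        self.new_revision();
    }
}

impl<'db> Handle<'db> {
    /// The safe reference is created here, on field access, and lives at most for `'db`.
    fn name(self) -> &'db str {
        // SAFETY: the allocation is neither freed nor modified while the
        // shared borrow `'db` of the store is live.
        unsafe { self.ptr.as_ref().name.as_str() }
    }
}
```

In this sketch, holding a `Handle` across a call to `new_revision` is rejected at compile time, because that call needs exclusive access to the `Store`.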

This...seems dated. We have `specify` which is a more correct and principled version. Not sure what `set` was meant to be but I don't see any tests for it so...kill it.

book/src/reference/durability.md

tracked structs only support `'db` lifetimes

This was not obvious to me initially.

Still debating the best structure, so the contents are rather scattershot. I may have found a hole, but it's...obscure and I'm comfortable with it for the time being, though I think we want to close it eventually.

book/src/tutorial/ir.md

MichaReiser

This is very exciting. I've already run into an issue once where I held on to a tracked struct across db versions and Salsa then panicked at runtime. Catching this at compile time would be great. It should also allow us to implement garbage collection under safe assumptions.

book/src/overview.md

components/salsa-2022-macros/src/db_lifetime.rs

MichaReiser · 2024-06-15T09:04:58Z

book/src/overview.md


     ```rust
     #[salsa::tracked]
    -struct Ast {
    +struct Ast<'db> {


Does the addition of the db lifetime also allow queries to return data that reference the DB?

One use case that we have is that we need a mapping from `AstNode -> Id`, where `Id`, for example, uniquely identifies a scope or a symbol in the program.

The challenge we're facing is that our Ast doesn't use `Arc`s internally, so cloning a node always clones the entire sub-tree. Our "work-around" for this is to hold on to the AST's root (wrapped in an `Arc`) and store a raw pointer referencing the actual node. This works pretty well, but requires heavy use of `Arc`s (a lot of cloning). I "think" your changes would allow us to directly store a `&'db Expr` instead.

If not, then the "work-around" would just be to make the `AstNode -> Id` map a salsa tracked struct so that we get access to the db lifetime.

I did intend to support storing &T references like that, but it's a subtle case, and I've gone back and forth on whether it works with the stacked borrows rules etc.

Suppose you do `let f = ast.field(db)` in R1 and it yields a `&'db Foo` (a reference to some field of `ast`), and then you store that in the database as the result of a query (or part of the result). Now say we start a new revision R2 and, in R2, the value of `field` changes. This means that `f` (considered as a pointer) still points to the same memory, but the value behind `f` has changed. There are two challenges: (a) under the stacked borrows rules, it is UB to use `f` again; (b) should we consider functions that were dependent on `f` as needing to be re-executed?

I've tried to write an exploration of this question in this comment like 3 times but it keeps getting unwieldy. I think I will defer it for documentation or in-person discussion, it's a good one. I'm not entirely sure if and under what conditions this can be made to be safe at the moment. =)

That said, I also want to mention a feature I've been considering that I think may help with your use case. The idea would be to make it easy to have a value that carries a memory arena and references into that arena. This is meant to model things like MIR, where we have some data structure that represents a function, and to allow it to go through phases where it is changed and updated, but without requiring everything to be in vectors nor requiring everything to be cloned constantly. I'm not sure of the ergonomics exactly, but the idea is roughly that you can declare a struct with two lifetimes...

    #[in_arena(AstRoot)]
    struct Ast<'ast, 'db: 'ast> {
        data: AstData<'ast, 'db>,
        children: Vec<&'ast Ast<'ast, 'db>>,
    }

...and the procedural macro will create a type AstRoot that "hides" the first one:

    struct AstRoot<'db> {
        arena: Arc<MemoryArena>,
        root: &'static Ast<'static, 'static>, // <-- the lifetimes here are obviously lies
    }

Later you can do `root.open(|r| { .. })` to work with the data. One of the goals is that you can create new, derived values based on the same arena that have different pointers -- so e.g. it should be possible to extract subvalues from the tree. Each of them would carry a reference count to the same base arena.
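The shape of that idea can be approximated in safe Rust by swapping the `&'ast` pointers for indices into a shared vector. This is a deliberately different technique from the proposed macro, and every name here (`Node`, `Root`, `View`, `subtree`) is hypothetical. It only illustrates `open` plus derived roots that share one reference-counted arena.

```rust
use std::sync::Arc;

struct Node {
    label: String,
    /// Indices into the arena, standing in for `&'ast` pointers.
    children: Vec<usize>,
}

/// Hypothetical analogue of `AstRoot`: a shared arena plus the position of one node.
#[derive(Clone)]
struct Root {
    arena: Arc<Vec<Node>>,
    root: usize,
}

/// Borrowed view that only exists inside the closure passed to `open`.
#[derive(Clone, Copy)]
struct View<'a> {
    arena: &'a [Node],
    index: usize,
}

impl<'a> View<'a> {
    fn label(self) -> &'a str {
        self.arena[self.index].label.as_str()
    }
}

impl Root {
    /// Run `f` with a view whose lifetime cannot escape the closure.
    fn open<R, F>(&self, f: F) -> R
    where
        F: for<'a> FnOnce(View<'a>) -> R,
    {
        f(View { arena: self.arena.as_slice(), index: self.root })
    }

    /// Derive a new root for the `i`-th child; it shares the same arena.
    fn subtree(&self, i: usize) -> Option<Root> {
        let index = *self.arena[self.root].children.get(i)?;
        Some(Root { arena: Arc::clone(&self.arena), root: index })
    }
}
```

Because `open` is higher-ranked over the view's lifetime, nothing borrowed from the arena can leak out of the closure, while `subtree` yields an owned value that keeps the arena alive through its own reference count.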

> That said, I also want to mention a feature I've been considering that I think may help with your use case. The idea would be to make it easy to have a value that carries a memory arena and references into that arena.

Yeah, that sounds very similar to our "work-around", except that it is more flexible and the unsafety is handled by salsa instead of sprinkled through our code.

book/src/plumbing/db_lifetime.md

MichaReiser · 2024-06-15T09:13:34Z

book/src/plumbing/db_lifetime.md

+They must have gotten it through salsa's internal mechanisms.
+This is important because salsa will provide `&`-references to fields within that remain valid during a revision.
+But at the start of a new revision salsa may opt to modify those fields or even free the allocation.
+This is safe because users cannot have references to `ts` at the start of a new revision.


Nit: I'm not sure if that's mentioned above. But I think that's because the operations require a `&'db mut Db`

Co-authored-by: Micha Reiser <micha@reiser.io>
Co-authored-by: Ryan Cumming <etaoins@gmail.com>
Co-authored-by: David Barsky <me@davidbarsky.com>

nikomatsakis · 2024-06-15T10:36:09Z

I am debating about how to proceed here. It's taking me far longer to write these docs than I hoped. And they are complicated. I'm somewhat inclined to merge the PR so that people can start trying it out and then continue to write the safety docs in parallel.

But I'm also eager to not overlook the safety docs.

Thoughts from others?

nikomatsakis · 2024-06-15T10:39:53Z

One thing I don't know 100% is how much the memory safety of the scheme relies on salsa correctly tracking dependencies. I'd prefer if it didn't, because I don't trust users not to poke holes in that system, but I feel like as I talked out the reasoning I found myself wanting to rely on "and we know that the fn will have been re-executed". But then I think I did add some judicious panics in there partly for this reason (i.e., to double check users and panic if they messed things up), I just don't know yet if that's enough.
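The "judicious panics" idea can be illustrated with a small model in which each value records the revision it was created in, and lookups refuse values from an older revision instead of trusting the caller. `Revision`, `Slot`, and `Table` are hypothetical names for this sketch. It is not how salsa stores its data.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Revision(u32);

struct Slot {
    value: String,
    created_in: Revision,
}

struct Table {
    current: Revision,
    slots: Vec<Slot>,
}

impl Table {
    fn new() -> Self {
        Table { current: Revision(1), slots: Vec::new() }
    }

    /// Store a value stamped with the current revision and return its id.
    fn create(&mut self, value: &str) -> usize {
        self.slots.push(Slot { value: value.to_string(), created_in: self.current });
        self.slots.len() - 1
    }

    fn new_revision(&mut self) {
        self.current = Revision(self.current.0 + 1);
    }

    /// Returns `None` for unknown ids and for values from an older revision.
    fn try_get(&self, id: usize) -> Option<&str> {
        let slot = self.slots.get(id)?;
        if slot.created_in == self.current {
            Some(slot.value.as_str())
        } else {
            None
        }
    }

    /// Panicking variant: a defensive check against ids that were smuggled
    /// across revisions.
    fn get(&self, id: usize) -> &str {
        self.try_get(id)
            .expect("value was not created in the current revision")
    }
}
```

The check makes misuse fail loudly rather than read stale data, but as the comment above notes, it is an open question whether checks of this kind are sufficient on their own.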

nikomatsakis · 2024-06-15T13:19:31Z

OK, I pushed an update to the safety documentation that I think covers all the key details in a readable way. It also identifies what I believe to be a flaw in the current setup -- I think it's possible for users to abuse salsa through leaked structs or nondeterminism and access freed memory. It's not easy to do, and I should make a test (will do later).

There are various ways to close this hole, and in particular I think one of the planned improvements I had in mind (adopting a sharded-slab like structure) would serve for it. But fundamentally we need some way to test if a pointer is still "in bounds".
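One common way to get an "is this handle still valid" test is a generational slab, sketched below. This is a plain single-threaded illustration of the general technique, not the `sharded-slab` crate's API and not what the PR implements. `Key`, `Entry`, and `Slab` are names made up for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Key {
    index: usize,
    generation: u32,
}

struct Entry<T> {
    generation: u32,
    value: Option<T>,
}

struct Slab<T> {
    entries: Vec<Entry<T>>,
    free: Vec<usize>,
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { entries: Vec::new(), free: Vec::new() }
    }

    /// Reuse a free slot if there is one, otherwise grow the slab.
    fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            let entry = &mut self.entries[index];
            entry.value = Some(value);
            Key { index, generation: entry.generation }
        } else {
            self.entries.push(Entry { generation: 0, value: Some(value) });
            Key { index: self.entries.len() - 1, generation: 0 }
        }
    }

    /// Free the slot and bump its generation so outstanding keys stop matching.
    fn remove(&mut self, key: Key) -> Option<T> {
        let entry = self.entries.get_mut(key.index)?;
        if entry.generation != key.generation {
            return None;
        }
        let value = entry.value.take()?;
        entry.generation = entry.generation.wrapping_add(1);
        self.free.push(key.index);
        Some(value)
    }

    /// The validity test: in bounds, and the generation matches.
    fn get(&self, key: Key) -> Option<&T> {
        let entry = self.entries.get(key.index)?;
        if entry.generation == key.generation {
            entry.value.as_ref()
        } else {
            None
        }
    }
}
```

A stale key is detected even after its slot has been reused, which is the property needed to reject a leaked handle rather than dereference freed memory.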

I'm inclined to land the PR in its current state and work on those improvements as follow-up.

MichaReiser · 2024-06-15T15:19:44Z

> I'm somewhat inclined to merge the PR so that people can start trying it out and then continue to write the safety docs in parallel.

I don't expect to have time to try out the new branch before the end of next week, but having something to play with certainly is nice (although that's also possible by pointing cargo to this PR's revision).

> I'm inclined to land the PR in its current state and work on those improvements as follow-up.

I'm supportive of this. I would find separate PRs useful for more focused discussions around specific improvements.

nikomatsakis · 2024-06-16T13:55:34Z

OK, I'm going to land this change. We can discuss future developments on Zulip. One thing I plan to do is explore using sharded-slab in the implementation, which will make the safety concerns much simpler but (maybe, not clearly) cost a bit of performance. It'd be useful to have measurements at some point.

nikomatsakis · 2024-06-16T13:56:06Z

That said, I do feel certain that the lifetimes and the requirement to use the `Update` trait etc. are good. They are a bit more restrictive but give us a lot of room for future improvement.

ematipico · 2024-06-16T15:29:00Z

I think you should merge and publish this version.

I want to start experimenting so that I can help by filing possible issues and help with the developments, if they are accepted.

nikomatsakis · 2024-06-17T12:30:35Z

I'm going to land it. I'm not inclined to publish, though I think I would be OK with something like a 3.0-alpha.

nikomatsakis added 30 commits May 24, 2024 07:15

update docs to mention durability

225a81a

adopt the Salsa 3.0 `Update` trait

4533cd9

Right now, this doesn't change much except the behavior in the event that `Eq` is not properly implemented. In the future, it will enable the use of references and slices and things.
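To illustrate the general idea of an update-in-place trait, here is a simplified, safe sketch. `MaybeUpdate` is a hypothetical name, and the trait adopted in this commit has a different, pointer-based signature. The point is only that the update operation itself reports whether anything changed, so a container can update element by element and reuse its existing storage.

```rust
/// Hypothetical, simplified stand-in for an `Update`-style trait.
trait MaybeUpdate: Sized {
    /// Overwrite `old` with `new` where they differ; return `true` if anything changed.
    fn maybe_update(old: &mut Self, new: Self) -> bool;
}

impl MaybeUpdate for u32 {
    fn maybe_update(old: &mut Self, new: Self) -> bool {
        if *old == new {
            false
        } else {
            *old = new;
            true
        }
    }
}

impl<T: MaybeUpdate> MaybeUpdate for Vec<T> {
    fn maybe_update(old: &mut Self, new: Self) -> bool {
        // A length change cannot be patched element-wise: replace wholesale.
        if old.len() != new.len() {
            *old = new;
            return true;
        }
        let mut changed = false;
        for (o, n) in old.iter_mut().zip(new) {
            // `|=` does not short-circuit, so every element gets updated.
            changed |= T::maybe_update(o, n);
        }
        changed
    }
}
```

For example, updating `vec![1, 2, 3]` with `vec![1, 5, 3]` keeps the existing allocation, rewrites only the middle element, and reports a change.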

return &TrackedStructValue<C> from new_struct

e24ace2

This is a step towards the goal of keeping a pointer in the structs themselves.

separate marking the outputs as verified

a320781

There are 3 call-sites to this function:

* One of them has already marked the outputs
* One of them has no outputs
* The third does need to mark the outputs

give trait more info about lifetime relationships

20cb307

In particular, the ingredient and the database have the same lifetime. This will be useful later for safety conditions.

allow (but don't test) lifetime parameters

79d24e0

track and assert struct ingredient indices

5ce5e3c

We need a cheap way to compute field indices.

WIP permit 'db on tracked struct definitions (opt)

b6311d8

Revert "WIP permit 'db on tracked struct definitions (opt)"

cb1a2bb

This reverts commit 43b1b8e.

just take salsa::Id instead of id structs

6e2647f

remove Key from Fn configuration

b050bd8

Just use salsa::Id for the most part.

make fn input/value a GAT

44a8a2f

give fields a lifetime

e95c8b2

permit <'db> on tracked struct

a84777d

tracked structs with `'db` carry a pointer and not an id.

support db lifetimes in fields

fe4ff98

rework debugging to be more permanent

04e041b

pipe debug output through rustfmt

4f74037

is there a nicer way to do this?!

generate configuration struct in salsa_struct

8ba6e60

It will be shared between tracked structs and interned structs.

move interned-specific fns out of salsa struct

54c9586

Salsa struct is already a grab-bag, best to keep it to shared functionality.

rework interning to have a Configuration

97fc6a0

This will permit GATs so that interned values can carry lifetimes.

update tests for new error messages

3441666

also fix name of a fn in one case

introduce helper functions

d190beb

We'll need these for use with tracked functions

permit interned data to take 'db lifetime

4822013

have tracked struct intern its own keys

d6d5226

Previously tracked structs relied on an interned ingredient to intern their keys. But really it has more complex logic than we need. Simpler to just remove it and duplicate the basic concept.
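The "basic concept" being duplicated is a map from key to id. A minimal sketch follows, assuming a plain `HashMap` and hypothetical `Id` and `KeyMap` types. The actual commit has to deal with concurrency and revisions, which are left out here.

```rust
use std::collections::HashMap;
use std::hash::Hash;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Id(u32);

/// Hypothetical key map: hands out one stable id per distinct key.
struct KeyMap<K> {
    ids: HashMap<K, Id>,
}

impl<K: Eq + Hash> KeyMap<K> {
    fn new() -> Self {
        KeyMap { ids: HashMap::new() }
    }

    /// Return the existing id for `key`, or assign the next free one.
    fn intern(&mut self, key: K) -> Id {
        let next = Id(self.ids.len() as u32);
        *self.ids.entry(key).or_insert(next)
    }
}
```

Interning the same key twice yields the same `Id`, so a struct re-created with identical key fields can be matched with its earlier incarnation.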

debug dump for interned struct tokens

af94b25

factor out useful helper fn

d92f2aa

return a pointer from interning, not just id

5095d79

rename from TrackedStruct to just Struct

0b8c27b

this will let us use different packages but the same struct name from salsa struct

parameterize salsa_struct module

9d8a60b

nikomatsakis added 6 commits May 30, 2024 01:59

use const _: () to disable clippy lints

88b964d

pacify the merciless clippy

0ad0be8

rustfmt has opinions

b9ab8fc

allow elided lifetimes in tracked fn return values

ce750da

remove "setter" function altogether

5326683

This...seems dated. We have `specify` which is a more correct and principled version. Not sure what `set` was meant to be but I don't see any tests for it so...kill it.

etaoins reviewed Jun 1, 2024

book/src/reference/durability.md (outdated, resolved)

nikomatsakis added 5 commits June 11, 2024 05:20

remove dead code

f91eeb9

tracked structs only support `'db` lifetimes

remove dead code from interned structs

c02f30a

rework tutorial a bit to be more up to date

af2c973

add a safety comment on Update

bcad24c

This was not obvious to me initially.

WIP: start writing a safety chapter

ab9aa3a

Still debating the best structure, so the contents are rather scattershot. I may have found a hole, but it's...obscure and I'm comfortable with it for the time being, though I think we want to close it eventually.

davidbarsky reviewed Jun 13, 2024

book/src/tutorial/ir.md (outdated, resolved)

book/src/tutorial/ir.md (outdated, resolved)

book/src/tutorial/ir.md (outdated, resolved)

MichaReiser reviewed Jun 15, 2024

Apply suggestions from code review

1544ee9

Co-authored-by: Micha Reiser <micha@reiser.io>
Co-authored-by: Ryan Cumming <etaoins@gmail.com>
Co-authored-by: David Barsky <me@davidbarsky.com>

nikomatsakis added this pull request to the merge queue Jun 17, 2024

github-merge-queue bot merged commit 283ccda into salsa-rs:master Jun 17, 2024
10 checks passed

nikomatsakis mentioned this pull request Jun 17, 2024

Support results with internal references #498

Open

carljm mentioned this pull request Jun 27, 2024

[red-knot] intern types using Salsa astral-sh/ruff#12061

Merged

Y-Nak mentioned this pull request Jun 30, 2024

Update salsa ethereum/fe#1015

Merged
