-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
salsa 3.0 #490
salsa 3.0 #490
Conversation
Right now, this doesn't change much except the behavior in the event that `Eq` is not properly implemented. In the future, it will enable the use of references and slices and things.
This is a step towards the goal of keep a pointer in the structs themselves.
There are 3 call-sites to this function: * One of them has already marked the outputs * One of them has no outputs * The third does need to mark the outputs
In particular, the ingredient and the database have the same lifetime. This will be useful later for safety conditions.
The internal API is now based around providing references to the `TrackedStructValue`. Documenting the invariants led to one interesting case, which is that we sometimes verify a tracked struct as not having changed (and even create `&`-ref to it!) but then re-execute the function around it. We now guarantee that, in this case, the data does not change, even if it has leaked values. This is required to ensure soundness. Add a test case about it.
We need a cheap way to compute field indices.
This reverts commit 43b1b8e.
Just use salsa::Id for the most part.
tracked structs with `'db` carry a pointer and not an id.
is there a nicer way to do this?!
It will be shared between tracked structs and interned structs.
Salsa struct is already a grab-bag, best to keep it to shared functionality.
This will permit GATs so that interned values can carry lifetimes.
also fix name of a fn in one case
We'll need these for use with tracked functions
Previously tracked structs relied on an interned ingredient to intern their keys. But really it has more complex logic than we need. Simpler to just remove it and duplicate the basic concept.
this will let us use different packages but the same struct name from salsa struct
I don't understand exactly what you are trying to do -- can you elaborate?
…On Tue, May 28, 2024, at 1:45 AM, Micha Reiser wrote:
***@***.**** commented on this pull request.
In components/salsa-2022/src/id.rs <#490 (comment)>:
> + /// Lookup from an `Id` to get an instance of the type.
+ ///
+ /// # Panics
+ ///
+ /// This fn may panic if the value with this id has not been
+ /// produced in this revision already (e.g., for a tracked
+ /// struct, the function will panic if the tracked struct
+ /// has not yet been created in this revision). Salsa's
+ /// dependency tracking typically ensures this does not
+ /// occur, but it is possible for a user to violate this
+ /// rule.
+ fn lookup_id(id: Id, db: DB) -> Self;
+}
+
+/// Internal Salsa trait for types that are just a newtype'd [`Id`][].
+pub trait FromId: AsId + Copy + Eq + Hash + Debug {
This is interesting. We might be doing something wrong but one thing we've been thinking about is that it would be nice if we could e.g. intern an entire `SymbolTable` but still use individual `Symbol`s as salsa ingredients (because we always rebuild the entire symbol table but we want symbol level invalidation). I'm not sure if that's something that the new `FromId` trait would enable (it would probably still require implementing a custom `Ingredient`)
—
Reply to this email directly, view it on GitHub <#490 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABF4ZRXB7FETQHIXMRBEPDZEQ7ZTAVCNFSM6AAAAABIJEURNGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAOBSGE2DIOJRGI>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
In old code, we converted to a `&'db` when creating a new tracked struct or interning, but this value in fact persisted beyond the end of `'db` (i.e., into the new revision). We now refactor so that we create the `Foo<'db>` from a `NonNull<T>` instead of a `&'db T`, and then only create safe references when users access fields. This makes miri happy.
This...seems dated. We have `specify` which is a more correct and principled version. Not sure what `set` was meant to be but I don't see any tests for it so...kill it.
tracked structs only support `'db` lifetimes
This was not obvious to me initially.
Still debating the best structure, so the contents are rather scattershot. I may have found a hole, but it's...obscure and I'm comfortable with it for the time being, though I think we want to close it eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very exciting. I've already ran into an issue once where I hold on to a tracked strut across db versions and Salsa then panicked at runtime. Catching this at compile time would be great. It should also allow us to implement garbage collection under safe assumptions.
|
||
```rust | ||
#[salsa::tracked] | ||
struct Ast { | ||
struct Ast<'db> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the addition of the db
lifetime also allow queries to return data that reference the DB?
One use case that we have is that we need a mapping from AstNode
-> Id
where Id
for example uniquely identifie's a scope, or a symbol in the program.
The challenge we're facing is that our Ast
doesn't use Arc
s internally, thus cloning a Node
always clones the entire sub-tree. Our "work-around" for this is to keep hold to the AST's root (wrapped in an Arc) and store a raw pointer referencing the actual node. This works pretty well, but requires heavy use of Arc
s (a lot of cloning). I "think" your changes would allow us to directly store a &'db Expr
instead.
If not, then the "work-around" would just be to make the AstNode
-> Id
map a salsa tracked so that we get access to the db lifetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did intend to support storing &T
references like that, but it's a subtle case, and I've gone back and forth on whether it works with the stacked borrows rules etc.
Suppose you do let f = ast.field(db)
in R1 and it yields a &'db Foo
(reference to some field of ast
) and then you store that in the database as the result of a query (or part of the result). Now say we start a new revision R2 and, in R2, the value of field
changes. This means that f
(considered as a pointer) still points to the same memory, but the value behind f
has changed. There are two challenges: (a) under the stacked borrow rules, it is UB to use f
again; (b) should we consider functions that were dependent on f
as needing to be re-executed?
I've tried to write an exploration of this question in this comment like 3 times but it keeps getting unwieldy. I think I will defer it for documentation or in-person discussion, it's a good one. I'm not entirely sure if and under what conditions this can be made to be safe at the moment. =)
That said, I also want to mention a feature I've been considering that I think is may help with your use case. The idea would be to make it easy to have a value that carries a memory arena and references into that arena. This is meant to model things like MIR, where we have some data structure that represents a function, and to allow it to go through phases where it is changed and updated, but without requiring everything to be in vectors nor requiring everything to be cloned constantly. I'm not sure the ergonomics exactly but the idea is roughly that you can declare a struct with two lifetimes...
#[in_arena(AstRoot)]
struct Ast<'ast, 'db: 'ast> {
data: AstData<'ast, 'db>,
children: Vec<&'ast Ast<'ast, 'db>>,
}
...and the procedural macro will create a type AstRoot
that "hides" the first one:
struct AstRoot<'db> {
arena: Arc<MemoryArena>,
root: &'static Ast<'static 'static>, // <-- the lifetimes here are obviously lies
}
Later you can do root.open(|r| { .. })
to work with the data. One of the goals is that you can create new, derived values based on the same arena that have different pointers -- so e.g. it should be possible to extra subvalues from the tree. Each of them would carry a reference count to the same base arena.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I also want to mention a feature I've been considering that I think is may help with your use case. The idea would be to make it easy to have a value that carries a memory arena and references into that arena.
Yeah, that sounds very similar to our "work-around", except that it is more flexible and the unsafety is handled by salsa instead of sprinkled through our code.
They must have gotten it through salsa's internal mechanisms. | ||
This is important because salsa will provide `&`-references to fields within that remain valid during a revision. | ||
But at the start of a new revision salsa may opt to modify those fields or even free the allocation. | ||
This is safe because users cannot have references to `ts` at the start of a new revision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'm not sure if that's mentioned above. But I think that's because the operations require a &'db mut Db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Co-authored-by: Micha Reiser <micha@reiser.io> Co-authored-by: Ryan Cumming <etaoins@gmail.com> Co-authored-by: David Barsky <me@davidbarsky.com>
I am debating about how to proceed here. It's taking me far longer to write these docs than I hoped. And they are complicated. I'm somewhat inclined to merge the PR so that people can start trying it out and then continue to write the safety docs in parallel. But I'm also eager to not overlook the safety docs. Thoughts from others? |
One thing I don't know 100% is how much the memory safety of the scheme relies on salsa correctly tracking dependencies. I'd prefer if it didn't, because I don't trust users not to poke holes in that system, but I feel like as I talked out the reasoning I found myself wanting to rely on "and we know that the fn will have been re-executed". But then I think I did add some judicious panics in there partly for this reason (i.e., to double check users and panic if they messed things up), I just don't know yet if that's enough. |
OK, I pushed an update to the safety documentation that I think covers all the key details in a readable way. It also identifies what I believe to be a flaw in the current setup -- I think it's possible for users to abuse salsa through leaked structs or nondeterminism and access freed memory. It's not easy to do, and I should make a test (will do later). There are various ways to close this hole, and in particular I think one of the planned improvements I had in mind (adopting a sharded-slab like structure) would serve for it. But fundamentally we need some way to test if a pointer is still "in bounds". I'm inclined to land the PR in its current state and work on those improvements as follow-up. |
I don't expect to have time to try out the new branch before the end of next week, but having something to play with certainly is nice (although that's also possible by pointing cargo to this PR's revision).
I'm supportive of this. I would find separate PR's useful for more focused discussions around specific improvements. |
OK, I'm going to land this change. We can discuss future developments on Zulip. One thing I plan to do is explore using sharded-slab in the implementation, which will make the safety concerns much simpler but (maybe, not clearly) cost a bit of performance. It'd be useful to have measurements at some point. |
That said, I do feel certain that the lifetimes and requirement to use the |
I think you should merge and publish this version. I want to start experimenting, so I can help filing possible issues and help with the developments, if they are accepted. |
I'm going to land it. I'm not inclined to publish though I would be, I think, ok with something like |
This branch implements the Salsa 3.0 plan, at least in broad strokes. It's not quite ready to merge, needs more documentation, especially around the use of unsafe. Rather than write an extensive comment here, I'm going to push some documentation commits.