Serialization to disk #10

nikomatsakis · 2018-10-01T09:27:51Z

We should support some way to serialize the state of our queries to disk and then reload it in a future session. This is a lot of work, and of course we can learn from rustc. We'd want to do the reloading lazily, for example.

I definitely want to punt on this.

matklad · 2018-11-15T13:55:18Z

A similar but distinct feature is transparently spilling rarely used values to disk.

IntelliJ relies heavily on a similar feature: when you open a multi-million-line project with lots of dependencies, the indices become really huge.

Note that this is a significantly different setup from rustc, which operates on a crate at a time, and has a reasonable natural cap on the amount of data it must process simultaneously.
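To make the spilling idea concrete, here is a minimal sketch using only the Rust standard library. It is not salsa API and not how IntelliJ does it: `SpillCache`, its methods, and the one-file-per-key layout are all hypothetical, and values are plain `String`s to sidestep the question of a serialization format. The cache keeps at most `capacity` values in memory and writes the least recently used ones to disk, reading them back on demand.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical cache (not part of salsa): keeps at most `capacity` values
/// in memory and spills the least recently used ones to files under `dir`.
pub struct SpillCache {
    dir: PathBuf,
    capacity: usize,
    clock: u64,
    /// key -> (tick of last use, value)
    hot: HashMap<u64, (u64, String)>,
}

impl SpillCache {
    pub fn new(dir: PathBuf, capacity: usize) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(SpillCache { dir, capacity: capacity.max(1), clock: 0, hot: HashMap::new() })
    }

    fn path_for(&self, key: u64) -> PathBuf {
        self.dir.join(format!("{key}.val"))
    }

    pub fn insert(&mut self, key: u64, value: String) -> io::Result<()> {
        self.clock += 1;
        self.hot.insert(key, (self.clock, value));
        self.evict_if_needed()
    }

    /// Returns the value from memory if it is hot, otherwise tries to
    /// reload it from disk. `Ok(None)` means the key was never stored.
    pub fn get(&mut self, key: u64) -> io::Result<Option<String>> {
        self.clock += 1;
        if let Some(entry) = self.hot.get_mut(&key) {
            entry.0 = self.clock;
            return Ok(Some(entry.1.clone()));
        }
        match fs::read_to_string(self.path_for(key)) {
            Ok(value) => {
                self.hot.insert(key, (self.clock, value.clone()));
                self.evict_if_needed()?;
                Ok(Some(value))
            }
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }

    /// Number of values currently held in memory.
    pub fn in_memory(&self) -> usize {
        self.hot.len()
    }

    /// Writes the least recently used values to disk until the in-memory
    /// set fits within `capacity`.
    fn evict_if_needed(&mut self) -> io::Result<()> {
        while self.hot.len() > self.capacity {
            let coldest = self.hot.iter().min_by_key(|(_, (tick, _))| *tick).map(|(k, _)| *k);
            let Some(key) = coldest else { break };
            if let Some((_, value)) = self.hot.remove(&key) {
                fs::write(self.path_for(key), value)?;
            }
        }
        Ok(())
    }
}
```

The caller never sees whether a value came from memory or from disk, which is the "transparent" part. A real implementation would also have to deal with invalidation and with values that are expensive to serialize, neither of which this sketch attempts.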

matklad · 2019-01-23T12:59:57Z

A very wise observation from rust-lang/rfcs#1317 (comment):

In a strictly on-demand setting (IDE, not a compiler), serialization to disk creates more problems than it solves.

lnicola · 2019-12-04T16:27:40Z

> In a strictly on-demand setting (IDE, not a compiler), serialization to disk creates more problems than it solves.

Note that some popular IDEs like Visual Studio actually use a disk database. VS migrated a while ago from a custom format to a SQLite database: https://devblogs.microsoft.com/cppblog/introducing-c-experimental-editor-tools/.

lpil · 2020-03-05T15:38:31Z

Hi! This would be a desirable feature for me. Is this being worked on?

Not trying to rush you, just trying to evaluate how suitable this library is for my use-case. Thank you. :)

matklad · 2020-03-05T15:39:27Z

No, this is not being actively worked on at the moment.

lpil · 2020-03-05T15:40:17Z

Thank you

fogti · 2020-04-09T23:02:20Z

I think serialization should be generally opt-in:

- maybe at the salsa::database level: rather coarse, without lazy loading or transparent spilling, useful for "whole session" store/load and short-term-running scenarios
- or even per query: fine-grained, with lazy loading and maybe transparent spilling, useful to reduce RAM usage in long-term-running scenarios
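As a rough illustration of the opt-in idea, the sketch below marks each query table as either memory-only or persisted and only includes the persisted ones in a snapshot. None of this is salsa API: `Persistence`, `QueryTable`, `Database`, `snapshot`, and `restore` are hypothetical names, keys and values are plain strings assumed to contain no tabs or newlines, and there is no lazy loading or spilling here.

```rust
use std::collections::HashMap;

/// Hypothetical per-query setting (not part of salsa).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Persistence {
    MemoryOnly,
    OnDisk,
}

/// Memoized results of one query, keyed by the query argument.
pub struct QueryTable {
    pub persistence: Persistence,
    pub memo: HashMap<String, String>,
}

#[derive(Default)]
pub struct Database {
    pub tables: HashMap<String, QueryTable>,
}

impl Database {
    pub fn register(&mut self, name: &str, persistence: Persistence) {
        self.tables
            .insert(name.to_string(), QueryTable { persistence, memo: HashMap::new() });
    }

    pub fn set(&mut self, table: &str, key: &str, value: &str) {
        if let Some(t) = self.tables.get_mut(table) {
            t.memo.insert(key.to_string(), value.to_string());
        }
    }

    pub fn get(&self, table: &str, key: &str) -> Option<&str> {
        self.tables.get(table)?.memo.get(key).map(String::as_str)
    }

    /// Serializes only the tables that opted in, one `table\tkey\tvalue`
    /// line per memoized result. Assumes no tabs or newlines in the data.
    pub fn snapshot(&self) -> String {
        let mut lines = Vec::new();
        for (name, table) in &self.tables {
            if table.persistence == Persistence::OnDisk {
                for (k, v) in &table.memo {
                    lines.push(format!("{name}\t{k}\t{v}"));
                }
            }
        }
        lines.sort(); // deterministic output regardless of hash order
        lines.join("\n")
    }

    /// Loads a snapshot, ignoring lines for unknown or memory-only tables.
    pub fn restore(&mut self, snapshot: &str) {
        for line in snapshot.lines() {
            let mut parts = line.splitn(3, '\t');
            if let (Some(t), Some(k), Some(v)) = (parts.next(), parts.next(), parts.next()) {
                if let Some(table) = self.tables.get_mut(t) {
                    if table.persistence == Persistence::OnDisk {
                        table.memo.insert(k.to_string(), v.to_string());
                    }
                }
            }
        }
    }
}
```

This corresponds to the coarse "whole session" variant with a per-query switch on top. The snapshot is returned as a string, so writing it to a file is left to the caller. The fine-grained variant would additionally need to load entries on first access instead of all at once.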

I think I already have a kind of usage scenario ("scenario" as in "salsa is currently not used, but I investigate potential usages") in zs-filecrawler.

That program first walks through a file list and computes the hash of each file. Then it iterates over the list of hashes, takes the first associated file, and calls a user-defined hook script on that file. It caches the hash list and the progress. It might not really fit the usual `salsa` usage scenario, but the goal is similar: avoid redoing work.

QueryGroup 1: 
  file_content(filepath) <-- hash_data(filepath)
  ^-[maybe lazy input]      --> association [filepath -> hash_of_file_data]

QueryGroup 2:
  hash2file(hash)    <-- call_hook(hash)
  ^-[input, from QG1]   --> implicit association [hash -> done(hook return value)]

Currently, I just take the "session serialization approach", deserialize at startup, and serialize at shutdown/interrupt, but this may lose some progress. I think that the zs-filecrawler utility program could benefit from salsa, but it requires some way to serialize the state (the split into two QueryGroups would simulate that, but it makes interleaving both parts more difficult, and reduces potential benefits).
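One way to picture the "may lose some progress" problem: if results are only written at shutdown, a crash discards everything since startup. The sketch below is a hypothetical append-only journal, not code from zs-filecrawler or salsa. It records each `hash -> hook result` pair as soon as it is available and replays the file on startup. The names `Journal`, `record`, and `is_done` are made up for illustration, and hashes and results are assumed to contain no tabs or newlines.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Hypothetical progress journal: one `hash\tresult` line per finished hook.
pub struct Journal {
    file: File,
    pub done: HashMap<String, String>,
}

impl Journal {
    /// Replays any existing journal, then opens it for appending.
    pub fn open(path: &Path) -> io::Result<Self> {
        let mut done = HashMap::new();
        if path.exists() {
            for line in BufReader::new(File::open(path)?).lines() {
                let line = line?;
                // Lines without a tab (for example a torn final write) are skipped.
                if let Some((hash, result)) = line.split_once('\t') {
                    done.insert(hash.to_string(), result.to_string());
                }
            }
        }
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Journal { file, done })
    }

    /// Appends one finished item and asks the OS to persist it before returning.
    pub fn record(&mut self, hash: &str, result: &str) -> io::Result<()> {
        writeln!(self.file, "{hash}\t{result}")?;
        self.file.sync_data()?;
        self.done.insert(hash.to_string(), result.to_string());
        Ok(())
    }

    pub fn is_done(&self, hash: &str) -> bool {
        self.done.contains_key(hash)
    }
}
```

With this shape, an interrupted run loses at most the item that was in flight. It does not detect a line that was cut off after the tab, so a more careful version would add a checksum or length per record.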

MichaReiser · 2024-04-20T15:50:32Z

Thanks for creating salsa. It's an outstanding piece of software and an extremely valuable inspiration resource.

We're exploring adding incremental computation to Ruff, a static analysis tool for Python that is primarily used from the CLI but also comes with an LSP. We're intrigued by salsa's model. It's nice how it handles much of the caching complexity for you. However, we believe that a persistent cache is essential for us because subsequent check times are important when using the CLI locally or in CI. That's how I came across this issue.

Is this a feature where active contributions would be welcomed? Are there ideas on how this could be implemented in Salsa 2022 that I could explore further?

nikomatsakis mentioned this issue Oct 1, 2018

Hashing of result values #11

Closed

nikomatsakis added the rfc Active discussion about a possible future feature label Oct 1, 2018

nikomatsakis added this to the Far future milestone Oct 1, 2018

lnicola mentioned this issue Jan 3, 2020

Compress rarely modified files rust-lang/rust-analyzer#869

Closed

bjorn3 mentioned this issue Jun 26, 2021

[Feature Request]: Persistent caches rust-lang/rust-analyzer#4712

Open

Verdagon mentioned this issue Aug 14, 2021

Query-based architecture ValeLang/Vale#300

Open

Gleyder42 mentioned this issue May 15, 2023

Implement demand driven compilation. Gleyder42/colomar#7

Merged
