Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC of new ODB design #273

Merged
merged 90 commits into from
Dec 18, 2021
Merged

PoC of new ODB design #273

merged 90 commits into from
Dec 18, 2021

Conversation

Byron
Copy link
Member

@Byron Byron commented Dec 8, 2021

After the design itself seemed to be pointing in the right direction, lets get a PoC done quickly to see if it works in reality as well.

Related to #266.

Tasks

  • update odb-design to match latest findings
  • PoC
    • contains(id) implementation along with MRU
    • find_object(id) with MRU
    • auto-setup of slotmap to match what's on disk + some safety
    • implement remaining pack::Find trait methods and try running pack creation on it
    • impl odb::
    • iteration
      • (needs fix for non-threadsafe version though…)
  • Use PoC in object-access test for performance measurements
  • cleanup (remove #![allow()])
  • optimize pub; check module layout

And avoid abstraction to allow swapping in non-sync/threadsafe types.

We can consider doing this once it's clear how all this works. Right
now, we would probably need to include arc-swap in the set of
abstractions.
Deletions happen rarely enough to allow open maps to be kept by handles
until they are discared in their entirety.
…this has many problems.

However, it slowly manifests that ideally we are able to handle a
certain amount of files, mutate many of them at the same time, while
being lock-free.

It's something we can't have though, so maybe just operate on a single
data structure but bank on caching it in the handles itself so that
super-fast and entirely lock-free access isn't even required.

The thing we can do is synchronizing access to the data structure,
loading files one by one instead of in parallel, and claiming that
this initial delay that's longer than it would have to be is acceptable
knowing that from thereon out it will be fast.

It will be fully lazy as well.
This one, however, will collect them all and it's up to the handle
implementation to decide how these are to be searched.

This is only relevant for alternates, and as of now it's actually
impossible to know when the packs of one odb are done to query its loose
db. Instead, all packs of all dbs are queried first, then all loose
object stores. Maybe that's even better this way, who knows.
The Extend outcome was removed as we are not able to know that packs
are all loaded when we collect the outcome, Just because indices and
packs can now be loaded in parallel.
The latter place best describes its purpose.
… big picture (#266)

- midx files don't include CRC32 information anymore. One might think
  that these are still in the single indices, but at least as of now
  it completely ignores these.
- CRC32 can be created on the fly, but there isn't a need even for pack
  to pack copies as the receiver is forced to build an index themselves
  from the entire pack, which forces multiple hash comparisons on the
  way.
Trying to avoid using the 'handle' idea, but now that it's not there
I do like it so much to bring it back and rather think of a better
name for the current top-level handle.
Because this is exactly what it is effectively.
Also add some basic instantiation for the new object store.
The latter is vital to know if something happened in the meantime.
…266)

The state-id hash might be best as crc32 though, let's just use this one
instead.
This allows tests to introspect a little more and provides useful
statistics for servers who would like to decide if a refresh is in
order to release handles or clear out the slot map.
As the single-threaded version is usually much faster and we want to see
this number just to get an understanding of how close our
single/multi-threaded performance differs.
hoping to get past the advisory issue
Remove Outcome as it's a single-variant enum and that's not how it was
intended anyway.
It's vital to get pack-ids right, to keep them stable, available and
convertable from u32 to index ids.
…utable handle (#266)

This makes sense, it's just that the trait isn't made for that and needs
to change. It seems alright to make it way more specific to what the
pack generator needs.
That's OK because ultimately it will copy vast portions of the data.
Now we handle this by doing a mem-move inside of the vec to achieve the
same result.

However, it that's more effort than was before, for sure.
Maybe one may keep the whole entry and let the output::Entry
handle this for us, it would just have to keep track of the data offset.
Now the new ODB _should_ work for creating packs.
It's a bit tricky to use the right kind of handle and transform the
Rc<Store> back into an Arc<Store>, but it works.
@Byron Byron merged commit 7d2e20c into main Dec 18, 2021
@Byron Byron deleted the sync-db-draft branch January 10, 2022 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant