PoC of new ODB design #273

Byron · 2021-12-08T13:03:02Z

Now that the design itself seems to be pointing in the right direction, let's get a PoC done quickly to see if it works in reality as well.

Related to #266.

Tasks


…this has many problems. However, it is slowly becoming clear that ideally we would be able to handle a certain number of files and mutate many of them at the same time while being lock-free. We can't have that, though, so maybe we should just operate on a single data structure but bank on caching it in the handles themselves, so that super-fast and entirely lock-free access isn't even required. What we can do is synchronize access to the data structure, load files one by one instead of in parallel, and accept that this initial delay is longer than it would have to be, knowing that from there on out it will be fast. It will be fully lazy as well.
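The idea of one synchronized data structure with per-handle caching could be sketched roughly as follows. This is a minimal illustration, not the PoC's code: `Store`, `Handle`, `Index`, `load_index` and the `lock_acquisitions` counter are all hypothetical names, and the "index" is a stand-in for a memory-mapped file.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for a loaded pack index (hypothetical; a real one would wrap a memory map).
#[derive(Debug)]
struct Index {
    name: String,
}

/// The single shared data structure, guarded by one lock.
#[derive(Default)]
struct Store {
    loaded: Mutex<HashMap<usize, Arc<Index>>>,
}

impl Store {
    /// Lazily load index `id` while holding the lock, so files are loaded one by one.
    fn load_index(&self, id: usize) -> Arc<Index> {
        let mut loaded = self.loaded.lock().expect("lock is not poisoned");
        loaded
            .entry(id)
            .or_insert_with(|| Arc::new(Index { name: format!("pack-{}.idx", id) }))
            .clone()
    }
}

/// A handle caches what it has already seen, so repeated access never touches the lock.
struct Handle {
    store: Arc<Store>,
    cache: HashMap<usize, Arc<Index>>,
    /// Counts how often this handle had to go to the shared store (for illustration only).
    lock_acquisitions: usize,
}

impl Handle {
    fn new(store: Arc<Store>) -> Self {
        Handle { store, cache: HashMap::new(), lock_acquisitions: 0 }
    }

    fn index(&mut self, id: usize) -> Arc<Index> {
        if let Some(index) = self.cache.get(&id) {
            return Arc::clone(index);
        }
        // Slow path: synchronize with the store once, then remember the result locally.
        self.lock_acquisitions += 1;
        let index = self.store.load_index(id);
        self.cache.insert(id, Arc::clone(&index));
        index
    }
}
```

In this sketch the first access through a handle pays for the lock and the load; every later access to the same index is served from the handle's own map.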


This one, however, will collect them all and it's up to the handle implementation to decide how these are to be searched. This is only relevant for alternates, and as of now it's actually impossible to know when the packs of one odb are done to query its loose db. Instead, all packs of all dbs are queried first, then all loose object stores. Maybe that's even better this way, who knows.


… big picture (#266)

- midx files don't include CRC32 information anymore. One might think that these are still in the single indices, but at least as of now it completely ignores these.
- CRC32 can be created on the fly, but there isn't a need even for pack to pack copies as the receiver is forced to build an index themselves from the entire pack, which forces multiple hash comparisons on the way.


That's OK because ultimately it will copy vast portions of the data. Now we handle this by doing a mem-move inside of the vec to achieve the same result. However, that's more effort than before, for sure. Maybe one could keep the whole entry and let the output::Entry handle this for us; it would just have to keep track of the data offset.
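A mem-move inside of a `Vec` can be expressed with the standard library's `copy_within`. The sketch below assumes the buffer holds a header followed by the data that should remain; the function name `strip_header_in_place` and the header/data split are illustrative assumptions, not taken from the PoC.

```rust
/// Drop the first `header_len` bytes of `buf` in place: move the remaining
/// bytes to the front of the vector, then cut off the now-unused tail.
/// (Hypothetical helper; no new allocation is made.)
fn strip_header_in_place(buf: &mut Vec<u8>, header_len: usize) {
    // Clamp so that a too-large header length simply empties the buffer.
    let header_len = header_len.min(buf.len());
    // `copy_within` behaves like memmove, so overlapping ranges are fine.
    buf.copy_within(header_len.., 0);
    buf.truncate(buf.len() - header_len);
}
```

The alternative mentioned above, keeping the whole entry and tracking a data offset, would avoid the move entirely at the cost of carrying the offset around.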


Use existing git_features facilities (#266)

ed0c266

Byron mentioned this pull request Dec 8, 2021

build 'discovery' ODB with auto-consistency for on-disk files #266

Closed

5 tasks

Byron added 28 commits December 9, 2021 07:54

remove slow/unnecessary threading utilities (#266)

269b7ef

commit to keeping PoC Sync (#266)

26bd96b

And avoid the abstraction that would allow swapping in non-sync/threadsafe types. We can consider doing this once it's clear how all this works. Right now, we would probably need to include arc-swap in the set of abstractions.

Don't try to communicate deletions (#266)

d5b2256

Deletions happen rarely enough to allow open maps to be kept by handles until they are discarded in their entirety.

experiment with a different state representation (#266)

0765512

Adjust data structures to allow concurrent loads of indices and packs (…

fcadf55

…#266)

Integrate loose dbs with the returned result; simplify it (#266)

20b9a51

The Extend outcome was removed as we are not able to know that packs are all loaded when we collect the outcome, simply because indices and packs can now be loaded in parallel.

change!: move bundle::Location to data::entry::Location (#266)

82b9b33

The latter place best describes its purpose.

Adjust to new name/place of bundle::Location (#266)

1f8954d

Notes about multi-pack indices in the current data::entry::location (#…

7eff6bf

…266)

More affirmative notes about multi-pack indices (#266)

dceaea2

First sketch of general store (#266)

fc1b640

I tried to avoid using the 'handle' idea, but now that it's not there I like it enough to bring it back and would rather think of a better name for the current top-level handle.

change!: Rename Handle to Cache (#266)

580e96c

Because this is exactly what it is effectively. Also add some basic instantiation for the new object store.

Add all types the handle would have to store (#266)

e2f0cb0

Rework how state identification is handled (#266)

01f3c21

The latter is vital to know if something happened in the meantime.

Let the amount of actually loaded indices take part in the state-id (#…

d884e77

…266)

put down more types for loading of indices and refresh logic (#266)

9909eaf

Bring in the slotmap (#266)

3a5cb5f

Handle registration (#266)

df4e4eb

Use handle registration to avoid unloading packs; fix state-id hash (#…

a1070de

…266) The state-id hash might be best as crc32 though, let's just use this one instead.

more trustworthy state-id hashing (#266)

4eb43d0
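One way to picture a state-id that changes whenever something happened in the meantime is to hash a generation counter together with the number of loaded indices. This is only a sketch under assumptions: `StateId`, `state_id`, and both inputs are hypothetical, and the standard library's `DefaultHasher` is swapped in for whatever hash the PoC actually uses (the commit above considers crc32).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Opaque identifier of a store state (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StateId(u64);

/// Combine the store generation and the amount of actually loaded indices
/// into one value that a handle can compare cheaply against its own copy.
fn state_id(generation: u32, loaded_indices: usize) -> StateId {
    let mut hasher = DefaultHasher::new();
    generation.hash(&mut hasher);
    loaded_indices.hash(&mut hasher);
    StateId(hasher.finish())
}
```

A handle would store the id it last saw and treat any mismatch as a signal to update its cached view.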

first test to trigger all major code-paths (#266)

25b56c5

Support for metrics in general store handle (#266)

11b98b8

This allows tests to introspect a little more and provides useful statistics for servers that would like to decide whether a refresh is in order to release handles or clear out the slot map.
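Such metrics could be kept as atomic counters on the shared store and exposed as a plain snapshot. The field and method names below (`num_handles`, `loaded_indices`, `num_refreshes`, `metrics()`) are assumptions made for illustration and may not match the PoC.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counters kept by a hypothetical shared store.
#[derive(Default)]
struct Store {
    num_handles: AtomicUsize,
    loaded_indices: AtomicUsize,
    num_refreshes: AtomicUsize,
}

/// Point-in-time copy of the counters, cheap to compare in tests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Metrics {
    num_handles: usize,
    loaded_indices: usize,
    num_refreshes: usize,
}

impl Store {
    fn register_handle(&self) {
        self.num_handles.fetch_add(1, Ordering::Relaxed);
    }

    fn remove_handle(&self) {
        self.num_handles.fetch_sub(1, Ordering::Relaxed);
    }

    fn record_index_load(&self) {
        self.loaded_indices.fetch_add(1, Ordering::Relaxed);
    }

    fn record_refresh(&self) {
        self.num_refreshes.fetch_add(1, Ordering::Relaxed);
    }

    /// Read all counters; values are individually consistent, not a joint snapshot.
    fn metrics(&self) -> Metrics {
        Metrics {
            num_handles: self.num_handles.load(Ordering::Relaxed),
            loaded_indices: self.loaded_indices.load(Ordering::Relaxed),
            num_refreshes: self.num_refreshes.load(Ordering::Relaxed),
        }
    }
}
```

A server could poll `metrics()` and, for example, trigger a refresh once the number of registered handles drops to zero.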

thanks clippy

4ca9e07

thanks clippy

d1a956d

thanks clippy

b0f7328

Byron added 28 commits December 16, 2021 16:45

object access (decode bytes) tests for the new store (#266)

d8bffc3

Avoid running parallel libgit2 too early (#266)

619c115

The single-threaded version is usually much faster, and we want to see this number just to get an understanding of how close our single- and multi-threaded performance is.

update dependencies (#266)

02ec88f

hoping to get past the advisory issue

A way to predict the amount of slots needed for smooth operation (#266)

a3a16d6

cleanup (#266)

a4f3670

Impl git_odb::Write for general::Handle (#266)

b7a6ab7

refactor (#266)

2c23f42

Remove Outcome as it's a single-variant enum and that's not how it was intended anyway.

prepare implementation of location-dependent methods (#266)

5de29f4

It's vital to get pack-ids right, to keep them stable, available and convertible from u32 to index ids.
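A small newtype is one way to keep such ids stable and convertible in both directions. The sketch below is an assumption about the shape of the idea, not the PoC's definition: `PackId`, `from_index`, and `to_index` are hypothetical names.

```rust
/// Stable identifier of a pack, stored as `u32` (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct PackId(u32);

impl PackId {
    /// Convert an index position into an id, failing if it does not fit into `u32`.
    fn from_index(index: usize) -> Option<PackId> {
        if index <= u32::MAX as usize {
            Some(PackId(index as u32))
        } else {
            None
        }
    }

    /// Convert the id back into a position usable for indexing a slice or slot map.
    fn to_index(self) -> usize {
        self.0 as usize
    }
}

impl From<u32> for PackId {
    fn from(raw: u32) -> Self {
        PackId(raw)
    }
}

impl From<PackId> for u32 {
    fn from(id: PackId) -> Self {
        id.0
    }
}
```

Keeping the conversion in one place means a location stored as a raw `u32` can always be mapped back to the same index slot.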

refactor (#266)

6cb474e

It shows that we can't return anything referenced from the interior-m…

b9f308b

…utable handle (#266) This makes sense, it's just that the trait isn't made for that and needs to change. It seems alright to make it way more specific to what the pack generator needs.

Remove iterator access in favor of fully owned data (#266)

62d3f10

thanks clippy

bf4694c

remaining methods of git-pack::Find (#266)

92b9764

Now the new ODB _should_ work for creating packs.

Use new odb in place of the old one and it works (#266)

8ad25c5

Use new store in git-repository (#266)

2f9e342

A quick and dirty version of index iteration (#266)

0384007

A more suitable iterator implementation for general store (#266)

af0cc5f

Make single-threaded programs possible to use with git-repository (#266)

dde5c6b

It's a bit tricky to use the right kind of handle and transform the Rc<Store> back into an Arc<Store>, but it works.

fix docs (#266)

360bf9d

minor improvements to module layout, docs (#266)

0364f48

change!: move loose::iter::Iter to loose::Iter (#266)

8bb5c9a

adapt to changes in git-odb (#266)

a44dd4b

dynamic store module cleanup (#266)

494772c

change!: move sink::Sink to the top-level exclusively (#266)

ab4e726

refactor (#266)

3da91ce

refactor (#266)

52a4dcd

refactor (#266)

b88f253

Byron merged commit 7d2e20c into main Dec 18, 2021

Byron deleted the sync-db-draft branch January 10, 2022 08:47
