Make small retrieval 200x faster #7693
Conversation
Took a cursory look, and the approach generally LGTM. However, something to call out is that efficiency will be (a) DAG-shape dependent and (b) request-dependent.
Re: (a), a DAG composed of tiny blocks will lead to substantial read amplification and roundtripping, and I expect this approach to be inefficient there (how inefficient, compared with the baseline approach, will depend on (b)).
Re: (b), if the caller is fetching the root CID of the deal with the default selector, this equates to fetching the entire deal DAG. At that point it could be more efficient to copy the shard data and serve from local, like we were doing before.
@magik6k -- what do you think about introducing a "planner" component that decides which strategy to use based on (b)?
For (a), we should eventually store metadata about a DAG (something like a DESCRIBE DAG) as we receive it, along with some stats (min/max/avg node size, etc.). This would allow us to strategise around (a); a rough sketch of such a planner follows below.
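To illustrate (this is not part of the PR), a planner along those lines might look like the Go sketch below. `Strategy`, `DagStats`, `Plan`, and the thresholds are all hypothetical names and values, not lotus APIs:

```go
package retrieval

// Strategy selects how a retrieval request should be served.
type Strategy int

const (
	// ServeDirect streams blocks straight from the CAR via random access.
	ServeDirect Strategy = iota
	// CopyLocal copies the shard to a local scratch file first, as before.
	CopyLocal
)

// DagStats is the per-DAG metadata ("DESCRIBE DAG") captured at ingestion.
type DagStats struct {
	Blocks      int64
	MinNodeSize int64
	MaxNodeSize int64
	AvgNodeSize int64
}

// Plan picks a strategy from the DAG stats and the estimated fraction of
// the DAG the selector will traverse (1.0 for a full retrieval).
func Plan(stats DagStats, selectorCoverage float64) Strategy {
	// Near-full retrievals of tiny-block DAGs amplify reads and
	// roundtrips; copying once and serving locally can win there.
	const tinyBlock = 4 << 10 // 4 KiB; an assumed threshold
	if selectorCoverage > 0.5 && stats.AvgNodeSize < tinyBlock {
		return CopyLocal
	}
	return ServeDirect
}
```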
@raulk From my testing, when retrieving the whole piece, reads tend to be sequential, because graphsync requests data from the DAGStore in the same order it was written to the CAR file, and the pieceReader exploits this property. For bigger partial retrievals, most accesses should still be mostly sequential, though possibly with multiple "heads"; more data/experimentation is needed here. (Basically, in my testing I didn't come up with an example in which copying the data would be more efficient.)
Force-pushed from 6be6dc4 to 9110e6f.
@magik6k Oh yeah, that's a good point. Both the layout of data on disk and the retrieval order are depth-first. That fact, along with the "burn" leniency (sketched below), should solve both challenges in the majority of cases 👍 Maybe we can think of some selector/DAG combinations whose access patterns would differ, and test those?
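For readers unfamiliar with the "burn" leniency mentioned above, here is a hedged Go sketch of the idea — an io.ReaderAt over a sequential stream that discards ("burns") small forward gaps instead of reopening the underlying reader. The names (`burningReader`, `maxBurn`) and the threshold are illustrative; this is not the actual lotus pieceReader:

```go
package retrieval

import "io"

const maxBurn = 1 << 20 // assumed: tolerate up to 1 MiB of forward skip

// burningReader serves random reads over a sequential source, reopening
// only on backwards seeks or large forward jumps.
type burningReader struct {
	open   func(offset int64) (io.ReadCloser, error) // reopen at offset
	r      io.ReadCloser
	offset int64
}

func (b *burningReader) ReadAt(p []byte, off int64) (int, error) {
	gap := off - b.offset
	if b.r == nil || gap < 0 || gap > maxBurn {
		// Backwards seek or a large jump: reopen at the target offset.
		if b.r != nil {
			b.r.Close()
		}
		r, err := b.open(off)
		if err != nil {
			return 0, err
		}
		b.r, b.offset = r, off
	} else if gap > 0 {
		// Small forward gap: burn through it sequentially.
		if _, err := io.CopyN(io.Discard, b.r, gap); err != nil {
			return 0, err
		}
		b.offset = off
	}
	n, err := io.ReadFull(b.r, p)
	b.offset += int64(n)
	return n, err
}
```

Because both the on-disk layout and the traversal order are depth-first, gaps tend to be small and forward-only, so the expensive reopen path should rarely be taken.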
Codecov Report
@@            Coverage Diff             @@
##           master    #7693      +/-   ##
==========================================
- Coverage   39.56%   39.53%   -0.04%
==========================================
  Files         637      638       +1
  Lines       67924    67990      +66
==========================================
+ Hits        26872    26877       +5
- Misses      36446    36488      +42
- Partials     4606     4625      +19
Continue to review full report at Codecov.
One question, but in general LGTM 👍
Was this PR rolled back?
This PR adds direct random-access capability to the lotus DAGStore mount.
Right now, when any retrieval is performed, the DAGStore reads the entire piece and stores it in a temporary file to enable random access. This is acceptable when retrieving the entire piece, but extremely inefficient when retrieving only a small percentage of it.
By avoiding this copy step, small retrievals become sub-second in many cases, and time-to-first-byte is drastically reduced for all retrievals.
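To make the mechanism concrete, here is a minimal, hypothetical Go sketch of serving a piece without the copy step: a mount-style seekable reader backed directly by random-access reads into the unsealed piece data. The `pieceAccessor` interface and all names are assumptions for illustration, not the actual lotus types:

```go
package retrieval

import "io"

// pieceAccessor abstracts "read bytes [off, off+len(p)) of this piece",
// e.g. backed by the sector storage subsystem.
type pieceAccessor interface {
	io.ReaderAt
	Size() int64
}

// pieceReader adapts a pieceAccessor into the seekable stream a DAGStore
// mount hands out, without materializing a local copy of the piece.
type pieceReader struct {
	acc pieceAccessor
	off int64
}

func (r *pieceReader) Read(p []byte) (int, error) {
	n, err := r.acc.ReadAt(p, r.off)
	r.off += int64(n)
	return n, err
}

func (r *pieceReader) Seek(offset int64, whence int) (int64, error) {
	switch whence {
	case io.SeekStart:
		r.off = offset
	case io.SeekCurrent:
		r.off += offset
	case io.SeekEnd:
		r.off = r.acc.Size() + offset
	}
	return r.off, nil
}
```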
Example timings
Running current master, retrieving a few KB from a 16G piece:
Running this PR, retrieving a few KB from a different 16G piece (a different piece, to avoid any potential caching):