booster-http: disable on-the-fly indexing by default #646

Merged
dirkmc merged 1 commit into main from feat/http-idx
Aug 24, 2022

Conversation

dirkmc
Contributor

@dirkmc dirkmc commented Jul 8, 2022

No description provided.

@jacobheun
Contributor

What's the consequence of having this disabled? I get the resource savings here by having it off, but if this is disabled that risks a failed retrieval, correct?

If that's the case I think we should default to "make it work out of the box", and then document and allow tuning for performance via params, noting the trade off of shutting certain things off.

@dirkmc
Contributor Author

dirkmc commented Aug 23, 2022

A piece is stored in a sector.
A CAR file is stored in a piece (the piece contains the CAR file + some padding).

In order to retrieve a CAR file from a sector, we need to know

  • where the piece is within the sector
  • where the end of the CAR file is within the piece
                            v
[sectorsectorsector{<carfile>piecepadding}sectorsector]

We figure out where the end of the CAR file is by

  • getting the CAR file index from the DAG store
  • finding the highest offset in the index
  • calculating the size of the last CAR file segment

Indexing-on-the-fly is when

  • a client makes a request to retrieve a CAR file
  • the piece has not finished indexing
  • we build an index of the CAR file on-the-fly in the booster-http process
  • we follow the above steps to figure out where the end of the CAR file is

In practice this process is relatively expensive, because booster-http needs to retrieve the entire piece and index it. If there are lots of retrieval requests, booster-http will do this repeatedly.

So I think it makes sense to disable this functionality by default. When a client makes a request for a CAR file in a piece that has not been indexed, the storage provider (SP) should notice errors in their logs and manually run commands to index the piece.

@jacobheun
Contributor

I'm assuming the manual intervention that's needed here is if indexing fails, is that correct? If it's processing and will eventually succeed I think that's fine. I'd like to understand what the manual intervention steps look like so that we can try to make them more visible to SPs, or better yet, automatic.

@dirkmc
Contributor Author

dirkmc commented Aug 23, 2022

Yes the manual intervention is to fix indexing if it fails 👍

The actual steps depend on exactly what went wrong. It could be that the sealing node was down when indexing was attempted, a corrupted datastore, a corrupted CAR file, etc. I think we should address those as part of the boost doctor work.

@jacobheun
Contributor

Yeah, it seems reasonable to showcase that as part of the doctor effort.

Contributor

@ribasushi ribasushi left a comment

✔️

@dirkmc dirkmc merged commit 50bc2e6 into main Aug 24, 2022
@dirkmc dirkmc deleted the feat/http-idx branch August 24, 2022 08:00

4 participants