Investigate long TTFB for retrievals in the deal dashboard #421

Open
davidd8 opened this issue May 12, 2022 · 7 comments
Labels: P1 (P1 Issue), workstream/dashboards (This issue blocks the mainnet DSR dashboards.)

Comments

@davidd8
Collaborator

davidd8 commented May 12, 2022

The observable dashboard is showing high average TTFB for retrievals, on the order of hours. Is this an issue with the dashboard, or are retrievals really taking this long to start transferring data? Could it be related to a concurrency limit set by SPs, noted in https://www.notion.so/pl-strflt/Estuary-Elijah-1-5-22-505f2f1ac57648f1bd983323ffb47d48?

davidd8 added the workstream/dashboards and P1 labels May 12, 2022
davidd8 assigned davidd8 and unassigned davidd8 May 12, 2022
@dkkapur
Collaborator

dkkapur commented May 12, 2022

the graphql endpoint has the ttfb data per deal / dealbot task coming in:

  query: `query {FinishedTasks(UUIDs: ${JSON.stringify(uuids)}) { All { MinerLatencyMS TimeToFirstByteMS TimeToLastByteMS ClientVersion MinerVersion ProposalCID DealIDString}}}`})
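
A minimal sketch of how the request body for that query could be assembled. The field list is copied from the snippet above; the function name `buildFinishedTasksBody` is made up for illustration, and the endpoint URL is not given in this thread, so the actual POST is left as a comment.

```javascript
// Builds the JSON body for the FinishedTasks query quoted above.
// `buildFinishedTasksBody` is an illustrative name, not part of the dashboard code.
function buildFinishedTasksBody(uuids) {
  const fields = [
    "MinerLatencyMS",
    "TimeToFirstByteMS",
    "TimeToLastByteMS",
    "ClientVersion",
    "MinerVersion",
    "ProposalCID",
    "DealIDString",
  ].join(" ");
  // JSON.stringify renders the UUID array as a quoted, comma-separated list,
  // which is also valid GraphQL list-of-strings syntax.
  const query = `query {FinishedTasks(UUIDs: ${JSON.stringify(uuids)}) { All { ${fields} }}}`;
  return JSON.stringify({ query });
}

// Hypothetical usage (GRAPHQL_URL is an assumption, not from this thread):
// fetch(GRAPHQL_URL, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: buildFinishedTasksBody(uuids),
// });
```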

@dkkapur
Collaborator

dkkapur commented May 12, 2022

from @willscott, TTFB is calculated as:

i think it's the time from when the request starts to when the state first changes to transferring / data received
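
To make that description concrete, here is a rough sketch of the calculation as described, not the actual dealbot code. The event shape (`state`, `atMs`) and the state names `"Transferring"` and `"DataReceived"` are assumptions for illustration only.

```javascript
// Hypothetical state names that would count as "first byte" evidence.
const FIRST_BYTE_STATES = new Set(["Transferring", "DataReceived"]);

// Returns the time in ms from request start to the earliest qualifying
// state change, or null if no such change was observed.
// `events` is an assumed shape: [{ state: string, atMs: number }, ...].
function timeToFirstByteMs(requestStartMs, events) {
  let first = null;
  for (const e of events) {
    if (!FIRST_BYTE_STATES.has(e.state)) continue;
    if (e.atMs < requestStartMs) continue; // ignore events before the request started
    if (first === null || e.atMs < first) first = e.atMs;
  }
  return first === null ? null : first - requestStartMs;
}
```

If TTFB is computed this way, time a retrieval spends queued on the SP side before the transfer state is reached would be included in the number.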

@dkkapur
Collaborator

dkkapur commented May 12, 2022

as a step 1 here, would be great to just get the distribution of TTFBs for all the retrieval attempts in the last week. would help identify if everything has gotten worse or we just have a few outliers (and if so, which SP IDs those are coming from).
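
One possible way to get at that distribution, sketched under assumptions: it takes the `TimeToFirstByteMS` values from the query results and reports nearest-rank percentiles overall and per SP. The `spId` field name is hypothetical, since the query quoted earlier does not list an SP identifier field.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
// Note: for an even count, p50 here is the lower middle value, not the
// average of the two middle values.
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarizes a list of TTFB values (in ms).
function summarize(ttfbsMs) {
  const sorted = [...ttfbsMs].sort((a, b) => a - b); // numeric sort, not lexicographic
  return {
    count: sorted.length,
    min: sorted.length ? sorted[0] : null,
    p50: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    max: sorted.length ? sorted[sorted.length - 1] : null,
  };
}

// Groups tasks by SP and summarizes each group.
// `spId` is a hypothetical field name used only for this sketch.
function summarizeBySp(tasks) {
  const bySp = new Map();
  for (const t of tasks) {
    if (!bySp.has(t.spId)) bySp.set(t.spId, []);
    bySp.get(t.spId).push(t.TimeToFirstByteMS);
  }
  return new Map([...bySp].map(([spId, values]) => [spId, summarize(values)]));
}
```

If the overall p50 and p90 are close together, everything has likely gotten slower; if only the top of the range blows up, the per-SP summaries should show which SPs the outliers come from.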

@davidd8
Collaborator Author

davidd8 commented May 12, 2022

cc @kylehuntsman

@kylehuntsman

I agree, I think the metric is correct in showing the average time, but the underlying data could be misrepresenting the practical norm. We could calculate the median as a real quick sanity check.
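
A small sketch of that sanity check, assuming the TTFB values are available as a plain array of milliseconds. The numbers in the usage comment are made up to show how a single outlier pulls the mean away from the median.

```javascript
// Arithmetic mean; returns null for an empty list.
function mean(values) {
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Median; averages the two middle values when the count is even.
function median(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Illustrative data only: three fast retrievals and one very slow one.
// mean([1, 2, 3, 1000])   is 251.5
// median([1, 2, 3, 1000]) is 2.5
```

If the median also comes out in the hours range, the average is not just being skewed by a few outliers.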

@davidd8
Collaborator Author

davidd8 commented May 13, 2022

I did a quick check, and the median is about 2.5 hrs, with the lowest TTFB at 68 min and the highest around 10 hrs. So the numbers are consistently higher than expected, not just skewed by a few outliers.

@brendalee

out of curiosity, is it possible to get the minerIDs for these? maybe we can try to understand if it was for unsealed data or there was some other issue
