Followup to #7478: Move Cover Tars -> Zips #9560

Closed
12 tasks
mekarpeles opened this issue Jul 12, 2024 · 3 comments · Fixed by #9752
Assignees
Labels
Lead: @cdrini · Module: Cover Service · Priority: 1 · State: Blocked · Theme: Performance · Type: Feature Request

Comments

@mekarpeles
Member

mekarpeles commented Jul 12, 2024

Problem

A clear and concise description of what you want to happen

Open Library relies on Archive.org as a storage layer for millions of book covers and has a resolver at covers.openlibrary.org which looks in the Open Library cover DB for a cover and determines where it lives (either on OL disk or in an archive.org item).

Right now we're seeing what appears to be high-volume, programmatic access to our covers. Instead of downloading an entire tar, these clients are rapidly downloading one cover at a time. This is causing performance issues on Archive.org nodes, because reading a single file out of a tar is much slower than reading it out of a zip.

As a stopgap, we should first block this high-volume access, but either way we should prioritize moving these tars to zips.
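To illustrate why per-cover reads are costlier from a tar, here is a minimal sketch using only the Python standard library. It is not Archive.org's serving code; the function names and the sample file names are made up for the example. A tar has no index, so finding one member means walking the member headers in order, while a zip keeps a central directory at the end of the file that gives each member's offset.

```python
import io
import tarfile
import zipfile


def build_archives(covers: dict[str, bytes]) -> tuple[bytes, bytes]:
    """Pack the same files into an uncompressed tar and an uncompressed zip."""
    tar_buf, zip_buf = io.BytesIO(), io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        for name, data in covers.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    with zipfile.ZipFile(zip_buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in covers.items():
            zf.writestr(name, data)
    return tar_buf.getvalue(), zip_buf.getvalue()


def read_from_tar(tar_bytes: bytes, name: str) -> bytes:
    # No index: tarfile scans member headers from the start to locate `name`.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        return tf.extractfile(name).read()


def read_from_zip(zip_bytes: bytes, name: str) -> bytes:
    # zipfile loads the central directory, then seeks straight to the member.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name)
```

Both readers return the same bytes; the difference is how much of the archive has to be examined to find a member, which grows with the number of covers in a tar.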

Prerequisites

Phase I

The strategy is:

  • keep all tars in their archive.org cover items
  • produce corresponding zips for each batch
  • update the OL cover database to point to the zip files
  • test that the covers are indeed serving the zips, not the tars
  • remove the tars
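The "produce a zip, verify it, only then remove the tar" part of this strategy can be sketched locally as below. This is an illustration under assumptions, not the actual tooling: the real conversion is expected to run on archive.org (see the comments about PBOX-3879), and `tar_to_zip` and `zip_matches_tar` are hypothetical helper names.

```python
import tarfile
import zipfile
from pathlib import Path


def tar_to_zip(tar_path: Path, zip_path: Path) -> int:
    """Copy every regular file in the tar into a new zip; return the file count."""
    count = 0
    with tarfile.open(tar_path, "r:*") as tf, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for member in tf:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            zf.writestr(member.name, tf.extractfile(member).read())
            count += 1
    return count


def zip_matches_tar(tar_path: Path, zip_path: Path) -> bool:
    """Return True only if both archives hold the same names and bytes."""
    with tarfile.open(tar_path, "r:*") as tf, zipfile.ZipFile(zip_path) as zf:
        zip_names = set(zf.namelist())
        tar_names = set()
        for member in tf:
            if not member.isfile():
                continue
            tar_names.add(member.name)
            if member.name not in zip_names:
                return False
            if zf.read(member.name) != tf.extractfile(member).read():
                return False
        return tar_names == zip_names
```

A tar would be treated as safe to remove only after `zip_matches_tar` returns True for it and the covers are confirmed to be served from the zip.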

Phase II

There is one exceptional case: a batch of tars on OL disk that we'd also like to be zips. One solution, while not ideal, is to follow the same process:

  • upload the tars to an archive.org item but DON'T serve from them
  • run Hank's derive scripts to convert those tars to zips
  • delete the tars from archive.org
  • update the OL db to use these zips
  • confirm they're working
  • remove these final remaining tars from OL disk.

Proposal & Constraints

What is the proposed solution / implementation?

Is there a precedent of this approach succeeding elsewhere?

Which suggestions or requirements should be considered for how feature needs to appear or be implemented?

Leads

Related files

Stakeholders


Instructions for Contributors

  • Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue and each time after pushing code to Github, because the pre-commit bot may add commits to your PRs upstream.
@mekarpeles added the Type: Feature Request, Priority: 1, Module: Cover Service, Theme: Performance, Needs: Triage, Lead: @cdrini, and Needs: Lead labels Jul 12, 2024
@mekarpeles added this to the Sprint 2024-07 milestone Jul 12, 2024
@mekarpeles self-assigned this Jul 12, 2024
@hbromley

hbromley commented Jul 12, 2024

See Jira issue PBOX-3879 for the creation of the fixer op that would make the zips.

@mekarpeles removed the Needs: Triage and Needs: Lead labels Jul 15, 2024
@cdrini added the State: Blocked label Jul 29, 2024
@cdrini
Collaborator

cdrini commented Jul 29, 2024

Marking as Blocked since waiting for PBOX-3879.

@hbromley

hbromley commented Aug 1, 2024

Jira issue PBOX-3879 is now closed, and the new fixer op is currently being deployed to the PRI servers where it would run.

Let me know if you need any help running it on all your items that contain tars.

3 participants