Slow performance after docker upgrade #7667

Closed
7 tasks done
mekarpeles opened this issue Mar 15, 2023 · 0 comments
Assignees
Labels
  • Affects: Operations (Affects the IA DevOps folks)
  • Module: Solr (Issues related to the configuration or use of the Solr subsystem. [managed])
  • Priority: 0 (Fix now: Issue prevents users from using the site or active data corruption. [managed])
  • Theme: Performance (Issues related to UI or Server performance. [managed])
  • Theme: Provisioning
  • Type: Post-Mortem (Log for when having to resolve a P0 issue)

Comments


mekarpeles commented Mar 15, 2023

Summary

  • What is wrong?

tl;dr: apt-get install apparmor

The site is extremely slow, Solr is auto-restarting frequently, and the merge queue is returning 503s (EDIT: some of these issues preceded the Docker upgrade).

We checked on

  • What caused it?

At ~11am PT, @cclauss performed a Docker upgrade on ol-home0 re: #7626 (comment). The upgrade completed successfully; however, no containers were running afterward.

Initially, we suspected performance issues with ol-www0. We ran sudo docker restart openlibrary_web_nginx_1 openlibrary_web_haproxy_1 on ol-www0, which seemed to work for a moment.
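A quick post-upgrade check could have flagged the "no containers running" state sooner. The following is only a sketch: the function names are hypothetical and not part of Open Library's tooling, and it assumes the output of docker ps -q (one container ID per line) is piped in on stdin.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Open Library repo.

# Counts non-empty lines on stdin (one container ID per line).
# grep -c exits non-zero when the count is 0, so mask that status.
count_ids() {
    grep -c . || true
}

# Reads container IDs from stdin and reports whether any are running.
# Returns 1 when the list is empty so callers can alert on it.
check_containers_running() {
    running=$(count_ids)
    if [ "$running" -eq 0 ]; then
        echo "WARNING: no containers running"
        return 1
    fi
    echo "OK: $running container(s) running"
}
```

On a host, this would be used as docker ps -q | check_containers_running right after an upgrade.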

  • What fixed it?

We ssh'ed to ol-home0 and saw via docker ps that no containers were running. We identified that this was related to the recent Docker upgrade. We tried to restart the containers manually:

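cd /opt/openlibrary
# Layer the production overrides on top of the base compose file
export COMPOSE_FILE="docker-compose.yml:docker-compose.production.yml"
# Export HOSTNAME so child processes such as docker compose can read it
export HOSTNAME="$HOSTNAME"
# Bring up the services enabled by the ol-home0 profile, detached
docker compose --profile ol-home0 up -d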

This failed with the error:

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: unable to apply AppArmor profile: AppArmor failed to apply profile: write /proc/self/attr/apparmor/exec: no such file or directory: unknown

We asked our newest team member, ChatGPT, for help and didn't get great suggestions. However, we found a hint in docker/for-linux#1199, which suggested apt install apparmor. After a restart and re-running the commands above, the systems were back up!
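For future incidents, a check along these lines might shorten the diagnosis. It is a minimal sketch, not something we ran during the incident: the function names are hypothetical, and it assumes the AppArmor userspace tools provide an apparmor_parser binary (as the Debian/Ubuntu apparmor package does). It only prints a suggestion and installs nothing.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Open Library repo.

# Returns 0 if the given daemon error text reports an AppArmor profile failure.
is_apparmor_error() {
    printf '%s\n' "$1" | grep -q 'AppArmor failed to apply profile'
}

# Prints "present" if apparmor_parser is found on PATH, else "missing".
apparmor_userspace_status() {
    if command -v apparmor_parser >/dev/null 2>&1; then
        echo present
    else
        echo missing
    fi
}

# Prints a suggested next step for the given error text.
suggest_fix() {
    if is_apparmor_error "$1" && [ "$(apparmor_userspace_status)" = missing ]; then
        echo "sudo apt-get install apparmor"
    else
        echo "no AppArmor-specific suggestion"
    fi
}
```

Note that apparmor_parser usually lives in an sbin directory, so a non-root PATH may report "missing" even when the package is installed; running the check as root avoids that false positive.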

  • Meta Problems
  • Post notifications to #openlibrary when big production changes are made
  • Attempt the install in a dev environment prior to production
  • Upgrade
  • Followup actions:

https://github.com/internetarchive/openlibrary/wiki/Production-Service-Architecture#performing-upgrades
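One possible way to make upgrades safer is to snapshot the running container names before the upgrade and diff them afterwards. The sketch below is an assumption about how that could look, not an existing procedure: missing_after_upgrade is a hypothetical name, and the two input files are assumed to hold one container name per line, for example from docker ps --format '{{.Names}}'.

```shell
#!/bin/sh
# Hypothetical helper; not part of the Open Library repo.

# Prints the names listed in the "before" file ($1) that are absent
# from the "after" file ($2). comm requires sorted input, so sort copies first.
missing_after_upgrade() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    sort "$1" > "$before_sorted"
    sort "$2" > "$after_sorted"
    # -23 suppresses lines unique to file 2 and lines common to both
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

Usage would be roughly: save docker ps --format '{{.Names}}' to a file before upgrading, save it again afterwards, and pass both files to missing_after_upgrade. Any output names a container that did not come back.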

Steps to close

  1. Assignment: Is someone assigned to this issue? (notetaker, responder)
  2. Labels: Is there an Affects: label applied?
  3. Diagnosis: Add a description and scope of the issue
  4. Updates: As events unfold, is notable provenance documented in issue comments? (i.e. useful debug commands / steps / learnings / reference links)
  5. "What caused it?" - please answer in summary
  6. "What fixed it?" - please answer in summary
  7. "Followup actions:" actions added to summary
mekarpeles added the labels Module: Solr, Theme: Performance, Priority: 0, Type: Post-Mortem, and Affects: Operations on Mar 15, 2023
mekarpeles self-assigned this on Mar 15, 2023
mekarpeles changed the title from "Slow performance & solr restarts" to "Slow performance after docker upgrade" on Mar 15, 2023