Faster image load #189

Merged: hackartisan merged 10 commits into main from 188-faster-image-load on Oct 11, 2023

@hackartisan (Member) commented Oct 10, 2023

This improves performance in 3 ways:

  1. Instead of running an `aws ls` for each resource in the database, we run it once per "disk" in the image filenames. This lets us call it 22 times, instead of 40 thousand times.

  2. We just create the CardImage objects by parsing the file name string into the correct "path" value. This way we don't need to fetch each of the 40 thousand guide card objects out of the database.

  3. It uses the Ruby async library to take advantage of multiple cores, using fibers for concurrency.
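
A minimal Ruby sketch of points 1 and 2, under stated assumptions: the real filename layout isn't shown in this PR, so it assumes a hypothetical `<disk>/<path>` naming scheme; `list_disk` is a hypothetical callable standing in for the per-disk `aws ls`; and `CardImage` is a plain Struct rather than the project's model. What it illustrates is the shape of the work: one listing call per disk, and objects built straight from filename strings with no database reads.

```ruby
# Hypothetical stand-in for the project's CardImage model.
CardImage = Struct.new(:path)

# Derive the "path" value from a filename string.
# Assumes a hypothetical "<disk>/<path>" layout, e.g. "disk03/0001/0042.png" -> "0001/0042.png".
def path_from(filename)
  filename.split("/", 2).last
end

# One listing call per disk (not per image), then build CardImage objects
# directly from the returned filenames, with no database lookups.
# `list_disk` is a hypothetical callable that returns the filenames on one disk.
def load_card_images(disks, list_disk)
  disks.flat_map do |disk|
    list_disk.call(disk).map { |filename| CardImage.new(path_from(filename)) }
  end
end
```

With 22 disks, `load_card_images` makes 22 listing calls regardless of how many files each disk holds.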

This PR also changes the progress bar to display the elapsed time, ETA, exact percentage, and total.
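
The PR doesn't show which progress-bar library it uses, so here is a library-free sketch of the line described above. `format_duration`, `progress_line`, and `print_progress` are hypothetical helpers, and the ETA is a simple linear extrapolation from the average time per finished item. Starting each update with `"\r"` (carriage return) and printing no newline redraws the same terminal line instead of starting a new one.

```ruby
# Format a number of seconds as HH:MM:SS.
def format_duration(seconds)
  seconds = seconds.round
  format("%02d:%02d:%02d", seconds / 3600, (seconds % 3600) / 60, seconds % 60)
end

# One-line summary: elapsed time, ETA, exact percent, and done/total counts.
# The ETA assumes the remaining items take, on average, as long as the finished ones.
def progress_line(done, total, elapsed_seconds)
  percent = total.zero? ? 100.0 : done * 100.0 / total
  eta =
    if done.zero?
      "--:--:--"
    else
      format_duration(elapsed_seconds.to_f / done * (total - done))
    end
  format("Elapsed: %s | ETA: %s | %.2f%% | %d/%d",
         format_duration(elapsed_seconds), eta, percent, done, total)
end

# "\r" moves the cursor back to the start of the line, so each update
# overwrites the previous one instead of printing on a new line.
def print_progress(io, done, total, elapsed_seconds)
  io.print("\r", progress_line(done, total, elapsed_seconds))
  io.flush
end
```

For example, `progress_line(25, 100, 60)` yields `Elapsed: 00:01:00 | ETA: 00:03:00 | 25.00% | 25/100`.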

@hackartisan force-pushed the 188-faster-image-load branch from 5ce9731 to 27223df on October 11, 2023 15:28
@bess (Contributor) left a comment

🎉

@hackartisan (Member, Author) commented Oct 11, 2023

Currently the progress bar isn't printing well: there's no time info, and every update prints on a new line. Otherwise this seems to work, and it successfully runs the code concurrently.

This splits the 22 sets of images more evenly and uses slightly fewer
concurrent resources. The previous iteration overwhelmed the staging machine
and led to a memory allocation crash. The machine's resources have also been
increased: it now has 8 cores, so a concurrency limit of 8 seems good for
that reason, too.
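
The commit doesn't say how the 22 sets were rebalanced. One common way to spread uneven workloads across a fixed number of workers is the greedy longest-processing-time heuristic, sketched below; `balance` is a hypothetical helper, the set names and sizes are hypothetical inputs, and this is not necessarily what the PR does.

```ruby
# Greedy "longest processing time first" partitioning: take the sets from
# largest to smallest and give each one to the bucket with the fewest images so far.
# `set_sizes` maps a set name to its image count (hypothetical input).
def balance(set_sizes, bucket_count)
  buckets = Array.new(bucket_count) { { sets: [], total: 0 } }
  set_sizes.sort_by { |_name, size| -size }.each do |name, size|
    lightest = buckets.min_by { |bucket| bucket[:total] }
    lightest[:sets] << name
    lightest[:total] += size
  end
  buckets
end
```

Called with 8 buckets, this would hand each of 8 concurrent workers a roughly equal share of images, so no single worker is left with most of the load.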
@hackartisan force-pushed the 188-faster-image-load branch 5 times, most recently from 829d66c to cdd4eed on October 11, 2023 18:52
@bess (Contributor) left a comment

Cool, I've never seen Async::Semaphore before. Seems super useful!
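
A semaphore caps how many tasks run at once. The PR uses `Async::Semaphore` from the async gem; the sketch below is a standard-library analogue built on threads, with a `SizedQueue` acting as a counting semaphore, and is not the PR's code. `each_bounded` is a hypothetical helper name. As with fibers under async, Ruby threads mainly pay off for I/O-bound work such as waiting on the network, subprocesses, or Postgres, since the interpreter lock lets only one thread run Ruby code at a time.

```ruby
# Run the block once per item, with at most `limit` blocks running at the same time.
# A SizedQueue with capacity `limit` works as a counting semaphore: push acquires a
# slot (blocking while all slots are taken) and pop releases it.
def each_bounded(items, limit, &block)
  slots = SizedQueue.new(limit)
  threads = items.map do |item|
    slots.push(true) # acquire; blocks the caller until a slot is free
    Thread.new do
      begin
        block.call(item)
      ensure
        slots.pop # release the slot even if the block raised
      end
    end
  end
  threads.each(&:join) # wait for all workers; re-raises any worker exception
end
```

Calling `each_bounded(image_sets, 8) { |set| ... }` (with `image_sets` a hypothetical collection) would keep at most 8 sets in flight at a time.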

@bess (Contributor) commented Oct 11, 2023

Also, "This lets us call it 22 times, instead of 40 thousand times" is possibly the best code commit message ever. 🤣

@hackartisan force-pushed the 188-faster-image-load branch from cdd4eed to 794497f on October 11, 2023 18:57
@hackartisan (Member, Author) commented Oct 11, 2023

To get this to run without the `can't set a guard page: Cannot allocate memory` error, I had to set `vm.max_map_count=2097152` in /etc/sysctl.conf and then run `sudo reboot`. I'm about to do the same on the prod box (done).

The boxes have been upgraded to 8 cores and 64 GB of memory. It's using 7.6 GB of memory on staging right now. I think the network or the Postgres machine is the bottleneck at this point: it blasts through the ones it finds during the find-or-create step, then slows down once it hits the ones it hasn't found. I think this means it will run really fast on prod, because we were so close to having loaded all the images.

@hackartisan merged commit 720f8fd into main on Oct 11, 2023
@hackartisan deleted the 188-faster-image-load branch on October 11, 2023 21:06