Faster image load #189

hackartisan · 2023-10-10T19:30:03Z

This improves performance in 3 ways:

1. Instead of running an aws ls for each resource in the database, we run it once per "disk" in the image filenames. This lets us call it 22 times, instead of 40 thousand times.
2. We create the CardImage objects directly by parsing each file name string into the correct "path" value. This way we don't need to fetch each of the 40 thousand guide card objects out of the database.
3. We use the Ruby async library to take advantage of multiple cores, with fibers for concurrency.
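
The first two points can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the filename layout (imagecat-disk<N>-<path>.<image>.tif), the FILENAME_PATTERN constant, and the parse_listing helpers are assumptions made for the example. It takes the text that a single aws s3 ls call prints and turns each line into a "path" value without touching the database.

```ruby
# Hypothetical filename layout: "imagecat-disk<N>-<path>.<image number>.tif".
# The real layout in this repository may differ.
FILENAME_PATTERN = /\Aimagecat-disk(?<disk>\d+)-(?<path>.+)\.(?<image>\d+)\.tif\z/

# Parse one line of `aws s3 ls` output, which has the shape
# "<date> <time> <size> <key>". Returns nil for lines that do not match.
def parse_listing_line(line)
  key = line.split(" ", 4).last.to_s.strip
  match = FILENAME_PATTERN.match(key)
  return nil unless match

  { disk: match[:disk].to_i, path: match[:path], image_name: key }
end

# Parse a whole listing and keep only the entries for the requested disk.
def parse_listing(output, disk:)
  output.each_line
        .filter_map { |line| parse_listing_line(line) }
        .select { |entry| entry[:disk] == disk }
end

# Made-up sample listing, for illustration only.
listing = <<~LISTING
  2023-10-10 19:30:03     204800 imagecat-disk1-0001-A3037-1358.0110.tif
  2023-10-10 19:30:04     204800 imagecat-disk11-0447-B4491-0001.0003.tif
LISTING

entries = parse_listing(listing, disk: 1)
puts entries.map { |entry| entry[:path] }.inspect
# prints ["0001-A3037-1358"]
```

Matching the digits up to the next delimiter is what keeps disk 11 out of disk 1's results in this sketch; a bare prefix match on "disk1" would include both.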

This PR also changes the progress bar to display elapsed time, ETA, exact percent, and total.
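
For reference, the arithmetic behind those four fields can be sketched with the standard library alone. The progress_line and clock helpers below are hypothetical names for this example and are not the progress bar code in this PR; the ETA assumes the remaining items take as long on average as the ones already done.

```ruby
# Format a number of seconds as HH:MM:SS.
def clock(seconds)
  total = seconds.round
  format("%02d:%02d:%02d", total / 3600, (total % 3600) / 60, total % 60)
end

# Build one progress line from the count done, the total, and elapsed seconds.
def progress_line(done:, total:, elapsed:)
  percent = total.zero? ? 100.0 : done.to_f / total * 100
  # Linear extrapolation: remaining items times average seconds per item so far.
  eta = done.zero? ? nil : elapsed * (total - done) / done.to_f
  format("Elapsed %s | ETA %s | %.2f%% | %d/%d",
         clock(elapsed), eta ? clock(eta) : "??:??:??", percent, done, total)
end

puts progress_line(done: 10_000, total: 40_000, elapsed: 60)
# prints Elapsed 00:01:00 | ETA 00:03:00 | 25.00% | 10000/40000
```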

Uses aws ls per disk instead of per database object. Closes #188.

Also fixes spec nesting.

bess

🎉

hackartisan · 2023-10-11T15:53:34Z

Currently the progress bar is not printing well: there's no time info and every update prints on a new line. Otherwise this seems to work, and it successfully runs the code concurrently.

This splits the 22 sets of images up more evenly and uses somewhat fewer concurrent resources. The previous iteration overwhelmed the staging machine into a memory allocation crash. Resources have also been increased, and now there are 8 cores, so 8 seems good for that reason, too.
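
The idea of capping how much work is in flight can be sketched as below. Note the substitution: this example uses plain threads and a SizedQueue from the Ruby standard library as a counting semaphore, not the async gem that this PR uses, so that it runs without extra dependencies. The each_with_limit helper is a hypothetical name, and the limit of 8 assumes that the "8" above refers to the concurrency limit.

```ruby
# Run the block once per item, with at most `limit` blocks in flight at a time.
# Returns the block results in the same order as `items`.
def each_with_limit(items, limit:)
  slots = SizedQueue.new(limit) # used as a counting semaphore
  threads = items.map do |item|
    slots.push(true) # blocks here while `limit` workers are already busy
    Thread.new do
      begin
        yield item
      ensure
        slots.pop # release the slot even if the block raises
      end
    end
  end
  threads.map(&:value)
end

# Example: 22 stand-in "disks", no more than 8 processed concurrently.
results = each_with_limit((1..22).to_a, limit: 8) do |disk|
  sleep 0.01 # stands in for the database writes
  "disk #{disk} done"
end
puts results.length
# prints 22
```

Acquiring the slot before the worker starts is what bounds memory use: work for the ninth item is not even created until one of the first eight finishes.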

bess

Cool, I've never seen Async::Semaphore before. Seems super useful!

bess · 2023-10-11T18:57:03Z

Also, "This lets us call it 22 times, instead of 40 thousand times" is possibly the best code commit message ever. 🤣

hackartisan · 2023-10-11T20:01:29Z

To get this to run without the error "can't set a guard page: Cannot allocate memory", I had to set vm.max_map_count=2097152 in /etc/sysctl.conf and then sudo reboot. I'm about to do that on the prod box also (done).
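
A pre-flight check along these lines could confirm the setting before a long load starts. It is only a sketch: map_count_ok? and REQUIRED_MAP_COUNT are hypothetical names, and nothing like this is part of the PR. On Linux the live value is exposed at /proc/sys/vm/max_map_count.

```ruby
# The value set in /etc/sysctl.conf above.
REQUIRED_MAP_COUNT = 2_097_152

# Returns true or false depending on whether the kernel limit is high enough,
# or nil when the file cannot be read (for example on a non-Linux machine).
def map_count_ok?(path = "/proc/sys/vm/max_map_count", required: REQUIRED_MAP_COUNT)
  return nil unless File.readable?(path)

  File.read(path).to_i >= required
end

case map_count_ok?
when true  then puts "vm.max_map_count is high enough"
when false then puts "vm.max_map_count is below #{REQUIRED_MAP_COUNT}"
else            puts "could not read vm.max_map_count on this machine"
end
```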

The boxes have been increased to have 8 cores, 64G memory. It's running at 7.6G memory use right now on staging. I think the network or postgres machine is the bottleneck at this point, because it blasts through the ones that it finds during the find or create step, and then slows down once it hits the ones it hasn't found. I think this means it will run really fast on prod, because we were so close to having loaded all the images.

hackartisan added 8 commits October 10, 2023 15:03

Refactor progress bar

b4d3c7e

Refactor CardImageLoadingService to use aws ls per disk

95b4009

instead of aws ls per database object closes #188

Use percentage progress bar with ETA, and log each disk number

0e25dd9

Update readme with new image load time estimate

5e9a5fe

Don't fetch disk 11 with disk 1.

500cb88

also fix spec nesting

Fix rubocop

73c3052

simplify s3 command

6d90cdc

Use async to speed up db writes

27223df

hackartisan force-pushed the 188-faster-image-load branch from 5ce9731 to 27223df Compare October 11, 2023 15:28

bess approved these changes Oct 11, 2023

hackartisan force-pushed the 188-faster-image-load branch 5 times, most recently from 829d66c to cdd4eed Compare October 11, 2023 18:52

bess approved these changes Oct 11, 2023

Reduce resources used on 2nd async with a semaphore

794497f

hackartisan force-pushed the 188-faster-image-load branch from cdd4eed to 794497f Compare October 11, 2023 18:57

hackartisan merged commit 720f8fd into main Oct 11, 2023

hackartisan deleted the 188-faster-image-load branch October 11, 2023 21:06
