Faster image load #189
instead of aws ls per database object closes #188
also fix spec nesting
Force-pushed from 5ce9731 to 27223df.
🎉
Currently the progress bar is not printing well: there's no time info and every update prints on a new line. Otherwise this seems to work, and it successfully runs the code concurrently.
This splits the 22 sets of images up more evenly and uses somewhat fewer concurrent resources. The previous iteration overwhelmed the staging machine into a memory allocation crash. Resources have also been increased, and there are now 8 cores, so 8 seems good for that reason, too.
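For readers unfamiliar with capping concurrency, here is a minimal sketch of the idea, assuming a limit of 8 jobs in flight across 22 work items. It is a stand-in built only from Ruby's standard library (threads plus a `SizedQueue` used as a counting semaphore), not the `Async::Semaphore` code in this PR, and `process_concurrently` is a hypothetical name.

```ruby
# Stand-in sketch: threads plus a SizedQueue acting as a counting semaphore.
# The PR itself uses the async gem; every name here is hypothetical.

LIMIT = 8 # assumed limit, matching the 8 cores mentioned above

# Runs the given block once per item, with at most `limit` blocks in flight.
# Returns the block results in completion order (not input order).
def process_concurrently(items, limit: LIMIT)
  slots = SizedQueue.new(limit) # holds one token per running job
  results = Queue.new

  threads = items.map do |item|
    slots.push(true) # blocks here while `limit` jobs are already running
    Thread.new do
      begin
        results << yield(item)
      ensure
        slots.pop # release the slot even if the job raised
      end
    end
  end

  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

# Usage: 22 hypothetical "disks", never more than 8 processed at once.
disks = (1..22).map { |n| "disk#{n}" }
loaded = process_concurrently(disks) { |disk| "#{disk}: done" }
puts loaded.size
```

Lowering the limit trades throughput for a smaller peak memory footprint, which is the trade-off described above.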
Force-pushed from 829d66c to cdd4eed.
Cool, I've never seen Async::Semaphore before. Seems super useful!
Also, "This lets us call it 22 times, instead of 40 thousand times" is possibly the best code commit message ever. 🤣
Force-pushed from cdd4eed to 794497f.
To get this to run without … The boxes have been increased to 8 cores and 64G of memory. It's running at 7.6G of memory use right now on staging. I think the network or the postgres machine is the bottleneck at this point, because it blasts through the images that it finds during the find-or-create step, and then slows down once it hits the ones it hasn't found. I think this means it will run really fast on prod, because we were so close to having loaded all the images.
This improves performance in 3 ways:
1. Instead of running an `aws ls` for each resource in the database, we run it once per "disk" in the image filenames. This lets us call it 22 times, instead of 40 thousand times.
2. We just create the CardImage objects by parsing the file name string into the correct "path" value. This way we don't need to fetch each of the 40 thousand guide card objects out of the database.
3. Uses the ruby async library to take advantage of multiple cores and use fibers for concurrent programming.
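As a rough illustration of the first two points, the sketch below turns one listing into path values without touching the database. It assumes listing lines shaped like `aws s3 ls --recursive` output (`date time size key`) and keys shaped like `disk/dir/.../file`. The sample keys, `parse_listing`, and `CardImageStub` are hypothetical and are not the code in this PR.

```ruby
# Hypothetical stand-in for the CardImage record; the real model's fields
# are not shown in this PR.
CardImageStub = Struct.new(:path, :image_name)

# Parses one listing (one call per "disk") into stub records.
# Assumes each line looks like: "<date> <time> <size> <key>".
def parse_listing(listing)
  listing.each_line.filter_map do |line|
    _date, _time, _size, key = line.strip.split(/\s+/, 4)
    next if key.nil? || key.end_with?("/") # skip blank lines and folder markers

    # The "path" is assumed to be the key's directory portion.
    CardImageStub.new(File.dirname(key), File.basename(key))
  end
end

# Made-up sample listing for a single disk.
SAMPLE_LISTING = <<~LISTING
  2021-01-01 00:00:00     123456 disk1/0001/A1/00001.tiff
  2021-01-01 00:00:01     123457 disk1/0001/A1/00002.tiff
  2021-01-01 00:00:02          0 disk1/0002/
LISTING

images = parse_listing(SAMPLE_LISTING)
images.each { |image| puts "#{image.path} #{image.image_name}" }
```

Because every record is derived from the listing string alone, the cost is one listing call per disk plus in-memory string parsing, rather than one lookup per guide card.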
This PR also changes the progress bar to display an elapsed time, ETA, exact percent, and total.