Tips on speeding up database build times? #1110
-
I've been doing some image classification using Ludwig, and so far I've gotten around 75% accuracy. I want to try more configurations for the model, but every time I change my configuration file it gets stuck on the dataset build step for a long time. An example of what one of my config files looks like:
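Roughly, it's just an image input feature feeding a category output; a representative sketch of that shape via the Python API (placeholder names, not my actual file):

```python
from ludwig.api import LudwigModel

# Placeholders: "image_path" and "label" stand in for my real columns.
config = {
    "input_features": [{"name": "image_path", "type": "image"}],
    "output_features": [{"name": "label", "type": "category"}],
}

model = LudwigModel(config)
model.train(dataset="mydata.csv")  # placeholder CSV of image paths + labels
```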
For reference, I have about 105 12MP images that I'm preprocessing each run. Should it take this long?
-
@why-does-ie-still-exist there are a few things you can do to improve this.

Ludwig actually builds a cache of the processed data after the first run, specifically to avoid this phenomenon. There is an open issue (#1078, which will be solved soon) about a bug that makes it recreate the cache when it is not needed; once that is fixed, this should not happen anymore (unless you change the preprocessing in your model definition).

When Ludwig runs preprocessing, it creates a .hdf5 file and a .json file with the same name as the dataset. If in subsequent runs you provide those instead of the CSV as inputs, you won't pay the preprocessing cost, as those files are the actual caches.

Additionally, you can speed up the process by configuring preprocessing to run multiple workers in parallel (see the sketch below).

Finally, 12MP images may be quite big; until the issue I mentioned above is fixed, I would suggest resizing them to the desired size beforehand.

Hope this helps!
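To make the caching and the parallelism concrete, a minimal sketch with the Python API (the paths and feature names are placeholders, and `num_processes` assumes a Ludwig version where that image preprocessing parameter is available):

```python
from ludwig.api import LudwigModel

# Placeholder config: one image input, one category output.
# "num_processes" (assumption: supported by your Ludwig version)
# runs image preprocessing with several workers in parallel.
config = {
    "input_features": [
        {
            "name": "image_path",
            "type": "image",
            "preprocessing": {"num_processes": 4},
        }
    ],
    "output_features": [{"name": "label", "type": "category"}],
}

model = LudwigModel(config)

# First run: pays the preprocessing cost and writes mydata.hdf5 and
# mydata.json next to the CSV.
model.train(dataset="mydata.csv")

# Later runs: point at the cached files instead of the CSV to skip
# preprocessing entirely.
model.train(
    dataset="mydata.hdf5",
    training_set_metadata="mydata.json",
)
```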
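And for shrinking the images beforehand, a throwaway script along these lines would do it (Pillow-based; directory names and the target size are placeholders):

```python
from pathlib import Path
from PIL import Image

SRC = Path("images_full")     # placeholder: folder with the 12MP originals
DST = Path("images_resized")  # placeholder: folder for the smaller copies
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    with Image.open(path) as img:
        # thumbnail() resizes in place, keeps the aspect ratio,
        # and never upscales.
        img.thumbnail((256, 256))
        img.save(DST / path.name)
```

Then point the dataset's image paths at the resized copies.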
-
Lol, just realized I've been running Colab without GPU acceleration. The time went from 25 minutes to less than 2 🤦. That's one way to speed up your database build+training times.