Project Meeting 2024.05.30
Michelle Bina edited this page May 31, 2024
- Phase 9a Progress Update
- Full Scale Performance: Single Process, Sharrow on
- The 100% sample run has completed. It does not use a lot of memory, but the run time is high (36 hours). (For comparison, the multiprocess run with sharrow off takes about 3 hours.)
- There are still memory spikes, related to the preprocessor (a part of the code that isn't run through sharrow). The highest memory peak is 420 GB. A month ago, Sijia ran with a 50% sample and estimated the full sample would take about 500 GB, so 420 GB is within expectations (or lower).
- Jeff took a quick look through the spec files for components where the model was taking a long time, comparing the SANDAG specs with the corresponding specs in the MTC model (which runs quickly), and noticed that the SANDAG files include an np.where function that is not in the MTC model. Jeff ran a simple test compiled with numba: the version using np.where ran very slowly, and a rewrite of the same process without that function ran about 400 times faster. The hypothesis is that not passing np.where functions through sharrow would make the model run faster. There are two ways this could be addressed:
- Add something to sharrow that rewrites any np.where functions <--- Jeff will try this first (in the next day) and assess its feasibility
- Go through individual spec files and rewrite the conditional statements everywhere (though this may have adverse impacts on the non-sharrow version)
- Some things that still don't seem to make sense:
- We don't understand why this isn't a problem in the multiprocessing run with sharrow on. <--- We will circle back to this after Ali finishes a run with all the environment variables set to 1.
- The workplace location choice model is running slowly but doesn't have any np.where functions.
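The notes do not record how Jeff rewrote his np.where test, so the following is only a minimal sketch of the general idea: the same conditional written once with np.where and once as an explicit branch. The function names, the `dist` array, and the threshold values are hypothetical, and the sketch uses plain NumPy only. It demonstrates that the two forms give the same result; it does not reproduce the numba timing difference reported above.

```python
import numpy as np


def with_where(dist, threshold, near_value, far_value):
    # Conditional expressed with np.where, as in the SANDAG-style specs.
    return np.where(dist < threshold, near_value, far_value)


def with_branch(dist, threshold, near_value, far_value):
    # The same conditional as an explicit element-wise branch.
    # (Hypothetical rewrite; a loop like this is the kind of code
    # a JIT compiler such as numba is designed to compile.)
    out = np.empty(dist.shape[0], dtype=np.float64)
    for i in range(dist.shape[0]):
        if dist[i] < threshold:
            out[i] = near_value
        else:
            out[i] = far_value
    return out


# Hypothetical input data for illustration only.
dist = np.array([0.5, 2.0, 3.5, 1.0])
result_where = with_where(dist, 2.0, 1.0, 0.0)
result_branch = with_branch(dist, 2.0, 1.0, 0.0)
print(np.array_equal(result_where, result_branch))
```

A check like this is also a cheap way to confirm that any rewritten spec expression still produces identical values before comparing run times.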
- Current latest run times:
- 1-zone MTC
- Single-process without sharrow: 21 hours
- Single-process with sharrow: 7.7 hours
- Multiprocess without sharrow: 2.9 hours
- Multiprocess with sharrow: 1.8 hours
- 2-zone SANDAG
- Single-process without sharrow: 26 hours
- Single-process with sharrow: 36 hours
- Multiprocess without sharrow: 3 hours
- Multiprocess with sharrow: 2.5 hours
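To make the run times above easier to compare, here is a small sketch that computes each configuration's speedup relative to the single-process, sharrow-off run of the same model. The hours are copied from the list above; the variable names are just for illustration. A value below 1 means the configuration is slower than the baseline.

```python
# Run times in hours, copied from the meeting notes.
run_hours = {
    "1-zone MTC": {
        "Single-process without sharrow": 21.0,
        "Single-process with sharrow": 7.7,
        "Multiprocess without sharrow": 2.9,
        "Multiprocess with sharrow": 1.8,
    },
    "2-zone SANDAG": {
        "Single-process without sharrow": 26.0,
        "Single-process with sharrow": 36.0,
        "Multiprocess without sharrow": 3.0,
        "Multiprocess with sharrow": 2.5,
    },
}

BASELINE = "Single-process without sharrow"

# speedup = baseline hours / configuration hours
speedups = {
    model: {config: runs[BASELINE] / hours for config, hours in runs.items()}
    for model, runs in run_hours.items()
}

for model, configs in speedups.items():
    print(model)
    for config, factor in configs.items():
        print(f"  {config}: {factor:.2f}x")
```

The SANDAG single-process run with sharrow is the only configuration that comes out below 1, which is the anomaly the np.where investigation is trying to explain.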
- Next steps (while Jeff is on vacation over the next two weeks)
- Working on explicit chunking (sharrow on and sharrow off) <--- Ali is running now
- Testing single process sharrow off with environment variables set to 1, using 10% sample <--- Ali to run
- Partial rewrite (pending Jeff’s investigation above): Sijia could take the np.where function out of a couple of component specs and see if that does reduce the run time significantly.
- If it's decided to convert np.where to explicit expressions, it would be good to test whether that makes the non-sharrow version worse.
- Another experiment would be to take a workplace location spec, cut it down to a single expression, and reprofile the code to see if it runs wildly faster or only a little faster. If it's a lot faster, then there's a problem expression among the ones that were taken out. Start with a simple spec file and then keep adding expressions until the performance goes down. <--- Wait until we work through the other things first before doing this experiment.
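The incremental experiment described in the last item could be organized roughly as follows. This is a sketch under stated assumptions: the expressions, the column names, and the `time_spec` helper are all hypothetical, and the timing here evaluates the expressions with plain NumPy rather than through sharrow, so it only shows the shape of the procedure (time the first expression alone, then the first two, and so on), not real model behavior.

```python
import time

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input columns standing in for the real model tables.
columns = {
    "dist": rng.random(100_000) * 50.0,
    "income": rng.random(100_000) * 200_000.0,
}

# Hypothetical spec expressions, ordered from simple to more complex.
expressions = [
    "dist",
    "dist ** 2",
    "np.where(income < 60000, dist, 0.0)",
]


def time_spec(exprs, repeat=3):
    """Return the best wall-clock time (seconds) to evaluate all exprs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for expr in exprs:
            eval(expr, {"np": np}, columns)
        best = min(best, time.perf_counter() - start)
    return best


# Grow the spec one expression at a time and record the timing.
timings = []
for n in range(1, len(expressions) + 1):
    elapsed = time_spec(expressions[:n])
    timings.append((n, expressions[n - 1], elapsed))

for n, added, elapsed in timings:
    print(f"{n} expression(s), last added {added!r}: {elapsed:.6f} s")
```

A sudden jump in the cumulative time after adding one expression would point at that expression as the one to examine more closely.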
- Sijia created an issue under ABM3: Sharrow test crashes with utility not aligned error
- Jeff is pretty sure that the issue is with the spec (like fast math is turned on or off incorrectly in that particular run). Sijia will dig into this.