Add options to handle larger dataset for location models #687
use feather and pickle for faster loading
split x_ca processing into idca using chunking_size and update comments
default behavior does not change; to enable the use of feather and chunking, do this:
# load from original
model, data = component_model(modelname, return_data=True, alt_values_to_feather=True, chunking_size=chunking_size)
b8beb66 to 425467e
…into location_estimation_patch
@jpn-- I have brought my branch up to date with develop, and the merge should work now.
@bwentl Does this work for all location/destination models? Non-mandatory tour destination, for example? Thanks!
@stefancoe I believe so. The model type and the list of output data are the same for mandatory and non-mandatory tour destinations, so it should work. The only thing that needs to be changed is the
@bwentl Thanks, I'll try it out.
Currently, the location_choice.py module used by component_model (from activitysim.estimation.larch import component_model) cannot load the estimation data in a reasonable length of time. See issue #686.
Description of the problem
The main reason is that activitysim stores the data as a chooser-variable table by alternatives, “cv” (zones 1, 2, etc. as columns and attributes as rows), whereas larch uses an idca table by attributes, “ca” (dist, accessibility, etc. as columns and zones as rows). To work with larch, activitysim converts the table format using the cv_to_ca function. However, this function does not scale well to large tables, and we found that some modifications were required to process the data.
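To make the two layouts concrete, here is a minimal pandas sketch of a cv-to-ca reshape, done in one pass and in chunks of choosers. It is not the actual cv_to_ca implementation; the function names, the column names person_id, variable and alt_id, and the toy values are all assumptions made for illustration.

```python
import pandas as pd


def cv_to_ca_sketch(cv, chooser_col="person_id", variable_col="variable"):
    """Reshape a "cv" table (zones as columns) into a "ca" table (attributes as columns).

    Illustrative only; not the activitysim implementation.
    """
    # Move the zone columns into rows: one row per (chooser, variable, zone).
    long = cv.melt(
        id_vars=[chooser_col, variable_col],
        var_name="alt_id",
        value_name="value",
    )
    # Move the variables out to columns: one row per (chooser, zone).
    return long.pivot(
        index=[chooser_col, "alt_id"], columns=variable_col, values="value"
    )


def cv_to_ca_chunked(cv, chunking_size, chooser_col="person_id", variable_col="variable"):
    """Apply the same reshape to chunking_size choosers at a time, then concatenate."""
    choosers = cv[chooser_col].unique()
    pieces = []
    for start in range(0, len(choosers), chunking_size):
        ids = choosers[start:start + chunking_size]
        chunk = cv[cv[chooser_col].isin(ids)]
        pieces.append(cv_to_ca_sketch(chunk, chooser_col, variable_col))
    return pd.concat(pieces)


# Toy "cv" table: two choosers, two attributes, two zones (hypothetical values).
cv = pd.DataFrame(
    {
        "person_id": [101, 101, 102, 102],
        "variable": ["dist", "size", "dist", "size"],
        1: [1.5, 10.0, 2.5, 10.0],
        2: [3.0, 20.0, 0.5, 20.0],
    }
)

ca = cv_to_ca_chunked(cv, chunking_size=1)
print(ca)
```

Chunking by chooser keeps each intermediate long table small, which is the general idea behind bounding memory use during this kind of reshape. The result has one row per (chooser, zone) pair.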
Changes proposed
Here is a summary of the changes we made to location_choice.py to ensure the workplace location model data can be processed faster:
alt_values_to_feather=False (which is the default)
prevent errors. This, however, should not happen if location model specifications are properly written.
chunking_size=None, which falls back to the original method of processing the entire dataset.
Usage
In the estimation notebook for school or workplace location models, modify the way the model and data are loaded with activitysim.estimation.larch.component_model:
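The commit message for this change gives the call itself; the sketch below wraps it in a small helper. The helper name load_location_model and the example values in the trailing comment are assumptions, not part of activitysim. The keyword defaults mirror the ones stated in this pull request (alt_values_to_feather=False, chunking_size=None).

```python
def load_location_model(modelname, chunking_size=None, alt_values_to_feather=False):
    """Load a location model and its estimation data (hypothetical helper).

    With the defaults, the original behavior is kept: no feather files and
    no chunking.
    """
    # Imported lazily so the helper can be defined without activitysim installed.
    from activitysim.estimation.larch import component_model

    # Same call as in the commit message for this change.
    return component_model(
        modelname,
        return_data=True,
        alt_values_to_feather=alt_values_to_feather,
        chunking_size=chunking_size,
    )


# In a notebook (model name and chunk size are illustrative values):
# model, data = load_location_model(
#     "workplace_location", chunking_size=100_000, alt_values_to_feather=True
# )
```

A suitable chunk size depends on the number of zones and on available memory, so the value shown is a placeholder.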