
Add options to handle larger dataset for location models #687

Merged


bwentl (Contributor) commented Jun 23, 2023

Currently, location_choice.py, which is used by component_model (imported via from activitysim.estimation.larch import component_model), cannot load the estimation data in a reasonable amount of time when the dataset is large. See issue #686.

Description of the problem

The main reason is that ActivitySim stores the alternatives data in a chooser-variable ("cv") layout, with one row per chooser and variable (attribute) and one column per alternative zone (zone 1, 2, etc.), whereas larch uses an idca ("ca") layout, with one row per chooser and zone and one column per attribute (distance, accessibility, etc.). To work with larch, ActivitySim converts the table using the cv_to_ca function. However, this function does not scale well to large tables, and we found that some modifications were required to process the data.
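To make the two layouts concrete, here is a minimal pandas sketch of the cv to ca reshape on a toy table with two choosers, two attributes, and three zones. It is a simplified stand-in written for illustration, not ActivitySim's cv_to_ca; the index names (person_id, variable, zone) and all values are made up.

```python
import pandas as pd

# Toy "cv" table: one row per (chooser, variable), one column per zone id.
cv = pd.DataFrame(
    [
        [2.5, 7.1, 3.3],  # person 101, dist
        [0.8, 0.3, 0.5],  # person 101, accessibility
        [4.0, 1.2, 5.5],  # person 102, dist
        [0.6, 0.9, 0.4],  # person 102, accessibility
    ],
    index=pd.MultiIndex.from_tuples(
        [(101, "dist"), (101, "accessibility"), (102, "dist"), (102, "accessibility")],
        names=["person_id", "variable"],
    ),
    columns=[1, 2, 3],  # zone ids
)


def cv_to_ca_sketch(cv_table):
    """Illustrative reshape: (chooser, variable) x zone -> (chooser, zone) x variable."""
    # Stack the zone columns into the row index: (person_id, variable, zone) -> value
    tall = cv_table.stack()
    tall.index = tall.index.set_names("zone", level=-1)
    # Pivot "variable" back out to the columns, so rows become (person_id, zone)
    return tall.unstack("variable")


ca = cv_to_ca_sketch(cv)
print(ca)
```

Even in this sketch, the intermediate stacked Series holds one value per chooser, variable, and zone, so peak memory grows with the product of all three dimensions.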

Changes proposed

Here is a summary of the changes we made to location_choice.py so that the workplace location model data can be processed faster:

  • Added conversion of the CSV data to Feather to speed up subsequent reads of the estimation data bundle. Note that we did run into issues when using Feather to load CSV data that has mixed types in a column; in that case, set alt_values_to_feather=False (the default) to prevent errors. This should not happen, however, if the location model specifications are properly written.
  • Added chunking to the processing of the alternatives data. We typically use a chunking size of 50000 rows of the cv-format table, but you might need to adjust this depending on the specs of your machine or the requirements of your model. The default is chunking_size=None, which falls back to the original method of processing the entire dataset at once.
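The chunking idea can be sketched as follows: instead of reshaping the whole cv table at once, convert it in slices and concatenate the idca pieces. This is a minimal illustration under stated assumptions, not the implementation merged in this PR; cv_to_ca_sketch and cv_to_ca_chunked are hypothetical names, and the person_id/variable index layout is assumed. One design choice worth noting: this sketch chunks by whole choosers rather than by a raw row count, because splitting one chooser's variable rows across two chunks would produce duplicate, partially empty idca rows after concatenation.

```python
import numpy as np
import pandas as pd


def cv_to_ca_sketch(cv_table):
    """Illustrative reshape: (chooser, variable) x zone -> (chooser, zone) x variable."""
    tall = cv_table.stack()
    tall.index = tall.index.set_names("zone", level=-1)
    return tall.unstack("variable")


def cv_to_ca_chunked(cv_table, chunking_size=None):
    """Hypothetical chunked conversion; chunking_size is a target number of cv rows per chunk."""
    if chunking_size is None:
        # No chunking: convert the entire table in one pass
        return cv_to_ca_sketch(cv_table)
    person_ids = cv_table.index.get_level_values("person_id")
    chooser_ids = person_ids.unique()
    rows_per_chooser = cv_table.index.get_level_values("variable").nunique()
    # Take whole choosers per chunk so one chooser's variables are never split
    choosers_per_chunk = max(1, chunking_size // rows_per_chooser)
    pieces = []
    for start in range(0, len(chooser_ids), choosers_per_chunk):
        chunk_ids = chooser_ids[start : start + choosers_per_chunk]
        chunk = cv_table[person_ids.isin(chunk_ids)]
        pieces.append(cv_to_ca_sketch(chunk))
    return pd.concat(pieces)


# Toy cv table: 5 choosers x 3 variables (15 rows), 4 zones (columns)
rng = np.random.default_rng(0)
variables = ["dist", "accessibility", "size_term"]
n_choosers = 5
index = pd.MultiIndex.from_product(
    [list(range(1, n_choosers + 1)), variables], names=["person_id", "variable"]
)
cv = pd.DataFrame(rng.random((len(index), 4)), index=index, columns=[1, 2, 3, 4])

full = cv_to_ca_sketch(cv)
chunked = cv_to_ca_chunked(cv, chunking_size=6)  # 6 cv rows -> 2 choosers per chunk
print(full.shape, chunked.shape)
```

With chunking_size=None the function behaves like the single-pass conversion, mirroring the fallback described above; smaller chunks trade extra Python-level iterations for a lower peak memory footprint.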

Usage

In the estimation notebook for the school or workplace location model, change how the model and data are loaded with activitysim.estimation.larch.component_model:

```python
from activitysim.estimation.larch import component_model

modelname = "workplace_location"

# Set the chunking size for x_ca. Note that this parameter has no effect
# if the "{name}_x_ca.pkl" cache is already present.
chunking_size = 50000

# Load the model and data from the original estimation data bundle
model, data = component_model(
    modelname,
    return_data=True,
    alt_values_to_feather=True,
    chunking_size=chunking_size,
)
```
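The note that chunking_size has no effect when a "{name}_x_ca.pkl" cache is present implies a load-or-build pattern: if a pickled idca table from an earlier run exists, it is read directly and no conversion (and therefore no chunking) takes place. Below is a minimal sketch of that pattern. load_x_ca_cached and simple_convert are hypothetical names, the cache directory is an assumption, and this is not the code in location_choice.py.

```python
import os
import tempfile

import pandas as pd


def load_x_ca_cached(name, cv_table, convert, cache_dir, chunking_size=None):
    """Hypothetical loader: reuse '{name}_x_ca.pkl' if present, else convert and cache it."""
    cache_path = os.path.join(cache_dir, f"{name}_x_ca.pkl")
    if os.path.exists(cache_path):
        # Cache hit: chunking_size is ignored because no conversion happens
        return pd.read_pickle(cache_path), True
    x_ca = convert(cv_table, chunking_size)
    x_ca.to_pickle(cache_path)
    return x_ca, False


def simple_convert(cv_table, chunking_size):
    # Stand-in converter for this demo; it ignores chunking_size
    tall = cv_table.stack()
    tall.index = tall.index.set_names("zone", level=-1)
    return tall.unstack("variable")


cv = pd.DataFrame(
    [[1.0, 2.0], [0.5, 0.7]],
    index=pd.MultiIndex.from_tuples(
        [(1, "dist"), (1, "accessibility")], names=["person_id", "variable"]
    ),
    columns=[10, 20],  # zone ids
)

with tempfile.TemporaryDirectory() as cache_dir:
    # First call builds and writes the cache; second call only reads it
    first, from_cache_1 = load_x_ca_cached(
        "workplace_location", cv, simple_convert, cache_dir, chunking_size=50000
    )
    second, from_cache_2 = load_x_ca_cached(
        "workplace_location", cv, simple_convert, cache_dir, chunking_size=None
    )
    print(from_cache_1, from_cache_2)
```

A consequence of this pattern, at least in the sketch, is that a stale cache is silently reused, so deleting the pickle is the way to force a rebuild after the estimation data changes.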

bwentl added 3 commits June 23, 2023 09:51
  • use feather and pickle for faster loading
  • split x_ca processing into idca using chunking_size and update comments
  • default behavior does not change; to enable the use of feather and chunking, call component_model with alt_values_to_feather=True and chunking_size=chunking_size, as in the Usage example
bwentl force-pushed the location_estimation_patch branch from b8beb66 to 425467e on November 2, 2023 00:01
jpn-- added the Phase 8 label (Feature planned for inclusion in Phase 8 release) on Jan 30, 2024
jpn-- changed the base branch from main to develop on January 31, 2024 18:53
bwentl (Contributor, Author) commented Feb 6, 2024

@jpn-- I have brought my branch up to date with develop, so the merge should work now.

stefancoe (Contributor) commented:

@bwentl Does this work for all location/destination models? Non-mandatory tour destination, for example? Thanks!

bwentl (Contributor, Author) commented Feb 7, 2024


@stefancoe I believe so. The model type and the list of output data are the same for mandatory and non-mandatory tour destinations, so it should work. The only thing that needs to be changed is the modelname when you load the data for estimation.

stefancoe (Contributor) commented:

@bwentl Thanks, I'll try it out.

jpn-- merged commit 6a20954 into ActivitySim:develop on Feb 8, 2024
18 checks passed
jpn-- mentioned this pull request on Feb 14, 2024
Labels: Phase 8 (Feature planned for inclusion in Phase 8 release)
Projects: Status: Done

3 participants