Estimation Mode
Ben Stabler edited this page Mar 6, 2020 · 45 revisions
See Phase 5 Task 9 for an overview of the task.
- larch (includes MTC nested mode choice and destination choice examples; uses numpy; oriented toward practical linear models)
- PandasBiogeme (i.e. the latest pip-able biogeme version; includes MNL and NL; oriented toward research (non-linear) models)
- pylogit (not very actively maintained; slower for MNL models with large alternative sets, such as destination choice)
- choicemodels (uses pylogit; fast for MNL models with large alternative sets, such as destination choice; no nested logit)
We need to build a table of characteristics, pros, and cons, and then select the best estimation tool for prototyping the estimation mode integration.
The design is inspired by the DaySim estimation mode. This means that new terms and alternative model structures (say, a changed nesting structure) are implemented first in asim, and the required data is then written out for the estimation tool.
A script will transform the activitysim estimation data export to estimation formats, run the estimation tool, and update the activitysim model coefficients.
The basic workflow:
- Run a 10k household sample up through a model such as tour_mode_choice to create a synthetic version of a household travel survey in activitysim format up to that point. In practice the user would supply an actual household survey rather than a synthetic one, but that is a detail for subsequent work on this task.
- The output pipeline then becomes the input for running activitysim in estimation mode.
- We then run just tour_mode_choice again, this time with estimation mode set to true, to write out all the data required for re-estimating the model. We're calling this the estimation data bundle. It is a subset of the trace data, but written for every household:
- chooser table
- expression values table
- raw_utilities table (to determine if alternatives are available)
- Plus model specification inputs such as the yaml, spec, and coefficients file
- A pandas script then:
- Reads the chooser table, expression values, raw_utilities, tour_mode_choice.yaml, tour_mode_choice.csv, and tour_mode_choice_coeffs.csv
- Transforms the data into the formats required by the estimation tool
- Runs estimation
- Writes the output coefficients back to activitysim format
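The pandas script steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script. The inline tables stand in for the bundle's CSV files, and the column names (`tour_id`, `alt`, `observed_choice`, `coefficient_name`, `value`), the `-999` unavailability threshold, and the `run_estimation` placeholder are all assumptions. The call to a real estimation tool is deliberately left out.

```python
import io

import pandas as pd

# Tiny stand-ins for the bundle files; a real script would call pd.read_csv on
# the exported paths. All column names and values here are assumptions.
choosers = pd.read_csv(io.StringIO(
    "tour_id,observed_choice\n1,DRIVE\n2,WALK\n3,DRIVE\n"))
# One row per (chooser, alternative), one column per spec expression.
expression_values = pd.read_csv(io.StringIO(
    "tour_id,alt,time,cost\n"
    "1,DRIVE,10,2.0\n1,WALK,30,0.0\n"
    "2,DRIVE,5,1.0\n2,WALK,8,0.0\n"
    "3,DRIVE,20,4.0\n3,WALK,90,0.0\n"))
# One row per chooser, one column per alternative.
raw_utilities = pd.read_csv(io.StringIO(
    "tour_id,DRIVE,WALK\n1,-1.2,-3.0\n2,-0.7,-0.8\n3,-2.4,-999\n"))
coefficients = pd.read_csv(io.StringIO(
    "coefficient_name,value\ncoef_time,-0.10\ncoef_cost,-0.10\n"))

# Assumption: a raw utility at or below this value marks an unavailable alternative.
UNAVAILABLE = -999


def build_long_table(choosers, expression_values, raw_utilities):
    """Combine the bundle tables into one row per (chooser, alternative)."""
    # Reshape wide raw utilities to long form and derive availability flags.
    avail = raw_utilities.melt(
        id_vars="tour_id", var_name="alt", value_name="raw_utility")
    avail["available"] = avail["raw_utility"] > UNAVAILABLE
    long_table = expression_values.merge(
        avail[["tour_id", "alt", "available"]], on=["tour_id", "alt"])
    # Flag the observed alternative for each chooser.
    long_table = long_table.merge(choosers, on="tour_id")
    long_table["chosen"] = long_table["alt"] == long_table["observed_choice"]
    return long_table.drop(columns="observed_choice")


def run_estimation(long_table, start_coefficients):
    """Hypothetical placeholder for the estimation tool call.

    Returns the starting values unchanged so the sketch runs end to end.
    """
    return start_coefficients.copy()


long_table = build_long_table(choosers, expression_values, raw_utilities)
updated = run_estimation(long_table, coefficients)
# Write the coefficients back in the same (assumed) two-column layout.
coefficients_out = updated.to_csv(index=False)
print(long_table)
print(coefficients_out)
```

The long (one row per chooser and alternative) layout is only one possible target format; the tool eventually selected may expect a different one.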
Some additional considerations
- We need to better separate coefficients from data so the coefficients can be re-estimated in the estimation tool and then easily passed back to asim for model simulation. I think this means every logit model needs an explicit coefficients.csv input file now.
- We will also read in the observed_choice (alternative) so that it can be used in place of the model's chosen alternative when re-estimating subsequent model steps. This applies to post-processor annotators as well.
- If destination sampling is done, then the observed alternative may not be in the estimation data bundle. Therefore, we'll just run without sampling for now so that we get all the alternatives. This should be acceptable since the sample size is not very big.
- We will prototype tour mode choice
- We will use CSV formats
- Create the 10k HH asim format HH survey through tour_mode_choice
- Clean up the coefficients separation (#303) and write out the estimation data bundle (#304)
- Write a pandas script to transform data, run the estimation tool, transform the coefficient file, etc.
- We'll create a first version and then iterate as needed
- For now we're focused on just the integration with the estimation tool; we'll work on the "using the observed choice and running downstream models" need in the second half of this task
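To illustrate the consideration above about separating coefficients from data, here is a small sketch in which the spec refers to coefficients by name and a separate coefficients table supplies the values. The file layouts, column names (`expression`, `coefficient_name`, `value`), and helper functions are hypothetical and chosen only for illustration; the actual activitysim file formats may differ.

```python
import io

import pandas as pd

# Hypothetical spec: each expression points at a coefficient by name only.
spec = pd.read_csv(io.StringIO(
    "expression,coefficient_name\ntime,coef_time\ncost,coef_cost\n"))
# Hypothetical coefficients file: the only place where values live.
coefficients = pd.read_csv(io.StringIO(
    "coefficient_name,value\ncoef_time,-0.05\ncoef_cost,-0.20\n")
).set_index("coefficient_name")["value"]


def resolve_spec(spec, coefficients):
    """Attach numeric values to the spec by looking up coefficient names."""
    missing = set(spec["coefficient_name"]) - set(coefficients.index)
    if missing:
        raise KeyError(f"coefficients not found: {sorted(missing)}")
    return spec.assign(value=spec["coefficient_name"].map(coefficients))


def utilities(expression_values, resolved_spec):
    """Linear-in-parameters utility: sum of expression value times coefficient."""
    weights = resolved_spec.set_index("expression")["value"]
    return expression_values[weights.index].mul(weights, axis=1).sum(axis=1)


expression_values = pd.DataFrame({"time": [10.0, 30.0], "cost": [2.0, 0.0]})
before = utilities(expression_values, resolve_spec(spec, coefficients))

# After re-estimation only the coefficient values change; the spec is untouched.
re_estimated = coefficients.copy()
re_estimated["coef_time"] = -0.08  # illustrative value, not an estimation result
after = utilities(expression_values, resolve_spec(spec, re_estimated))
print(before.tolist(), after.tolist())
```

Because values are looked up by name, swapping in re-estimated coefficients requires no change to the spec or the data, which is the property the separation is meant to provide.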