Improved test system to cover ActivitySim use cases
Work-in-progress
The purpose of this improvement to ActivitySim is to develop a solution that provides additional assurance that future updates to ActivitySim will more easily work for existing users and their use cases. Now that ActivitySim is beginning to be used in multiple regions, the need for additional test coverage, and for processes to update that coverage, has increased. This need arises in several situations: when setting up a new model with different inputs and configurations, when adding new model components (and/or revisions to the core) to implement new features, and when implementing model components at a scale previously untested. This improved test system plan is in response to Task 6, prototype multiple models test system.
Generally speaking, there are two types of ActivitySim examples: test examples and agency examples.
- Test examples - these are the examples maintained and tested as part of core ActivitySim development to date. The current test examples are mtc, estimation, marin (tour mode choice for the TVPB), and multizone (simple two zone and three zone versions of example_mtc for exercising support for multiple zone systems). These examples are owned by the project.
- Agency examples - these are agency partner model implementations currently being set up. The current agency examples are PSRC, SEMCOG, ARC, and soon SANDAG. These examples can be configured however the agency sees fit to meet its modeling needs and may include new software components planned for contribution to ActivitySim. These examples are owned by the agency.
There exist multiple versions of these examples, which are used for various testing purposes:
- Full scale - a full scale data setup, including all households, zones, skims, time periods, etc. This is a "typical" model setup used for application. This setup can be used to test the model results and performance since model results can be compared to observed/known answers and runtimes can be compared to industry experience.
- Cropped - a subset of households and zones, small enough to run efficiently and be stored portably, used for testing. This setup can really only be used to test the software, since model results are difficult to compare to observed/known answers; depending on the question, it may also be able to answer questions related to runtime. A sketch of a cropping script is shown after this list.
- Other - a specific route/path through the code for testing. For example, the estimation example tests the estimation mode functionality and the marin example tests running just tour mode choice.
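As a rough illustration of how a cropped example can be produced, the sketch below subsets the household, person, and land use inputs to a set of retained zones. The file names and column names (land_use.csv, zone_id, home_zone_id, etc.) are placeholders rather than the layout of any particular example, and the skims would also need to be cropped.

```python
import pandas as pd

def crop_example(data_dir, out_dir, keep_zones):
    """Write cropped copies of the full scale inputs, keeping only keep_zones."""
    # Zone/land use table, keyed by an assumed zone_id column.
    land_use = pd.read_csv(f"{data_dir}/land_use.csv")
    land_use = land_use[land_use["zone_id"].isin(keep_zones)]
    land_use.to_csv(f"{out_dir}/land_use.csv", index=False)

    # Households whose (assumed) home_zone_id falls in the retained zones.
    households = pd.read_csv(f"{data_dir}/households.csv")
    households = households[households["home_zone_id"].isin(keep_zones)]
    households.to_csv(f"{out_dir}/households.csv", index=False)

    # Persons belonging to the retained households.
    persons = pd.read_csv(f"{data_dir}/persons.csv")
    persons = persons[persons["household_id"].isin(households["household_id"])]
    persons.to_csv(f"{out_dir}/persons.csv", index=False)

    # The skims (OMX matrices) would also need to be subset to keep_zones.
```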
As of this writing, the test system includes only test examples. Agency examples are not formally included in the test system, so there are no formal assurances that future updates to ActivitySim will work for the agency examples. However, the test examples have many similarities to the agency examples, so it is very likely that revisions to the code base that are checked/verified against the test examples will also work for the agency examples. The purpose of this plan is to go a step further in providing assurances, by establishing a framework for testing agency examples as well.
The proposed test plans for test examples and agency examples differ:
- Test examples test ActivitySim features, stability, components, etc. This set of tests is run by our TravisCI system and is a central feature of our software development process.
- Agency examples include two simple tests:
- Run the cropped version from start to finish to ensure it runs and that the results match the stored expected results (a regression test).
- Run the full scale example and produce summary statistics of model results to validate the model. A good starting point for the summary statistics validation script is trips by mode and zone district (see the sketch after this list).
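A minimal sketch of these two tests is shown below, assuming the run writes a final trips table to output/final_trips.csv, that an expected copy is stored under regress/, and that a zone-to-district lookup table is available; the file, table, and column names are illustrative only.

```python
import pandas as pd
import pandas.testing as pdt

def test_cropped_regression(output_dir="output", regress_dir="regress"):
    # Regression test: the cropped run must reproduce the stored expected results.
    final_trips = pd.read_csv(f"{output_dir}/final_trips.csv")
    expected_trips = pd.read_csv(f"{regress_dir}/final_trips.csv")
    pdt.assert_frame_equal(final_trips, expected_trips, check_dtype=False)

def trips_by_mode_and_district(trips, zone_districts):
    # Validation summary for the full scale run: trips cross-tabulated by
    # trip mode and destination zone district.
    trips = trips.merge(zone_districts, left_on="destination", right_on="zone_id")
    return pd.crosstab(trips["district"], trips["trip_mode"], margins=True)
```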
Both types of examples are stored in GitHub repositories for version control and collaborative maintenance. There are two storage locations:
- The activitysim package example folder - this stores the example setup files, cropped data, regression test script, expected results, example cropping script, change log, etc.
- The activitysim_resources repository - this stores just the full scale example data inputs using Git LFS. This two-part solution allows the main activitysim repo to remain relatively lightweight, while providing an organized and accessible storage solution for the full scale example data. The example_manifest.yaml file maintains a dictionary of all the examples, how to get them, and how to run them (a sketch of reading it is shown after this list).
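For illustration, a helper script might enumerate the registered examples from example_manifest.yaml along the lines of the sketch below; the keys shown (name, include) are assumptions about the manifest layout rather than a documented schema.

```python
import yaml

# List each registered example and the number of files it includes.
with open("example_manifest.yaml") as f:
    manifest = yaml.safe_load(f)

for example in manifest:
    print(example.get("name"), "-", len(example.get("include", [])), "files")
```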
When a new version of the code is pushed to develop:
- The core test system is run, and the code/examples are updated as needed to ensure the tests pass
- If an agency example previously ran without future warnings (i.e. is up-to-date), then we will ensure it remains up-to-date
- If an agency example previously threw future warnings (i.e. is not up-to-date), then we will not update it (a sketch of this warning check is shown after this list)
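One way to implement this up-to-date check is to promote FutureWarnings to errors when running the cropped example, along the lines of the sketch below; the run script name (simulation.py) and the example path are assumptions.

```python
import subprocess
import sys

def runs_without_future_warnings(example_dir):
    # Promote FutureWarnings to errors so an out-of-date example fails fast
    # instead of drifting silently.
    result = subprocess.run(
        [sys.executable, "-W", "error::FutureWarning", "simulation.py"],
        cwd=example_dir,
    )
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical path to a cropped agency example.
    sys.exit(0 if runs_without_future_warnings("example_agency/test") else 1)
```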
When an agency wants to update their example:
- It is important to keep the agency examples up to date to minimize the cost/effort of updating to new versions of ActivitySim
- Agencies have a window of time (for example, 3-6 months) to update their example through a pull request.
- This pull request changes nothing outside their example folder.
- The test/cropped example must run without warnings.
- The full scale version is run elsewhere and must pass the validation script.
When an agency example includes new submodels and/or contributions to the core that need to be pulled/accepted:
- The agency example must be up-to-date with the latest develop version of the code
- The agency example must include a test/cropped example that implements the two tests above, and the tests must pass
- The full scale version must be run elsewhere and must pass the validation script
- The new submodels and/or contributions to the core will be reviewed by the repository manager (and it's likely some revisions will be required for acceptance)
- Key items in the review include python code, documentation, and testable examples for all new components
ARC example in more detail
The system is currently run by hand (i.e. manually), since it may involve getting and running several large examples that each take many hours to run. The system could be fully automated and run either in the cloud (on AWS, for example) or on a local server (on a bench contractor server, for example).
There are non-trivial costs associated with multiple aspects of developing and supporting agency examples:
- Computing time and persistent storage costs
- Labor costs to develop the automated system
- Labor costs to manually run the system until an automated version has been deployed
How should support for agency examples be paid for? Some options are:
- Included with ActivitySim membership
- An additional optional fee beyond ActivitySim membership
- A third-party vendor supplies the service