
Merge changes to figure out what's going on with overpass test #3

Merged
merged 40 commits into nataliejschultz:overpass on Jul 11, 2024

Conversation

nataliejschultz
Owner

No description provided.

Mahadik, Mukul Chandrakant and others added 30 commits January 26, 2024 13:51
Changed print statements to logging.debug() to ensure these are easily visible and in sequence with other logging statements in the CloudWatch logs.

Added more information including:
- length of predictions
- failed assertion message in case of multiple users for the same trip list (see the sketch below)
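
As a rough illustration of the change (the variable names here are assumptions, not the actual code):

```python
import logging

# Hypothetical inputs, for illustration only
trip_list = [{"user_id": "user-1"}, {"user_id": "user-1"}]
predictions = [["walk"], ["bike"]]

# Before: print(...) output was easy to lose; logging.debug keeps it in
# sequence with the other pipeline log statements in CloudWatch
distinct_users = set(t["user_id"] for t in trip_list)
logging.debug("%d distinct users, %d predictions for a trip list of length %d",
              len(distinct_users), len(predictions), len(trip_list))
assert len(distinct_users) == 1, \
    "trip list has multiple users: %s" % distinct_users
```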
Cleanup fixes including log statements for PR #944
* Added details in logging statements

Completed marked future fixes for PR #944
Provided more detailed info regarding the number of distinct users, as well as the number of predictions made for the trip_list.

* Implemented logic to clear models

Clearing or trimming the number of models stored for a particular user in the Stage_updateable_models collection.

Constant K_MODELS_COUNT defines the number of models to keep.

Currently tested with model data available in the stage dataset snapshot.

Pending: complete testing, with assertions still to be added.

* Added function call

Called the trim_model_entries function in upsert_model to delete models before inserting new ones.
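
A minimal sketch of what this trimming could look like with pymongo (the field names and collection handle are assumptions; the constant is later renamed and made configurable, as described further down):

```python
import logging
from pymongo import MongoClient

K_MODELS_COUNT = 3  # number of models to keep per user

model_db = MongoClient().Stage_database.Stage_updateable_models

def trim_model_entries(user_id):
    # Nothing to do until the user exceeds the keep-count
    if model_db.count_documents({"user_id": user_id}) < K_MODELS_COUNT:
        return
    # Find the write_ts of the K-th most recent model, then delete
    # everything strictly older than it
    kth = list(model_db.find({"user_id": user_id})
                       .sort("metadata.write_ts", -1)
                       .skip(K_MODELS_COUNT - 1).limit(1))
    if kth:
        result = model_db.delete_many({
            "user_id": user_id,
            "metadata.write_ts": {"$lt": kth[0]["metadata"]["write_ts"]},
        })
        logging.debug("trimmed %d old models for user %s",
                      result.deleted_count, user_id)
```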

* Squashed commit of the following:

commit 54659fb
Merge: cf0c9e2 1159eac
Author: K. Shankari <shankari@eecs.berkeley.edu>
Date:   Thu Dec 21 20:17:15 2023 -0800

    Merge pull request #951 from MukuFlash03/fix-vuln

    Bulk deletion of site-package/tests

commit 1159eac
Author: Mahadik, Mukul Chandrakant <MukulChandrakant.Mahadik@nrel.gov>
Date:   Thu Dec 21 20:43:39 2023 -0700

    Bulk deletion of site-package/tests

    Added a one-line pipe command to remove all occurrences of site-packages/tests in the miniconda main directory.

commit cf0c9e2
Merge: d2f38bc 3be2757
Author: K. Shankari <shankari@eecs.berkeley.edu>
Date:   Thu Dec 21 17:47:27 2023 -0800

    Merge pull request #950 from MukuFlash03/fix-vuln

    Remove obsolete package versions

commit 3be2757
Author: Mahadik, Mukul Chandrakant <MukulChandrakant.Mahadik@nrel.gov>
Date:   Thu Dec 21 18:05:23 2023 -0700

    Remove obsolete package versions

    Cleaned up older versions for two packages:
    urllib3 - deleted stale version folders
    python - deleted tests folder

commit d2f38bc
Merge: 978a719 c1b0889
Author: K. Shankari <shankari@eecs.berkeley.edu>
Date:   Wed Dec 20 14:31:09 2023 -0800

    Merge pull request #949 from MukuFlash03/fix-vuln

    Fixing latest Docker image vulnerabilities

commit c1b0889
Author: Mahadik, Mukul Chandrakant <MukulChandrakant.Mahadik@nrel.gov>
Date:   Mon Dec 18 11:04:25 2023 -0700

    Upgraded Ubuntu base image

    The latest Ubuntu base image, which contains updated versions of libc6 and libc-bin, was just released officially by Docker.

commit 07747d0
Author: Mahadik, Mukul Chandrakant <MukulChandrakant.Mahadik@nrel.gov>
Date:   Fri Dec 15 18:38:12 2023 -0700

    Fixing latest Docker image vulnerabilities

    AWS Inspector found the following vulnerable packages:

    CRITICAL
    perl

    HIGH
    nghttp2, libnghttp2-14
    cryptography, libssl3
    cryptography
    libc6, libc-bin

    Upgraded the perl, libssl3, and nghttp2 packages by upgrading the base Ubuntu image to the latest release of the same LTS version, jammy (22.04).

    The cryptography package was fixed by specifying the required version to be installed using conda.

    libc6 and libc-bin could be fixed with apt-get upgrade, but that upgrades all packages, which is not recommended as a blanket fix.

* Updated Test file with dummy models

Was using pre-built models for testing that were present in the stage_snapshot dataset, but these won't be available in the normal dataset.

Hence, took reference from emission.tests.modellingTests.TestRunGreedyModel.py for creating test models using dummy trip data.

Creating multiple such models and then checking that the trimming operation succeeds.

* Reverted commit 31626ba + Removed log statements

Commit 31626ba included changes related to vulnerability fixes, hence reverted them.

Had added some log statements related to scalability fixes for model loading.
Removed them; they will be added in a separate PR.

* Changed imports to access K_MODEL_COUNT

Removed a redundant "esda" import, possibly added during local testing.

Removed get_model_limit(), as K_MODEL_COUNT is a class variable and can be accessed directly using the class name.

* Revert "Squashed commit of the following:"

This reverts commit 31626ba.

The earlier commit f54e85f missed some of the reverted changes relating to the vulnerability fixes, so they are removed here.

* Included maximum_stored_model_limit parameter in trip_model config file

Decided on a threshold value for the model count, above which redundant models will be deleted.
This was agreed to be 3 and is defined in the trip_model.config.json.sample file.

Necessary changes have been made in the related files to use this config value.
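
For illustration, reading the new value might look roughly like this (the exact path and loading code are assumptions; a deployed config would drop the .sample suffix):

```python
import json

with open("conf/analysis/trip_model.config.json") as f:
    trip_model_config = json.load(f)

# Threshold above which redundant models are deleted; the agreed value is 3
max_models = trip_model_config["maximum_stored_model_limit"]
```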

* Addressed latest review comments

- Improved logging statements to indicate whether the model count has changed.
- Skipped the check for saved test data in the test file, since we clear the related collections in teardown().

* Restored inferrers.py

Did this to revert a whitespace change, which was probably due to removing a line at the end of the file.

We don't want it modified, as it is unrelated to this set of PR changes.

Restored using the following command, pointing at the commit just before the first one where the file was modified:

```
git restore --source=79ad1cc~1 emission/analysis/classification/inference/labels/inferrers.py
```

* Updated Test for Storing limited models

Added a test to ensure that only the latest models are stored, by comparing the write_ts times.

* Revert whitespace changes

* Revert whitespace changes

---------

Co-authored-by: Mahadik, Mukul Chandrakant <MukulChandrakant.Mahadik@nrel.gov>
Co-authored-by: K. Shankari <shankari@eecs.berkeley.edu>
…tionary (#954)

* Filtered out rows with dictionary in user_label_df

The idea is to check the data type using isinstance() and then apply this check to the entire dataframe at once, instead of iterating over each row, which is much slower.

These rows are then filtered out of the original dataframe, leaving behind only the non-dict rows.
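
A sketch of that vectorized filtering (the dataframe contents here are made up; the column name comes from the commit below):

```python
import pandas as pd

user_label_df = pd.DataFrame({
    "trip_user_input": [{"survey": "..."}, "walk", "bike"],  # mixed dicts and labels
})

# Flag dict-valued rows in one pass over the column, then drop them,
# instead of checking each row in a Python-level loop
is_dict_row = user_label_df["trip_user_input"].apply(lambda v: isinstance(v, dict))
user_label_df = user_label_df[~is_dict_row]
```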

* Added log statements to indicate dataframe filtering done

Added a log statement to greedy_similarity_binning to indicate that filtering is being done for the dictionary elements in the dataframe when the column is 'trip_user_input'.

* Modified filtering of survey inputs

Now filtering survey inputs before the dataframe itself is created, by checking whether each dictionary value is a plain value or a nested dictionary.

* Add TODO so I can merge this

---------

Co-authored-by: Mahadik, Mukul Chandrakant <MukulChandrakant.Mahadik@nrel.gov>
Co-authored-by: K. Shankari <shankari@eecs.berkeley.edu>
1. Updated the Ubuntu base image to the latest version.
This fixes the libgnutls30 package.

2. Updated the cryptography package to the suggested version.

3. Updated the bash package to the latest version manually.
The latest Ubuntu version still contains the vulnerable package, so it had to be upgraded manually.
These are the server-side changes related to
e-mission/e-mission-docs#1062

The changes are fairly straightforward, and consistent with
https://github.com/e-mission/e-mission-docs/blob/2665b39e1335ea04896b6944a4a065e7887b6cdc/docs/dev/back/adding_a_new_data_type.md?plain=1#L4

Concretely:
- we add a new `bluetoothble` wrapper
- we add references to it to `entry.py` and `builtin_timeseries.py`
- we add formatters for both android and iOS

A new wrinkle this time is that we are modifying the FSM, so there are also
new transitions. Those needed to be added to the enums in the transition
wrapper, and to the maps in the formatters so that the enums could be created
properly.

Bonus fix: check for the `None` transition properly on android and iOS to
avoid spurious errors

```
>>> broken_transition_example = {'_id': ObjectId('661b129fc271a44bb612b464'), 'metadata': {'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1713050268.574551, 'platform': 'ios', 'read_ts': 0, 'key': 'statemachine/transition', 'type': 'message'}, 'user_id': UUID('f1aaae55-fc42-4527-bf7f-33f84d7c8c2f'), 'data': {'currState': 'STATE_ONGOING_TRIP', 'transition': None, 'ts': 1713050268.574418}}
>>> broken_transition_example_entry = ad.AttrDict(broken_transition_example)
>>> enufit.format(broken_transition_example_entry)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kshankar/Desktop/data/e-mission/gis_branch_tests/emission/net/usercache/formatters/ios/transition.py", line 64, in format
    data.transition = transition_map[entry.data.transition].value
KeyError: None

----- fixed code ------

>>> importlib.reload(enufit)
<module 'emission.net.usercache.formatters.ios.transition' from '/Users/kshankar/Desktop/data/e-mission/gis_branch_tests/emission/net/usercache/formatters/ios/transition.py'>
>>> enufit.format(broken_transition_example_entry)
AttrDict({'_id': ObjectId('661b129fc271a44bb612b464'), 'user_id': UUID('f1aaae55-fc42-4527-bf7f-33f84d7c8c2f'), 'metadata': AttrDict({'time_zone': 'America/Los_Angeles', 'plugin': 'none', 'write_ts': 1713050268.574551, 'platform': 'ios', 'read_ts': 0, 'key': 'statemachine/transition', 'type': 'message', 'write_local_dt': LocalDate({'year': 2024, 'month': 4, 'day': 13, 'hour': 16, 'minute': 17, 'second': 48, 'weekday': 5, 'timezone': 'America/Los_Angeles'}), 'write_fmt_time': '2024-04-13T16:17:48.574551-07:00'}), 'data': AttrDict({'curr_state': 2, 'transition': None, 'ts': 1713050268.574551, 'local_dt': AttrDict({'year': 2024, 'month': 4, 'day': 13, 'hour': 16, 'minute': 17, 'second': 48, 'weekday': 5, 'timezone': 'America/Los_Angeles'}), 'fmt_time': '2024-04-13T16:17:48.574551-07:00'})})
```
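
The fix implied by the before/after output is presumably a guard along these lines (a sketch, not the actual diff in emission/net/usercache/formatters/ios/transition.py):

```python
def map_transition(raw_transition, transition_map):
    # Pass a None transition through instead of looking it up in the
    # enum map, which previously raised KeyError: None
    if raw_transition is None:
        return None
    return transition_map[raw_transition].value
```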

Testing done:
Used the corresponding changes in
e-mission/e-mission-phone#1144
to simulate BLE as follows:
- Region exit
- A few range updates until the `ble_beacon_found` transition was generated
- Turned on location mocking from the android and iOS simulators, and manually
  generated the start trip transition on android
- Clicked "range update" at random times during the simulated trip
- BLE beacon lost from the UI
- Turn off location mocking
- Force end trip transition

Testing Results:

Android:

```
START 2024-04-13 17:36:02.096342 POST /usercache/put
END 2024-04-13 17:36:02.313529 POST /usercache/put ebc13f1b-671b-4094-bce6-fed342da7e9c 0.2171182632446289
START 2024-04-13 17:36:02.583812 POST /usercache/get
END 2024-04-13 17:36:02.591868 POST /usercache/get ebc13f1b-671b-4094-bce6-fed342da7e9c 0.007989168167114258
```

```
>>> edb.get_usercache_db().count_documents({"metadata.key": "background/bluetooth_ble"})
57
>>> edb.get_timeseries_db().count_documents({"metadata.key": "background/bluetooth_ble"})
0
```

```
2024-04-13 17:37:57,635:DEBUG:140704655566784:write_ts = 1713054811.255
2024-04-13 17:37:57,635:DEBUG:140704655566784:module_name = emission.net.usercache.formatters.android.bluetooth_ble
2024-04-13 17:37:57,636:DEBUG:140704655566784:write_ts = 1713054811.294
2024-04-13 17:37:57,636:DEBUG:140704655566784:module_name = emission.net.usercache.formatters.android.bluetooth_ble
2024-04-13 17:37:57,636:DEBUG:140704655566784:write_ts = 1713054811.316
2024-04-13 17:37:57,636:DEBUG:140704655566784:module_name = emission.net.usercache.formatters.android.bluetooth_ble
2024-04-13 17:37:57,637:DEBUG:140704655566784:write_ts = 1713054811.339
2024-04-13 17:37:57,637:DEBUG:140704655566784:module_name = emission.net.usercache.formatters.android.bluetooth_ble
2024-04-13 17:37:57,637:DEBUG:140704655566784:write_ts = 1713054811.369

```
>>> edb.get_usercache_db().count_documents({"metadata.key": "background/bluetooth_ble"})
0
>>> edb.get_timeseries_db().count_documents({"metadata.key": "background/bluetooth_ble"})
57
```

iOS

```
START 2024-04-13 16:17:50.707151 POST /usercache/put
START 2024-04-13 16:17:50.737703 POST /usercache/get
END 2024-04-13 16:17:50.763880 POST /usercache/get f1aaae55-fc42-4527-bf7f-33f84d7c8c2f 0.026064157485961914
END 2024-04-13 16:17:51.340867 POST /usercache/put f1aaae55-fc42-4527-bf7f-33f84d7c8c2f 0.6329052448272705
```

```
>>> edb.get_usercache_db().count_documents({"metadata.key": "background/bluetooth_ble"})
74
```

```
DEBUG:root:module_name = emission.net.usercache.formatters.ios.bluetooth_ble
DEBUG:root:write_ts = 1713050204.075974
DEBUG:root:module_name = emission.net.usercache.formatters.ios.bluetooth_ble
DEBUG:root:write_ts = 1713050204.077563
DEBUG:root:module_name = emission.net.usercache.formatters.ios.bluetooth_ble
DEBUG:root:write_ts = 1713050204.078974
DEBUG:root:module_name = emission.net.usercache.formatters.ios.bluetooth_ble
DEBUG:root:write_ts = 1713050207.32417
DEBUG:root:module_name = emission.net.usercache.formatters.ios.bluetooth_ble
DEBUG:root:write_ts = 1713050207.326033
```

```
>>> edb.get_usercache_db().count_documents({"metadata.key": "background/bluetooth_ble"})
0
>>> edb.get_timeseries_db().count_documents({"metadata.key": "background/bluetooth_ble"})
74
>>> edb.get_timeseries_error_db().count_documents({"metadata.key": "background/bluetooth_ble"})
0
```
Our negative test in this test suite used `'transition': none`
as an invalid transition. However, in
c86fcf5
we started handling `None` transitions properly.

So we need to change it to a transition that is actually invalid.

Testing done:
Before this, `testMoveWhenEmpty` failed.
Now, `testMoveWhenEmpty` passes.
`save` -> `replace_one` with upsert
`remove` -> `delete_one`
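
These are the standard pymongo replacements for the deprecated calls; roughly:

```python
from pymongo import MongoClient

collection = MongoClient().Stage_database.Stage_usercache  # example collection
doc = {"_id": "example", "data": {}}

# Before (deprecated in pymongo 3, removed in pymongo 4):
#   collection.save(doc)
#   collection.remove({"_id": doc["_id"]})

# After:
collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
collection.delete_one({"_id": doc["_id"]})
```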
🗃️ Plumb through the BLE objects to the server
This will provide a way for the pipeline to access the dynamic config.
The first time `get_dynamic_config` is called, it fetches the config associated with the study from GitHub. The study is defined by the environment variable `STUDY_CONFIG`.
The downloaded config is cached within one session (each pipeline run is a different session), so subsequent calls to `get_dynamic_config` will not need to re-fetch.
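A minimal sketch of that fetch-once-then-cache behavior (the config URL pattern here is an assumption, not necessarily the real one):

```python
import os
import requests

_dynamic_config = None  # cached for the duration of one session / pipeline run

def get_dynamic_config():
    global _dynamic_config
    if _dynamic_config is None:
        study = os.environ["STUDY_CONFIG"]  # e.g. "dfc-fermata"
        url = ("https://raw.githubusercontent.com/e-mission/"
               f"nrel-openpath-deploy-configs/main/configs/{study}.nrel-op.json")
        _dynamic_config = requests.get(url).json()  # fetched only once
    return _dynamic_config
```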
Testing done:
- loaded a dump of dfc-fermata from 24 April into my local database
- ran `. setup/setup.sh` to update dependencies (e-mission-common is a new dependency)
- located one user (the opcode I've been using the most for my own testing); cleared the pipeline for that user via `./e-mission-py.bash bin/reset_pipeline.py -e nrelop_dfc-fermata_test_*`
- ran the pipeline for that user, supplying STUDY_CONFIG as an environment variable: `STUDY_CONFIG=dfc-fermata ./e-mission-py.bash bin/debug/intake_single_user.py -e nrelop_dfc-fermata_test_*`
- observed the log output; BLE modes were detected for at least some of the sections
```
2024-05-02 17:11:52,566:DEBUG:140704482301888:Returning cached dynamic config for dfc-fermata at version 5
2024-05-02 17:11:52,566:DEBUG:140704482301888:getting BLE sensed vehicle for section from 1713192674 to 1713193146
2024-05-02 17:11:52,593:DEBUG:140704482301888:After filtering, 471 BLE ranging entries during the section
2024-05-02 17:11:52,595:DEBUG:140704482301888:after counting, ble_beacon_counts = {'dfc0:fff0': 471}
2024-05-02 17:11:52,595:DEBUG:140704482301888:found vehicle Jack's Mazda 3 with BLE beacon dfc0:fff0
```
Some sections still have a ble_sensed_mode of None, but this is plausible because we hadn't integrated the BLE sensing until a couple of weeks ago, and I've been using this opcode for longer than that.
- observed that the sections have `ble_sensed_mode`:
```
ts = esta.TimeSeries.get_time_series(user_id)
entries = ts.find_entries(["analysis/cleaned_section"])
print([e['data']['ble_sensed_mode'] for e in entries])
```
Output shows 31 sections with 'None' and 9 sections with the following:
```
{'value': 'car_jacks_mazda3', 'bluetooth_major_minor': ['dfc0:fff0'], 'text': "Jack's Mazda 3", 'baseMode': 'CAR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.16777, 'vehicle_info': {'type': 'car', 'license': 'JHK ****', 'make': 'Mazda', 'model': '3', 'year': 2014, 'color': 'red', 'engine': 'ICE', 'mpg': 33}}
```
Since sections now have a `ble_sensed_mode` property in addition to `sensed_mode`, we can use the `ble_sensed_mode` as the basis for a new kind of section summary.

To enable this, the `get_section_summary` function was refactored to accept an additional argument, which can be either `sensed_mode` or `ble_sensed_mode`. If it's `ble_sensed_mode`, the summary values will be summed and grouped by the `baseMode` of the vehicle that was sensed by BLE. If the additional argument is not given, it just defaults to `sensed_mode` (same behavior as before). A sketch follows below.
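
A hedged sketch of the refactor (the real function lives in e-mission-server; the grouping details and column names here are assumptions):

```python
import pandas as pd

def get_section_summary(sections_df: pd.DataFrame,
                        mode_prop: str = "sensed_mode") -> dict:
    if mode_prop == "ble_sensed_mode":
        # Group by the baseMode of the BLE-sensed vehicle; sections where
        # nothing was sensed fall into UNKNOWN
        keys = sections_df[mode_prop].apply(
            lambda m: m["baseMode"] if isinstance(m, dict) else "UNKNOWN")
    else:
        # Default: same behavior as before, grouped by the sensed mode value
        keys = sections_df[mode_prop]
    grouped = sections_df.groupby(keys)
    return {
        "distance": grouped["distance"].sum().to_dict(),
        "duration": grouped["duration"].sum().to_dict(),
        "count": grouped["distance"].count().to_dict(),
    }
```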

Testing done:
- using a dump of dfc-fermata from 24 April loaded into my local database
- reset and re-ran the pipeline for a user
- inspected the `ble_sensed_summary`s of the resulting confirmed trips:

```
ts = esta.TimeSeries.get_time_series(user_id)
entries = ts.find_entries(["analysis/confirmed_trip"])
for e in entries:
    print(e['data']['ble_sensed_summary'])
```

Output:
```
{'distance': {'UNKNOWN': 4294.125850161847}, 'duration': {'UNKNOWN': 10379.947390556335}, 'count': {'UNKNOWN': 2}}
{'distance': {'UNKNOWN': 1511402.9404509964}, 'duration': {'UNKNOWN': 131559.69582343102}, 'count': {'UNKNOWN': 4}}
{'distance': {'UNKNOWN': 34462.99760301494}, 'duration': {'UNKNOWN': 977.1180453300476}, 'count': {'UNKNOWN': 3}}
{'distance': {'UNKNOWN': 19485.795458445176}, 'duration': {'UNKNOWN': 3718.8894975185394}, 'count': {'UNKNOWN': 2}}
{'distance': {'UNKNOWN': 850.4852679201318}, 'duration': {'UNKNOWN': 15861.93504524231}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 47455.78820030493}, 'duration': {'UNKNOWN': 7241.158731937408}, 'count': {'UNKNOWN': 3}}
{'distance': {'UNKNOWN': 1551245.9810044689}, 'duration': {'UNKNOWN': 13537.040816783905}, 'count': {'UNKNOWN': 5}}
{'distance': {'UNKNOWN': 713.4707286619786}, 'duration': {'UNKNOWN': 1448.6543893814087}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 3981236.406827432}, 'duration': {'UNKNOWN': 26685.684997320175}, 'count': {'UNKNOWN': 2}}
{'distance': {'UNKNOWN': 333.2678850005484}, 'duration': {'UNKNOWN': 21.80204725265503}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 2866.186136018516}, 'duration': {'UNKNOWN': 3249.8193860054016}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 2645.5964919933926}, 'duration': {'UNKNOWN': 750.7993273735046}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 2442.091053542208}, 'duration': {'UNKNOWN': 489.8975965976715}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 4449.832978365483}, 'duration': {'UNKNOWN': 2710.8742623329163}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 6409.15214382765}, 'duration': {'UNKNOWN': 745.2548396587372}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 163.76752941202727}, 'duration': {'UNKNOWN': 12.731597900390625}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 4440.681532305813}, 'duration': {'UNKNOWN': 93.64632654190063}, 'count': {'UNKNOWN': 3}}
{'distance': {'UNKNOWN': 366.71884583356365}, 'duration': {'UNKNOWN': 1299.6981484889984}, 'count': {'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 2078.9642225193907}, 'duration': {'UNKNOWN': 11704.498789787292}, 'count': {'UNKNOWN': 1}}
{'distance': {'E_CAR': 2328.846316296626, 'UNKNOWN': 6971.404103278014}, 'duration': {'E_CAR': 3389.3658237457275, 'UNKNOWN': 945.0989944934845}, 'count': {'E_CAR': 1, 'UNKNOWN': 2}}
{'distance': {'UNKNOWN': 155.63066751202888}, 'duration': {'UNKNOWN': 116.02340650558472}, 'count': {'UNKNOWN': 1}}
{'distance': {'E_CAR': 7177.555954021382, 'UNKNOWN': 43.50518475446066}, 'duration': {'E_CAR': 1140.2288630008698, 'UNKNOWN': 175.48782801628113}, 'count': {'E_CAR': 1, 'UNKNOWN': 1}}
{'distance': {'UNKNOWN': 846.2578386118126}, 'duration': {'UNKNOWN': 766.3598084449768}, 'count': {'UNKNOWN': 1}}
{'distance': {'E_CAR': 104.02576498063684}, 'duration': {'E_CAR': 38.734825134277344}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 8250.614829636044}, 'duration': {'E_CAR': 825.4048693180084}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 6171.384206967509}, 'duration': {'E_CAR': 781.507465839386}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 1065.2594223724116, 'UNKNOWN': 73.02336166659416}, 'duration': {'E_CAR': 491.0299062728882, 'UNKNOWN': 105.17022156715393}, 'count': {'E_CAR': 1, 'UNKNOWN': 1}}
{'distance': {'E_CAR': 733.2246673403289}, 'duration': {'E_CAR': 845.3612954616547}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 4275.13516142465}, 'duration': {'E_CAR': 541.7157530784607}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 7391.707788417107}, 'duration': {'E_CAR': 611.8730540275574}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 10035.89912694906}, 'duration': {'E_CAR': 1205.663957118988}, 'count': {'E_CAR': 1}}
{'distance': {'E_CAR': 9085.234840541858}, 'duration': {'E_CAR': 3597.4890780448914}, 'count': {'E_CAR': 1}}
```
When conda installs dependencies, it will attempt to use git for any dependencies that reference a git repository (e.g. `git+https://github.com/my/repo`).
So git must be installed on the container before dependencies are installed, or installation will fail.
`ble_sensed_summary` is a new property for confirmed trips, so tests fail on the old snapshots. Newly generated snapshots are the same, but with different uuids and oids, and with empty ble_sensed_summary values in each confirmed trip.
Currently we don't have any tests with BLE sensor data, so there are no tests that have a `ble_sensed_summary` that is filled in. Coming later...
This test is failing because it expects shankari_2016-06-20.expected_confirmed_trips to have a specific UUID, which is hardcoded in the test.

This came up the last time these "expected" snapshots were re-generated: 3c52375

I think it will be annoying to keep replacing the user_id every time the expected outputs change. So I just reworked the test to use whatever user_id is in the file rather than some arbitrary hardcoded one.
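
The rework presumably boils down to something like this (the file path and layout are assumptions):

```python
import json

with open("emission/tests/data/expected_confirmed_trips/"
          "shankari_2016-06-20.expected_confirmed_trips") as f:
    expected_trips = json.load(f)

# Use whatever user_id the snapshot carries instead of a hardcoded UUID
test_user_id = expected_trips[0]["user_id"]
```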
🎇 BLE matching in the pipeline
We use a `git`-based reference to the `em-common` repo.
The related PR (#965) added git to the test dockerfile to get the tests to pass.

We need to add a similar command to the production Dockerfile.
* Trying to fix regression

Attempting to get the nominatim tests to pass after a regression was identified.

* Getting the test to run

Have to get the test to run to see if it fixes it!

* Push test

Re-enabled nominatim test on my branch, so hopefully it will run on this push

* Reverting nominatim-docker-test.yml
The existing implementation here addressed the localdate fields separately, treating them like a set of filters rather than one date range from start to end.
The "set of filters" approach was good enough for some things, like filtering to a specific month, but not for canonical datetime range filtering, where you want everything between an arbitrary start and end datetime.
The complexity of the logic needed to form the NoSQL queries is more than expected, but after the dust settles it's about the same amount of code as the old implementation.

In short, this implementation finds which units are used by both the start and end localdates, walks down them (from largest unit to smallest), and combines $and and $or blocks to determine the upper and lower bounds (a sketch follows the commit title below).
make localdate queries behave like canonical datetime range filters
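
For illustration, a lower bound like "on or after 2024-06-01" expressed against separate local_dt fields ends up as nested blocks roughly like this (a sketch of the generated query shape, not the exact code):

```python
# Matches entries with (year, month, day) >= (2024, 6, 1), walking from the
# largest unit down; this dict would be passed to a pymongo find()
lower_bound = {"$or": [
    {"data.local_dt.year": {"$gt": 2024}},
    {"$and": [
        {"data.local_dt.year": 2024},
        {"$or": [
            {"data.local_dt.month": {"$gt": 6}},
            {"$and": [
                {"data.local_dt.month": 6},
                {"data.local_dt.day": {"$gte": 1}},
            ]},
        ]},
    ]},
]}
# The upper bound is built the same way with $lt/$lte, and the two bounds
# are combined with a top-level $and
```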
The updated in-app dashboard will use this new time type via `/result/metrics/yyyy_mm_dd`.
When using this time type, `e-mission-common` is used for the metrics calculations, which has a new structure (#966 (comment)).
`start_time` and `end_time` are given as ISO dates.
Make the text for push notifications configurable, translatable, and filterable, so that notifications go only to users with at least one recent trip that has no user input.
The new config field 'push_notifications' has 'title' and 'message' (the text, per language) and 'recent_user_input_threshold' (the number of days to consider when determining 'recent user input'); a sketch follows below.
Note the caveat about partially-labeled trips.
make push notifs configurable, i18n, & only for users missing labels
Co-authored-by: K. Shankari <shankari@eecs.berkeley.edu>
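
For illustration, the new field might look roughly like this (the key names follow the description above; the exact schema and sample text are assumptions):

```python
push_notifications = {
    # Per-language notification text
    "title": {"en": "Trip labels needed"},
    "message": {"en": "You have recent trips that are missing labels."},
    # Days to look back when deciding whether a user has a recent
    # unlabeled trip
    "recent_user_input_threshold": 7,
}
```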
JGreenlee and others added 10 commits May 29, 2024 00:30
I recently rewrote the local dt queries used by TimeComponentQuery (#968) to make them behave as datetime range queries. But this results in some pretty complex queries which could have nested $and and $or conditions. It felt overengineered, but that was the logic required if we were to query date ranges by looking at local dt objects where year, month, day, etc. are all recorded separately.
Unfortunately, those queries run super slowly on production. I think this is because the $and / $or conditions prevented MongoDB from optimizing efficiently via indexing.

Looking for a different solution, I found something better and simpler.
At first I didn't think using fmt_time fields with $lt and $gt comparisons would work, because they're ISO strings, not numbers.
But luckily MongoDB can compare strings this way: it performs a lexicographical comparison where 0-9 < A-Z < a-z (like ASCII).
ISO timestamps have the standard format 0000-00-00T00:00:00.
The dashes and the T will always be in the same places, so effectively only the numbers are compared. And the timezone info is at the end, so it doesn't get considered as long as we don't include it in the start and end inputs.
The only thing that specifically needs to be handled is making the end range inclusive. If I query from "2024-06-01" to "2024-06-03", an entry like "2024-06-03T23:59:59-04:00" should match. But lexicographically, that comes after "2024-06-03".
Appending a "Z" to the end range solves this (see the sketch below).

The result of all this is that the TimeQuery class (timequery.py) is now able to handle either timestamps or ISO dates, and this is what metrics.py will use for summarize_by_yyyy_mm_dd.
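
A minimal sketch of such a range query (the field names are assumptions):

```python
start_iso, end_iso = "2024-06-01", "2024-06-03"

# Lexicographic comparison on ISO strings: the digits dominate because the
# dashes and the "T" are always in the same positions
query = {
    "metadata.key": "analysis/confirmed_trip",
    "data.start_fmt_time": {
        "$gte": start_iso,
        # "Z" sorts after "T", so "2024-06-03T23:59:59-04:00" still
        # matches an end bound of "2024-06-03Z"
        "$lte": end_iso + "Z",
    },
}
```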
#970 (comment)
> I realized that the TimeQuery on `e-mission-server` (which works w/ MongoDB) mirrors TimeQuery in the native code plugins (which work with SQL dbs).
>
> For that reason, I don't want to rename anything in `TimeQuery` right now. So instead of extending the functionality of `TimeQuery` to work with either ts or fmt_time, I'm going to make a new class called `FmtTimeQuery` (or similar)
>
> I briefly read up on SQL queries to make sure they work the same way when comparing strings. They do – so later on, we could implement `FmtTimeQuery` on the native code UserCache plugin too
#970 (comment)

Reverts #968 back to the original implementation, since we now have a faster way to do "range" queries via FmtTimeQuery.

To make it clearer that FmtTimeQuery is for ranges and TimeComponentQuery is for filters, I renamed some functions and added more descriptive comments/docstrings.
for yyyy_mm_dd metrics, query by fmt_time instead of local_dt
Integration tests for Geofabrik Overpass API
@nataliejschultz nataliejschultz merged commit d9b1ea8 into nataliejschultz:overpass Jul 11, 2024