Update greedy_similarity_binning.py #930

humbleOldSage · 2023-08-11T08:22:04Z

Added a new list type variable to store labels at one place in order of trips processed. Explained at e-mission/e-mission-eval-private-data#35 (comment) and e-mission/e-mission-eval-private-data#35 (comment).

shankari · 2023-08-11T20:39:55Z

emission/analysis/modelling/trip_model/greedy_similarity_binning.py

@@ -121,6 +121,7 @@ class label to apply:
        self.is_incremental = config['incremental_evaluation']

        self.bins: Dict[str, Dict] = {}
+        self.tripLabels=[]


I don't see why we need a separate array here. The bins are already indexed by bin_id, and each bin has its labels in it:

"25": { 'feature_rows': .... 'labels':... }

So I don't see what this is adding to the data structure.

clustering.py (which calls this function) builds on the data_loc dataframe, which requires a per-trip mapping like this:

Trip 1 --> bin No. (say 2)
Trip 2 --> bin No. (say 3)
Trip 3 --> bin No. (say 2)
Trip 4 --> bin No. (say 1)
Trip 5 --> bin No. (say 1)
...

However, the way the trips are stored here is grouped by bin:

"1": { 'feature_rows': [ [feature of Trip4], [feature of Trip5] ], 'labels': ... }
"2": { 'feature_rows': [ [feature of Trip1], [feature of Trip3] ], 'labels': ... }
"3": { 'feature_rows': [ [feature of Trip2] ], 'labels': ... }

We can certainly extract each trip and its bin from the way they are already stored, but creating a separate array up front feels more efficient.

But the problem with creating an additional data structure is that we are then changing production code to make analysis easier. We don't really need the trip -> bin mapping in production, and at some point we might want to include memoization as well. So I would prefer doing the additional work in clustering.py to create the dataframe from the output of the production model, and not the other way around.

That makes sense. I'll remove the extra data structure and move all the additional computations to clustering.py.
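
The approach agreed on above could be sketched roughly as follows. This is only an illustration, not the code that landed in clustering.py: it assumes the `bins` layout shown earlier in this thread (a dict keyed by bin id, each bin holding a `feature_rows` list) and that each trip's feature row matches one of those rows exactly. The function name `trip_to_bin_ids` and the sample feature values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def trip_to_bin_ids(
    bins: Dict[str, Dict],
    trip_features: Sequence[Sequence[float]],
) -> List[Optional[str]]:
    """Return the bin id of each trip, in the order the trips are given.

    Hypothetical helper: inverts the bin -> feature_rows structure
    into a feature_row -> bin id lookup, then walks the trips in order.
    """
    lookup: Dict[tuple, str] = {}
    for bin_id, bin_record in bins.items():
        for row in bin_record["feature_rows"]:
            # Keep the first bin seen for a given feature row.
            lookup.setdefault(tuple(row), bin_id)
    # Trips with no matching feature row map to None.
    return [lookup.get(tuple(features)) for features in trip_features]


# Made-up features mirroring the Trip 1..5 example in this thread.
bins = {
    "1": {"feature_rows": [[4.0, 4.5], [5.0, 5.5]], "labels": []},
    "2": {"feature_rows": [[1.0, 1.5], [3.0, 3.5]], "labels": []},
    "3": {"feature_rows": [[2.0, 2.5]], "labels": []},
}
trip_features = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5], [4.0, 4.5], [5.0, 5.5]]

bin_ids = trip_to_bin_ids(bins, trip_features)
print(bin_ids)  # ['2', '3', '2', '1', '1']
```

The resulting list is in trip order, so it can be assigned directly as a column of the analysis dataframe, and the production model stays unchanged.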

These changes were made to return `entry`-type data (alongside the dataframe) to clustering_example.ipynb.

shankari · 2023-08-16T04:24:35Z

emission/storage/timeseries/builtin_timeseries.py

+
+    entryList=[]
+


@humbleOldSage this is wrong. There is not only one set of entries in the database. Please read and understand the data model from chapter 5 of my thesis.

Figured this was not necessary

shankari · 2023-08-16T04:25:12Z

emission/storage/timeseries/builtin_timeseries.py

+    def getEntryList(self):
+        return self.entryList
+


Again, this is wrong as well. You can't return the entry list because there is not just one.

There are existing methods to get an entry list, and the trip model uses them. You should not have to make changes to the BuiltinTimeseries for this change.
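
One way to read this objection, as a toy sketch: a list declared at class level is a single object shared by every instance and every call, so it cannot represent more than one set of entries. The classes below are hypothetical stand-ins, not the real BuiltinTimeSeries, and the string entries are made up.

```python
class SharedListSeries:
    # Class attribute: ONE list shared by all instances (as in the diff above).
    entry_list = []

    def __init__(self, user_id):
        self.user_id = user_id

    def load(self, entries):
        for e in entries:
            SharedListSeries.entry_list.append(e)
        # Return a copy of whatever has accumulated so far.
        return list(SharedListSeries.entry_list)


class PerCallSeries:
    def __init__(self, user_id):
        self.user_id = user_id

    def load(self, entries):
        # A fresh list per call: nothing leaks between users or queries.
        return [e for e in entries]


alice_result = SharedListSeries("alice").load(["a1", "a2"])
bob_result = SharedListSeries("bob").load(["b1"])
print(bob_result)  # ['a1', 'a2', 'b1']  (bob's result includes alice's entries)

clean_result = PerCallSeries("bob").load(["b1"])
print(clean_result)  # ['b1']
```

With the shared list, each query's result depends on every query that ran before it, which is why the entries should be returned per call rather than stored on the class.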

shankari · 2023-08-16T04:26:00Z

emission/storage/timeseries/builtin_timeseries.py

+        for e in entry_it:
+            BuiltinTimeSeries.entryList.append(map_fn(e)) 
+        df = pd.DataFrame(BuiltinTimeSeries.entryList)
+


humbleOldSage · 2023-08-16T19:19:00Z

No changes are needed here if the changes in e-mission/e-mission-eval-private-data#37 are approved.

shankari · 2023-08-16T22:34:35Z

Closing this since all changes are in e-mission/e-mission-eval-private-data#37

Update greedy_similarity_binning.py

61ab4b9

Added a new list type variable to store labels at one place in order of trips processed.

shankari reviewed Aug 11, 2023

shankari mentioned this pull request Aug 12, 2023

Update clustering.py e-mission/e-mission-eval-private-data#37

Merged

Updated builtin_timeseries.py

2b5b06a

These changes were done to return `entry` type data ( alongside dataframe) to clustering_example.ipynb.

humbleOldSage requested a review from shankari August 16, 2023 00:20

shankari requested changes Aug 16, 2023

humbleOldSage requested a review from shankari August 16, 2023 19:19

shankari closed this Aug 16, 2023
