
[Issue #1365] Create script to setup the current opportunities table #1577

Merged
merged 2 commits into from
Apr 1, 2024

Conversation

chouinar
Collaborator

Summary

Fixes #1365

Time to review: 10 mins

Changes proposed

Created a script that fetches all opportunities, determines what the current opportunity summary + opportunity status should be, and sets them

Added some miscellaneous backend task utilities for logging / metrics

Context for reviewers

https://app.gitbook.com/o/cFcvhi6d0nlLyH2VzVgn/s/v1V0jIH7mb7Yb3jlNrgk/engineering/learnings/opportunity-endpoint-data-model#calculating-current-summary provides the context for this change.

In simple terms, the algorithm is:

  • Fetch all opportunities
  • For each, determine the latest forecast and non-forecast summary
  • Determine which of these two summaries is usable (a summary must not be deleted, and today must be on or after its post date)
  • Choose the non-forecast if possible, otherwise the forecast, otherwise null
  • Set the current opportunity summary + opportunity status accordingly based on the dates
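The selection step above could be sketched roughly like this in Python. The `Summary` model and its field names here are illustrative stand-ins, not the actual models from the PR:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative stand-in for a summary record (not the real model)
@dataclass
class Summary:
    is_deleted: bool
    post_date: Optional[date]

def choose_current_summary(
    latest_forecast: Optional[Summary],
    latest_non_forecast: Optional[Summary],
    today: date,
) -> Optional[Summary]:
    def is_usable(summary: Optional[Summary]) -> bool:
        # Usable if it exists, isn't deleted, and today is on/after its post date
        return (
            summary is not None
            and not summary.is_deleted
            and summary.post_date is not None
            and today >= summary.post_date
        )

    # Prefer the non-forecast summary, then the forecast, otherwise None
    if is_usable(latest_non_forecast):
        return latest_non_forecast
    if is_usable(latest_forecast):
        return latest_forecast
    return None
```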

Note that there are a few other small pieces in the logic, mostly for logging, metrics, and performance. For performance and tracking reasons, if the summary and status already have their desired values, we don't "re-update" them; we leave them alone. If they don't match, we update them accordingly, which can include removing the current opportunity summary entirely.
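That skip-if-unchanged behavior could be sketched like this; the model and counter names here are illustrative, not the PR's actual code:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative stand-in; the real SQLAlchemy models differ.
@dataclass
class Opportunity:
    opportunity_id: int
    opportunity_status: Optional[str] = None
    current_opportunity_summary: Optional[Any] = None

def maybe_update(
    opportunity: Opportunity,
    new_summary: Optional[Any],
    new_status: Optional[str],
    metrics: dict,
) -> bool:
    """Apply the new summary/status only when something actually changed."""
    if (
        opportunity.current_opportunity_summary is new_summary
        and opportunity.opportunity_status == new_status
    ):
        # Already at the desired values: skip the write, just count it
        metrics["unmodified"] = metrics.get("unmodified", 0) + 1
        return False
    # This branch also covers clearing the summary entirely (new_summary=None)
    opportunity.current_opportunity_summary = new_summary
    opportunity.opportunity_status = new_status
    metrics["updated"] = metrics.get("updated", 0) + 1
    return True
```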

Additional information

There is more work we can do regarding logging and metrics, but this is a solid start that will work well with our eventual New Relic integration. We log information in two separate ways:

  1. Aggregate metrics
  2. Specific metrics

Aggregate metrics are task-level metrics that are largely counts and durations, such as "number of opportunities processed" or "duration of process". These are all handled by the new Task class: anywhere you see self.increment("something"), a counter on the class is updated. These are useful for high-level tracking and for spotting trends.
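A stripped-down sketch of what that Task/increment pattern might look like; the real class in this PR surely differs in structure and naming:

```python
import time
from collections import Counter

class Task:
    """Minimal sketch of a task base class that collects aggregate metrics."""

    def __init__(self) -> None:
        self.metrics: Counter = Counter()

    def increment(self, name: str, amount: int = 1) -> None:
        # Bump a named counter; all counters get reported once at the end
        self.metrics[name] += amount

    def run(self) -> None:
        start = time.monotonic()
        self.run_task()
        self.metrics["task_duration_sec"] = round(time.monotonic() - start, 3)
        # The real task would send these to the logger (and later New Relic)
        print(dict(self.metrics))

    def run_task(self) -> None:
        raise NotImplementedError
```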

Right now, these aggregate metrics get logged at the end of the process, and currently look like:
(screenshot: aggregate metrics log output, taken 2024-03-29)


Specific metrics are those that are attached to an individual log message (in this case, pertaining to a specific opportunity). This can be useful for investigating, or debugging issues with an opportunity. The log messages allow us to tell exactly what happened to an opportunity.
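This kind of per-record logging can be done with the standard library's logging module via the extra parameter; a minimal sketch, with illustrative field names:

```python
import logging

logger = logging.getLogger(__name__)

def log_opportunity_update(
    opportunity_id: int, old_status: str, new_status: str
) -> None:
    # `extra` attaches structured fields to the log record, so a log
    # aggregator (e.g. New Relic) can filter on them per opportunity
    logger.info(
        "updated opportunity status",
        extra={
            "opportunity_id": opportunity_id,
            "existing_opportunity_status": old_status,
            "new_opportunity_status": new_status,
        },
    )
```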

These look something like this (two separate opportunities that hit different scenarios):
(screenshot: per-opportunity log messages for two different scenarios, taken 2024-03-29)


@coilysiren coilysiren left a comment


2 high level questions:

  1. What was the motivation behind creating a task for this? It seems like you should be able to calculate these values just-in-time, instead of deriving and saving them ahead of time?
  2. Is the idea that these would go into a step function (or similar) cron-job?

"opportunity_id": opportunity.opportunity_id,
"existing_opportunity_status": opportunity.opportunity_status,
}
log_extra |= get_log_extra_for_summary(

Collaborator Author

Yeah, the |= shorthand for Python dictionary operations is pretty useful and fairly recent (added in Python 3.9).
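For reference, a quick illustration of the dict union operators from PEP 584 (Python 3.9+), using made-up log field values:

```python
# `|=` merges the right-hand dict into the left in place;
# on key conflicts, the right-hand side wins
log_extra = {"opportunity_id": 123, "existing_opportunity_status": "forecasted"}
summary_extra = {"summary_post_date": "2024-03-01", "existing_opportunity_status": "posted"}
log_extra |= summary_extra

# `|` produces a new merged dict without mutating either operand
merged = {"a": 1} | {"b": 2}
```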

@@ -0,0 +1,72 @@
import abc
Collaborator

(non-blocking comment) This is great! I feel like it would be nice-to-have for the copy oracle data task to use this.

Collaborator Author

Yep, this is a port of an approach I used in some past backend scripts. If whatever replaces the copy-oracle commands is still run via Python scripts, then we can make sure to re-use this for that purpose.

@@ -0,0 +1,520 @@
from dataclasses import dataclass
Collaborator

⭐⭐   (optional but seriously consider) I hate to see this given how many tests you wrote, but... It looks like the code coverage for set_current_opportunities_task.py is 97%. Can you check that it's not missing any important logic branches?

Collaborator Author

Looks like 3 lines.

  1. The "main" function for the script. I can technically get that to be called in tests.
  2. A line in an if/else that technically can't be hit, but makes the logic clearer and keeps MyPy happy (a null check).
  3. A line that I can actually add a test for.

I'll fix 1 and 3 and get it to 99%.

@chouinar
Collaborator Author

chouinar commented Apr 1, 2024

@coilysiren

  • What was the motivation behind creating a task for this? It seems like you should be able to calculate these values just-in-time, instead of deriving and saving them ahead of time?
  • Is the idea that these would go into a step function (or similar) cron-job?

Yes, we'll want to set up a cron job for this as well (thinking every 5-10 minutes?). But there's no urgency, as this won't do anything until the transformation process is in place.

As for why we do this as a script rather than live, it is for a few reasons:

  1. The search query is already pretty complex and will get more complex. Doing this logic in SQL is probably doable, but it would add a ton of complexity.
  2. There are several places we'd need to do this calculation eventually (search, get opportunity, process for importing into the search index, and backend scripts that generate csvs/json files) and implementing the same logic in all of them isn't ideal.
  3. I'm imagining a future where grantors can have more flexibility in what revision/forecast/etc. is actually the "active" one. If they want to manually set this, then we need to store that. This automated process will likely be gradually modified, and this gives us a single point to do those changes. We can also make the script only modify "legacy" opportunities and have a separate process for new ones. This should actually be quite flexible
  4. I like writing backend scripts / processing a lot of data

@chouinar chouinar merged commit 3289e3a into main Apr 1, 2024
7 of 8 checks passed
@chouinar chouinar deleted the chouinar/1365-set-opportunities branch April 1, 2024 15:35
Development

Successfully merging this pull request may close these issues.

[Task]: Create backend script that sets the current opportunity summary for each opportunity
2 participants