
[Issue #1365] Create script to setup the current opportunities table #1577

Merged
merged 2 commits into from
Apr 1, 2024

Conversation

chouinar
Collaborator

Summary

Fixes #1365

Time to review: 10 mins

Changes proposed

Created a script that fetches all opportunities, determines what the current opportunity summary + opportunity status should be, and sets them

Added some miscellaneous backend task utilities for logging / metrics

Context for reviewers

https://app.gitbook.com/o/cFcvhi6d0nlLyH2VzVgn/s/v1V0jIH7mb7Yb3jlNrgk/engineering/learnings/opportunity-endpoint-data-model#calculating-current-summary provides the context for this change.

In simple terms, the algorithm is:

  • Fetch all opportunities
  • For each, determine the latest forecast and non-forecast summary
  • Determine which of these two summaries is usable (a summary must not be deleted, and today must be on or after its post date)
  • Choose the non-forecast if possible, otherwise the forecast, otherwise null
  • Set the current opportunity summary + opportunity status accordingly based on the dates
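The selection step above could be sketched roughly like this in Python. The `Summary` model and its field names here are illustrative stand-ins, not the actual models from the PR:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative stand-in for a summary record (not the real model)
@dataclass
class Summary:
    is_deleted: bool
    post_date: Optional[date]

def choose_current_summary(
    latest_forecast: Optional[Summary],
    latest_non_forecast: Optional[Summary],
    today: date,
) -> Optional[Summary]:
    def is_usable(summary: Optional[Summary]) -> bool:
        # Usable if it exists, isn't deleted, and today is on/after its post date
        return (
            summary is not None
            and not summary.is_deleted
            and summary.post_date is not None
            and today >= summary.post_date
        )

    # Prefer the non-forecast summary, then the forecast, otherwise None
    if is_usable(latest_non_forecast):
        return latest_non_forecast
    if is_usable(latest_forecast):
        return latest_forecast
    return None
```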

Note that there are a few other small pieces in the logic, mostly for logging, metrics, and performance. For performance and tracking reasons, if the summary and status already have their desired values, we don't "re-update" them; we leave them alone. If they don't match, we update them accordingly, which can include removing the current opportunity summary entirely.
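That skip-if-unchanged behavior could be sketched like this; the model and counter names here are illustrative, not the PR's actual code:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative stand-in; the real SQLAlchemy models differ.
@dataclass
class Opportunity:
    opportunity_id: int
    opportunity_status: Optional[str] = None
    current_opportunity_summary: Optional[Any] = None

def maybe_update(
    opportunity: Opportunity,
    new_summary: Optional[Any],
    new_status: Optional[str],
    metrics: dict,
) -> bool:
    """Apply the new summary/status only when something actually changed."""
    if (
        opportunity.current_opportunity_summary is new_summary
        and opportunity.opportunity_status == new_status
    ):
        # Already at the desired values: skip the write, just count it
        metrics["unmodified"] = metrics.get("unmodified", 0) + 1
        return False
    # This branch also covers clearing the summary entirely (new_summary=None)
    opportunity.current_opportunity_summary = new_summary
    opportunity.opportunity_status = new_status
    metrics["updated"] = metrics.get("updated", 0) + 1
    return True
```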

Additional information

There is more work we can do regarding logging and metrics, but this is a solid start that will work well with our eventual New Relic integration. We log information in two separate ways:

  1. Aggregate metrics
  2. Specific metrics

Aggregate metrics are task-level metrics that are largely counts and durations, such as "number of opportunities processed" or "duration of process". These are all handled by the new Task class: anywhere you see self.increment("something"), a counter on the class is updated. These are useful for high-level tracking and for spotting trends.
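A stripped-down sketch of what that Task/increment pattern might look like; the real class in this PR surely differs in structure and naming:

```python
import time
from collections import Counter

class Task:
    """Minimal sketch of a task base class that collects aggregate metrics."""

    def __init__(self) -> None:
        self.metrics: Counter = Counter()

    def increment(self, name: str, amount: int = 1) -> None:
        # Bump a named counter; all counters get reported once at the end
        self.metrics[name] += amount

    def run(self) -> None:
        start = time.monotonic()
        self.run_task()
        self.metrics["task_duration_sec"] = round(time.monotonic() - start, 3)
        # The real task would send these to the logger (and later New Relic)
        print(dict(self.metrics))

    def run_task(self) -> None:
        raise NotImplementedError
```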

Right now, these aggregate metrics get logged at the end of the process, and currently look like:
(screenshot: aggregate metrics log output, taken 2024-03-29)


Specific metrics are those that are attached to an individual log message (in this case, pertaining to a specific opportunity). This can be useful for investigating, or debugging issues with an opportunity. The log messages allow us to tell exactly what happened to an opportunity.
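This kind of per-record logging can be done with the standard library's logging module via the extra parameter; a minimal sketch, with illustrative field names:

```python
import logging

logger = logging.getLogger(__name__)

def log_opportunity_update(
    opportunity_id: int, old_status: str, new_status: str
) -> None:
    # `extra` attaches structured fields to the log record, so a log
    # aggregator (e.g. New Relic) can filter on them per opportunity
    logger.info(
        "updated opportunity status",
        extra={
            "opportunity_id": opportunity_id,
            "existing_opportunity_status": old_status,
            "new_opportunity_status": new_status,
        },
    )
```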

These look something like this (two separate opportunities that hit different scenarios):
(screenshot: per-opportunity log messages for two different scenarios, taken 2024-03-29)


@coilysiren coilysiren left a comment


2 high level questions:

  1. What was the motivation behind creating a task for this? It seems like you should be able to calculate these values just-in-time, instead of deriving and saving them ahead of time?
  2. Is the idea that these would go into a step function (or similar) cron-job?

"opportunity_id": opportunity.opportunity_id,
"existing_opportunity_status": opportunity.opportunity_status,
}
log_extra |= get_log_extra_for_summary(

Collaborator Author

Yeah, the |= shorthand for Python dictionary operations is pretty useful and fairly recent (added in Python 3.9).
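For reference, a quick illustration of the dict union operators from PEP 584 (Python 3.9+), using made-up log field values:

```python
# `|=` merges the right-hand dict into the left in place;
# on key conflicts, the right-hand side wins
log_extra = {"opportunity_id": 123, "existing_opportunity_status": "forecasted"}
summary_extra = {"summary_post_date": "2024-03-01", "existing_opportunity_status": "posted"}
log_extra |= summary_extra

# `|` produces a new merged dict without mutating either operand
merged = {"a": 1} | {"b": 2}
```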

@@ -0,0 +1,72 @@
import abc
Collaborator

(non-blocking comment) This is great! I feel like it would be nice-to-have for the copy oracle data task to use this.

Collaborator Author

Yep, this is a port of an approach I used in some past backend scripts. If whatever replaces the copy-oracle commands is still run via Python scripts, then we can make sure to re-use this for that purpose.

@@ -0,0 +1,520 @@
from dataclasses import dataclass
Collaborator

⭐⭐   (optional but seriously consider) I hate to see this given how many tests you wrote, but... It looks like the code coverage for set_current_opportunities_task.py is 97%. Can you check that it's not missing any important logic branches?

Collaborator Author

Looks like 3 lines.

  1. The "main" function for the script. I can technically get that to be called in tests.
  2. A line in an if/else that technically can't be hit, but makes the logic clearer and keeps MyPy happy (a null check).
  3. A line that I can actually add a test for.

I'll fix 1 and 3 and get it to 99%.

@chouinar
Collaborator Author

chouinar commented Apr 1, 2024

@coilysiren

  • What was the motivation behind creating a task for this? It seems like you should be able to calculate these values just-in-time, instead of deriving and saving them ahead of time?
  • Is the idea that these would go into a step function (or similar) cron-job?

Yes, we'll want to set up a cron job for this as well (thinking every 5-10 minutes?). But there's no urgency, as this won't do anything until the transformation process is in place.

As for why we do this as a script rather than live, it is for a few reasons:

  1. The search query is already pretty complex and will get more complex. Doing this logic in SQL is probably doable, but it would add a ton of complexity.
  2. There are several places we'd need to do this calculation eventually (search, get opportunity, process for importing into the search index, and backend scripts that generate csvs/json files) and implementing the same logic in all of them isn't ideal.
  3. I'm imagining a future where grantors can have more flexibility in what revision/forecast/etc. is actually the "active" one. If they want to manually set this, then we need to store that. This automated process will likely be gradually modified, and this gives us a single point to do those changes. We can also make the script only modify "legacy" opportunities and have a separate process for new ones. This should actually be quite flexible
  4. I like writing backend scripts / processing a lot of data

@chouinar chouinar merged commit 3289e3a into main Apr 1, 2024
7 of 8 checks passed
@chouinar chouinar deleted the chouinar/1365-set-opportunities branch April 1, 2024 15:35
Development

Successfully merging this pull request may close these issues.

[Task]: Create backend script that sets the current opportunity summary for each opportunity
2 participants