Affiliate Data API #21
Conversation
LGTM!
Left a comment regarding the unit tests discussion. Other than that, nice work Ben! Although it's worth mentioning that I'm not an expert in Flask.
Good job, Ben!
I left a minor test suggestion. Other than that, LGTM!
Thanks @gentrexha - note also that I am adding the tests you suggested. Will have them in tomorrow morning.
Looks good.
One fundamental decision I would like to talk about is when we should update the cache: should we do it continuously in the background, or only when the API is pinged?
I think it's not nice if a page takes 30 seconds to load. Hence, I would prefer to do it in the background.
src/fetch/affiliate_data.py
results = self.dune.refresh(Query(AFFILIATE_QUERY_ID), ping_frequency=10)
log.info(f"query execution {results.execution_id} succeeded")

last_update = results.times.execution_ended_at or utc_now()
Why do we need the or statement? Wouldn't it be better to throw an error?
This is because execution_ended_at is an optional field and we want to have an actual value. Luckily, the field should always be populated if the query execution was successful. I could throw an error instead if you prefer.
I think throwing an error would be best practice, so we know if something did not go well.
Although we would already have errored on the refresh if this field were not available, I will also error here if it happens. Will do.
I think I would rather log a warning here and use utc_now(), because it makes sense and it's not clear what to do otherwise. We definitely don't want the app to crash. I could also log an error (so we know about it), but not crash.
In conclusion, I don't think we should "throw" an error, but rather "log" an error (so we are informed, but the app doesn't crash). This would lead us to notice and report a bug in Dune.
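For illustration, a minimal sketch of that log-don't-crash fallback, reusing the names from the snippet above (results, log, utc_now); this is just how the compromise could look, not necessarily the final code:

```python
# Sketch: keep the utc_now() fallback, but log an error so we find out about
# the (likely) Dune bug instead of crashing the app.
last_update = results.times.execution_ended_at
if last_update is None:
    log.error(
        f"execution {results.execution_id} succeeded but execution_ended_at "
        "is missing - falling back to utc_now(); possible Dune bug"
    )
    last_update = utc_now()
```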
if self.cache_expired():
    log.debug("cache expired - refreshing results")
    self._update_memory()
Does this mean that the API query could take 30 seconds? We should check with the front-end whether this is okay.
If it is not too expensive, we could also just update the cache in the background every 5 minutes and not update on new requests.
I would prefer to update in the background a bit less frequently.
The original idea I had for this was that we refresh whenever a request comes in AND the cache is expired, but we return the value we currently have. This way we don't need scheduled refreshes; demand triggers an update whenever necessary. Furthermore, nobody would ever have to wait because we always return something. I still need to figure out how exactly to implement this, but would prefer to do it in a follow-up PR (as it is a more sophisticated feature).
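A rough sketch of that demand-triggered refresh, assuming a fetcher with the cache_expired() and _update_memory() methods from the diff above (the class name and the cached_results attribute are placeholders, not the PR's actual implementation):

```python
import threading


class DemandTriggeredFetcher:
    """Sketch only: serve the cached value immediately and refresh in a
    background thread when a request finds the cache expired."""

    def __init__(self):
        self.cached_results = None  # whatever structure the real fetcher caches

    def cache_expired(self) -> bool:
        ...  # same expiry check as in the PR

    def _update_memory(self) -> None:
        ...  # the slow (~30s) Dune refresh from the PR

    def get_affiliate_data(self):
        if self.cache_expired():
            # Trigger the refresh without making this caller wait on Dune.
            threading.Thread(target=self._update_memory, daemon=True).start()
        # Always return what we currently have, even if slightly stale.
        return self.cached_results
```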
But if we would ever like to serve two requests at the same time in parallel, then it could be that both trigger Dune to recalculate it, right? Maybe it's not so important, as we will not get there soon. Not sure how we would handle it.
(I don't know how concurrency in Python works.)
What do you dislike about my proposal? Is it the cost savings?
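One way to handle that (again only a sketch, not part of this PR) is a non-blocking lock, so two parallel requests cannot both trigger a Dune re-execution:

```python
import threading

_refresh_lock = threading.Lock()


def maybe_refresh(fetcher) -> None:
    """Start a background refresh only if nobody else is already refreshing."""
    if not _refresh_lock.acquire(blocking=False):
        return  # another request already kicked off the refresh

    def _run() -> None:
        try:
            fetcher._update_memory()  # the slow Dune call from the PR
        finally:
            _refresh_lock.release()

    threading.Thread(target=_run, daemon=True).start()
```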
I am okay with doing it in another PR, sure.
Yea... the only thing about the proposal I am not fond of is the 5-minute updates. I would prefer something like 30 minutes. I also don't know how to implement the update in parallel with the service at this moment.
Important to remember that prices may not even become available so frequently.
I found these docs on concurrency in Python 3.11. Will take a look tomorrow.
app = Flask(__name__)
load_dotenv()
cached_fetcher = CachingAffiliateFetcher(
You should be able to use Flask's APScheduler to trigger a cache refresh periodically (e.g. once every 5 minutes): https://github.com/viniciuschiele/flask-apscheduler/blob/master/examples/jobs.py
This would allow us to get rid of cache invalidation altogether (which should simplify the code). Basically, the cache never becomes invalid; it only gets updated periodically (and serves whatever data is current). We should add a Prometheus metric on the last-updated timestamp and alert if the update wasn't triggered for too long.
This design would also make sure there is only ever one thing writing the cache, which means we have to worry much less about concurrency.
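A sketch of that setup, loosely following the linked flask-apscheduler example. The 5-minute interval, the metric name, and the assumption that cached_fetcher is the CachingAffiliateFetcher instance from this PR are all placeholders; the gauge uses the prometheus_client package:

```python
from flask import Flask
from flask_apscheduler import APScheduler
from prometheus_client import Gauge

# Alert on this gauge if the refresh job hasn't run for too long.
LAST_REFRESH = Gauge(
    "affiliate_cache_last_refresh", "Unix time of the last successful cache refresh"
)


def register_cache_refresh(app: Flask, cached_fetcher, minutes: int = 5) -> APScheduler:
    """Attach a periodic background refresh of the affiliate cache to the app."""
    scheduler = APScheduler()
    scheduler.init_app(app)

    @scheduler.task("interval", id="refresh_affiliate_cache", minutes=minutes)
    def refresh_affiliate_cache() -> None:
        # _update_memory is the fetcher's refresh method shown in the diff above.
        cached_fetcher._update_memory()
        LAST_REFRESH.set_to_current_time()

    scheduler.start()
    return scheduler
```

Calling register_cache_refresh(app, cached_fetcher) right after the app and fetcher are constructed would then replace the cache_expired() check in the request path.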
Closing because this program has been abandoned. Not sure how we expect to show users their Total Trade Volume in the interface...
Related to cowprotocol/dune-bridge#52
This PR implements the last change to fully replace the Dune Bridge project.
TODO
Add more tests (e.g. for API route)
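As a possible starting point for the API-route test, a hedged sketch using Flask's test client; the src.api module path, the /profile/<address> route, and the zero address are guesses that would need to match the actual app:

```python
import pytest

from src.api import app  # assumed location of the Flask app from this PR


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as test_client:
        yield test_client


def test_affiliate_route_returns_json(client):
    # Route and address are placeholders for whatever the API actually exposes.
    response = client.get("/profile/0x0000000000000000000000000000000000000000")
    assert response.status_code == 200
    assert response.is_json
```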
Test Plan
Run the API server (requires Dune API key)
Try it out