Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Affiliate Data API #21

Closed
wants to merge 7 commits into from
Closed

Affiliate Data API #21

wants to merge 7 commits into from

Conversation

bh2smith
Copy link
Contributor

@bh2smith bh2smith commented Dec 19, 2022

Related to cowprotocol/dune-bridge#52

This PR implements the last change to fully replace the Dune Bridge project.

  • A basic API endpoint for user account profiles
  • When the address is not a user or an invalid string, default (zero) profile is returned

TODO

Add more tests (e.g. for API route)

Test Plan

Run the API server (requires Dune API key)

Try it out

http://127.0.0.1:5000/profile/xyz
127.0.0.1:5000/profile/0x247b317521d7edcfaf9b6d6c21b55217e5c34e0a

@bh2smith bh2smith requested a review from a team December 19, 2022 11:32
Copy link

@gentrexha gentrexha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Left a comment regarding the unit tests discussion. Other than that, nice work Ben! Although it is worth mentioning, I'm not an expert in Flask.

@bh2smith bh2smith requested a review from a team January 17, 2023 12:19
Copy link

@gentrexha gentrexha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, Ben!

I left a minor test suggestion. Other than that, LGTM!

@bh2smith
Copy link
Contributor Author

Thanks @gentrexha - note also that I am adding the tests you suggested. Will have them in tomorrow morning.

Copy link

@josojo josojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

One fundamental decision I would like talk about is when we should update the cache. Should we do it in the background continuously or only when the API is pinged.

I think its not nice if a page takes 30 secs to load. Hence, I would prefer to do it in the background

src/fetch/affiliate_data.py Show resolved Hide resolved
results = self.dune.refresh(Query(AFFILIATE_QUERY_ID), ping_frequency=10)
log.info(f"query execution {results.execution_id} succeeded")

last_update = results.times.execution_ended_at or utc_now()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the or statement? Wouldn't it be better to throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because execution_ended_at is an optional field and we want to have an actual value. Luckily, the field will should always be populated if the query execution was successful. I could throw an error instead if you prefer.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an error would be best practice to know if something did not go well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we would already have errored on the refresh if this field was not available. I will also error here if it happens. Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would rather set a warning here and use utc now because it makes sense and its not clear what to do otherwise. We definitely don't want the app to crash. I could also log an error (so we know about it), but not crash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In conclusion I don't think we should "throw" and error, but rather "log" an error (so we are informed, but the app doesn't crash). This would lead us to be notice and report a bug in Dune.


if self.cache_expired():
log.debug("cache expired - refreshing results")
self._update_memory()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that the api query could take 30 secs? We should check with the front-end whether this is okay.
IF it is not too expensive, we could also just update the cache in the background all 5 minutes and don't update on new requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to update in the background a bit less frequently.

The original Idea I had for this was that we refresh whenever a request comes in AND the cache is expired, but we return the value we currently have. This way we don't need to have scheduled refreshes, but demand triggers an update whenever necessary. Furthermore, nobody would every have to wait because we always return something. I still need to figure out how exactly to implement this, but would prefer to do it in a follow up PR (as it is a more sophisticated feature).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original Idea I had for this was that we refresh whenever a request comes in AND the cache is expired, but we return the value we currently have. This way we don't need to have scheduled refreshes, but demand triggers an update whenever necessary. Furthermore, nobody would every have to wait because we always return something. I still need to figure out how exactly to implement this, but would prefer to do it in a follow up PR (as it is a more sophisticated feature).

But if we would like to ever serve two requests at the same time in parallel, then it could be that both trigger dune to recalculate it, right? Maybe it's not so important, as we will not get there soon. Not sure how we would handle it.
(I don't know how concurrency in python works.)

What do you dislike about my proposal? Is it cost saving?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay to do it in another PR. sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea... the only thing about the proposal I am not fond of is the 5 minute updates. Would prefer something like 30. I also don't know how to implement the update in parallel with the service at this moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important to remember that prices may not even become available so frequently.

@bh2smith
Copy link
Contributor Author

I found these docs on concurrency in python 3.11. Will take a look tomorrow.

https://docs.python.org/3/library/concurrency.html


app = Flask(__name__)
load_dotenv()
cached_fetcher = CachingAffiliateFetcher(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use flask's APScheduler to trigger a cache refresh periodically (e.g. once every 5 minutes): https://github.com/viniciuschiele/flask-apscheduler/blob/master/examples/jobs.py

This would allow us to get rid of cache invalidation altogether (which should simplify the code). Basically the cache never becomes invalid it only gets updated periodically (and serves what ever data is current). We should add a prometheus metric on the last updated timestamp and alert if the update wasn't triggered for too long.

This design will also make sure there is only ever one thing writing the cache which makes us have to worry much less about concurrency.

src/fetch/affiliate_data.py Outdated Show resolved Hide resolved
src/api/affiliate.py Outdated Show resolved Hide resolved
@bh2smith
Copy link
Contributor Author

Closing because this program has been abandoned. Not sure how we expect to show users their Total Trade Volume in the interface....

@bh2smith bh2smith closed this Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants