Implemented bucket algorithm for handling bursts / simultaneous requests #373
Conversation
Update from master
updating
Update Fork
Update from master
update from master
…ck off, that facilitates bursts of asyncio requests, whilst staying under the strict pre-defined rate limits for each exchange. SH
…mly selecting the exponential backoff time in the bucket algorithm rate limiter
… loading all tickers
Update from master
This is awesome! Respect, man! I'm going to spend some time merging this, will let you know on the progress!
Reference: #297
```python
# print('Waiting for tokens: Exchange: {0}'.format(self.id))
self.add_new_tokens()
seconds_delays = [0.01, 0.1, 0.7, 1, 1.5, 2]
delay = random.choice(seconds_delays)
```
@samholt, these two lines are not very clear to me, namely, the choice for 0.01, 0.1, 0.7, 1, 1.5 and 2. I can see that you are randomizing the delays here, but I'm not sure why. Can you please elaborate on that?
Certainly: this comes from CS/EE theory on how routers and Ethernet scale bandwidth up to the maximal optimal rate for a shared resource. The Ethernet protocol, and similarly the IEEE 802.3 CSMA/CD standard, both implement an exponential backoff algorithm (https://en.wikipedia.org/wiki/Exponential_backoff) to avoid multiple simultaneous accesses to the same resource (a collision); here it prevents hundreds of asyncio tasks from checking for newly available tokens at the same time. In theory the randomised delays should follow a process like https://en.wikipedia.org/wiki/Exponential_backoff#Example_exponential_backoff_algorithm; the simplified version here just combines steps 1-3 to achieve an approximation, which is crudely the set above. On second thought, the times should be exponentially distributed within the set: assuming a minimum delay of 1 ms and a maximum delay of 500 ms, with a set of 5 values, this gives [0.001, 0.005, 0.022, 0.106, 0.5].
```python
import numpy as np

x = np.linspace(np.log(0.001), np.log(0.5), 5)
y = np.exp(x)
for i in y:
    print(round(i, 3))
```
```python
# print('Adding new tokens: Exchange: {0}'.format(self.id))
now = time.monotonic()
time_since_update = now - self.rateLimitUpdateTime
new_tokens = math.floor((0.8 * 1000.0 * time_since_update) / self.rateLimit)
```
... and this one, the 0.8 magic constant looks mysterious ) I may be missing something here %)
Initially, the new-token rate was to be calculated dynamically, based on how many tokens are handed out up front, so that over a fixed time span (e.g. 60 seconds) the bucket algorithm sends the same number of requests as a single delayed poller issuing requests periodically (with a period of the rateLimit). It thus stays under the rate limit while supporting bursts of requests over time, for optimal request rates.
The theory is as follows:
- A traditional linear rate-limit poller with rateLimit = 1.5 seconds will, over a period of, for example, T = 60 seconds, send 60/1.5 = 40 requests; here each request is represented as a token.
- A dynamic rate limiter must still satisfy sending 40 requests or less over that same period of T = 60 seconds. Thus if the first burst takes 16 requests, that leaves 40 - 16 = 24 requests, which should be added to the token bucket linearly over the time period T, i.e. a new token/request must be added every 2.5 seconds. This ensures the original rate limit is met whilst allowing bursts of requests over short periods of time.
Thus new_tokens = (1/2.5) * time_since_update.
Calculating this for a typical exchange with the rateLimit defined above, we can express the new-token rate as a fraction of the old rate limit, i.e. 1.5/2.5, which can be factored out to yield:
new_tokens = (1.5/2.5) * (1/1.5) * time_since_update
new_tokens = 0.6 * (1/rateLimit) * time_since_update
Thus we should take approximately 0.6, or 60%, of the linear rate limit with the constants defined above. Although this works, I found after repeated tests that exchanges can be throttled faster than this without DDOS errors. The 0.8, or 80% of (1/rateLimit), was one of the highest rates at which exchanges could still be throttled without reporting DDOS or rate-limit errors. Keen to hear everyone's views on this, especially whether we should stick to the lower limits, what time period T exchanges usually calculate rate limits over behind the scenes, and whether we should always try to maximise the rate at which we can send requests.
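The arithmetic above can be sketched as a standalone function; the 1500 ms rateLimit and the 0.8 scaling are example values taken from this discussion, not universal constants:

```python
import math

def new_tokens(time_since_update, rate_limit_ms=1500.0, scaling=0.8):
    # Tokens accrued since the last bucket update, at 80% of the linear
    # poll rate. time_since_update is in seconds, rate_limit_ms in ms.
    return math.floor((scaling * 1000.0 * time_since_update) / rate_limit_ms)

# Worked example: after 15 s at rateLimit = 1500 ms,
# floor(0.8 * 1000 * 15 / 1500) = floor(8.0) = 8 tokens.
```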
```python
# On some exchanges, not all symbols support fetching tickers
symbols_to_load = [symbol for symbol in exchange.symbols if '.d' not in symbol]
if exchange.id == 'bitmex':
    symbols_to_load = ['BTC/USD', 'B_BLOCKSZ17', 'DASHZ17', 'ETC7D', 'ETHZ17', 'LTCZ17', 'XBJZ17', 'XBTZ17', 'XMRZ17', 'XRPZ17', 'XTZZ17', 'ZECZ17']
```
Unfortunately, this line for Bitmex will break the test upon monthly contract expiration. Basically, their tickers change continuously, so we have to make another workaround for it. But, this isn't a big problem, will fix it anyway.
Interesting, yes, it was just a quick fix, as errors were occurring when attempting to fetch individual tickers for each symbol contained in bitmex.symbols; it seems some of them are not supported, yet still listed? Keen to see your fix :)
```python
elif exchange.id == 'virwox':
    symbols_to_load = [symbol for symbol in symbols_to_load if symbol != 'CHF/SLL']
input_coroutines = [exchange.fetchTicker(symbol) for symbol in symbols_to_load]
tickers = await asyncio.gather(*input_coroutines)
```
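A hypothetical variant of the gather call above, sketched here as an alternative to hard-coding symbol lists: `return_exceptions=True` keeps one unsupported symbol from aborting the whole batch (the `fetch_all_tickers` name is illustrative, not part of the PR):

```python
import asyncio

async def fetch_all_tickers(exchange, symbols):
    # Gather all ticker requests; exceptions are returned in-place rather
    # than raised, so unsupported symbols are simply filtered out below.
    coroutines = [exchange.fetchTicker(symbol) for symbol in symbols]
    results = await asyncio.gather(*coroutines, return_exceptions=True)
    return {symbol: result for symbol, result in zip(symbols, results)
            if not isinstance(result, Exception)}
```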
Even though sending 16 requests at once to fetch tickers can work, it will most likely result in a ban from Bittrex or Kraken, if you try doing the same with private requests and with L2 orderbooks continuously...
Interesting: it sends 16 requests, then adds the possibility of a new request at slightly less than the rate limit, to make sure that over a fixed time period the specified rate limit for the exchange is met. Looking at the Bittrex API documentation, I can't readily see any information about their API rate limit (https://bittrex.com/home/api). Multiple tests against their API using this approach seem to work; however, long polling of continuous order books is yet to be tested (I plan to add a test that loads all order books for all exchanges). I agree about Kraken in its current implementation; we can make it work for Kraken by specifically setting lower starting values, 'rateLimitMaxTokens': 8 and 'rateLimitTokens': 8.
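A minimal sketch of the per-exchange bucket sizing suggested above. The key names 'rateLimitMaxTokens' and 'rateLimitTokens' follow the comment; the 16-token default and the `bucket_config` helper are assumptions for illustration:

```python
# Per-exchange overrides for strict exchanges like Kraken: cap the burst
# size (rateLimitMaxTokens) and the starting balance (rateLimitTokens).
overrides = {
    'kraken': {'rateLimitMaxTokens': 8, 'rateLimitTokens': 8},
}

def bucket_config(exchange_id, default_tokens=16):
    # Fall back to an assumed 16-token bucket when no override exists.
    override = overrides.get(exchange_id, {})
    return {
        'rateLimitMaxTokens': override.get('rateLimitMaxTokens', default_tokens),
        'rateLimitTokens': override.get('rateLimitTokens', default_tokens),
    }
```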
> it sends 16 requests, then adds the possibility for a new request at slightly less than the rate limit, to make sure over a fixed time period the specified rate limit is met for the exchange.
Yep, and if those 16 requests are private requests (fetching balance, placing and canceling orders, fetching orders/trades history), those will fail most of the time, because practically none of the exchanges tolerate that rate.
> Looking at Bittrex API documentation, I can't readily see any information about their API rate limit ?
Yes, they don't have it documented, but it's been reverse-engineered to be approx 1 request per second. They don't want to be transparent; I don't really know why. Liqui will also complain upon querying at rates higher than one or two requests per second. And it takes Kraken approx 10 (!) seconds to process an order, and it has been so for the past few weeks, so I guess sending 16 of them at once is not an option ) I am still adding some minor edits here and there to complete the merge, but most of it is already there ;)
Very good, agreed. Perhaps we should limit the asyncio throttle to public calls only, and keep the old rate-limit delay structure for private API calls? How do we solve this?
I would say, we are fine. It's just that I made that functionality optional, so users may or may not enable it at their discretion depending on their use case. So if they design a poller application for alerts or something, they can go ahead and turn it full on (no warranties though). And if they want to do trading, then they can leave it off until we come up with a generic solution.
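The opt-in behaviour described above might look roughly like the sketch below. The attribute name `enableBurstThrottle` and the `wait_for_token` method are illustrative assumptions, not the merged API:

```python
import asyncio

async def throttle(exchange):
    # Opt-in burst throttle: users enable it at their discretion; the
    # default path keeps the traditional fixed delay between requests.
    if getattr(exchange, 'enableBurstThrottle', False):
        await exchange.wait_for_token()                   # token-bucket path
    else:
        await asyncio.sleep(exchange.rateLimit / 1000.0)  # legacy linear delay
```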
Nice, very good.
So, after having a deeper look into it, apart from the above, my thoughts are:
Now merging it into the master, thx again!
Great, thank you. I'm working with CCXT full time, so please let me know of any other features I can implement; I'm very interested in calculating arbitrage with the order books, so I will write a test to get the L2 order books simultaneously for all exchanges. Re: the above,
Let me know your thoughts :)
It's exactly what we should do :)
Yep. We have been doing some preparation work to make that possible (and it's done). We had to separate the ... In fact, we thought, maybe we should move it from ccxt, because that functionality can be detached very well from there into a separate lib with a single configurable
I merged it, but, for some reason GitHub is not showing my commits to this branch here.
Excellent, I can see the merge, thank you :) Answered all your questions in detail; keen to change the random delays to [0.001, 0.005, 0.022, 0.106, 0.5], see the reasoning above; in short, it's better because these times are exponentially spaced. Also will work next on a test to pull in all the L2 order books for all the exchanges; do DM with any suggestions you might have, or how you would like it done :)
…exponential backoff random selection times (in seconds) and lowering Krakens starting tokens, to avoid being rate limited on Kraken
ccxt 1.91.21
Implemented a token bucket algorithm for handling bursts / simultaneous requests (https://en.wikipedia.org/wiki/Token_bucket). Implemented in Python 3, specifically for asyncio. Have written tests in test_asyncio.py, which test an exchange when the load-tickers method is not available, by loading all the tickers using this method.
Works for all exchanges in Python 3. Noticed a bug that causes Python 2 to break in tests: running python2 test/test.py ends up calling functions inside test/test_asyncio.py; this needs further analysis to properly separate these function namespaces. However, the token bucket algorithm implementation is sound.
Do let me know what we need to change, to successfully merge this into master. Best, Sam.
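Tying the pieces of the discussion together, a minimal standalone sketch of the token bucket throttle. The class and method names, the 16-token default, the 0.8 scaling, and the delay set follow the numbers discussed above; this is a sketch under those assumptions, not the exact merged implementation:

```python
import asyncio
import math
import random
import time

class TokenBucket:
    """Minimal token-bucket throttle for asyncio (illustrative sketch)."""

    def __init__(self, rate_limit_ms, max_tokens=16, scaling=0.8):
        self.rate_limit_ms = rate_limit_ms
        self.max_tokens = max_tokens
        self.tokens = max_tokens          # full bucket at start: bursts allowed
        self.scaling = scaling
        self.updated = time.monotonic()

    def add_new_tokens(self):
        # Refill at a fraction (scaling) of the linear poll rate so the
        # overall request count stays under the exchange's rate limit.
        now = time.monotonic()
        elapsed = now - self.updated
        new = math.floor((self.scaling * 1000.0 * elapsed) / self.rate_limit_ms)
        if new:
            self.tokens = min(self.tokens + new, self.max_tokens)
            self.updated = now

    async def acquire(self):
        # Exponentially spaced random sleeps avoid a thundering herd of
        # asyncio tasks all polling for new tokens at the same instant.
        while self.tokens < 1:
            self.add_new_tokens()
            await asyncio.sleep(random.choice([0.001, 0.005, 0.022, 0.106, 0.5]))
        self.tokens -= 1
```

Each request coroutine would `await bucket.acquire()` before hitting the exchange; the first `max_tokens` calls go through immediately as a burst, after which the refill rate dominates.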