Rate limiting SFTP operations #420
-
Does anyone have any suggestions or thoughts on how one might apply a rate limit to SFTP operations (get/put)? Some of my use of the library runs into issues where the local client (on cloud-hosted infrastructure) overwhelms the remote device (embedded, with slow M2M mobile connections). While this is likely out of scope for the library itself to solve, limiting the maximum rate helps in practice. If there's already a means of doing this in the API that I've just missed, a hint would help a lot. Otherwise, I've come up with a solution that may be inelegant but at least avoids my remote devices being knocked offline by a flood of data:

import asyncio
import asyncssh

async def example(path: str, rate: float):
    delay = 1 / (rate * 2**10)

    async with asyncssh.connect("192.168.1.1", username="some_user",
                                client_keys=["testkey"],
                                known_hosts=None) as conn:
        async with conn.start_sftp_client() as sftp:
            async with sftp.open(path, pflags_or_mode="rb", max_requests=1) as f:
                length = (await f.stat()).size
                pos = 0
                buffer = []

                while pos < length:
                    data = await f.read(2**10)

                    if not data:
                        break  # stop on EOF rather than spinning forever

                    buffer.append(data)
                    pos += len(data)

                    if len(data) == 2**10:
                        await asyncio.sleep(delay)

                # do something with the buffer here
-
Something like this should work, though I think you'll find that issuing a single 1 KB read at a time will already run pretty slowly even without the asyncio.sleep() call. Depending on the round-trip time between the client and server, the minimum time for each read to complete could already be in milliseconds, ending up larger than the delay you are adding unless you set the "rate" to be much less than 1.

With the current way of calculating the delay, it looks like the rate is in MBytes/sec. Was that your intent? If you want much slower rates than that, you might want to have the units be something like bytes/second, which would mean a calculation like delay = 2**10 / rate.

The biggest issue I see here, though, is that you're not taking into account how long the read() call actually took, so whatever rate you set, you'll end up getting something slower than that. You'd need to keep a value of the next wakeup time in absolute time, and do your sleep each time through the loop as that next wakeup time minus now. That way, the sleep will be shortened based on how much time the other steps in the loop took. Each time through, you increment the next wakeup time by however long one block should take at the requested rate (block_size / rate).

I'd also encourage you to just experiment with changing the read size and not sleeping at all. I have a feeling in this case that would slow things down enough (making the read size smaller to run slower), with a lot less complexity.
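To make that concrete, here's a minimal sketch of the absolute-wakeup-time loop described above (my illustration, not part of asyncssh; read_block is a stand-in for whatever coroutine fetches each chunk, and the names are placeholders):

import asyncio

async def paced_loop(read_block, block_size: int, rate: float):
    # rate is in bytes/second; interval is how long one full block
    # should take at that rate.
    loop = asyncio.get_event_loop()
    interval = block_size / rate
    next_wakeup = loop.time()

    while True:
        data = await read_block(block_size)
        if not data:
            break

        # Advance the absolute deadline, then sleep only the remainder,
        # so time already spent inside read_block() shortens the sleep.
        next_wakeup += interval
        delay = next_wakeup - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)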
-
Here's a more complete example of what I was thinking:

import asyncio
import asyncssh

async def example(path: str, rate: float, block_size=1024):
    loop = asyncio.get_event_loop()

    async with asyncssh.connect('localhost') as conn:
        async with conn.start_sftp_client() as sftp:
            async with sftp.open(path, "rb") as f:
                length = (await f.stat()).size
                pos = 0
                buffer = []
                start = last = loop.time()

                while pos < length:
                    data = await f.read(block_size)

                    if not data:
                        break

                    buffer.append(data)
                    pos += len(data)

                    now = loop.time()
                    delta = now - last
                    delay = block_size / rate - delta

                    print(f'{block_size} {delta: .3f} {delay: .3f}')

                    if delay > 0:
                        await asyncio.sleep(delay)
                        now += delay

                    last = now

                # do something with the buffer here

                print(f'{length} bytes in {now - start: .3f} seconds, '
                      f'{length / (now - start): .0f} bytes/sec')

asyncio.run(example('/tmp/test', 1024))

With /tmp/test containing 10240 bytes, the output I get in this case is:
The rate is in bytes per second for this version. I also provided an option to select the block size of the reads.

In cases where it is unable to keep up with the requested rate, it won't burst; it just lets that one round run at the lower rate. However, in any case where it completes quicker, it will sleep the appropriate amount of additional time to hit the requested rate for that round.

This version adds a sleep even after the last read, just to make the math work out on the total transfer time it reports. This might also be useful if you wanted to schedule multiple of these transfers back to back, so that the series of requests as a whole still maintains the requested rate. Consider the extreme case where the file being transferred is smaller than the block size: if you didn't sleep after that initial read, the transfer would basically run with no delay at all before the next one began.
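As an illustrative sketch of that back-to-back case (my addition, with hypothetical file paths), the single asyncio.run() call above could be replaced with a driver that runs several transfers in sequence; because each transfer sleeps through its full time budget, the series as a whole keeps the requested rate:

import asyncio

async def main():
    # Each example() call sleeps out its full time budget, so
    # consecutive transfers keep the aggregate rate on target.
    for path in ('/tmp/test1', '/tmp/test2'):
        await example(path, 1024)

asyncio.run(main())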