Rate limiting SFTP operations #420
-
Does anyone have any suggestions or thoughts on how one might apply a rate limit to SFTP operations (get/put)? Some of my use of the library runs into issues where the local client (on cloud-hosted infrastructure) overwhelms the remote device (embedded, with slow M2M mobile connections). While this is likely out of scope for the library itself to solve, limiting the maximum rate helps in practice. If there's already a means of doing this in the API that I've just missed, a hint would help a lot. Otherwise, I've come up with a solution that may be inelegant but at least avoids my remote devices being knocked offline by a flood of data:

import asyncio
import asyncssh

async def example(path: str, rate: float):
    delay = 1 / (rate * 2**10)

    async with asyncssh.connect("192.168.1.1", username="some_user",
                                client_keys=["testkey"],
                                known_hosts=None) as conn:
        async with conn.start_sftp_client() as sftp:
            async with sftp.open(path, pflags_or_mode="rb", max_requests=1) as f:
                length = (await f.stat()).size
                pos = 0
                buffer = []

                while pos < length:
                    data = await f.read(2**10)

                    if not data:
                        break  # stop on EOF rather than spinning forever

                    buffer.append(data)
                    pos += len(data)

                    if len(data) == 2**10:
                        await asyncio.sleep(delay)

                # do something with the buffer here
-
Something like this should work, though I think you'll find that issuing a single 1 KB read at a time will already run pretty slowly even without the asyncio.sleep() call. Depending on the round-trip time between the client and server, the minimum time for each read to complete could already be in milliseconds, ending up larger than the delay you are adding unless you set the "rate" to be much less than 1.

With the current way of calculating the delay, it looks like the rate is in MBytes/sec. Was that your intent? If you want much slower rates than that, you might want to have the units be something like bytes/second, which would mean a calculation like delay = 2**10 / rate.

The biggest issue I see here, though, is that you're not taking into account how long the read() call actually took, so whatever rate you set, you'll end up getting something slower than that. You'd need to keep a value of the next wakeup time in absolute time, and do your sleep each time through the loop as that next wakeup time minus now. That way, the sleep will be shortened based on how much time the other steps in the loop took. Each time through, you increment the next wakeup time by however long one block should take at the requested rate (block_size / rate).

I'd also encourage you to just experiment with changing the read size and not sleeping at all. I have a feeling in this case that would slow things down enough (making the read size smaller to run slower), with a lot less complexity.
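To make that concrete, here's a minimal sketch of the absolute-wakeup-time loop described above (my illustration, not part of asyncssh; read_block is a stand-in for whatever coroutine fetches each chunk, and the names are placeholders):

import asyncio

async def paced_loop(read_block, block_size: int, rate: float):
    # rate is in bytes/second; interval is how long one full block
    # should take at that rate.
    loop = asyncio.get_event_loop()
    interval = block_size / rate
    next_wakeup = loop.time()

    while True:
        data = await read_block(block_size)
        if not data:
            break

        # Advance the absolute deadline, then sleep only the remainder,
        # so time already spent inside read_block() shortens the sleep.
        next_wakeup += interval
        delay = next_wakeup - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)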
-
Here's a more complete example of what I was thinking:

import asyncio
import asyncssh

async def example(path: str, rate: float, block_size=1024):
    loop = asyncio.get_event_loop()

    async with asyncssh.connect('localhost') as conn:
        async with conn.start_sftp_client() as sftp:
            async with sftp.open(path, "rb") as f:
                length = (await f.stat()).size
                pos = 0
                buffer = []
                start = last = loop.time()

                while pos < length:
                    data = await f.read(block_size)

                    if not data:
                        break

                    buffer.append(data)
                    pos += len(data)

                    now = loop.time()
                    delta = now - last
                    delay = block_size / rate - delta

                    print(f'{block_size} {delta: .3f} {delay: .3f}')

                    if delay > 0:
                        await asyncio.sleep(delay)
                        now += delay

                    last = now

                # do something with the buffer here

                print(f'{length} bytes in {now - start: .3f} seconds, '
                      f'{length / (now - start): .0f} bytes/sec')

asyncio.run(example('/tmp/test', 1024))

With /tmp/test containing 10240 bytes, the output I get in this case is:
The rate is in bytes per second for this version. I also provided an option to select the block size of the reads.

In cases where it is unable to keep up with the requested rate, it won't burst; it just lets that one round run at the lower rate. However, in any case where it completes quicker, it will sleep the appropriate amount of additional time to hit the requested rate for that round.

This version adds a sleep even after the last read, just to make the math work out on the total transfer time it reports. This might also be useful if you wanted to schedule multiple of these transfers back to back, so that the series of requests as a whole still maintains the requested rate. Consider the extreme case where the file being transferred is smaller than the block size: if you didn't sleep after that initial read, the transfer would basically run with no delay at all before the next one began.
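As an illustrative sketch of that back-to-back case (my addition, with hypothetical file paths), the single asyncio.run() call above could be replaced with a driver that runs several transfers in sequence; because each transfer sleeps through its full time budget, the series as a whole keeps the requested rate:

import asyncio

async def main():
    # Each example() call sleeps out its full time budget, so
    # consecutive transfers keep the aggregate rate on target.
    for path in ('/tmp/test1', '/tmp/test2'):
        await example(path, 1024)

asyncio.run(main())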