Concurrent fetches are slower than sequential #131
There was an issue in my benchmarking script that was calculating the elapsed time incorrectly. That's why threaded fetch times were reported as longer than they should have been. However, the results are still not what I would like them to be. Following are some timing summaries with a single iteration for different numbers of concurrent item fetches.
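To illustrate the kind of mistake that can inflate reported times, here is a minimal, hypothetical sketch (the function names are stand-ins, not taken from the benchmark repo) of measuring wall-clock elapsed time for a batch of threaded fetches: start the clock once before any thread starts, join all threads, and only then stop the clock.

```python
import time
import threading

def fetch(results, i):
    # Placeholder for a single IPFS object fetch; sleeps to simulate I/O latency
    time.sleep(0.1)
    results[i] = "object-%d" % i

def timed_concurrent_fetch(n):
    results = [None] * n
    start = time.perf_counter()  # start the clock once, before any thread runs
    threads = [threading.Thread(target=fetch, args=(results, i)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the clock
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = timed_concurrent_fetch(5)
print(elapsed)  # roughly 0.1 s, not 5 * 0.1 s, since the sleeps overlap
```

Summing per-thread timers instead of measuring one outer interval like this is a common way to over-report concurrent elapsed time.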
Hmm… Looking at your benchmarking script, it immediately strikes me that you create and destroy a thread for each request. That definitely decreases performance; try using a persistent thread pool instead. If you're ready for real asynchronous I/O, minimal example code would be something like this (Python 3.5+, untested):

import asyncio
import sys

import ipfsapi

async def do_test(client):
    # Add three items concurrently
    hashes = await asyncio.gather(
        client.add_bytes(b"bla1"),
        client.add_bytes(b"bla2"),
        client.add_bytes(b"bla3")
    )
    # gather returns plain results (not futures), so no .result() is needed
    contents = await asyncio.gather(
        client.cat(hashes[0]),
        client.cat(hashes[1]),
        client.cat(hashes[2])
    )
    return contents

async def amain(argv=sys.argv[1:], loop=asyncio.get_event_loop()):
    async with ipfsapi.connect_async(session=True, loop=loop) as client:
        await do_test(client)

loop = asyncio.get_event_loop()
loop.run_until_complete(amain(loop=loop))
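For a variant that stays closer to the existing threaded approach, a single `ThreadPoolExecutor` created once and reused across requests avoids the per-request thread creation and destruction cost mentioned above. This is a hedged sketch; `fetch_one` is a placeholder, not a function from the benchmark repo:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_one(i):
    # Placeholder for a single IPFS API request; sleeps to simulate I/O
    time.sleep(0.02)
    return i * i

# Create the pool once; submitted work reuses its idle threads instead of
# spawning and tearing down a fresh thread per request
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch_one, range(10)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

`concurrent.futures` is Python 3 standard library, which ties into the Py2/Py3 discussion below.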
While in our actual application we can certainly use persistent thread pools, in the benchmarking script we report timing in two different ways. If you look at the sample output in the benchmark repo's README file, attempts on individual items report elapsed time only for the
This could be a really good approach; however, it would force us to use Py3 in IPWB (currently only Py2 is supported). I am personally in favor of supporting only Py3 and not Py2 (oduwsdl/ipwb#51 (comment)), but @machawk1 insists on continuing to support Py2.
Aside: if we can achieve the anticipated speedup using async/threading, it might be worth supporting only Python 3 in ipwb.
In IPWB (a web archiving replay system) we store archived HTTP headers and payloads in two separate objects for better deduplication of payloads. At the time of replay, we fetch the two objects, combine them, make necessary modifications, and return the response to the client. In our initial implementation we fetched the two objects sequentially, but there is an opportunity to perform these fetches concurrently and minimize the response time.
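The two-object replay flow described above can be sketched with a persistent thread pool that requests the header and payload in parallel and combines them once both futures resolve. The fetch function and record layout here are hypothetical stand-ins, not ipwb's actual API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One pool shared across requests, created at startup
pool = ThreadPoolExecutor(max_workers=8)

def fetch_object(digest):
    # Hypothetical stand-in for fetching one object from IPFS; sleeps to simulate I/O
    time.sleep(0.05)
    return "data-for-" + digest

def build_response(header_digest, payload_digest):
    # Submit both fetches at once so the two I/O waits overlap...
    header_future = pool.submit(fetch_object, header_digest)
    payload_future = pool.submit(fetch_object, payload_digest)
    header = header_future.result()
    payload = payload_future.result()
    # ...then combine header and payload into one HTTP-style response
    return header + "\r\n\r\n" + payload

print(build_response("h1", "p1"))  # data-for-h1 ... data-for-p1
```

With overlapping fetches, the response time is bounded by the slower of the two requests rather than their sum.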
Recently, we experimented with Python threads (oduwsdl/ipwb#425), but the response time became worse than before. A simple ApacheBench test took over 17 seconds with threaded fetches while the same took less than 8 seconds with sequential code.
To ensure that we can reproduce the issue in an isolated environment, I have created a separate repository, IPFS API Concurrency Test, to test this behavior. While #59 suggests that the API is not async-friendly, I am not sure how related it is to this issue. I would assume that with concurrent threads it should take no longer to finish, if not less: the worst-case time should still be bounded by the sequential access time. However, my profiling results show the opposite. The more concurrent requests are made, the slower it becomes.