Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use asyncio synchronization instead of anyio #922

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MarkusSintonen
Copy link
Contributor

@MarkusSintonen MarkusSintonen commented Jun 10, 2024

Summary

First PR in series of optimizations to improve performance of httpcore and even reach performance levels of aiohttp.
Related discussion encode/httpx#3215 (comment)

Previously:
old1

With PR:
new1

Request latency is more stable and the overall duration of the benchmark improves by 3.5x.

The difference diminishes when server latency increases over 100ms or so.

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@tomchristie
Copy link
Member

@MarkusSintonen Thanks so much for your work on this. Could we isolate the benchmarking into a separate PR?

@MarkusSintonen MarkusSintonen force-pushed the asyncio-synchronization branch from 9064a6c to 0239c8c Compare June 10, 2024 13:22
# await AsyncShieldCancellation.shield(cleanup)

@staticmethod
async def shield(shielded: Callable[[], Coroutine[Any, Any, None]]) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to convert into shielding of a coroutine as there is no shielding context manager variant in asyncio

@MarkusSintonen
Copy link
Contributor Author

@MarkusSintonen Thanks so much for your work on this. Could we isolate the benchmarking into a separate PR?

Added here #923

httpcore/__init__.py Outdated Show resolved Hide resolved
@MarkusSintonen MarkusSintonen force-pushed the asyncio-synchronization branch 3 times, most recently from f4968e1 to 6ef797e Compare June 10, 2024 15:23
Copy link
Contributor

@T-256 T-256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great,
Also, is this mean we no more need anyio dependency for httpx[asyncio] installation on Python>=3.11 ?

httpcore/_synchronization.py Outdated Show resolved Hide resolved
httpcore/_synchronization.py Outdated Show resolved Hide resolved
Comment on lines 110 to 95
if (lock := self._lock) is None:
lock = self.setup()
await lock.acquire()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer setup method returns None at here.

Suggested change
if (lock := self._lock) is None:
lock = self.setup()
await lock.acquire()
if self._lock is None:
self.setup()
await self._lock.acquire()

Copy link
Contributor Author

@MarkusSintonen MarkusSintonen Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, I used that although all the mypy ignores this requires looks a bit uglier to me. So from types perspective the original was somewhat "better"

httpcore/_synchronization.py Outdated Show resolved Hide resolved
httpcore/_synchronization.py Outdated Show resolved Hide resolved
httpcore/_synchronization.py Outdated Show resolved Hide resolved
httpcore/_synchronization.py Outdated Show resolved Hide resolved
httpcore/_synchronization.py Outdated Show resolved Hide resolved
httpcore/_synchronization.py Outdated Show resolved Hide resolved
@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Jun 11, 2024

Great, Also, is this mean we no more need anyio dependency for httpx[asyncio] installation on Python>=3.11 ?

Yes thats right, once we also bring back the native asyncio based network to _backends.

Actually I fixed the code now so that anyio is not required either in Python<3.11

Copy link
Contributor

@T-256 T-256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank You @MarkusSintonen for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I fixed the code now so that anyio is not required either in Python<3.11

Then, you could also cleanup unused variable anyio on this file.

Copy link
Contributor Author

@MarkusSintonen MarkusSintonen Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its still needed there for the check in the file as anyio needs to be present due to the networking backend. Later we can probably drop it

@T-256
Copy link
Contributor

T-256 commented Jun 11, 2024

Yes thats right, once we also bring back the native asyncio based network to _backends.

I'll wait you to create a PR from your commit to track and review it separately.

Comment on lines +302 to +303
if not closing:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this change to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its an empty list then we dont need to go through await async_cancel_shield below, right? If that was the question

try:
await asyncio.shield(inner_task)
break
except asyncio.CancelledError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not just using a standard shield here?

inner_task = asyncio.create_task(shielded())
await asyncio.shield(inner_task)

Copy link
Contributor Author

@MarkusSintonen MarkusSintonen Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its to wait for the inner_task to finish (unless something is repeatedly trying to cancel this). But if we dont need to wait for the given task to finish then it is not needed and we can use just the simple shield (ie it runs in background to completion when there is a cancellation)

@tomchristie
Copy link
Member

This PR highlights to me that the approach to shielding here is wrong.
We're using cancellation shielding solely to deal with shielding connection closes.
Rather than use shielding what we actually ought to be doing is...

  • Unconditionally marking the connection state as closed. (Doesn't require I/O, can't be cancelled.)

And then...

  • Performing the network stream close. (If we perform a cancellation operation then it's reasonable that SSL connections might not be "gracefully" closed.)

This is a bit fiddly to get right, needs some careful attention to detail.

@MarkusSintonen
Copy link
Contributor Author

@tomchristie do you mean the previous contextmanager based shielding also had issues? Requiring additional reordering of closing handlers? If it didnt have issues then we could still use the anyio CancelScope just for shielding? But it means the anyio dependency cant be fully removed.

@MarkusSintonen
Copy link
Contributor Author

Or yeah did you mean the non-IO related state management should be extracted out from the async closing when shielding with asyncio.shield. That's probably right

@tomchristie
Copy link
Member

@MarkusSintonen I've attempted to answer my intent here through implementation. See #927. (Currently draft, and approach would need to be applied also to the HTTP/2 implementation, and the closing behaviour in the connection pool)

@MarkusSintonen
Copy link
Contributor Author

@MarkusSintonen I've attempted to answer my intent here through implementation. See #927. (Currently draft, and approach would need to be applied also to the HTTP/2 implementation, and the closing behaviour in the connection pool)

Big thanks! Now I understand, so yeah it was indeed about "separating" the sync state management and the async IO part. Ie the streams aclose is the cancellable part and the state management being separate from async flow can not be cancelled.

@MarkusSintonen MarkusSintonen force-pushed the asyncio-synchronization branch from 333c20c to eb72670 Compare June 15, 2024 15:44
@MarkusSintonen MarkusSintonen force-pushed the asyncio-synchronization branch from eb72670 to 96858c4 Compare June 15, 2024 15:45
@MarkusSintonen MarkusSintonen mentioned this pull request Jun 16, 2024
3 tasks
@agronholm
Copy link
Contributor

Would you mind testing my latest PR branch (agronholm/anyio#761)? In my own testing, it about halved the performance overhead of anyio.Lock on asyncio, and was actually faster than asyncio's own native Lock class.

@MarkusSintonen
Copy link
Contributor Author

Would you mind testing my latest PR branch (agronholm/anyio#761)? In my own testing, it about halved the performance overhead of anyio.Lock on asyncio, and was actually faster than asyncio's own native Lock class.

Nice, thank you! I can try sometime next week.

@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Jul 22, 2024

Would you mind testing my latest PR branch (agronholm/anyio#761)? In my own testing, it about halved the performance overhead of anyio.Lock on asyncio, and was actually faster than asyncio's own native Lock class.

@agronholm I run the tests and it is indeed better with the fix. But seems there is still some overhead with the implementation when comparing to asyncio:

Baseline (Anyio 4.4.0):
baseline

Anyio PR:
branch

Asyncio:
asyncio

(Above tests were run with httpcore optimizations from other pending PRs to not count in the issues from httpcore itself)

@agronholm
Copy link
Contributor

Would you mind testing my latest PR branch (agronholm/anyio#761)? In my own testing, it about halved the performance overhead of anyio.Lock on asyncio, and was actually faster than asyncio's own native Lock class.

@agronholm I run the tests and it is indeed better with the fix. But seems there is still some overhead with the implementation when comparing to asyncio:

Baseline (Anyio 4.4.0): baseline

Anyio PR: branch

Asyncio: asyncio

(Above tests were run with httpcore optimizations from other pending PRs to not count in the issues from httpcore itself)

Have you identified where this overhead is coming from? Can we eliminate synchronization as the cause after that PR?

@agronholm
Copy link
Contributor

If you run the benchmarks after replacing the locks and semaphores in the asyncio implementation with the corresponding AnyIO primitives, that should shed some light on the issue.

@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Jul 23, 2024

If you run the benchmarks after replacing the locks and semaphores in the asyncio implementation with the corresponding AnyIO primitives, that should shed some light on the issue.

It seems to be the anyio.Lock. When I just replace it with asyncio.Lock the issue disappears (with all the rest still using anyio variants). (HTTP2 also utilizes the Semaphore but I haven't tested that here.)

@agronholm
Copy link
Contributor

In the AnyIO variant, the acquisition of an unowned lock would be slower because this operation contains a yield point unlike in the asyncio variant. Would you mind testing against AnyIO lock after commenting out the checkpoint calls around lines 1690-1697 in backends/_asyncio.py?

@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Jul 23, 2024

In the AnyIO variant, the acquisition of an unowned lock would be slower because this operation contains a yield point unlike in the asyncio variant. Would you mind testing against AnyIO lock after commenting out the checkpoint calls around lines 1690-1697 in backends/_asyncio.py?

Indeed, it is happening because of the await AsyncIOBackend.cancel_shielded_checkpoint() (at line 1694)

@agronholm
Copy link
Contributor

Yeah, so the problem here is that this yield point is needed to maintain the same semantics as with Trio. This is Trio's Lock.acquire():

    @enable_ki_protection
    async def acquire(self) -> None:
        """Acquire the lock, blocking if necessary."""
        await trio.lowlevel.checkpoint_if_cancelled()
        try:
            self.acquire_nowait()
        except trio.WouldBlock:
            # NOTE: it's important that the contended acquire path is just
            # "_lot.park()", because that's how Condition.wait() acquires the
            # lock as well.
            await self._lot.park()
        else:
            await trio.lowlevel.cancel_shielded_checkpoint()

If there wasn't a yield point, then this could become a busy-wait loop:

async def loop(lock):
    while True:
        async with lock:
            print("I'm holding the lock")

@MarkusSintonen
Copy link
Contributor Author

If there wasn't a yield point, then this could become a busy-wait loop:

So this is needed for the CancelScope logic in AnyIO? What if nothing is using the CancelScope in the caller stack? Is the great penalty avoidable if user is not actually using CancelScopes?

@agronholm
Copy link
Contributor

No, the checkpoint_if_cancelled() call handles CancelScope support. The cancel_shielded_checkpoint() there prevents the busy-wait loop which will happen if you run the above loop with asyncio primitives, as they weren't designed with the right principles.

@agronholm
Copy link
Contributor

To summarize, asyncio's incorrect design makes it faster here.

@MarkusSintonen
Copy link
Contributor Author

MarkusSintonen commented Jul 23, 2024

Ok I see thanks for the explanation! So in essence anyio.Lock can be used to work around the issue (do Python core devs know about it? :D). httpcore doesnt utilize the _synchronization.AsyncLock in such ways so it seems its safe to replace the anyio.Lock with asyncio.Lock to not pay the penalty.

@agronholm
Copy link
Contributor

agronholm commented Jul 23, 2024

Ok I see thanks for the explanation! So in essence anyio.Lock can be used to work around the issue (do Python core devs know about it? :D). httpcore doesnt utilize the _synchronization.AsyncLock in such ways so it seems its safe to replace the anyio.Lock with asyncio.Lock to not pay the penalty.

Only in the asyncio backend. In the AnyIO backend, you would need to use AnyIO primitives, lest you render the whole backend pointless.

@agronholm
Copy link
Contributor

So in essence anyio.Lock can be used to work around the issue (do Python core devs know about it? :D).

Asyncio has so many design issues (the ill-fated uncancellation mechanism being that latest example) that at this point it's beyond saving. Even GvR (who's the original author of asyncio) told me to my face that he'd prefer people start using something else.

@rattrayalex
Copy link

What is the status of this PR? Should it be reviewed+merged, closed, or something else?

@agronholm
Copy link
Contributor

What is the status of this PR? Should it be reviewed+merged, closed, or something else?

Perhaps we should see how agronholm/anyio#761 affects the real-world performance before jumping the gun?

@MarkusSintonen
Copy link
Contributor Author

What is the status of this PR? Should it be reviewed+merged, closed, or something else?

Perhaps we should see how agronholm/anyio#761 affects the real-world performance before jumping the gun?

Also posting here, I rerun the benchmarks with above fix from AnyIO using fast_acquire=True for Lock (and Semaphore) in httpcore. The results look pretty good with AnyIO fix and fast_acquire=True usage.

First results with all optimization PRs applied (to not count other httpcore issues in).

With fast_acquire=False:
Näyttökuva 2024-09-02 kello 20 15 55

With fast_acquire=True:
Näyttökuva 2024-09-02 kello 20 16 46

Lastly results without all optimization PRs applied (how it would look against httpcore master, with its other issues).

With fast_acquire=False:
Näyttökuva 2024-09-02 kello 20 22 11

With fast_acquire=True:
Näyttökuva 2024-09-02 kello 20 20 50

@MarkusSintonen
Copy link
Contributor Author

Should we keep AnyIO but instead use the fast_acquire=True mode in _synchronization.py? Cc @tomchristie

@agronholm
Copy link
Contributor

FYI, AnyIO 4.5.0 is out now with the fast_acquire option.

@tomchristie
Copy link
Member

Should we keep AnyIO but instead use the fast_acquire=True mode in _synchronization.py?

Exercise caution. What does the pull request for that look like? That'll be a useful highlighter.

@MarkusSintonen
Copy link
Contributor Author

Exercise caution. What does the pull request for that look like? That'll be a useful highlighter.

Here is the PR https://github.com/encode/httpcore/pull/953/files

@agronholm
Copy link
Contributor

I see a slight issue there, not among the changes but still: AnyIO 4.5.0 requires trio >= 0.26.1 in its trio extra, but those requirements limit it to < 0.26.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants