Add HTTP3 support #829

Open
wants to merge 15 commits into
base: master

karpetrosyan
Member

@karpetrosyan karpetrosyan commented Oct 18, 2023

This pull request tries to add HTTP/3 support.

As we know, the HTTP/2 and HTTP/3 protocols are very similar, except for the transport they use: HTTP/2 runs over TCP, while HTTP/3 runs over QUIC, which is built on UDP.
This PR simply follows the steps described below.

  • Add the connect_udp method to "httpcore._backends.base.NetworkBackend".
  • Implement connect_udp only for the synchronous backend (only for now).
  • Add http3 extra into pyproject.toml
  • Create httpcore/_http3.py file
  • Implement HTTP/3 in that file, keeping the logic and flow as similar as possible to what we use in _http2.py.
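
As a rough illustration of the first two steps, a synchronous connect_udp could look something like the sketch below. This is not the code from this PR: the SyncUDPStream class, the function signature, and the read/write method names are assumptions made for the example. UDP has no handshake, so "connecting" here only resolves the peer address and remembers it.

```python
import socket
from typing import Optional, Tuple


class SyncUDPStream:
    """Hypothetical datagram stream; a stand-in, not the PR's actual class."""

    def __init__(self, sock: socket.socket, addr: Tuple) -> None:
        self._sock = sock
        self._addr = addr  # remote address that datagrams are sent to

    def read(self, max_bytes: int, timeout: Optional[float] = None) -> bytes:
        self._sock.settimeout(timeout)
        data, _ = self._sock.recvfrom(max_bytes)
        return data

    def write(self, buffer: bytes, timeout: Optional[float] = None) -> None:
        self._sock.settimeout(timeout)
        self._sock.sendto(buffer, self._addr)

    def close(self) -> None:
        self._sock.close()


def connect_udp(host: str, port: int) -> SyncUDPStream:
    # Resolve the peer and create a datagram socket of the matching family.
    family, type_, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_DGRAM
    )[0]
    sock = socket.socket(family, type_, proto)
    return SyncUDPStream(sock, addr)
```

The QUIC handshake itself would then happen on top of this stream, driven by the HTTP/3 connection rather than by the backend.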

To support the HTTP/3 protocol, we need the aioquic package, which is a well-tested and well-designed implementation for the HTTP/3 and QUIC protocols.

For more details, see the issue in HTTPX, where the author of aioquic provides basic HTTP/3 integration for httpx.

Here is a very basic example of how you can use HTTP/3 with httpcore.

from httpcore import Origin
from httpcore import Request
from httpcore import HTTP3Connection
from httpcore import SyncBackend

host = "www.youtube.com"
port = 443

stream = SyncBackend().connect_udp(host=host, port=port)
conn = HTTP3Connection(
    origin=Origin(b"https", host.encode(), port), stream=stream
)

request = Request(
    method=b"GET",
    url=f"https://{host}",
    headers=[("host", host)],
    extensions={"timeout": {"read": 5, "write": 5}},
)


response = conn.handle_request(request=request)
print(response)   # <Response [200]>
print(response.extensions["http_version"])  # b'HTTP/3'
print(response.read())  # ...

Or with the high-level API:

import httpcore

pool = httpcore.ConnectionPool(http1=False, http3=True)

response = pool.request(
    "GET", "https://www.youtube.com", extensions={"timeout": {"read": 5, "write": 5}}
)

print(response)  # <Response [200]>
print(response.extensions["http_version"])  # b'HTTP/3'
print(response.read())  # ...

@karpetrosyan karpetrosyan marked this pull request as draft October 18, 2023 14:10
@karpetrosyan karpetrosyan added the enhancement New feature or request label Oct 18, 2023
@karpetrosyan karpetrosyan force-pushed the support-http3 branch 2 times, most recently from e2af16a to 8fc5ff3 Compare October 19, 2023 08:32
@karpetrosyan
Member Author

This is how I see the http3 implementation in httpcore.

The goals here are:

  • Make the event-based http3 implementation very similar to the http2 implementation, so that maintenance does not become too complicated.
  • Fully cover the http3 implementation with tests, using event-based mocking: instead of mocking the network stream, we mock the underlying IO-less http3 connection, which gives us control over all the http3 events.
  • Add an HTTP3 section to the documentation, like the HTTP2 section.
  • Last, but most important, deliver HTTP3 support to HTTPX!

I want to keep this pull request as simple as possible, but I'm also thinking about including alt-svc support as @tomchristie suggested in encode/httpx#275 in this pull request.

That is:

It's looking to me like httpx should never end up making an HTTP/3 request on an initial outgoing request, because either:

  • We see the upgrade in an Alt-Svc response header, in which case we've already sent the request and started receiving the response, so there's not much point in tearing the connection down.
  • We might potentially see an ALTSVC HTTP/2 frame, but we don't want to block on waiting for that before starting to send a request (since it may not exist).
    So I think the best we'll be able to do is storing altsvc information whenever it comes through, and potentially making subsequent requests over HTTP/3 using that information.

@karpetrosyan karpetrosyan marked this pull request as ready for review October 20, 2023 10:38
@karpetrosyan karpetrosyan requested review from tomchristie and a team October 20, 2023 10:39
@karpetrosyan karpetrosyan self-assigned this Oct 20, 2023
@karpetrosyan
Member Author

Any thoughts or ideas, @encode/maintainers?

@zanieb zanieb self-requested a review November 2, 2023 14:12
@tomchristie
Member

Thanks @karpetrosyan!

Any thoughts or ideas

Here's my initial high level thoughts...

  • Review of the current landscape... Which sites currently use HTTP/3 and which browsers can you demonstrate using it? How can someone else observe this?
  • What's the use-case for HTTP/3 in httpx - are there conditions under which it's beneficial to the user?
  • How do we intend to maintain the HTTP/3 work alongside the existing HTTP/2 work with a minimal maintenance load?
  • What discovery mechanism are browsers currently using for HTTP/3 detection? Is detection over DNS records currently deployed and used?

@karpetrosyan
Member Author

Thank you for reviewing, Tom.

Excellent questions; here are my thoughts on that.

Review of the current landscape... Which sites currently use HTTP/3 and which browsers can you demonstrate using it? How can someone else observe this?

As an example, here are a few large corporations that support HTTP/3:

Using this script, you can already test it with httpcore.

import httpcore
import logging

logging.basicConfig(level=1)
pool = httpcore.ConnectionPool(http1=False, http3=True)

websites = [
    "https://google.com",
    "https://youtube.com",
    "https://instagram.com",
    "https://spotify.com",
    "https://cloudflare.com",
]


for website in websites:
    response = pool.request("GET", website, extensions={"timeout": {"connect": 2}})
    print(response)

which browsers can you demonstrate using it

According to Wikipedia, all major browsers support the HTTP/3 protocol.

HTTP/3 is (at least partially) supported by 94% of tracked web browser installations (96% of "tracked mobile" and 94% of "tracked desktop" web browsers),[7] and 26% of the top 10 million websites.[8] It has been supported by Chromium (and derived projects including Google Chrome, Microsoft Edge, Samsung Internet, and Opera)[9] since April 2020 and by Mozilla Firefox since May 2021.[7][10] Safari 14 implemented the protocol but it remains disabled by default.[11]

You can also use this website to determine whether the request was sent over HTTP/3 or HTTP/1.1, and then open the dev tool to view the schema and headers that were sent over the network.

To learn more about HTTP/3 state in 2023, visit https://blog.cloudflare.com/http3-usage-one-year-on/.

What's the use-case for HTTP/3 in httpx - are there conditions under which it's beneficial to the user?

Here are some of the reasons why we should add HTTP/3 support.

  • It's very simple to support with our existing httpcore design. As you can see, I implemented HTTP/3 by adding a single _http3.py file and making only minor changes to other files.
  • HTTPX becomes a more appealing library for newcomers. They can debug websites with HTTP/3, or they can experiment with it for fun.
  • It is a more recent and improved version of HTTP/2. The usability of HTTP/3 was also discussed in the relevant issue of the httpx project, which can be found here.

How do we intend to maintain the HTTP/3 work alongside the existing HTTP/2 work with a minimal maintenance load?

Yes, this is an important question.

One of the goals was to implement HTTP/3 with as few differences as possible from HTTP/2.
I even copied the _http2.py file and only made http3-related changes, so the _http2.py and _http3.py files are 95% identical.

We can think of _http2.py and _http3.py as the same file, except that _http3.py uses aioquic instead of the h2 library.
That keeps the maintenance load close to what we already have.

What discovery mechanism are browsers currently using for HTTP/3 detection? Is detection over DNS records currently deployed and used?

This question has already been discussed in encode/httpx#275.
We can simply watch for a special response header (Alt-Svc) and then use http3 in subsequent requests once we know the server supports HTTP/3.

There is also a section in the RFC that describes the connection setup process, so you can find more detailed information there.

@seidnerj

Would love to see HTTP3 support in httpx

Contributor

@T-256 T-256 left a comment

IMO, when multiple HTTP versions are enabled, http3 could take precedence over the other two, since it uses UDP, while still falling back to TCP if the HTTP/3 connection fails.

httpcore/_async/connection.py Show resolved Hide resolved
httpcore/_async/connection.py Show resolved Hide resolved
@T-256
Contributor

T-256 commented Dec 17, 2023

An alternative example for this change:

# Assume a TCP connection is needed unless the HTTP/3 attempt succeeds.
can_connect_tcp = True

if self._http3:
    try:
        from .http3 import AsyncHTTP3Connection

        stream = await self._connect_http3(request)
        self._connection = AsyncHTTP3Connection(
            origin=self._origin,
            stream=stream,
            keepalive_expiry=self._keepalive_expiry,
        )

        can_connect_tcp = False
    except Exception:
        # Only propagate the failure when there is no TCP protocol to fall back to.
        if not (self._http1 or self._http2):
            raise

if can_connect_tcp:
    stream = await self._connect(request)

    ssl_object = stream.get_extra_info("ssl_object")
    http2_negotiated = (
        ssl_object is not None
        and ssl_object.selected_alpn_protocol() == "h2"
    )
    if http2_negotiated or (self._http2 and not self._http1):
        from .http2 import AsyncHTTP2Connection

        self._connection = AsyncHTTP2Connection(
            origin=self._origin,
            stream=stream,
            keepalive_expiry=self._keepalive_expiry,
        )
    else:
        self._connection = AsyncHTTP11Connection(
            origin=self._origin,
            stream=stream,
            keepalive_expiry=self._keepalive_expiry,
        )

@karpetrosyan
Member Author

Ugh, let's consider next steps here.
I believe we should clarify some points, particularly how http3 negotiation should be implemented.


HTTP/3 Negotiation

There are at least three approaches we could take to solve this problem:

  • Alt-Svc header
  • HTTP/3 first, then HTTP/1 and HTTP/2.
  • HTTPS DNS records

Let's go over each one and provide some useful links so you can dig deeper.

Alt-Svc

Alt-Svc is an HTTP header that advertises alternative services: it tells the client that the same origin is also reachable on some port using some protocol, and the client can switch to that service if it prefers that protocol.

HTTP servers, for example, frequently use this Alt-Svc header to inform browsers that they support the HTTP/3 protocol.

Alt-Svc: h3-25=":443";

In the world of HTTPX, we could potentially store information about the origin's alternative services and send subsequent requests based on supported protocols.

Also, the server may provide additional information with the alternative service, such as an expiration time, which we must respect and avoid using stale information about the alternative service.

Here is an example of an Alt-Svc header that could be interpreted as "I support HTTP/3 protocol on port 443, but you should not rely on this information after an hour."

Alt-Svc: h3-25=":443"; ma=3600

This also complicates the use of this approach, so we should account for it.

See also: https://http3-explained.haxx.se/en/h3/h3-altsvc
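
To make the expiry handling more concrete, here is a minimal sketch of parsing an Alt-Svc value and checking freshness. It is a simplification written for this discussion, not httpx or httpcore code: the AltService dataclass and both function names are made up, the parser ignores quoting edge cases, and the 24 hour default for a missing ma parameter follows RFC 7838.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_MAX_AGE = 24 * 60 * 60  # RFC 7838 default when "ma" is absent


@dataclass
class AltService:
    protocol: str      # e.g. "h3" or "h3-25"
    host: str          # empty string means "same host as the origin"
    port: int
    expires_at: float  # absolute time after which the entry is stale


def parse_alt_svc(value: str, now: float) -> List[AltService]:
    """Parse a simple Alt-Svc header value (no escaped or quoted commas)."""
    if value.strip() == "clear":
        return []
    services = []
    for entry in value.split(","):
        parts = [part.strip() for part in entry.split(";") if part.strip()]
        if not parts:
            continue
        protocol, _, authority = parts[0].partition("=")
        host, _, port = authority.strip('"').rpartition(":")
        max_age = DEFAULT_MAX_AGE
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip() == "ma":
                max_age = int(raw.strip().strip('"'))
        services.append(AltService(protocol.strip(), host, int(port), now + max_age))
    return services


def is_fresh(service: AltService, now: float) -> bool:
    # Stale entries must not be used for new connections.
    return now < service.expires_at
```

A client would store the parsed entries per origin and consult is_fresh before choosing HTTP/3 for a subsequent request.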

HTTP/3 first, then HTTP/1 and HTTP/2

The idea behind this approach is to always try HTTP/3 and, if that fails, fall back to HTTP/1 or HTTP/2 over TCP.

It appears that browsers do not use this approach, not least because it complicates the connection process and makes sending the request even slower when the client has to fall back to TCP after a failed UDP attempt.

In HTTPX, we can try HTTP/3 first if all other protocols are disabled, since in that case the client has indicated that it only wants to use HTTP/3 connections.

You can already send such requests by specifying that you only want to use the HTTP/3 protocol, as in:

pool = httpcore.ConnectionPool(http1=False, http2=False, http3=True)
response = pool.request("GET", "https://cloudflare.com")
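
The policy described in this section can be summarized in a few lines. This is only a sketch of the decision, with a made-up function name and a made-up h3_advertised flag standing in for "we have fresh Alt-Svc (or similar) information for this origin"; it is not how the PR's connection code is structured.

```python
def choose_transport(
    http1: bool, http2: bool, http3: bool, h3_advertised: bool = False
) -> str:
    """Return "udp" when the first attempt should be HTTP/3, else "tcp"."""
    if not (http1 or http2 or http3):
        raise ValueError("at least one HTTP version must be enabled")
    if http3 and not (http1 or http2):
        # HTTP/3 is the only enabled protocol, so there is nothing to fall back to.
        return "udp"
    if http3 and h3_advertised:
        # The server has told us it speaks HTTP/3, so trying UDP first is safe.
        return "udp"
    return "tcp"
```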

HTTPS DNS records

HTTPS RRs (HTTPS Resource Records) are relatively new DNS records that deliver configuration information and parameters for how to access a service via HTTPS.

An HTTPS RR can be used to optimize the process of connecting to a service using HTTPS.
Clients can use this record to negotiate protocols at the DNS layer rather than at the TLS layer, as we do with HTTP/1 and HTTP/2.

You can think of HTTPS records as TLS ALPN for the DNS layer.

Here are some useful resources on this subject.

https://developer.mozilla.org/en-US/docs/Glossary/HTTPS_RR
https://datatracker.ietf.org/doc/draft-ietf-dnsop-svcb-https/00/
https://blog.cloudflare.com/speeding-up-https-and-http-3-negotiation-with-dns
https://emilymstark.com/2020/10/24/strict-transport-security-vs-https-resource-records-the-showdown.html
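
To show what "ALPN at the DNS layer" means in practice, here is a small sketch that reads the alpn parameter out of the text form of an HTTPS record, such as `1 . alpn="h3,h2"`. It is deliberately simplified and only handles the presentation format; the function names are made up, and a real client would get the record from a DNS library rather than parse strings by hand.

```python
import shlex
from typing import Dict, List


def parse_svc_params(rdata: str) -> Dict[str, str]:
    """Split '<priority> <target> key=value ...' into a flat dictionary."""
    # shlex strips the optional double quotes around parameter values.
    priority, target, *params = shlex.split(rdata)
    result = {"priority": priority, "target": target}
    for param in params:
        key, _, value = param.partition("=")
        result[key] = value
    return result


def advertised_alpns(rdata: str) -> List[str]:
    """Return the ALPN protocol ids advertised by the record, if any."""
    alpn = parse_svc_params(rdata).get("alpn", "")
    return [protocol for protocol in alpn.split(",") if protocol]
```

If "h3" is in the returned list, the client knows before opening any connection that the origin claims to support HTTP/3.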

@karpetrosyan
Member Author

karpetrosyan commented Dec 22, 2023

I'll leave some key differences between our http3 and http2 implementations here to make it easier to review.

Notes

If you are unfamiliar with HTTP/3 and HTTP/2, I recommend the following resources:

In a nutshell, HTTP3 uses the QUIC protocol, which is based on UDP and implements all the necessary reliability logic itself, for example, retransmissions.

In HTTP2, we have a single connection object provided by h2: we feed it incoming data and ask it for the data to send over the wire. The HTTP3 implementation is a little more complicated, because now we have two such objects.

One is the QUIC connection itself, which handles all the data flow: what we should send and what we have received. The second is the h3 connection, which holds the HTTP/3 connection state; it receives a QUIC event and works out what to do next.

This separation helps the developer distinguish the connection layer from the HTTP layer. For example, StreamReset and ConnectionTerminated are QUIC events, while DataReceived and HeadersReceived are H3 events.

In h2, there is no such separation because everything is implemented on top of the TCP protocol, whereas now we have an additional layer where stream handling happens.

The additional layer is also the reason why there are two "connection" objects in the aioquic package: in the http2 implementation we care only about TCP and HTTP, but now we have to think about UDP, QUIC, and HTTP.

The QUIC layer also frees us from flow control window handling, connection setup, and related things, which are now handled by the QUIC protocol itself.

Changed

Events

First, here is how we import the HTTP2 and HTTP3 events:

import h2.events
from aioquic.h3 import events as h3_events
from aioquic.quic import events as quic_events

We handle five events in the HTTP2 implementation; here is how those events map to HTTP3.

  • h2.events.ResponseReceived -> h3_events.HeadersReceived
  • h2.events.DataReceived -> h3_events.DataReceived
  • h2.events.ConnectionTerminated -> quic_events.ConnectionTerminated
  • h2.events.StreamReset -> quic_events.StreamReset
  • h2.events.StreamEnded -> None (aioquic has no separate event; HeadersReceived and DataReceived carry a stream_ended flag instead)

Here is the code reference for that part in the http3.py and http2.py
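
For reviewers who want a feel for the event handling without opening both files, here is a self-contained sketch of routing the two kinds of events. The dataclasses are local stand-ins that only mirror the names and a few fields of the aioquic events, and EventRouter is a made-up class, so none of this is the PR's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Tuple


@dataclass
class HeadersReceived:  # stand-in for an H3-level event
    stream_id: int
    headers: List[Tuple[bytes, bytes]]
    stream_ended: bool = False


@dataclass
class DataReceived:  # stand-in for an H3-level event
    stream_id: int
    data: bytes
    stream_ended: bool = False


@dataclass
class StreamReset:  # stand-in for a QUIC-level event
    stream_id: int
    error_code: int


@dataclass
class ConnectionTerminated:  # stand-in for a QUIC-level event
    error_code: int
    reason_phrase: str = ""


class EventRouter:
    """Queue stream events per stream and remember connection termination."""

    def __init__(self) -> None:
        self.stream_events: DefaultDict[int, List[object]] = defaultdict(list)
        self.connection_terminated: Optional[ConnectionTerminated] = None

    def feed(self, event: object) -> None:
        if isinstance(event, ConnectionTerminated):
            # Connection-wide: affects every stream, so it is stored separately.
            self.connection_terminated = event
        elif isinstance(event, (HeadersReceived, DataReceived, StreamReset)):
            self.stream_events[event.stream_id].append(event)

    def stream_finished(self, stream_id: int) -> bool:
        # There is no StreamEnded event; the end is signalled by a flag or a reset.
        return any(
            isinstance(event, StreamReset) or getattr(event, "stream_ended", False)
            for event in self.stream_events.get(stream_id, [])
        )
```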

Removed

Method _send_connection_init.

Because aioquic handles all connection establishment, this method is unnecessary in the http3 implementation.

Method _receive_remote_settings_change.

In the http3 implementation, this is handled by aioquic.

Flow control window

In http3.py, we use QUIC, which handles all flow control for the entire abstract stream, so we do not handle this in our application.

@rthalley

rthalley commented Jan 2, 2024

Re DNS HTTPS records, dnspython has good support for them, but dnspython also uses httpx (and thus httpcore) for DNS-over-HTTPS, so I'm not completely sure how to deal with the chicken-and-egg mutual module dependency issues, but I'm happy to work with the httpcore team.

@mborsetti

Where is HTTP/3 support on the release schedule?

},
)
except BaseException as exc: # noqa: PIE786
with AsyncShieldCancellation():
Member

This might need changing wrt #927

@karpetrosyan
Member Author

I’m not sure why the pipeline failed, but the implementation works. I would like to continue working on this, and we need to cover the implementation with tests. What do you think, @encode/maintainers? Do we have any blockers? I would also appreciate a review from @jlaine, if possible.

@karpetrosyan
Member Author

I can already see a 10-20% speed boost on my machine compared to our HTTP/2 implementation as well.

@graingert
Member

graingert commented Sep 19, 2024

I'm a bit concerned about how the pyopenssl context is configured. I think this would break httpx.get(..., verify=cafile)

Generally SSL in httpcore is configured by passing in an SSLContext, but this PR seems to bypass that and pass certifi.where()

@graingert
Member

I think the way to do it is to move httpx.create_ssl_context() into httpcore, then add an http3 kwarg that makes it return a dataclass with both an ssl context and a pyopenssl context as private fields.

@graingert
Member

graingert commented Sep 23, 2024

I've been thinking about this for a while, and using two different contexts for the same httpx session is cryptographically fishy (and probably slow, since it loads the cert store twice). I've had a quick look at the anyio, trio and sync ssl streams, and I'm happy to make a PR so that httpcore supports either ssl or pyopenssl contexts; then we can require a pyopenssl context for http3=True.

@graingert
Member

I misunderstood how TLS in aioquic works: I saw the dependency on pyopenssl and assumed it was used for TLS. However, aioquic uses its own TLS 1.3 implementation, which requires cadata, cafile, capath, etc. to be passed in, so we will need to create our own context that wraps the stdlib ssl context and the parameters aioquic requires.
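
One possible shape for such a wrapper is sketched below. The TLSConfig name, its fields, and the create_tls_config helper are all hypothetical, intended only to show the idea of building the stdlib context and remembering the inputs that produced it so the same trust settings can later be handed to the QUIC side.

```python
import ssl
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TLSConfig:
    """Hypothetical wrapper: a stdlib context plus the CA inputs behind it."""

    ssl_context: ssl.SSLContext
    cafile: Optional[str] = None
    capath: Optional[str] = None
    cadata: Optional[Union[str, bytes]] = None


def create_tls_config(
    cafile: Optional[str] = None,
    capath: Optional[str] = None,
    cadata: Optional[Union[str, bytes]] = None,
) -> TLSConfig:
    # The stdlib context is used for HTTP/1.1 and HTTP/2 over TCP.
    ssl_context = ssl.create_default_context(
        cafile=cafile, capath=capath, cadata=cadata
    )
    # The raw inputs are kept because they cannot be read back out of an
    # SSLContext; an HTTP/3 connection would pass them on to its QUIC config.
    return TLSConfig(ssl_context, cafile=cafile, capath=capath, cadata=cadata)
```

With this shape the certificate sources are specified once, which would keep a verify=cafile style option consistent across TCP and UDP connections.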

@tomchristie
Member

Pointers?


async def _do_handshake(self, request: Request) -> None:
    assert hasattr(self._network_stream, "_addr")
    self._quic_conn.connect(addr=self._network_stream._addr, now=monotonic())
Member

This should use the event loop time, so that trio can use its autojump clock.
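
A minimal sketch of what that could mean, using only asyncio from the standard library. The current_time helper is a made-up name; under trio or anyio the equivalent call would be trio.current_time() or anyio.current_time(), which is what lets a test clock control the value.

```python
import asyncio
import time


def current_time() -> float:
    """Return the running event loop's clock, or time.monotonic() outside a loop."""
    try:
        return asyncio.get_running_loop().time()
    except RuntimeError:
        # No event loop is running (for example in the sync implementation).
        return time.monotonic()
```

The handshake would then pass now=current_time() instead of calling monotonic() directly.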


return events

async def _write_outgoing_data(self, request: Request) -> None:
Member

@graingert graingert Sep 26, 2024

This needs to call quic.get_timer() and schedule a timer so that quic can queue lost datagrams
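
Roughly, the timer handling could look like the sketch below. It assumes an object with get_timer() and handle_timer(now=...) methods, which is my understanding of the aioquic connection API; the QuicTimer protocol and the wait_and_fire_timer helper are made up for the example.

```python
import asyncio
from typing import Callable, Optional, Protocol


class QuicTimer(Protocol):
    """The two timer methods this sketch relies on."""

    def get_timer(self) -> Optional[float]: ...

    def handle_timer(self, now: float) -> None: ...


async def wait_and_fire_timer(quic: QuicTimer, clock: Callable[[], float]) -> bool:
    """Sleep until the connection's next deadline, then let it handle the timer.

    Returns False when no timer is armed, True after the timer has fired.
    """
    deadline = quic.get_timer()
    if deadline is None:
        return False
    await asyncio.sleep(max(0.0, deadline - clock()))
    quic.handle_timer(now=clock())
    return True
```

After the timer fires, the caller still has to flush whatever datagrams the connection now wants to send, since that is where retransmissions show up.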

): # pragma: no cover
from .http3 import AsyncHTTP3Connection

stream = await self._connect_http3(request)
Member

This should be doing happy eyeballs

raise self._read_exception # pragma: nocover

try:
data = await self._network_stream.read(self.READ_NUM_BYTES, timeout)
Member

I think you need a background task that's constantly reading any datagrams from the server as they can be sent unsolicited
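
Something along these lines, as a sketch with asyncio only. The read_datagrams_forever name and both callables are placeholders, and treating an empty read as "stream closed" is a convention of this example, not something UDP or the PR defines.

```python
import asyncio
from typing import Awaitable, Callable


async def read_datagrams_forever(
    read: Callable[[], Awaitable[bytes]],
    on_datagram: Callable[[bytes], None],
) -> None:
    """Keep reading datagrams and hand each one to the connection."""
    while True:
        data = await read()
        if not data:
            # Sentinel used by this sketch to mean the stream was closed.
            break
        on_datagram(data)
```

It would be started once per connection with asyncio.create_task(...) (or in a task group under anyio/trio) and cancelled when the connection closes, so incoming datagrams are processed even when no request is currently waiting on a read.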

7 participants