
UDS Support #138

Closed
parity3 opened this issue Aug 6, 2020 · 15 comments · Fixed by #139
Labels
enhancement New feature or request

Comments

@parity3

parity3 commented Aug 6, 2020

This httpx changelog PR, as part of a movement to sunset usage of urllib3 in favor of httpcore, mentions UDS support as a temporary casualty of the process, because "maintaining that would have meant having to push back our work towards a 1.0 release".

Regarding putting this support back into httpcore, there has been recent work done in this proof of concept (thanks @florimondmanca ) that suggests that including UDS support inside the library would not be an overwhelming task.

I personally use a lot of inter-service communication via unix sockets so this would be (opinion) a welcome addition as a first-class citizen of the new release.

I am brand new to this library; I have only used pre-httpcore httpx. After upgrading past 0.12 of httpx, I was surprised that my code could no longer use the uds= keyword when creating clients, enough to blow up on the trio gitter (apologies). I now understand that keeping to a release schedule and making everyone happy is an extremely hard task!

@tomchristie suggested this issue be created to start a discussion here. Go!

@tomchristie
Member

I'm not sure if we do or don't want this built in, but something that'd help it along immensely would be a PR adding support for it back into httpcore, based on the initial work in encode/httpx#511 - though it's less clear to me whether integration with our connection pooling etc. really makes sense in this context.

Something else that'd help is a real end-user motivation for UDS support. Why might an end user want to plug the client into a UDS? What behaviour does it allow, why is it a useful thing to do, and how do we document the use case to our users?

@tomchristie
Member

It's also perfectly possible that we'd prefer to instead have UDS support maintained as a third party library. Given that we've gone to the trouble of a really nicely spec'ed Transport API, that's a very doable thing, and could well be a better approach.

One other thing worth evaluating: what support does each of requests, urllib3, and aiohttp have for UDS?

Finally, is there any sense in HTTP/2 over UDS, or is it not really useful in that context?

@florimondmanca
Member

florimondmanca commented Aug 6, 2020

though it's less clear to me whether integration with our connection pooling etc. really makes sense in this context.

That's my biggest concern, really. From my POC implementation linked by OP, all we need is an HTTP/1.1 connection plugged into a UDS socket. We don't need connection pooling, HTTP/2, proxies, none of that.

Which is why I'm currently inclined to still consider that this should be a third-party package, which potentially vendors the HTTP/1.1 connection implementation if we're not up for importing private API.

This is in line with the Requests ecosystem, since UDS support lives in requests-unixsocket, I believe (though it's implemented as an adapter, because requests doesn't have a notion of transports, which ends up being a bit clunky).

Why might an end user want to plug the client into a UDS?

I think I've shared this use case before, but for example the local Docker API is only available through a Unix Socket (the docker.sock Unix socket that's maintained by the Docker daemon). This is the most obvious example I know, and I believe it was the one we used in the previous docs as an end user example of using UDS. Pretty sure there are other such "services" that expose a local API via UDS. :)

Still not as widespread as "make an HTTP request over the network", obviously.
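To make that use case concrete: speaking HTTP over a Unix socket needs nothing exotic at the protocol level, only a different connect step. Here's a minimal stdlib-only sketch (the `UnixHTTPConnection` name is invented here for illustration; this is not httpx/httpcore code):

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that speaks HTTP/1.1 over a Unix domain socket.

    Hypothetical helper for illustration; not part of httpx/httpcore.
    """

    def __init__(self, socket_path, timeout=None):
        # The hostname is irrelevant for UDS, but http.client requires one;
        # "localhost" is a placeholder that also fills the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if self.timeout is not None:
            sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


# Usage against a local Docker daemon (assumes the socket exists):
#   conn = UnixHTTPConnection("/var/run/docker.sock")
#   conn.request("GET", "/info")
#   print(conn.getresponse().status)
```

This is essentially the trick requests-unixsocket plays, just at the `http.client` layer instead of as a requests adapter.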

@florimondmanca
Member

One argument in favor of having it in core, though, is that:

  • It's honestly quite low maintenance.
  • There's no need to jump through hoops importing private API, since we can use it directly in core.

I think packaging up my implementation gist as a PR on the HTTPCore repo would help better evaluate these points... Anyone up for it could just go at it I guess. :-)

@tomchristie
Member

Right, well a UDSTransport class that we support directly inside httpcore might make sense.

(Rather than a UDS=... argument on the connection pool)

@florimondmanca
Member

Exactly, yes. :-)

@parity3
Author

parity3 commented Aug 6, 2020

I did not search on this before, did not realize you guys had previously talked at length on this.

The way I would implement UDS completely independently of any library is to make a context manager which spawns a subprocess that finds a free port and runs socat to tunnel the TCP traffic, as mentioned in this SO answer.
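That forwarding idea can also be approximated in pure Python, for illustration: a toy TCP-to-UDS proxy bound to an OS-assigned free port (the `tcp_to_uds_proxy` helper is hypothetical, with minimal error handling):

```python
import socket
import threading


def tcp_to_uds_proxy(uds_path, host="127.0.0.1"):
    """Listen on an OS-assigned TCP port and forward each connection to a
    Unix domain socket. A plain-Python stand-in for:

        socat TCP-LISTEN:<port>,fork UNIX-CONNECT:<uds_path>

    Returns (port, server_socket). Toy code: no logging, minimal cleanup.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for a free port
    server.listen(5)
    port = server.getsockname()[1]

    def pump(src, dst):
        # Copy bytes one way until EOF, then half-close the peer.
        try:
            while (data := src.recv(65536)):
                dst.sendall(data)
        except OSError:
            pass
        finally:
            try:
                dst.shutdown(socket.SHUT_WR)
            except OSError:
                pass

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            upstream.connect(uds_path)
            threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pump, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return port, server
```

With such a tunnel in place, any plain TCP HTTP client (httpx included) can reach a UDS-only service at `http://127.0.0.1:<port>/` without the client knowing UDS exists.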

Why might an end user want to plug the client into a UDS? What behavior does it allow, why is it a useful thing to do, and how do we document the use case to our users?

Here are some reasons I can think of to use UDS over TCP ports:

  • UDS is easier to reason about: there are no firewall rules to worry about, since access is restricted to the local host and governed by file permissions
  • it can be easier to track connections by naming the endpoints / putting them in a hierarchy (instead of doing a netstat / ss / DNS lookup and finding the owning processes, etc.)
  • in some cases, a slight performance gain, although for Python purposes probably negligible

Finally is there any sense in HTTP/2 over UDS or does it not really make much sense in that context?

In terms of the multiplexing benefits: in theory, you avoid the extra file descriptors needed to handle multiple HTTP/1.1 connections. It could also be useful for testing, etc. But if it's easier to just implement HTTP/1.1, that's probably OK too.

@tomchristie
Member

Hrm, thinking about it again, I'm not so sure. Surely you really do want the standard connection pooling behaviour, so that you're opening multiple connections if you've got multiple concurrent requests.

@florimondmanca
Member

florimondmanca commented Aug 6, 2020

My thinking was rather a simplistic form of pooling directly on the transport, e.g. "connect if not connected yet".

We'd connect on the first request and then send all subsequent requests on the same connection. You're passing the UDS path on init, so there's really only ever one connection (no equivalent to per request host-port like in the TCP case).

Or am I missing something?

Edit: ah well yes, when one thinks about "what happens for concurrent requests" then obviously you'll want multiple connections + pooling so that request A doesn't interact with request B... Hmm.

@parity3
Author

parity3 commented Aug 6, 2020

I don't think there's anything about the high-level "pool" concept that would need huge divergence in API/logic or user expectations when it comes to UDS vs TCP. But perhaps I'm missing something too. I can't speak to httpcore/httpx's pooling strategy specifically.

@florimondmanca
Member

Actually, it's really more complex than I thought.

First, a bit of an overview of other clients:

What other clients do

With curl:

curl --unix-socket /var/run/docker.sock http://localhost/info

With requests-unixsocket:

import requests_unixsocket

session = requests_unixsocket.Session()

r = session.get('http+unix://%2Fvar%2Frun%2Fdocker.sock/info')

(Yes, percent-encoding is recommended in the official docs. They also recommend (but don't enforce) the http+unix scheme. It's non-standard though, and it was rejected by curl; see curl/curl#1338.)

Considerations on HTTP over UDS

Some points on which I was mistaken before:

  • HTTP/2 is probably a valid thing to do when speaking HTTP over a Unix domain socket (the UDS is only a data transport after all; the actual application layer remains HTTP).
  • HTTPS is also valid. E.g. you can protect the Docker daemon behind TLS, so that the request with curl becomes curl --unix-socket /var/run/docker.sock https://localhost/info --cert [...] --key [...] --cacert [...]. That said, there doesn't seem to be a notion of port 80 vs port 443 (the traffic can be encrypted, but no port is involved when connecting).
  • For the Docker use case, obviously the Docker API can also be requested over normal network-HTTP (not "just UDS").

So URLs should be restricted to Tuple[bytes, bytes, None, bytes], where the scheme is http or https, and the hostname is irrelevant (but must still be non-empty).
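A sketch of what that restriction could look like as a validation helper (illustrative only, not httpcore's actual code; `validate_uds_url` is a made-up name):

```python
from typing import Optional, Tuple

# (scheme, host, port, path), matching httpcore's URL tuples.
URL = Tuple[bytes, bytes, Optional[int], bytes]


def validate_uds_url(url: URL) -> None:
    # Illustrative only: enforce the shape discussed above, where only the
    # scheme and request path matter, the host is a non-empty placeholder,
    # and no port applies.
    scheme, host, port, path = url
    if scheme not in (b"http", b"https"):
        raise ValueError("UDS requests must use an 'http' or 'https' scheme")
    if not host:
        raise ValueError("hostname must be non-empty, even though it is ignored")
    if port is not None:
        raise ValueError("a port is meaningless when connecting over UDS")
```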

Now, what do we do with this…?

Options

A/ Add UDS support into core via AsyncConnectionPool

We could update AsyncConnectionPool so that it accepts a uds parameter, the path to a UDS socket file. This means we can reuse the HTTP/1.1-HTTP/2 implementations, as well as SSL support.

But… a single uds path means the "per-origin" part of the connection pooling algorithm isn't used.

So instead, should we go the requests-unixsocket way and accept the UDS socket path as the hostname?

url = (b"http", b"/var/run/docker.sock", None, b"/containers/json")

It's a bit dirty, because e.g. curl treats the hostname as irrelevant (the path to the Unix socket is passed as a separate option), so IMO we should too.

So, sticking to a uds=... transport init parameter is probably fine. When this option is passed all connections will be established through the socket (so the transport/client will be "UDS-only").

async with AsyncConnectionPool(uds="/var/run/docker.sock") as transport:
    url = (b"http", b"localhost", None, b"/info")
    headers = [(b"host", b"localhost")]
    response = await transport.request(b"GET", url, headers)

B/ Separate AsyncUDSTransport class

Alternatively, we could build UDS support as a separate transport class, i.e. different from AsyncConnectionPool.

Such a separate class might be able to reuse AsyncHTTPConnection if implemented in core. If implemented as a third-party package, most of it would have to be re-implemented unless the author is okay with importing private API. This includes the HTTP/1.1 and HTTP/2 implementations, and the enclosing AsyncHTTPConnection.

As discussed previously, it should probably accept a path=... parameter (instead of allowing users to pass the socket path as part of the URL). This would simplify any re-implementation of connection pooling, since we'd only need to keep track of connections with a max_connections limit (no need for per-origin pooling, since they're all going to the same destination).

async with AsyncUDSTransport(path="/var/run/docker.sock", max_connections=10) as transport:
    containers_url = (b"http", b"localhost", None, b"/containers/json")
    images_url = (b"http", b"localhost", None, b"/images/json")

    response = await transport.request(b"GET", containers_url)

    # Connection is reused.
    response = await transport.request(b"GET", images_url)

    async with trio.open_nursery() as nursery:
        # One request will reuse the connection, the other will create a new connection.
        nursery.start_soon(transport.request, b"GET", containers_url)
        nursery.start_soon(transport.request, b"GET", images_url)

I think that from an implementation perspective, Option A/ is certainly the easiest, and having it in core is the easiest as well. Accept uds=..., make the connection pool's connections connect via open_unix_connection() instead of open_tcp_connection() (for asyncio), and voilà.
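For asyncio, the connect-step difference amounts to roughly the following (a sketch under the assumption that the pool can branch on a configured `uds` path; `open_stream` is a made-up name, and httpcore's real backends wrap these streams in their own classes):

```python
import asyncio


async def open_stream(hostname, port, uds=None):
    """Open the underlying stream for a connection.

    Sketch only: if a UDS path is configured, ignore host/port and dial the
    socket file; otherwise open a normal TCP connection. httpcore's real
    backends wrap these reader/writer pairs in their own stream classes.
    """
    if uds is not None:
        return await asyncio.open_unix_connection(path=uds)
    return await asyncio.open_connection(host=hostname, port=port)
```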

@parity3
Author

parity3 commented Aug 6, 2020

Another thought: allow users to add a pre-connected socket into the pool manually. That way, the library doesn't even need to be aware of whether it's TCP vs UDS. I bet that's like a 2-line change to expose an API method?

It would require some documentation / tips on the outside setup and insertion of such a pre-connected UDS socket, but maybe that's easier than doing all the socket establishment/handling inside the library.

EDIT: you'd also need a nice way to bind a request to that specific added connection in the pool somehow (some special origin style or something)

@florimondmanca
Member

Interesting idea yeah, though it's most likely not as simple as you might think.

  • The method would need to accept the origin to which to attach the connection alongside the socket (this is so that the pool can know when to use this socket vs create or reuse another connection).
  • There are quite a few other options, such as "should the connection be HTTP/2" or "should it be TLS" (and in that case, is your socket already wrapped?), that the method might need to support.
  • We have to support multiple I/O frameworks, and so we'd need a way to say "create an AsyncSocketStream from a pre-configured standard socket" for each supported async library, and the equivalent for SyncSocketStream, which is an operation we don't have yet in our backend implementations.

But definitely worth considering too, as a more general solution...

@parity3
Author

parity3 commented Aug 7, 2020

I was thinking you'd just create an identifier, like so:

uds = await create_and_connect_unix_socket()
connection_prefs = dict(use_http11=True)
pool_id = 'udsgroup'
pool.insert_connection(pool_id, uds, **connection_prefs)
uds2 = await create_and_connect_unix_socket()
pool.insert_connection(pool_id, uds2, **connection_prefs)

url = 'http://www.google.com'

await pool.request(url, pool_id=pool_id)

# OR

# no idea if we can piggy back off of some other established policy for providing extra info here
url = f'http://[{pool_id}]:www.google.com'
await pool.request(url)

Then the pool would treat the connection as a brand new empty-state connection, and would do everything it normally does with the exception of the actual socket connect.

I don't know exactly where the connection_prefs would currently live in the API (at the pool or request level), but the library user should know not to issue mixed-protocol requests on a socket that they inserted.

I should probably stop talking now without actually looking through the documentation and/or source code... but I don't really have time to do that. Maybe I can just be the "outside naive voice" in this issue thread :)

@tomchristie
Member

So yup, uds=... is clearly the way forward here.
We'll end up with:

transport = httpcore.SyncConnectionPool(uds=...)

And document its usage alongside local_address=... (see encode/httpx#1165).

@tomchristie tomchristie changed the title Include UDS support (back again) in next release UDS Support Aug 11, 2020
@tomchristie tomchristie added the enhancement New feature or request label Aug 11, 2020