Add support for socket options #668

Merged
merged 21 commits into from
May 23, 2023

Conversation

karpetrosyan
Member

@karpetrosyan karpetrosyan commented Apr 12, 2023

Closes #662

]


def create_connection(
Member Author

@karpetrosyan karpetrosyan Apr 12, 2023

This function is very similar to the built-in socket.create_connection function, but it also accepts socket options.
We can't set socket options after a connection has been made because some options affect the connection itself, so we must set them before the connection is made.
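
To illustrate the idea, here is a minimal sketch of such a function, loosely modelled on the standard library's socket.create_connection. The name create_connection_with_options and the SocketOption alias are hypothetical and are not taken from this PR; the only point is that setsockopt() runs before connect().

```python
import socket
from typing import Iterable, Optional, Tuple

# Hypothetical alias: (level, optname, value), as passed to setsockopt().
SocketOption = Tuple[int, int, int]


def create_connection_with_options(
    address: Tuple[str, int],
    timeout: Optional[float] = None,
    socket_options: Iterable[SocketOption] = (),
) -> socket.socket:
    """Connect to `address`, applying socket options before connect()."""
    host, port = address
    options = list(socket_options)
    last_error: Optional[OSError] = None

    # Try each resolved address in turn, as socket.create_connection does.
    for family, kind, proto, _, sockaddr in socket.getaddrinfo(
        host, port, 0, socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, kind, proto)
            # Options are set first because some of them affect connect().
            for level, optname, value in options:
                sock.setsockopt(level, optname, value)
            if timeout is not None:
                sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()

    if last_error is not None:
        raise last_error
    raise OSError("getaddrinfo returned no addresses")
```

A caller would pass something like socket_options=[(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)].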

Member

Okay, I'd suggest we add a comment noting that.

Perhaps rephrased to "This function is equivalent to Python's socket.create_connection() function, but also accepts socket options." and linking to the source on GitHub?

@karpetrosyan
Member Author

karpetrosyan commented Apr 12, 2023

If we want to support socket options for async interfaces, we need to change a lot of things. Currently, the 'get_asynclib' function is used to get a specific backend interface provided by anyio, then the "connect_tcp" and "connect_unix" asynchronous functions of that module are used to create a connection; how can we override these methods while keeping it as simple as possible?

@karpetrosyan
Member Author

As a simple solution, we can associate each library with its function and then call that function.

Old

async def connect_unix(path: Union[str, "PathLike[str]"]) -> UNIXSocketStream:
    path = str(Path(path))
    return await get_asynclib().connect_unix(path)

New

async def connect_unix(path: Union[str, "PathLike[str]"]) -> UNIXSocketStream:
    path = str(Path(path))
    library = sniffio.current_async_library()
    # Map each async library to its own connect function.
    connect_functions = {
        "trio": trio_connect_unix,
        "asyncio": asyncio_connect_unix,
    }
    return await connect_functions[library](path)

@tomchristie
Member

tomchristie commented Apr 18, 2023

If we want to support socket options for async interfaces, we need to change a lot of things. Currently, the 'get_asynclib' function is used to get a specific backend interface provided by anyio, then the "connect_tcp" and "connect_unix" asynchronous functions of that module are used to create a connection

It took me a little while to track down what you're referring to here.

how can we override these methods while keeping it as simple as possible?

We'd need to work this through on a case-by-case basis.

The three cases to cover are sync, trio, and asyncio via anyio.

We'd want the PR to start by adapting each of...

  • httpcore/backends/sync.py
  • httpcore/backends/trio.py
  • httpcore/backends/asyncio.py

...and then work through from there.

Let's link to each of these here so it's easier to review how we'd handle this...

sock = socket.create_connection(
    address, timeout, source_address=source_address
)

stream: trio.abc.Stream = await trio.open_tcp_stream(
    host=host, port=port, local_address=local_address
)

stream: anyio.abc.ByteStream = await anyio.connect_tcp(
    remote_host=host,
    remote_port=port,
    local_host=local_address,
)

You've already demonstrated how we'd adapt the sync case.

The trio case would require less duplication to resolve if python-trio/trio#281 was addressed, but otherwise needs their connect_tcp() function to be duplicated.

I think you're pointing out that the asyncio-via-anyio case is actually really awkward to resolve. And yes, that looks correct to me. It does look like duplicating the anyio connect_tcp implementation would be a bit awkward.

🤔

@tomchristie
Member

Related, and possibly helpful here - it might be worthwhile taking a look at what aiohttp use for their connect_tcp implementation.

@karpetrosyan
Member Author

karpetrosyan commented Apr 18, 2023

There is an elegant solution: ignore the socket options that affect the connect method itself.

Our issue is that the socket bind and connect methods behave differently depending on socket options. For example, socket.SO_REUSEADDR can help when connecting through a socket address that was previously force-closed and is still waiting 1-2 minutes to be released.
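
To make the ordering concern concrete, here is a minimal sketch using only the standard library. The helper name open_bound_socket is hypothetical; it shows that SO_REUSEADDR has to be set before bind() for it to influence that bind.

```python
import socket


def open_bound_socket(local_host: str, local_port: int = 0) -> socket.socket:
    """Return an unconnected TCP socket bound to a local address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set the option first: it changes how the following bind() behaves.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((local_host, local_port))
    except OSError:
        sock.close()
        raise
    return sock
```

The returned socket can then be connected with sock.connect((host, port)).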

That is very fun, but trio does not support local address binding...

There is also REUSEPORT, which allows multiple sockets to listen on the same IP/port pair, but it is not relevant to httpcore, because httpcore is a client library and does not support REUSEPORT at all.

I looked for options that affect bind and connect methods and that httpcore or httpx users can use in their applications, but I couldn't find anything.

I believe it is better to ignore such options and set options after the backend library has created a connection.
This solution avoids code duplication while keeping code as simple as possible.

This is how the updated function could look.

class Backend(AsyncNetworkBackend):
    async def connect_tcp(
        ...,
        socket_options
    ):
        ...
        stream = await backend.connect_tcp(
            remote_host=host,
            remote_port=port,
            local_host=local_address,
        )
        for option in socket_options:
            stream._socket.setsockopt(*option)
        return AsyncIOStream(stream)
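
The same set-options-after-connect pattern can be tried out with only the standard library. The sketch below swaps anyio for plain asyncio streams so that it is self-contained; open_connection_with_options and SocketOption are hypothetical names, and this is not the code from the PR.

```python
import asyncio
import socket
from typing import Iterable, Optional, Tuple

# Hypothetical alias: (level, optname, value), as passed to setsockopt().
SocketOption = Tuple[int, int, int]


async def open_connection_with_options(
    host: str,
    port: int,
    socket_options: Optional[Iterable[SocketOption]] = None,
) -> Tuple[asyncio.StreamReader, asyncio.StreamWriter]:
    """Open a TCP connection, then apply socket options to the live socket."""
    reader, writer = await asyncio.open_connection(host, port)

    # The transport exposes the connected socket through get_extra_info().
    sock = writer.get_extra_info("socket")
    if sock is not None:
        for level, optname, value in socket_options or ():
            sock.setsockopt(level, optname, value)
    return reader, writer
```

Options that only take effect during bind() or connect() cannot be applied this way, which is the trade-off described above.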

@tomchristie what do you think about this solution?

@karpetrosyan
Member Author

Also, because we don't need options affecting bind or connect, we can use socket.create_connection instead of our newly created function to avoid code duplication.

@tomchristie
Member

Okay, setting socket options after the connect is probably(?) okay for our use-case.

For the async case I'd suggest we push the implementation into httpcore/backends/trio.py and httpcore/backends/asyncio.py, with httpcore/backends/auto.py calling through to them.

Member

@tomchristie tomchristie left a comment

Looking good.

Let's also...

@tomchristie
Member

We should also take a call on whether we really do want this, or if what we actually need is just a better default for the sync backend so that it includes TCP_NODELAY.

See #651 (comment)

@karpetrosyan
Member Author

We should also take a call on whether we really do want this, or if what we actually need is just a better default for the sync backend so that it includes TCP_NODELAY.

See #651 (comment)

I think we should make TCP_NODELAY the default, but also make socket options configurable; there are many options in unix sockets that developers can use, and this can be a powerful feature for developers who know what they're doing.
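
As a rough sketch of how a TCP_NODELAY default could coexist with user-supplied options: the names DEFAULT_SOCKET_OPTIONS and resolve_socket_options below are hypothetical and are not part of this PR. The sketch relies on the fact that options are applied in order, so a later user entry for the same option takes effect last.

```python
import socket
from typing import Iterable, List, Optional, Tuple

# Hypothetical alias: (level, optname, value), as passed to setsockopt().
SocketOption = Tuple[int, int, int]

# Hypothetical default: disable Nagle's algorithm unless the user says otherwise.
DEFAULT_SOCKET_OPTIONS: List[SocketOption] = [
    (socket.IPPROTO_TCP, socket.TCP_NODELAY, 1),
]


def resolve_socket_options(
    user_options: Optional[Iterable[SocketOption]] = None,
) -> List[SocketOption]:
    """Return the defaults followed by any user-supplied options."""
    options = list(DEFAULT_SOCKET_OPTIONS)
    if user_options is not None:
        # User options come last, so they win when applied in order.
        options.extend(user_options)
    return options
```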

Additionally, fixing encode/httpx#2635 requires these configurable socket options, which is what initially inspired me to open this PR :D

@@ -109,6 +110,7 @@ def __init__(
self._network_backend = (
SyncBackend() if network_backend is None else network_backend
)
self._socket_options = () if socket_options is None else socket_options
Member

Suggested change
- self._socket_options = () if socket_options is None else socket_options
+ self._socket_options = socket_options

It looks to me like we should leave self._socket_options as optional throughout.

Member Author

Should we always make sure to check if socket_options is not None before using it?

Member

It would make sense for us to do the if socket_options is None dance at the last possible point, in the network backends.
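
A minimal sketch of that approach, assuming a hypothetical helper named apply_socket_options that lives in a network backend: the higher layers pass socket_options through unchanged, possibly as None, and only the backend normalizes it just before use.

```python
import socket
from typing import Iterable, Optional, Tuple

# Hypothetical alias: (level, optname, value), as passed to setsockopt().
SocketOption = Tuple[int, int, int]


def apply_socket_options(
    sock: socket.socket,
    socket_options: Optional[Iterable[SocketOption]] = None,
) -> None:
    """Apply optional socket options; None means there is nothing to apply."""
    # The None check happens here, at the last possible point.
    if socket_options is None:
        socket_options = []
    for option in socket_options:
        sock.setsockopt(*option)
```

Keeping the value optional until this point also means debug output upstream shows socket_options=None rather than an empty tuple.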

@@ -54,6 +55,7 @@ def __init__(
self._connection: Optional[ConnectionInterface] = None
self._connect_failed: bool = False
self._request_lock = Lock()
self._socket_options = () if socket_options is None else socket_options
Member

Suggested change
- self._socket_options = () if socket_options is None else socket_options
+ self._socket_options = socket_options

Let's keep this optional for now.

@@ -187,7 +187,7 @@ def test_debug_request(caplog):
(
"httpcore.connection",
logging.DEBUG,
- "connect_tcp.started host='example.com' port=80 local_address=None timeout=None socket_options=()",
+ "connect_tcp.started host='example.com' port=80 local_address=None timeout=None socket_options=None",
Member

👍 yes, this is better.

@@ -79,6 +79,7 @@ The connection pool instance is also the main point of configuration. Let's take
a particular address family. Using `local_address="0.0.0.0"` will
connect using an `AF_INET` address (IPv4), while using `local_address="::"`
will connect using an `AF_INET6` address (IPv6).
* `socket_options`: Socket options to set on the TCP socket when the connection is established.
Member

Let's be consistent with our placement of socket_options in the list, so that the docs match the code.

Member

@tomchristie tomchristie left a comment

Looks good, yeah.

I've included some possible comments here. (?)

karpetrosyan and others added 3 commits May 23, 2023 12:13
Co-authored-by: Tom Christie <tom@tomchristie.com>
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow socket options to be configured.
2 participants