Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TLS in TLS for HTTPS proxies. #1806

Merged
merged 1 commit into from
Aug 19, 2020

Conversation

jalopezsilva
Copy link
Contributor

@jalopezsilva jalopezsilva commented Feb 25, 2020

This PR aims to add support for HTTPS proxies by adding a class to easily provide TLS in TLS support. TLS within TLS is not supported easily within the ssl library. The SSLSocket actually takes over the existing socket (https://github.com/python/cpython/blob/master/Lib/ssl.py#L999-L1006)
instead of wrapping it entirely. The only way to support to TLS within TLS is with the SSLContext.wrap_bio methods.

The first commit introduces a class called SSLTransport which wraps a socket in TLS using the provided SSL context. I'm currently in the process of adding unit tests and integration tests but it would be useful if we can merge #1679 for that.

Once I can fully integrate this within urlib3 I'll perform some production testing on my own to validate the changes work as expected.

@pquentin
Copy link
Member

Now that #1679 is merged, I'd love to move this forward. So... this appears to make sense, and is surprisingly compact, but I would love to get sample code or even an integration with urllib3 to be able to play with it.

@jalopezsilva jalopezsilva force-pushed the tls_in_tls branch 2 times, most recently from 1f14923 to e1f7a8c Compare March 31, 2020 21:47
@codecov
Copy link

codecov bot commented Mar 31, 2020

Codecov Report

Merging #1806 into master will decrease coverage by 0.96%.
The diff coverage is 87.64%.

Impacted file tree graph

@@             Coverage Diff             @@
##            master    #1806      +/-   ##
===========================================
- Coverage   100.00%   99.03%   -0.97%     
===========================================
  Files           23       23              
  Lines         2055     2171     +116     
===========================================
+ Hits          2055     2150      +95     
- Misses           0       21      +21     
Flag Coverage Δ
#unittests 98.66% <87.64%> (-0.85%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/urllib3/contrib/ssl.py 81.73% <81.73%> (ø)
src/urllib3/connection.py 98.78% <91.30%> (-1.22%) ⬇️
src/urllib3/connectionpool.py 100.00% <100.00%> (ø)
src/urllib3/poolmanager.py 100.00% <100.00%> (ø)
src/urllib3/util/connection.py 100.00% <100.00%> (ø)
src/urllib3/util/ssl_.py 100.00% <100.00%> (ø)
src/urllib3/__init__.py 100.00% <0.00%> (ø)
src/urllib3/response.py 100.00% <0.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cbb6be7...19ed6f4. Read the comment docs.

@jalopezsilva jalopezsilva force-pushed the tls_in_tls branch 3 times, most recently from 2436dd4 to 0185939 Compare March 31, 2020 23:52
@jalopezsilva
Copy link
Contributor Author

Alright @pquentin, I've updated the PR with an initial integration. A couple of things to keep in mind:

  • No support for less than Python3 - The wrap_bio methods used to perform the TLS in TLS connection were only added in Python 3. I gated the feature following the same conventions I saw in the code base.
  • No support for SSL-less environments - Similar to the previous comment, if we don't have support for SSL, there's no way we'll be able to do TLS in TLS. I gated the feature following the code base conventions.
  • Kept the allow_insecure_proxy option - For now, I kept the flag we added in the last PR that toggles the behavior to use the absolute URI for HTTPS proxes with HTTPS connections. The option would only be used if provided by the user and would be helpful in cases where we can't do TLS in TLS (less than Python3). We also found this useful for inspecting outgoing traffic for our internal services. (i.e to verify they're not sending prohibited data externally) I understand if you want to remove it though, we can chat about this more.

There are a couple of things I want to do before merging this but I wanted to send it early to get some feedback. Items I still have to address:

  • More tests on the SSLTransport component.
  • Improve the integration. I have mixed feelings about the connection_requires_http_tunnel method. I might change how that currently works.
  • Production testing. There are two teams internally that need this patch. I'm planning on updating our local versions and running it on production for a while to make sure it works as expected.

Play with it and let me know if you have any feedback!

@pquentin
Copy link
Member

pquentin commented Apr 1, 2020

@jalopezsilva Thanks! Those assumptions make sense to me. I believe @sethmlarson has stronger feelings about the allow_insecure_proxy option, but that's something we can figure out later.

I restarted a failing PyPy run (sorry, our PyPy tests are quite flaky), and now the codecov checks work as expected since as you noted coverage is not at 100% yet. For what it's worth, https://codecov.io/gh/urllib3/urllib3/pull/1806/tree/src/urllib3 is the best way to see see covered lines.

I'll try to play with it, and will also try to find a way to review it. Please tell us when this runs in production. :)

Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking on this task, it's a whole pile of work! 💪

Here's a list of comments and discussion points I could think of.

src/urllib3/connection.py Outdated Show resolved Hide resolved
src/urllib3/contrib/ssl.py Show resolved Hide resolved
src/urllib3/poolmanager.py Outdated Show resolved Hide resolved
src/urllib3/poolmanager.py Outdated Show resolved Hide resolved
src/urllib3/poolmanager.py Outdated Show resolved Hide resolved
test/test_ssl.py Outdated Show resolved Hide resolved
test/test_util.py Outdated Show resolved Hide resolved
test/test_util.py Outdated Show resolved Hide resolved
src/urllib3/poolmanager.py Outdated Show resolved Hide resolved
@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Apr 3, 2020

Hey @sethmlarson, thanks for the quick review. I'll address most of your comments. I wanted to highlight my response to two of them here.

  • Need for a proxy SSLContext - We actually need this! We're moving all of our internal clients to use mutual authentication for our outgoing proxy. Instead of relying on a Proxy-Authorization header, we'll use a client certificate to authenticate with the proxy. In order to avoid the client certificate being sent to the destination server, we need two different SSLContexts, one for the proxy and one for the destination. What I'm doing in the code right now is use the proxy SSLContext if provided, if not, use the same one that we'll use for the destination server. (i.e we create a SSLContext based on ca_certs provided by the user). Should I pass it directly instead of using the ProxyConfig object? Whatever works, I thought the ProxyConfig object made it easier to manage.

  • Keeping the _allow_https_proxy_to_see_traffic option - I kept the option as an alternative to the TLS within TLS for HTTPS destinations with an HTTPS proxy. When the option is set, TLS within TLS is skipped. This is a nice to have for us, not a need to have. It makes it easier to inspect the outgoing traffic when running a corporate proxy. (i.e no need to MitM) The client would have to opt-in by setting this option to true. That said, I understand not everyone is running in a trusted corporate environment. It could be dangerous if users don't understand the implication of the option. I would prefer to keep it and make sure it's properly documented but I'm flexible and we can remove it if needed. Let me know.

@sethmlarson
Copy link
Member

sethmlarson commented Apr 3, 2020

@jalopezsilva Thanks for all you're doing for this feature, I'm really happy with the work so far. It's an amazing improvement to the library and I'm excited to be able to ship it to our users. ✨

  • You're totally spot-on about the client certificates, keep proxy_ssl_context.
  • Having the ProxyConfig object be mutable worries me. Is there a way we can track the destination_scheme differently such that the config object can be immutable?
  • _allow_https_proxy_to_see_traffic correct me if I'm not seeing this correctly, but this is essentially saying "I'm running an HTTPS proxy but I want to use forwarding proxy mode even if I request an HTTPS URL from my proxy? If we're keeping this feature let's promote it to public and then document it in laymans terms with a warning and technically as well w/ proxy mode and "who can see what".

Also feel free to call me out if I'm totally missing something here, this is a lot to think about this early in the morning. :)

@jalopezsilva jalopezsilva mentioned this pull request Apr 9, 2020
@jalopezsilva jalopezsilva force-pushed the tls_in_tls branch 3 times, most recently from 0d681e2 to 6c84fe3 Compare April 17, 2020 01:25
@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Apr 29, 2020

Quick update for you @pquentin, @sethmlarson, I'm testing this with a couple of teams in production.

We ran into an interesting bug that I thought would share. Do you know why requests by default will attempt to use pyopenssl if available? https://github.com/psf/requests/blob/master/requests/__init__.py#L95-L100

PyOpenSSLContext doesn't support the wrap_bio methods needed for TLS in TLS and ends up generating en error. I can check if we're using pyopenssl and throw an error in this case but I was curious why we still default to pyopenssl.

I suspect we did this before python had SNI support on the ssl module. ( Less than python 3.2 ) That said I'm surprised we don't have a check for ssl.HAS_SNI in requests nowdays to toggle pyopenssl or not. I can also bring this up on the requests repo, just figured I would ask here first.

@sethmlarson
Copy link
Member

sethmlarson commented Apr 30, 2020

@jalopezsilva You're discovering some of the difficult parts of our current TLS situation with urllib3+requests. :) Others and I have voiced concern over having pyOpenSSL be the preferred TLS implementation when available, especially nowadays with SNI on almost all supported platforms.

I'd like to remove the unconditional use of pyOpenSSL, that'd definitely be something that would move us forwards. That way on the few platforms without SNI installing requests[security] wouldn't break, and then in the future the install of pyOpenSSL and requests[security] extra can be removed/blanked altogether.

As for this feature as it currently stands without BIO support on pyOpenSSL we can't provide TLS+TLS for HTTPS proxies. If we're going to implement it now we'd have to error out and explain to users why they're experiencing the error w/ potential fixes which may include uninstalling pyOpenSSL if Requests still unconditionally patches. Maybe we're fine with that given the low occurrence of TLS+TLS w/ requests[security]?

Also are we still unsure about SecureTransport, is there something we can do there?

Note as a part of 2.x I'd really like to drop our pyOpenSSL TLS implementation as well.

cc @nateprewitt

@sethmlarson
Copy link
Member

I opened a PR on Requests to drop the unconditional monkey-patching: psf/requests#5443

@pquentin
Copy link
Member

Cool, that requests PyOpenSSL change is now released!

@jalopezsilva Can we do anything to help?

@jalopezsilva
Copy link
Contributor Author

Sorry for the lack of updates here! I got sidetracked at work with other priorities as well with everything going on in the world. Production testing has been mostly good, we found some bugs and got some feature requests that teams had. I'll put up a brief summary here:

  • Bug: Using PyOpenSSL as a default on requests was the # 1 bug we found. I had added a workaround in my internal patch but some teams were still affected. I'm glad the fix has finally rolled out. I'll integrate it internally and see if it works as expected.

  • Feature: Using environment variables instead of passing certs + keys. This was a common request teams had. Quite often they were not using urllib3 directly, but rather other library that used urlib3 underneath. Most of the time, other libraries don't have an easy way to pass extra parameters to urllib3 so even though urllib3 supported TLS in TLS, teams couldn't configure it. I was hoping to add environment variables such as PROXY_TLS_CERT and PROXY_TLS_KEY and use them to generate the proxy SSLContext. What do you think?

  • Feature More than a urllib3 feature this is a ssl python module feature. PyOpenSSL supports creating SSLContext with certificates/key from memory. ssl doesn't. We had some teams excited to use this feature but they couldn't because they needed the in-memory support. While investigating this, I confirmed that PyOpenSSL supports wrap_bio methods which we can potentially use here as well. This is something that we can do as a follow up once we release this initial support.

  • Bug: There was a tricky bug involving SNI with a particular server that I was troubleshooting when I got side tracked. We found a server that wasn't taking the SNI hint when we used TLS in TLS. A regular TLS connection seemed to work as expected. I have to go back and confirm the problem wasn't in our code, or CPython. My hope is that this was a problem with the server implementation but I'll confirm once I have more data.

And that's about it. I'm currently closing up some other projects and my plan is to get back at this on July so we can close it out. I'll post here if I need help with anything!

@pquentin
Copy link
Member

pquentin commented Jul 1, 2020

Thanks for the update!

I was hoping to add environment variables such as PROXY_TLS_CERT and PROXY_TLS_KEY and use them to generate the proxy SSLContext. What do you think?

@sigmavirus24 I've seen you mention that environment variables were generally a bad idea for urllib3/requests (psf/requests#3070 (comment)), do you think you can expand in order to explain when it's a bad idea and when it's not? It's possible to configure urllib3 with requests adapters, but it's more cumbersome.

PyOpenSSL supports creating SSLContext with certificates/key from memory. ssl doesn't.

This would be really nice if ssl supported that, yeah. It would make writing tests using trustme easier, for example.

@jalopezsilva
Copy link
Contributor Author

I've seen you mention that environment variables were generally a bad idea for urllib3/requests (psf/requests#3070 (comment)), do you think you can expand in order to explain when it's a bad idea and when it's not? It's possible to configure urllib3 with requests adapters, but it's more cumbersome.

If I have to guess this is related to the HTTP_PROXY and HTTPS_PROXY environment variables.

These environment variables can be used to configure if a proxy should be used or not when contacting a destination. The problem arises when these environment variables modify all of your requests to use a proxy. One of the problems that I've seen in the past is when users use urllib3/requests to contact internal (not requiring a proxy) and external destinations (requiring a proxy). If you unintentionally set the HTTP_PROXY environment variables, all of your traffic will attempt to use the proxy which is incredibly annoying and difficult to debug.

The environment variables I was thinking (PROXY_SSL_CERT, and PROXY_SSL_KEY) would only allow you to configure the cert and key provided to the proxy if and only if you have already specified you want to talk to a proxy using HTTPS. Hope that makes sense.

I'm prepping the changes with the environment variables included. We can discuss during code review too if needed.

@jalopezsilva jalopezsilva force-pushed the tls_in_tls branch 3 times, most recently from 957e172 to f57c734 Compare August 14, 2020 00:16
@jalopezsilva
Copy link
Contributor Author

jalopezsilva commented Aug 14, 2020

All right folks, it's time.

I have done my best to test these changes on production and address the bugs that we have identified. I'm updating the PR with the latest changes. For now, I have sent a single commit which adds the SSLTransport class and corresponding unit tests. Let's start the review with this for now.

I hope to update tomorrow with the second commit (which has the urllib3 integration).

Btw, there seems to be an unrelated regression on the Python3.10 CI:

  AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64')

https://travis-ci.org/github/urllib3/urllib3/jobs/717766744

This was referenced Mar 15, 2021
@jalopezsilva jalopezsilva deleted the tls_in_tls branch September 26, 2021 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants