Hang in x509certificate build chain when using a Let's Encrypt cert. #24527

Closed
iftahbe opened this issue Dec 26, 2017 · 47 comments
Labels
area-System.Security os-linux Linux OS (any supported distro) question Answer questions and provide assistance, not an issue with source code or documentation.
Comments

@iftahbe

iftahbe commented Dec 26, 2017

In Ubuntu 16.04, dotnet core 2.0.3
Running a webhost console app with a server certificate issued by Let's Encrypt
Server is listening to https://172.31.46.243:443 (my private IP address)

Trying to run:

openssl s_client -connect <my-public-dns-name>:443 -msg 

Output:

ubuntu@ip-172-31-46-243:~$ openssl s_client -connect <my-public-dns-name>:443 -msg
CONNECTED(00000003)
>>> TLS 1.2  [length 0005]
    16 03 01 01 2c
>>> TLS 1.2 Handshake [length 012c], ClientHello
    01 00 01 28 03 03 0d 9b 38 e7 53 ba e2 ba 5b f1
    11 23 57 2b 7b 18 e1 4a d7 2e 1c de d2 43 bb e6
    2d f5 ab 43 bd fa 00 00 aa c0 30 c0 2c c0 28 c0
    24 c0 14 c0 0a 00 a5 00 a3 00 a1 00 9f 00 6b 00
    6a 00 69 00 68 00 39 00 38 00 37 00 36 00 88 00
    87 00 86 00 85 c0 32 c0 2e c0 2a c0 26 c0 0f c0
    05 00 9d 00 3d 00 35 00 84 c0 2f c0 2b c0 27 c0
    23 c0 13 c0 88 00 a4 00 a2 00 a0 00 9e 00 67 00
    40 00 3f 00 3e 00 33 00 32 00 31 00 30 00 9a 00
    99 00 98 00 97 00 45 00 44 00 43 00 42 c0 31 c0
    2d c0 29 c0 25 c0 0e c0 04 00 9c 00 3c 00 2f 00
    96 00 41 c0 11 c0 07 c0 0c c0 02 00 05 00 04 c0
    12 c0 08 00 16 00 13 00 10 00 0d c0 0d c0 03 00
    0a 00 ff 01 00 00 55 00 0b 00 04 03 00 01 02 00
    0a 00 1c 00 1a 00 17 00 19 00 1c 00 1b 00 18 00
    1a 00 16 00 0e 00 0d 00 0b 00 0c 00 09 00 0a 00
    23 00 00 00 0d 00 20 00 1e 06 01 06 02 06 03 05
    01 05 02 05 03 04 01 04 02 04 03 03 01 03 02 03
    03 02 01 02 02 02 03 00 0f 00 01 01

<<< Here it gets stuck for about 2 minutes

I ran strace on the server app - during the handshake it tries to connect to 192.35.177.64 on port 80!
This IP address belongs to a certificate authority (IdenTrust).
The operation gets stuck (EINPROGRESS) because port 80 is not allowed for outbound connections on my server.

Output of strace:

ubuntu@ip-172-31-46-243:~$ sudo strace -fp 5546 -Tfte trace=network
strace: Process 5546 attached with 73 threads
[pid  5571] 16:16:53 accept4(196, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 537 <0.000033>
[pid  5571] 16:16:53 setsockopt(537, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000037>
[pid  5571] 16:16:53 getpeername(537, {sa_family=AF_INET, sin_port=htons(33460), sin_addr=inet_addr("<my-public-ip>")}, [16]) = 0 <0.000034>
[pid  5571] 16:16:53 getsockname(537, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("172.31.46.243")}, [16]) = 0 <0.000033>
[pid  5571] 16:16:53 accept4(196, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) <0.000033>
strace: Process 5652 attached
[pid  5652] 16:16:53 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 538 <0.000045>
[pid  5652] 16:16:53 connect(538, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, 16) = 0 <0.000032>
[pid  5652] 16:16:53 sendmmsg(538, {{{msg_name(0)=NULL, msg_iov(1)=[{"u\315\1\0\0\1\0\0\0\0\0\0\4apps\tidentrust\3com\0"..., 36}], msg_controllen=0, msg_flags=MSG_DONTROUTE|MSG_CTRUNC|MSG_EOR|MSG_WAITALL|MSG_FIN|MSG_SYN|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC|0x13300010}, 36}, {{msg_name(0)=NULL, msg_iov(1)=[{"\21\215\1\0\0\1\0\0\0\0\0\0\4apps\tidentrust\3com\0"..., 36}], msg_controllen=0, msg_flags=MSG_EOR|MSG_WAITALL|MSG_RST|MSG_ERRQUEUE|MSG_MORE|MSG_CMSG_CLOEXEC|0x13680010}, 36}}, 2, MSG_NOSIGNAL) = 2 <0.000038>
[pid  5652] 16:16:53 recvfrom(538, "u\315\201\200\0\1\0\2\0\0\0\0\4apps\tidentrust\3com\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, [16]) = 83 <0.000033>
[pid  5652] 16:16:53 recvfrom(538, "\21\215\201\200\0\1\0\1\0\1\0\0\4apps\tidentrust\3com\0"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, [16]) = 115 <0.000039>
[pid  5652] 16:16:53 +++ exited with 0 +++
[pid  5627] 16:16:53 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 538 <0.000041>
[pid  5627] 16:16:53 connect(538, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.35.177.64")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000027>
strace: Process 5653 attached
strace: Process 5654 attached
strace: Process 5655 attached
[pid  5571] 16:17:15 accept4(196, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 539 <0.000085>
[pid  5571] 16:17:15 setsockopt(539, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000037>
[pid  5571] 16:17:15 getpeername(539, {sa_family=AF_INET, sin_port=htons(64433), sin_addr=inet_addr("52.39.142.198")}, [16]) = 0 <0.000041>
[pid  5571] 16:17:15 getsockname(539, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("172.31.46.243")}, [16]) = 0 <0.000040>
[pid  5571] 16:17:15 accept4(196, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) <0.000040>
strace: Process 5659 attached
[pid  5659] 16:17:15 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 540 <0.000039>
[pid  5659] 16:17:15 connect(540, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, 16) = 0 <0.000021>
[pid  5659] 16:17:15 sendmmsg(540, {{{msg_name(0)=NULL, msg_iov(1)=[{"\376\216\1\0\0\1\0\0\0\0\0\0\4apps\tidentrust\3com\0"..., 36}], msg_controllen=0, msg_flags=0}, 36}, {{msg_name(0)=NULL, msg_iov(1)=[{"~\25\1\0\0\1\0\0\0\0\0\0\4apps\tidentrust\3com\0"..., 36}], msg_controllen=0, msg_flags=0}, 36}}, 2, MSG_NOSIGNAL) = 2 <0.000034>
[pid  5659] 16:17:15 recvfrom(540, "\376\216\201\200\0\1\0\2\0\0\0\0\4apps\tidentrust\3com\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, [16]) = 83 <0.000047>
[pid  5659] 16:17:15 recvfrom(540, "~\25\201\200\0\1\0\1\0\1\0\0\4apps\tidentrust\3com\0"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, [16]) = 115 <0.000023>
[pid  5659] 16:17:15 +++ exited with 0 +++
[pid  5632] 16:17:15 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 540 <0.000029>
[pid  5632] 16:17:15 connect(540, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.35.177.64")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000028>
strace: Process 5660 attached
[pid  5571] 16:17:15 accept4(196, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 541 <0.000063>
[pid  5571] 16:17:15 setsockopt(541, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000037>
[pid  5571] 16:17:15 getpeername(541, {sa_family=AF_INET, sin_port=htons(64435), sin_addr=inet_addr("52.39.142.198")}, [16]) = 0 <0.000031>
[pid  5571] 16:17:15 getsockname(541, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("172.31.46.243")}, [16]) = 0 <0.000031>
[pid  5571] 16:17:15 accept4(196, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) <0.000039>
strace: Process 5661 attached
[pid  5661] 16:17:15 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 542 <0.000057>
[pid  5661] 16:17:15 connect(542, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, 16) = 0 <0.000053>
[pid  5661] 16:17:15 sendmmsg(542, {{{msg_name(0)=NULL, msg_iov(1)=[{"M\24\1\0\0\1\0\0\0\0\0\0\4apps\tidentrust\3com\0"..., 36}], msg_controllen=0, msg_flags=0}, 36}, {{msg_name(0)=NULL, msg_iov(1)=[{"\300\36\1\0\0\1\0\0\0\0\0\0\4apps\tidentrust\3com\0"..., 36}], msg_controllen=0, msg_flags=0}, 36}}, 2, MSG_NOSIGNAL) = 2 <0.000042>
[pid  5661] 16:17:15 recvfrom(542, "M\24\201\200\0\1\0\2\0\0\0\0\4apps\tidentrust\3com\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, [16]) = 83 <0.000058>
[pid  5661] 16:17:15 recvfrom(542, "\300\36\201\200\0\1\0\1\0\1\0\0\4apps\tidentrust\3com\0"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.31.0.2")}, [16]) = 115 <0.000042>
[pid  5661] 16:17:15 +++ exited with 0 +++
[pid  5622] 16:17:15 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 542 <0.000040>
[pid  5622] 16:17:15 connect(542, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.35.177.64")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000047>

Also tried to install the CA certificates (Let's Encrypt Authority X3 and DST Root CA X3) in the OS using both .NET API and update-ca-certificates. Doesn't help.
(As described here: https://github.com/dotnet/corefx/issues/16879)

Opening port 80 (outbound) solves the issue and the handshake completes successfully.
Is there a way to do a TLS handshake on Linux without allowing outbound connections on http port 80?

@ayende
Contributor

ayende commented Dec 26, 2017

What is more, we obviously don't want any incoming TCP connection to result in a remote call.
The same behavior is not present on Windows.
This is a Let's Encrypt cert on the server.

@Clockwork-Muse
Contributor

You might try specifying the SSL_CERT_DIR/SSL_CERT_FILE environment variables, although presumably that's going to behave the same.

Are you doing client certificate authentication? Might it be trying to check certificate revocation?

@mnordhoff

@Clockwork-Muse Maybe. The IP is used for both the CA Issuers and CRL URIs in the intermediate certificate. So it could be trying to download the root, or check if the intermediate is revoked. Or both. https://crt.sh/?id=15706126

@Clockwork-Muse
Contributor

@mnordhoff - ...is the server going to be checking the revocation status of its own certificate?

Further question for OP - does this happen with each connection, or only on the initial one?

@ayende
Contributor

ayende commented Dec 26, 2017

@Clockwork-Muse We want to use client cert, yes. But as you can see from openssl s_client -connect <my-public-dns-name>:443 -msg even without the client cert, indeed, before the ServerHello is sent we are already hanging.

It happens on any connection, only for that particular Let's Encrypt cert.
We tested this on a self signed cert, and it didn't occur.

@ayende
Contributor

ayende commented Dec 27, 2017

I've used strace -k to try to dig deeper, the call is actually invoked via:

[pid  1246] connect(210, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.35.177.64")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000050>
 > /lib/x86_64-linux-gnu/libpthread-2.23.so(__connect_nocancel+0x24) [0x107cd]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_easy_send+0x4664) [0x32644]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_easy_send+0x515c) [0x3313c]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_formget+0x13b5b) [0x21f9b]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_share_strerror+0x24c) [0x3927c]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_multi_remove_handle+0xb38) [0x36468]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_multi_perform+0x10d) [0x36f0d]
 > /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0(curl_easy_perform+0x13b) [0x2d9bb]
 > unexpected_backtracing_error [0x7fbb484cac66]

@Clockwork-Muse
Contributor

...okay, can we get a minimal repro? In particular, how are you loading the server certificate; if it's via a .pfx, what happens if you add the entire chain to the file? If it's via a store and X509CertificateCollection.Find(...), what happens if you supply validOnly: false?

@ayende
Contributor

ayende commented Dec 27, 2017

@Clockwork-Muse Okay, I managed to get a proper reproduction of this, see the code below.

In order to reproduce, you'll need to have a Let's Encrypt certificate.

Run this command on an Ubuntu box as:

dotnet run /path/to/cert

Then use another shell window to run:

openssl s_client -connect 127.0.0.1:5000

using System;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var cert = new X509Certificate2(args[0]);
            var tcpListener = new TcpListener(IPAddress.Loopback, 5000);
            tcpListener.Start();
            Console.WriteLine("Running...");
            while (true)
            {
                using (var client = tcpListener.AcceptTcpClient())
                {
                    try
                    {
                        using (var sslStream = new SslStream(client.GetStream()))
                        {
                            Console.WriteLine("Connected, starting handshake...");
                            sslStream.AuthenticateAsServer(
                                cert,
                                clientCertificateRequired: false,
                                enabledSslProtocols: SslProtocols.Tls12,
                                checkCertificateRevocation: false);

                            Console.WriteLine("Done with handshake");
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.Message);
                    }
                }
            }
        }
    }
}

The output of running this with strace:

$ :~/tmp$ strace -e trace=network dotnet /home/ubuntu/tmp/bin/Debug/netcoreapp2.0/tmp.dll cluster.server.pfx
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 27
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 28
socket(PF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 28
bind(27, {sa_family=AF_INET, sin_port=htons(5000), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
listen(27, 2147483647)                  = 0
Running...
accept4(27, {sa_family=AF_INET, sin_port=htons(33268), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_CLOEXEC) = 30
Connected, starting handshake...
recvmsg(30, {msg_name(0)=NULL, msg_iov(1)=[{"\26\3\1\1,", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
recvmsg(30, {msg_name(0)=NULL, msg_iov(1)=[{"\1\0\1(\3\3\352\321T!{\t\313\356P\220\302G\250\214\\V~T\341N\3CR^\276<"..., 300}], msg_controllen=0, msg_flags=0}, 0) = 300
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 42
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 42
connect(42, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.35.177.64")}, 16) = -1 EINPROGRESS (Operation now in progress)

Note that to reproduce the hang you'll need to mark outgoing port 80 as drop.
(iptables -A OUTPUT -p tcp -d 192.35.177.64 --dport 80 -j DROP might do it; I just blanket-block all outgoing port 80 traffic)

[EDIT] Add C# syntax highlighting by @karelz

@ayende
Contributor

ayende commented Dec 27, 2017

By the way, the output of the console app is connected (immediately) and then waiting for about two minutes for the handshake to complete.

@karelz
Member

karelz commented Dec 28, 2017

@Priya91 can you please take a look? Do we have enough information to analyze / reproduce locally?
Is it something we can influence, or is it libcurl behavior?

@ayende
Contributor

ayende commented Dec 28, 2017

Was able to dig a bit deeper, here is the stack trace that it is holding:

 	System.Security.Cryptography.X509Certificates.dll!Internal.Cryptography.Pal.CertificateAssetDownloader.DownloadAsset(string uri, ref System.TimeSpan remainingDownloadTime)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!Internal.Cryptography.Pal.CertificateAssetDownloader.DownloadCertificate(string uri, ref System.TimeSpan remainingDownloadTime)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!Internal.Cryptography.Pal.OpenSslX509ChainProcessor.DownloadCertificate(byte[] authorityInformationAccess, ref System.TimeSpan remainingDownloadTime)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!Internal.Cryptography.Pal.OpenSslX509ChainProcessor.FindIssuer(System.Security.Cryptography.X509Certificates.X509Certificate2 cert, System.Security.Cryptography.X509Certificates.X509Certificate2Collection[] stores, System.Collections.Generic.HashSet<System.Security.Cryptography.X509Certificates.X509Certificate2> downloadedCerts, ref System.TimeSpan remainingDownloadTime)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!Internal.Cryptography.Pal.OpenSslX509ChainProcessor.FindCandidates(System.Security.Cryptography.X509Certificates.X509Certificate2 leaf, System.Security.Cryptography.X509Certificates.X509Certificate2Collection extraStore, System.Collections.Generic.HashSet<System.Security.Cryptography.X509Certificates.X509Certificate2> downloaded, System.Collections.Generic.HashSet<System.Security.Cryptography.X509Certificates.X509Certificate2> systemTrusted, ref System.TimeSpan remainingDownloadTime)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!Internal.Cryptography.Pal.ChainPal.BuildChain(bool useMachineContext, Internal.Cryptography.ICertificatePal cert, System.Security.Cryptography.X509Certificates.X509Certificate2Collection extraStore, System.Security.Cryptography.OidCollection applicationPolicy, System.Security.Cryptography.OidCollection certificatePolicy, System.Security.Cryptography.X509Certificates.X509RevocationMode revocationMode, System.Security.Cryptography.X509Certificates.X509RevocationFlag revocationFlag, System.DateTime verificationTime, System.TimeSpan timeout)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!System.Security.Cryptography.X509Certificates.X509Chain.Build(System.Security.Cryptography.X509Certificates.X509Certificate2 certificate, bool throwOnException)	C#	No symbols loaded.
 	System.Security.Cryptography.X509Certificates.dll!System.Security.Cryptography.X509Certificates.X509Chain.Build(System.Security.Cryptography.X509Certificates.X509Certificate2 certificate)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Http.TLSCertificateExtensions.BuildNewChain(System.Security.Cryptography.X509Certificates.X509Certificate2 certificate, bool includeClientApplicationPolicy)	C#	No symbols loaded.
 	System.Net.Security.dll!Interop.OpenSsl.AllocateSslContext(System.Security.Authentication.SslProtocols protocols, Microsoft.Win32.SafeHandles.SafeX509Handle certHandle, System.Security.Cryptography.SafeEvpPKeyHandle certKeyHandle, System.Net.Security.EncryptionPolicy policy, bool isServer, bool remoteCertRequired)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SafeDeleteSslContext.SafeDeleteSslContext(System.Net.Security.SafeFreeSslCredentials credential, bool isServer, bool remoteCertRequired)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslStreamPal.HandshakeInternal(System.Net.Security.SafeFreeCredentials credential, ref System.Net.Security.SafeDeleteContext context, System.Net.Security.SecurityBuffer inputBuffer, System.Net.Security.SecurityBuffer outputBuffer, bool isServer, bool remoteCertRequired)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SecureChannel.GenerateToken(byte[] input, int offset, int count, ref byte[] output)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SecureChannel.NextMessage(byte[] incoming, int offset, int count)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslState.StartSendBlob(byte[] incoming, int count, System.Net.AsyncProtocolRequest asyncRequest)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslState.ProcessReceivedBlob(byte[] buffer, int count, System.Net.AsyncProtocolRequest asyncRequest)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslState.StartReadFrame(byte[] buffer, int readBytes, System.Net.AsyncProtocolRequest asyncRequest)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslState.StartReceiveBlob(byte[] buffer, System.Net.AsyncProtocolRequest asyncRequest)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslState.ForceAuthentication(bool receiveFirst, byte[] buffer, System.Net.AsyncProtocolRequest asyncRequest)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslState.ProcessAuthentication(System.Net.LazyAsyncResult lazyResult)	C#	No symbols loaded.
 	System.Net.Security.dll!System.Net.Security.SslStream.AuthenticateAsServer(System.Security.Cryptography.X509Certificates.X509Certificate serverCertificate, bool clientCertificateRequired, System.Security.Authentication.SslProtocols enabledSslProtocols, bool checkCertificateRevocation)	C#	No symbols loaded.
>	tmp.dll!ConsoleApp1.Program.Main(string[] args) Line 27	C#	Symbols loaded.

@Clockwork-Muse
Contributor

So (without digging into the actual code) it looks like it's building the server certificate chain for whatever reason. Shouldn't a normal certificate have whatever is necessary to say "I'm me, these guys say so"?

Hm, try setting those environment variables, maybe openssl isn't looking at wherever you put them into the trust store.
And possibly try loading the certificate (server and the chain) into a X509Certificate2Collection via Import, then pull from there.

....as a side question, does this happen if you just open a stream without it being a webserver? Like wrapping SslStream around a MemoryStream and putting things on the pipe?

@ayende
Contributor

ayende commented Dec 28, 2017

Even more minimal reproduction, use with the attached certificate.

            var cert = new X509Certificate2("test.pfx");
            var chain = new X509Chain();
            chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
            chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
            chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
            Console.WriteLine(chain.Build(cert));

The root cause here is the following:

   X509v3 extensions:
            X509v3 Authority Key Identifier: 
                keyid:2C:42:A9:B8:45:2C:11:BB:57:25:40:FC:9D:8E:46:18:6D:FD:54:27

            Authority Information Access: 
                CA Issuers - URI:http://example.com/go-away

cert.zip

In particular, this seems to be related to this code:
https://github.com/dotnet/corefx/blob/0101324f1ac8555c910bbbfb46c511bb86f0fbe4/src/System.Security.Cryptography.X509Certificates/src/Internal/Cryptography/Pal.Unix/OpenSslX509ChainProcessor.cs#L536-L561

In other words, if there is an authority information access with a URL, the act of building a X509Chain would force it to be hit, and it can take multiple minutes to resolve if this is not accessible.

The code seems to hint that there is some timeout there, controlled by UrlRetrievalTimeout, but I can't see where this is actually being set, in particular, this should be set to zero:

https://github.com/dotnet/corefx/blob/0101324f1ac8555c910bbbfb46c511bb86f0fbe4/src/System.Security.Cryptography.X509Certificates/src/System/Security/Cryptography/X509Certificates/X509Chain.cs#L127

But even calling chain.ChainPolicy.UrlRetrievalTimeout = TimeSpan.Zero; is bad for us, because:

https://github.com/dotnet/corefx/blob/0101324f1ac8555c910bbbfb46c511bb86f0fbe4/src/System.Security.Cryptography.X509Certificates/src/Internal/Cryptography/Pal.Unix/ChainPal.cs#L37

I was able to get it working fast by setting the timeout to a negative value, which is good.
However, there is no way to inject this behavior here:

https://github.com/dotnet/corefx/blob/0101324f1ac8555c910bbbfb46c511bb86f0fbe4/src/Common/src/System/Net/Http/TlsCertificateExtensions.cs#L78

That is where this actually happens: whenever you use an SslStream, this code gets called and effectively hangs us.

Note that even with port 80 being open, this includes the cost of a remote call on every SSL connection being made when using a server.
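
A quick way to tell in advance whether a given certificate can trigger this lookup is to test for the Authority Information Access extension (OID 1.3.6.1.5.5.7.1.1). The following is only a minimal sketch: the `AiaCheck` class and its method name are hypothetical, and it detects only that the extension is present, not which URLs it lists or whether the chain builder will actually need them.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, not part of corefx.
static class AiaCheck
{
    // OID of the Authority Information Access extension (RFC 5280).
    private const string AuthorityInfoAccessOid = "1.3.6.1.5.5.7.1.1";

    public static bool HasAuthorityInformationAccess(X509Certificate2 cert)
    {
        if (cert == null) throw new ArgumentNullException(nameof(cert));

        // The indexer returns null when no extension with that OID exists.
        return cert.Extensions[AuthorityInfoAccessOid] != null;
    }
}
```

For the repro above this could be called as `AiaCheck.HasAuthorityInformationAccess(new X509Certificate2("test.pfx"))`. A `true` result means only that an AIA record exists; whether a download happens still depends on whether the issuer can be found locally.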

[EDIT] Add C# syntax highlighting by @karelz

@ayende
Contributor

ayende commented Dec 28, 2017

@Clockwork-Muse This looks like it impacts only AuthenticateAsServer

@ayende
Contributor

ayende commented Dec 28, 2017

Regarding the severity of this issue.
Assuming I'm correct in my thinking, anything that uses Authority Information Access (Let's Encrypt does at least, and I have seen it in many other certs as well) is going to be making a remote call to the cert issuer (or to several of them, actually) with no timeout.
This is if there is nothing blocking the call, of course.

What is worse, it is downloading a certificate that it already has in the same cert file.

It also means that any SSL based server on CoreCLR is effectively hostage to the infrastructure of the issuers (at all levels) for their certs.

@ayende
Contributor

ayende commented Dec 28, 2017

It looks like a workaround might be to put the issuer of the certificate in the stores, but I can't get it to work.
https://github.com/dotnet/corefx/blob/0101324f1ac8555c910bbbfb46c511bb86f0fbe4/src/System.Security.Cryptography.X509Certificates/src/Internal/Cryptography/Pal.Unix/OpenSslX509ChainProcessor.cs#L530

This code also hangs:

static void Main(string[] args)
{
    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser))
    {
        userIntermediateStore.Open(OpenFlags.ReadWrite);

        var col = new X509Certificate2Collection();
        col.Import(args[0]);

        foreach (var item in col)
        {
            userIntermediateStore.Add(item);
        }
    }

    var cert = new X509Certificate2(args[0]);

    Console.WriteLine("Started");

    var chain = new X509Chain();
    chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
    chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
    Console.WriteLine(chain.Build(cert));

}

[EDIT] Add C# syntax highlighting by @karelz

@ayende
Contributor

ayende commented Dec 29, 2017

Another impact here, this will do the call in a synchronous fashion, even if you called AuthenticateAsServerAsync

@Priya91
Contributor

Priya91 commented Dec 29, 2017

Based on @ayende's condensed repro, looks like the hang is happening in building the cert chain. Routing to System.Security.

@Priya91 changed the title from "TLS handshake uses http port 80" to "Hang in x509certificate build chain when using a Let's Encrypt cert." on Dec 29, 2017

@karelz
Member

karelz commented Dec 29, 2017

@bartonjs can you please comment?

@bartonjs
Member

Just like Windows, if the next step in the chain cannot be completed it will try to download the next element from the AIA record before failing.

The solution would be to download the chain via some other process and put the intermediates into the CurrentUser\CA (intermediate) store.

If the chain successfully builds and the root was trusted then any intermediates that were downloaded along the way get cached into the CurrentUser\CA store so that future chain builds don't touch the network.

The thing to note here is it's not downloading the presented certificate, but looking for the issuing CA certificate.
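
To see whether the intermediates have in fact been cached, the CurrentUser\CA store can be listed directly. This is a minimal sketch, assuming the process runs as the same user as the server; the `IntermediateStore` class and `ListSubjects` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, not part of corefx.
static class IntermediateStore
{
    // Returns one "subject (issued by issuer)" line per certificate
    // currently in the CurrentUser\CA (intermediate) store.
    public static List<string> ListSubjects()
    {
        var subjects = new List<string>();
        using (var store = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser))
        {
            store.Open(OpenFlags.ReadOnly);
            foreach (X509Certificate2 cert in store.Certificates)
            {
                subjects.Add($"{cert.Subject} (issued by {cert.Issuer})");
                cert.Dispose();
            }
        }
        return subjects;
    }
}
```

If the issuer named in the server certificate does not show up in this list (and is not in the system trust store either), the chain builder has nothing local to work with and, per the description above, falls back to the AIA download.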

@mnordhoff

It seems to be downloading the root, not the intermediate.

For example:

The IP address in the original post matches http://apps.identrust.com/, which is where the intermediate says you can download the root. Not where the end-entity certificates say you can download the intermediate.

(IdenTrust also uses the same IP for their CRL, so it could perhaps be downloading that, too.)

@ayende
Contributor

ayende commented Dec 30, 2017

@bartonjs That doesn't seem right, see the code sample here:
https://github.com/dotnet/corefx/issues/26061#issuecomment-354364996

I'm explicitly registering everything in there in the current user store, but that doesn't help.
We have also tried to register everything manually, without success.

@Clockwork-Muse
Contributor

@ayende -
You're wrong, the code sample works correctly (doesn't hang/dial out) - if you gave it a pfx with a chain. The one you included has only the leaf certificate, so of course it goes looking for the chain. Your repro is "broken" because it's not giving the system enough information.

I don't know what's going wrong with the manual registration, but it's possible it's the wrong "current user"? On windows the system would also check LocalMachine, not so sure what happens on Linux.

Oh, you can also set the SSL_CERT_FILE/SSL_CERT_DIR environment variables, and it works too.
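
A rough way to check whether a .pfx carries more than the leaf is to look for another certificate in the file whose subject matches the leaf's issuer. This is a minimal sketch with hypothetical names (`PfxChainCheck`, `ContainsIssuerOf`); it compares names only and does not verify signatures or trust, so a match does not prove the chain will build.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper, not part of corefx.
static class PfxChainCheck
{
    // True when some other certificate in the collection has a subject
    // equal to the leaf's issuer (byte-for-byte name comparison only).
    public static bool ContainsIssuerOf(X509Certificate2Collection certs, X509Certificate2 leaf)
    {
        if (certs == null) throw new ArgumentNullException(nameof(certs));
        if (leaf == null) throw new ArgumentNullException(nameof(leaf));

        return certs
            .Cast<X509Certificate2>()
            .Any(c => c.Thumbprint != leaf.Thumbprint &&
                      c.SubjectName.RawData.SequenceEqual(leaf.IssuerName.RawData));
    }
}
```

With the repro above, the collection would come from `col.Import(args[0])` and the leaf from `new X509Certificate2(args[0])`. A `false` result suggests the file holds only the leaf.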

@ayende
Contributor

ayende commented Dec 31, 2017

@Clockwork-Muse Okay, when registering the full chain with this code, first running it with port 80 open and then port 80 closed, the problem is resolved.

static void Main(string[] args)
{
    var cert = new X509Certificate2(args[0]);

    Console.WriteLine("Started");

    var chain = new X509Chain
    {
        ChainPolicy =
        {
            VerificationFlags = X509VerificationFlags.AllFlags,
            RevocationFlag = X509RevocationFlag.ExcludeRoot,
            RevocationMode = X509RevocationMode.NoCheck
        }
    };
    Console.WriteLine(chain.Build(cert));

    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser))
    {
        userIntermediateStore.Open(OpenFlags.ReadWrite);

        foreach (var element in chain.ChainElements)
        {
            if(element.Certificate.Thumbprint == cert.Thumbprint)
                continue;
            var found = userIntermediateStore.Certificates
                .Find(X509FindType.FindBySerialNumber, element.Certificate.SerialNumber, false);
            if(found.Count != 0)
                continue;

            userIntermediateStore.Add(element.Certificate);
        }
    }
}

However, given that this has such a severe impact, and that this is something that is not really needed (everything still works, just very slowly), is there a way to turn this off?
I'm concerned that the certificate may change in the future, leading to the same trouble.

@Clockwork-Muse
Contributor

Except you shouldn't need to do that at all.

If you add the chain to the pfx file (so it contains the server cert and the entire chain) and load via X509Certificate2Collection.Import and grab the cert from there, it doesn't dial out.
If you generate a chain cert file and set the OpenSSL environment variables I mentioned, it doesn't dial out.

You don't have to add the certificates to the store at all. Even if you did, that should be a single-time thing, at application startup, and you don't need to open port 80 while doing so.

@ayende
Contributor

ayende commented Dec 31, 2017

@Clockwork-Muse The certificate is actually generated by Let's Encrypt, so I don't control that.
I guess that I can do this at certificate generation time, right?

var c = new X509Certificate2Collection();
c.Import(@"server.pfx", (string)null, X509KeyStorageFlags.Exportable);
var chain = new X509Chain();
chain.Build(c[0]);

foreach (var item in chain.ChainElements)
{
    if (item.Certificate.Thumbprint == c[0].Thumbprint)
        continue;

    c.Add(item.Certificate);
}

File.WriteAllBytes("server2.pfx", c.Export(X509ContentType.Pfx, (string)null));

@Clockwork-Muse
Contributor

....that might work, although the more usual thing is probably to use OpenSSL to include the chain. For that matter, you could probably have Let's Encrypt include the chain when it signs your cert.
Which brings up another question - I'm assuming that you are generating the private key (which you should be), which means you are generating the .pfx, which means it was you who wasn't including the entire chain.
If you don't want to modify the file, though, set the environment variable with the location of the chain file, and you're good.

@ayende
Contributor

ayende commented Jan 1, 2018

I actually want to have the full chain there, yes.
And yes, I'm generating the whole thing.

I'm generating the CSR for Let's Encrypt using this code:

https://github.com/ravendb/ravendb/blob/v4.0/src/Raven.Server/Commercial/SetupManager.cs#L383-L392

Using:
https://github.com/fszlin/certes/blob/master/src/Certes/Pkcs/CertificationRequestBuilderBase.cs#L146

I tried searching for this, but I don't see how I can get it to generate the full thing.

For that matter, looking at how this is done, it looks right:

https://github.com/fszlin/certes/blob/master/src/Certes/Acme/AcmeCertificate.cs#L46-L54

Looking into this further, but I would appreciate any pointers.

@Clockwork-Muse
Contributor

...hmm, from a cursory look it seems to be including the chain by default.

Oh, it turns out I was slightly wrong - the certs do have to make it into the store to be used to build the chain, not just imported into a collection that you pull from (I must have been leaving them in the store on accident between tests). Still, the initial code you posted should have worked just fine (I cleaned it up a bit during testing):

static void Main(string[] args)
{
    var col = new X509Certificate2Collection();
    col.Import(args[0]);
    foreach (var item in col)
    {
        Console.WriteLine($"{item.Subject} - {item.Issuer}");
    }

    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser, OpenFlags.ReadWrite))
    {        
        if (userIntermediateStore.Certificates.Count == 0)
        {
            Console.WriteLine("No certificates in store");
        }
        userIntermediateStore.AddRange(col);
    }

    // I'm not sure what order the certs are in, so look the cert up by subject name
    // (although new X509Certificate2(someFileWithPrivateKey) seems to grab the one with a private key).
    var cert = col.Find(X509FindType.FindBySubjectName, "--your subject name here--", false)[0];

    Console.WriteLine("Started");

    var chain = new X509Chain();
    chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
    chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
    Console.WriteLine(chain.Build(cert));

    // Test cleanup
    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser, OpenFlags.ReadWrite))
    {
        foreach (var item in userIntermediateStore.Certificates)
        {
            userIntermediateStore.Remove(item);
        }
    }
}

... try running that over the real certificate (not the test one you posted, that only has one), and look at how many certificates it lists: there should be at least 2, probably 3, and maybe 4+, if you have a full chain in the .pfx

@ayende
Contributor

ayende commented Jan 1, 2018

@Clockwork-Muse Okay, I might be missing something, but I just created this certificate:
server3.zip

You can see that it does include the root certificate and the entire chain.

image

This is still going to make a remote call when we try to build the certificate chain.

I'm now pretty sure that the workaround of manually inserting the entire chain to the StoreName.CertificateAuthority will prevent it, but I would really like to avoid that.

Adding & cleaning this isn't going to work (concurrency, to start), and while we expect this to run mostly on servers, I don't want "you installed a CA on my machine" kind of emails to hit my mailbox (even if in this case, the cert is well known and should have been trusted).

As I can see, all the information to figure out the certificate is in the pfx, so it shouldn't be doing this, right?

@Clockwork-Muse
Contributor

...you don't have to remove the certs. In fact, I'd expect you to not do so (except during an uninstall), I was only doing it because test code should clean up after itself.

I don't want to be one of those "you installed a CA on my machine" kind of emails to hit my mailbox (even if in this case, the cert is well known and should have been trusted.

That would be extremely strange. The only people who would be complaining would be you, currently, right? If you're distributing this as server software that you're expecting customers to install/deploy, they should be providing their own certificate. If you're currently running a public webserver, that's an assumption you're making - that your clients trust the CA that issued your server certificate.
And currently the certificates are being added to the CurrentUser store. Meaning:

  1. If the application is running under its own user (like it should be), you're not going to interfere with another process anyway. This assumes you're running on some shared server with multiple hosted applications, whereas the modern world is heading towards Docker/containers...
  2. Adding a new CA to the server is pointless. If people are worried about MITM, they have to prevent new CAs being added to the client. If your server has an identity from a CA that isn't trusted by the client by default, it doesn't matter whether or not the server has the CA root certificate.
  3. If people are worried about your code doing nefarious things (like downloading more instructions later), it's too late: they've already run potentially compromised code. And you don't need a CA issued cert in that case, either: you'd go with some self-signed certificate (because otherwise the CA could revoke the certificate).

.... so I don't really understand what it is you're worried about on that front.

If you don't want to add the certificates to the store, and you're on Linux, set the environment variables in your launch script.

As I can see, all the information to figure out the certificate is in the pfx, so it shouldn't be doing this, right?

Maybe? I don't know, perhaps @bartonjs could chime in about this (whether a pfx could satisfy its own chain, although the fact you can have multiple unrelated certs/chains would probably throw a wrench in that), but to my mind, this would really be expected behavior. It's like you were issued a drivers license or other id - the server is going: "okay, and I trusted that guy over there to issue this to me? Yes, I know what the chain was, I'm asking if I trust them.". That is, there's an inherent trust relationship expected if somebody is saying "you're you"; they have to be somebody you trust to make that assertion (so are expected to be in the store). Personally, though, I'd just add the certificates to the store and be done with it.

@bartonjs
Member

bartonjs commented Jan 2, 2018

As I can see, all the information to figure out the certificate is in the pfx, so it shouldn't be doing this, right?

Once the PFX has been read, nothing remembers that any particular cert therein had affinity to other files in that same container. Since the PFX itself is defined in terms of authsafes I'll make a safe-based analogy:

The end-entity certificate is like a laser-inscribed diamond. You can look at the diamond and see that it has an issuance number. So now the jeweller wants to see that it matches the description. And you say "well, I have the certificate of issuance". "Great, where is it?" "Well, it was stored in the safe on the next shelf over from this diamond." The last sentence doesn't really help the jeweller, and is a bit out of place :). Weak analogy, yes, but that's the best I could do off the top of my head.

Though I'm having a bit of clear-headedness reassert itself now that vacation is (almost) over:

I ran strace on the server app - during the handshake it tries to connect to 192.35.177.64 on port 80!
This IP address belongs to a certificate authority (IdenTrust).
The operation gets stuck (EINPROGRESS) because port 80 is not allowed for outbound connections on my server.

In addition to certificates for a chain resolution, any CRLs or OCSP responses are usually done over HTTP (80), NOT HTTPS (443). That's because you could end up in a cycle of initiating TLS to verify TLS (including "I need to download this file from you before I know I can trust you to download the file" when the CRL/OCSP endpoint is protected by the same CA). Since the certs, CRLs, and OCSP responses are not sensitive (no encryption required) and provide their own integrity (via their signatures) HTTP is not a security problem. Blocking port 80 outbound, though, is a security problem since you're effectively saying you'll never check a CRL (unless you have an alternate means of loading CRLs).
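Since nothing remembers which certificates travelled together in the PFX, one option is to hand them to the chain builder explicitly. The following is a minimal sketch, assuming the caller builds its own `X509Chain`; it uses `X509ChainPolicy.ExtraStore`, which is not mentioned in this thread, and it has not been verified here that this avoids the outbound request on the runtime versions discussed. The class name, method names, and subject names are hypothetical, and the throwaway root and leaf stand in for the certificates a real PFX would contain.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public static class ExtraStoreSketch
{
    // Creates a throwaway root CA and a leaf signed by it (illustration only).
    public static (X509Certificate2 Root, X509Certificate2 Leaf) CreateDemoChain()
    {
        DateTimeOffset now = DateTimeOffset.UtcNow;

        using (RSA rootKey = RSA.Create(2048))
        using (RSA leafKey = RSA.Create(2048))
        {
            var rootReq = new CertificateRequest(
                "CN=Demo Root", rootKey, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
            // Mark the root as a CA so it may issue other certificates.
            rootReq.CertificateExtensions.Add(
                new X509BasicConstraintsExtension(true, false, 0, true));
            X509Certificate2 root = rootReq.CreateSelfSigned(now.AddDays(-2), now.AddDays(365));

            var leafReq = new CertificateRequest(
                "CN=demo.example", leafKey, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
            byte[] serial = { 1, 2, 3, 4 };
            X509Certificate2 leaf = leafReq.Create(root, now.AddDays(-1), now.AddDays(30), serial);

            return (root, leaf);
        }
    }

    // Builds a chain for 'leaf', offering 'extra' as candidate issuers.
    // ExtraStore only supplies candidates; it does not make them trusted.
    public static X509Chain BuildWithExtraStore(X509Certificate2 leaf, X509Certificate2Collection extra)
    {
        var chain = new X509Chain();
        chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
        chain.ChainPolicy.ExtraStore.AddRange(extra);
        chain.Build(leaf); // result ignored here; callers should inspect ChainStatus
        return chain;
    }

    public static void Main()
    {
        var (root, leaf) = CreateDemoChain();
        using (X509Chain chain = BuildWithExtraStore(leaf, new X509Certificate2Collection(root)))
        {
            foreach (X509ChainElement element in chain.ChainElements)
                Console.WriteLine(element.Certificate.Subject);
            foreach (X509ChainStatus status in chain.ChainStatus)
                Console.WriteLine(status.Status);
        }
    }
}
```

With a real PFX, the collection passed as `extra` would be the one filled by `X509Certificate2Collection.Import`. The demo root is not in any trust store, so the chain is expected to report an untrusted root; the point is only that the issuer is found without being installed anywhere.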

@mnordhoff

Is it really necessary to download the CRL to check if the intermediate certificate has been revoked? If an intermediate with tens of millions of previously valid certificates is being suddenly revoked, it would be headline news.

@ayende
Contributor

ayende commented Jan 2, 2018

A common scenario for us is that the user generates the certificate via Let's Encrypt, so we handle the whole thing. In fact, the most likely scenario for a developer setting up a local instance is just that, and then we may need to walk the chain and add it.

@bartonjs Makes perfect sense with regards to not getting into recursive HTTPS. And thanks for the note on the blocking of port 80 with CRL, that is very useful.

What I'm concerned about is that there is no caching and this is exposed to users.

@ayende
Contributor

ayende commented Jan 2, 2018

I've tracked this further and I think that this is quite serious. The same code path is invoked when using SslStream as a server and as a client.
That means that if you have a server that is using client certificate authentication, a malicious user can cause the server to make a request on any SSL handshake.
That request is made synchronously and holds up the thread, and the user can set up the client certificate they send so that the chain will look up any arbitrary URL they choose.

Because they control the URL, they can choose to make the server in question respond slowly enough to hold up the thread for a significant amount of time. Rinse and repeat a few times and you have all the server threads busy waiting for the CA Issuers URL to come back.

This looks like a pretty simple remote denial of service attack on any SSL endpoint that uses client certificates. I tested this (on Windows) and it exhibits the same behavior on both CoreCLR and full .NET.

Attached is a simple program that reproduces this behavior, showing how AuthenticateAsServer will make a remote call (controlled by the client) as part of the handshake.

Repro.zip

What worries me is that this is done before any authentication is made, so it doesn't even give the server the chance to decide whether they really want to do it or not.

@ayende
Contributor

ayende commented Jan 2, 2018

A mitigation here is that the X509Chain already has a timeout property that will avoid this, but this is not used in this particular code path and there is no way to set it from the outside.
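For chains that application code builds itself, the timeout lives on the chain's policy as `X509ChainPolicy.UrlRetrievalTimeout`. A minimal sketch follows, assuming a 5-second budget and a throwaway self-signed certificate; the class and method names are hypothetical. As noted above, this does not reach the chain that `SslStream` builds internally during the handshake.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public static class BoundedChainBuild
{
    // Returns a chain whose network retrievals (AIA, CRL, OCSP) are bounded by 'timeout'.
    public static X509Chain CreateBoundedChain(TimeSpan timeout)
    {
        var chain = new X509Chain();
        chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
        chain.ChainPolicy.UrlRetrievalTimeout = timeout;
        return chain;
    }

    public static void Main()
    {
        DateTimeOffset now = DateTimeOffset.UtcNow;

        using (RSA key = RSA.Create(2048))
        {
            // Throwaway self-signed certificate, used only to have something to validate.
            var request = new CertificateRequest(
                "CN=demo.example", key, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

            using (X509Certificate2 cert = request.CreateSelfSigned(now.AddDays(-1), now.AddDays(30)))
            using (X509Chain chain = CreateBoundedChain(TimeSpan.FromSeconds(5)))
            {
                bool trusted = chain.Build(cert);
                Console.WriteLine($"Trusted: {trusted}");
                foreach (X509ChainStatus status in chain.ChainStatus)
                    Console.WriteLine(status.Status);
            }
        }
    }
}
```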

@Clockwork-Muse
Contributor

That means that if you have a server that is using client certificate authentication, a malicious user can cause the server to make a request on any SSL handshake.

false
With client certificates, you can completely disable revocation checking, or make it do offline checking. You could also instead implement custom chain validation that does things like make sure the chain is trusted before trying the CRL (the current method is going to get all the information, I would assume). Since you wouldn't accept an unknown CA for client certificate validation any more than a client would accept an unknown CA for server validation.

Because they control the URL, they can choose to make the server in question respond slowly enough to hold up the thread for a significant amount of time. Rinse and repeat a few times and you have all the server threads busy waiting for the CA Issuers URL to come back.

.... why bother having something else be slow? If you want to delay the server you're talking to, just make your conversation slow, don't refer them to somebody else that's slow. (Referring them to somebody else to tie up that person can be done in simpler and easier ways, too, so that's mostly out) And having the method call be async wouldn't likely help if your connection was responding slowly either - in fact it's possible it would make the problem worse.

@ayende
Contributor

ayende commented Jan 2, 2018

@Clockwork-Muse Did I miss something with disabling revocation checking?
Here is the code from the repro I sent.

ssl.AuthenticateAsServer(
    serverCertificate: cert, 
    clientCertificateRequired: true, 
    enabledSslProtocols: System.Security.Authentication.SslProtocols.Tls12, 
    checkCertificateRevocation: false);

At least on the face of it, it looks like it is trying to disable certificate revocation, but it is still doing that.
Is there something else I also need to do to avoid this?

As for why this is worth bothering with? The answer is timeout, or to be rather more exact, lack thereof.
A network stream that is too slow gives me the option to abort, I have things like ReadTimeout and WriteTimeout that I can set, and I can be sure to always use async operations that won't block my thread.

This, however, will ignore any timeout specified and will block a thread. That means that even if my code is careful about such things, it is easy to cause fatal starvation.
Note that in my testing I was able to get it to take 30 seconds (on Windows, on Linux it takes about 2 minutes) and 2 retries before it gave up.

The code in question was:

using (var tcp = tcpListener.AcceptTcpClient())
using (var stream = tcp.GetStream())
{
    stream.ReadTimeout = 100;
    stream.WriteTimeout = 100;
    using (var ssl = new SslStream(stream, false, (sender, remoteCert, chain, errors) =>
    {
        return true;
    }))
    {
        Console.WriteLine("Server connected");
        var sp = Stopwatch.StartNew();

        ssl.WriteTimeout = 100; 
        ssl.ReadTimeout = 100;

        await ssl.AuthenticateAsServerAsync(
            serverCertificate: cert,
            clientCertificateRequired: true,
            enabledSslProtocols: System.Security.Authentication.SslProtocols.Tls12,
            checkCertificateRevocation: false);
        Console.WriteLine("Server authenticated in " + sp.ElapsedMilliseconds);
        ssl.WriteByte(1);

    }
}

Note that I don't expect this code to take so long, in fact, I'm trying hard to protect against it.

Looking at the code, on Windows it looks like the main issue is here:
https://github.com/dotnet/corefx/blob/4b1cf8d60e2ea1d1feac972dfb8e6884fcb45f2f/src/System.Net.Security/src/System/Net/Security/SecureChannel.cs#L1003-L1015

Note that this builds the chain (in VerifyCertificateProperties, where it hangs).
I also tested this:

var hangingTcpListener = new TcpListener(IPAddress.Loopback, 9999);
hangingTcpListener.Start();
WaitAndHangUp(hangingTcpListener);

var chain2 = new X509Chain
{
    ChainPolicy =
    {
        RevocationMode = X509RevocationMode.NoCheck
    }
};
chain2.Build(new X509Certificate2(@"client.pfx"));

So this is manually and explicitly asking to not do any checks, but that still hangs.

Am I missing something?

@Clockwork-Muse
Contributor

I think you might be: I can't repro. Note that we're not necessarily dealing with just revocation, but verifying the entire chain. Personally, I think what's happening is that your individual hosts are still validating their own certs, not the incoming ones.

Try these projects on for size (you'll have to update the paths for the included certs and location you put the project, but I don't know where you've got things. Don't forget the environment variables for the trust file in launch.json):
server.zip
client.zip

Of particular note is what happens when the client authenticates with only its immediate cert (ie - only sends one certificate and a reference to its issuer) - no call out is made. Or at least I'm not observing anything with wireshark when I do so.

@ayende
Contributor

ayende commented Jan 3, 2018

The sample code I sent was actually on Windows, where the problem also exists and reproduces.
I agree that the issue is not just with revocation.
I'm testing now on the Linux box.

@ayende
Contributor

ayende commented Jan 3, 2018

Also, I'm not sure if this is related, but in the certs, you have:

     Alternative Name:
          URL=www.poison.com

While the Let's Encrypt cert has:

     Alternative Name:
          URL=http://cert.stg-root-x1.letsencrypt.org/

I'm guessing that there might be an issue here with invalid URL that might contribute?

@ayende
Contributor

ayende commented Jan 3, 2018

On Linux, with outgoing port 80 blocked (and without SSL_CERT_FILE), hang.
With port 80 unblocked, no hang.

In both cases, the client is using cert: unknown-only.pfx

With the SSL_CERT_FILE set to the ca-chain.cert file, there is no hang.

However, what is the equivalent of doing this on Windows?

Also, this means that I've to set it up on my Linux box (easily done, of course).
I don't really care what I put in there, because internally we do explicit checks by certificate thumbprint.
Is there any recommendation regarding this?
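The explicit thumbprint check mentioned above can be sketched as a validation callback. This is only an illustration, assuming an in-memory allow-list; the class name `ThumbprintAllowList` is hypothetical and this is not the project's actual code. The callback runs after the runtime has already built the chain, so it decides acceptance but does not by itself prevent the outbound request discussed in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public sealed class ThumbprintAllowList
{
    private readonly HashSet<string> _allowed;

    public ThumbprintAllowList(IEnumerable<string> thumbprints)
    {
        // Thumbprints are hex strings; compare them case-insensitively.
        _allowed = new HashSet<string>(thumbprints, StringComparer.OrdinalIgnoreCase);
    }

    // Same shape as RemoteCertificateValidationCallback, so it can be passed to SslStream.
    public bool Validate(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors errors)
    {
        if (certificate == null)
            return false;

        // GetCertHashString() returns the SHA-1 hash of the certificate as hex (the thumbprint).
        return _allowed.Contains(certificate.GetCertHashString());
    }

    public static void Main()
    {
        DateTimeOffset now = DateTimeOffset.UtcNow;

        using (RSA key = RSA.Create(2048))
        {
            var request = new CertificateRequest(
                "CN=demo-client", key, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

            using (X509Certificate2 cert = request.CreateSelfSigned(now.AddDays(-1), now.AddDays(30)))
            {
                var allowList = new ThumbprintAllowList(new[] { cert.Thumbprint });
                bool accepted = allowList.Validate(null, cert, null, SslPolicyErrors.RemoteCertificateChainErrors);
                Console.WriteLine($"Accepted: {accepted}");
            }
        }
    }
}
```

An instance can be passed where the repro uses a lambda, for example `new SslStream(stream, false, allowList.Validate)`.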

@ayende
Contributor

ayende commented Jan 3, 2018

Okay, testing further on Linux shows that this isn't actually a general workaround.
It only works for that particular certificate.

When I tested the certificate with:

Authority Information Access:
                CA Issuers - URI:http://example.com/go-away

Using the SSL_CERT_FILE, it is still hanging.

@Clockwork-Muse
Contributor

Clockwork-Muse commented Jan 3, 2018

On Linux, with outgoing port 80 blocked (and without SSL_CERT_FILE), hang.
With port 80 unblocked, no hang.

In both cases, the client is using cert: unknown-only.pfx

With the SSL_CERT_FILE set to the ca-chain.cert file, there is no hang.

However, what is the equivalent of doing this on Windows?

Yeah, but it's not hanging due to the client, it's hanging due to the server checking its own chain (although I'll verify that by getting a better URI/reverify that that URL should be used anyways). That's because setting SSL_CERT_FILE is equivalent to putting the certificates into the Windows trust store (although for that process only - at least I'm assuming it's single process only). If you don't set the environment variable, or add to the store manually, the certificate tries to get the complete chain so it can tell clients where it came from.

Okay, testing further on Linux shows that this isn't actually a workaround.
Only for that particular certificate.

...only for that particular server certificate? Are you using the server certificate I included, or one of your own (my server cert also has that AIA entry)? If it's one of yours, did you update the chain in the SSL_CERT_FILE to include your root? It shouldn't be leaving the box at all.

@ayende
Contributor

ayende commented Jan 3, 2018

it's not hanging due to the client, it's hanging due to the server

No, that isn't the case. The server is using your certificate, not mine.
And against the same server, using your unknown-only.pfx completes quickly, so that isn't the server validating its own cert. It is the server validating the client cert.

only for that particular server certificate?

Only for that client cert, because with the same server cert and config, using another cert from the client triggered the same behavior.

There are two separate things going on there.

  1. The original issue with the server cert causing remote calls. This is resolved as far as I'm concerned, because I have the workaround of registering the issuer's certs in the user's intermediate store and I can arrange things that a server cert will always have the full chain.
    I still don't like it and would much rather have a way to limit the time it spends doing that, but the workaround is enough.

  2. The real issue from my perspective is that I believe I discovered that the client can cause the server to make a request that has no timeout and will block the thread.

Here is my setup, one machine is running Linux, which is running your server and using 26061.corefx.pfx as the server side cert.
The client is running on a Windows machine without any network limitations in all cases.

Client using unknown.pfx file:

image

This is great, since it means that we aren't waiting. And here is us using unknown-only.pfx:

image

I then tested the client using 26061.corefx.pfx:

image

You can see here that this caused an issue, probably because the 26061 chain isn't available in the ca file.
I then tested the cert.pfx from earlier in this thread, giving us:

image

At this point, I'm pretty sure that the problem is only on the server side.
I have done all these tests from separate machines, the server always running on Linux and with varying network conditions.
The client always on Windows and with full network access.

It is important to note that I'm quite certain that the same thing reproduces on Windows as well, with the same behavior as outlined earlier in this thread.

Specifying SSL_CERT_FILE seems to help, because if I'm running the cert.pfx test without specifying this
I'm getting results of around 2 minutes.
However, that is still not a good place to be at.

Because I can't control what certificate the user will be sending the server, that means that the client can choose to block a thread on my server very easily without anything to stop it.
This is regardless of any timeouts that you have specified, regardless of the "check certificate revocation" setting, and regardless of whether I'm using async or sync calls.
This means that a client can spin off a few requests and effectively hang my application because all the request processing threads are hanging.
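One partial mitigation a connection handler can apply on its own side is to stop waiting for the handshake after a fixed budget and drop the connection. The sketch below is an assumption-laden illustration, not a fix: the helper name `WithTimeout` is hypothetical, and if the chain build blocks a thread inside the handshake, that thread stays blocked until the retrieval gives up. The helper only bounds how long the caller waits.

```csharp
using System;
using System.Threading.Tasks;

public static class HandshakeTimeout
{
    // Completes with the task's outcome, or throws TimeoutException
    // if the task has not finished within 'timeout'.
    public static async Task WithTimeout(Task task, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(task, Task.Delay(timeout)).ConfigureAwait(false);
        if (finished != task)
            throw new TimeoutException($"Operation did not complete within {timeout}.");

        // Await the original task so that any exception it threw is propagated.
        await task.ConfigureAwait(false);
    }

    public static async Task Main()
    {
        try
        {
            // A 5-second delay stands in for a handshake that is stuck on a slow URL.
            await WithTimeout(Task.Delay(TimeSpan.FromSeconds(5)), TimeSpan.FromMilliseconds(50));
        }
        catch (TimeoutException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

In the server code shown earlier, the call would wrap the task returned by `ssl.AuthenticateAsServerAsync(...)`, with the stream disposed when `TimeoutException` is caught.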

@Clockwork-Muse
Contributor

There are two separate things going on there.

  1. The original issue with the server cert causing remote calls. This is resolved as far as I'm concerned, because I have the workaround of registering the issuer's certs in the user's intermediate store and I can arrange things that a server cert will always have the full chain.
    I still don't like it and would much rather have a way to limit the time it spends doing that, but the workaround is enough.

... the proper way to handle this is to register the server cert chain at install/deploy/config time, not during application startup. That essentially makes the amount of time it spends 0. On vanilla Windows this is putting it straight into the regular store. On Linux and Docker, almost certainly setting the process environment variables.

  2. The real issue from my perspective is that I believe I discovered that the client can cause the server to make a request that has no timeout and will block the thread.

Finally, can confirm that the client can cause the server to head off-box trying to resolve the trust chain for the client certificate; you were right (whether my prior mistake was an invalid URL or not putting it in enough places I'm not sure). I'm not 100 percent convinced that it's completely as serious, haven't had a chance to do a full dive (and probably the others want to take a look at it too).

I then tested the client using 26061.corefx.pfx:

That's a strange result; that cert has the entire chain, and that entire chain is in the environment variable trust store. I can't get it to go off box regardless, but when I use the same certificate for the server and the client the server is reporting that the client certificate is null during the validation callback. That's only on the conversation between my two Linux VMs; I can't get it to repro on windows or on WSL (which should be just another Linux box). Can somebody else take a look at it?

@wfurt
Member

wfurt commented Oct 11, 2019

Is there any work left on this, @bartonjs? This was opened as a question, and it seems like the problem is understood and the 3.0 changes should make this significantly better.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 19, 2020