Hang in x509certificate build chain when using a Let's Encrypt cert. #24527
What is more, we obviously don't want any incoming TCP connection to result in a remote call. |
You might try specifying the Are you doing client certificate authentication? Might it be trying to check certificate revocation? |
@Clockwork-Muse Maybe. The IP is used for both the CA Issuers and CRL URIs in the intermediate certificate. So it could be trying to download the root, or check if the intermediate is revoked. Or both. https://crt.sh/?id=15706126 |
@mnordhoff - ...is the server going to be checking the revocation status of its own certificate? Further question for OP - does this happen with each connection, or only on the initial one? |
@Clockwork-Muse We want to use client cert, yes. But as you can see from It happens on any connection, only for that particular Let's Encrypt cert. |
I've used
|
...okay, can we get a minimal repro? In particular, how are you loading the server certificate; if it's via a .pfx, what happens if you add the entire chain to the file? If it's via a store and |
@Clockwork-Muse Okay, I managed to get a proper reproduction of this, see the code below. In order to reproduce, you'll need to have a Let's Encrypt certificate. Run this command on an Ubuntu box as:
Then use another shell window to run:
using System;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var cert = new X509Certificate2(args[0]);
            var tcpListener = new TcpListener(IPAddress.Loopback, 5000);
            tcpListener.Start();
            Console.WriteLine("Running...");
            while (true)
            {
                using (var client = tcpListener.AcceptTcpClient())
                {
                    try
                    {
                        using (var sslStream = new SslStream(client.GetStream()))
                        {
                            Console.WriteLine("Connected, starting handshake...");
                            sslStream.AuthenticateAsServer(
                                cert,
                                clientCertificateRequired: false,
                                enabledSslProtocols: SslProtocols.Tls12,
                                checkCertificateRevocation: false);
                            Console.WriteLine("Done with handshake");
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.Message);
                    }
                }
            }
        }
    }
}

The output of running this with strace:
Note that to reproduce the hang you'll need to mark outgoing port 80 as drop. [EDIT] Add C# syntax highlighting by @karelz |
By the way, the output of the console app is connected (immediately) and then waiting for about two minutes for the handshake to complete. |
@Priya91 can you please take a look? Do we have enough information to analyze / reproduce locally? |
Was able to dig a bit deeper, here is the stack trace that it is holding:
|
So (without digging into the actual code) it looks like it's building the server certificate chain for whatever reason. Shouldn't a normal certificate have whatever is necessary to say "I'm me, these guys say so"? Hm, try setting those environment variables, maybe openssl isn't looking at wherever you put them into the trust store. ....as a side question, does this happen if you just open a stream without it being a webserver? Like wrapping |
Even more minimal reproduction, use with the attached certificate.

var cert = new X509Certificate2("test.pfx");
var chain = new X509Chain();
chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
Console.WriteLine(chain.Build(cert));

The root cause here is the following:
In particular, this seems to be related to this code: In other words, if there is an authority information access with a URL, the act of building a X509Chain would force it to be hit, and it can take multiple minutes to resolve if this is not accessible. The code seems to hint that there is some timeout there, controlled by But even calling I was able to get it working fast by setting the timeout to a negative value, which is good. Where this is actually happening whenever you use an Note that even with port 80 being open, this includes the cost of a remote call on every SSL connection being made when using a server. [EDIT] Add C# syntax highlighting by @karelz |
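To see how long a chain build takes with and without network access, a small timing harness can help. The following is only a sketch: it assumes that `X509ChainPolicy.UrlRetrievalTimeout` is the relevant knob (the comment above does not say which setting was used), the 5-second value is arbitrary, and the class and helper names are made up for illustration. If no certificate path is passed, it falls back to a throwaway self-signed certificate, which has no AIA entry and therefore will not reproduce the hang.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

class ChainTimingSketch
{
    static void Main(string[] args)
    {
        // Pass the path to the certificate under test; otherwise a
        // throwaway self-signed certificate is used (no AIA, no network).
        X509Certificate2 cert = args.Length > 0
            ? new X509Certificate2(args[0])
            : CreateThrowawayCert();

        using (cert)
        using (var chain = new X509Chain())
        {
            chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
            // Assumption: this is the timeout that bounds AIA/CRL downloads.
            chain.ChainPolicy.UrlRetrievalTimeout = TimeSpan.FromSeconds(5);

            var sw = Stopwatch.StartNew();
            bool ok = chain.Build(cert);
            sw.Stop();

            Console.WriteLine($"Build returned {ok} after {sw.ElapsedMilliseconds} ms");
            foreach (X509ChainStatus status in chain.ChainStatus)
            {
                Console.WriteLine($"  {status.Status}: {status.StatusInformation}");
            }
        }
    }

    // Hypothetical helper: builds a short-lived self-signed certificate
    // and returns only its public part.
    static X509Certificate2 CreateThrowawayCert()
    {
        using (RSA key = RSA.Create(2048))
        {
            var req = new CertificateRequest(
                "CN=chain-timing-sketch", key,
                HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
            DateTimeOffset now = DateTimeOffset.UtcNow;
            using (X509Certificate2 withKey = req.CreateSelfSigned(now.AddDays(-1), now.AddDays(1)))
            {
                return new X509Certificate2(withKey.RawData);
            }
        }
    }
}
```

Running it once with outbound port 80 open and once with it dropped, against the same certificate, should make any difference in build time visible.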
@Clockwork-Muse This looks like it impacts only |
Regarding the severity of this issue. What is worse, it is downloading the certificate that it already has in the same cert. It also means that any SSL based server on CoreCLR is effectively hostage to the infrastructure of the issuers (at all levels) for their certs. |
It looks like a workaround might be to put the issuer of the certificate in the stores, but I can't get it to work. This code also hangs:

static void Main(string[] args)
{
    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser))
    {
        userIntermediateStore.Open(OpenFlags.ReadWrite);
        var col = new X509Certificate2Collection();
        col.Import(args[0]);
        foreach (var item in col)
        {
            userIntermediateStore.Add(item);
        }
    }
    var cert = new X509Certificate2(args[0]);
    Console.WriteLine("Started");
    var chain = new X509Chain();
    chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
    chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
    Console.WriteLine(chain.Build(cert));
}

[EDIT] Add C# syntax highlighting by @karelz |
Another impact here, this will do the call in a synchronous fashion, even if you called |
Based on @ayende's condensed repro, looks like the hang is happening in building the cert chain. Routing to System.Security. |
@bartonjs can you please comment? |
Just like Windows, if the next step in the chain cannot be completed it will try to download the next element from the AIA record before failing. The solution would be to download the chain via some other process and put the intermediates into the CurrentUser\CA (intermediate) store. If the chain successfully builds and the root was trusted then any intermediates that were downloaded along the way get cached into the CurrentUser\CA store so that future chain builds don't touch the network. The thing to note here is it's not downloading the presented certificate, but looking for the issuing CA certificate. |
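Besides the CurrentUser\CA store, `X509ChainPolicy.ExtraStore` is another place the chain builder can find issuing certificates that were obtained out of band. The sketch below is self-contained: it creates a throwaway CA and a leaf in memory (both names are made up), hands the CA to the builder through `ExtraStore`, and prints the resulting chain. Whether `ExtraStore` alone prevents the AIA download for the Let's Encrypt chain discussed here is an assumption this thread does not confirm.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

class ExtraStoreSketch
{
    static void Main()
    {
        using (RSA caKey = RSA.Create(2048))
        using (RSA leafKey = RSA.Create(2048))
        {
            DateTimeOffset now = DateTimeOffset.UtcNow;

            // Throwaway CA, for illustration only.
            var caReq = new CertificateRequest(
                "CN=Sketch Test CA", caKey,
                HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
            caReq.CertificateExtensions.Add(
                new X509BasicConstraintsExtension(true, false, 0, true));
            caReq.CertificateExtensions.Add(
                new X509SubjectKeyIdentifierExtension(caReq.PublicKey, false));

            using (X509Certificate2 ca = caReq.CreateSelfSigned(now.AddDays(-1), now.AddDays(30)))
            {
                // Leaf certificate issued by the throwaway CA.
                var leafReq = new CertificateRequest(
                    "CN=sketch.example", leafKey,
                    HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
                leafReq.CertificateExtensions.Add(
                    new X509BasicConstraintsExtension(false, false, 0, false));
                byte[] serial = { 1, 2, 3, 4, 5, 6, 7, 8 };

                using (X509Certificate2 leaf = leafReq.Create(ca, now, now.AddDays(29), serial))
                using (var chain = new X509Chain())
                {
                    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
                    // The throwaway CA is not in any trust store, so tolerate that here.
                    chain.ChainPolicy.VerificationFlags =
                        X509VerificationFlags.AllowUnknownCertificateAuthority;
                    // Supply the issuer directly instead of relying on a store or a download.
                    chain.ChainPolicy.ExtraStore.Add(ca);

                    bool ok = chain.Build(leaf);
                    Console.WriteLine($"Build returned {ok}, {chain.ChainElements.Count} element(s):");
                    foreach (X509ChainElement element in chain.ChainElements)
                    {
                        Console.WriteLine($"  {element.Certificate.Subject}");
                    }
                }
            }
        }
    }
}
```

For a real deployment the certificates added to `ExtraStore` would come from the .pfx (for example via `X509Certificate2Collection.Import`), and the root would still need to be trusted by the system for the build to succeed without the `AllowUnknownCertificateAuthority` flag.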
It seems to be downloading the root, not the intermediate. For example:
The IP address in the original post matches http://apps.identrust.com/, which is where the intermediate says you can download the root. Not where the end-entity certificates say you can download the intermediate. (IdenTrust also uses the same IP for their CRL, so it could perhaps be downloading that, too.) |
@bartonjs That doesn't seem right, see the code sample here: I'm explicitly registering anything in there in the current user store, but that doesn't help. |
@ayende - I don't know what's going wrong with the manual registration, but it's possible it's the wrong "current user"? On windows the system would also check LocalMachine, not so sure what happens on Linux. Oh, you can also set the |
@Clockwork-Muse Okay, when registering the full chain with this code, first running it with port 80 open and then port 80 closed, the problem is resolved.
However, given that this has such a severe impact, and that this is something that is not really needed (everything still works, just very slowly), is there a way to turn this off? |
Except you shouldn't need to do that at all. If you add the chain to the pfx file (so it contains the server cert and the entire chain) and load via You don't have to add the certificates to the store at all. Even if you did, that should be a single-time thing, at application startup, and you don't need to open port 80 while doing so. |
@Clockwork-Muse The certificate is actually generated by Let's Encrypt, so I don't control that.
|
....that might work, although the more usual thing is probably to use OpenSSL to include the chain. For that matter, you could probably have Let's Encrypt include the chain when it signs your cert. |
I actually want to have the full chain there, yes. I'm generating the CSR for Let's Encrypt using this code: https://github.com/ravendb/ravendb/blob/v4.0/src/Raven.Server/Commercial/SetupManager.cs#L383-L392 I tried searching for this, but I don't see how I can get it to generate the full thing. For that matter, looking at how this is done, it looks right: https://github.com/fszlin/certes/blob/master/src/Certes/Acme/AcmeCertificate.cs#L46-L54 Looking into this further, but I would appreciate any pointers. |
...hmm, from a cursory look it seems to be including the chain by default. Oh, it turns out I was slightly wrong - the certs do have to make it into the store to be used to build the chain, not just imported into a collection that you pull from (I must have been leaving them in the store by accident between tests). Still, the initial code you posted should have worked just fine (I cleaned it up a bit during testing):

static void Main(string[] args)
{
    var col = new X509Certificate2Collection();
    col.Import(args[0]);
    foreach (var item in col)
    {
        Console.WriteLine($"{item.Subject} - {item.Issuer}");
    }
    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser, OpenFlags.ReadWrite))
    {
        if (userIntermediateStore.Certificates.Count == 0)
        {
            Console.WriteLine("No certificates in store");
        }
        userIntermediateStore.AddRange(col);
    }
    // Since I'm not sure what order the certs are in,
    // although new X509Certificate2(someFileWithPrivateKey) seems to grab the one with a private key.
    var cert = col.Find(X509FindType.FindBySubjectName, "--your subject name here--", false)[0];
    Console.WriteLine("Started");
    var chain = new X509Chain();
    chain.ChainPolicy.VerificationFlags = X509VerificationFlags.AllFlags;
    chain.ChainPolicy.RevocationFlag = X509RevocationFlag.ExcludeRoot;
    chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;
    Console.WriteLine(chain.Build(cert));
    // Test cleanup
    using (var userIntermediateStore = new X509Store(StoreName.CertificateAuthority, StoreLocation.CurrentUser, OpenFlags.ReadWrite))
    {
        foreach (var item in userIntermediateStore.Certificates)
        {
            userIntermediateStore.Remove(item);
        }
    }
}

... try running that over the real certificate (not the test one you posted, that only has one), and look at how many certificates it lists: there should be at least 2, probably 3, and maybe 4+, if you have a full chain in the .pfx |
@Clockwork-Muse Okay, I might be missing something, but I just created this certificate: You can see that it does include the root certificate and the entire chain. This is still going to make a remote call when we try to build the certificate chain. I'm now pretty sure that the workaround of manually inserting the entire chain to the Adding & cleaning this isn't going to work (concurrency, to start) and while we expect this to run mostly on servers, I don't want to be one of those "you installed a CA on my machine" kind of emails to hit my mailbox (even if in this case, the cert is well known and should have been trusted). As I can see, all the information to figure out the certificate is in the pfx, so it shouldn't be doing this, right? |
...you don't have to remove the certs. In fact, I'd expect you to not do so (except during an uninstall), I was only doing it because test code should clean up after itself.
That would be extremely strange. The only people who would be complaining would be you, currently, right? If you're distributing this as server software that you're expecting customers to install/deploy, they should be providing their own certificate. If you're currently running a public webserver, that's an assumption you're making - that your clients trust the CA that issued your server certificate.
.... so I don't really understand what it is you're worried about on that front. if you don't want to add the certificates to the store, and you're on Linux, set the environment variables in your launch script.
Maybe? I don't know, perhaps @bartonjs could chime in about this (whether a pfx could satisfy its own chain, although the fact you can have multiple unrelated certs/chains would probably throw a wrench in that), but to my mind, this would really be expected behavior. It's like you were issued a drivers license or other id - the server is going: "okay, and I trusted that guy over there to issue this to me? Yes, I know what the chain was, I'm asking if I trust them.". That is, there's an inherent trust relationship expected if somebody is saying "you're you"; they have to be somebody you trust to make that assertion (so are expected to be in the store). Personally, though, I'd just add the certificates to the store and be done with it. |
Once the PFX has been read nothing remembers that any particular cert therein had affinity to other files in that same container. Since the PFX itself is defined in terms of authsafes I'll make a safe-based analogy: The end-entity certificate is like a laser-inscribed diamond. You can look at the diamond and see that it has an issuance number. So now the jeweller wants to see that it matches the description. And you say "well, I have the certificate of issuance". "Great, where is it?" "Well, it was stored in the safe on the next shelf over from this diamond." The last sentence doesn't really help the jeweller, and is a bit out of place :). Weak analogy, yes, but that's the best I could do off the top of my head. Though I'm having a bit of clear-headedness reassert itself now that vacation is (almost) over:
In addition to certificates for a chain resolution, any CRLs or OCSP responses are usually done over HTTP (80), NOT HTTPS (443). That's because you could end up in a cycle of initiating TLS to verify TLS (including "I need to download this file from you before I know I can trust you to download the file" when the CRL/OCSP endpoint is protected by the same CA). Since the certs, CRLs, and OCSP responses are not sensitive (no encryption required) and provide their own integrity (via their signatures) HTTP is not a security problem. Blocking port 80 outbound, though, is a security problem since you're effectively saying you'll never check a CRL (unless you have an alternate means of loading CRLs). |
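To check which endpoints a given certificate may send the chain builder to, the AIA and CRL distribution point extensions can be printed directly. This is a rough sketch: the class name is made up, the file path comes from the command line, and the text produced by `Format(true)` differs between platforms (it may be a readable listing or a hex dump).

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

class ListChainUrlsSketch
{
    // Well-known OIDs: Authority Information Access and CRL Distribution Points.
    const string AiaOid = "1.3.6.1.5.5.7.1.1";
    const string CrlDistributionPointsOid = "2.5.29.31";

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: pass the path to a certificate or .pfx file");
            return;
        }

        // Import every certificate in the file, not only the end-entity one.
        var certs = new X509Certificate2Collection();
        certs.Import(args[0]);

        foreach (X509Certificate2 cert in certs)
        {
            Console.WriteLine(cert.Subject);
            foreach (X509Extension ext in cert.Extensions)
            {
                string oid = ext.Oid?.Value;
                if (oid == AiaOid || oid == CrlDistributionPointsOid)
                {
                    Console.WriteLine($"  {oid}:");
                    Console.WriteLine(ext.Format(true));
                }
            }
        }
    }
}
```

Any `http://` URLs that show up are the hosts that need to be reachable on port 80, or whose certificates and CRLs need to be provisioned locally by other means.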
Is it really necessary to download the CRL to check if the intermediate certificate has been revoked? If an intermediate with tens of millions of previously valid certificates is being suddenly revoked, it would be headline news. |
A common scenario for us is that the user generates the certificate via Let's Encrypt, so we handle the whole thing. In fact, that is the most likely scenario for a developer setting up a local instance, and then we may need to walk the chain and add it. @bartonjs Makes perfect sense with regards to not getting into recursive HTTPS. And thanks for the note on the blocking of port 80 with CRL, that is very useful. What I'm concerned about is that there is no caching and this is exposed to users. |
I've tracked this further and I think that this is quite serious. The same code path is invoked when using Because they control the URL, they can choose to make the server in question respond slowly enough to hold up the thread for a significant amount of time. Rinse and repeat a few times and you have all the server threads busy waiting for the This looks like a pretty simple remote denial of service attack on any SSL endpoint that uses client certificates. I tested this (on Windows) and it exhibits the same behavior on both CoreCLR and full .NET. Attached is a simple program that reproduces this behavior, showing how What worries me is that this is done before any authentication is made, so it doesn't even give the server the chance to decide whether they really want to do it or not. |
A mitigation here is that the |
false
.... why bother having something else be slow? If you want to delay the server you're talking to, just make your conversation slow, don't refer them to somebody else that's slow. (Referring them to somebody else to tie up that person can be done in simpler and easier ways, too, so that's mostly out) And having the method call be |
@Clockwork-Muse Did I miss something with disabling revocation checking?
At least on the face of it, it looks like it is trying to disable certificate revocation, but it is still doing that. As for why this is worth bothering with? The answer is timeout, or to be rather more exact, lack thereof. This, however, will ignore any timeout specified and will block a thread. That means that even if my code is careful about such things, it is easy to cause fatal starvation. The code in question was:
Note that I don't expect this code to take so long, in fact, I'm trying hard to protect against it. Looking at the code, on Windows it looks like the main issue is here: Note that this builds the chain (in
So this is manually and explicitly asking to not do any checks, but that still hangs. Am I missing something? |
I think you might be: I can't repro. Note that we're not necessarily dealing with just revocation, but verifying the entire chain. Personally, I think what's happening is that your individual hosts are still validating their own certs, not the incoming ones. Try these projects on for size (you'll have to update the paths for the included certs and location you put the project, but I don't know where you've got things. Don't forget the environment variables for the trust file in launch.json): Of particular note is what happens when the client authenticates with only its immediate cert (ie - only sends one certificate and a reference to its issuer) - no call out is made. Or at least I'm not observing anything with wireshark when I do so. |
The sample code I sent was actually on Windows, where the problem also exists and reproduces. |
Also, I'm not sure if this is related, but in the certs, you have:
While the Let's Encrypt cert has:
I'm guessing that there might be an issue here with invalid URL that might contribute? |
On Linux, with outgoing 80 port blocked (and without In both cases, the client is using cert: With the However, what is the equivalent of doing this on Windows? Also, this means that I've to set it up on my Linux box (easily done, of course). |
Okay, testing further on Linux shows that this isn't actually a workaround. When I tested the certificate with:
Using the |
Yeah, but it's not hanging due to the client, it's hanging due to the server checking its own chain (although I'll verify that by getting a better URI/reverify that that URL should be used anyways). That's because setting
...only for that particular server certificate? Are you using the server certificate I included, or one of your own (my server cert also has that AIA entry)? If it's one of yours, did you update the chain in the |
No, that isn't the case. The server is using your certificate, not mine.
Only for that client cert, because with the same server cert and config, using another cert from the client triggered the same behavior. There are two separate things going on there.
Here is my setup, one machine is running Linux, which is running your server and using Client using This is great, since it means that we aren't waiting. And here is us using I then tested the client using You can see here that this caused an issue, probably because the At this point, I'm pretty sure that the problem is only on the server side. It is important to note that I'm quite certain that the same thing reproduces on Windows as well, with the same behavior as outlined before on this thread. Specifying Because I can't control what certificate the user will be sending the server, that means that the client can choose to block a thread on my server very easily without anything to stop it. |
... the proper way to handle this is to register the server cert chain at install/deploy/config time, not during application startup. That essentially makes the amount of time it spends 0. On vanilla Windows this is putting it straight into the regular store. On Linux and Docker, almost certainly setting the process environment variables.
Finally, can confirm that the client can cause the server to head off-box trying to resolve the trust chain for the client certificate; you were right (whether my prior mistake was an invalid URL or not putting it in enough places I'm not sure). I'm not 100 percent convinced that it's completely as serious, haven't had a chance to do a full dive (and probably the others want to take a look at it too).
That's a strange result; that cert has the entire chain, and that entire chain is in the environment variable trust store. I can't get it to go off box regardless, but when I use the same certificate for the server and the client the server is reporting that the client certificate is null during the validation callback. That's only on the conversation between my two Linux VMs; I can't get it to repro on windows or on WSL (which should be just another Linux box). Can somebody else take a look at it? |
is there any work left on this @bartonjs? This was opened as question and it seems like the problem is understood and 3.0 changes should make this significantly better. |
In Ubuntu 16.04, dotnet core 2.0.3
Running a webhost console app with a server certificate issued by Let's Encrypt
Server is listening to https://172.31.46.243:443 (my private IP address)
Trying to run:
Output:
I ran strace on the server app - during the handshake it tries to connect to 192.35.177.64 on port 80!
This IP address belongs to a certificate authority (IdenTrust).
The operation gets stuck (EINPROGRESS) because port 80 is not allowed for outbound connections on my server.
Output of strace:
Also tried to install the CA certificates (Let's Encrypt Authority X3 and DST Root CA X3) in the OS using both .NET API and update-ca-certificates. Doesn't help.
(As described here: https://github.com/dotnet/corefx/issues/16879)
Opening port 80 (outbound) solves the issue and the handshake completes successfully.
Is there a way to do a TLS handshake on Linux without allowing outbound connections on http port 80?