X509Chain building is slow on Linux #28618
Comments
Related to dotnet/aspnetcore#7081. |
#15113 is going to change this code a bit; looking at perf here probably doesn't make sense until after that's done. |
Do we know if #15113 is really funded? (It is marked as 3.0; is someone working on it?) Note that the difference in perf is ~700x. Do we have any reason to believe the Linux implementation is that much worse? My guess is that there is some caching that happens on Windows that is not happening on Linux. It would be good to at least understand the rough architectural blocks involved. My guess is that #15113 (which is about status checking an X509 certificate) is probably independent (or maybe actually makes caching harder). Ideally we confirm or deny my guess above, and if we determine that it is caching and roughly independent of #15113 (and relatively easy), we should go to some trouble to do that first if #15113 is in any danger of being cut. |
Essentially it's needed because Let's Encrypt is popular and only does OCSP; so there'll be a big kerfuffle with security signoff if it doesn't get done. So... yes, me :).
~75x. But on different machines; and I'm guessing the Linux one is a VM.
For an average cert chain Windows has to do ~5 P/Invoke transitions; on Linux we do ~500, mostly because we have to piece the data back together into the shape in which the API forces us to return it.
There's an inversion of control needed to bring the feature online. Taking the new needs into consideration along with the perf delta, an optimization for the successful chain case will likely bring the OpenSSL version down to ~5 P/Invokes. So really the comment is "I'm already going to be doing a lot of work in that space, no one should touch it until I'm done". |
It is, in Azure. It's a "Standard D4s v3 (4 vcpus, 16 GB memory)". |
Thanks @bartonjs, you put my mind at ease. My main concern was that we were making this work item dependent on a work item that did not clearly have an owner. You confirmed you are the owner, understand why the current implementation is slow, and have strong reasons to believe that the work you will do will also fix this. I am happy... Thanks |
Is this still on your radar for .NET Core 3? I wanted to track the progress for dotnet/aspnetcore#7081, but can't find the right dashboard metric in Power BI. |
Yes, seeing if it can be improved further (some ideas come to mind) is still in scope for 3.0. |
Today's investigation says that if LM\Root, LM\CA, CU\Root, CU\CA, and CU\My were all built into permanently cached STACK_OF(X509*) values (SafeX509StackHandle), then tight-loop chain builds move from ~30ms to ~0.95ms on my test machine (1001 iterations, discounting the first one from the sample set) when the chain is valid and no revocation checks are performed. Ideally the cache invalidation logic won't add too much back to that. |
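The measurement described above (1001 iterations, first one discarded, revocation checks off) could be reproduced with something like the following. This is only a minimal sketch, not the harness actually used in the investigation: the class name `ChainBuildTiming` and the method `MeanBuildMilliseconds` are hypothetical, and the caller is assumed to supply a certificate whose chain is valid on the machine under test.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography.X509Certificates;

public static class ChainBuildTiming
{
    // Hypothetical helper (not from the issue): builds a chain for `cert`
    // `iterations` times and returns the mean milliseconds per build,
    // leaving the first (warm-up) iteration out of the sample set.
    public static double MeanBuildMilliseconds(X509Certificate2 cert, int iterations = 1001)
    {
        if (cert == null) throw new ArgumentNullException(nameof(cert));
        if (iterations < 2) throw new ArgumentOutOfRangeException(nameof(iterations));

        double totalMs = 0;
        for (int i = 0; i < iterations; i++)
        {
            using (var chain = new X509Chain())
            {
                // No revocation checks, matching the scenario described above.
                chain.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck;

                Stopwatch sw = Stopwatch.StartNew();
                chain.Build(cert); // result ignored; only the elapsed time matters here
                sw.Stop();

                if (i > 0) totalMs += sw.Elapsed.TotalMilliseconds;
            }
        }
        return totalMs / (iterations - 1);
    }
}
```

Calling `ChainBuildTiming.MeanBuildMilliseconds(new X509Certificate2(path))` on each machine, with `path` pointing at the same certificate file, gives per-build figures that can be compared directly; absolute numbers will still depend on the hardware.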
@sebastienros / @halter73 : FYI, a big perf improvement just went in for X509Chain.Build on Linux. Using one of @stephentoub's SslStream tests we saw about 4 minutes reduction on 10,000 handshakes, ~24ms per. Hopefully your TLS benchmarks agree when you get a build with this change. |
@sebastienros From your graph, it looks like X509Chain.Build on Linux still has a big performance gap compared with Windows. My project has suffered from this issue, so how can I verify the fix in my local environment? Should I try a .NET Core nightly build? |
I ran @stephentoub's "X509Chain build" test above with the .NET Core 3.0 nightly build 3.0.100-preview6-012026 and see a huge improvement. My question is: do you have a plan to backport the fix to .NET Core 2.1 and 2.2? [Screenshots: results after the fix and before the fix] |
There is no plan to do so. |
Thanks for such a huge improvement! |
@ccic the nightly builds and docker images should have this change by now |
@stephentoub @bartonjs But my local VM, which shows a huge improvement, never connects to "apps.digsigtrust.com". Is there anything wrong with my environment? Those two VMs are: Ubuntu 16.04.4 LTS |
@ccic If you're seeing activity on EVERY request, that suggests that either a) The Azure VM doesn't trust the root, so ends up not caching data. The only "intermittent" thing I can envision is OCSP/CRL expiry, if revocation is enabled on your tests. And the only thing I could think that would be different is that the Azure VM ends up hitting a different physical endpoint (due to routing rules) and receives a different response than your faster/successful machine. |
@bartonjs I wondered why the caching fails, so I did more tests and found it may be related to the certificate: only the chain build for the Let's Encrypt certificate is very slow. My previous test ran the Let's Encrypt certificate on the Azure VM but ran @stephentoub's embedded test certificate on the local VM, so that comparison was not valid. I have 2 certificates to check: @stephentoub's embedded test certificate (embedded in the source code above) and a Let's Encrypt certificate. When I run the chain build perf test on the Let's Encrypt certificate, it takes >3 seconds for 1000 iterations and connects to "apps.digsigtrust.com", but for @stephentoub's embedded test certificate it takes ~0.05 seconds for 1000 iterations. Experiments: Apart from the chain build perf test, I added another function to check the certificate status information.
I ran all the certificate checks on a Windows 10 machine, an Ubuntu 16.04 local VM (openssl 1.1.1), and an Azure Ubuntu 18.04 VM (openssl 1.1.1). I found that the Let's Encrypt certificate behaves differently from the other certificates. On Windows 10, @stephentoub's embedded test certificate and my service certificate are invalid, and the revocation function does not work properly.
The Let's Encrypt certificate on Windows 10 is valid, and a revocation check is performed.
On Ubuntu, all certificates are invalid, with "unable to get certificate CRL".
The Let's Encrypt certificate has the same output on Ubuntu 16.04 and Ubuntu 18.04, on both the local VM and the Azure VM.
|
@ccic If your system isn't considering Let's Encrypt as trusted that'll definitely throw things off. I can't explain the "apps.digsigtrust.com", since that's not part of the Let's Encrypt infrastructure.
A better printing of the chain is

    Console.WriteLine("Chain Element Status:");
    foreach (X509ChainElement element in chain.ChainElements)
    {
        Console.Write(" ");
        Console.WriteLine(element.Certificate.Subject);
        foreach (X509ChainStatus status in element.ChainElementStatus)
        {
            Console.WriteLine(" {0} ({1})", status.Status, status.StatusInformation);
        }
        Console.WriteLine();
    }
    Console.WriteLine("Chain Summary:");
    foreach (X509ChainStatus status in chain.ChainStatus)
    {
        Console.WriteLine(" {0} ({1})", status.Status, status.StatusInformation);
    }

At least the part where it shows the elements. Somewhere you're not getting good chains; but that seems more to be system configuration than anything else. |
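To tell a trust-store problem apart from a revocation-fetch problem, the chain status flags can also be folded into a single value and tested. The following is a rough sketch under assumptions: `ChainDiagnostics` and its methods are hypothetical names, and the grouping of flags into "trust" and "revocation" categories is an illustrative choice, not something stated in this thread.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

public static class ChainDiagnostics
{
    // Hypothetical helper: builds a chain with the given revocation mode and
    // ORs together every flag reported in the chain summary.
    public static X509ChainStatusFlags CombinedStatus(X509Certificate2 cert, X509RevocationMode mode)
    {
        using (var chain = new X509Chain())
        {
            chain.ChainPolicy.RevocationMode = mode;
            chain.Build(cert);

            X509ChainStatusFlags flags = X509ChainStatusFlags.NoError;
            foreach (X509ChainStatus status in chain.ChainStatus)
            {
                flags |= status.Status;
            }
            return flags;
        }
    }

    // Assumed grouping: the chain does not end at a root this system trusts.
    public static bool IsTrustProblem(X509ChainStatusFlags flags)
    {
        const X509ChainStatusFlags trust =
            X509ChainStatusFlags.UntrustedRoot | X509ChainStatusFlags.PartialChain;
        return (flags & trust) != 0;
    }

    // Assumed grouping: revocation data (CRL/OCSP) could not be obtained.
    public static bool IsRevocationProblem(X509ChainStatusFlags flags)
    {
        const X509ChainStatusFlags revocation =
            X509ChainStatusFlags.RevocationStatusUnknown | X509ChainStatusFlags.OfflineRevocation;
        return (flags & revocation) != 0;
    }
}
```

Running `CombinedStatus` once with `X509RevocationMode.NoCheck` and once with `X509RevocationMode.Online` on the same certificate shows whether the chain is already failing on trust before any revocation lookup is attempted.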
@bartonjs I checked "apps.digsigtrust.com", whose IP is 192.35.177.64, and it points to https://www.identrust.com/. I found that Let's Encrypt uses IdenTrust to cross-sign its certificate; see https://letsencrypt.org/certificates. I checked the certificate in chain1.pem (https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem.txt) with the openssl command:

    openssl x509 -in chain1.pem -noout -text

I'm investigating further why this happens.
|
The boxes running this are not directly comparable, however the magnitude of the difference is such that it doesn't really matter.
On my Windows 10 machine, I get numbers like this:
On my Ubuntu 18.04 machine, I get numbers like this:
Repro:
cc: @bartonjs