.NET Core 2.1 SocketsHttpHandler does not use Negotiate / SPNego #27415
Comments
Your attached output from the wire trace doesn't show that. The client-side HTTP is picking Negotiate.
However, the server is still not authenticating and returning back a 401. |
Hmm, could be a copy paste error. Will re-test on Monday and update. |
Hey, sorry for the slow response... I've checked again and it is indeed using NTLM with that Negotiate response (NTLM_NEGOTIATE). Wireshark can actually dissect it and show that it's NTLMSSP_NEGOTIATE, as shown below. And here is the snip when I disable SocketsHttpHandler |
So, what you're really saying is that the HTTP stack correctly responded with Negotiate scheme. But Negotiate ended up "negotiating" NTLM instead of Kerberos. That happens a lot when the requirements for a valid Kerberos infrastructure don't exist. For example, if the client machine is not joined to the Windows Active Directory (or Linux Kerberos) domain of the server, or timestamps aren't matching etc. In your example, the server is a Linux machine using Kerberos. I'm assuming that the Linux machine is also the Kerberos ticketing server, or is there a separate machine for that? Usually, this kind of problem is a configuration problem with the machines and not a problem with the client-side HTTP stack since it is picking Negotiate scheme correctly. cc: @wfurt @geoffkizer @karelz |
TLDR: Unlike the rest of the Windows web clients (browsers, full .NET Framework, etc.), SocketsHttpHandler does not canonicalize the given host when requesting the SPN, which breaks Kerberos if the URL uses a CNAME and the SPN is registered only on the DNS A record of the host. Details: phew. This is what happens at a very high level when it works as expected (after purging Kerberos tickets and flushing the DNS cache, which actually helped me nail it down; always purge and always flush!)
And this is what's happening with SocketsHttpHandler
So the reason SocketsHttpHandler is not working is that it tries to find the SPN for the CNAME instead of canonicalizing it with a forward lookup. SocketsHttpHandler would need to fall in line and stay consistent with the other clients. Linux |
Thanks for the additional details. SocketsHttpHandler on Windows uses the Windows SSPI libraries for doing Negotiate and NTLM protocols. I do not think it is something that is directly controllable by SocketsHttpHandler. It would need to be investigated further. We have tested Linux clients against a Windows server/ActiveDirectory domain. We've demonstrated that Negotiate scheme will use Kerberos if all the machines are configured properly. See: #26418. But we have not extensively tested a Windows client using Negotiate (Kerberos) into a Linux server environment. |
In Linux, canonicalization is controlled by configuration knobs in In Windows, SSPI does not canonicalize, and AFAIK, there are no global knobs to turn. But many applications, including all of the major browsers, WinHTTP (and thus legacy .NET HTTP clients), and most (but not all) OS components do forward canonicalize CNAMEs prior to calling into SSPI. People/things now generally expect that behavior, witness the outcry when Chrome accidentally stopped pre-canonicalizing recently: https://bugs.chromium.org/p/chromium/issues/detail?id=872665 |
My guess is that this will reproduce on Windows->Windows as well (haven't tested it yet). |
So, this scenario does work using WinHttpHandler (WinHTTP) on the client-side? So, did you turn off SocketsHttpHandler (via AppContext switch for example) and demonstrate that the scenario works? See: https://github.com/dotnet/core/blob/master/release-notes/2.1/2.1.0.md
If this works with WinHttpHandler, then it should be possible to fix it for SocketsHttpHandler. WinHttpHandler uses native WinHTTP which uses the same Windows SSPI libraries as SocketsHttpHandler for doing Negotiate and NTLM. |
Yep. see my very first post with the below workaround to make this work |
I think we just need to use the canonical host name here (as returned by Dns.GetHostEntry) instead of the hostname in the uri. Correct? |
Almost. That would handle CNAMEs and partially qualified names of As (that become fully qualified when the OS resolver appends one of the configured search suffixes). But...
|
@mattpwhite Thank you for the added details regarding CNAMEs and the issues with LLMNR, NETBIOS, etc. Since you observe the correct behavior in .NET Framework, we will look at that implementation to see where it differs from .NET Core SocketsHttpHandler. That will give us more insight into the correct implementation. |
I've done some research into why this works on .NET Framework. .NET Framework does make sure to do canonicalization when it is computing the proper SPN to use:
which then calls internal method So, we would need to use similar DNS resolution logic in SocketsHttpHandler. |
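To make the idea concrete, here is a minimal sketch of forward canonicalization before computing the SPN. It is not the .NET Framework or corefx implementation; the `SpnSketch` class and `BuildSpn` method are hypothetical names, and the "HTTP/" service prefix is assumed from the usual form of SPNs for web servers. Only `Dns.GetHostEntry` is a real API here.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class SpnSketch
{
    // Hypothetical helper: resolve the URI host to its canonical DNS name
    // (following CNAMEs and search suffixes) and build an SPN from it.
    public static string BuildSpn(string uriHost)
    {
        string canonicalHost;
        try
        {
            // Forward lookup only; HostName is the primary name reported by the resolver.
            IPHostEntry entry = Dns.GetHostEntry(uriHost);
            canonicalHost = entry.HostName;
        }
        catch (SocketException)
        {
            // Assumption: if resolution fails, fall back to the name as given.
            canonicalHost = uriHost;
        }

        return "HTTP/" + canonicalHost;
    }
}
```

With a CNAME such as the one discussed in this issue, the intent is that `BuildSpn` yields the SPN of the A record rather than of the alias, which is the name Kerberos is expected to know about.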
I was able to research this problem with a Windows-Windows setup in our separate Enterprise Testing environment. Given an IIS server called "corefx-net-iis" on a domain called "corefx-net.contoso.com", we are able to get Negotiate to use Kerberos using any of the following URIs.
// Use A record of server
string server = "http://corefx-net-iis/test/NegotiateTest.ashx";
string server = "http://corefx-net-iis.corefx-net.contoso.com/test/NegotiateTest.ashx";
// Use CNAME of server
string server = "http://iis-server/test/NegotiateTest.ashx";
string server = "http://iis-server.corefx-net.contoso.com/test/NegotiateTest.ashx";
"iis-server.corefx-net.contoso.com" is a CNAME. But for .NET Core 2.1.5, Negotiate will only use Kerberos when using the original name of the server (A record):
string server = "http://corefx-net-iis/test/NegotiateTest.ashx";
string server = "http://corefx-net-iis.corefx-net.contoso.com/test/NegotiateTest.ashx";
Any of the DNS names using the CNAME results in Negotiate using NTLM. |
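For anyone who wants to try the same comparison, a small client along these lines should be enough. This is a sketch, not the test code used above: the `NegotiateRepro` class is hypothetical, and the URIs are simply the ones listed in this comment, so they only resolve inside that test domain.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class NegotiateRepro
{
    static void Main() => RunAsync().GetAwaiter().GetResult();

    static async Task RunAsync()
    {
        // A record first, then the CNAME, as in the comment above.
        string[] servers =
        {
            "http://corefx-net-iis.corefx-net.contoso.com/test/NegotiateTest.ashx",
            "http://iis-server.corefx-net.contoso.com/test/NegotiateTest.ashx",
        };

        // UseDefaultCredentials sends the logged-on user's credentials for Negotiate/NTLM.
        using (var handler = new HttpClientHandler { UseDefaultCredentials = true })
        using (var client = new HttpClient(handler))
        {
            foreach (string server in servers)
            {
                using (HttpResponseMessage response = await client.GetAsync(server))
                {
                    Console.WriteLine($"{server} -> {(int)response.StatusCode}");
                }
            }
        }
    }
}
```

Note that the status code alone does not tell you whether Kerberos or NTLM was negotiated. A wire trace is still needed to see which one was used, unless NTLM is disabled in the environment so that a fallback shows up as a failure.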
Thanks @davidsh for your research and the effort to reproduce this issue! If NTLM is disabled due to security considerations (which can be the case in sensitive environments), then calls using a CNAME on .NET Core 2.1.5 won't be able to authenticate and will fail, so that can be a good test for the fix. Here is how to disable NTLM: Or if NTLM is not supported by the target server running on Linux in a trusted Kerberos realm, then again the auth will fail (my original use case, which is much more involved to build a lab for). |
Just to be clear on that particular Linux kerberos setting, it only affects whether REVERSE DNS is done on ip addresses. It doesn't change how FORWARD normalization is done with respect to traversing CNAME records.
.NET Framework has never done any reverse DNS lookup checks. And in general, it won't do any normalization of the SPN name (from the hostname in the Uri specified in the http request) if the hostname is actually an IP address, i.e. "http://10.0.0.5/NegotiateEndpoint".
I am working on a fix for this issue. But the PR will likely only match existing .NET Framework behavior and won't do any reverse DNS checks nor any specific Linux kerberos config file lookups. |
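The "no normalization for IP addresses" rule can be expressed with the host name type that `Uri` already exposes. This is only a sketch of that rule, not the actual fix; `SpnHostPolicy` and `ShouldCanonicalize` are hypothetical names.

```csharp
using System;

static class SpnHostPolicy
{
    // Hypothetical helper: only DNS names are candidates for forward
    // canonicalization; IPv4/IPv6 literals are used as-is (no reverse lookup).
    public static bool ShouldCanonicalize(Uri requestUri)
    {
        return requestUri.HostNameType == UriHostNameType.Dns;
    }
}
```

For "http://10.0.0.5/NegotiateEndpoint" the host name type is `UriHostNameType.IPv4`, so the helper returns false, while a host such as "iis-server" is reported as `UriHostNameType.Dns` and would be canonicalized.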
Thanks for pointing this out! The correct setting is actually dns_canonicalize_hostname which is a "relatively" recent ( khmm 5y old, but that's recent in Krb) addition.
https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#libdefaults So a correct implementation would need to honor that setting on Linux, and if it doesn't, then at least that should be mentioned somewhere so it's clear and avoids confusion (probably saving a few hours of debugging). Just to be clear, I am already a happy person with your proposed fix of matching the existing .NET Framework behavior, but for the sake of completeness I wanted to mention that Linux krb5.conf knob. Thanks! |
Well, under no circumstances should .NET be trying to parse a krb5.conf. On non-Windows systems, my understanding is that no name canonicalization should need to be performed before calling into a GSSAPI/Kerberos implementation because that library already takes care of this (or not, depending on how someone chose to configure it for their environment). My understanding is also that the typical configuration is forward canonicalization on, reverse off. Forward canonicalization was just implicitly enabled by MIT for some time, though as @csharmath points out, it's now a knob. The Windows case is different because SSPI does not do canonicalization on behalf of applications. There is no way for a developer/administrator to express how they would like all applications to behave, so the most reasonable thing to do is to just do what IE did way back when and the other browsers subsequently emulated: forward canonicalize, no reverse. FWIW, browsers did eventually add knobs to customize this behavior on Windows (https://www.chromium.org/developers/design-documents/http-authentication, https://blogs.technet.microsoft.com/askds/2009/06/22/internet-explorer-behaviors-with-kerberos-authentication/). The reason these are application knobs on Windows is because the application actually controls it; disabling CNAME resolution wouldn't work if SSPI did it for you in the way that MIT does in a default configuration. |
Make sense, thanks for this! |
This is what the current .NET Framework behavior is. And this is what the fix for .NET Core will be also. |
fyi. I'll be OOF for about a week+. So, I'll be submitting the PR for this fix as soon as I get back. |
SocketsHttpHandler was not normalizing the DNS name prior to using it for the SPN (Service Principal Name). So, when using URIs that involve a CNAME, it was using the CNAME directly and not evaluating it to the normalized FQDN A record of the host. This change fixes the behavior to match .NET Framework so that CNAMEs are resolved properly.
We can use the standard Dns.GetHostEntryAsync() API to resolve the name. From a performance perspective, this additional DNS API call is limited to just the SPN calculation for NT Auth. Calling this API doesn't impact the performance on the wire since the OS will cache DNS calls. Wireshark confirms that no additional DNS protocol packets will be sent.
.NET Framework actually caches the normalized DNS resolution on the ServicePoint object when it opens up a connection. Thus, it doesn't have to call Dns.GetHostEntryAsync() for the SPN calculation. While a future PR could further optimize SocketsHttpHandler to also cache this DNS host name, it isn't clear it would result in a measurable performance gain.
I tested this change in a separate Enterprise testing environment I set up. I created a CNAME for a Windows IIS server in a Windows domain-joined environment and demonstrated that the Negotiate protocol results in a Kerberos authentication (and doesn't fall back to NTLM). Fixes #32328
thanks for the fix @davidsh ! |
Yes, the fix is in the master branch for 3.0. |
Thank you for your fix @davidsh! Just wondering if it would be possible to backport this merge request to either 2.2 or one of its servicing releases, so this fix becomes available for 2.2 as well and for PowerShell Core 6.2? |
@csharmath we do not port changes to servicing branches unless there is a very good reason - i.e. impact on larger set of customers, without reasonable workaround. Is that the case here? |
It's hard for me to tell the impact of this, but it surely impacts all shops using Negotiate with CNAMEs. The workaround is to either use the DNS A record or disable SocketsHttpHandler (preferable in cases when CNAMEs can change).
or set the env var
Once set, these settings can easily be forgotten and left in place by dev teams after moving to 3.0, which would mean missing out on the new perf improvements (unless these settings will be ignored with 3.0). |
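For readers landing here, this is roughly what the "disable SocketsHttpHandler" workaround looks like in code on .NET Core 2.1/2.2. It is a sketch based on the switch described in the 2.1 release notes linked earlier in the thread; the `Program` class and the request URI are placeholders, and the URI reuses the sanitized host name from the issue description.

```csharp
using System;
using System.Net.Http;

class Program
{
    static void Main()
    {
        // Must run before the first HttpClient/HttpClientHandler is created.
        // Alternative without code changes: set the environment variable
        // DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER to 0 before starting the process.
        AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);

        using (var handler = new HttpClientHandler { UseDefaultCredentials = true })
        using (var client = new HttpClient(handler))
        {
            // Placeholder URI taken from the sanitized capture in the issue description.
            HttpResponseMessage response =
                client.GetAsync("http://mykerberossite.lab.local/").GetAwaiter().GetResult();
            Console.WriteLine((int)response.StatusCode);
        }
    }
}
```

Because the switch is process-wide, it affects every HttpClient in the application, which is one more reason to remove it again after upgrading to a version that contains the fix.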
@csharmath correct, disabling SocketsHttpHandler is not something we recommend. However, porting every fix into servicing would basically make the servicing branch the new master (incl. instability, lower quality, higher chance of other regressions, etc.). Does it have specific impact on your environment(s)? Is the first workaround reasonable / acceptable in the meantime for you? |
To be clear I wasn't trying to propose to port all fixes, but I would consider anything security related as important to evaluate for backporting consideration. The impact of this issue could be a downgrade from Kerberos to NTLM. If a company already made investments to setup and use Kerberos, but for reasons did not completely disable NTLM then this change weakens their security with them potentially not even realizing it. If you put your security hat on, then it's a less optimal situation. Let me paste from MSDN:
|
There is another workaround/mitigation that can be considered. This problem only occurs if a CNAME is used in the URI and that CNAME is not registered as an SPN in Kerberos. So, the workaround is to register this additional CNAME SPN in Windows Active Directory / Kerberos environment. |
I will be forced to deploy the current project I am working on with SocketsHttpHandler disabled. I'm delivering a Web API that needs to pass through delegated auth to a SAP OData service. |
Hit this issue in another project for another client. Forced to disable SocketsHttpHandler. :/ |
So practically, 1-2 affected projects per year so far. Did you consider updating to .NET Core 3.0 or 3.1 (if you prefer LTS versions)? It is fixed there. |
Fair enough. Small numbers. For me, that represents the last two projects I've worked on. I'm not making the call on version for this one. And the workaround is easy enough once you finally figure out (remember) where the problem lies. Just seems weird to leave the bug there. |
Isn't every bug weird to have? That is their definition: they are bugs, unexpected behaviors. Fixing is usually done based on widespread impact. You can use the existence of this bug as a reason to upgrade to a newer version; take it up with the decision makers. If they don't care ... the bug is likely not such a high priority for them, or get info from them on why they cannot upgrade. |
I guess it's partly frustration on my part. Both projects have burnt many hours over days going back and forth with infrastructure people to try to get the magic soup of SPNs, etc right for Kerberos to work in their environment. Something that certainly isn't my specialty and in these cases don't have the level of access to directly tinker with myself. Bugs like this one are fun because they don't (to me, at least) immediately point to the code. Authentication just doesn't work. So you go back and beg people to triple-check SPNs and firewalls and who-knows-what-else to figure out what part of the environment is not configured right. Until someone finally ran across this issue in GitHub, it never occurred to me that there might be a new network handler in play (by default) that just didn't do the same thing that the old one did. Anyway, the bug has been dealt with. It just requires upgrading. Or a workaround if that is not feasible. |
Overview
While testing PowerShell Core 6.1 I ran into an issue with not being able to authenticate to a Kerberized REST API running on Linux unless I disable SocketsHttpHandler.
PowerShell/PowerShell#7801
It seems that when the server responds with both Negotiate and NTLM, SocketsHttpHandler picks NTLM, which in my case results in a 401 because the service in question really expects Negotiate / SPNego and does not work with NTLM.
As requested by @karelz in https://github.com/dotnet/corefx/issues/30166
I've reproduced it on the daily builds without PowerShell Core involved and same results, so submitting a new issue for this.
Expected result
When server sends multiple auth schemes like Negotiate and NTLM, pick the strongest one which in this case is Negotiate.
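As an illustration of the expected behavior, here is a small sketch that picks a scheme from the WWW-Authenticate headers of a 401 response. It is not the SocketsHttpHandler implementation; the `SchemePicker` class and the preference order in it are assumptions made for this example.

```csharp
using System;
using System.Linq;
using System.Net.Http;

static class SchemePicker
{
    // Assumed preference order, strongest first.
    static readonly string[] Preference = { "Negotiate", "NTLM", "Digest", "Basic" };

    // Returns the most preferred scheme the server offered, or null if none matched.
    public static string PickStrongest(HttpResponseMessage response)
    {
        var offered = response.Headers.WwwAuthenticate
            .Select(header => header.Scheme)
            .ToList();

        return Preference.FirstOrDefault(preferred =>
            offered.Any(scheme =>
                string.Equals(scheme, preferred, StringComparison.OrdinalIgnoreCase)));
    }
}
```

For a response like the first 401 in the capture below, which offers both Negotiate and NTLM, this selection is meant to choose Negotiate.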
Dotnet Info
Example repro
Result: 401
HTTP traffic from packet capture
GET / HTTP/1.1
Host: mykerberossite.lab.local
HTTP/1.1 401 Unauthorized
Date: Mon, 17 Sep 2018 21:31:42 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
Content-Length: 0
GET / HTTP/1.1
Authorization: Negotiate ****
Host: mykerberossite.lab.local
HTTP/1.1 401 Unauthorized
Date: Mon, 17 Sep 2018 21:31:42 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: NTLM
Content-Length: 0
Workaround is to disable SocketsHttpHandler
result: 200
HTTP Traffic
GET / HTTP/1.1
Connection: Keep-Alive
Host: mykerberossite.lab.local
HTTP/1.1 401 Unauthorized
Date: Mon, 17 Sep 2018 21:30:27 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
Content-Length: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
GET / HTTP/1.1
Connection: Keep-Alive
Host: mykerberossite.lab.local
Authorization: Negotiate ***
HTTP/1.1 200 OK
Date: Mon, 17 Sep 2018 21:30:27 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate ***
Cache-Control: no-cache
Expires: -1
Content-Type: text/plain;charset=UTF-8
Content-Length: 103
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive