Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix tls-psk in net #17741

Closed
wants to merge 12 commits into from
Closed

fix tls-psk in net #17741

wants to merge 12 commits into from

Conversation

shirleyquirk
Copy link
Contributor

@shirleyquirk shirleyquirk commented Apr 16, 2021

fixes #11280
much explanatory stream-of-conciousness of mine on that issue.
i'll try to summarize:

tls-psk overview:

psk in openssl allows the setting of callbacks for the client to select a psk based on a server hint, and for the server to validate this psk based on the client's identity.
the implementation of these callbacks involves dealing with c strings and char buffers, and is prone to buffer overruns, c.f. #17728
net therefore provides an abstraction for creating this callback, and instead the user provides a second, simpler callback

tls-psk use case

Limited as far as I know. embedded (but that probably wouldn't link against openssl). custom services running in inhospitable infrastructure. Situations where speed, ease of implementation, and not setting up a pki are more important than security. Note that libressl does not support psk at all.

the first problem

the ssl callback needs to know the whereabouts of this second user-supplied callback. Two places have been tried for storing it., in ex_data, and inside the Nim SslContext.

ex_data

this is an opaque storage location provided by openssl that stores void pointers. it's global, not per-context, and to insert data into it one needs to first get a unique index (by calling one ssl function) and then set the exdata at that index to said pointer. (when getting this index it's also possible to attach callbacks that will be called when an ssl_ctx is created, copied, or destroyed)
This was the original place chosen to store the secondary callback.

the second problem

the ssl callback knows which ssl_ctx it's attached to, but doesn't know which index to use to get out the secondary callback
a solution tried was to assume that this index would be zero, but as #4406 shows, this can't be relied upon.

when it broke

#5069
this pr removed all the indexing shenanigans, and placed the callbacks inside of the SslContext. as previously mentioned, this can't work, as the ssl callback has no access to the wrapper.

this solution

instead of storing the secondary callback anywhere, require that it be accessible at compile-time (a not overly-hampering restriction) and generate the ssl callback with a template.
the api is maintained for clientGetPskFunc= and lambdas work, but not closures.

Since none of the psk-associated functions have worked since v0.16, removing any of them is not a breaking change.
I've removed clientGetPskFunc and serverGetPskFunc; they do not seem useful to me, as these functions will be declared in the user's code, and the user can access them by name if they need.

limitations:

Only tls1.2 psk is implemented, tls1.3 of course uses a completely different set of functions. supporting tls1.3psk could be added, but this is a bug/regressionfix, not a feature addition

alternatives:

deprecate and remove psk functionality. it's a lot of work for a very small user base.


SslAcceptResult* = enum
AcceptNoClient = 0, AcceptNoHandshake, AcceptSuccess

SslHandshakeType* = enum
handshakeAsClient, handshakeAsServer

SslClientGetPskFunc* = proc(hint: string): tuple[identity: string, psk: string]
SslClientGetPskFunc* = proc(hint: string): tuple[identity: string, psk: string]{.nimcall.}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically, anything other than {.closure.} is fine, is this over-restrictive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how it is over-restrictive, the last thing we need is more generic bloat in Nim's core libraries.

@shirleyquirk shirleyquirk marked this pull request as ready for review April 17, 2021 22:58
@shirleyquirk shirleyquirk changed the title steps toward fixing tls-psk in net fix tls-psk in net Apr 17, 2021
@shirleyquirk
Copy link
Contributor Author

"couldn't import SSL_CTX_psk_identity_hint" it's in my openssl 1.0.0, what's that about?

@timotheecour
Copy link
Member

is SSL_CTX_psk_identity_hint available on OSX? if so, in which ssl version?

@shirleyquirk
Copy link
Contributor Author

osx deprecated openssl, it's not available unless installed with homebrew. since other ssl tests work on osx, there must be something there somewhere

@timotheecour
Copy link
Member

timotheecour commented Apr 18, 2021

  • do you have OSX?
    i can reproduce the error on OSX locally.
  • if you don't and need CI, use at least something more targetted (temporarily while debugging), eg:
    disable other CI's you don't need
    remove .github/workflows
    remove pipelines you don't need from azure-pipelines.yml
    only test the test you're interested in, for eg as done here fix #17749 ignore SIGPIPE signals, fix nim CI #17748  #17752 (see koch.nim diff)
  # boot without -d:nimHasLibFFI to make sure this still works
  kochExecFold("Boot in release mode", "boot -d:release -d:nimStrictMode")

  when false: # debugging: when you need to run only 1 test in CI, use something like this:
    execFold("debugging test", "nim r tests/stdlib/tosproc.nim")
    doAssert false, "debugging only"

=> that way, it'll give you faster turnaround, and use less resources

in future work, this can be made easier

  • also, the flag -d:nimDebugDlOpen helps a bit

@shirleyquirk
Copy link
Contributor Author

sorry i know i'm being cheeky
i don't have osx i'm afraid
thank you for the pipeline suggestions i'll do that

@shirleyquirk shirleyquirk marked this pull request as draft April 18, 2021 01:55
@shirleyquirk
Copy link
Contributor Author

also no SSL_CTX_set_psk_server_callback i'm leaning toward this doesn't work on osx

@shirleyquirk
Copy link
Contributor Author

libressl. of course. ok we can do this

@shirleyquirk
Copy link
Contributor Author

shirleyquirk commented Apr 18, 2021

ok, its simple. libressl doesn't support psk. because it's super obscure and only really for embedded. which wouldn't use openssl anyway. but i'm here now.

  • go through openssl, turn off the useless functions
  • test all of openssl with libressl, not on the CI this time.

@timotheecour
Copy link
Member

timotheecour commented Apr 18, 2021

  • this works on osx:
brew install openssl
nim c --lib:lib --threads:on -d:ssl -d:nimDebugDlOpen tests/stdlib/tnetpsk.nim
DYLD_LIBRARY_PATH=/usr/local/opt/openssl/lib tests/stdlib/tnetpsk

of course there's gotta be a better way than using DYLD_LIBRARY_PATH but that's a start

you can also rent one, eg https://www.macincloud.com/ other other platforms

@shirleyquirk
Copy link
Contributor Author

shirleyquirk commented Apr 20, 2021

the versions hack is horrible. shouldn't nim look for the version symlinked by the system, i.e. with no trailing version number, instead of randomly grabbing a potentially outdated one?

Edit: oh, the symlink is only for openssl-devel? smh. ok.

@shirleyquirk shirleyquirk marked this pull request as ready for review April 20, 2021 16:10
@timotheecour
Copy link
Member

the versions hack is horrible.

it is, and PR welcome to improve things; but see past gotchas so mistakes aren't repeated (nowdays with important_packages this shouldn't happen though):

@shirleyquirk
Copy link
Contributor Author

saw them, yes, and there be dragons. i don't think I can remove the version string but I can break it up and document what each bit does and why its in a certain order.

@@ -611,6 +606,9 @@ when defineSsl:
when not defined(openssl10) and not defined(libressl):
let sslVersion = getOpenSSLVersion()
if sslVersion >= 0x010101000 and not sslVersion == 0x020000000:
# XXX always false!
# XXX however, setting non-1.3 ciphers with ciphersuites will
# XXX cause an error, cipherList needs to be split into 1.3 and non-1.3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does? According to the man page:

Items that are not recognized, because the corresponding ciphers are not compiled in or because they are mistyped, are simply ignored. Failure is only flagged if no ciphers could be collected at all.

Am I missing anything here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, if you have a cipher list without 1.3 ciphers then it would error out.

I really hate how openssl decides that it needs two functions for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found it when testing this, trying to specify only a PSK cipher

@@ -611,6 +606,9 @@ when defineSsl:
when not defined(openssl10) and not defined(libressl):
let sslVersion = getOpenSSLVersion()
if sslVersion >= 0x010101000 and not sslVersion == 0x020000000:
# XXX always false!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it's parsed as ((not sslVersion) == 0x020000000)

Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really missing a lot of context here, looking at some of the links they are from 2016. and I am quite lost. Can you edit your summary in this PR to explain exactly what this is fixing and how?

I left a few comments with questions but I feel like I'm missing far too much context for them to be useful :)

Comment on lines -684 to -685
proc clientGetPskFunc*(ctx: SslContext): SslClientGetPskFunc =
return ctx.getExtraInternal().clientGetPskFunc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this? Removing it will be a breaking change plus being able to retrieve it seems useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's possible. However this was a chance to make a clean sweep and I felt freer to make 'breaking' changes as none of the psk functionality has worked since before 1.0 so i know i'm not breaking anyone's code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear in mind the pskfunc is probably declared as a proc and can be retrieved by name. it's static, after all.

lib/pure/net.nim Outdated Show resolved Hide resolved

pskClientCallback

proc `clientGetPskFunc=`*(ctx: SslContext, fun: static SslClientGetPskFunc) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why static here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without static, the fun makes the callback into a closure, which is of course incompatible with openssl.

This is the tradeoff to make psk work at all: the callback function must be known at compile-time. As the test shows, this is not a big issue, ergonomically it's pretty transparent and even lambdas work (as long as they are .nimcall)

As I learned from following your work, accessing the SslClientGetPskFunc at runtime consistently isn't possible, as the callback only has access to the raw ssl context, not the Nim wrapper context. Which is why your solution was to include it within the ex_data. But to get it back out, the callback needs to know the index of said data. which it also cant know.

Furthermore, we can't just assume that the index is 0, as the ex_data is not per context, but global across all contexts, so that solution doesn't work reliably, either.

So I concluded that to keep the abstraction of SslClientGetPskFunc it would have to be a compile-time rewriting, hence the template.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a closure in nim is just 2 function pointers; doesn't openssl allow passing some sort of void* context, with which we could pass the environment needed for the closure?

Copy link
Contributor Author

@shirleyquirk shirleyquirk Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sslcallback doesn't allow for extra arguments, or we'd pass in the nimcallback there. it's:

pskClientCallback(ssl: SslPtr; hint: cstring; identity: cstring; 
  max_identity_len: cuint; psk: ptr cuchar; max_psk_len: cuint): cuint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps the signature could be better expressed as identity: var UncheckedArray[cchar] and psk: var UncheckedArray[cuchar] but that's maybe yak shaving

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you already annotated SslClientGetPskFunc with {.nimcall.}, static shouldn't be necessary afaik.

Copy link
Contributor Author

@shirleyquirk shirleyquirk Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'd have thought so, but if i remove the static the compiler tries it's best to make it a closure anyway and I get an error. maybe that's a bug i should submit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sslcallback doesn't allow for extra arguments

here's how to pass callbacks: #17741 (comment)

lib/pure/net.nim Outdated Show resolved Hide resolved
@shirleyquirk
Copy link
Contributor Author

thank you @dom96 and @alaviss for having a look. I've updated the summary to provide some more context.

@shirleyquirk shirleyquirk marked this pull request as draft April 21, 2021 01:45
@shirleyquirk shirleyquirk marked this pull request as ready for review April 21, 2021 01:51
@Araq
Copy link
Member

Araq commented Apr 21, 2021

Can't we move to BearSSL instead?

@shirleyquirk
Copy link
Contributor Author

shirleyquirk commented Apr 21, 2021

Can't we move to BearSSL instead?

i support a move away from openssl, the writing is on the wall. I think the path to get there starts with #14719 (make ssl/tls pluggable), putting an abstraction inbetween (async)net and the tls implementation, especially as the post-openssl landscape is pretty fragmented, and any particular ssl backend may well disappear/get heartbleeded at any time. i'm up for helping in that work, it sounds like theres some other interest too. an RFC sounds like a good idea, pinging @snej

@Araq
Copy link
Member

Araq commented Apr 21, 2021

I'm not a fan of "Pluggable" but if it helps to get the ball rolling...

@shirleyquirk
Copy link
Contributor Author

'Abstraction layer' instead of 'pluggable' if that's easier to swallow.

@Araq
Copy link
Member

Araq commented Apr 21, 2021

Not the point ... I cannot design this abstraction layer without looking at e.g. BearSSL's source code, so I might as well try to port it and see how far I get and why ... But that's just me, maybe you have much more experience. ;-)

@shirleyquirk
Copy link
Contributor Author

maybe you have much more experience

lmfao
nope. wasted my youth on Lieder und Kunstschmieden

@shirleyquirk
Copy link
Contributor Author

i can't see the "re-run jobs" option mentioned in contributing.html to restart that crashed check

@shirleyquirk shirleyquirk marked this pull request as draft April 21, 2021 16:44
@shirleyquirk shirleyquirk marked this pull request as ready for review April 21, 2021 16:44
@shirleyquirk shirleyquirk reopened this Apr 21, 2021
SslContextExtraInternal = ref object of RootRef
serverGetPskFunc: SslServerGetPskFunc
clientGetPskFunc: SslClientGetPskFunc
SslServerGetPskFunc* = proc(identity: string): string{.nimcall.}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change. The only reason to justify this change is that it's security critical. Is that the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status quo is: servers using the psk API in net, in Nim >0.15.2 will compile and run happily, but sigsegv when a client initiates a connection with psk.

this is a DDoS vulnerability, at least.

Lets say Hyrum's Law applies, and someone is using this to reliably crash their servers remotely, and have set it up to work with a closure. That will no longer compile. I understand that's a break.

Maybe there's a way to allow embedding a closure in a {.cdecl.} callback but at the risk of sounding lazy or stupid I
a) couldn't figure it out:
the farthest i can get is tnetpsk.nim(42, 30) Error: internal error: inconsistent environment type
b) I doubt the benefit of the added code complexity. ref your first comment on this pr: #17741 (comment)

no ego here, close away if my solution isn't aligned with Nim's requirements, but i honestly can't figure out how to make this work any other way. The fact that I had to add an openssl dependency to the azure osx CI just for this test alone makes this whole PR a bit suspect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm I've an idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should work: #17741 (comment)

@shirleyquirk shirleyquirk mentioned this pull request Apr 23, 2021
@@ -115,7 +115,7 @@ jobs:
displayName: 'Install dependencies (i386 Linux)'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can't see the "re-run jobs" option mentioned in contributing.html to restart that crashed check

see https://nim-lang.github.io/Nim/contributing.html#the-cmdgit-stuff-continuous-integration-ci

follow these instructions to only restart the jobs that failed:

(feel free to improve wording if unclear; note that some of these instructions may require sufficient permisssions in nim repo, or not, i fogot)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They only work if you have commit permission in nim-lang/nim afaik.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR welcome to clarify this in the doc (and whether it's for both azure and github actions or just 1 of those)

@timotheecour
Copy link
Member

the solution to allow passing nim callbacks is explained here: openssl/openssl#4510:

Addition of callback_arg to psk_client_callback functions #4510

You can use the EXDATA in an SSL object to add your specific needs. The problem with changing API of the callback function is that, well, it changes the API and we can't do that in a dot release. Sorry, it was a goof to leave it out, but there is a workaround.

here's how rust uses this: https://docs.rs/openssl/0.9.24/src/openssl/ssl/mod.rs.html

        unsafe {
            let callback = Box::new(callback);
            ffi::SSL_CTX_set_ex_data(
                self.as_ptr(),
                get_callback_idx::<F>(),
                mem::transmute(callback),
            );
            ffi::SSL_CTX_set_psk_client_callback(self.as_ptr(), Some(raw_psk::<F>))

links

SSL_CTX_set_ex_data() is used to store application data at arg for idx into the ctx object.

@shirleyquirk shirleyquirk marked this pull request as draft April 24, 2021 09:28
@stale
Copy link

stale bot commented Apr 24, 2022

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Apr 24, 2022
@stale stale bot closed this May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PSK broken in net.nim, pskClientCallback can't access extraInternal
5 participants