Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array.from_cpointer with CPointer.create violates memory safety #3668

Closed
jasoncarr0 opened this issue Sep 29, 2020 · 11 comments · Fixed by #3675
Closed

Array.from_cpointer with CPointer.create violates memory safety #3668

jasoncarr0 opened this issue Sep 29, 2020 · 11 comments · Fixed by #3675
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed triggers release Major issue that when fixed, results in an "emergency" release

Comments

@jasoncarr0
Copy link
Contributor

Array's methods and Array.from_cpointer basically trust the user outright; but we can make a null pointer publicly with Pointer.create, and generally blow up

https://stdlib.ponylang.org/builtin-Pointer/#create
https://stdlib.ponylang.org/builtin-Array/#from_cpointer

actor Main
  new create(env: Env) =>
    let arr = Array[U8].from_cpointer(Pointer[U8], 1, 1)
    env.out.print((try arr(0)? else 1 end).string())
@jemc
Copy link
Member

jemc commented Sep 29, 2020

I genuinely thought that Pointer.create returned a tag. In fact when I copied pointer code to Mare, I did it as a tag: https://github.com/jemc/mare/blob/44aee9554115e2b17d54c4fe477481dcc34b8ffc/src/prelude/c_pointer.mare#L15-L18

In the docs in Mare I even explained that this is the reason why it must be tag.

But you're right - in Pony it returns a ref. I swear I thought it was tag all along but I just checked the git blame and can't find any change in the recent past - it appears that it's always been this way.

This would be a breaking change, but I believe it must return tag for the safety reasons discussed in this ticket.

@SeanTAllen
Copy link
Member

What value is there in creating an array that you only have a tag reference to?

@jasoncarr0
Copy link
Contributor Author

jasoncarr0 commented Sep 29, 2020

I think the point was that Pointer.create would return a tag, and Array.from_cpointer would remain unchanged and return ref.

A Pointer[T] ref should grant you the capability to read and write anywhere within the corresponding allocation, which a null point definitely does not grant

@SeanTAllen
Copy link
Member

ah, thanks for the clarification @jasoncarr0

@SeanTAllen SeanTAllen added bug Something isn't working triggers release Major issue that when fixed, results in an "emergency" release help wanted Extra attention is needed labels Sep 29, 2020
@jemc
Copy link
Member

jemc commented Sep 29, 2020

If someone wants to working on fixing this, they'll probably run into the fact that Array and String internally rely on being able to create a ref null pointer, and changing this to tag will break that. So to fix those breakages, we'll need to add ._unsafe() after those constructor calls.

The ._unsafe() call can be used within the builtin package but nowhere else. If this breaks other packages, they'll need to find other ways to change their code to either use tag pointers only, or wait for a "real" ref pointer instead of using a null pointer as a placeholder.

@ergl
Copy link
Member

ergl commented Oct 5, 2020

I've taken a stab at this. There's one issue with the net and process packages requiring usage of a ref or iso variants of Pointer. Since _unsafe() is private, it's not visible from those packages. Here are the cases where it happens:

  1. NetAddress.name requiring an iso pointer

    var host: Pointer[U8] iso = recover Pointer[U8] end
    var serv: Pointer[U8] iso = recover Pointer[U8] end

  2. _ProcessWindows.create requiring a ref pointer

Both of these cases follow the same pattern: passing a NULL pointer to a C function, which fills it with data. This pointer is then used to create a string or array using from_cpointer. Using a tag pattern for the first use-case works, but we can't use a tag pointer to get an array or a string.

To get around this limitation, an option is to create the String or Array on the pony side, and use cpointer() to pass it as a pointer to C, but I don't know the implications of doing this, since the pony String / Array was allocated by Pony, and will be tracked by the garbage collector.

@jemc
Copy link
Member

jemc commented Oct 6, 2020

The tricky thing about this is that we're not really passing a pointer to a C function to fill with data - we're passing the address of the stack memory of the pointer so that the pointer itself will be replaced on the C side.

This is really the crux of the issue. We start with an invalid pointer (which therefore must be tag), and we want the C function to give us back a new pointer that we want to use to construct a String or Array (which therefore must be ref). So conceptually we want the C function to change the reference capability of the pointer value. But the addressof feature doesn't work like this - the "in value" must have the same type (and therefore the same capability) as the "out value".

And I've actually seen other examples in Zulip of users who are dealing with a particular C API that works like this, but they can't call it from Pony because of this problem.

So we're faced with two options to resolve this ticket:

  • 1️⃣ Add new feature(s) to Pony to allow it to have some way of changing the reference capability as part of an addressof passed to an API call
  • 2️⃣ Change the underlying C functions called by NetAddress.name and _ProcessWindows.create so that do not use this pattern of replacing the pointer value with addressof, and either populate a pre-allocated pointer or return the new pointer in the return value.

Path 2️⃣ has the disadvantage that it will only work for fixing the standard library, and any users who want to do something like this will need to write their own C shims around the FFI library they want to wrap.

Path 1️⃣ has the disadvantage of having to design a special feature and syntax that won't be used very often, which will need to go through RFC.

With that in mind, I suggest we take path 2️⃣ for now and consider path 1️⃣ in the future depending on the level of user frustration with the status quo (leaving the potential frustrated users to drive the RFC process).

@jemc
Copy link
Member

jemc commented Oct 6, 2020

Interestingly enough, pony_os_nameinfo (used by NetAddress.name) used to return the two pointers as a tuple rather than using the addressof pattern, but in this commit (dca188c) it was changed, and as the commit message implies ("returning a struct from a C FFI on Windows is complicated"), this wasn't safe on all platforms and in fact this is now illegal in Pony for that reason (#2012).

So we need to find an efficient approach to get these two filled pointers out from the C side. I have a few ideas rattling around, but I'm not sure which one is the most efficient.


For the case of ponyint_win_process_create (used by _ProcessWindows.create) the C API refactor is more straightforward: we need to make it return the error_message pointer instead of using addressof for it, then use addressof for the value that is currently the return value.

@jasoncarr0
Copy link
Contributor Author

jasoncarr0 commented Oct 6, 2020

Some other options:

  1. Allow ref pointers to be null, and account for it in all public interfaces, like Array. That is, Pointer can optionally point to an allocation, rather than always.
  2. Follow Rust in a null-pointer optimization with guaranteed ABI: There Option<&Foo>, which is either a reference to Foo or None, is defined to have the exact same layout as a pointer to Foo. I can see a lot of difficulty with this route though.

@jemc
Copy link
Member

jemc commented Oct 6, 2020

We discussed on today's sync call that we like approach 3️⃣ the best.

To be more specific, we want to change Array.from_cpointer and String.from_cpointer so that it will keep accepting a Pointer ref, but it will check first for the null pointer case. If the pointer is a null pointer, then it will return a zero-length Array or String, ignoring the length that you asked for.

This closes the safety hole, still behaves in a not-surprising way, and won't break any code that works today. Great idea, @jasoncarr0!

@ergl - does this change to the approach make sense?

@ergl
Copy link
Member

ergl commented Oct 6, 2020

@jemc Yes, I think it makes sense. If I understand point 3 correctly, the default Pointer constructor will still be ref, and we just add an extra to String / Array to prevent unsafe behaviour.

Also, you're right that I misunderstood the original code. I tried my approach of constructing a string on the pony side but I couldn't get it to work, now I know why!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed triggers release Major issue that when fixed, results in an "emergency" release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants