Returning an owned string #2
Comments
Thank you for starting this project, it's awesome!! Here's my proposal regarding owned String ffi. rust -> ruby owned
|
I did some research. Firstly one could deallocate the unused capacity like so:
if s.len() != s.capacity() {
    unsafe {
        let ptr = s.as_mut_vec().as_mut_ptr();
        std::rt::heap::deallocate(
            ptr.offset(s.len() as isize),
            s.capacity() - s.len(),
            std::mem::min_align_of::<u8>()
        );
    }
}
Secondly I'm now sure that From the docs of Thirdly I think I found out that
So |
@flo-l I think that we will need to use Your research is spot-on though - shrinking to fit is required because we need to be able to use the length of the string to rebuild the capacity of the string when passed back to Rust to be freed. A few more points:
It pays to be pedantic - while Rust strings always contain Unicode data, "Unicode" isn't an encoding - Rust has UTF-8 encoded strings.
Similar concerns were brought up before, but evidently it's OK. You really need to use I'd be happy to provide more knowledge. Let me know what you think would be useful to be added to the site! |
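The point about rebuilding the capacity from the length can be sketched as follows. This is only an illustration under assumptions: the function names `string_into_raw` and `string_free_raw` are hypothetical and not part of the project, and it converts to a boxed slice (rather than calling `shrink_to_fit` directly) so that the allocation is exactly `len` bytes when it is rebuilt.

```rust
use std::ptr;

/// Hypothetical helper: gives up ownership of a Rust string and returns
/// (pointer, length). `into_boxed_slice` drops any excess capacity, so the
/// length alone is enough to describe the allocation later.
pub fn string_into_raw(s: String) -> (*mut u8, usize) {
    let boxed: Box<[u8]> = s.into_bytes().into_boxed_slice();
    let len = boxed.len();
    (Box::into_raw(boxed) as *mut u8, len)
}

/// Hypothetical helper: rebuilds the boxed slice from (pointer, length) and
/// drops it, returning the memory to Rust's allocator.
///
/// Safety: `ptr` and `len` must come from `string_into_raw`, and the pair
/// must be passed back at most once.
pub unsafe fn string_free_raw(ptr: *mut u8, len: usize) {
    if ptr.is_null() {
        return;
    }
    let slice: *mut [u8] = ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}
```

The foreign side has to keep the length next to the pointer and hand both back; that is the price of not relying on a NUL terminator.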
I'm also trying to solve this and have tried to combine the feedback posted here so far.
extern crate libc;
use libc::c_char;
use std::ffi::CStr;
use std::str;
#[no_mangle]
pub extern fn greet_from_rust(subject_c_ptr: *const c_char) -> *const u8 {
    let subject_c = unsafe {
        assert!(!subject_c_ptr.is_null());
        CStr::from_ptr(subject_c_ptr)
    };
    let subject = str::from_utf8(subject_c.to_bytes()).unwrap();
    let mut new_string: String = "hey, ".to_owned();
    new_string.push_str(subject);
    new_string.shrink_to_fit();
    // Note: this buffer is not NUL-terminated.
    let new_string_ptr = new_string.as_ptr();
    // Leak the String so its buffer outlives this call; nothing frees it yet.
    std::mem::forget(new_string);
    return new_string_ptr;
}
However, I was not able to replace the return type Also, who now owns new_string? Does Ruby and its GC dispose of it later? Can we make this code any faster by not copying everything, but working on the C string/char* directly? |
I haven't checked with valgrind, but I think we're leaking memory with mem::forget, because the Ruby code doesn't free the memory (at least I couldn't find any place where it does). Also, as I mentioned in my previous post, Ruby (MRI) copies the string contents. This is unfortunate, because I guess it hurts performance badly. Possible solutions:
Ad mem::free(): As far as I understand, the problem is that Ruby and Rust don't use the same allocator (Ruby: system, Rust: jemalloc), so memory allocated on one side can't be deallocated on the other. To mitigate that, we could supply a callback fn pointer that frees the string in Rust. Ruby code must call the fn after it has copied the string.
Speaking of copying: I guess the allocator problem explains why Ruby has to copy the string. Although Ruby is garbage collected, it needs to free the string at some point too, and it can't free a string allocated with a different allocator.
I would like to mention that I am by no means an expert, so I could be totally wrong, and somebody should confirm that what I'm saying is correct.
To sum up, we can't make string FFI with Ruby more efficient because of problems in the Ruby world. But maybe we could add some mechanism/function to Ruby that lets foreign code supply a pointer to some memory, which Ruby treats as a string without copying! |
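The "free it on the Rust side" idea above might look roughly like this. It is a sketch, not the project's code: `greet_free` is a hypothetical name, and the returned string is NUL-terminated via `CString` so the caller needs no separate length. When building a real shared library, each function would also carry `#[no_mangle]` (left out here to keep the snippet edition-independent).

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Builds "hey, <subject>" and transfers ownership of the buffer to the caller.
///
/// Safety: `subject` must be null or point to a valid NUL-terminated string.
pub unsafe extern "C" fn greet_from_rust(subject: *const c_char) -> *mut c_char {
    if subject.is_null() {
        return std::ptr::null_mut();
    }
    let subject = unsafe { CStr::from_ptr(subject) };
    let greeting = format!("hey, {}", subject.to_string_lossy());
    // `into_raw` leaks the CString on purpose; `greet_free` reclaims it.
    match CString::new(greeting) {
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical companion function: the caller passes the pointer back once
/// it has copied the contents, so Rust's allocator frees what it allocated.
///
/// Safety: `s` must be null or a pointer returned by `greet_from_rust` that
/// has not been freed yet.
pub unsafe extern "C" fn greet_free(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    drop(unsafe { CString::from_raw(s) });
}
```

The Ruby side would call `greet_from_rust`, copy the bytes into a Ruby string, and then call `greet_free` with the same pointer; neither language ever frees the other's memory.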
Ad UTF-8 strings in FFI: aren't Ruby strings UTF-8 encoded internally? Or at least, isn't there an option to force a Ruby string's encoding to be UTF-8? I don't understand why we need to go through CStr if we already have a perfectly valid UTF-8 string ready! |
That's correct.
Mostly true. You can now choose which allocator you use, which means that one could theoretically export the Ruby allocator into Rust world and then use it. That might be an interesting experiment.
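To illustrate what "choosing the allocator" means in current Rust, here is a minimal sketch using the stable `GlobalAlloc` trait. The `Counting` type and `ALLOCATIONS` counter are made up for the example, and it simply forwards to the system allocator; forwarding to Ruby's allocator instead is the untested experiment mentioned above, not something this snippet does.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper allocator. Every Rust allocation in the program goes
/// through `alloc`/`dealloc` below, so this is the place where a different
/// backing allocator could be plugged in.
pub struct Counting;

/// Number of allocations made so far (only here to show the hook is used).
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Forward to the system allocator (malloc on most Unix platforms).
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;
```

Even with a shared backing allocator, handing a Rust-allocated buffer to foreign code to free is only sound if both sides agree on size and alignment, so an explicit free function remains the safer default.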
Yep!
That's certainly possible. One could create a
My experience was mostly around JRuby encodings, but IIRC there's multiple underlying representations for ASCII-only strings, UTF-8 strings, and "other" strings.
There may be more efficient ways of translating strings between Rust and Ruby; most of my thinking so far has been around going from Rust to C (a pointer to 8-bit values that is terminated with a NUL character). A Ruby string may be different enough from a C string to make an alternate path more beneficial. |
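One concrete difference between the two representations: a NUL-terminated C string cannot carry an interior NUL byte, while a pointer-plus-length view can. The sketch below shows this on the Rust side only; the helper names are hypothetical, and it assumes the foreign side is able to accept a length alongside the pointer.

```rust
use std::ffi::CString;

/// Hypothetical helper: converts to a NUL-terminated C string, which fails
/// if the input already contains a NUL byte.
pub fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// Hypothetical helper: a borrowed (pointer, length) view of the UTF-8 bytes.
/// No copy and no terminator; the pointer is valid only while `s` is alive.
pub fn as_ptr_len(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

The pointer-plus-length form avoids the extra copy that `CString::new` makes, at the cost of requiring the receiver to handle an explicit length.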
Maybe I'll take a crack at all this tomorrow... |
Thanks for the review! I think you misunderstood my plan regarding the mechanism for string creation without copying. I don't want to write a Ruby class; I want to add a function to the Ruby FFI API which takes a pointer to some string data and a size_t representing the length of the allocation (or the valid part of the allocation, whatever), and which constructs a Ruby String object that uses the supplied string data directly without copying. It feels wasteful to transform a perfectly valid UTF-8 string to a CString and then transform it back to UTF-8 in the Ruby world again. Maybe we should open an issue at ruby/ruby to get some feedback? I could do some more research and then open one. |
If such a function didn't already exist, I'd be half-surprised. I'd expect a function that accepts a NUL-terminated string though.
That will be trickier, as it will hit the issue of using the same allocator, but that's at least possible.
Which API in particular are you thinking? The C-Ruby level, or something like the FFI gem? |
I'm thinking of adding it to Ruby.h, aka C-Ruby level. I couldn't find such a thing here. Once it's in Ruby.h I could add it to the ffi gem too. All the str_new_* functions call str_new0() internally, which copies here. Oh and btw, the equivalent functionality for arrays would be even more useful. I'm thinking of doing heavy number crunching with rust, eg. But that's another topic! |
Ok, that all makes sense. Using Rust terminology, you'd want a
I'd generally suggest minimizing the surface area between the two languages. If you can have the Ruby code give the Rust code the data once, then tell it to do a bunch of operations all in Rust-land, then get the results, you only have to round-trip the data once. You can have wrapper Ruby classes for each Rust type you need, but you'd basically just be holding onto a pointer. |
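A rough sketch of that "hold onto a pointer" pattern, assuming a made-up `Accumulator` type and function names: the data is handed to Rust once, all work happens on the Rust side, and the Ruby wrapper class would store nothing but the opaque pointer. In a real shared library each function would also be marked `#[no_mangle]`.

```rust
/// Hypothetical Rust-side state; the foreign side never sees its layout.
pub struct Accumulator {
    values: Vec<f64>,
}

/// Allocates the state and hands the caller an opaque pointer to it.
pub extern "C" fn accumulator_new() -> *mut Accumulator {
    Box::into_raw(Box::new(Accumulator { values: Vec::new() }))
}

/// Safety: `acc` must be null or a live pointer from `accumulator_new`.
pub unsafe extern "C" fn accumulator_push(acc: *mut Accumulator, value: f64) {
    if let Some(acc) = unsafe { acc.as_mut() } {
        acc.values.push(value);
    }
}

/// Safety: `acc` must be null or a live pointer from `accumulator_new`.
pub unsafe extern "C" fn accumulator_sum(acc: *const Accumulator) -> f64 {
    match unsafe { acc.as_ref() } {
        Some(acc) => acc.values.iter().sum::<f64>(),
        None => 0.0,
    }
}

/// Frees the state. Safety: `acc` must be null or a pointer from
/// `accumulator_new` that has not been freed yet.
pub unsafe extern "C" fn accumulator_free(acc: *mut Accumulator) {
    if !acc.is_null() {
        drop(unsafe { Box::from_raw(acc) });
    }
}
```

Only plain numbers and one pointer cross the language boundary here, so no string or array ever has to be copied or re-encoded between calls.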
yeah I heard that, sometimes when you're dealing just with the rust community you forget how tough human interaction can be.. You're right with minimizing the surface area between the two languages, but it bugs me that it's necessary. |