Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safety: introduce pointer types and their restrictions #208

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

muzarski
Copy link
Collaborator

@muzarski muzarski commented Nov 26, 2024

This is a follow-up to: #207.

It defines a CassPtr type, wrapper over Option<NonNull<T>>, generic over the pointer Properties. Two properties we do care about:

  • ownership - Exclusive (BoxFFI), Shared (ArcFFI), Borrowed (RefFFI)
  • mutability - Const or Mut

It gives us even more guarantees about the pointers. Specifically, it gives us the guarantee about the origin of the pointer - they can only be constructed via Box/Ref/ArcFFI APIs. We lacked this guarantee before this PR.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have implemented Rust unit tests for the features/changes introduced.
  • [ ] I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
  • [ ] I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

@muzarski muzarski marked this pull request as draft November 26, 2024 15:47
@muzarski muzarski force-pushed the pointer-kinds branch 2 times, most recently from f433610 to 9dadad8 Compare November 26, 2024 17:35
@muzarski muzarski self-assigned this Nov 26, 2024
@muzarski muzarski marked this pull request as ready for review November 29, 2024 12:47
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach is very impressive.
I couldn't check all the changes, there were too much of them; I just peeked through them. I trust your meticulousness :)

scylla-rust-wrapper/src/collection.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/collection.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/metadata.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
@muzarski
Copy link
Collaborator Author

muzarski commented Dec 2, 2024

Rebased on master

@muzarski muzarski force-pushed the pointer-kinds branch 2 times, most recently from 98fbac9 to aa90f5a Compare December 2, 2024 19:30
@muzarski
Copy link
Collaborator Author

muzarski commented Dec 2, 2024

v1.1: Addressed @wprzytula comments

@muzarski muzarski requested a review from wprzytula December 2, 2024 19:32
@wprzytula
Copy link
Collaborator

Waiting with review for a rebase on #207.

@muzarski
Copy link
Collaborator Author

muzarski commented Dec 4, 2024

Waiting with review for a rebase on #207.

Everything is up to date

scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
@muzarski
Copy link
Collaborator Author

muzarski commented Dec 4, 2024

v2: Improved pointer API based on @wprzytula suggestions. Thank you!

  • removed Copy and Clone implementations for CassPtr (currently, it's only defined for test purposes. I need to think how to address it some other way)
  • Left CassPtr::into_ref() as is. As discussed in the comment, it is required to "erase" the lifetime of the returned reference. This method consumes the pointer.
  • Introduced CassPtr::as_ref() method. It accepts a reference to the pointer. The lifetime of reference is inherited from pointer's lifetime.
  • Changed CassExclusiveMutPtr::into_mut_ref() to [...]::as_mut_ref(). It accepts &mut CassExclusiveMutPtr. The lifetime of returned mutable reference is inherited from mutable borrow of the pointer. This is really cool, because this ensures aliasing ^ mutability at borrow-checker level.
  • CassPtr::is_null() now borrows the pointer, instead of consuming it
  • CassFuture::set_callback(), and CassFutureCallback now expect a raw pointer. Safe CassPtr::as_raw() method is introduced for this use-case

I still need to think how to address the issue in tests. It's not possible to reuse the pointer, which is not Copy, while passing it to multiple API functions that consume the pointer.

@muzarski muzarski requested a review from wprzytula December 4, 2024 18:37
@wprzytula
Copy link
Collaborator

I still need to think how to address the issue in tests. It's not possible to reuse the pointer, which is not Copy, while passing it to multiple API functions that consume the pointer.

You could have an unsafe clone() method for tests only, then this would not break soundness guarantees of safe code.

@muzarski
Copy link
Collaborator Author

muzarski commented Dec 4, 2024

I still need to think how to address the issue in tests. It's not possible to reuse the pointer, which is not Copy, while passing it to multiple API functions that consume the pointer.

You could have an unsafe clone() method for tests only, then this would not break soundness guarantees of safe code.

Applied the suggestion. At first, I did not understand how is this different from implementing Copy and Clone for test only. But this part: "this would not break soundness guarantees of safe code" convinced me.

scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
Comment on lines 164 to 172
#[no_mangle]
pub unsafe extern "C" fn cass_batch_add_statement(
batch: *mut CassBatch,
statement: *const CassStatement,
mut batch: CassExclusiveMutPtr<CassBatch>,
statement: CassExclusiveConstPtr<CassStatement>,
) -> CassError {
let batch = ptr_to_ref_mut(batch);
let batch = BoxFFI::as_mut_ref(&mut batch).unwrap();
let state = Arc::make_mut(&mut batch.state);
let statement = ptr_to_ref(statement);
let statement = BoxFFI::as_ref(&statement).unwrap();

match &statement.statement {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ 📌 Is this function still correct in case Arc::make_mut(&mut batch.state) returns an Arc to a new allocation, because there already have been multiple owners of state?

Copy link
Collaborator Author

@muzarski muzarski Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as valid, as it was before the changes - we did not change the logic. But to answer your question (is it valid), I'd need to investigate it. TBH, I haven't touched batches in cpp-rust-driver yet. However, all these Arc::make_mut calls are a bit suspicious to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is out of scope of this PR. Just wanted to raise your awareness about the potential bug here.

Copy link
Collaborator Author

@muzarski muzarski Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated this. It looks fine to me. It looks like a small optimization.

The reason is that we need to clone fields of CassBatch in cass_session_execute_batch - we should not make any assumptions about the provided CassBatch object in the functions - it could have been modified, or even freed:

    let session_opt = ArcFFI::into_ref(session_raw).unwrap();
    let batch_from_raw = BoxFFI::as_ref(&batch_raw).unwrap();
    let mut state = batch_from_raw.state.clone();
    let request_timeout_ms = batch_from_raw.batch_request_timeout_ms;

    // DO NOT refer to `batch_from_raw` inside the async block, as I've done just to face a segfault.
    let batch_exec_profile = batch_from_raw.exec_profile.clone();

So how the optimization works?
This statement let mut state = batch_from_raw.state.clone(); is cheap, because it clones an Arc.

When it works?
Well, Arc::make_mut can potentially allocate, but this will not be the case if there is no concurrent execution of cass_session_execute_batch in the background - i.e., there is no other owner of the state.

Notice that this is all sound, because we do not expose the state to the users in any way.

Copy link
Collaborator

@wprzytula wprzytula Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Please put the above findings in a comment there, not to confuse future readers.

@muzarski
Copy link
Collaborator Author

muzarski commented Dec 9, 2024

Rebased on master

@muzarski
Copy link
Collaborator Author

muzarski commented Dec 9, 2024

v2.1: improved docstrings as suggested by @wprzytula

@muzarski muzarski requested a review from wprzytula December 9, 2024 13:09
@Lorak-mmk
Copy link
Collaborator

I've read the first few commits, and it talks about cass_data_type_new_from_existing expecting a Arc-allocated CassDataType, but I don't see why.
For context, the code of this function:

#[no_mangle]
pub unsafe extern "C" fn cass_data_type_new_from_existing(
    data_type: *const CassDataType,
) -> *const CassDataType {
    let data_type = ptr_to_ref(data_type);
    Arc::into_raw(Arc::new(data_type.clone()))
}

As far as I can tell it copies the whole object, not Arc, and does not assume it is Arc-allocated.

@muzarski
Copy link
Collaborator Author

I've read the first few commits, and it talks about cass_data_type_new_from_existing expecting a Arc-allocated CassDataType, but I don't see why. For context, the code of this function:

#[no_mangle]
pub unsafe extern "C" fn cass_data_type_new_from_existing(
    data_type: *const CassDataType,
) -> *const CassDataType {
    let data_type = ptr_to_ref(data_type);
    Arc::into_raw(Arc::new(data_type.clone()))
}

As far as I can tell it copies the whole object, not Arc, and does not assume it is Arc-allocated.

Sorry for the confusion. I remember that I made a mistake in some commit message, but forgot to fix it. I meant cass_data_type_add_sub_type function:

#[no_mangle]
pub unsafe extern "C" fn cass_data_type_add_sub_type(
    data_type: CassSharedPtr<CassDataType>,
    sub_data_type: CassSharedPtr<CassDataType>,
) -> CassError {
    let data_type = ArcFFI::as_ref(&data_type).unwrap();
    match data_type
        .get_mut_unchecked()
        .add_sub_data_type(ArcFFI::cloned_from_ptr(sub_data_type).unwrap())
    {
        Ok(()) => CassError::CASS_OK,
        Err(e) => e,
    }
}

@muzarski
Copy link
Collaborator Author

Pushed fixed commit message (collection: allocate static empty collections).

@Lorak-mmk
Copy link
Collaborator

The same mistake is present in metadata: store CassDataType behind an Arc commit message.

Other parts of the code make an assumption, that the pointer
representing `CassDataType` was obtained from an Arc allocation.

Take for example `cass_data_type_add_sub_type` - it clones an Arc.

This is a bug, that was fortunately detected by applying more restrictions
on the pointer types (introduced later in this PR).
The same bug as for collection types.
Again, if someone called `cass_data_type_add_sub_type` with
a data type obtained from `cass_column_meta_data_type`, it would
not be a pointer from an Arc allocation.
Weak::as_ptr() can return an invalid pointer. It can be even dangling (non-null).

It's safer to try to upgrade to an Arc. If upgrade was successful,
make use of RefFFI API to return a valid pointer. Otherwise, return
non-dangling null pointer.
Before this PR, the pointer was obtained from a valid
reference &CassFuture, which is totally fine.

However, I want to reduce the ways one can obtain such pointer.
For ArcFFI (shared pointers), I want them to be obtainable only in two ways:
- `ArcFFI::as_ptr()` which accepts an &Arc
- from the user, as a function parameter

This way, we are guaranteed that the pointer comes from a valid Arc allocation
(unless user provided pointer to some garbage, but there is no much we can do about it).
If we assume that user provides a pointer returned from some prior call to
API, we are guaranteed that it is valid, and comes from an Arc allocation (or is null).

I don't want to allow ArcFFI api to create a pointer from a refernce, to prevent
creating a pointer, from example from stack allocated object:
```
let future = CassFuture { ... };
let future_ptr = ArcFFI::as_ptr(&future);
```

This commit may not make much sense now, but all should be clear once
I introduce traits restricting the pointer types later in this PR.
@muzarski
Copy link
Collaborator Author

Rebased on master

Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not review the whole thing, because I spotted some important issues in argconv.rs. In general, please describe with more detail in precision what guarantees / requirements the types / methods require and provide. Without it it is a bit difficult to reason about the code.

Comment on lines 433 to 440
view_meta: *const CassMaterializedViewMeta,
) -> *const CassTableMeta {
let view_meta = RefFFI::as_ref(view_meta);
view_meta.base_table.as_ptr()

match view_meta.base_table.upgrade() {
Some(arc) => RefFFI::as_ptr(&arc),
None => std::ptr::null(),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for this upgrade to fail? Or does it indicate a bug in the driver?
If it is possible, can we have a test for that?

Comment on lines 42 to 48
impl BoundCallback {
fn invoke(self, fut: &CassFuture) {
fn invoke(self, fut_ptr: *const CassFuture) {
unsafe {
self.cb.unwrap()(fut as *const CassFuture, self.data);
self.cb.unwrap()(fut_ptr, self.data);
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this invoke method be unsafe? It does have a requirement for its argument to be a correct (alingment, lifetime etc) pointer.

Comment on lines 262 to 270
pub fn set_callback(
&self,
self_ptr: *const CassFuture,
cb: CassFutureCallback,
data: *mut c_void,
) -> CassError {
let mut lock = self.state.lock().unwrap();
if lock.callback.is_some() {
// Another callback has been already set
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here self_ptr should refer to the same CassFuture as self, right?
Can you instead make the receiver of this function an arc? self: &Arc<CassFuture>

scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
Comment on lines 351 to 532
pub trait BoxFFI: Sized {
fn into_ptr<M: Mutability>(self: Box<Self>) -> CassExclusivePtr<Self, M> {
CassExclusivePtr::from_box(self)
}
unsafe fn from_ptr(ptr: *mut Self) -> Box<Self> {
#[allow(clippy::disallowed_methods)]
Box::from_raw(ptr)
fn from_ptr<M: Mutability>(ptr: CassExclusivePtr<Self, M>) -> Option<Box<Self>> {
ptr.into_box()
}
unsafe fn as_maybe_ref<'a>(ptr: *const Self) -> Option<&'a Self> {
#[allow(clippy::disallowed_methods)]
fn as_ref<M: Mutability>(ptr: &CassExclusivePtr<Self, M>) -> Option<&Self> {
ptr.as_ref()
}
unsafe fn as_ref<'a>(ptr: *const Self) -> &'a Self {
#[allow(clippy::disallowed_methods)]
ptr.as_ref().unwrap()
fn into_ref<'a, M: Mutability>(ptr: CassExclusivePtr<Self, M>) -> Option<&'a Self> {
ptr.into_ref()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose into_ref? It is quite easy to cause a memory leak with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated where we previously needed to erase the lifetime. There were two cases:

  • iterators with 'static references. I replaced them with generic lifetimes
  • CassSession 'static reference passed to an async block. I changed it, so we pass an Arc to the async block. This is a good change (and a fix), because cpp-driver assumes that session object can be prematurely free'd by the user. It should not cause segfaults in executions that haven't been awaited yet. What's more, there is a test that expects that prematurely freed session awaits all the futures before closing. We are not doing that yet, thus the test is not enabled.

After adjusting these two things, I could get rid of into_ref method.

scylla-rust-wrapper/src/argconv.rs Outdated Show resolved Hide resolved
@muzarski
Copy link
Collaborator Author

v3:

New Origin trait and modified Ownership trait

Introduced new Origin trait which represents an origin of the pointer, and three implementors:

  • FromBox - box allocated, exclusive (potentially mutable) pointers
  • FromArc - arc-allocated, shared (either owned, or borrowed) const pointers
  • FromRef - pointers created from a valid rust reference, always borrowed and const

Modified implementors of Ownership trait. Now we distinguish two ownership kinds:

  • Owned - owned pointer. Data needs to be freed
  • Borrowed - borrowed pointers. Conversions to reference/Arcs from such pointers are unsafe - caller needs to ensure that owner of the data is alive

Additional commit

Moved the changes that introduce CassPtr (and traits) to a separate commit. I hope this aids readability and reduces the noise during review.

Thoroughly documented each type/trait and its methods. Explained what each type guarantees (and what it does not guarantee). Justified the choice of the methods being safe/unsafe.

@muzarski muzarski requested a review from Lorak-mmk December 17, 2024 20:21
cpp-driver does not increase the reference count. The lifetime
of the iterator is bound to the lifetime of result.
It's more readable (and safe) to have an explicit lifetime
instead of lifetime-erased references.
cpp-driver assumes that session object can be prematurely dropped. This means,
that we should increase the reference count in functions where we pass the session
to an async block. This will prevent UAF. There actually is a test case
for this (AsyncTests::Close). However, we cannot enable it yet,
since it expects that prematurely dropped session awaits all async
tasks and before closing.
It's required to enable the doctests for the crate.

I also marked the code snippets from documentation in binding.rs with
`text`. They should not be run as doc tests.
This commit introduces a `CassPtr` type, generic over
pointer `Properties`. It allows specific pointer-to-reference conversions
based on the guarantees provided by the pointer type.
Existing `Ref/Box/Arc`FFIs are adjusted, so they now allow
interaction with new pointer type. You can say, that for a user of
`argconv` API, new type is opaque. The only way to operate on it is to
use the corresponding ffi API.
Disallow implementing two different APIs for specific type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants