Require ObjectReference to be inside an object #1170

Closed
wks opened this issue Jul 19, 2024 · 0 comments · Fixed by #1195
Labels
A-interface Area: Interface/API C-enhancement Category: Enhancement G-safety Goal: Safety

wks (Collaborator) commented Jul 19, 2024

Status quo

Currently, MMTk defines several addresses of an object.

| Name | Definition | Must be inside object |
|---|---|---|
| starting address | the return value of memory_manager::alloc | Yes |
| ObjectReference | an address that refers to an object | No |
| in-object address | at a constant offset from ObjectReference, used to access SFT, side metadata, etc. | Yes |
| header address | address used to access in-object metadata | Yes |

The definition of ObjectReference is VM-specific. We currently allow ObjectReference to be outside an object because some VMs do so. For example, in JikesRVM, an ObjectReference is defined as the address of the array payload if the object is an array. That saves one offset computation for array element access, but when accessing scalar object fields or object headers, the VM has to use a negative offset from the ObjectReference. When we ported MMTk from JikesRVM to Rust, we inherited this type. ObjectReference is now the standard way for mmtk-core to refer to an object. We still allow ObjectReference to be outside an object so that when loading from a field in JikesRVM, we can directly use the word stored in the field as the ObjectReference.

However, because we only map side metadata memory for pages within spaces, addresses outside any space (or in unmapped pages) may not have mapped metadata. The same is true for SFT entries, which are allocated by chunk. If we attempt to access metadata or SFT using an address outside the object, it can result in a segmentation fault. To solve this problem, we require the VM binding to implement ObjectModel::ref_to_address, which computes the "in-object address" of an object, an address that must be inside the object. (#699)
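
To make the failure mode concrete, here is a minimal sketch of a constant-offset "in-object address". It is not the mmtk-core implementation: `ObjRef`, the offset of -8, the 4 MiB chunk size and `chunk_index` are all assumptions chosen for illustration. It shows a reference that sits exactly at a chunk boundary (one past the end of the object) resolving to a different chunk than the object itself.

```rust
/// Hypothetical stand-in for mmtk-core's `ObjectReference`: a bare address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjRef(pub usize);

/// Assumed chunk size of 4 MiB (2^22 bytes) for chunk-grained tables.
pub const LOG_BYTES_IN_CHUNK: usize = 22;

/// Assumed constant offset from the reference to an address inside the object.
pub const IN_OBJECT_ADDRESS_OFFSET: isize = -8;

/// Computes the "in-object address" without reading the object body.
pub fn ref_to_address(object: ObjRef) -> usize {
    (object.0 as isize).wrapping_add(IN_OBJECT_ADDRESS_OFFSET) as usize
}

/// Inverse of `ref_to_address`.
pub fn address_to_ref(addr: usize) -> ObjRef {
    ObjRef((addr as isize).wrapping_sub(IN_OBJECT_ADDRESS_OFFSET) as usize)
}

/// Index of the chunk containing `addr`, as a chunk-grained lookup would use.
pub fn chunk_index(addr: usize) -> usize {
    addr >> LOG_BYTES_IN_CHUNK
}
```

With these assumed values, an object occupying the last 16 bytes of chunk 1 has a reference whose raw address belongs to chunk 2, while `ref_to_address` lands back in chunk 1.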

Meanwhile, VMs that use conservative stack scanning need to read a word from the stack, compute the "in-object address" from it, and see if the VO bit is set at the "in-object address". Because we don't know if a word on the stack is an actual ObjectReference or not, the offset from the ObjectReference to the "in-object address" must be a constant (i.e. can be computed without reading any data from the object body). (Also in #699)
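
The constant-offset requirement can be sketched as follows. This is an illustrative model, not mmtk-core code: the VO bits are modelled as a `HashSet` of addresses instead of side metadata, and `VoBits`, `in_object_address`, `scan_stack` and the offset of -8 are hypothetical. The point is that the filter never dereferences a stack word.

```rust
use std::collections::HashSet;

/// Assumed constant offset from a reference to its "in-object address".
pub const IN_OBJECT_ADDRESS_OFFSET: isize = -8;

/// Toy model of the VO-bit table, keyed by "in-object address".
pub struct VoBits {
    set: HashSet<usize>,
}

impl VoBits {
    pub fn new() -> Self {
        VoBits { set: HashSet::new() }
    }

    /// Records that `raw_ref` refers to a valid object.
    pub fn set_for_ref(&mut self, raw_ref: usize) {
        self.set.insert(in_object_address(raw_ref));
    }

    pub fn is_set(&self, addr: usize) -> bool {
        self.set.contains(&addr)
    }
}

/// Pure arithmetic on the word; no memory is read.
pub fn in_object_address(word: usize) -> usize {
    (word as isize).wrapping_add(IN_OBJECT_ADDRESS_OFFSET) as usize
}

/// Keeps only the stack words whose "in-object address" has its VO bit set.
pub fn scan_stack(stack: &[usize], vo_bits: &VoBits) -> Vec<usize> {
    stack
        .iter()
        .copied()
        .filter(|&word| vo_bits.is_set(in_object_address(word)))
        .collect()
}
```

If the offset depended on the object's type or header, `in_object_address` would have to read memory at an address that may not be an object at all.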

Furthermore, not all VMs can use "the word stored in the field" as ObjectReference. In some VMs, the thing in a field may be a compressed pointer (OpenJDK), a tagged pointer (V8), an offset pointer (Julia), or an indirect handle (Guile or some old versions of the HotSpot JVM). We solve this problem by letting the VM binding implement the Slot trait and customize the load and store methods so that we always present a word-sized, pointer-based ObjectReference to mmtk-core. (#606)
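
As a rough illustration of that idea, the sketch below decodes a compressed pointer on load and encodes it on store. The `Slot` trait here is a simplified, hypothetical version (it owns the field value and takes `&mut self`), and `HEAP_BASE`, `COMPRESSION_SHIFT` and `CompressedSlot` are assumptions, not taken from any real binding.

```rust
/// Hypothetical stand-in for mmtk-core's `ObjectReference`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjRef(pub usize);

/// Simplified, hypothetical slot interface; the real trait differs.
pub trait Slot {
    /// Returns `None` if the slot holds a null reference.
    fn load(&self) -> Option<ObjRef>;
    fn store(&mut self, object: ObjRef);
}

/// Assumed heap base and shift for a compressed-pointer encoding.
pub const HEAP_BASE: usize = 0x1000_0000;
pub const COMPRESSION_SHIFT: u32 = 3;

/// A 32-bit field holding `(address - HEAP_BASE) >> COMPRESSION_SHIFT`.
/// Assumes no object lives exactly at `HEAP_BASE`, so 0 can mean null.
pub struct CompressedSlot {
    pub field: u32,
}

impl Slot for CompressedSlot {
    fn load(&self) -> Option<ObjRef> {
        if self.field == 0 {
            return None;
        }
        Some(ObjRef(HEAP_BASE + ((self.field as usize) << COMPRESSION_SHIFT)))
    }

    fn store(&mut self, object: ObjRef) {
        self.field = ((object.0 - HEAP_BASE) >> COMPRESSION_SHIFT) as u32;
    }
}
```

The core only ever sees the decoded, word-sized `ObjRef`; the encoding stays private to the binding.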

Then we implemented an algorithm for finding the last VO bit from an interior pointer. If neither the ObjectReference nor the "in-object address" is required to be word-aligned, the algorithm will not be able to return an exact ObjectReference, but only an address range where one of the addresses is a valid ObjectReference. That's confusing and inefficient. Now we require that ObjectReference must be word-aligned, while the "in-object address" has no alignment requirements. This makes ObjectReference more likely not to be what's held in an object field because the VM may use the low bits as tags (V8), making the value misaligned. But this is not a problem because the VM binding can fix the alignment in Slot::load and Slot::store. (#1159)
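
The role of the alignment requirement can be shown with a small arithmetic sketch. It assumes one VO bit per word-sized region (an assumption for illustration) and is not the algorithm in mmtk-core; `vo_region_start` and `ref_from_vo_region` are hypothetical names. A set bit only tells us that the "in-object address" lies somewhere in a one-word region, so the reference lies in a one-word window, and exactly one address in that window is word-aligned.

```rust
pub const BYTES_IN_WORD: usize = std::mem::size_of::<usize>();

/// Start of the word-sized region whose VO bit covers `addr`
/// (assuming one VO bit per word).
pub fn vo_region_start(addr: usize) -> usize {
    addr & !(BYTES_IN_WORD - 1)
}

/// Recovers the reference from the region of a set VO bit.
///
/// The "in-object address" is in `[region_start, region_start + WORD)`, so
/// the reference (`in_object_address - offset`) is in `[lo, lo + WORD)`.
/// Rounding `lo` up to a word boundary picks the only aligned candidate.
pub fn ref_from_vo_region(region_start: usize, offset: isize) -> usize {
    let lo = (region_start as isize - offset) as usize;
    (lo + BYTES_IN_WORD - 1) & !(BYTES_IN_WORD - 1)
}
```

Without the alignment requirement, every address in the window would remain a candidate.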

In conclusion, an ObjectReference as required by the current mmtk-core

  • is an address, and
  • must be a constant offset from the "in-object address" because of conservative stack scanning, and
  • must be word-aligned to support searching for ObjectReference from an interior pointer, and
  • may or may not be the thing held in the field.

p.s. See #1044 for the discussion about VMs that store handles instead of object addresses in fields.

The problem

mmtk-core doesn't use the raw address of ObjectReference except for debug purposes. Almost all operations are done w.r.t. the "in-object address", including trace_object, is_reachable (via SFT), marking, checking VO bit (via side metadata), checking if an object is within a chunk/block, etc.

Meanwhile, ObjectReference is not always what's in a field, either. It is something defined by the VM binding and passed around in mmtk-core, but it has no useful properties except being at a constant offset from an "in-object address". The only reason for a VM binding to use an address outside an object as ObjectReference is "it is what's in a field, and we don't want to waste one subtraction for every field load". But that reason may not hold either, because if we don't do the subtraction when loading, we need one subtraction at every subsequent ObjectReference::to_address().

Proposal: Require ObjectReference to be inside an object

We can add one more requirement in addition to the alignment requirement: ObjectReference must be an address inside an object.

That merges the "in-object address" and ObjectReference.

The benefits are obvious:

  • We can directly use the raw address of ObjectReference to access SFT and side metadata since it's guaranteed to be inside an object.
  • If a VO bit is set for an address, it will be the exact address of the ObjectReference. There is no confusion about the offset or alignment.
  • We can remove a few constants and methods in ObjectModel and ObjectReference. The API will be much simpler.
  • We remove the cost of address computation at every ObjectReference::to_address.

Concretely, we remove ObjectReference::to_address, keeping the to_raw_address, to_header and to_object_start methods. When accessing SFT or side metadata, we simply use ObjectReference::to_raw_address because it will be guaranteed to be inside the object.

We remove the constant IN_OBJECT_ADDRESS_OFFSET and the methods ObjectReference::to_address and ObjectReference::from_address. Note that IN_OBJECT_ADDRESS_OFFSET is not required to be a multiple of the word size. Currently, when we set a VO bit from an ObjectReference, we may be setting the VO bit at an unaligned address, and we need to use the alignment requirement of ObjectReference to infer the only possible raw address of the ObjectReference given a VO bit. After removing IN_OBJECT_ADDRESS_OFFSET, we set the VO bit exactly at ObjectReference::to_raw_address. It will be both inside the object and aligned. There will be no need to mess with the alignment requirements. If the VO bit is set at address X, then ObjectReference::from_raw_address_unchecked(X) will be guaranteed to be a valid ObjectReference.
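
Under the proposed rule, looking up an object from an interior pointer reduces to a backward scan over word-aligned addresses. The following is a sketch under stated assumptions, not the mmtk-core implementation: VO bits are modelled as a `HashSet` of raw addresses, and `set_vo_bit` and `find_ref_from_interior_pointer` are hypothetical names.

```rust
use std::collections::HashSet;

pub const BYTES_IN_WORD: usize = std::mem::size_of::<usize>();

/// Sets the VO bit exactly at the (word-aligned) raw address of a reference.
pub fn set_vo_bit(vo_bits: &mut HashSet<usize>, raw_address: usize) {
    debug_assert!(raw_address % BYTES_IN_WORD == 0);
    vo_bits.insert(raw_address);
}

/// Scans backwards from `ptr`, one word at a time, for the nearest set VO
/// bit that is at most `max_search_bytes` below `ptr`. The address found is
/// itself the raw address of the reference; no offset fix-up is needed.
pub fn find_ref_from_interior_pointer(
    vo_bits: &HashSet<usize>,
    ptr: usize,
    max_search_bytes: usize,
) -> Option<usize> {
    let lowest = ptr.saturating_sub(max_search_bytes);
    let mut cur = ptr & !(BYTES_IN_WORD - 1);
    while cur >= lowest {
        if vo_bits.contains(&cur) {
            return Some(cur);
        }
        if cur < BYTES_IN_WORD {
            break;
        }
        cur -= BYTES_IN_WORD;
    }
    None
}
```

A real implementation would also check that the pointer actually falls within the found object's extent; that step is omitted here.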

Potential risks

Performance

By unifying ObjectReference and "in-object address", mmtk-core will no longer need to call ObjectReference::to_address, which costs an address computation when there is an offset between the raw address and the "in-object address". This could improve performance. However, a VM whose field values point outside the object then requires one subtraction at every Slot::load and one addition at every Slot::store. In this sense, we merely move the overhead from to_address to load and store. We need a performance evaluation to see whether the cost increases or decreases after this change. Currently the only VM binding that has different ObjectReference and "in-object address" is JikesRVM. We'll need some test results from JikesRVM.
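
The shifted overhead would look roughly like this in a binding. The sketch is hypothetical: `OffsetSlot`, `VM_REF_OFFSET` and its value of 16 are assumptions, references are plain `usize` addresses, and 0 is assumed to be the VM's null.

```rust
/// Assumed distance from the MMTk-level reference (inside the object) to
/// the VM-level reference stored in fields (possibly outside the object).
pub const VM_REF_OFFSET: usize = 16;

/// A field holding the raw word exactly as the VM stores it; 0 means null.
pub struct OffsetSlot {
    pub field: usize,
}

impl OffsetSlot {
    /// One subtraction per load: VM-level word -> MMTk-level reference.
    pub fn load(&self) -> Option<usize> {
        if self.field == 0 {
            None
        } else {
            Some(self.field - VM_REF_OFFSET)
        }
    }

    /// One addition per store: MMTk-level reference -> VM-level word.
    pub fn store(&mut self, mmtk_ref: usize) {
        self.field = mmtk_ref + VM_REF_OFFSET;
    }
}
```

Whether this is cheaper overall depends on how often a loaded reference is subsequently used for metadata access, which is what the evaluation would need to measure.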

Engineering

By unifying ObjectReference and "in-object address", mmtk-core will have an easier time mapping a VO bit to its corresponding ObjectReference. But if the VM-level reference value is a pointer outside the object, and such a value can be held on the stack, the conservative stack scanner implemented by the VM will have to compute the "candidate of ObjectReference" by subtracting an offset from the value on the stack before passing the "candidate" to memory_manager::is_mmtk_object. That means, if the VM binding doesn't implement the subtraction in ObjectModel::ref_to_address, it must implement it in the conservative stack scanner. That also shifts the complexity from one place to another. Fortunately, JikesRVM doesn't use conservative stack scanning. If V8 uses conservative stack scanning, it will always have to mask the stack word for alignment due to #1159, regardless of this change.
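
A binding-side scanner along those lines might be sketched as below. Everything here is hypothetical: `VM_REF_OFFSET`, `candidate_from_stack_word` and `scan_conservatively` are illustrative names, and the `HashSet` lookup merely stands in for the real validity check done by memory_manager::is_mmtk_object.

```rust
use std::collections::HashSet;

pub const BYTES_IN_WORD: usize = std::mem::size_of::<usize>();

/// Assumed distance from the MMTk-level reference to the VM-level pointer
/// that may appear on the stack.
pub const VM_REF_OFFSET: usize = 16;

/// Turns a raw stack word into a candidate reference, rejecting values that
/// cannot possibly be valid (underflow, zero, or misaligned).
pub fn candidate_from_stack_word(word: usize) -> Option<usize> {
    let candidate = word.checked_sub(VM_REF_OFFSET)?;
    if candidate == 0 || candidate % BYTES_IN_WORD != 0 {
        return None;
    }
    Some(candidate)
}

/// Filters a stack snapshot down to the candidates whose VO bit is set.
/// `vo_bits` is a toy stand-in for the object validity check.
pub fn scan_conservatively(stack: &[usize], vo_bits: &HashSet<usize>) -> Vec<usize> {
    stack
        .iter()
        .filter_map(|&word| candidate_from_stack_word(word))
        .filter(|candidate| vo_bits.contains(candidate))
        .collect()
}
```

The pre-filtering matters because zero and misaligned values are not legal inputs for the validity check once references are required to be word-aligned.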

wks added C-enhancement Category: Enhancement G-safety Goal: Safety A-interface Area: Interface/API labels Jul 19, 2024
wks mentioned this issue Jul 24, 2024
wks added a commit to mmtk/mmtk-jikesrvm that referenced this issue Sep 3, 2024
The main purpose of this PR is to make a clear distinction between the
`ObjectReference` type in JikesRVM and the `ObjectReference` type in
mmtk-core.

This PR introduced `JikesObj`, a Rust type that represents the
JikesRVM-level `ObjectReference`. It needs an explicit conversion to
convert to/from the MMTk-level `ObjectReference` type.

The interface between mmtk-core and the mmtk-jikesrvm binding is
refactored to do fewer things with the MMTk-level `ObjectReference`.

- Trait methods that pass `ObjectReference` to the binding, notably the
methods in `ObjectModel`, now simply convert the MMTk-level
`ObjectReference` to `JikesObj`, and then call methods of `JikesObj`.
- Concrete methods for accessing object headers, fields, and layout
information are now implemented by `JikesObj` (and other wrapper types
including `TIB` and `RVMType`).
- The `JikesRVMSlot` trait now does the conversion between `JikesObj`
and the MMTk-level `ObjectReference` when loading or storing a slot.

This allows us to change the definition of the MMTk-level
`ObjectReference` in the future, while concrete methods of `JikesObj`
still use offset constants relative to the JikesRVM-level
`ObjectReference` which will not change.

The interface between the Rust part and the Java part of the binding is
refactored to involve `JikesObj` only.

- API functions in `api.rs` accept `JikesObj` parameters from JikesRVM
and return `JikesObj` to JikesRVM where JikesRVM uses the JikesRVM-level
`ObjectReference`.
- We wrap all JTOC calls into strongly-typed Rust functions, and make
the weakly-typed `jtoc_call!` macro private to the wrappers.

In this way, we ensure none of the API functions or JTOC calls leak the
MMTk-level `ObjectReference` values to JikesRVM, or accidentally
interpret a JikesRVM-level `ObjectReference` as an MMTk-level
`ObjectReference`.

We also do some obvious refactoring that makes the code more readable:

- Encapsulated many field-loading statements in the form of `(addr +
XXXX_OFFSET)::load<T>()` into dedicated methods.
- Encapsulated the code for determining the overhead of hash fields into
a function `JikesObj::hashcode_overhead` and simplified many methods
that depend on that.
- Renamed "edge" to "slot" in `RustScanThread.java`.

And obvious bug fixes:

- The call to `DO_REFERENCE_PROCESSING_HELPER_SCAN_METHOD_OFFSET` used
to erroneously interpret 0 as `true`. This has been fixed by relying on
the conversion trait.
- `scan_boot_image_sanity` used to declare an immutable array and let
unsafe `jtoc_call!` code modify it. The array is now defined as mutable.

Related issues and PRs:

- This PR is the 1st step of
#178
- It will ultimately allow mmtk/mmtk-core#1170
to be implemented.
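
The separation described in this commit message can be sketched with two newtypes and an explicit conversion. This is an illustrative model only: `MmtkObjRef`, the method names and the offset of 8 are assumptions and do not reproduce the mmtk-jikesrvm source.

```rust
/// Assumed distance between the two kinds of reference; hypothetical value.
const JIKES_TO_MMTK_OFFSET: usize = 8;

/// The JikesRVM-level reference, exactly as JikesRVM stores it. May be null.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct JikesObj(usize);

/// Stand-in for the MMTk-level `ObjectReference`. Never null.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MmtkObjRef(usize);

impl JikesObj {
    pub const NULL: JikesObj = JikesObj(0);

    pub fn from_raw(addr: usize) -> Self {
        JikesObj(addr)
    }

    pub fn to_raw(self) -> usize {
        self.0
    }

    /// Explicit conversion to the MMTk-level reference; `None` for null.
    pub fn to_mmtk(self) -> Option<MmtkObjRef> {
        if self.0 == 0 {
            None
        } else {
            Some(MmtkObjRef(self.0 - JIKES_TO_MMTK_OFFSET))
        }
    }

    /// Explicit conversion back from the MMTk-level reference.
    pub fn from_mmtk(object: MmtkObjRef) -> Self {
        JikesObj(object.0 + JIKES_TO_MMTK_OFFSET)
    }
}

impl MmtkObjRef {
    pub fn to_raw_address(self) -> usize {
        self.0
    }
}
```

Because the two types are distinct, passing one where the other is expected is a compile error, and the offset can change in one place without touching code that works in terms of `JikesObj`.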
wks added a commit to mmtk/mmtk-jikesrvm that referenced this issue Sep 3, 2024
This PR changes the definition of MMTk-level `ObjectReference` for the
JikesRVM binding so that it now points to the JavaHeader, and is
different from the JikesRVM-level `ObjectReference` (a.k.a. `JikesObj`).
This will guarantee that the MMTk-level ObjectReference is always inside
an object.

Note that this PR does not involve a change in mmtk-core. It changes
`ObjectModel::IN_OBJECT_ADDRESS_OFFSET` to 0 so that the "in-object
address" is identical to the raw address of `ObjectReference`. It
demonstrates the JikesRVM binding can work well with MMTk-level
`ObjectReference` being different from JikesRVM-level `ObjectReference`.

Related issues and PRs:

- This PR is based on #177
- This PR is the 2nd step of
#178
- It will ultimately allow mmtk/mmtk-core#1170
to be implemented.
github-merge-queue bot pushed a commit that referenced this issue Sep 6, 2024
Require the raw address of `ObjectReference` to be within the address
range of the object it refers to. The raw address is now used directly
for side metadata access and SFT dispatching. This makes "in-object
address" unnecessary, and we removed the concept of "in-object address"
and related constants and methods.

Methods which use the "in-object address" for SFT dispatching or
side-metadata access used to have a `<VM: VMBinding>` type parameter.
This PR removes that type parameter.

Because `ObjectReference` is now both within an object and word-aligned,
the algorithm for searching for VO bits from internal pointers is
slightly simplified. The method `is_mmtk_object` now has undefined
behavior for arguments that are zero or misaligned because they are
obviously illegal addresses for `ObjectReference`, and the user should
have filtered them out in the first place.

Fixes: #1170
wks closed this as completed in #1195 Sep 6, 2024