Require ObjectReference to be inside an object #1170
wks added the C-enhancement (Category: Enhancement), G-safety (Goal: Safety), and A-interface (Area: Interface/API) labels on Jul 19, 2024
This was referenced Aug 28, 2024
wks added a commit to mmtk/mmtk-jikesrvm that referenced this issue on Sep 3, 2024
The main purpose of this PR is to make a clear distinction between the `ObjectReference` type in JikesRVM and the `ObjectReference` type in mmtk-core.

This PR introduces `JikesObj`, a Rust type that represents the JikesRVM-level `ObjectReference`. It needs an explicit conversion to convert to/from the MMTk-level `ObjectReference` type.

The interface between mmtk-core and the mmtk-jikesrvm binding is refactored to do fewer things with the MMTk-level `ObjectReference`.

- Trait methods that pass `ObjectReference` to the binding, notably the methods in `ObjectModel`, now simply convert the MMTk-level `ObjectReference` to `JikesObj`, and then call methods of `JikesObj`.
- Concrete methods for accessing object headers, fields, and layout information are now implemented by `JikesObj` (and other wrapper types including `TIB` and `RVMType`).
- The `JikesRVMSlot` trait now does the conversion between `JikesObj` and the MMTk-level `ObjectReference` when loading or storing a slot.

This allows us to change the definition of the MMTk-level `ObjectReference` in the future, while concrete methods of `JikesObj` still use offset constants relative to the JikesRVM-level `ObjectReference`, which will not change.

The interface between the Rust part and the Java part of the binding is refactored to involve `JikesObj` only.

- API functions in `api.rs` accept `JikesObj` parameters from JikesRVM and return `JikesObj` to JikesRVM where JikesRVM uses the JikesRVM-level `ObjectReference`.
- We wrap all JTOC calls into strongly-typed Rust functions, and make the weakly-typed `jtoc_call!` macro private to the wrappers.

In this way, we ensure none of the API functions or JTOC calls leak the MMTk-level `ObjectReference` values to JikesRVM, or accidentally interpret a JikesRVM-level `ObjectReference` as an MMTk-level `ObjectReference`.

We also do some obvious refactoring that makes the code more readable:

- Encapsulated many field-loading statements in the form of `(addr + XXXX_OFFSET)::load<T>()` into dedicated methods.
- Encapsulated the code for determining the overhead of hash fields into a function `JikesObj::hashcode_overhead` and simplified many methods that depend on that.
- Renamed "edge" to "slot" in `RustScanThread.java`.

And obvious bug fixes:

- The call to `DO_REFERENCE_PROCESSING_HELPER_SCAN_METHOD_OFFSET` used to erroneously interpret 0 as `true`. This has been fixed by relying on the conversion trait.
- `scan_boot_image_sanity` used to declare an immutable array and let unsafe `jtoc_call!` code modify it. The array is now defined as mutable.

Related issues and PRs:

- This PR is the 1st step of #178
- It will ultimately allow mmtk/mmtk-core#1170 to be implemented.
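To make the idea of an explicit conversion between the two reference types concrete, here is a minimal Rust sketch. It is not the binding's actual code. The `Address` type, the method names (`from_address`, `to_objref`, `to_raw_address`, `as_usize`), and the constant `HEADER_DISTANCE` with its placeholder value of 8 bytes are all assumptions made for illustration; only the names `JikesObj` and `ObjectReference` come from the commit message.

```rust
/// Hypothetical stand-in for a raw machine address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(pub usize);

impl Address {
    pub fn as_usize(self) -> usize {
        self.0
    }
}

/// Assumed distance in bytes between the two kinds of reference.
/// The value 8 is a placeholder, not a real JikesRVM constant.
const HEADER_DISTANCE: usize = 8;

/// JikesRVM-level reference: the value the VM itself stores and passes around.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct JikesObj(Address);

/// MMTk-level reference: in this sketch it is never null.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(Address);

impl ObjectReference {
    pub fn to_raw_address(self) -> Address {
        self.0
    }
}

impl JikesObj {
    pub fn from_address(addr: Address) -> Self {
        JikesObj(addr)
    }

    /// Explicit conversion to the MMTk-level reference.
    /// Null (and any address too small to hold the header) maps to `None`.
    pub fn to_objref(self) -> Option<ObjectReference> {
        self.0
            .as_usize()
            .checked_sub(HEADER_DISTANCE)
            .filter(|&raw| raw != 0)
            .map(|raw| ObjectReference(Address(raw)))
    }
}

impl From<ObjectReference> for JikesObj {
    /// Explicit conversion back to the JikesRVM-level reference.
    fn from(objref: ObjectReference) -> Self {
        JikesObj(Address(objref.0.as_usize() + HEADER_DISTANCE))
    }
}
```

Because the two types are distinct, a function that expects one cannot silently receive the other; every crossing between the VM side and the MMTk side has to go through `to_objref` or `JikesObj::from`, which is the only place the assumed distance appears.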
wks added a commit to mmtk/mmtk-jikesrvm that referenced this issue on Sep 3, 2024
This PR changes the definition of the MMTk-level `ObjectReference` for the JikesRVM binding so that it now points to the JavaHeader, and is different from the JikesRVM-level `ObjectReference` (a.k.a. `JikesObj`). This will guarantee that the MMTk-level `ObjectReference` is always inside an object.

Note that this PR does not involve a change in mmtk-core. It changes `ObjectModel::IN_OBJECT_ADDRESS_OFFSET` to 0 so that the "in-object address" is identical to the raw address of `ObjectReference`. It demonstrates that the JikesRVM binding can work well with the MMTk-level `ObjectReference` being different from the JikesRVM-level `ObjectReference`.

Related issues and PRs:

- This PR is based on #177
- This PR is the 2nd step of #178
- It will ultimately allow mmtk/mmtk-core#1170 to be implemented.
github-merge-queue bot pushed a commit that referenced this issue on Sep 6, 2024
Require the raw address of `ObjectReference` to be within the address range of the object it refers to. The raw address is now used directly for side metadata access and SFT dispatching. This makes "in-object address" unnecessary, and we removed the concept of "in-object address" and related constants and methods.

Methods which use the "in-object address" for SFT dispatching or side-metadata access used to have a `<VM: VMBinding>` type parameter. This PR removes that type parameter.

Because `ObjectReference` is now both within an object and word-aligned, the algorithm for searching for VO bits from internal pointers is slightly simplified.

The method `is_mmtk_object` now has undefined behavior for arguments that are zero or misaligned because they are obviously illegal addresses for `ObjectReference`, and the user should have filtered them out in the first place.

Fixes: #1170
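The simplification of the internal-pointer search can be illustrated with a toy model. The sketch below is not mmtk-core's implementation: `ToyVoBits`, its methods, and the use of a `HashMap` in place of real side metadata are assumptions for illustration. It relies only on the two properties stated in the commit message: a VO bit sits exactly at the raw address of an `ObjectReference`, and that address is word-aligned.

```rust
use std::collections::HashMap;

const WORD: usize = std::mem::size_of::<usize>();

/// Toy stand-in for VO-bit side metadata (hypothetical).
/// Key: raw address of a valid object reference.
/// Value: number of bytes from that address to the end of the object.
pub struct ToyVoBits {
    objects: HashMap<usize, usize>,
}

impl ToyVoBits {
    pub fn new() -> Self {
        ToyVoBits { objects: HashMap::new() }
    }

    /// Set the "VO bit" exactly at the reference's raw address.
    pub fn set(&mut self, objref: usize, bytes_to_end: usize) {
        assert!(objref != 0 && objref % WORD == 0, "reference must be non-zero and word-aligned");
        self.objects.insert(objref, bytes_to_end);
    }

    /// Walk backwards one word at a time from `ptr` until a VO bit is found.
    /// Since references are word-aligned and inside their objects, the first
    /// hit is an exact reference, not a range of possible addresses.
    pub fn find_object_from_internal_pointer(
        &self,
        ptr: usize,
        max_search_bytes: usize,
    ) -> Option<usize> {
        let start = ptr & !(WORD - 1); // align down to a word boundary
        let mut cur = start;
        loop {
            if let Some(&bytes_to_end) = self.objects.get(&cur) {
                // The pointer must still lie within the object that was found.
                return if ptr < cur + bytes_to_end { Some(cur) } else { None };
            }
            if cur < WORD || start - cur >= max_search_bytes {
                return None;
            }
            cur -= WORD;
        }
    }
}
```

In this model the address where the bit is found is itself the reference, so no offset or alignment inference is needed after the search.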
Status quo
Currently, MMTk defines several addresses of an object.
memory_manager::alloc
The definition of `ObjectReference` is VM-specific. We currently allow `ObjectReference` to be outside an object because some VMs do so. For example, in JikesRVM, an `ObjectReference` is defined as the address of the array payload of an object if the object is an array. That saves one offset computation for array element access, but when accessing scalar object fields or object headers, the VM has to use a negative offset from the `ObjectReference`. When we ported MMTk from JikesRVM to Rust, we inherited this type. `ObjectReference` is now the standard way for mmtk-core to refer to an object. We still allow `ObjectReference` to be outside an object so that when loading from a field in JikesRVM, we directly use the word stored in the field as `ObjectReference`.

However, because we only map side metadata memory for pages within spaces, addresses outside any space (or in unmapped pages) may not have mapped metadata. The same is true for SFT entries, which are allocated by chunk. If we attempt to access metadata or the SFT using an address outside the object, it will cause a segmentation fault. To solve this problem, we require the VM binding to implement `ObjectModel::ref_to_address`, which computes the "in-object address" of an object, which must be inside the object. (#699)

Meanwhile, VMs that use conservative stack scanning need to read a word from the stack, compute the "in-object address" from it, and see if the VO bit is set at the "in-object address". Because we don't know if a word on the stack is an actual `ObjectReference` or not, the offset from the `ObjectReference` to the "in-object address" must be a constant (i.e. it can be computed without reading any data from the object body). (Also in #699)

Meanwhile, not all VMs can use "the word stored in the field" as `ObjectReference`. In some VMs, the thing in a field may be a compressed pointer (OpenJDK), a tagged pointer (V8), an offsetted pointer (Julia), or an indirect handle (Guile or some old versions of the HotSpot JVM). We solve this problem by letting the VM binding implement the `Slot` trait and customize the `load` and `store` methods so that we always present a word-sized pointer-based `ObjectReference` to mmtk-core. (#606)

Then we implemented an algorithm for finding the last VO bit from an interior pointer. If neither the `ObjectReference` nor the "in-object address" is required to be word-aligned, the algorithm will not be able to return an exact `ObjectReference`, but only an address range in which one of the addresses is a valid `ObjectReference`. That's confusing and inefficient. Now we require that `ObjectReference` must be word-aligned, while the "in-object address" has no alignment requirements. This makes `ObjectReference` more likely not to be what's held in an object field because the VM may use the low bits as tags (V8), making the value misaligned. But this is not a problem because the VM binding can fix the alignment in `Slot::load` and `Slot::store`. (#1159)

In conclusion, an `ObjectReference` as required by the current mmtk-core
`ObjectReference` from interior pointer, and

p.s. See #1044 for the discussion about VMs that store handles instead of object addresses in fields.
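As an illustration of how a binding might fix alignment in `Slot::load` and `Slot::store` when the VM keeps tag bits in the low bits of a field, here is a minimal sketch. It is self-contained and does not use the real mmtk-core crate: the `ObjectReference` and `Slot` definitions below are simplified local stand-ins, and `TaggedSlot`, `TAG_MASK`, and the choice to treat every sub-word bit as a tag are assumptions made for the example.

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Assumption: every bit below word alignment may carry a VM tag.
const TAG_MASK: usize = std::mem::size_of::<usize>() - 1;

/// Simplified local stand-in: a non-null, word-aligned address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    pub fn from_raw_address(addr: usize) -> Option<Self> {
        debug_assert!(addr & TAG_MASK == 0, "reference must be word-aligned");
        NonZeroUsize::new(addr).map(ObjectReference)
    }

    pub fn to_raw_address(self) -> usize {
        self.0.get()
    }
}

/// Simplified local stand-in for a slot abstraction.
pub trait Slot {
    fn load(&self) -> Option<ObjectReference>;
    fn store(&self, object: ObjectReference);
}

/// A hypothetical field whose word holds an address plus low tag bits.
pub struct TaggedSlot<'a> {
    word: &'a Cell<usize>,
}

impl<'a> TaggedSlot<'a> {
    pub fn new(word: &'a Cell<usize>) -> Self {
        TaggedSlot { word }
    }
}

impl Slot for TaggedSlot<'_> {
    /// Strip the tag so the collector only ever sees an aligned reference.
    fn load(&self) -> Option<ObjectReference> {
        ObjectReference::from_raw_address(self.word.get() & !TAG_MASK)
    }

    /// Write the new address back while preserving the VM's tag bits.
    fn store(&self, object: ObjectReference) {
        let tag = self.word.get() & TAG_MASK;
        self.word.set(object.to_raw_address() | tag);
    }
}
```

The same two methods are where a binding would apply a constant offset, decompress a pointer, or follow a handle; the tag case is shown only because it is the simplest to write down.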
The problem
mmtk-core doesn't use the raw address of `ObjectReference` except for debugging purposes. Almost all operations are done w.r.t. the "in-object address", including `trace_object`, `is_reachable` (via SFT), marking, checking the VO bit (via side metadata), checking if an object is within a chunk/block, etc.

Meanwhile, `ObjectReference` is not always what's in a field, either. It is something defined by the VM binding and passed around in mmtk-core, but it has no useful properties except being at a constant offset from an "in-object address". The only reason for a VM binding to use an address outside an object as `ObjectReference` is "it is what's in a field, and we don't want to waste one subtraction for every field load". But that reason may not hold either, because if we don't do the subtraction when loading, we need one subtraction at every subsequent `ObjectReference::to_address()`.

Proposal: Require ObjectReference to be inside an object
We can add one more requirement in addition to the alignment requirement: `ObjectReference` must be an address inside an object.

That merges the "in-object address" and `ObjectReference`.

The benefits are obvious:

- `ObjectReference` to access SFT and side metadata since it's guaranteed to be inside an object.
- `ObjectReference`. There is no confusion about the offset or alignment.
- `ObjectModel` and `ObjectReference`. The API will be much simpler.
- `ObjectReference::to_address`.

Concretely, we remove `ObjectReference::to_address`, keeping the `to_raw_address`, `to_header` and `to_object_start` methods. When accessing SFT or side metadata, we simply use `ObjectReference::to_raw_address` because it will be guaranteed to be inside the object.

We remove the constant `IN_OBJECT_ADDRESS_OFFSET` and the methods `ObjectReference::to_address` and `ObjectReference::from_address`. Note that `IN_OBJECT_ADDRESS_OFFSET` is not required to be a multiple of the word size. Currently, when we set a VO bit from an `ObjectReference`, we may be setting the VO bit at an unaligned address, and we need to use the alignment requirement of `ObjectReference` to infer the only possible raw address of the `ObjectReference` given a VO bit. After removing `IN_OBJECT_ADDRESS_OFFSET`, we set the VO bit exactly at `ObjectReference::to_raw_address`. It will be both inside the object and aligned. There will be no need to mess with the alignment requirements. If the VO bit is set at address `X`, then `ObjectReference::from_raw_address_unchecked(X)` will be guaranteed to be a valid `ObjectReference`.

Potential risks
Performance
By unifying `ObjectReference` and "in-object address", mmtk-core will no longer need to call `ObjectReference::to_address` when there is an offset between the raw address and the "in-object address". This should potentially improve performance. However, we then require one subtraction at every `Slot::load` and one addition at every `Slot::store`. In this sense, we merely move the overhead from `to_address` to `load` and `store`. We need a performance evaluation to see whether the cost increases or decreases after this change. Currently the only VM binding that has a different `ObjectReference` and "in-object address" is JikesRVM. We'll need some test results from JikesRVM.

Engineering

By unifying `ObjectReference` and "in-object address", mmtk-core will have an easier time mapping a VO bit to its corresponding `ObjectReference`. But if the VM-level reference value is a pointer outside the object, and such a value can be held on the stack, the conservative stack scanner implemented by the VM will have to compute the "candidate `ObjectReference`" by subtracting an offset from the value on the stack before passing the "candidate" to `memory_manager::is_mmtk_object`. That means, if the VM binding doesn't implement the subtraction in `ObjectModel::ref_to_address`, it must implement it in the conservative stack scanner. That also shifts the complexity from one place to another. Fortunately, JikesRVM doesn't use conservative stack scanning. If V8 uses conservative stack scanning, it will always have to mask the stack word for alignment due to #1159, regardless of this change.
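The candidate computation that would move into a binding's conservative stack scanner can be sketched as follows. This is a hedged illustration, not code from any binding: `VM_REF_OFFSET` (and its value of 16 bytes), `candidate_from_stack_word`, and `scan_stack_conservatively` are hypothetical names, and the `is_object` closure stands in for a validity check such as `memory_manager::is_mmtk_object` rather than calling the real function.

```rust
const WORD: usize = std::mem::size_of::<usize>();

/// Assumption for illustration: the VM-level reference points this many
/// bytes past the MMTk-level reference. The value 16 is a placeholder.
const VM_REF_OFFSET: usize = 16;

/// Turn a raw stack word into a candidate MMTk-level reference address.
/// Zero and misaligned words are filtered out here, before any validity
/// check is consulted.
pub fn candidate_from_stack_word(word: usize) -> Option<usize> {
    if word == 0 || word % WORD != 0 {
        return None;
    }
    word.checked_sub(VM_REF_OFFSET).filter(|&candidate| candidate != 0)
}

/// Scan a slice of stack words and keep the candidates that the supplied
/// check accepts as real objects.
pub fn scan_stack_conservatively(
    stack: &[usize],
    is_object: impl Fn(usize) -> bool,
) -> Vec<usize> {
    stack
        .iter()
        .filter_map(|&word| candidate_from_stack_word(word))
        .filter(|&candidate| is_object(candidate))
        .collect()
}
```

A VM that tags stack words would mask the tag bits instead of rejecting misaligned words; either way the adjustment happens in the scanner, before the candidate reaches the validity check.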