[cdac] Physical contract descriptor spec (dotnet#100365)
Building on dotnet#100253, describe an in-memory representation of the toplevel contract descriptor, comprised of:
* some target architecture properties
* a data descriptor
* a collection of compatible contracts

Contributes to dotnet#99298 
Fixes dotnet#99299

---


* [cdac] Physical contract descriptor spec

* Add "contracts" to the data descriptor

*  one runtime per module

   if there are multiple hosted runtimes, diagnostic tooling should look in each loaded module to discover the contract descriptor

* Apply suggestions from code review

* Review feedback

   - put the aux data and descriptor sizes closer to the pointers

   - Don't include the trailing nul in `descriptor_size`.  Clarify that it counts bytes and that `descriptor` is in UTF-8

   - Simplify `DotNetRuntimeContractDescriptor` naming discussion

---------

Co-authored-by: Elinor Fung <elfung@microsoft.com>
2 people authored and Ruihan-Yin committed May 30, 2024
1 parent 36a90a6 commit 62570e8
Showing 3 changed files with 107 additions and 4 deletions.
100 changes: 100 additions & 0 deletions docs/design/datacontracts/contract-descriptor.md
@@ -0,0 +1,100 @@
# Contract Descriptor

## Summary

The [data contracts design](./datacontracts_design.md) is a mechanism that allows diagnostic tooling
to understand the behavior of certain .NET runtime subsystems and data structures. In a typical
scenario, a diagnostic tool such as a debugger may have access to a target .NET process (or a memory
dump of such a process) from which it may request to read and write certain regions of memory.

This document describes a mechanism by which a diagnostic tool may acquire the following information:
* some details about the target process' architecture
* a collection of types and their sizes and/or the offsets of certain fields within each type
* a collection of global values
* a collection of *algorithmic contracts* that are satisfied by the target process

## Contract descriptor

The contract descriptor consists of the following structure. All multi-byte values are in target architecture endianness.

```c
struct DotNetRuntimeContractDescriptor
{
    uint64_t magic;
    uint32_t flags;
    uint32_t descriptor_size;
    const char *descriptor;
    uint32_t aux_data_count;
    uint32_t pad0;
    uintptr_t *aux_data;
};
```

The `magic` is `0x44_4e_43_43_44_41_43_00` ("DNCCDAC\0") stored using the target architecture
endianness. This is sufficient to discover the target architecture endianness by comparing the
value in memory to `0x44_4e_43_43_44_41_43_00` and to `0x00_43_41_44_43_43_4e_44`.
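The comparison described above can be sketched as follows. This is a minimal illustration, not part of the spec: the names `target_endianness` and `detect_target_endianness` are hypothetical, and the sketch compares the raw bytes read from the target rather than a host integer so that the result does not depend on the tool's own byte order.

```c
#include <string.h>

/* Hypothetical result type; the spec defines only the magic value itself. */
typedef enum {
    TARGET_ENDIAN_UNKNOWN = 0, /* the bytes are not the contract magic */
    TARGET_LITTLE_ENDIAN,
    TARGET_BIG_ENDIAN
} target_endianness;

/* Classify the first 8 bytes of the descriptor as read from target memory. */
target_endianness detect_target_endianness(const unsigned char magic_bytes[8])
{
    /* 0x444e434344414300 stored most significant byte first */
    static const unsigned char big[8] =
        { 0x44, 0x4e, 0x43, 0x43, 0x44, 0x41, 0x43, 0x00 };
    /* the same value stored least significant byte first */
    static const unsigned char little[8] =
        { 0x00, 0x43, 0x41, 0x44, 0x43, 0x43, 0x4e, 0x44 };

    if (memcmp(magic_bytes, big, sizeof big) == 0)
        return TARGET_BIG_ENDIAN;
    if (memcmp(magic_bytes, little, sizeof little) == 0)
        return TARGET_LITTLE_ENDIAN;
    return TARGET_ENDIAN_UNKNOWN;
}
```

A tool could treat `TARGET_ENDIAN_UNKNOWN` as a sign that the address does not point at a contract descriptor.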

The following `flags` bits are defined:

| Bits 31-2 | Bit 1 | Bit 0 |
| --------- | ------- | ----- |
| Reserved | ptrSize | 1 |

If `ptrSize` is 0, the architecture is 64-bit. If it is 1, the architecture is 32-bit. The
reserved bits should be written as zero. Diagnostic tooling may ignore non-zero reserved bits.
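As a rough sketch of how a tool might decode `flags`, assuming the value has already been converted to the tool's byte order: the macro and function names below are hypothetical, and rejecting a value whose bit 0 is clear is a choice made by this sketch rather than a rule stated by the spec.

```c
#include <stdint.h>

/* Hypothetical names for the two defined bits. */
#define CONTRACT_FLAG_ALWAYS_SET UINT32_C(0x1) /* bit 0: always 1 */
#define CONTRACT_FLAG_PTR_SIZE   UINT32_C(0x2) /* bit 1: ptrSize */

/* Return the target pointer size in bytes (8 or 4), or 0 when bit 0 is
   not set. Reserved bits 31-2 are ignored, as the spec permits. */
unsigned pointer_size_from_flags(uint32_t flags)
{
    if ((flags & CONTRACT_FLAG_ALWAYS_SET) == 0)
        return 0; /* sketch's choice: treat as "not a valid descriptor" */
    return (flags & CONTRACT_FLAG_PTR_SIZE) ? 4u : 8u;
}
```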

The `descriptor` is a pointer to a UTF-8 JSON string described in [data descriptor physical layout](./data_descriptor.md#Physical_JSON_descriptor). The total number of bytes is given by `descriptor_size`.

The auxiliary data for the JSON descriptor is stored at the location `aux_data` in `aux_data_count` pointer-sized slots.
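For an out-of-process tool, the header fields have to be decoded from raw bytes read out of the target. The sketch below shows one way to do that. It is illustrative only: `contract_header`, `read_uint`, and `parse_contract_header` are hypothetical names, and the field offsets assume the structure is laid out with natural alignment and no extra padding, which the spec does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side view of the header; hypothetical type, not part of the spec. */
typedef struct {
    uint32_t flags;
    uint32_t descriptor_size; /* bytes of UTF-8 JSON, no trailing NUL */
    uint64_t descriptor_addr; /* target address of the JSON text */
    uint32_t aux_data_count;  /* number of pointer-sized slots */
    uint64_t aux_data_addr;   /* target address of the first slot */
} contract_header;

/* Read a `size`-byte unsigned integer stored in the target's byte order. */
uint64_t read_uint(const unsigned char *p, size_t size, int big_endian)
{
    uint64_t v = 0;
    for (size_t i = 0; i < size; i++) {
        size_t idx = big_endian ? i : size - 1 - i; /* most significant first */
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Decode the raw header. Returns 0 on success, -1 on a bad pointer size or
   a buffer that is too short. Assumed offsets: magic 0, flags 8,
   descriptor_size 12, descriptor 16, then aux_data_count, pad0, aux_data. */
int parse_contract_header(const unsigned char *raw, size_t len,
                          size_t ptr_size, int big_endian,
                          contract_header *out)
{
    if (ptr_size != 4 && ptr_size != 8)
        return -1;
    size_t count_off = 16 + ptr_size; /* aux_data_count follows descriptor */
    size_t aux_off = count_off + 8;   /* skip aux_data_count and pad0 */
    if (len < aux_off + ptr_size)
        return -1;

    out->flags = (uint32_t)read_uint(raw + 8, 4, big_endian);
    out->descriptor_size = (uint32_t)read_uint(raw + 12, 4, big_endian);
    out->descriptor_addr = read_uint(raw + 16, ptr_size, big_endian);
    out->aux_data_count = (uint32_t)read_uint(raw + count_off, 4, big_endian);
    out->aux_data_addr = read_uint(raw + aux_off, ptr_size, big_endian);
    return 0;
}
```

With the header decoded, the tool would read `descriptor_size` bytes at `descriptor_addr` for the JSON text and `aux_data_count` pointer-sized slots at `aux_data_addr`.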

### Architecture properties

Although `DotNetRuntimeContractDescriptor` contains enough information to discover the target
architecture endianness and pointer size, it is expected that in all scenarios diagnostic tooling will
already have this information available through other channels. Diagnostic tools may use the
information derived from `DotNetRuntimeContractDescriptor` for validation.

### Compatible contracts

The `descriptor` is a JSON dictionary that is used for storing the [in-memory data descriptor](./data_descriptor.md#Physical_JSON_Descriptor)
and the [compatible contracts](./datacontracts_design.md#Compatible_Contract).

The compatible contracts are stored in the top-level key `"contracts"`. The value will be a
dictionary that contains each contract name as a key. Each value is the version of the contract as
a JSON integer constant.

**Contract example**:

``` jsonc
{"Thread":1,"GCHandle":1,...}
```
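Once the `"contracts"` dictionary has been parsed, a tool will typically look up the version the target advertises for each contract it knows how to use. A minimal sketch of that lookup, assuming the JSON has already been parsed into an array of name/version pairs by some other means (the `contract_entry` type and `find_contract_version` function are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* One parsed entry of the "contracts" dictionary; hypothetical type. */
typedef struct {
    const char *name; /* contract name, e.g. "Thread" */
    int version;      /* contract version, a JSON integer */
} contract_entry;

/* Return the version the target advertises for `name`, or -1 when the
   contract is not listed. */
int find_contract_version(const contract_entry *entries, size_t count,
                          const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].name, name) == 0)
            return entries[i].version;
    }
    return -1;
}
```

A tool that only implements version 1 of a contract could compare the returned value against 1 and fall back gracefully on a mismatch.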

**Complete in-memory data descriptor example**:

``` jsonc
{
  "version": "0",
  "baseline": "example-64",
  "types":
  {
    "Thread": { "ThreadId": 32, "ThreadState": 0, "Next": 128 },
    "ThreadStore": { "ThreadCount": 32, "ThreadList": 8 }
  },
  "globals":
  {
    "FEATURE_COMINTEROP": 0,
    "s_pThreadStore": [ 0 ] // indirect from aux data offset 0
  },
  "contracts": {"Thread": 1, "GCHandle": 1, "ThreadStore": 1}
}
```
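In the example above, `FEATURE_COMINTEROP` carries its value directly, while `s_pThreadStore` is written as `[ 0 ]` and refers to slot 0 of the auxiliary data. The following sketch shows how a tool might resolve either form. It assumes the globals have already been parsed from the JSON and the aux data slots have already been read from the target and widened to 64 bits; `global_entry` and `resolve_global` are hypothetical names, and taking the slot's content as the global's value is this sketch's reading of the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A global as parsed from the JSON; hypothetical type. */
typedef struct {
    bool indirect;  /* true when the JSON value was written as [ N ] */
    uint64_t value; /* the literal value, or the aux data slot index N */
} global_entry;

/* Resolve a global to its value. Returns false when an indirect entry
   refers to a slot outside the aux data array. */
bool resolve_global(const global_entry *g, const uint64_t *aux_data,
                    size_t aux_data_count, uint64_t *out)
{
    if (!g->indirect) {
        *out = g->value; /* direct: the JSON already holds the value */
        return true;
    }
    if (g->value >= aux_data_count)
        return false;    /* slot index out of range */
    *out = aux_data[g->value];
    return true;
}
```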

## Contract symbol

To aid in discovery, the contract descriptor should be exported by the module hosting the .NET
runtime with the name `DotNetRuntimeContractDescriptor` using the C symbol naming conventions of the
target platform.

In scenarios where multiple .NET runtimes may be present in a single process, diagnostic tooling
should look for the symbol in each loaded module to discover all the runtimes.

9 changes: 6 additions & 3 deletions docs/design/datacontracts/data_descriptor.md
@@ -130,6 +130,10 @@ The toplevel dictionary will contain:
* `"types": TYPES_DESCRIPTOR` see below
* `"globals": GLOBALS_DESCRIPTOR` see below

+Additional toplevel keys may be present. For example, the in-memory data descriptor will contain a
+`"contracts"` key (see [contract descriptor](./contract_descriptor.md#Compatible_contracts)) for the
+set of compatible contracts.

### Baseline data descriptor identifier

The in-memory descriptor may contain an optional string identifying a well-known baseline
@@ -243,9 +247,8 @@ Rationale: This allows tooling to generate the in-memory data descriptor as a si
string. For pointers, the address can be stored at a known offset in an in-proc
array of pointers and the offset written into the constant JSON string.

-The indirection array is not part of the data descriptor spec. It is expected that the data
-contract descriptor will include it. (The data contract descriptor must contain: the data
-descriptor, the set of compatible algorithmic contracts, the aux array of globals).
+The indirection array is not part of the data descriptor spec. It is part of the [contract
+descriptor](./contract_descriptor.md#Contract_descriptor).



2 changes: 1 addition & 1 deletion docs/design/datacontracts/datacontracts_design.md
@@ -12,7 +12,7 @@ Diagnostic data contract addressed these challenges by eliminating the need for
Data contracts represent the manner in which a tool which is not the runtime can reliably understand and observe the behavior of the runtime. Contracts are defined by their documentation, and the runtime describes what contracts are applicable to understanding that runtime.

## Data Contract Descriptor
-The physical layout of this data is not defined in this document, but its practical effects are.
+The physical layout of this data is defined in [the contract descriptor](./contract_descriptor.md) doc. Its practical effects are discussed here.

The Data Contract Descriptor has a set of records of the following forms.
