
Initial spec for the meaning and relationships between data contracts #99936

Merged
merged 7 commits on Mar 23, 2024

Conversation

davidwrighton
Member

No description provided.

Contributor

Tagging subscribers to this area: @dotnet/area-meta
See info in area-owners.md if you want to be subscribed.


Data contracts represent the manner in which a tool which is not the runtime can reliably understand and observe the behavior of the runtime. Contracts are defined by their documentation, and the runtime describes what contracts are applicable to understanding that runtime.

## Data Contract Stream
Member


As mentioned offline, I find it confusing to call it a stream. I would call it "Contract Descriptor" or something like that.

Ideally, the contract descriptor would be physically represented as a constant memory block.

Data contracts represent the manner in which a tool which is not the runtime can reliably understand and observe the behavior of the runtime. Contracts are defined by their documentation, and the runtime describes what contracts are applicable to understanding that runtime.

## Data Contract Stream
The physical layout of this stream is not defined in this document, but its practical effects are.
Member

@MichalStrehovsky Mar 19, 2024


When we design the physical layout, it would be good to design them in a way that native AOT can statically lay this out into the executable at compile/link time and we don't need to build this at startup.

My specific area of concern is various addresses that are known when building the C/C++ code in Runtime.lib, and various offsets/addresses that are known when building native code in ILC. Ideally we'd design the contracts in a way that it's possible to consume the contracts like this (either through link.exe/ld merging various parts of the contract into a single data blob, or through the contract design that allows various parts to come from non-contiguous blocks of memory, if this is not possible to merge during linking).

AFAIK today the information that comes from ILC is not part of the contract (various offsets of things related to async) but SOS reads it from the PDB, making assumptions about the name mangling the compiler is doing. I assume those are moving to contracts.

Member


This is a prototype of contracts I made back in 2016. It's basically just a header structure that has pointers to other global structures. The data in all these structures should be known or computable at compile time, so perhaps something close to that?

Member


Yep, something like that. But with specific attention to the detail that some things are only known in the obj files that comprise the runtime (e.g. heap ranges), but other things are known in obj files that come from ILC (e.g. offset of various fields on Task or async state machines, or whatever managed types we care about). The question is whether we could design the physical layout in a way that the data can come from two object files that are linked together with a platform linker and be usable as-is (ideally without any fixing up during startup).


The data contract stream has a set of records of the following forms.

### Global Values
Member


How do global values relate to contracts? I'd propose every global value is owned by a contract which would provide a natural way to document them, group them, and version them.

Each compatible contract is described by a unicode string naming the contract, and a uint32 version. It is an ERROR if multiple versions of a contract are specified in the stream.

### Data Structure Layout
Each data structure layout has a unicode name for the type, followed by a list of fields. These fields can be of primitive types (int8, uint8, int16, uint16, int32, uint32, int64, uint64, pointer) or of another named data structure type. Each field descriptor provides the offset of the field, the name of the field, and the type of the field.
Member


How do data structure layouts relate to contracts? Similar to globals I'd propose every piece of layout information is owned by some contract which provides the documentation, versioning, and grouping.

Each data structure layout has a unicode name for the type, followed by a list of fields. These fields can be of primitive types (int8, uint8, int16, uint16, int32, uint32, int64, uint64, pointer) or of another named data structure type. Each field descriptor provides the offset of the field, the name of the field, and the type of the field.

## Versioning of contracts
Contracts are described by an integer version number. A higher version number is not more recent, it just means different. In order to avoid conflicts, all contracts should be documented in the main branch of the dotnet repository with a version number which does not conflict with any other. It is expected that every version of every contract describes the same functionality/data layout/set of global values.
Member


I suspect we are going to want some notion of breaking and non-breaking changes. I think major/minor is a simple way to do that.

Member


Can we get a concrete example of how this would work? I don't see the need for a minor version number.

Member


I partly agree. I think it's important that a data contract reader is able to reliably tell when it is talking to a runtime that is newer than itself and has a reasonable behavior (either detect that it is out of date or fall back to some compatible behavior).

Mono debugger-libs takes the "compatible behavior" route. the protocol is major/minor versioned and each operation does an "target version is at least X.Y" check when it needs to decide whether to use a newer or older version of an algorithm (or if it needs to detect that the wire data has added fields).


one approach might be to say that the data contract reader knows every contract+version that it is expected to support. if the descriptor includes a contract+version that isn't supported, the reader should abort.

additionally each contract spec should include a list of "subsumed" versions. So contract MethodTable,5 can say it subsumes versions 4,3,1. Whereas MethodTable,2 is unrelated to 5 (either because it is a breaking change or because it's for a different runtime). So a reader implementation for MethodTable,5 can also understand runtimes that expose MethodTable,4 or MethodTable,3. Whereas presumably MethodTable,2 needs an entirely separate implementation.
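A minimal sketch of how a reader could apply that rule, assuming hypothetical type and contract names — each handler lists the versions it subsumes, and anything unrecognized aborts:

``` csharp
// Hypothetical sketch: reader-side registry keyed by contract name.
// A handler declares its own version plus the older versions it subsumes.
using System;
using System.Collections.Generic;

record ContractHandler(string Name, uint Version, uint[] Subsumes)
{
    public bool CanRead(uint targetVersion) =>
        targetVersion == Version || Array.IndexOf(Subsumes, targetVersion) >= 0;
}

class Reader
{
    readonly Dictionary<string, ContractHandler> _handlers = new()
    {
        ["MethodTable"] = new("MethodTable", 5, new uint[] { 4, 3, 1 }),
    };

    // Abort if the descriptor advertises any contract+version we cannot handle.
    public void Validate(IEnumerable<(string Name, uint Version)> descriptor)
    {
        foreach (var (name, version) in descriptor)
        {
            if (!_handlers.TryGetValue(name, out var handler) || !handler.CanRead(version))
                throw new NotSupportedException($"Unsupported contract {name},{version}");
        }
    }
}
```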

Member

@jkotas Mar 19, 2024


I expect that it is not going to be unusual to amend existing contracts in a non-breaking way. It is important to be able to do this in servicing, where you do not want to break existing consumers, but you want consumers aware of the amendment to be able to detect it and take advantage of it.

In the scheme with major versions only, you can achieve this by exposing the amendment of MyContract as a new MyContractEx.

In the scheme with major.minor versions, you can achieve this by incrementing the minor version of MyContract and include the amendment in MyContract.

Member


That is also achieved by optional fields and by the processing of the data understanding when data is optional vs. when it is required. This notion is captured in data being able to describe itself and the fields it provides. Contract version X indicates fields a and b. If more data is needed, then it can add field c. The processing only knows about a and b, and so the new optional field is a non-breaking change. When version X+1 is created, field c can become required or remain optional. I think how one writes the algorithm should dictate the breaking changes, not the data itself.

Member


For algorithmic contracts, you can achieve this by introducing a new key/value pair that indicates the newer version.

I'm still dubious of the independent versioning of algorithmic contracts. My biggest concern here is that this design creates an inherent maintenance issue. The issue being we have versioned data and versioned algorithms, so how do they map? If I update Data version X, do I need to update Algorithm version Y? Answering that question should not require a lot of system understanding. It should be defined by the algorithm in some explicit manner.

One could imagine that at reader start-up, or lazily, an audit of the data stream is performed - see the Fall prototype. Each algorithm is then asked what it supports and queries all data contracts provided by the target debuggee. If the algorithm sees versions of data contracts that satisfy it, then the algorithm can do its job; otherwise it can't. This reduces the cognitive load for how to version anything other than algorithms, as they are versioned based on the data they consume.
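A sketch of that audit idea (hypothetical types, not the spec's API) — algorithms declare the data contracts they consume, and at start-up the reader checks each algorithm against what the debuggee's descriptor actually provides:

``` csharp
// Hypothetical sketch of a start-up audit: algorithms are versioned by the
// data contracts they consume, not independently.
using System.Collections.Generic;
using System.Linq;

interface IAlgorithm
{
    string Name { get; }
    // (contract name, acceptable versions) pairs this algorithm requires.
    IReadOnlyList<(string Contract, uint[] Versions)> Requires { get; }
}

static class Audit
{
    // Returns the algorithms that are usable against this debuggee.
    public static List<IAlgorithm> Run(
        IReadOnlyDictionary<string, uint> debuggeeContracts,
        IEnumerable<IAlgorithm> algorithms)
    {
        return algorithms
            .Where(a => a.Requires.All(r =>
                debuggeeContracts.TryGetValue(r.Contract, out var v)
                && r.Versions.Contains(v)))
            .ToList();
    }
}
```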

Member


The issue being we have versioned data and versioned algorithms, so how do they map? If I update Data version X, do I need to update Algorithm version Y?

Data structure layouts do not have versions in the current proposals. You use Algorithm version Y with whatever data structure layout is provided.

In general, I agree that the data and the algorithms that interpret the data are frequently coupled together, and it is not obvious why they are treated as two different concepts. @noahfalk commented on it as well.

Member


That is also achieved by optional fields and by the processing of the data understanding when data is optional vs. when it is required.

I agree that you can do non-breaking changes via optional additive data elsewhere and if we want to define that this is the way all non-breaking changes will be introduced I'm fine with it. My high order bit was that the design should anticipate making non-breaking changes and describe a mechanism we will use to support it.

Member


Then I would codify the notion that algorithms indicate what is and isn't optional. This almost requires some "audit" concept so the reader can know a priori if some action is going to be successful.

Member Author


The detail I was getting at with separating layouts from the algorithmic contracts is that while some data layouts are implicit in the algorithms, many of them are just working with fields in some set of structures, and on various architectures/os/whatever, the actual layouts of fields in memory can vary a fair bit. Thus the algorithms can be written on top of a fields concept, and the data structure layout describes how to map those fields to actual memory.

Now, some algorithms also rigidly describe the associated data layouts. In those cases there would likely be no reason to have a data structure layout. For instance, the layout of the unwind data for x64 windows unwinding doesn't need a separate layout, the exact layout is well known and part of the algorithm.
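A sketch of that split, with hypothetical names — the algorithm reads named fields, and the platform-specific data structure layout contract supplies the actual offsets:

``` csharp
// Hypothetical sketch: an algorithm reads fields by name; the layout
// contract maps (type, field) to an offset for the current target.
using System.Collections.Generic;

interface ITargetMemory
{
    ulong ReadPointer(ulong address);
}

class TypeLayout
{
    readonly Dictionary<string, int> _offsets;
    public TypeLayout(Dictionary<string, int> offsets) => _offsets = offsets;
    public int OffsetOf(string field) => _offsets[field];
}

class ThreadAlgorithm
{
    readonly TypeLayout _thread;
    public ThreadAlgorithm(TypeLayout threadLayout) => _thread = threadLayout;

    // The algorithm never hardcodes offsets; a different platform just
    // supplies a different TypeLayout. "m_LinkNext" is a made-up field name.
    public ulong ReadNextThread(ITargetMemory memory, ulong threadAddr) =>
        memory.ReadPointer(threadAddr + (ulong)_thread.OffsetOf("m_LinkNext"));
}
```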

@@ -0,0 +1,430 @@
# Data Contracts for CoreCLR, and the data contract stream

Data contracts represent the manner in which a tool which is not the runtime can reliably understand and observe the behavior of the runtime. Contracts are defined by their documentation, and the runtime describes what contracts are applicable to understanding that runtime.
Member


The doc feels like it shifts back and forth between describing three different things and it wasn't always clear to me which one is being referred to:

  1. A mental model for how people should think about contracts
  2. Specific data about contracts that will be encoded in some form within binaries and process memory
  3. Recommendations for how data contracts will be documented in the runtime repo

I think you could either try to clarify which text is referring to which thing, or you could do one small example of adding a contract to the runtime + its associated documentation. I also have a past example which goes a good bit farther than what I am suggesting here. I'm thinking a few hundred lines total for code edit + docs and don't bother writing any of the code that would parse it.

All global values have a unicode string describing their name, and a value of one of the above types.

### Compatible Contract
Each compatible contract is described by a unicode string naming the contract, and a uint32 version. It is an ERROR if multiple versions of a contract are specified in the stream.
Member


When we say the physical encoding isn't specified (yet), I'm not sure how much latitude you are attempting to leave open for later discussion. For example if I proposed the encoding consists of an ordered sequence of contracts where the name of the contract is implied by its index in the sequence, would that fit within this definition?

(Assuming there aren't that many contracts I don't actually care whether the name is there in the data explicitly or not, but if the number of contracts goes up the details start mattering more)

Member


I'm not convinced we need human readable names in the binary runtime artifact. (In the source code and in the docs, sure). I think we should just have a blob of GUIDs where each "name,version" is a new GUID.

Member


I'm not convinced we need human readable names in the binary runtime artifact.

I think I disagree. The binary artifact should be self-contained and as such have human readable mappings. This can be optimized in some ways. See the Fall prototype for how an ordinal to string mapping was created and then ordinals used in all other places. Being able to intelligently debug this code is going to be paramount. I could see an argument for having a debugger extension with this mapping, but I would like to see the mechanics of that. It is something to explore.

Member


I think I disagree. The binary artifact should be self-contained and as such have human readable mappings. This can be optimized in some ways. See the Fall prototype for how an ordinal to string mapping was created and then ordinals used in all other places. Being able to intelligently debug this code is going to be paramount.

If you're debugging the data contract reader, you have a mapping from GUIDs to names in the reader. It might make sense for the reader to immediately convert the GUIDs to human-readable names as soon as it connects to a target process. That doesn't mean the physical representation should serialize human-readable names.

I could see an argument for having an debugger extension with this mapping, but I would like to see the mechanics of that. It is something to explore.

you wouldn't need any extension - by the time you're debugging the reader, you can work in terms of names.

Member


If you're debugging the data contract reader, you have a mapping from GUIDs to names in the reader.

The reader then becomes imbued with the totality of history instead of just working on the provided data. Note how the Fall prototype consumed all the human readable details from the current runtime and could ignore anything it didn't know about.

Member


The reader then becomes imbued with the totality of history instead of just working on the provided data.

yes, of course. My understanding was that the data contract reader proposal was that the reader is capable of working with any version going back to the first release of the data contract work.

Note how the Fall prototype consumed all the human readable details from the current runtime and could ignore anything it didn't know about.

I think that's fine for data but not for algorithms.

Member


I have a preference for human readable names as well. They make things easy to debug and diagnose. The GUIDs are not that much more compact compared to human readable names.

If we push it to the limit, we can even use JSON to represent most of the data contract descriptor. (The JSON would be embedded as a read-only data blob in the executable.)

Member

@noahfalk Mar 19, 2024


As long as we agree that we don't expect that many contracts then I don't think we need to be overly efficient with the encoding. Two potential encodings that I believe would be easy to work with that aren't that large:

1. Define an ordinal for each contract and put the name into the header structure definition as the field name. IMO this is the simplest approach that doesn't have much overhead unless we start creating and deprecating large numbers of contracts:

``` c
struct DotNetDebugHeader
{
    // cookie and flags fields omitted
    int CountOfContracts; // Increase this number when we append new contracts
    void* MethodTableContract;
    void* GCContract;
    void* ThreadContract;
    ...
};
```

2. If we expect it is frequent to deprecate or omit entire contracts then make an array of them and assign each one a modest fixed size name.

``` c
struct DotNetDebugHeader
{
    // cookie and flags fields omitted
    Contract* Contracts;
    int CountOfContracts;
    ...
};

struct Contract
{
    char Name[12];  // UTF8 null terminated friendly name
                    // Either we pick short names for contracts or we associate the occasional
                    // long name with a shorter abbreviation.
    int Version;
    // whatever other per-contract data goes here
};
```

### Global Value Contracts
Global value contracts shall be stored in `docs/design/datacontracts/globals/<contract_name>/<version>`.

The name of each contract shall be `docs/design/datacontracts/globals/<contract_name>/<contract_name>_<version>.md`.
Member


I think its going to be easier if we document related sets of globals, type layouts, and algorithms together rather than separating them across different files. The main advantage I could see for separate files is if we were doing some kind of machine generation or machine consumption but I don't think you are suggesting that?

Member Author


I actually would like to see machine generation/handling of the data structure stuff, so that the runtime-level source code only works with offsets, and yet the stream could output only the more optimized typed version of the data. As I said in another comment, I think this deserves a bit of experimentation to see how gnarly it is.

| Field Name | Type | Offset |
| --- | --- | --- |
| FirstField | Int32 | 0 |
| SecondField | Int8 | 4 |
Member


I think this is telling me that SecondField is always at offset 4, and in the future if it ever isn't at offset 4 then the contract version will change? I assume the doc says something else if the field offset is supposed to be read from memory somewhere?

Member Author


Yes. That is why you can have a data structure layout contract OR a full type description just embedded in the contract description.


Algorithmic contracts: these describe how a single contract works.

For these we will have 2 different documents. The first document shall describe the api to the contract, and the second shall describe the actual implementations.
Member


Can the signature of the implementation just imply the interface, so it all goes in one file?

Member Author


The general consensus appears to be that a single file is best. I'll update that tomorrow.

Member

@lambdageek left a comment


I like that this is relatively concrete.

I have a feeling we will want parametric contracts (after all: computation is composition+abstraction. we have composition contracts but not abstraction contracts). but I think that's a bridge we can cross once we have some practice.


I think it's fine to say "versions don't imply anything" but I suspect in practice we may want to document when particular versions of a contract represent an evolution and when they represent a completely different algorithm for the same concept.


I think in addition to (untyped) "pointer" we also need a "pointer sized integer" primitive type.

@@ -0,0 +1,430 @@
# Data Contracts for CoreCLR, and the data contract stream
Member


How many contracts (order of magnitude) do we anticipate existing? I'd propose it is in the range 5-20.

I would guess on the order of 500 per runtime per release (with perhaps 90% of them shared per platform).

But with composition contracts probably the actual descriptor would include around 20-50.


There are 3 different types of contracts, each representing a different phase of execution of the data contract system.

### Composition contracts
These contracts indicate the version numbers of other contracts. This is done to reduce the size of contract list needed in the runtime data stream. In general it is intended that as a runtime nears shipping, the product team can gather up all of the current versions of the contracts into a single magic value, which can be used to initialize most of the contract versions of the data contract system. A specific version number in the contract data stream for a given contract will override any composition contracts specified in the contract data stream. If there are multiple composition contracts in a stream which specify the same contract to have a different version, the first composition contract linearly in the runtime data stream wins. This is intended to allow for a composite contract for the architecture/os independent work, and a separate composite contract for the non-independent work. If a contract is specified explicitly in the Data Contract Stream and a different version is specified via the composition contract mechanism, the explicitly specified contract takes precedence.
Member


💯 I think it's important to have composition contracts as a component of the descriptor.

Member

@AaronRobinsonMSFT Mar 19, 2024


In general it is intended that as a runtime nears shipping, the product team can gather up all of the current versions of the contracts into a single magic value

This represents a non-trivial engineering cost. It will also create a great deal of confusion as the product evolves. Any version number reduction methodology is going to be very lossy and require a detailed process for when to reduce version numbers and when not to. I don't think this is something that will age well.

Member


I'm not sure I understand how we intend to determine what (other contracts) should go into each composition contract. Are they intended to be logical groupings?

Member

@noahfalk Mar 19, 2024


My preference right now is that there would be relatively few versioned contracts, each of which represents all the data structures and algorithms necessary to do a swath of related diagnostic functionality. For example contracts might be:

  • Enumerate all threads and detail information about each
  • Enumerate all loaded assemblies and provide details about each
  • Describe the state of the GC heap and enumerate all objects on it

Right now I don't see the value in finer grained contracts but I do see costs - such as needing the complexity of managing composition contracts. I'd prefer a model where composition contracts aren't needed.

Are they intended to be logical groupings?

I don't know if David envisioned the same thing, but yes I think of them as logical groupings.

Member Author


Logical groupings are what I see as sensible. My sense of taste wants to have more contracts rather than fewer, but there is a cost to that, which folks keep pointing out. My preference would be to do fine grained contracts and use the composition contracts to keep things sane, but I see the general agreement is that that is really complex to manage over time, so I think we can also just do fewer, bigger contracts.

These contracts represent data which is entirely determined by the contract version + contract name. There are 2 subtypes of this form of contract.

#### Global Value Contract
A global value contract specifies numbers which can be referred to by other contracts. If a global value is specified directly in the Contract Data Stream, then the global value definition in the Contract Data Stream takes precedence. The intention is that these global variable contracts represent magic numbers and values which are useful for the operation of algorithmic contracts. For instance, we will likely have a `TargetPointerSize` global value represented via a contract, and things like `FEATURE_SUPPORTS_COM` can also be a global value contract, with a value of 1.
Member


nit: I'm not sure TargetPointerSize is a great example. Or at least it's a special example because I think that is a contract whose information is probably derived from the descriptor header or even /a priori/ known to the data contract reader when it connects to a target - it's hard to do any truly portable parsing without knowing the pointer size.

Member Author


Yeah, the 2 cases I found for this are basically TargetPointerSize and feature compilation flags. Not sure if that's worth a whole mechanism, or if we could encode stuff in a bitfield somewhere.

Member


I think it may be a little arbitrary what gets defined as globals vs. what are defined as parameterized constants attached to contracts. Ignoring how we categorize them though, I'd also expect we may want to encode values specified in C++ constants and C++ enums. For example, there has been some shuffling of MethodTable flag values in recent versions, so we might want a way to encode those values in case there are any further changes in the meaning we assign to different bit positions.

Member


we may want to encode values specified in C++ constants and C++ enums.

Agreed. One thing that Mono does in a bunch of places is to conditionally define enums based on compile-time runtime feature flags. For example the mini IR has platform specific opcodes that are only defined on that platform. (we will not need to look at IR opcodes, but there are other enums with conditional definitions).

I think that sort of situation can be captured by global value contracts.
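A sketch of what that could look like, with illustrative (not actual) flag values — the build exports compile-time enum and feature values as named globals so the reader never hardcodes them:

``` csharp
// Hypothetical sketch: compile-time enum values and feature flags exported
// as globals so the reader never hardcodes them. The bit values below are
// illustrative only, not the real MethodTable flag bits.
enum MethodTableFlags : uint
{
    HasComponentSize = 0x80000000,
    ContainsPointers = 0x01000000,
}

static class ExportedGlobals
{
    // Conceptually these become (name, type, value) records in the
    // global value contract for this build.
    public static readonly (string Name, ulong Value)[] Globals =
    {
        ("MethodTableFlags_HasComponentSize", (uint)MethodTableFlags.HasComponentSize),
        ("MethodTableFlags_ContainsPointers", (uint)MethodTableFlags.ContainsPointers),
        ("FEATURE_SUPPORTS_COM", 1),
    };
}
```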

Comment on lines 85 to 88
``` csharp
interface IContract
{
    string Name { get; }
    uint Version { get; }
```
Member


as I mentioned above, I think this should also include `IReadOnlyList<uint> Subsumes { get; }` to list the older versions of a contract that a reader for a newer contract can understand.

Comment on lines 294 to 295
| FirstField | Int32 | 0 |
| SecondField | Int8 | 4 |
Member


I don't understand what the "offset" here refers to. The offsets are something we will discover when a reader begins to inspect a particular target. What possible offset value can we write down if the data structure has native pointers or native pointer-sized values?

Member Author


The thought was that different bitnesses would have different versions. My hope was that these data struct layout contracts would not be directly referenced in source code of the runtime ever, but chosen via some sort of macro/constexpr/something horror that was aware of the possible options. Perhaps I should prototype that, and see how bad of an idea it is.

Member


The thought was that different bitnesses would have different versions. My hope was that these data struct layout contracts would not be directly referenced in source code of the runtime ever, but chosen via some sort of macro/constexpr/something horror that was aware of the possible options. Perhaps I should prototype that, and see how bad of an idea it is.

I strongly feel that having multiple sources of truth is a bad idea. The structure offsets should be queried from the target runtime. They should not be stored as artifacts in the repo.

Moreover, I think this creates challenges for being able to support custom builds / community builds with a single reader - if they need to open a PR to main to check in a new version of a data structure layout contract, that means the community port isn't really usable with an extant reader. (Or possibly I don't understand how contract versioning will work in this case)

Member


I think defining default offsets in documentation and providing a standard mechanism for contracts to encode an alternate value explicitly would be a nice tradeoff. Encoding every field offset in memory could be quite sizable and most of the offsets never change, but not having any way to adjust them without a version bump would be pretty restrictive.
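A sketch of that lookup rule with hypothetical names — documented defaults baked into the reader, overridden per-field only when the descriptor says so:

``` csharp
// Hypothetical sketch: a field's offset is the explicit override from the
// target's descriptor when present, else the documented default for this
// contract version.
using System.Collections.Generic;

class FieldOffsetResolver
{
    readonly IReadOnlyDictionary<(string Type, string Field), int> _documentedDefaults;
    readonly IReadOnlyDictionary<(string Type, string Field), int> _descriptorOverrides;

    public FieldOffsetResolver(
        IReadOnlyDictionary<(string Type, string Field), int> documentedDefaults,
        IReadOnlyDictionary<(string Type, string Field), int> descriptorOverrides)
    {
        _documentedDefaults = documentedDefaults;
        _descriptorOverrides = descriptorOverrides;
    }

    public int Resolve(string type, string field) =>
        _descriptorOverrides.TryGetValue((type, field), out var overridden)
            ? overridden
            : _documentedDefaults[(type, field)];
}
```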

Member

@jkotas Mar 20, 2024


I agree.

This also means that we need to have an automated way to compare the actual offsets with the baseline and only include the ones that need overriding in the runtime contract descriptor. Doing this manually would be very error prone and labor intensive.

Member

@jkotas Mar 22, 2024


There is a cost difference between unused bytes in the binary and bytes that are read/written during startup and sent as part of minidump payload. The cost of a byte that is read/written during startup and sent as part of minidump payload is much higher than the cost of a byte that just sits unused in the binary. In any case, the solution suggested by @lambdageek avoids the unused bytes in the binary, so we are all good here.

if we assume there are < 1000 fields and we stored only a 2 byte offset per field that already fits in 2KB without

You also need names for the fields to make this self-describing. I do not think we want to maintain an ever-growing list of name ordinals.

Member

@noahfalk Mar 22, 2024


You also need names for the fields to make this self-describing. I do not think we want to maintain an ever-growing list of name ordinals.

Did I misunderstand? The original description of the array was `static const int[] = { <unique sequence that identifies this array>, offsetof(TypeA, FieldX), offsetof(TypeA, FieldY), offsetof(TypeB, FieldZ), .... }`. How would we know that the first offset in the array represents FieldX and not FieldY if we aren't keeping track of the ordinals?

There is a cost difference between unused bytes in the binary and bytes that are read/written during startup and sent as part of minidump payload

Was this the behavior you mentioned earlier where Linux wouldn't just map static data into memory and would instead explicitly initialize it at startup? Assuming I've connected those dots correctly, that part makes sense now, thanks.

Member


I need to think a little bit about how this will work for cross-compilation. I think it still works - we can use fixed size data types for the magic payload and we can detect endianness based on the prefix. That should provide enough info for the tool to form a model of the target platform's runtime and compute a delta or identify the contract that matches the output.

I think the build will have to produce some kind of correlated output (ie: a preprocessed header file or something) that has the names of the entities in the object file blob. But I think it's a solvable implementation problem. (Worst case we also dump a massive nul-separated string into the object file. But I would like to avoid that)

Member


How would we know that the first offset in the array represents FieldX and not FieldY if we aren't keeping track of the ordinals?

So I think this will work in three phases:

  1. In the first phase, the runtime build will compile a bunch of object files that contain the magic, sizeof(typeX), offsetof(typeX, fieldA), sizeof(typeY), offsetof(typeY, fieldB)... data blob. (New: it will also emit a text file containing a list of (contract, not source-level) type name and type/membername items in the same order as the data blob). Note this object file's sole purpose is so that we get the C/C++/NativeAOT compiler to produce something targeting the target environment (ie: it's built by a real 32-bit LE arm linux toolchain, so it has offsets relevant to that platform). We don't actually link this object file into the final output binary of the runtime. We just scrape it to extract the useful info.
  2. A tool (to be written) will consume the object file and the description at build time and produce a (logical) data type descriptor (ie: a fully expanded out type descriptor in a format yet to be specified). If the layout is an exact match for a well-known descriptor (ie: we 'publish' a final 'net9.0 linux-arm32-le' descriptor into the runtime repo at some point near the end of the release) then we create a source file where the descriptor is just a reference to the well known name. if the final descriptor doesn't match exactly a well known one, we emit a delta (ie: here's the baseline and here's the offsets that differ). The output of this step is a C source file with the descriptor in whatever format physical format we decide to implement.
  3. We compile the source file from step(2) and link it into the final built artifact of the runtime.

I will open a separate PR against this baseline spec to outline this part.
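A sketch of the comparison at the heart of step 2 above, assuming the scraped offsets and the published baseline are both available as name-to-offset maps (hypothetical tool code, not part of the spec):

``` csharp
// Hypothetical sketch of the build-time tool's core step: diff the offsets
// scraped from the throwaway object file against a published baseline and
// keep only the overrides that need to be emitted into the descriptor.
using System.Collections.Generic;
using System.Linq;

static class DescriptorDelta
{
    public static Dictionary<string, int> Compute(
        IReadOnlyDictionary<string, int> scrapedOffsets,   // from the object file blob
        IReadOnlyDictionary<string, int> baselineOffsets)  // e.g. a published "net9.0 linux-arm32-le" baseline
    {
        return scrapedOffsets
            .Where(kv => !baselineOffsets.TryGetValue(kv.Key, out var b) || b != kv.Value)
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```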

Member


Thanks! It sounds like you are also creating a name<->ordinal mapping at compile time that is available for use during the build but never gets included in the final output binary. I do trust that you can get it working and that it would have good runtime/dump performance characteristics :) I'm just a little sad that it feels like a bunch of work and complexity is needed for the build tooling across all the different platforms we support and I was hoping we wouldn't need to spend that degree of effort.

I do not think we want to maintain an ever-growing list of name ordinals.

The notion that we'd track a large list of ordinals doesn't seem as worrisome to me. Presumably any valid reader will already need to encode a table of default field offsets. I'm not sure why managing a table of field offsets would be fine but a same-sized table of ordinals would be problematic. The ordinal list also doesn't have to be ever-growing. If field offsets are associated with one or more contracts then we can retire old offsets either by retiring the contract which used them, or by bumping the contract version. In practice I'd be surprised if we are obsoleting more than 50 fields per year, so it might be a decade before we could save 1KB via renumbering.


A global value contract specifies numbers which can be referred to by other contracts. If a global value is specified directly in the Contract Data Stream, then the global value definition in the Contract Data Stream takes precedence. The intention is that these global variable contracts represent magic numbers and values which are useful for the operation of algorithmic contracts. For instance, we will likely have a `TargetPointerSize` global value represented via a contract, and things like `FEATURE_SUPPORTS_COM` can also be a global value contract, with a value of 1.

#### Data Structure Definition Contract
A data structure definition contract defines a single type's physical layout. It MUST be named "MyDataStructureType_layout". If a data structure layout is specified directly in the Contract Data Stream, then the data structure definition in the Contract Data Stream takes precedence. These contracts are responsible for declaring the layout of individual fields.
Member


If a data structure layout is specified directly in the Contract Data Stream, then the data structure defintion in the Contract Data Stream takes precedence.

Trying to make sure I understand the precedence/override mechanism described here. So if the stream/descriptor has records:

```
[compatible contracts]
Composition1: v1
    documented as [ Data1_layout : v1, Data2_layout: v1, Data3_layout: v1, Algorithm1 : v3 ]
Composition2: v1
    documented as [ Data1_layout : v3, Data3_layout: v2, Algorithm2 : v1 ]
Data1_layout : v2
Algorithm1 : v2

[data structure layout]
Data2 : <fields>
```

The result should be the below? (And the same idea for global values)

```
Data1_layout : v2        // Data1_layout compatible contract record (precedence over Composition1 and Composition2)
Data2_layout : <fields>  // Data2 data structure layout record (precedence over any compatible contract record)
Data3_layout : v1        // Composition1 compatible contract record (precedence over Composition2)
Algorithm1 : v2          // Algorithm1 compatible contract record (precedence over Composition1)
Algorithm2 : v1          // Composition2 compatible contract record
```
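For illustration, one way to read the precedence rules as code (hypothetical types; explicit records beat compositions, and among compositions the first in stream order wins):

``` csharp
// Hypothetical sketch of the precedence rules as currently written:
// 1. explicit contract records win; 2. otherwise the first composition
// (in stream order) that mentions the contract wins.
using System.Collections.Generic;

static class VersionResolution
{
    public static Dictionary<string, uint> Resolve(
        IReadOnlyList<KeyValuePair<string, uint>> explicitContracts,
        IReadOnlyList<IReadOnlyDictionary<string, uint>> compositionsInStreamOrder)
    {
        var result = new Dictionary<string, uint>();

        // Compositions apply first-in-stream-wins per contract name.
        foreach (var composition in compositionsInStreamOrder)
            foreach (var (name, version) in composition)
                result.TryAdd(name, version);

        // Explicit records override anything a composition said.
        foreach (var (name, version) in explicitContracts)
            result[name] = version;

        return result;
    }
}
```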

Member


As currently framed I am also unclear what precedence rules are intended.

I'm hoping we avoid needing composition contracts at all, but if we do have them my next hope is that we can avoid having any override/precedence mechanics. We'd require that any given contract only has its version implied at most once, either by listing it explicitly or by including it in at most one composition.

Member Author


@elinor-fung interprets my inscrutable prose correctly. However, we can discuss whether or not precedence rules make any sense. @noahfalk, I'd like your opinion on my rationales here.

So there are really 2 different purposes for the precedence rules.

  1. To allow us to depend on a composition contract + a delta that is what the runtime actually implements. (For instance, we have a composition contract which is the behavior of .NET 9 + v2 of GCWalking, because we started to change that early in .NET 10.) Based on the feedback from folks in this PR, I expect to remove composition contracts from this version of the spec, and we can add it back later if it turns out to be desirable.

  2. To allow us to define the data structure layouts in the Data Contract Reader explicitly only for release formats, while for DEBUG or other one-off builds we would have a means to allow for data structures to have significantly different sets of offsets and yet still be conveniently debuggable. For instance, one of the things that I sometimes do is add a field to a common data structure with a string that describes what it is, what is going on, etc. The idea is to allow that sort of ad hoc hackery to not affect the debuggability of the generated runtime.

Member


Thanks David! My preference remains to avoid composition contracts if possible, but hypothetically if we were to have them then I think the next best choices would be:

Scenario 1 - At the time we update the GCWalking contract, also update the composition which contains it. Hopefully we've defined the compositions such that the debugger functionality lost when breaking a member isn't that different from what is lost breaking the entire composition. GCWalking sounds like a relatively large contract already. I am assuming if we have composition then we have some finer-grained contracts composing it, like GCInfoCompositionContract = Segment + Generation + GCHeap + AllocContext + GCGlobals. If you made a breaking change to Segment then I assume you've already broken most of the GC debugging functionality regardless.

Scenario 2 - I think it might be a good practice for contracts to specify default values of field offsets in documentation, but also include some place where optional explicit offsets can be included with the encoded contract data. If we had composition contracts then I'd propose they should support the same parameterization. If we decided that the best way to encode optional parameters of composition contracts was to explicitly encode some member contracts, I think that would be OK as long as both the composition and the explicit member contract agree on the same version. This is a little more relaxed than my initial thought above.

@davidwrighton force-pushed the data_contracts_proposal branch from 92e5670 to d28135f on March 20, 2024 23:50

## Arrangement of contract specifications in the repo

Specs shall be stored in the repo in a set of directories under `docs/design/datacontracts`. Each contract shall be a separate markdown file named with the name of the contract: `docs/design/datacontracts/datalayout/<contract_name>.md`. Every version of each contract shall be located in the same file to facilitate understanding how variations between different contracts work.
Member

@lambdageek Mar 21, 2024


I think it might be useful to have a machine-readable header to the contract specs. This will allow some kind of cross-validation:

  1. That the reader implements all known contracts, doesn't misspell any contract names, or miss a version.
  2. That the target runtime descriptor includes only known names/versions and doesn't misspell any names.

Additionally for globals/offsets we could use the machine-readable spec to either generate or validate the implementations in the reader.

For example we could use YAML frontmatter metadata to describe all the important info about a contract (or at least name, versions). And then leave the markdown purely for descriptive text. Something like:

``` markdown
---
contract_name: "XYZContract"
versions: [1, 2, 5]
contract_type: "global"
globals:
  - name: SomeGlobal
    type: int32
    value: 1
  - name: SomeOtherGlobal
    type: int8
    value: 0
version_2:
  - globals:
     - name: SomeGlobal
       value: 0
version_5:
  - globals:
     - name: ThirdGlobal
       type: int8
       value: 5
---

# Contract XYZ Contract

Descriptive text

## Version 1

What's interesting about Version 1

## Version 2

This is also interesting because SomeGlobal changed values

## Version 5

This version changed SomeGlobal back to its original value, but also added ThirdGlobal
```

We don't necessarily need to build the tooling on day 1, but it would be nice to have the option for the future. And it's not really any more work for contract authors to follow a slightly constrained format.

Member Author


I have no objections, and in fact the formatting of the data layout/globals stuff is something I'm not focusing on too much today. I'd like to hear other perspectives though. @jkotas, @noahfalk?

Member


What would be a real-world example of SomeGlobal? I am not sure how to think about it. I would expect that each contract version can be used with multiple different values of SomeGlobal.

The non-YAML part of your example that has descriptions of the significant changes in each version looks good.

Member


Mostly this seems good to me 👍

One caveat is I'm still hoping we don't need to have different types of contracts so contract_type: "global" wouldn't exist. Instead I was imagining that any contract is allowed to include globals as well as other elements.

Member


What would be a real-world example of SomeGlobal? I am not sure how to think about it. I would expect that each contract version can be used with multiple different values of SomeGlobal.

yea, I'm also not sure about this part.

One thing that @davidwrighton has talked about is that for algorithm contracts new versions are allowed to throw if a certain method is no longer relevant in a newer version of a contract.

I'm not sure if it makes sense for global access to throw if a newer version of a contract makes some global obsolete or if it is better for obsolete globals to be specified in a newer version of a contract to have some default constant value. I could imagine both to be useful in different situations.

The non-YAML part of your example that has descriptions of the significant changes in each version looks good.

I think the absolute minimal YAML part of my proposal is just something like this:

``` yaml
---
contract: "XYZContract"
versions: [1, 2, 3]
---
```

I'm absolutely 100% certain I will check in typos in the reader implementation if there isn't some tooling to spellcheck the valid contract names.
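A sketch of that spellcheck tooling, assuming the minimal frontmatter above and a deliberately naive regex parser (hypothetical file layout and names):

``` csharp
// Hypothetical sketch: pull contract names/versions out of the spec files'
// frontmatter and flag any (name, version) the reader claims to implement
// that the specs don't define.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class SpecLint
{
    // Very small parser for the two-line frontmatter shown above.
    public static Dictionary<string, HashSet<uint>> LoadSpecs(string specDir)
    {
        var specs = new Dictionary<string, HashSet<uint>>();
        foreach (var file in Directory.EnumerateFiles(specDir, "*.md"))
        {
            var text = File.ReadAllText(file);
            var name = Regex.Match(text, "contract: \"(.*?)\"").Groups[1].Value;
            var versions = Regex.Match(text, @"versions: \[(.*?)\]").Groups[1].Value;
            if (name.Length == 0 || versions.Length == 0) continue;
            specs[name] = new HashSet<uint>(
                Array.ConvertAll(versions.Split(','), v => uint.Parse(v.Trim())));
        }
        return specs;
    }

    public static IEnumerable<string> Check(
        Dictionary<string, HashSet<uint>> specs,
        IEnumerable<(string Name, uint Version)> readerImplements)
    {
        foreach (var (name, version) in readerImplements)
            if (!specs.TryGetValue(name, out var known) || !known.Contains(version))
                yield return $"Reader implements unknown contract {name},{version}";
    }
}
```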

- To start, I've added the contracts for iterating the thread list and gathering information about individual threads
- Note that this algorithmic description is paired with a simplification of the SList data structure
  - See PR dotnet#100107
@davidwrighton
Member Author

@elinor-fung @lambdageek @jkotas @noahfalk I've just pushed a new commit, which adds a few example algorithmic contracts, drawn from the set of functionality we need for examining threads. In combination with this is a small PR #100107 which adjusts the definition of SList so that it is easier to describe as an algorithmic contract.

@@ -0,0 +1,386 @@
namespace DataContracts
Member:
Is the proposal here to actually provide a stable public API like this? I thought the C# API was just providing a theoretical API against which to write up algorithm contracts, but this iteration makes it seem like it is being proposed as an actual API here?

Member Author:
I find it a lot easier to think about things in a more concrete fashion. This isn't really a public api, and if it's actually what we write the contracts against (and possibly the implementations of contracts), that's ok, but it isn't intended to actually be shipped as publicly usable outside of writing the data contracts. As well, the "apis" in the contracts are not publicly shipped artifacts either. Those are intended to be used to implement the actual public apis like ISOSDacInterface, which are public and usable by customers/partners.

At some point I could see this all turning into a public api once we've used it for a few years, but that's far enough down the line that I don't want to worry too much about it.

@@ -0,0 +1,23 @@
# Contract GCHandle

This contract allows decoding and reading of GCHandles. This will also include handle enumeration in the future
Member:
This will also include handle enumeration in the future

I would think that GCHandle enumeration is going to be separate from GCHandle dereferencing.

GCHandle is very simple and it is needed for minidump analysis.

GCHandle enumeration is much more involved and it is not needed for minidump analysis.

Member Author:
Fair. I can easily see stopping at supporting only individual GCHandles here, and defining the enumerators separately. We may want to provide the ability to determine what kind of GCHandle something is, and possibly the various other characteristics of a non-boring GCHandle.

Member @noahfalk (Mar 22, 2024):
The split that I'm guessing would work well is:

  • A contract for Object that includes the facts that OBJECTREF can be treated as Object* and OBJECTHANDLE can be treated as Object**
  • A GCHandle contract that has all the other details for other handle state and handle enumeration.

Fwiw I don't recall that ICorDebug or SOS has any exposure of GCHandle information that isn't combined with enumerating them first.

If you do want to split GCHandle additional state from GCHandle enumeration I'd be fine with it, but at that level of granularity I'm guessing we'd be looking at 30-50 contracts rather than 5-20.
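
For illustration, dereferencing under that proposed Object contract would be a single target read. A minimal sketch, assuming a hypothetical ReadTargetPointer primitive on Target (not an API from the spec):

``` csharp
// Sketch only: because an OBJECTHANDLE can be treated as Object**, one
// pointer-sized read in the target turns the handle into an OBJECTREF
// (an Object*). ReadTargetPointer is an assumed read primitive.
TargetPointer GetObjectFromHandle(Target target, TargetPointer objectHandle)
{
    return target.ReadTargetPointer(objectHandle);
}
```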

## Data structures defined by contract
``` csharp
class SListReader
{
Member:
I think SList is unnecessary ceremony as a contract. Can we just have a next pointer in the Thread and a global that points to the first thread?

Member Author:
I disagree. The idea of having a concept of how a linked list works that is external to the definition of any particular type that uses it is quite useful. I don't expect that consumers of the cDAC will want to use the SList algorithm, but several algorithms within the cDAC may want to do so. For instance, the api for understanding what threads are blocked on a Monitor would use the SList algorithm as well on CoreCLR. Also as an example of how to handle our templated data structures, I believe it is fairly useful. We'll need to do something similar for SHash, and that will be a much more complex data structure to model in the contracts.

Although, looking a bit more closely at this particular use case, we might actually want to consider a doubly linked list instead of a singly linked list. The current model has a problem that cleaning up N threads has an O(N^2) cost instead of an O(N) cost since this is singly linked.

Finally, for V1 I'm describing the current CoreCLR implementation of SList, but honestly, the NativeAOT SList actually looks like a more reasonable implementation. This would be a fine place for us to come back to, replace the CoreCLR SList with the NativeAOT SList, bump one version number, and see that we were able to take advantage of the granular versioning that the algorithms system provides.
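
For reference, the traversal such an SList contract would describe might look roughly like the sketch below; the next-link offset and the ReadTargetPointer primitive are assumptions, not the actual CoreCLR SList layout:

``` csharp
// Sketch only: walk a singly linked SList in the target process. The offset
// of the next-entry link within each node would come from the data contract.
IEnumerable<TargetPointer> WalkSList(Target target, TargetPointer listHead, uint nextLinkOffset)
{
    TargetPointer current = target.ReadTargetPointer(listHead);
    while (current != TargetPointer.Null)   // Null is an assumed sentinel
    {
        yield return current;
        current = target.ReadTargetPointer(current + nextLinkOffset);
    }
}
```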

Member:
My experience has been that any abstraction for linked lists that is more than a trivial helper function or a macro is useless.

I agree with you that an abstraction like this would be useful for hashtables.


enum ThreadState
{
TS_Unknown = 0x00000000, // threads are initialized this way
Member:
We should review these flags. A number of them do not make sense to include in a contract.

Member Author:
Good idea.


record struct DacThreadData (
uint ThreadId;
uint OsThreadId;
Member:
OS thread ID should be pointer sized or long. 32-bit OS thread ID is non-portable.

Member Author:
I'll change it to TargetNUInt

TargetPointer Frame;
TargetPointer FirstNestedException;
TargetPointer TEB;
TargetPointer LastThrownObjectHandle;
Member:
How would you describe the difference between LastThrownObjectHandle vs FirstNestedException? Can we have just one?

Member Author:
It's unfortunate, but LastThrownObjectHandle is a GCHandle to a managed object, and FirstNestedException is a "nested exception record" object, which is actually some sort of linked list thing that defines what regions of the stack are walkable vs not, as well as a GCHandle to each nested exception in that list. There are 3 different implementations of the linked list, all with subtly varying behavior around how stack regions should be skipped, and under which circumstances. Handling of this sort of variation is going to be interesting, and the current spec makes a minimal attempt at it. See the GetNestedExceptionInfo portion of the thread api.

This could have been made cleaner by having the GCHandle api provide a type that indicates that something is a GCHandle, so it would be clear when an input is an actual GCHandle as opposed to some weird target pointer thing. Something similar could be done with the nested exception info list.

Member:
Can we refactor the relevant runtime to make the contract simpler?

I think it is important that we do this sort of refactoring to keep the data contracts as simple as possible.

Member Author:
I have no idea how feasible it is to refactor the nested exception stuff. This took @janvorli quite a long time to get working, and frankly, it's some of the most fragile and difficult to understand code in our codebase. Refactoring is both an incredibly useful idea here, and... probably REALLY hard. To be honest, my preference for refactoring here would be to move X86 to the funclet model, then determine that Jan's new EH stuff is completely correct, then delete 2 of the nested EH models, so that there aren't 3, there is just 1. Unfortunately, that is a LOT of work. In the meantime, I'm hoping that we can develop enough of an understanding of how this part of EH works that we can describe it in a manner which is close enough to correct that we can implement the data contracts.

Member @jkotas (Mar 22, 2024):
If full refactoring is hard, we may be able to store duplicate information in a place that is more reasonable to describe via a contract.

Member Author:
Agreed. I'm not in a position to understand that refactoring at the moment though. This stack of nested exception objects is fairly easy to work with, but I don't understand enough of the real use of these concepts (stack walk entry skipping) to define what that refactoring would look like for a complete implementation. I have somewhat of an idea of what it looks like for the funclet model, but the x86 model is a complete mystery to me.

Member:
The FirstNestedException is actually a pointer to a data structure containing all the information the exception handling code needs when handling the exception. That's why it has three different forms (where two of them have the common stuff in a base class). But I would argue that only a very minimal subset of the information is relevant to a debugger.
When there is a nested exception (a throw from a catch / finally), a new instance of that data structure is created with a member pointing to the previous one.
All three of these structures actually contain an OBJECTHANDLE m_hThrowable for the exception. I am not 100% sure at the moment why we need the LastThrownObjectHandle on the thread; maybe the lifetimes slightly differ.

Comment on lines +114 to +117
UnstartedThreadCount : runtimeThreadStore.m_UnstartedThreadCount,
BackgroundThreadCount : runtimeThreadStore.m_BackgroundThreadCount,
PendingThreadCount : runtimeThreadStore.m_PendingThreadCount,
DeadThreadCount: runtimeThreadStore.m_DeadThreadCount,
Member:
These are BCL concepts that we have not moved to C# yet. I am not sure whether it makes sense to have them in the low-level thread contract.

Member Author:
I'll split those out into a separate api then. It so happens that ISOSDacInterface exposes these to SOS, although I'm not sure how much they are used by SOS clients.

Member Author:
Also, these are not currently only BCL concepts, as the iterators we have over threads depend on them. I'm not 100% certain they can just be moved to managed without consequence. It would be nice though.

Member:
although I'm not sure how much they are used by SOS clients.

Yeah, I have struggled with random pieces of information returned by DAC SOS APIs many times when doing significant changes in the runtime (and often ended up breaking them in the process without anybody caring).

Member:
I usually think of our diagnostics in two categories:

  • abstractions that are broadly useful for anyone debugging managed code
  • abstractions that are only useful for runtime developers trying to understand runtime data structures

I think we have a lot of latitude to do takebacks on the latter but need to be much more cautious on the former.

TargetPointer FirstNestedException;
TargetPointer TEB;
TargetPointer LastThrownObjectHandle;
TargetPointer NextThread;
Member:
Should this have a GCHandle for the managed thread object?

Member Author:
Yes, probably. I modeled this on the GetThreadData SOS api, which does not expose that, but it makes sense to provide it along with the rest of these things.

Comment on lines +7 to +17
record struct DacThreadStoreData (
int ThreadCount,
TargetPointer FirstThread,
TargetPointer FinalizerThread,
TargetPointer GcThread);

record struct DacThreadStoreCounts (
int UnstartedThreadCount,
int BackgroundThreadCount,
int PendingThreadCount,
int DeadThreadCount);
Member:
Should these rather be together in the "Apis of contract" section? These data structures are not data structures found in the target process. They are here just to support the "Apis of contract".

Member Author:
I'd like to do that as a follow-on PR focused on that one question. My general thought is: sure, we can do that. I had envisioned the "types" section as always being contract api types, and not having anything to do with runtime types at all, but I think it's an interesting question whether or not we should have a types section that refers to the types the contract might use. That's different, and implies that the types section should be specified per version.

Member:
Ok, added to #100162

Member @lambdageek:
We had an offline discussion this afternoon and feel that most of the issues left unresolved in the discussion on this PR do not constitute blockers to merging it; we can follow up in further PRs to refine the spec.

I have captured a summary of the discussions in #100162 - please add to the list if you think I missed something. Each of those bullet points could become its own GH issue / PR.

Member @lambdageek left a comment:
Many important questions remain, but the initial spec is a good starting point to build on to resolve them. Thanks @davidwrighton and @jkotas for laying the groundwork.

@davidwrighton davidwrighton merged commit f9067e9 into dotnet:main Mar 23, 2024
12 checks passed
``` csharp
struct DacGCHandle
{
DacGCHandle(TargetPointer value) { Value = value; }
Member:
It's hard to tell what level of indirection we are working with. When I think of a GCHandle in the runtime I imagine:

struct ObjectHandle
{
    Object* Object;
}

But here I think you are defining Value as the Object**? Two things that might help:

  • Define the pseudo-code so that it types the target pointers. For example TargetPointer<DacObject> or TargetPointer<DacObjectRef>
  • Define the structs using the fields that are present in the runtime. Let it be implied that every pseudo-code structure has a constructor that marshals the fields and the signature is always NameOfRemoteType(Target, TargetPointer<NameOfRemoteType>). This constructor can be expected to iterate over every field in the structure, determine the proper remote offset and type, and then recursively construct the field value using the calculated field pointer. We could describe the algorithm in detail at one place in the documentation and then avoid repeating it for the specific fields in every structure.

I think we'd end up with something like:

struct DacGCHandle
{
    TargetPointer<DacObject> Object;
}

TargetPointer<DacObject> GetObject(DacGCHandle gcHandle)
{
    return gcHandle.Object;
}

Reading further below I think there is also a discrepancy as to whether types in this section were intended to represent runtime types or logical abstractions. If these are intended to be abstractions then I'd suggest either defining the APIs as member functions of this type or using TargetPointer<DacGCHandle> as the logical type, which shouldn't require any explicit definition.
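
As an illustration of the marshaling-constructor convention proposed above, here is a sketch building on the spec's Target/TargetPointer pseudocode types; GetFieldOffset and ReadPointer are assumed helpers, not APIs from the spec:

``` csharp
// Sketch only: the implied constructor NameOfRemoteType(Target,
// TargetPointer<NameOfRemoteType>) reads each field from the target at the
// offset supplied by the data descriptor, rather than a hard-coded offset.
struct DacGCHandle
{
    public TargetPointer<DacObject> Object;

    public DacGCHandle(Target target, TargetPointer<DacGCHandle> address)
    {
        ulong objectOffset = target.GetFieldOffset("GCHandle", "Object");
        Object = new TargetPointer<DacObject>(target.ReadPointer(address.Value + objectOffset));
    }
}
```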


## Data structures defined by contract
``` csharp
class SListReader
Member:
The header here claimed we were defining data structures, but this SListReader is a type with no member fields, so it doesn't feel like it belongs here. It appears to be part of the implementation that describes API behavior.

Upon reading further below I noticed we are probably using the same terminology for different things. I'm not tied to my terminology, but in total I think we have at least three things that ideally will be clearly distinguished:

  • A physical model that describes layout of data in the target process memory. In pseudo-code I expect these are structures definitions and fields that mirror the runtime naming.
  • A logical model that describes abstractions we believe are useful to conceptualize the runtime state. These abstractions may have state and behavior. In pseudo-code I think of this as a C# type definition that has properties and methods on it.
  • A mapping that describes how the physical model can be converted into the logical model. In pseudo-code I think of this as the implementation of the logical model methods+properties in terms of structures+fields defined in the physical model.

int PendingThreadCount,
int DeadThreadCount);

enum ThreadState
Member:
For flags like this I think we should make a decision about the likelihood we'd ever want these values to change. If we anticipate they may change, then we probably want to define their values as part of encoded data in process memory. If we'd be willing to take a breaking change to diagnostic tools in order to bump the contract version and modify their values, then documenting them as constants is fine.

I have no objection to these particular flags being constants, I just wanted to raise the broader issue because I suspect we won't want all enumerations to be documentation constants. For example I believe the NativeAOT MethodTable flags have been shifting in value over the past few years.

Member Author:
@noahfalk, handling of enums and varying values between implementations is a problem. At the moment, since this enum is defined in the API portion of the contract, it does not represent the enum values as they are present in the runtime at all. Instead it represents the enum values that the contract exposes. I really should have named this DacThreadState. What we don't have in our specs is a way for the runtime to provide a contract indicating that the runtime's variant of the ThreadState enum has some specific set of values.

Also, as @jkotas noted earlier, we probably shouldn't be exposing this complete set of enum values from the contracts. Instead we should expose a restricted set that we believe is likely to be kept for the longer term, and prevent the contract from allowing visibility into other values held in the corresponding field in the runtime data structure.

We have the luxury of the contract api not being a public, stable api, so if we guess wrong on the set of exposed flags, we can adjust that, but it will still be somewhat of a pain to do that.
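
To illustrate the distinction being discussed, here is a minimal sketch, assuming hypothetical flag names and a hypothetical ReadGlobalUInt32 primitive for fetching the runtime's actual bit values from contract data (none of this is the actual spec):

``` csharp
// Sketch only: the contract exposes a stable DacThreadState enum, while the
// runtime's raw bit values are read from contract data, so they may differ
// between runtime versions without breaking the contract surface.
[Flags]
enum DacThreadState
{
    None       = 0,
    Unstarted  = 1,
    Background = 2,
    Dead       = 4,
}

DacThreadState DecodeThreadState(Target target, uint rawState)
{
    // ReadGlobalUInt32 is an assumed primitive, analogous to the spec's
    // Target.ReadGlobalTargetPointer, returning the runtime's bit value.
    DacThreadState result = DacThreadState.None;
    if ((rawState & target.ReadGlobalUInt32("TS_Unstarted")) != 0)
        result |= DacThreadState.Unstarted;
    if ((rawState & target.ReadGlobalUInt32("TS_Background")) != 0)
        result |= DacThreadState.Background;
    if ((rawState & target.ReadGlobalUInt32("TS_Dead")) != 0)
        result |= DacThreadState.Dead;
    return result;
}
```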

DacThreadStoreData GetThreadStoreData()
{
    TargetPointer threadStore = Target.ReadGlobalTargetPointer("s_pThreadStore");
    var runtimeThreadStore = new ThreadStore(Target, threadStore);
Member:
I think ThreadStore and Thread in this code represent the physical model structures although they were never defined. I expect we will need to give an explicit definition of these types and their field lists somewhere in the documentation. At minimum the documentation needs to include the default field offsets for every field, otherwise the reader will not know the correct offset whenever an explicit offset wasn't encoded.
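
A hypothetical sketch of what such an explicit definition could record; the field names, types, and offsets below are illustrative only, not the actual ThreadStore layout:

``` csharp
// Sketch only: a physical-model description where every field carries a
// default offset the reader falls back on when no explicit offset is encoded.
static class ThreadStorePhysicalLayout
{
    public static readonly (string Name, string Type, uint DefaultOffset)[] Fields =
    {
        ("m_ThreadCount",          "int32",   0x00),
        ("m_FirstThreadLink",      "pointer", 0x08),
        ("m_UnstartedThreadCount", "int32",   0x10),
    };
}
```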

Comment on lines 294 to 295
| FirstField | Int32 | 0 |
| SecondField | Int8 | 4 |
Member:
Thanks! It sounds like you are also creating a name<->ordinal mapping at compile time that is available for use during the build but never gets included in the final output binary. I do trust that you can get it working and that it would have good runtime/dump performance characteristics :) I'm just a little sad that it feels like a bunch of work and complexity is needed for the build tooling across all the different platforms we support and I was hoping we wouldn't need to spend that degree of effort.

I do not think we want to maintain ever-growing list of name ordinals.

The notion that we'd track a large list of ordinals doesn't seem as worrisome to me. Presumably any valid reader will already need to encode a table of default field offsets. I'm not sure why managing a table of field offsets would be fine but a same-sized table of ordinals would be problematic. The ordinal list also doesn't have to be ever-growing. If field offsets are associated with one or more contracts then we can retire old offsets either by retiring the contract which used them, or by bumping the contract version. In practice I'd be surprised if we are obsoleting more than 50 fields per year, so it might be a decade before we could save 1KB via renumbering.

## Apis of contract
``` csharp
DacThreadStoreData GetThreadStoreData();
DacThreadStoreCounts GetThreadCounts();
Member:
It feels a little odd to me that we'd group ThreadStoreData and ThreadStoreCounts separately. I'd suggest we constrain the abstractions in our logical layer to either be 1:1 with some runtime type, or abstractions that are described in the ECMA CLI doc. In this particular case I think we should define a ThreadStore entity and put a bunch of properties on it for whatever values we believe should be exposed.

Member:
we constrain the abstractions in our logical layer to either be 1:1 with some runtime type

I would agree with that if the runtime types had a reasonably clean design. The unmanaged Thread in CoreCLR is opposite of that. For historic reasons, Thread contained every thread-local variable ever used by the unmanaged runtime so it has been a dumping ground for a lot of unrelated stuff. Jeremy chipped away on it a little bit in #99989.

I think we need some plan to clean this up so that we do not end up faulting a ton of cruft into the contracts.

One possible way to do that is to make some of the contracts identical between CoreCLR, NativeAOT and possibly Mono and refactor the code accordingly. Contracts that are identical between CoreCLR, NativeAOT and possibly Mono have a high chance of being properly factored.

Member:
I think we need some plan to clean this up so that we do not end up faulting a ton of cruft into the contracts.

A general issue here is describing what we want with what we have. This model creates a fair bit of friction by codifying current aspirations instead of reality, and by constraining permitted evolution toward where we want to go. Either we should focus on the aspiration and move in that direction, or accept the current reality and model it as such while letting the versioning scheme grow as we improve.

Member:
I think the data contract project is going to fail if we accept the current reality and try to create contracts from it. We do not need to clean up every little detail, but I think a fair bit of cleanup and refactoring is required for the data contract project to succeed.

Member:
I think the data contract project is going to fail if we accept the current reality and try to create contracts from it.

I disagree with that assessment, but I accept I am missing some nuance of this approach that forces a cleaner implementation.

We do not need to clean up every little detail, but I think a fair bit of cleanup and refactoring is required for the data contract project to succeed.

I see. It sounds like there is prep work then with this effort. Has that specifically been identified and accounted for in the work for the cDAC? As in, do we have a "simplification" work item for specific areas of the runtime we expect to focus on for a .NET 9 deliverable? I think it is fine if it is implied in the work item, but it should be an explicit expectation somewhere.

Member:
I would agree with that if the runtime types had a reasonably clean design

I wasn't claiming that we are stuck with the shape of the runtime as it exists today. If we want to change the runtime during .NET 9 and then ship a contract that is 1:1 with the new runtime shape that exists when we ship .NET 9, that is fine by me.

partial void Get_baseSizeAndFlags(ref int value);

[DataContractLayout(1, 8)]
public class DataLayout1
Member:
It would be nice if the type definitions were inline in, or linked from, the contract markdown files that use them.

public int baseSize;
}

// The rest of this is generated by a source generator
Member:
This auto-generated code doesn't seem needed for documentation. I think something like this would be a lot more concise and still convey all the info an implementation author needed:

MethodTable version 1

struct MethodTable
{
    int dynamicTypeId; // offset 0
    int baseSize;      // offset 4
}

MethodTable version 2

struct MethodTable
{
    int baseSize;      // offset 0
}

Member Author:
Agreed... but this C# file was intended to express how the contract might possibly be realized in C#. So, I needed to describe what the source generator might do in order to effectively discuss the apis available. As a consequence, I wrote out a bunch of boilerplate. I agree this isn't very obvious, and needs to be described more fully.

partial void Get_dynamicTypeId_optional(ref int value);
partial void Get_baseSizeAndFlags(ref int value);

[DataContractLayout(1, 8)]
Member:
I think of the Size property as indicating how much space we expect between elements in an array of this type. For types that would never show up in arrays (like MethodTable) we could omit it.

}

[DataContractAlgorithm(1)]
class MethodTableContract_1 : ContractDefinitions.MethodTableContract, IAlgorithmContract
Member:
I'm not sure why some contract info is here and other contract info is in markdown files. It seems like there are two different patterns in use. Given a choice between the two I think the markdown approach is easier to read and understand.

Member Author:
Hmm... I think this is a bit misunderstood... the concept of a "MethodTableContract_1" was intended as an example of what the final code would be. I really ought to put a comment here.

Member:
Yeah, all the other content here appeared to be examples of how we'd document contracts so that was my default interpretation of the content here too. Since this is intended to be something else we could just put some comments at the top of the file saying so.

lambdageek added a commit that referenced this pull request Mar 29, 2024
Contributes to #100162 which is part of #99298

Follow-up to #99936 that removes "type layout" and "global value" contracts and instead replaces them with a "data descriptor" blob.

Conceptually, a particular target runtime provides a pair: a logical data descriptor together with a set of algorithmic contract versions. The logical data descriptor is just a single model that defines all the globals and type layouts relevant to the set of algorithmic contract versions.

A logical data descriptor is realized by merging two physical data descriptors in a prescribed order.

The physical data descriptors provide some subset of the type layouts or global values.

The physical data descriptors come in two flavors:

- baseline descriptors that are checked into the dotnet/runtime repo and have well-known names
- in-proc descriptors that get embedded into a target runtime.

Each in-proc descriptor may refer to a baseline and represents a delta applied on top of the baseline.
The data contract model works on top of a flattened logical data descriptor.
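
A minimal sketch of that flattening step, assuming dictionary-shaped descriptors (TypeLayout is a placeholder; the real physical encoding is defined separately):

``` csharp
// Sketch only: produce the logical data descriptor by applying the in-proc
// descriptor as a delta over the baseline; entries in the delta win.
sealed record TypeLayout;   // placeholder for a type-layout description

static Dictionary<string, TypeLayout> FlattenDescriptors(
    IReadOnlyDictionary<string, TypeLayout> baseline,
    IReadOnlyDictionary<string, TypeLayout> inProcDelta)
{
    var logical = new Dictionary<string, TypeLayout>();
    foreach (var (name, layout) in baseline)
        logical[name] = layout;
    foreach (var (name, layout) in inProcDelta)
        logical[name] = layout;    // delta overrides the baseline entry
    return logical;
}
```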


Co-authored-by: Aaron Robinson <arobins@microsoft.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>
@github-actions github-actions bot locked and limited conversation to collaborators Apr 25, 2024