[cdac] Data Descriptor Spec #100253
My guess is that ordinals would be a bit simpler as well as more compact, but if names fit within the size targets, I'm fine with names.
I wouldn't be surprised if a dense ordinal-based encoding without any baseline values also fits within the perf size goals. If that meant we didn't need any custom build tooling, JSON format, or baseline offset tables, that seems like a nice simplification. The part that remains an unknown for me on that route is how we deal with C# globals/types in NativeAOT, assuming we would have some. So far the DebugHeader work for NativeAOT has never needed to encode info for a managed type or global, and I don't know what capabilities we have to work with in the NativeAOT toolchain.
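To make the alternative concrete, here is a rough sketch of the kind of dense ordinal-based layout I have in mind (the ordinals, names, and offsets are all invented for illustration, not from the spec):

```c
#include <stdint.h>

/* The runtime and reader share an enum of well-known field ordinals... */
enum cdac_field_ordinal {
    CDAC_METHODTABLE_PARENT = 0,
    CDAC_METHODTABLE_FLAGS  = 1,
    CDAC_THREAD_NEXT        = 2,
    CDAC_FIELD_COUNT
};

/* ...and the runtime emits one flat table of offsets indexed by ordinal
 * into its data segment. No JSON, no baseline deltas, no build tooling:
 * the reader locates the table and indexes it directly. */
static const uint16_t cdac_field_offsets[CDAC_FIELD_COUNT] = {
    [CDAC_METHODTABLE_PARENT] = 0x10,
    [CDAC_METHODTABLE_FLAGS]  = 0x00,
    [CDAC_THREAD_NEXT]        = 0x08,
};
```

The trade-off is that the ordinals become a shared contract: adding or renaming a field means updating the enum on both sides in lockstep.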
It sounds like my suggestion is not the direction you are choosing to go. If you want to discuss it more I'm happy to, but I also don't want to be a distraction when you've decided on another approach already. I believe the current approach with JSON, baselines, and custom build tools can work even if it feels more complex than needed to me. Feel free to mark this resolved.
Thinking about it some more:
I thought the types were only meant for documentation purposes, but I do not see that actually stated anywhere. Do we expect the types to be used in the reader?
If the types were for documentation purposes only, it would not make sense to mention them in the compact form.
I think this is just a choice of what we want to happen if a field ever changes its type. One option is that the type is implicit in the name, so the name must change, either in the source or via some override the tooling supports. The other option is that we include the type explicitly, so that the reader can match on the (type, name) tuple.
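As a sketch of the difference on the reader side (the function shapes are invented for illustration, not from the spec):

```c
#include <string.h>

/* Option 1: type is implicit in the name. The reader matches on the name
 * alone, so a type change must be accompanied by a rename or the reader
 * would silently misinterpret the field. */
int cdac_match_by_name(const char *want_name, const char *name) {
    return strcmp(want_name, name) == 0;
}

/* Option 2: type is explicit in the descriptor. The reader matches on the
 * (type, name) tuple and can detect a type change without any rename. */
int cdac_match_by_type_and_name(const char *want_type, const char *want_name,
                                const char *type, const char *name) {
    return strcmp(want_type, type) == 0 && strcmp(want_name, name) == 0;
}
```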
My vote would be for requiring a field rename if it changes type.
I was under the impression that the JSON format was for documentation purposes only and would never appear in the memory of the target process. What is the scenario where we expect the JSON format to be embedded in memory?
The well-known baseline JSON data will be embedded into the CDAC reader tool. It's not just for documentation.
If the in-proc data descriptor is going to be a data-segment constant generated by `cdac-build-tool`, there didn't seem to be any reason why it should be a binary blob. JSON is easy to read with a variety of off-the-shelf parsers.
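As a sketch (the symbol name and JSON shape here are invented, not from the spec), the build tool's output could be as simple as:

```c
/* Hypothetical cdac-build-tool output: the compact JSON descriptor baked
 * into the runtime image as an ordinary read-only data-segment constant.
 * The reader locates the blob (e.g. via the debug header) and hands it to
 * any off-the-shelf JSON parser. */
static const char g_cdacDescriptor[] =
    "{\"version\":0,"
    "\"types\":{\"Thread\":{\"m_pNext\":8,\"m_State\":0}},"
    "\"globals\":{\"ThreadStore\":16}}";

static const unsigned g_cdacDescriptorSize = sizeof(g_cdacDescriptor) - 1;
```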
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just misunderstood the text. I thought it was saying the JSON would be inside the target process being debugged. If this JSON is included in the reader, then I have no concern with that at all. I consider it an implementation choice of any diagnostic tool, including ours, how it wants to utilize the documentation. Sorry for the confusion!
The text is saying that the (compact) JSON will be inside the runtime binary in the target process.
Sorry, it looks like I had it right the first time. I see there was an edit earlier today that eliminated the binary format entirely and just uses the JSON as the in-memory format.
This feels like it is getting progressively further away from the approach I would have favored, but I think you can make it work, and it feels like my opinion on this is in the minority. My impression is that defining flat in-memory structs with no baseline at all would be easier to generate (no build tooling or baseline to manage), easier to consume (no JSON parser needed), and fits within our size goals. As an example, the GC already does this, as have past successful prototypes. I know I suggested having a baseline at an earlier point, but pretty quickly the amount of complexity it added to the design felt not worth the potential size savings.
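For reference, a minimal sketch of the flat-struct approach in the spirit of what the GC does (the struct shape and field names are made up for illustration):

```c
#include <stdint.h>

/* A plain C struct filled in with compile-time constants: no serialization
 * step to generate it and no parser to consume it. The reader does one raw
 * read of the struct from target memory and uses the fields directly. */
struct cdac_flat_descriptor {
    uint32_t size;                      /* sizeof(struct), for append-only versioning */
    uint16_t thread_next_offset;        /* e.g. offsetof(Thread, m_pNext) */
    uint16_t methodtable_parent_offset; /* e.g. offsetof(MethodTable, m_pParent) */
    uint64_t thread_store_addr;         /* address of the global thread store */
};

static const struct cdac_flat_descriptor g_cdacFlatDescriptor = {
    .size = sizeof(struct cdac_flat_descriptor),
    .thread_next_offset = 0x08,         /* illustrative constants */
    .methodtable_parent_offset = 0x10,
    .thread_store_addr = 0,
};
```

Versioning falls out of the layout: new fields are only ever appended, and the reader checks `size` to see how much of the struct a given target defines.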
In any case, if your vision is different and you think JSON, baselines, and custom build tools are the best approach, go for it. I see no reason it won't work, even if it isn't the approach I would have picked.
@lambdageek and I chatted about this too. I'm hoping we can focus more on actually understanding what the entire description will be; then, once we know for sure, we can apply baseline logic or just accept that composition wasn't needed because the size isn't as bad as assumed.
I think the design having space for baselining makes sense, but I would like to start with embedding everything in the target process, see where that lands with `PrintException`, and gauge how much we think it will grow. Then we can actually implement the baselining if deemed necessary.
Part of my thinking that we wouldn't need a baseline is that I'm expecting a simple ordinal-based binary encoding to be between 10% and 40% of the size of a JSON encoding of the same data. For example, you might find that a JSON encoding of the entire runtime descriptor with no baseline is 20KB, which we decide is too big. Then we either save size by sticking with JSON and using a baseline, or we save size by switching to simple binary structs that are substantially more efficient.
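A rough per-field comparison shows where that range comes from (both entry shapes are invented for the comparison):

```c
#include <stdint.h>

/* JSON, per field:
 *   {"name":"m_pParentMethodTable","type":"pointer","offset":16}
 *   -> roughly 60 bytes of text.
 *
 * Binary, per field: the 4-byte entry below, i.e. ~7% of the JSON size,
 * or ~13% even if a 4-byte type tag is added. */
struct cdac_binary_field_entry {
    uint16_t ordinal; /* index into a shared enum of well-known fields */
    uint16_t offset;  /* field offset within its type */
};
```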