-
Notifications
You must be signed in to change notification settings - Fork 1
add notes on schema generation strategies #2
add notes on schema generation strategies #2
Conversation
Thanks for the feedback. A few remarks on my side:
If I'm understanding the issue correctly, it's not about the generated schema not being correct, but rather that the validation tests haven't been configured correctly?
That wouldn't address recursive types though, right?
I did this intentionally because I wanted to ensure that the schema corresponding to each generated type is |
@@ -61,21 +62,29 @@ public static IEnumerable<ITestData> GetTestDataCore() | |||
yield return new TestData<DateTimeOffset>( | |||
Value: new(new DateTime(2021, 1, 1), TimeSpan.Zero), | |||
AdditionalValues: [DateTimeOffset.MinValue, DateTimeOffset.MaxValue], | |||
ExpectedJsonSchema: """{"type":"string","format": "date-time"}"""); | |||
// `date-time` format is RFC3339; not sure about DTO, but DT serializes as IS08601. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. For similar reasons we replaced "format" : "time"
with a pattern keyword for the case of TimeSpan
. We might want to do the same here.
cc @eerhardt
@@ -84,14 +93,18 @@ public static IEnumerable<ITestData> GetTestDataCore() | |||
|
|||
// Enum types | |||
yield return new TestData<IntEnum>(IntEnum.A, ExpectedJsonSchema: """{"type":"integer"}"""); | |||
yield return new TestData<StringEnum>(StringEnum.A, ExpectedJsonSchema: """{"type": "string", "enum": ["A","B","C"]}"""); | |||
yield return new TestData<FlagsStringEnum>(FlagsStringEnum.A, ExpectedJsonSchema: """{"type":"string"}"""); | |||
// `type` is redundant here since `enum` is exclusive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added the type
keyword to enums following feedback from the Semantic Kernel team that this is required by the Gemini APIs (google's LLM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I would say that the Gemini LLM is wrong. It's not hurting anything; it's redundant and a lot of people come to us asking about it.
There is an argument that type
in this case can serve as an annotation for code gen scenarios for languages which don't support enums directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's wrong, however supporting function calling in all LLMs that we support is what funds this project. As such, I can't see how we could remove it at this point. I think we could remove it in a future revision though, we should be free to evolve the generated schemas without it being considered a breaking change.
@@ -303,7 +319,10 @@ public static IEnumerable<ITestData> GetTestDataCore() | |||
} | |||
"""); | |||
|
|||
// would be neat to have the converter expose or be attributed with a schema somehow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's something we should be exposing in converters once we have a proper exchange type.
Not just the validation tests. Any validator would be expected to ignore assertions for
Recursive types work just fine. Ideally, you can reference the root schema Linked list {
"type": "object",
"properties": {
"value": { "type": "integer" },
"next": { "$ref": "#" }
}
} Object with linked list property {
"type": "object",
"properties": {
"list": { "$ref": "#/$defs/linked-list" }
},
"$defs": {
"linked-list": {
"type": "object",
"properties": {
"value": { "type": "integer" },
"next": { "$ref": "#/$defs/linked-list" }
}
}
}
} You could also give the linked list an Object with identified linked list property {
"$id": "https://dotnet.org/schemas/linked-list-container",
"type": "object",
"properties": {
"list": { "$ref": "linked-list" } // resolves to https://dotnet.org/schemas/linked-list
},
"$defs": {
"linked-list": {
"$id": "linked-list", // https://dotnet.org/schemas/linked-list
"type": "object",
"properties": {
"value": { "type": "integer" },
"next": { "$ref": "#" } // resolves back to https://dotnet.org/schemas/linked-list
}
}
}
} |
// why not use `enum`? | ||
yield return new TestData<FlagsStringEnum>(FlagsStringEnum.A, ExpectedJsonSchema: """{"type":"string"}"""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple flags-enum-to-string approaches:
- array of flags
- comma- or pipe-delimited
(My comment actually probably isn't a good idea since you can have 2^n options.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The converter uses comma separated. The number of valid inputs is actually infinite because for example "A, A, A, A, A, A, A, A, A"
is accepted.
One concern I had with using |
You could take the fully-qualified type name, hash it to a 128-bit int, and render it as a GUID URI. That would abstract it sufficiently from .Net while still providing a consistent map. Collisions could still occur, but would be extremely rare. |
Wouldn't that make it look like a solution file? Wouldn't get many points for human readability. |
At least for the purposes of OpenAPI, there's a requirement for reference IDs to be human readable and semantically meaningful. The reference ID also has to be unique to the schema, and not the .NET type. For example, sequence-types that effectively map to the same schema (e.g. all types that would be represented by |
OpenAPI v3.0- actually doesn't allow identifiers in schemas. I've searched through 3.1, and I see no such requirement. |
Then you'd have to identify unique subschemas and extract them. That's actually what I do in JsonSchema.Net.Generation. I find this approach works better because a type on its own will have a schema, but a property of that type could have additional requirements imposed by property-level attributes which would yield a different schema. Scanning the generated schema structure (or tracking it during generation) for common subschemas (not common JSON data; they must be subschemas) will yield a good result. |
The "reference IDs" I'm referring to here are specifically in reference to the identifier used when constructing the url in the |
I recommend you have two processes:
For this effort, don't worry about (2). FWIW, OpenAPI is the reason my The thing is, you're not going to get around needing to use {
"type": "object",
"properties": {
"list": { "$ref": "#linked-list" }
},
"$defs": {
"linked-list": {
"$anchor": "linked-list",
"type": "object",
"properties": {
"value": { "type": "integer" },
"next": { "$ref": "#linked-list" }
}
}
}
} You just need to be sure your anchor is unique. Again, this is the problem you're facing, which is why I recommend less-readable GUIDs. (Editors generally have "find" functions.) |
This repo is now being archived. |
I've added a bunch of notes regarding the decisions made for this POC. Mostly they boil down to:
format
isn't a validating keyword by default. For the purposes of the tests, they can be treated as such in my lib by settingRequireFormatValidation = true
. Other validators might have similar settings, but they'd be opt-in.$ref
directly into the schema structure. It's recommended to extract common subschemas to a$defs
keyword.true
schema is preferred over the empty schema{}
(especially as a subschema foradditionalProperties
, which historically has accepted booleans even before booleans were valid schemas).There are a few other more targeted notes and corrections as well.
I don't intend for this PR to merge. It's just a vehicle for notes.