-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KTNB-693 Send the full dataframe schema as metadata #706
Conversation
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/io/writeJson.kt
Outdated
Show resolved
Hide resolved
schemaData["name"] = name | ||
schemaData["kind"] = columnSchema.kind.toString() | ||
when (columnSchema) { | ||
is ColumnSchema.Value -> schemaData["type"] = columnSchema.type.toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have a function to turn KType to String, it's used in HTML rendering and DataFrameSchema.toString
internal fun renderType(type: KType?): String { |
What do you think? Suits for AI actions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into that method, it seems to hide some of the type information in some cases. So I do not think it is suitable for AI Actions. At least if we want to be as specific as possible with the context information. Also, if others want to use the type information, we probably need the fully qualified type as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into this method, it seems to remove type information in some cases, which might make it problematic when we want to use the metadata to improve the context of AI actions. So for now I would prefer to keep the fully qualified names that also include nullability, e.g. Kotlin.String?
There is a problem when a FrameColumn contains frames with different schemas. I recommend attaching types to the metadata of each nested frame. This may lead to duplication if the schema of each nested frame is the same, but it will make it easier to work with on the Kotlin Notebook plugin side. We already have a lot of duplication because we pass column names for each value in rows, so this additional overhead will be minimal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ermolenkodev I see your point. I forgot to think about that each row could hold different schemas for data frame references. So you are right, it is probably better to have the schema as part of the metadata inside the data frame content. I'll refactor it. |
…op-level frame as well as on each row. Updated serialization_format.md
After some discussion with @ermolenkodev we decided to rework the metadata a little. I have updated the PR and description. So it should be ready for a 2nd round of review. |
This part adds the infrastructure needed for https://youtrack.jetbrains.com/issue/KTNB-693/Enable-AI-Actions-for-DataFrames-in-Kotlin-Notebooks as we currently are not able to detect column types in a good way which is needed when creating prompts for the AI Assistant.
It adds a new "types" property to the top-level "metadata" as well as recursively on each row so it is possible to easily identify column types.
A
columns
property has also been added toColumnGroup
andFrameColumn
metadata, it contains nested column names similar to the top-levelcolumns
property.Example: