Extend "unfold" operation and support it in the compiler plugin #742

koperagen · 2024-06-18T18:38:56Z

It covers two interesting use cases:

Replace a column with multiple, potentially nested, columns. Provides a DSL similar to add for that.
More fine-grained toDataFrame. Instead of converting 20-30 properties with 2-3 levels of nesting all at once, the user can call toDataFrame(maxDepth = 0) and then unfold the required properties to whatever level they need.
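
The depth-limited idea in the second use case can be sketched on plain nested maps. This is only a toy model under stated assumptions: `unfoldMap` is a hypothetical helper, nested `Map` values stand in for nested objects, and nothing here is the library's actual toDataFrame or unfold implementation.

```kotlin
// Toy model, NOT the kotlinx-dataframe implementation.
// Nested maps stand in for nested objects; dotted keys stand in for nested columns.
@Suppress("UNCHECKED_CAST")
fun unfoldMap(value: Map<String, Any?>, maxDepth: Int, prefix: String = ""): Map<String, Any?> =
    value.entries.flatMap { (key, v) ->
        val path = if (prefix.isEmpty()) key else "$prefix.$key"
        if (v is Map<*, *> && maxDepth > 0) {
            // Still allowed to go deeper: expand this value one more level.
            unfoldMap(v as Map<String, Any?>, maxDepth - 1, path).toList()
        } else {
            // Depth budget used up (or a plain value): keep it as a single entry.
            listOf(path to v)
        }
    }.toMap()

fun main() {
    val row = mapOf("a" to "1", "b" to mapOf("c" to 123, "d" to mapOf("e" to 3.0)))
    println(unfoldMap(row, maxDepth = 0)) // {a=1, b={c=123, d={e=3.0}}}
    println(unfoldMap(row, maxDepth = 1)) // {a=1, b.c=123, b.d={e=3.0}}
    println(unfoldMap(row, maxDepth = 2)) // {a=1, b.c=123, b.d.e=3.0}
}
```

With maxDepth = 0 nothing nested is expanded, and each extra level can then be requested only where it is needed.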

On the compiler plugin side, I will continue to support the other overloads later in a different PR.

github-actions · 2024-06-18T18:43:31Z

Generated sources will be updated after merging this PR.
Please inspect the changes in here.

Jolanrensen

I'm wondering, why did you decide to put unfold after replace?
Unfold itself, by definition, already replaces a column with a new column. I think we're making the API more complicated than it needs to be by making our users write df.replace { a }.unfold {} instead of keeping it inside the unfold operation, like df.unfold { a }.by {}. You can even allow df.unfold(maxDepth = 2) { a }. It would keep "replace with" simple and "unfold" more powerful.

Jolanrensen · 2024-06-25T14:44:03Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/api/unfold.kt

+    return when (kind()) {
+        ColumnKind.Group, ColumnKind.Frame -> this
+        else -> when {
+            skipPrimitive && isPrimitive() -> this


I was very confused: how can you unfold a primitive? But isPrimitive() can also be true for a collection... Can we rename isPrimitive() to something like isPrimitiveOrListLike()? unfold seems to be the only operation using it.

koperagen · 2024-06-26

You can't. Have a look at the `unfold primitive` test. skipPrimitive = false is needed to make it work, and skipPrimitive = true is needed to avoid unpacking, for example, a column of String into a ColumnGroup with size: Int, the same as we do for toDataFrame with overloads.

Jolanrensen · 2024-06-27

Yes, but can you take a look at the isPrimitive() function? That function also returns true when you run it on a collection or an array. My suggestion was only to rename the isPrimitive() function.
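
To make the naming concern concrete, here is a minimal standalone sketch. `isPrimitiveLike` is a hypothetical stand-in written for illustration, not the library's isPrimitive(); it only assumes, as stated in this thread, that collections and arrays pass the check as well.

```kotlin
// Hypothetical stand-in, NOT the kotlinx-dataframe implementation.
// Assumption taken from the thread: the check is also true for collections and arrays.
fun isPrimitiveLike(value: Any?): Boolean = when (value) {
    null, is Number, is String, is Boolean, is Char -> true
    is Collection<*>, is Array<*> -> true // the part the name "isPrimitive" hides
    else -> false
}

data class B(val d: Double)

fun main() {
    println(isPrimitiveLike("123"))        // true
    println(isPrimitiveLike(listOf(1, 2))) // true, although a list is not a primitive
    println(isPrimitiveLike(B(3.0)))       // false, so a value like this would be unfolded
}
```

A name along the lines of isPrimitiveOrListLike() would state the second branch explicitly.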

Jolanrensen · 2024-06-25T14:53:21Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/unfold.kt

+public inline fun <reified T> DataColumn<T>.unfold(noinline body: CreateDataFrameDsl<T>.() -> Unit): AnyCol =
+    unfoldImpl(skipPrimitive = false, body)
+
+public inline fun <T, reified C> ReplaceClause<T, C>.unfold(vararg props: KProperty<*>, maxDepth: Int = 0): DataFrame<T> =


I think the name unfolding would read better, or byUnfolding/withUnfolded. replace {}.unfold {} doesn't read as a sentence anymore.

zaleslaw · 2024-07-02

If possible, let's avoid gravitating toward natural language; I believe it's not a goal.

koperagen · 2024-06-26T14:30:30Z

Unfold itself, by definition, already replaces a column with a new column

What definition?
Originally I wanted to add a "replace by" function with AddDsl. Then I remembered that the toDataFrame DSL is pretty similar to AddDsl and we already have the unfold function; it just lacks an overload with a DSL. Such an overload must be a multiplex operation, right?
"unfold by" is interesting, but it will probably have exactly one operation forever. Sounds good, right, but its semantics are no different from replace.
I'd rather have one entry point (replace) for situations when you want to, like, replace a column with a different one.

Jolanrensen · 2024-06-26T16:14:25Z

What definition? Originally I wanted to add a "replace by" function with AddDsl. Then I remembered that the toDataFrame DSL is pretty similar to AddDsl and we already have the unfold function; it just lacks an overload with a DSL. Such an overload must be a multiplex operation, right? "unfold by" is interesting, but it will probably have exactly one operation forever. Sounds good, right, but its semantics are no different from replace. I'd rather have one entry point (replace) for situations when you want to, like, replace a column with a different one.

I meant the definition of "unfolding": you're unfolding a column with its contents, so its type is bound to change, i.e., the column is replaced.

I can see where you're coming from, but it may still be hard for users to have two different ways to unfold, namely .replace {}.unfold {} and .unfold(), which work a little differently and take different arguments.

So I'd either:

make df.replace {}.byUnfolding {} (or a name like that) the only version of unfold and deprecate the old one
or make just df.unfold {} more powerful
or keep both for discoverability, but both should be equally powerful

wdyt?

koperagen · 2024-06-27T13:12:49Z

I'd say df.unfold should stay, because the use case for simply unfolding a column of objects remains. It's worth adding a df.unfold(maxDepth = , roots = ) overload too; I missed it.

So I'd either:
or make just df.unfold {} more powerful

Please write this API with the needed overloads and a few examples of its usage, then.

Jolanrensen · 2024-07-01T09:39:16Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/unfold.kt

+public inline fun <reified T> DataColumn<T>.unfold(noinline body: CreateDataFrameDsl<T>.() -> Unit): AnyCol =
+    unfoldImpl(skipPrimitive = false, body)
+
+public inline fun <T, reified C> ReplaceClause<T, C>.unfold(vararg props: KProperty<*>, maxDepth: Int = 0): DataFrame<T> =


Oh, we should also use KCallable instead of KProperty to support Java classes :)
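
A standalone sketch of why KCallable is the wider choice: in Kotlin reflection both KProperty and KFunction extend KCallable, and a Java-style getter is referenced as a function rather than as a property. `Person` and `memberNames` are hypothetical names used only for illustration; they are not part of this PR.

```kotlin
import kotlin.reflect.KCallable

// Hypothetical class: `name` is a Kotlin property, `getAge` mimics a Java-style getter.
class Person(val name: String) {
    fun getAge(): Int = 42
}

// A KCallable parameter accepts property references and function references alike;
// a KProperty parameter would reject Person::getAge at compile time.
fun memberNames(vararg members: KCallable<*>): List<String> = members.map { it.name }

fun main() {
    println(memberNames(Person::name, Person::getAge)) // [name, getAge]
}
```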

Jolanrensen · 2024-07-01T09:43:54Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/replace.kt

+    fun `unfold properties`() {
+        val col by columnOf(A("1", 123, B(3.0)))
+        val df1 = dataFrameOf(col)
+        val conv = df1.replace { col }.unfold(maxDepth = 2)


Not specifying maxDepth now breaks, while it worked before.
Try running df1.replace { col }.with { it.unfold() } before the PR and after it.
It works before, but now it gives: java.lang.UnsupportedOperationException: Can not get nested column 'd' from ValueColumn 'bb'

Jolanrensen · 2024-07-01T09:51:09Z

I'd say df.unfold should stay, because the use case for simply unfolding a column of objects remains. It's worth adding a df.unfold(maxDepth = , roots = ) overload too; I missed it.

So I'd either:
or make just df.unfold {} more powerful

Please write this API with the needed overloads and a few examples of its usage, then.

I made a little sample of unfold {}.by {} with the same options as your version. Both df.unfold {} and df.unfold {}.by {} can be used. See the commit here for more details:
20cab7a

koperagen · 2024-07-01T12:20:19Z

UnfoldingDataFrame looks good, we can use it. Supporting it on the plugin side will require some changes, but that's OK. My only worry is that, with the return type being different from DataFrame, people will be tempted to call df.unfold { col }.by(). It's the first time an intermediate object in a multiplex operation is also a DataFrame. Or the opposite problem: completion after df.unfold { col }. will list the whole DataFrame API, with by sitting somewhere in the completion list, no different from, let's say, filter.
But if we go this route, i'd also add
df.replace {}.by(CreateDataFrameDsl) (only this, without vararg props: KProperty<*> and maxDepth: Int = 0 overloads)
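
The shape under discussion can be sketched with toy types. Everything below is a hypothetical stand-in (`Frame`, `UnfoldingFrame`, `unfoldColumn`, and the dotted column names are invented for illustration); it is not the UnfoldingDataFrame from the linked commit, only a model of an intermediate object that is itself usable as a frame and offers an optional by.

```kotlin
// Toy stand-ins, NOT the real DataFrame / UnfoldingDataFrame types.
open class Frame(val columns: Map<String, Any?>) {
    fun filterColumns(predicate: (String) -> Boolean): Frame = Frame(columns.filterKeys(predicate))
}

// Replaces the target column with the entries produced by `rule`, keeping column order.
fun unfoldColumn(frame: Frame, target: String, rule: (Any?) -> Map<String, Any?>): Map<String, Any?> =
    frame.columns.flatMap { (name, value) ->
        if (name == target) rule(value).map { (k, v) -> "$name.$k" to v } else listOf(name to value)
    }.toMap()

// The intermediate result already is a Frame (default unfolding applied);
// `by` optionally redoes the unfolding with a custom rule.
class UnfoldingFrame(
    private val source: Frame,
    private val target: String,
) : Frame(unfoldColumn(source, target) { value -> mapOf("value" to value) }) {
    fun by(rule: (Any?) -> Map<String, Any?>): Frame = Frame(unfoldColumn(source, target, rule))
}

fun Frame.unfold(target: String): UnfoldingFrame = UnfoldingFrame(this, target)

fun main() {
    val df = Frame(mapOf("id" to 1, "a" to "123"))
    println(df.unfold("a").columns) // {id=1, a.value=123}
    println(df.unfold("a").by { mapOf("length" to it.toString().length) }.columns) // {id=1, a.length=3}
    // The completion concern: every Frame member is also offered on the intermediate object.
    println(df.unfold("a").filterColumns { it != "id" }.columns) // {a.value=123}
}
```

In this model by appears next to every inherited Frame member, which is the discoverability trade-off raised above.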

Jolanrensen · 2024-07-01T13:35:51Z

@koperagen Yes, I couldn't find another way to have a notation with 2 selectors and the second one being optional while keeping the DataFrame DSL style :/ but indeed, it's something new. A bit like .recursively().

Actually, since by is defined on UnfoldingDataFrame, it should appear quite high in the completion list.

We could also put it inside the class to make it even more discoverable.

Alternatively, we could change the return-type of unfold to something non-dataframe-ish and let people call
df.unfold { a }.byReplacing() or something.

Imagine that XD df.replace { a }.byUnfolding() and df.unfold { a }.byReplacing() and it would do the same.

replace {}.by {} also looks interesting :)

zaleslaw · 2024-07-02T13:20:53Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/replace.kt

+
+    @Test
+    fun `unfold properties`() {
+        val col by columnOf(A("1", 123, B(3.0)))


Is this case "More fine-grained toDataFrame. Instead of converting 20-30 properties to 2-3 level of nesting all at once user can choose to convert toDataFrame(maxDepth = 0) and unfold required properties to whatever level they need" covered here, in this test?

koperagen · 2024-07-05

Technically, yes. I intend to have a more representative example as part of the compiler plugin demo. There's a tree of objects with many properties and potentially deep nesting from the Konsist library. It will be a good illustration. But here it merely unfolds one specific column up to 2 levels.

zaleslaw · 2024-07-02T13:22:18Z

core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/replace.kt

+        val a by columnOf("123")
+        val df = dataFrameOf(a)
+
+        val conv = df.replace { a }.unfold {


Could we use replace and unfold independently? If so, could you please add a test for this? Or, if they only work together, could they be combined into one function?

zaleslaw · 2024-07-02T13:33:03Z

Honestly, I like the idea from the use case "More fine-grained toDataFrame. Instead of converting 20-30 properties to 2-3 level of nesting all at once user can choose to convert toDataFrame(maxDepth = 0) and unfold required properties to whatever level they need": defining the level of our unfolding (which is a special case of convert).

I also found that we lack docs/examples for this operation: https://kotlin.github.io/dataframe/unfold.html

Some crazy ideas

replace {}.by { ::unfold }
replace {}.by { ::unfoldDeeply }

Jolanrensen · 2024-07-23T12:22:18Z

Since it's a WIP I made it a draft for now

Jolanrensen · 2024-08-08T09:58:41Z

Just a thought :) We can actually have something like replace by unfolding without changing the replace API. Replace already works like replace with, meaning we can already write something like df.replace { "data"<ColumnGroup<*>>() }.with { it.unfold() } in the current state of the library. Maybe we could expand on DataColumn.unfold to provide the Add-DSL like notation you suggest :)

Something like:

df.replace { data }.with { 
    it.unfold {
         "b" from { it }
         "c" from { DataRow.readJsonStr("""{"prop": 1}""") }
    }
}

koperagen added 5 commits June 18, 2024 21:31

Add unfold overloads with customization parameters

5492921

[Compiler plugin] Interpreter catches exception thrown inside to report them as compilation errors. Add special constructor for errors that shouldn't be caught

abe7157

[Compiler plugin] Rework toDataFrame implementation

2cda967

Interpreters need an ability to pass arguments down to DSL, so introduce new "dsl" factory function

[Compiler plugin] exception should not be caught to be visible in tests failure messages

18beb71

[Compiler plugin] support "replace unfold with DSL"

382d140

koperagen added the "enhancement" label (New feature or request) Jun 18, 2024

koperagen added this to the 0.14.0 milestone Jun 18, 2024

koperagen self-assigned this Jun 18, 2024

Jolanrensen self-requested a review June 25, 2024 13:35

Jolanrensen requested changes Jun 25, 2024

koperagen requested a review from Jolanrensen June 26, 2024 14:32

Jolanrensen reviewed Jul 1, 2024

zaleslaw reviewed Jul 2, 2024

Jolanrensen marked this pull request as draft July 23, 2024 12:22

Jolanrensen added the "Compiler plugin" label (Anything related to the DataFrame Compiler Plugin) Aug 8, 2024

koperagen modified the milestones: 0.14.0, 0.15.0 Sep 17, 2024
