Writing a small description of each CS DSL function in the documentation

Kotlin · May 31, 2024 · 3260d52
1 parent b7e9f59
commit 3260d52
Showing 1 changed file with 190 additions and 1 deletion.
diff --git a/docs/StardustDocs/topics/ColumnSelectors.md b/docs/StardustDocs/topics/ColumnSelectors.md
@@ -2,7 +2,7 @@
 
 <!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
 
-[`DataFrame`](DataFrame.md) provides a DSL for selecting an arbitrary set of columns.
+[`DataFrame`](DataFrame.md) provides a DSL for selecting an arbitrary set of columns: the Columns Selection DSL.
 
 Column selectors are used in many operations:
 
@@ -39,6 +39,195 @@ df.move { name.firstName and name.lastName }.after { city }
 </tab>
 </tabs>
 
+#### Functions Overview:
+
+##### First (Col), Last (Col), Single (Col)
+`first {}`, `firstCol()`, `last {}`, `lastCol()`, `single {}`, `singleCol()`
+
+Returns the first, last, or single column from the top level, a specified [column group](DataColumn.md#columngroup),
+or a `ColumnSet` that matches the optional given condition. If no column matches the condition,
+a `NoSuchElementException` is thrown.
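+
+A minimal sketch, assuming the Kotlin DataFrame library is on the classpath. The one-row sample frame, its column names, and the helper functions are hypothetical:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+// Hypothetical one-row frame with three top-level columns
+private fun peopleFrame() = dataFrameOf("age", "name", "city")(30, "Alice", "London")
+
+// Without a condition, first() picks the first top-level column: "age"
+fun firstColumn(): List<String> = peopleFrame().select { first() }.columnNames()
+
+// last {} picks the last column matching the condition: "city"
+fun lastColumnStartingWithC(): List<String> =
+    peopleFrame().select { last { it.name().startsWith("c") } }.columnNames()
+
+// single {} expects exactly one matching column: "name"
+fun onlyColumnStartingWithN(): List<String> =
+    peopleFrame().select { single { it.name().startsWith("n") } }.columnNames()
+```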
+
+##### Col
+`col(name)`, `col(5)`, `this[5]`
+
+Creates a [ColumnAccessor](DataColumn.md#column-accessors) (or `SingleColumn`) for the column identified by the given
+argument, from the top level or a specified [column group](DataColumn.md#columngroup). The argument can be either an
+index (`Int`) or a reference to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`;
+any [AccessApi](apiLevels.md)).
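+
+For illustration, the sketch below references one column by name and one by its zero-based index. The frame and the function name `colByNameAndIndex` are hypothetical:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun colByNameAndIndex(): List<String> {
+    // Hypothetical one-row frame
+    val df = dataFrameOf("a", "b", "c")(1, "x", 2.0)
+    // col("b") references a column by name, col(2) by its zero-based index
+    return df.select { col("b") and col(2) }.columnNames() // ["b", "c"]
+}
+```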
+
+##### Value Col, Frame Col, Col Group
+`valueCol(name)`, `valueCol(5)`, `frameCol(name)`, `frameCol(5)`, `colGroup(name)`, `colGroup(5)`
+
+Creates a [ColumnAccessor](DataColumn.md#column-accessors) (or `SingleColumn`) for a 
+[value column](DataColumn.md#valuecolumn) / [frame column](DataColumn.md#framecolumn) / 
+[column group](DataColumn.md#columngroup) identified by the given argument, from the top level or a
+specified [column group](DataColumn.md#columngroup). The argument can be either an index (`Int`) or a reference
+to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`; any [AccessApi](apiLevels.md)).
+When you supply a column name, path, or index, the functions can be called either typed or untyped.
+These functions throw an `IllegalArgumentException` if the column found is not of the expected kind.
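+
+A sketch using a hypothetical frame in which `b` and `c` have been grouped into a column group `g` with `group(...).into(...)`. The function name is made up for illustration:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun kindSpecificCols(): List<String> {
+    // "a" stays a value column, "b" and "c" are moved into the column group "g"
+    val df = dataFrameOf("a", "b", "c")(1, 2, 3).group("b", "c").into("g")
+    // valueCol("g") would fail here, because "g" is a column group
+    return df.select { valueCol("a") and colGroup("g") }.columnNames() // ["a", "g"]
+}
+```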
+
+##### Cols
+`cols {}`, `cols()`, `cols(colA, colB)`, `cols(1, 5)`, `cols(1..5)`, `[{}]`
+
+Creates a subset of columns (`ColumnSet`) from the top level, a specified [column group](DataColumn.md#columngroup),
+or a `ColumnSet`.
+You can pass either a `ColumnFilter` or use any of the `vararg` overloads, which accept any [AccessApi](apiLevels.md).
+When you supply column names, paths, or indices (or an index range), the function can be called either typed or untyped.
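+
+The filter, `vararg` index, and index-range overloads side by side, on a hypothetical four-column frame. The helper names are illustrative only:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+// Hypothetical one-row frame with columns a, b, c, d
+private fun abcdFrame() = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
+
+// ColumnFilter overload: every column except "b"
+fun colsByFilter(): List<String> = abcdFrame().select { cols { it.name() != "b" } }.columnNames()
+
+// vararg index overload (zero-based): "a" and "d"
+fun colsByIndices(): List<String> = abcdFrame().select { cols(0, 3) }.columnNames()
+
+// Index-range overload (inclusive): "b" and "c"
+fun colsByRange(): List<String> = abcdFrame().select { cols(1..2) }.columnNames()
+```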
+
+##### Range of Columns
+`colA.."colB"`
+
+Creates a `ColumnSet` containing all columns from `colA` to `colB` (inclusive) at the top level.
+Columns inside [column groups](DataColumn.md#columngroup) are also supported
+(as long as they share the same direct parent), as is any combination of [AccessApi](apiLevels.md).
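+
+A small sketch using `String` references for both ends of the range. The frame and the function name are made-up samples:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun columnRange(): List<String> {
+    // Hypothetical one-row frame with columns a, b, c, d
+    val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
+    // Both ends are inclusive: "b", "c", "d"
+    return df.select { "b".."d" }.columnNames()
+}
+```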
+
+##### Value Columns, Frame Columns, Column Groups
+`valueCols {}`, `valueCols()`, `frameCols {}`, `frameCols()`, `colGroups {}`, `colGroups()`
+
+Creates a subset of columns (`ColumnSet`) from the top level, a specified [column group](DataColumn.md#columngroup),
+or a `ColumnSet`, containing only the [value columns](DataColumn.md#valuecolumn) / [frame columns](DataColumn.md#framecolumn) /
+[column groups](DataColumn.md#columngroup) that match the optional condition.
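+
+A sketch on a hypothetical frame with two value columns, `a` and `b`, and one column group `g`. The function name is illustrative:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun selectByKind(): List<List<String>> {
+    // "c" and "d" are moved into the column group "g"
+    val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4).group("c", "d").into("g")
+    return listOf(
+        df.select { valueCols() }.columnNames(),                    // ["a", "b"]
+        df.select { valueCols { it.name() != "a" } }.columnNames(), // ["b"]
+        df.select { colGroups() }.columnNames(),                    // ["g"]
+    )
+}
+```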
+
+##### Cols of Kind
+`colsOfKind(Value, Frame) {}`, `colsOfKind(Group, Frame)`
+
+Creates a subset of columns (`ColumnSet`) from the top level, a specified [column group](DataColumn.md#columngroup),
+or a `ColumnSet`, containing only the columns of the specified kind(s) that match the optional condition.
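+
+Assuming `ColumnKind` is imported from `org.jetbrains.kotlinx.dataframe.columns`, selecting value columns and column groups, minus one filtered-out column, might look like this. The frame and the function name are hypothetical:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+import org.jetbrains.kotlinx.dataframe.columns.ColumnKind
+
+fun valueAndGroupCols(): List<String> {
+    // "c" and "d" are moved into the column group "g"
+    val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4).group("c", "d").into("g")
+    // Value columns and column groups, excluding "b"
+    return df.select { colsOfKind(ColumnKind.Value, ColumnKind.Group) { it.name() != "b" } }.columnNames()
+}
+```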
+
+##### All (Cols)
+`all()`, `allCols()`
+
+Creates a `ColumnSet` containing all columns from the top level, a specified [column group](DataColumn.md#columngroup),
+or a `ColumnSet`. This is the opposite of `none()` and equivalent to `cols()` without a filter.
+Note that on [column groups](DataColumn.md#columngroup), `all` is named `allCols` to avoid confusion.
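+
+A sketch contrasting `all()` at the top level with `allCols()` on a column group. The frame and the function names are hypothetical:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+// Hypothetical frame: "a" at the top level, "b" and "c" inside the group "g"
+private fun groupedFrame() = dataFrameOf("a", "b", "c")(1, 2, 3).group("b", "c").into("g")
+
+// All top-level columns: "a" and "g"
+fun allTopLevel(): List<String> = groupedFrame().select { all() }.columnNames()
+
+// All columns inside "g": "b" and "c"
+fun allInsideGroup(): List<String> = groupedFrame().select { colGroup("g").allCols() }.columnNames()
+```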
+
+##### All (Cols) After, -Before, -From, -Up To
+`allAfter(colA)`, `allBefore(colA)`, `allColsFrom(colA)`, `allColsUpTo(colA)`
+
+Creates a `ColumnSet` containing a subset of columns from the top level,
+a specified [column group](DataColumn.md#columngroup), or a `ColumnSet`.
+The subset includes:
+- `all(Cols)Before(colA)`: All columns before the specified column, excluding that column.
+- `all(Cols)After(colA)`: All columns after the specified column, excluding that column.
+- `all(Cols)From(colA)`: All columns from the specified column, including that column.
+- `all(Cols)UpTo(colA)`: All columns up to the specified column, including that column.
+
+NOTE: In the Plain DSL and on [column groups](DataColumn.md#columngroup), the `{}` overloads of these functions
+take a `ColumnSelector` (relative to the receiver).
+On `ColumnSet`s, they take a `ColumnFilter` instead.
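+
+The four variants on a hypothetical frame with columns `a` to `d`, using `b` as the reference column. At the top level, the names follow the `all(Cols)...` pattern above without the `Cols` part; the function name is illustrative:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun relativeSubsets(): List<List<String>> {
+    // Hypothetical one-row frame with columns a, b, c, d
+    val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
+    return listOf(
+        df.select { allBefore("b") }.columnNames(), // ["a"]
+        df.select { allAfter("b") }.columnNames(),  // ["c", "d"]
+        df.select { allFrom("b") }.columnNames(),   // ["b", "c", "d"]
+        df.select { allUpTo("b") }.columnNames(),   // ["a", "b"]
+    )
+}
+```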
+
+##### Cols at any Depth
+`colsAtAnyDepth {}`, `colsAtAnyDepth()`
+
+Creates a `ColumnSet` containing all columns from the top level, a specified [column group](DataColumn.md#columngroup),
+or a `ColumnSet`, at any depth, that satisfy the optional given predicate. This means that columns (of all three kinds!)
+nested inside [column groups](DataColumn.md#columngroup) are also included.
+This function can also be followed by another `ColumnSet` filter function, like `colsOf<>()`, `single()`,
+or `valueCols()`.
+
+**For example:**
+
+Depth-first search for the first column containing the value "Alice":
+
+`df.select { colsAtAnyDepth().first { "Alice" in it.values() } }`
+
+The columns at any depth excluding the top-level:
+
+`df.select { colGroups().colsAtAnyDepth() }`
+
+All [value-](DataColumn.md#valuecolumn) and [frame columns](DataColumn.md#framecolumn) at any depth:
+
+`df.select { colsAtAnyDepth { !it.isColumnGroup() } }`
+
+All value columns at any depth nested under a column group named "myColGroup":
+
+`df.select { myColGroup.colsAtAnyDepth().valueCols() }`
+
+
+**Converting from deprecated syntax:**
+
+`dfs { condition }` -> `colsAtAnyDepth { condition }`
+
+`allDfs(includeGroups = false)` -> `colsAtAnyDepth { includeGroups || !it.isColumnGroup() }`
+
+`dfsOf<Type> { condition }` -> `colsAtAnyDepth().colsOf<Type> { condition }`
+
+`cols { condition }.recursively()` -> `colsAtAnyDepth { condition }`
+
+`first { condition }.rec()` -> `colsAtAnyDepth { condition }.first()`
+
+`all().recursively()` -> `colsAtAnyDepth()`
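+
+A runnable variant of two of the examples above, assuming a hypothetical frame in which `b` (holding "Alice") and `c` are nested in a column group `g`. The helper names are made up:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+// Hypothetical frame: "a" at the top level, "b" and "c" inside the group "g"
+private fun nestedFrame() = dataFrameOf("a", "b", "c")(1, "Alice", 3).group("b", "c").into("g")
+
+// Skips the column group "g" itself and keeps the nested columns: "a", "b", "c"
+fun nonGroupColumns(): List<String> =
+    nestedFrame().select { colsAtAnyDepth { !it.isColumnGroup() } }.columnNames()
+
+// Depth-first search for the first column containing "Alice": "b"
+fun firstAliceColumn(): List<String> =
+    nestedFrame().select { colsAtAnyDepth().first { "Alice" in it.values() } }.columnNames()
+```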
+
+##### Cols in Groups
+`colsInGroups {}`, `colsInGroups()`
+
+Creates a `ColumnSet` containing all columns that are nested in the [column groups](DataColumn.md#columngroup) at
+the top level, in a specified [column group](DataColumn.md#columngroup), or in a `ColumnSet`, and that match an optional predicate.
+This is useful if you want to select all columns that are "one level down".
+
+This function was previously called `children()`.
+
+**For example:**
+
+To get the columns inside all [column groups](DataColumn.md#columngroup) in a [dataframe](DataFrame.md),
+instead of having to write:
+
+`df.select { colGroupA.cols() and colGroupB.cols() ... }`
+
+you can use:
+
+`df.select { colsInGroups() }`
+
+or with filter:
+
+`df.select { colsInGroups { "user" in it.name } }`
+
+Similarly, you can take the columns inside all [column groups](DataColumn.md#columngroup) in a `ColumnSet`:
+
+`df.select { colGroups { "my" in it.name }.colsInGroups() }`
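+
+A sketch with two hypothetical column groups, `g` and `h`, showing the unfiltered and the filtered forms. The function names are illustrative:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+// Hypothetical frame: "a" at the top level, "b" and "c" in "g", "d" in "h"
+private fun twoGroupsFrame() = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
+    .group("b", "c").into("g")
+    .group("d").into("h")
+
+// Every column one level down: "b", "c", "d"
+fun oneLevelDown(): List<String> = twoGroupsFrame().select { colsInGroups() }.columnNames()
+
+// Same, but filtered: "b", "d"
+fun oneLevelDownFiltered(): List<String> =
+    twoGroupsFrame().select { colsInGroups { it.name() != "c" } }.columnNames()
+```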
+
+##### Take (Last) (Cols) (While)
+`take(5)`, `takeLastCols(2)`, `takeLastWhile {}`, `takeColsWhile {}`
+
+Creates a `ColumnSet` containing the first / last `n` columns from the top level,
+a specified [column group](DataColumn.md#columngroup), or a `ColumnSet`. The `While` variants instead take columns
+from the start / end for as long as they match the given condition.
+Note that to avoid ambiguity, `take` is named `takeCols` when called on a [column group](DataColumn.md#columngroup).
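+
+The top-level forms on a hypothetical frame with columns `a` to `d`. The function name is made up for illustration:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun takeVariants(): List<List<String>> {
+    // Hypothetical one-row frame with columns a, b, c, d
+    val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
+    return listOf(
+        df.select { take(2) }.columnNames(),                            // ["a", "b"]
+        df.select { takeLast(1) }.columnNames(),                        // ["d"]
+        df.select { takeWhile { it.name() != "c" } }.columnNames(),     // ["a", "b"]
+        df.select { takeLastWhile { it.name() != "b" } }.columnNames(), // ["c", "d"]
+    )
+}
+```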
+
+##### Drop (Last) (Cols) (While)
+`drop(5)`, `dropLastCols(2)`, `dropLastWhile {}`, `dropColsWhile {}`
+
+Creates a `ColumnSet` without the first / last `n` columns from the top level,
+a specified [column group](DataColumn.md#columngroup), or a `ColumnSet`. The `While` variants instead drop columns
+from the start / end for as long as they match the given condition.
+Note that to avoid ambiguity, `drop` is named `dropCols` when called on a [column group](DataColumn.md#columngroup).
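+
+The corresponding top-level `drop` forms on a hypothetical frame with columns `a` to `d`. The function name is illustrative:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun dropVariants(): List<List<String>> {
+    // Hypothetical one-row frame with columns a, b, c, d
+    val df = dataFrameOf("a", "b", "c", "d")(1, 2, 3, 4)
+    return listOf(
+        df.select { drop(1) }.columnNames(),                            // ["b", "c", "d"]
+        df.select { dropLast(2) }.columnNames(),                        // ["a", "b"]
+        df.select { dropWhile { it.name() == "a" } }.columnNames(),     // ["b", "c", "d"]
+        df.select { dropLastWhile { it.name() != "b" } }.columnNames(), // ["a", "b"]
+    )
+}
+```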
+
+##### Select from [Column Group](DataColumn.md#columngroup)
+`colGroupA.select {}`, `"colGroupA" {}`
+
+Creates a `ColumnSet` containing the columns selected by the provided `ColumnsSelector` relative to the specified
+[column group](DataColumn.md#columngroup). In practice, this means you're opening a new selection scope inside a 
+[column group](DataColumn.md#columngroup) and selecting columns from there.
+The selected columns are referenced individually and "unpacked" from their parent
+[column group](DataColumn.md#columngroup).
+
+**For example:**
+
+Select `myColGroup.someCol` and all `String` columns from `myColGroup`:
+
+`df.select { myColGroup.select { someCol and colsOf<String>() } }`
+
+Other notations for selecting from a column group:
+
+`df.select { "myGroupCol" { "colA" and expr("newCol") { colB + 1 } } }`
+
+`df.select { "pathTo"["myGroupCol"].select { "colA" and "colB" } }`
+
+`df.select { it["myGroupCol"].asColumnGroup()() { "colA" and "colB" } }`
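+
+A runnable version of the first example above. It assumes a hypothetical `myColGroup` that holds an `Int` column `someCol` and two `String` columns, `s1` and `s2`, and it references the group by name via `colGroup(...)`:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.api.*
+
+fun selectInsideGroup(): List<String> {
+    // Hypothetical frame: "someCol", "s1", "s2" grouped into "myColGroup", "other" stays top-level
+    val df = dataFrameOf("someCol", "s1", "s2", "other")(1, "x", "y", 2.0)
+        .group("someCol", "s1", "s2").into("myColGroup")
+    // The selected columns are unpacked from "myColGroup": "someCol", "s1", "s2"
+    return df.select { colGroup("myColGroup").select { col("someCol") and colsOf<String>() } }.columnNames()
+}
+```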
+
+TODO
+
+#### Examples:
+
 **Select columns by name:**
 
 <!---FUN columnSelectors-->