Writing a small description of each CS DSL function in the documentation
Jolanrensen committed May 31, 2024
1 parent b7e9f59 commit 3260d52
Showing 1 changed file with 190 additions and 1 deletion.
191 changes: 190 additions & 1 deletion docs/StardustDocs/topics/ColumnSelectors.md
@@ -2,7 +2,7 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->

[`DataFrame`](DataFrame.md) provides a DSL for selecting an arbitrary set of columns.
[`DataFrame`](DataFrame.md) provides a DSL for selecting an arbitrary set of columns: the Columns Selection DSL.

Column selectors are used in many operations:

@@ -39,6 +39,195 @@ df.move { name.firstName and name.lastName }.after { city }
</tab>
</tabs>

#### Functions Overview:

##### First (Col), Last (Col), Single (Col)
`first {}`, `firstCol()`, `last {}`, `lastCol()`, `single {}`, `singleCol()`

Returns the first, last, or single column from the top-level, specified [column group](DataColumn.md#columngroup),
or `ColumnSet` that adheres to the optional given condition. If no column adheres to the given condition,
`NoSuchElementException` is thrown.
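
A minimal sketch of these three functions at the top level, using the String API. The dataframe `people` and its column names are made up for illustration; they are not part of the library or of this commit.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three top-level columns.
val people = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// First top-level column, no condition.
fun firstColumn(): List<String> =
    people.select { first() }.columnNames()

// Last top-level column whose name starts with "c".
fun lastStartingWithC(): List<String> =
    people.select { last { it.name().startsWith("c") } }.columnNames()

// The one and only column named "age".
fun singleAge(): List<String> =
    people.select { single { it.name() == "age" } }.columnNames()

fun main() {
    println(firstColumn())
    println(lastStartingWithC())
    println(singleAge())
}
```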

##### Col
`col(name)`, `col(5)`, `this[5]`

Creates a [ColumnAccessor](DataColumn.md#column-accessors) (or `SingleColumn`) for a column with the given
argument from the top-level or specified [column group](DataColumn.md#columngroup). The argument can be either an
index (`Int`) or a reference to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`;
any [AccessApi](apiLevels.md)).
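
As a rough illustration, the sketch below selects one column by name and one by index. The sample dataframe is an assumption made for this example only.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three top-level columns.
val people = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// Reference a column by its name.
fun byName(): List<String> =
    people.select { col("age") }.columnNames()

// Reference a column by its zero-based index.
fun byIndex(): List<String> =
    people.select { col(0) }.columnNames()

fun main() {
    println(byName())
    println(byIndex())
}
```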

##### Value Col, Frame Col, Col Group
`valueCol(name)`, `valueCol(5)`, `frameCol(name)`, `frameCol(5)`, `colGroup(name)`, `colGroup(5)`

Creates a [ColumnAccessor](DataColumn.md#column-accessors) (or `SingleColumn`) for a
[value column](DataColumn.md#valuecolumn) / [frame column](DataColumn.md#framecolumn) /
[column group](DataColumn.md#columngroup) with the given argument from the top-level or
specified [column group](DataColumn.md#columngroup). The argument can be either an index (`Int`) or a reference
to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`; any [AccessApi](apiLevels.md)).
The functions can be both typed and untyped (in case you're supplying a column name, -path, or index).
These functions throw an `IllegalArgumentException` if the column found is not the right kind.
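
The kind check can be sketched as follows. The dataframe and the group name `name` are hypothetical, and the last function only checks that a failure occurs, relying on the behavior described above rather than on a specific exception message.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; "firstName" and "lastName" are grouped under "name".
val grouped = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
).group { cols("firstName", "lastName") }.into("name")

// "age" is a value column, so this resolves.
fun theValueColumn(): List<String> =
    grouped.select { valueCol("age") }.columnNames()

// "name" is a column group, so this resolves as well.
fun theColumnGroup(): List<String> =
    grouped.select { colGroup("name") }.columnNames()

// "name" is not a value column, so the selection is expected to fail.
fun wrongKindFails(): Boolean =
    runCatching { grouped.select { valueCol("name") } }.isFailure

fun main() {
    println(theValueColumn())
    println(theColumnGroup())
    println(wrongKindFails())
}
```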

##### Cols
`cols {}`, `cols()`, `cols(colA, colB)`, `cols(1, 5)`, `cols(1..5)`, `[{}]`

Creates a subset of columns (`ColumnSet`) from the top-level, specified [column group](DataColumn.md#columngroup),
or `ColumnSet`.
You can use either a `ColumnFilter`, or any of the `vararg` overloads for any [AccessApi](apiLevels.md).
The function can be both typed and untyped (in case you're supplying a column name, -path, or index (range)).
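
The overloads listed above might be used like this. It is a sketch with the String API on a made-up dataframe, not an exhaustive tour of the overloads.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three top-level columns.
val people = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// Columns by name.
fun byNames(): List<String> =
    people.select { cols("name", "city") }.columnNames()

// Columns by zero-based indices.
fun byIndices(): List<String> =
    people.select { cols(0, 2) }.columnNames()

// Columns by index range (inclusive).
fun byRange(): List<String> =
    people.select { cols(0..1) }.columnNames()

// Columns by filter: keep names of exactly four characters.
fun byFilter(): List<String> =
    people.select { cols { it.name().length == 4 } }.columnNames()

fun main() {
    println(byNames())
    println(byIndices())
    println(byRange())
    println(byFilter())
}
```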

##### Range of Columns
`colA.."colB"`

Creates a `ColumnSet` containing all columns from `colA` to `colB` (inclusive) from the top-level.
Columns inside [column groups](DataColumn.md#columngroup) are also supported
(as long as they share the same direct parent), as well as any combination of [AccessApi](apiLevels.md).
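
A small sketch of the range operator with two column names given as strings. The dataframe is hypothetical; inside the selector, `..` is taken to resolve to the DSL's column range rather than a range of strings.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: four top-level columns.
val people = dataFrameOf("name", "age", "city", "weight")(
    "Alice", 15, "London", 54,
    "Bob", 45, "Dubai", 87,
)

// All top-level columns from "age" to "weight", both ends included.
fun ageToWeight(): List<String> =
    people.select { "age".."weight" }.columnNames()

fun main() {
    println(ageToWeight())
}
```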

##### Value Columns, Frame Columns, Column Groups
`valueCols {}`, `valueCols()`, `frameCols {}`, `frameCols()`, `colGroups {}`, `colGroups()`

Creates a subset of columns (`ColumnSet`) from the top-level, specified [column group](DataColumn.md#columngroup),
or `ColumnSet` containing only [value columns](DataColumn.md#valuecolumn) / [frame columns](DataColumn.md#framecolumn) /
[column groups](DataColumn.md#columngroup) that adhere to the optional condition.
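
To make the three kinds concrete, here is a sketch on a made-up dataframe that contains one column group and one value column, and no frame columns.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; "firstName" and "lastName" are grouped under "name".
val grouped = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
).group { cols("firstName", "lastName") }.into("name")

// Only top-level value columns.
fun valueColumnNames(): List<String> =
    grouped.select { valueCols() }.columnNames()

// Only top-level column groups, here with an explicit condition.
fun groupNames(): List<String> =
    grouped.select { colGroups { it.name().startsWith("n") } }.columnNames()

// No frame columns exist in this sample, so the selection is empty.
fun frameColumnNames(): List<String> =
    grouped.select { frameCols() }.columnNames()

fun main() {
    println(valueColumnNames())
    println(groupNames())
    println(frameColumnNames())
}
```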

##### Cols of Kind
`colsOfKind(Value, Frame) {}`, `colsOfKind(Group, Frame)`

Creates a subset of columns (`ColumnSet`) from the top-level, specified [column group](DataColumn.md#columngroup),
or `ColumnSet` containing only columns of the specified kind(s) that adhere to the optional condition.

##### All (Cols)
`all()`, `allCols()`

Creates a `ColumnSet` containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
or `ColumnSet`. This is the opposite of `none()` and equivalent to `cols()` without filter.
Note, on [column groups](DataColumn.md#columngroup), `all` is named `allCols` instead to avoid confusion.
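
The naming difference between `all()` and `allCols()` can be sketched like this, again on a hypothetical dataframe with a group called `name`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; "firstName" and "lastName" are grouped under "name".
val grouped = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
).group { cols("firstName", "lastName") }.into("name")

// All top-level columns.
fun topLevel(): List<String> =
    grouped.select { all() }.columnNames()

// All columns inside the "name" group; on a column group the function is allCols().
fun insideGroup(): List<String> =
    grouped.select { colGroup("name").allCols() }.columnNames()

fun main() {
    println(topLevel())
    println(insideGroup())
}
```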

##### All (Cols) After, -Before, -From, -Up To
`allAfter(colA)`, `allBefore(colA)`, `allColsFrom(colA)`, `allColsUpTo(colA)`

Creates a `ColumnSet` containing a subset of columns from the top-level,
specified [column group](DataColumn.md#columngroup), or `ColumnSet`.
The subset includes:
- `all(Cols)Before(colA)`: All columns before the specified column, excluding that column.
- `all(Cols)After(colA)`: All columns after the specified column, excluding that column.
- `all(Cols)From(colA)`: All columns from the specified column, including that column.
- `all(Cols)UpTo(colA)`: All columns up to the specified column, including that column.

NOTE: In the Plain DSL and on [column groups](DataColumn.md#columngroup), the `{}` overloads of these functions
take a `ColumnSelector` (relative to the receiver).
On `ColumnSets` they take a `ColumnFilter` instead.
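
The four variants listed above, sketched at the top level with a column name as the argument. The dataframe is made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three top-level columns.
val people = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// Everything before "age", excluding "age".
fun before(): List<String> = people.select { allBefore("age") }.columnNames()

// Everything after "age", excluding "age".
fun after(): List<String> = people.select { allAfter("age") }.columnNames()

// Everything from "age" onwards, including "age".
fun from(): List<String> = people.select { allFrom("age") }.columnNames()

// Everything up to "age", including "age".
fun upTo(): List<String> = people.select { allUpTo("age") }.columnNames()

fun main() {
    println(before())
    println(after())
    println(from())
    println(upTo())
}
```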

##### Cols at any Depth
`colsAtAnyDepth {}`, `colsAtAnyDepth()`

Creates a `ColumnSet` containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
or `ColumnSet` at any depth if they satisfy the optional given predicate. This means that columns (of all three kinds!)
nested inside [column groups](DataColumn.md#columngroup) are also included.
This function can also be followed by another `ColumnSet` filter-function like `colsOf<>()`, `single()`,
or `valueCols()`.

**For example:**

Depth-first search for a column containing the value "Alice":

`df.select { colsAtAnyDepth().first { "Alice" in it.values() } }`

The columns at any depth excluding the top-level:

`df.select { colGroups().colsAtAnyDepth() }`

All [value-](DataColumn.md#valuecolumn) and [frame columns](DataColumn.md#framecolumn) at any depth:

`df.select { colsAtAnyDepth { !it.isColumnGroup() } }`

All value columns at any depth nested under a column group named "myColGroup":

`df.select { myColGroup.colsAtAnyDepth().valueCols() }`
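
A self-contained version of the "value and frame columns at any depth" selection, assuming a small made-up dataframe with one column group. The result is compared as a set so the sketch does not depend on traversal order.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; "firstName" and "lastName" are grouped under "name".
val grouped = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
).group { cols("firstName", "lastName") }.into("name")

// Every column at any depth that is not itself a column group.
fun leafColumnNames(): Set<String> =
    grouped.select { colsAtAnyDepth { !it.isColumnGroup() } }.columnNames().toSet()

fun main() {
    println(leafColumnNames())
}
```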


**Converting from deprecated syntax:**

`dfs { condition }` -> `colsAtAnyDepth { condition }`

`allDfs(includeGroups = false)` -> `colsAtAnyDepth { includeGroups || !it.isColumnGroup() }`

`dfsOf<Type> { condition }` -> `colsAtAnyDepth().colsOf<Type> { condition }`

`cols { condition }.recursively()` -> `colsAtAnyDepth { condition }`

`first { condition }.rec()` -> `colsAtAnyDepth { condition }.first()`

`all().recursively()` -> `colsAtAnyDepth()`

##### Cols in Groups
`colsInGroups {}`, `colsInGroups()`

Creates a `ColumnSet` containing all columns that are nested in the [column groups](DataColumn.md#columngroup) at
the top-level, specified [column group](DataColumn.md#columngroup), or `ColumnSet` adhering to an optional predicate.
This is useful if you want to select all columns that are "one level down".

This function was previously called `children()`.

**For example:**

To get the columns inside all [column groups](DataColumn.md#columngroup) in a [dataframe](DataFrame.md),
instead of having to write:

`df.select { colGroupA.cols() and colGroupB.cols() ... }`

you can use:

`df.select { colsInGroups() }`

or with filter:

`df.select { colsInGroups { "user" in it.name } }`

Similarly, you can take the columns inside all [column groups](DataColumn.md#columngroup) in a `ColumnSet`:

`df.select { colGroups { "my" in it.name }.colsInGroups() }`
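
The same idea as a runnable sketch. The dataframe and its group `name` are hypothetical and only serve to show that top-level columns outside a group are not selected.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; "firstName" and "lastName" are grouped under "name".
val grouped = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
).group { cols("firstName", "lastName") }.into("name")

// Columns one level down, inside any top-level column group.
fun oneLevelDown(): List<String> =
    grouped.select { colsInGroups() }.columnNames()

// Same, restricted by a condition on the nested column.
fun oneLevelDownFiltered(): List<String> =
    grouped.select { colsInGroups { it.name().startsWith("first") } }.columnNames()

fun main() {
    println(oneLevelDown())
    println(oneLevelDownFiltered())
}
```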

##### Take (Last) (Cols) (While)
`take(5)`, `takeLastCols(2)`, `takeLastWhile {}`, `takeColsWhile {}`

Creates a `ColumnSet` containing the first / last `n` columns from the top-level,
specified [column group](DataColumn.md#columngroup), or `ColumnSet` or those that adhere to the given condition.
Note, to avoid ambiguity, `take` is called `takeCols` when called on a [column group](DataColumn.md#columngroup).
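
A sketch of the top-level `take` variants on a made-up dataframe. On a column group the `takeCols` names mentioned above would be used instead.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three top-level columns.
val people = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// First two columns.
fun firstTwo(): List<String> = people.select { take(2) }.columnNames()

// Last column.
fun lastOne(): List<String> = people.select { takeLast(1) }.columnNames()

// Leading columns until the condition fails for the first time.
fun untilCity(): List<String> =
    people.select { takeWhile { it.name() != "city" } }.columnNames()

fun main() {
    println(firstTwo())
    println(lastOne())
    println(untilCity())
}
```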

##### Drop (Last) (Cols) (While)
`drop(5)`, `dropLastCols(2)`, `dropLastWhile {}`, `dropColsWhile {}`

Creates a `ColumnSet` without the first / last `n` columns from the top-level,
specified [column group](DataColumn.md#columngroup), or `ColumnSet` or those that adhere to the given condition.
Note, to avoid ambiguity, `drop` is called `dropCols` when called on a [column group](DataColumn.md#columngroup).
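
The `drop` variants mirror `take`. A sketch at the top level, with a hypothetical dataframe:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three top-level columns.
val people = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// Everything except the first column.
fun withoutFirst(): List<String> = people.select { drop(1) }.columnNames()

// Everything except the last column.
fun withoutLast(): List<String> = people.select { dropLast(1) }.columnNames()

// Skip leading columns while the condition holds, keep the rest.
fun afterName(): List<String> =
    people.select { dropWhile { it.name() == "name" } }.columnNames()

fun main() {
    println(withoutFirst())
    println(withoutLast())
    println(afterName())
}
```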

##### Select from [Column Group](DataColumn.md#columngroup)
`colGroupA.select {}`, `"colGroupA" {}`

Creates a `ColumnSet` containing the columns selected by the provided `ColumnsSelector` relative to the specified
[column group](DataColumn.md#columngroup). In practice, this means you're opening a new selection scope inside a
[column group](DataColumn.md#columngroup) and selecting columns from there.
The selected columns are referenced individually and "unpacked" from their parent
[column group](DataColumn.md#columngroup).

**For example:**

Select `myColGroup.someCol` and all `String` columns from `myColGroup`:

`df.select { myColGroup.select { someCol and colsOf<String>() } }`



`df.select { "myGroupCol" { "colA" and expr("newCol") { colB + 1 } } }`

`df.select { "pathTo"["myGroupCol"].select { "colA" and "colB" } }`

`df.select { it["myGroupCol"].asColumnGroup()() { "colA" and "colB" } }`

TODO

#### Examples:

**Select columns by name:**

<!---FUN columnSelectors-->
