Commit

Updated mentions of DataFrame to represent objects (#664)
* Update mentions of DataFrame to represent objects

* Improve DataColumn.md documentation clarity
zaleslaw authored Apr 22, 2024
1 parent 41577df commit 7de6022
Showing 22 changed files with 74 additions and 64 deletions.
15 changes: 9 additions & 6 deletions docs/StardustDocs/topics/DataColumn.md
@@ -1,13 +1,15 @@
[//]: # (title: DataColumn)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->

[`DataColumn`](DataColumn.md) represents a column of values. It can store objects of primitive or reference types, or other [`DataFrames`](DataFrame.md).
[`DataColumn`](DataColumn.md) represents a column of values.
It can store objects of primitive or reference types,
or other [`DataFrame`](DataFrame.md) objects.

See [how to create columns](createColumn.md)

### Properties
* `name: String` — name of the column, should be unique within containing dataframe
* `path: ColumnPath` — path to the column, depends on the way column was retrieved from dataframe
* `name: String` — name of the column; should be unique within containing dataframe
* `path: ColumnPath` — path to the column; depends on the way column was retrieved from dataframe
* `type: KType` — type of elements in the column
* `hasNulls: Boolean` — flag indicating whether column contains `null` values
* `values: Iterable<T>` — column data
@@ -20,17 +22,18 @@ See [how to create columns](createColumn.md)

Represents a sequence of values.

It can store values of primitive (integers, strings, decimals etc.) or reference types. Currently, it uses [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) as underlying data storage.
It can store values of primitive (integers, strings, decimals, etc.) or reference types.
Currently, it uses [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) as underlying data storage.

#### ColumnGroup

Container for nested columns. Is used to create column hierarchy.

#### FrameColumn

Special case of [`ValueColumn`](#valuecolumn) that stores other [`DataFrames`](DataFrame.md) as elements.
Special case of [`ValueColumn`](#valuecolumn) that stores other [`DataFrame`](DataFrame.md) objects as elements.

[`DataFrames`](DataFrame.md) stored in [`FrameColumn`](DataColumn.md#framecolumn) may have different schemas.
[`DataFrame`](DataFrame.md) objects stored in [`FrameColumn`](DataColumn.md#framecolumn) may have different schemas.

[`FrameColumn`](DataColumn.md#framecolumn) may appear after [reading](read.md) from JSON or other hierarchical data structures, or after grouping operations such as [groupBy](groupBy.md) or [pivot](pivot.md).

16 changes: 8 additions & 8 deletions docs/StardustDocs/topics/DataFrame.md
@@ -2,13 +2,13 @@

[`DataFrame`](DataFrame.md) represents a list of [`DataColumn`](DataColumn.md).

Columns in dataframe must have equal size and unique names.
Columns in [`DataFrame`](DataFrame.md) must have equal size and unique names.

**Learn how to:**
- [Create dataframe](createDataFrame.md)
- [Read dataframe](read.md)
- [Get an overview of dataframe](info.md)
- [Access data in dataframe](access.md)
- [Modify data in dataframe](modify.md)
- [Compute statistics for dataframe](summaryStatistics.md)
- [Combine several dataframes](multipleDataFrames.md)
- [Create DataFrame](createDataFrame.md)
- [Read DataFrame](read.md)
- [Get an overview of DataFrame](info.md)
- [Access data in DataFrame](access.md)
- [Modify data in DataFrame](modify.md)
- [Compute statistics for DataFrame](summaryStatistics.md)
- [Combine several DataFrame objects](multipleDataFrames.md)
10 changes: 5 additions & 5 deletions docs/StardustDocs/topics/DataRow.md
@@ -11,19 +11,19 @@
* `prev(): DataRow?` — previous row (`null` for the first row)
* `next(): DataRow?` — next row (`null` for the last row)
* `diff(T) { rowExpression }: T / diffOrNull { rowExpression }: T?` — difference between the results of a [row expression](DataRow.md#row-expressions) calculated for current and previous rows
* `explode(columns): DataFrame<T>` — spread lists and [`DataFrames`](DataFrame.md) vertically into new rows
* `explode(columns): DataFrame<T>` — spread lists and [`DataFrame`](DataFrame.md) objects vertically into new rows
* `values(): List<Any?>` — list of all cell values from the current row
* `valuesOf<T>(): List<T>` — list of values of the given type
* `columnsCount(): Int` — number of columns
* `columnNames(): List<String>` — list of all column names
* `columnTypes(): List<KType>` — list of all column types
* `namedValues(): List<NameValuePair<Any?>>` — list of name-value pairs where `name` is a column name and `value` is cell value
* `namedValuesOf<T>(): List<NameValuePair<T>>` — list of name-value pairs where value has given type
* `transpose(): DataFrame<NameValuePair<*>>` — dataframe of two columns: `name: String` is column names and `value: Any?` is cell values
* `transposeTo<T>(): DataFrame<NameValuePair<T>>` — dataframe of two columns: `name: String` is column names and `value: T` is cell values
* `transpose(): DataFrame<NameValuePair<*>>` — [`DataFrame`](DataFrame.md) of two columns: `name: String` is column names and `value: Any?` is cell values
* `transposeTo<T>(): DataFrame<NameValuePair<T>>` — [`DataFrame`](DataFrame.md) of two columns: `name: String` is column names and `value: T` is cell values
* `getRow(Int): DataRow` — row from [`DataFrame`](DataFrame.md) by row index
* `getRows(Iterable<Int>): DataFrame` — dataframe with subset of rows selected by absolute row index.
* `relative(Iterable<Int>): DataFrame` — dataframe with subset of rows selected by relative row index: `relative(-1..1)` will return previous, current and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped
* `getRows(Iterable<Int>): DataFrame` — [`DataFrame`](DataFrame.md) with subset of rows selected by absolute row index.
* `relative(Iterable<Int>): DataFrame` — [`DataFrame`](DataFrame.md) with subset of rows selected by relative row index: `relative(-1..1)` will return previous, current and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped
* `getValue<T>(columnName)` — cell value of type `T` by this row and given `columnName`
* `getValueOrNull<T>(columnName)` — cell value of type `T?` by this row and given `columnName` or `null` if there's no such column
* `get(column): T` — cell value by this row and given `column`
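
As a rough illustration of how `relative` treats out-of-range indices, the plain-Kotlin sketch below resolves relative offsets against a current row index and skips any result that falls outside the table. The function `resolveRelative` and its parameters are hypothetical names used only here. The sketch models the behavior described in the list above under that reading and is not the library's implementation.

```kotlin
// Hypothetical helper, not part of the Kotlin DataFrame API.
// Converts offsets relative to `current` into absolute row indices
// and skips every index that falls outside 0 until rowCount.
fun resolveRelative(current: Int, rowCount: Int, offsets: Iterable<Int>): List<Int> =
    offsets
        .map { current + it }              // relative offset -> absolute index
        .filter { it in 0 until rowCount } // drop indices outside the table
```

For the first row of a five-row table, `resolveRelative(0, 5, -1..1)` yields the indices 0 and 1, because there is no previous row to return.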
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/addDf.md
@@ -2,7 +2,7 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->

Returns [`DataFrame`](DataFrame.md) with union of columns from several given [`DataFrames`](DataFrame.md).
Returns [`DataFrame`](DataFrame.md) with union of columns from several given [`DataFrame`](DataFrame.md) objects.

<!---FUN addDataFrames-->

10 changes: 5 additions & 5 deletions docs/StardustDocs/topics/concat.md
@@ -2,7 +2,7 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->

Returns a [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrames`](DataFrame.md).
Returns a [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrame`](DataFrame.md) objects.

`concat` is available for:

@@ -91,14 +91,14 @@ frameColumn.concat()

<!---END-->

If you want to take the union of columns (not rows) from several [`DataFrames`](DataFrame.md), see [`add`](add.md).
If you want to take the union of columns (not rows) from several [`DataFrame`](DataFrame.md) objects, see [`add`](add.md).

## Schema unification

If input [`DataFrames`](DataFrame.md) have different schemas, every column in the resulting [`DataFrames`](DataFrame.md)
If input [`DataFrame`](DataFrame.md) objects have different schemas, every column in the resulting [`DataFrame`](DataFrame.md)
will get the lowest common type of the original columns with the same name.

For example, if one [`DataFrame`](DataFrame.md) has a column `A: Int` and another [`DataFrame`](DataFrame.md) has a column `A: Double`,
the resulting ` DataFrame ` will have a column `A: Number`.
the resulting [`DataFrame`](DataFrame.md) will have a column `A: Number`.

Missing columns in dataframes will be filled with `null`.
Missing columns in [`DataFrame`](DataFrame.md) objects will be filled with `null`.
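
The schema unification rules above can be sketched in plain Kotlin. This is only a model under simplifying assumptions: a frame is represented as a map from column name to values, and the names `Frame`, `concatFrames`, and `commonTypeName` are hypothetical and do not belong to the Kotlin DataFrame API. The type lookup handles only `Int`, `Double`, and `String`, which is enough to reproduce the `A: Int` plus `A: Double` example.

```kotlin
// Plain-Kotlin model of the rules above, NOT the library implementation.
// A frame is modelled as column name -> column values (hypothetical type).
typealias Frame = Map<String, List<Any?>>

fun rowCount(frame: Frame): Int = frame.values.firstOrNull()?.size ?: 0

// Union of rows: every column name from either input appears in the result,
// and a column that is missing from one input is padded with nulls.
fun concatFrames(first: Frame, second: Frame): Frame {
    val names = first.keys + second.keys
    return names.associateWith { name ->
        val top = first[name] ?: List(rowCount(first)) { null }
        val bottom = second[name] ?: List(rowCount(second)) { null }
        listOf(top, bottom).flatten()
    }
}

// Simplified "lowest common type" lookup covering a few types only.
fun commonTypeName(values: List<Any?>): String {
    val names = values.filterNotNull().map {
        when (it) {
            is Int -> "Int"
            is Double -> "Double"
            is String -> "String"
            else -> "Any"
        }
    }.distinct()
    val base = when {
        names.isEmpty() -> "Nothing"
        names.size == 1 -> names.single()
        names.all { it == "Int" || it == "Double" } -> "Number"
        else -> "Any"
    }
    // A column that contains null gets a nullable type.
    return if (values.any { it == null }) "$base?" else base
}
```

In this model, concatenating a frame with `A = [1, 2]` and a frame with `A = [1.5]` and `B = ["x"]` gives a column `A` typed as `Number` and a column `B` whose first two cells are `null`.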
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/concatDf.md
@@ -2,7 +2,7 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->

Returns [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrames`](DataFrame.md).
Returns [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrame`](DataFrame.md) objects.

<!---FUN concatDataFrames-->

4 changes: 2 additions & 2 deletions docs/StardustDocs/topics/create.md
@@ -2,9 +2,9 @@
<show-structure depth="3"/>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->

There are several ways to create [`dataframes`](DataFrame.md) from data that is already loaded into memory:
There are several ways to create [`DataFrame`](DataFrame.md) objects from data that is already loaded into memory:
* [create columns with data](createColumn.md) and then [bundle them](createDataFrame.md) into a [`DataFrame`](DataFrame.md)
* create and initialize [`DataFrame`](DataFrame.md) directly from values using `vararg` variants of the [corresponding functions](createDataFrame.md).
* [convert Kotlin objects](createDataFrame.md#todataframe) into [`DataFrame`](DataFrame.md)

To learn how to read [`dataframes`](DataFrame.md) from files and URLs, go to the [next section](read.md).
To learn how to read dataframes from files and URLs, go to the [next section](read.md).
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/createColumn.md
@@ -42,7 +42,7 @@ val fullName by columnOf(firstName, lastName)

<!---END-->

When column elements are [`DataFrames`](DataFrame.md) it returns a [`FrameColumn`](DataColumn.md#framecolumn):
When column elements are [`DataFrame`](DataFrame.md) objects, it returns a [`FrameColumn`](DataColumn.md#framecolumn):

<!---FUN createFrameColumn-->

5 changes: 3 additions & 2 deletions docs/StardustDocs/topics/createDataFrame.md
@@ -218,8 +218,9 @@ val df = students.toDataFrame {

### DynamicDataFrameBuilder

Previously mentioned dataframe constructors throw an exception when column names are duplicated.
When implementing a custom operation involving multiple dataframes, or computed columns or when parsing some third-party data,
Previously mentioned [`DataFrame`](DataFrame.md) constructors throw an exception when column names are duplicated.
When implementing a custom operation involving multiple [`DataFrame`](DataFrame.md) objects,
or computed columns or when parsing some third-party data,
it might be desirable to disambiguate column names instead of throwing an exception.

<!---FUN duplicatedColumns-->
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/explode.md
@@ -9,7 +9,7 @@ explode(dropEmpty = true) [ { columns } ]
```

**Parameters:**
* `dropEmpty` — if `true`, removes rows with empty lists or dataframes. Otherwise, they will be exploded into `null`.
* `dropEmpty` — if `true`, removes rows with empty lists or [`DataFrame`](DataFrame.md) objects. Otherwise, they will be exploded into `null`.
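
The effect of `dropEmpty` can be illustrated with a small plain-Kotlin model. Rows are represented here as pairs of a name and a list of values, and `explodeRows` is a hypothetical function written for this sketch. It only mirrors the parameter description above and is not the library's `explode`.

```kotlin
// Plain-Kotlin model of the `dropEmpty` behavior, NOT the library's explode.
// Each input row pairs a name with a list; the list is spread into new rows.
fun explodeRows(
    rows: List<Pair<String, List<Int>>>,
    dropEmpty: Boolean = true,
): List<Pair<String, Int?>> {
    val result = mutableListOf<Pair<String, Int?>>()
    for ((name, values) in rows) {
        when {
            // Non-empty list: one output row per element.
            values.isNotEmpty() -> values.forEach { result.add(name to it) }
            // Empty list with dropEmpty = true: the row disappears.
            dropEmpty -> Unit
            // Empty list with dropEmpty = false: a single row holding null.
            else -> result.add(name to null)
        }
    }
    return result
}
```

With the rows `"a" to listOf(1, 2)` and `"b" to emptyList<Int>()`, the default call keeps only the two rows for `"a"`, while `dropEmpty = false` adds a third row `"b" to null`.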

**Available for:**
* [`DataFrame`](DataFrame.md)
4 changes: 2 additions & 2 deletions docs/StardustDocs/topics/explodeImplode.md
@@ -1,4 +1,4 @@
[//]: # (title: Explode / implode columns)

* [`explode`](explode.md) — distributes lists of values or [`DataFrames`](DataFrame.md) in given columns vertically, replicating data in other columns
* [`implode`](implode.md) — collects column values in given columns into lists or [`DataFrames`](DataFrame.md), grouping by other columns
* [`explode`](explode.md) — distributes lists of values or [`DataFrame`](DataFrame.md) objects in given columns vertically, replicating data in other columns
* [`implode`](implode.md) — collects column values in given columns into lists or [`DataFrame`](DataFrame.md) objects, grouping by other columns
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -32,7 +32,7 @@ In notebooks, extension properties are generated for [`DataSchema`](schemas.md)
instance after REPL line execution.
After that [`DataFrame`](DataFrame.md) variable is typed with its own [`DataSchema`](schemas.md), so only valid extension properties corresponding to actual columns in DataFrame will be allowed by the compiler and suggested by completion.

Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](schemasGradle.md#configuration).
Extension properties can be generated in IntelliJ IDEA using the [Kotlin DataFrame Gradle plugin](schemasGradle.md#configuration).

<warning>
In notebooks, generated properties won't appear or be updated until the cell has been executed. This often means that you have to introduce new variables frequently to sync extension properties with the actual schema
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/groupByConcat.md
@@ -1,4 +1,4 @@
[//]: # (title: GroupBy / concat rows)

* [`groupBy`](groupBy.md) — groups rows of [`DataFrame`](DataFrame.md) by given key columns.
* [`concat`](concat.md) — concatenates rows from several [`DataFrames`](DataFrame.md) into single [`DataFrame`](DataFrame.md).
* [`concat`](concat.md) — concatenates rows from several [`DataFrame`](DataFrame.md) objects into single [`DataFrame`](DataFrame.md).
10 changes: 5 additions & 5 deletions docs/StardustDocs/topics/join.md
@@ -2,7 +2,7 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Join-->

Joins two [`DataFrames`](DataFrame.md) by join columns.
Joins two [`DataFrame`](DataFrame.md) objects by join columns.

```kotlin
join(otherDf, type = JoinType.Inner) [ { joinColumns } ]
@@ -79,7 +79,7 @@ df.join(other, "name", "city")
<dataFrame src="org.jetbrains.kotlinx.dataframe.samples.api.Join.join.html"/>
<!---END-->

If `joinColumns` is not specified, columns with the same name from both [`DataFrames`](DataFrame.md) will be used as join columns:
If `joinColumns` is not specified, columns with the same name from both [`DataFrame`](DataFrame.md) objects will be used as join columns:

<!---FUN joinDefault-->

@@ -93,12 +93,12 @@ df.join(other)
### Join types

Supported join types:
* `Inner` (default) — only matched rows from left and right [`DataFrames`](DataFrame.md)
* `Inner` (default) — only matched rows from left and right [`DataFrame`](DataFrame.md) objects
* `Filter` — only matched rows from left [`DataFrame`](DataFrame.md)
* `Left` — all rows from left [`DataFrame`](DataFrame.md), mismatches from right [`DataFrame`](DataFrame.md) filled with `null`
* `Right` — all rows from right [`DataFrame`](DataFrame.md), mismatches from left [`DataFrame`](DataFrame.md) filled with `null`
* `Full` — all rows from left and right [`DataFrames`](DataFrame.md), any mismatches filled with `null`
* `Exclude` — only mismatched rows from left
* `Full` — all rows from left and right [`DataFrame`](DataFrame.md) objects, any mismatches filled with `null`
* `Exclude` — only mismatched rows from left [`DataFrame`](DataFrame.md)
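
To make the difference between the join types concrete, here is a plain-Kotlin sketch that joins two lists of key/value rows. Everything in it (`JoinKind`, `JoinedRow`, `joinByKey`) is a hypothetical model written for this illustration, not the Kotlin DataFrame API. For `Filter` and `Exclude`, which return rows from the left side only, the model simply leaves the right value as `null`. Row order in the result is a choice of this sketch.

```kotlin
// Plain-Kotlin model of the join types listed above, NOT the library's join.
enum class JoinKind { Inner, Filter, Left, Right, Full, Exclude }

// One joined row; null marks a side that had no match (or is not returned).
data class JoinedRow(val key: Int, val left: String?, val right: String?)

fun joinByKey(
    left: List<Pair<Int, String>>,
    right: List<Pair<Int, String>>,
    kind: JoinKind,
): List<JoinedRow> {
    val leftKeys = left.map { it.first }.toSet()
    val rightKeys = right.map { it.first }.toSet()

    // Every pairing of a left row with a right row that shares its key.
    val matched = left.flatMap { (key, l) ->
        right.filter { it.first == key }.map { (_, r) -> JoinedRow(key, l, r) }
    }
    // Left rows that have at least one match, without right-side data.
    val leftMatched = left.filter { it.first in rightKeys }
        .map { (key, l) -> JoinedRow(key, l, null) }
    // Rows that exist on one side only.
    val leftOnly = left.filter { it.first !in rightKeys }
        .map { (key, l) -> JoinedRow(key, l, null) }
    val rightOnly = right.filter { it.first !in leftKeys }
        .map { (key, r) -> JoinedRow(key, null, r) }

    return when (kind) {
        JoinKind.Inner -> matched
        JoinKind.Filter -> leftMatched
        JoinKind.Left -> matched + leftOnly
        JoinKind.Right -> matched + rightOnly
        JoinKind.Full -> matched + leftOnly + rightOnly
        JoinKind.Exclude -> leftOnly
    }
}
```

Given the left rows `1 to "Alice"`, `2 to "Bob"` and the right rows `2 to "London"`, `3 to "Paris"`, `Inner` returns only the row for key 2, `Full` returns three rows, and `Exclude` returns only the row for key 1.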

For every join type there is a shortcut operation:

14 changes: 8 additions & 6 deletions docs/StardustDocs/topics/joinWith.md
@@ -2,7 +2,7 @@

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.JoinWith-->

Joins two [`DataFrames`](DataFrame.md) by a join expression.
Joins two [`DataFrame`](DataFrame.md) objects by a join expression.

```kotlin
joinWith(otherDf, type = JoinType.Inner) { joinExpression }
@@ -29,11 +29,11 @@ For example, you can match rows based on:
### Join types with examples

Supported join types:
* `Inner` (default) — only matched rows from left and right [`DataFrames`](DataFrame.md)
* `Inner` (default) — only matched rows from left and right [`DataFrame`](DataFrame.md) objects
* `Filter` — only matched rows from left [`DataFrame`](DataFrame.md)
* `Left` — all rows from left [`DataFrame`](DataFrame.md), mismatches from right [`DataFrame`](DataFrame.md) filled with `null`
* `Right` — all rows from right [`DataFrame`](DataFrame.md), mismatches from left [`DataFrame`](DataFrame.md) filled with `null`
* `Full` — all rows from left and right [`DataFrames`](DataFrame.md), any mismatches filled with `null`
* `Full` — all rows from left and right [`DataFrame`](DataFrame.md) objects, any mismatches filled with `null`
* `Exclude` — only mismatched rows from left

For every join type there is a shortcut operation:
@@ -272,7 +272,7 @@ campaigns.excludeJoinWith(visits) {

#### Cross join

Can also be called cross product of two dataframes
It can also be called cross product of two [`DataFrame`](DataFrame.md) objects.

<!---FUN crossProduct-->

@@ -308,8 +308,10 @@ df1.innerJoinWith(df2) { it["index"] == right["index"] && it["age"] == right["ag
<dataFrame src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.compareInnerValues.html"/>
<!---END-->

Here columns from both dataframes are presented as is. So [join](join.md) is better suited for `equals` relation, and joinWith is for everything else.
Below are two more examples with join types that allow mismatches. Note the difference in `null` values
Here columns from both [`DataFrame`](DataFrame.md) objects are presented as is.
So [join](join.md) is better suited for `equals` relation, and joinWith is for everything else.
Below are two more examples with join types that allow mismatches.
Note the difference in `null` values.

<!---FUN compareLeft-->

4 changes: 2 additions & 2 deletions docs/StardustDocs/topics/modify.md
@@ -42,11 +42,11 @@ as [`DataFrame`](DataFrame.md) can be interpreted as a [`Collection`](https://ko

**Vertical (row) operations:**
* [append](append.md) — add rows
* [concat](concat.md) — union rows from several [`DataFrames`](DataFrame.md)
* [concat](concat.md) — union rows from several [`DataFrame`](DataFrame.md) objects
* [distinct](distinct.md) / [distinctBy](distinct.md#distinctby) — remove duplicated rows
* [drop](drop.md) / [dropLast](sliceRows.md#droplast) / [dropWhile](sliceRows.md#dropwhile) / [dropNulls](drop.md#dropnulls) / [dropNA](drop.md#dropna) — remove rows by condition
* [duplicate](duplicate.md) — duplicate rows
* [explode](explode.md) — spread lists and [`DataFrames`](DataFrame.md) vertically into new rows
* [explode](explode.md) — spread lists and [`DataFrame`](DataFrame.md) objects vertically into new rows
* [filter](filter.md) / [filterBy](filter.md#filterby) — filter rows
* [implode](implode.md) — merge column values into lists grouping by other columns
* [reverse](reverse.md) — reverse rows
8 changes: 4 additions & 4 deletions docs/StardustDocs/topics/multipleDataFrames.md
@@ -1,7 +1,7 @@
[//]: # (title: Multiple DataFrames)
<show-structure depth="3"/>

* [`add`](add.md) — union of columns from several [`DataFrames`](DataFrame.md)
* [`concat`](concat.md) — union of rows from several [`DataFrames`](DataFrame.md)
* [`join`](join.md) — sql-like join of two [`DataFrames`](DataFrame.md) by key columns
* [`joinWith`](joinWith.md) — join of two [`DataFrames`](DataFrame.md) by an expression that evaluates joined [DataRows](DataRow.md) to Boolean
* [`add`](add.md) — union of columns from several [`DataFrame`](DataFrame.md) objects
* [`concat`](concat.md) — union of rows from several [`DataFrame`](DataFrame.md) objects
* [`join`](join.md) — sql-like join of two [`DataFrame`](DataFrame.md) objects by key columns
* [`joinWith`](joinWith.md) — join of two [`DataFrame`](DataFrame.md) objects by an expression that evaluates joined [DataRows](DataRow.md) to Boolean