[BREAKING] Refactor unstack (#2494)
bkamins authored Oct 31, 2020
1 parent f4db95f commit b76c04f
Showing 4 changed files with 444 additions and 219 deletions.
3 changes: 3 additions & 0 deletions NEWS.md
@@ -41,6 +41,9 @@
* in `describe` the specification of custom aggregation is now `function => name`;
old `name => function` order is now deprecated
([#2401](https://github.com/JuliaData/DataFrames.jl/pull/2401))
+* `unstack` now produces row and column keys in the order of their first appearance
+and has two new keyword arguments `allowmissing` and `allowduplicates`
+([#2494](https://github.com/JuliaData/DataFrames.jl/pull/2494))

## New functionalities

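To illustrate the ordering change described in the `unstack` entry, here is a minimal Julia sketch. It assumes a DataFrames.jl release that already includes this change; the toy table and its contents are hypothetical. Neither `id` nor `variable` is sorted in the input, so first-appearance ordering is visibly different from sorted ordering:

```julia
using DataFrames  # assumption: a DataFrames.jl version that includes this change

# Hypothetical long-format data: neither `id` nor `variable` is sorted,
# so first-appearance ordering can be told apart from sorted ordering.
long = DataFrame(id = [2, 2, 1, 1],
                 variable = ["b", "a", "b", "a"],
                 value = [10, 20, 30, 40])

# Row keys come from :id, new column names from :variable,
# and cell values from :value.
wide = unstack(long, :id, :variable, :value)

# Rows appear as id 2, then id 1; value columns appear as "b", then "a".
println(wide)
```

The sketch leaves the new `allowmissing` and `allowduplicates` keywords at their defaults.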
33 changes: 24 additions & 9 deletions docs/src/man/reshaping_and_pivoting.md
@@ -58,7 +58,9 @@ julia> last(d, 6)
│ 6 │ Iris-virginica │ PetalWidth │ 1.8 │
```

-The second optional argument to `stack` indicates the columns to be stacked. These are normally referred to as the measured variables. Column names can also be given:
+The second optional argument to `stack` indicates the columns to be stacked.
+These are normally referred to as the measured variables. Column names can also
+be given:

```jldoctest reshape
julia> d = stack(iris, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]);
@@ -88,11 +90,18 @@ julia> last(d, 6)
│ 6 │ Iris-virginica │ PetalWidth │ 1.8 │
```

-Note that all columns can be of different types. Type promotion follows the rules of `vcat`.
+Note that all columns can be of different types. Type promotion follows the
+rules of `vcat`.

-The stacked `DataFrame` that results includes all of the columns not specified to be stacked. These are repeated for each stacked column. These are normally referred to as identifier (id) columns. In addition to the id columns, two additional columns labeled `:variable` and `:value` contain the column identifier and the stacked values.
+The stacked `DataFrame` that results includes all of the columns not specified
+to be stacked. These are repeated for each stacked column. These are normally
+referred to as identifier (id) columns. In addition to the id columns, two
+additional columns labeled `:variable` and `:value` contain the column
+identifier and the stacked values.

-A third optional argument to `stack` represents the id columns that are repeated. This makes it easier to specify which variables you want included in the long format:
+A third optional argument to `stack` represents the id columns that are
+repeated. This makes it easier to specify which variables you want included in
+the long format:

```jldoctest reshape
julia> d = stack(iris, [:SepalLength, :SepalWidth], :Species);
@@ -152,7 +161,9 @@ julia> last(d, 6)
│ 6 │ Iris-virginica │ PetalWidth │ 1.8 │
```

-`unstack` converts from a long format to a wide format. The default form requires specifying which columns hold the id variable, the column variable names, and the column values:
+`unstack` converts from a long format to a wide format.
+The default form requires specifying which columns hold the id variable,
+the column variable names, and the column values:

```jldoctest reshape
julia> iris.id = 1:size(iris, 1)
@@ -267,7 +278,8 @@ julia> last(widedf, 6)
│ 6 │ Iris-virginica │ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │
```

-You can even skip passing the `:variable` and `:value` column names as positional arguments, as they will be used by default, and write:
+You can even skip passing the `:variable` and `:value` column names as
+positional arguments, as they will be used by default, and write:
```jldoctest reshape
julia> widedf = unstack(longdf);
@@ -296,7 +308,8 @@ julia> last(widedf, 6)
│ 6 │ Iris-virginica │ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │
```

-Passing `view=true` to `stack` returns a data frame whose columns are views into the original wide data frame. Here is an example:
+Passing `view=true` to `stack` returns a data frame whose columns are views into
+the original wide data frame. Here is an example:

```jldoctest reshape
julia> d = stack(iris, view=true);
@@ -337,7 +350,9 @@ This provides a view of the original columns stacked together.
Id columns -- `RepeatedVector`
This repeats the original columns N times where N is the number of columns stacked.

-None of these reshaping functions perform any aggregation. To do aggregation, use the split-apply-combine functions in combination with reshaping. Here is an example:
+None of these reshaping functions perform any aggregation. To do aggregation,
+use the split-apply-combine functions in combination with reshaping. Here is an
+example:

```jldoctest reshape
julia> using Statistics
@@ -356,7 +371,7 @@ julia> first(d, 6)
│ 5 │ Iris-setosa │ SepalLength │ 5.0 │
│ 6 │ Iris-setosa │ SepalLength │ 5.4 │
-julia> x = by(d, [:variable, :Species], :value => mean => :vsum);
+julia> x = combine(groupby(d, [:variable, :Species]), :value => mean => :vsum);
julia> first(x, 6)
│ Row │ variable │ Species │ vsum │