"RowNumber by Partition" function #3374

eoteroe · 2023-08-27T17:51:43Z

Sometimes you need row numbers partitioned by some fields. T-SQL provides this via `ROW_NUMBER() OVER (PARTITION BY ...)`. Is there a chance of getting a function for this, even though you can already write something like `combine(groupby(df, :field1), :field1 => (col -> 1:length(col)) => :rownumber)`?

Something like `RowNumber(df, field1)`, or a function à la `nrow`, would be helpful for creating the new field.

bkamins · 2023-08-27T18:20:45Z

Use eachindex:

julia> df = DataFrame(x=[3,3,1,1,2,3,2,1,2,3,3])
11×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     3
   2 │     3
   3 │     1
   4 │     1
   5 │     2
   6 │     3
   7 │     2
   8 │     1
   9 │     2
  10 │     3
  11 │     3

julia> combine(groupby(df, :x), eachindex)
11×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     1          1
   2 │     1          2
   3 │     1          3
   4 │     2          1
   5 │     2          2
   6 │     2          3
   7 │     3          1
   8 │     3          2
   9 │     3          3
  10 │     3          4
  11 │     3          5

julia> select(groupby(df, :x), eachindex)
11×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     3          1
   2 │     3          2
   3 │     1          1
   4 │     1          2
   5 │     2          1
   6 │     3          3
   7 │     2          2
   8 │     1          3
   9 │     2          3
  10 │     3          4
  11 │     3          5

julia> transform(df, eachindex)
11×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     3          1
   2 │     3          2
   3 │     1          3
   4 │     1          4
   5 │     2          5
   6 │     3          6
   7 │     2          7
   8 │     1          8
   9 │     2          9
  10 │     3         10
  11 │     3         11
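To get the column name asked for in the question, a target name can be attached with `eachindex => :rownumber`. The following is a minimal sketch, assuming DataFrames.jl 1.4 or later; the name `:rownumber` is only illustrative. It uses `transform` on the grouped data frame so that the original row order is kept.

```julia
using DataFrames

df = DataFrame(x=[3,3,1,1,2,3,2,1,2,3,3])

# Number the rows within each group of :x, keep the original row order,
# and store the result in a column named :rownumber (illustrative name).
res = transform(groupby(df, :x), eachindex => :rownumber)
```

This is roughly the counterpart of numbering rows per partition in SQL without an explicit ordering: rows are numbered in the order in which they appear in `df`.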

See the manual entry:

column-independent operations: `function => target_cols` or just `function` for specific `function`s where the input columns are omitted; without `target_cols` the new column has the same name as `function`, otherwise it must be a single name (as a `Symbol` or a string). Supported `function`s are:
* `nrow` to efficiently compute the number of rows in each group.
* `proprow` to efficiently compute the proportion of rows in each group.
* `eachindex` to return a vector holding the number of each row within each group.
* `groupindices` to return the group number.
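The four functions listed above can be tried together as follows. This is a sketch rather than an excerpt from the manual: it assumes DataFrames.jl 1.4 or later, and the target names `:rownumber` and `:groupnumber` are made up for illustration.

```julia
using DataFrames

df = DataFrame(x=[3,3,1,1,2,3,2,1,2,3,3])

# sort=true fixes the group order to x = 1, 2, 3.
gdf = groupby(df, :x, sort=true)

# One row per group: the group size and its share of all rows.
sizes = combine(gdf, nrow, proprow)

# One row per original row: the position of the row within its group
# and the number of the group it belongs to.
numbered = transform(gdf, eachindex => :rownumber, groupindices => :groupnumber)
```

`nrow` and `proprow` produce one value per group, so they fit naturally with `combine`; `eachindex` produces one value per row, so `transform` or `select` keeps it aligned with the source rows.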

bkamins closed this as completed Aug 27, 2023
