"RowNumber by Partition" function #3374

Closed
eoteroe opened this issue Aug 27, 2023 · 1 comment

eoteroe commented Aug 27, 2023

Sometimes you need row numbers partitioned by some fields; T-SQL provides this through `ROW_NUMBER() OVER (PARTITION BY ...)`. Is there a chance of getting a function for this, even though you can already write something like `combine(groupby(df, :field1), :field1 => (x -> 1:length(x)) => :rownumber)`?

Something like `RowNumber(df, field1)`, or a function à la `nrow`, would be helpful for creating the new field.

Member

bkamins commented Aug 27, 2023

Use `eachindex`:

```julia
julia> using DataFrames

julia> df = DataFrame(x=[3,3,1,1,2,3,2,1,2,3,3])
11×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     3
   2 │     3
   3 │     1
   4 │     1
   5 │     2
   6 │     3
   7 │     2
   8 │     1
   9 │     2
  10 │     3
  11 │     3

julia> combine(groupby(df, :x), eachindex)
11×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     1          1
   2 │     1          2
   3 │     1          3
   4 │     2          1
   5 │     2          2
   6 │     2          3
   7 │     3          1
   8 │     3          2
   9 │     3          3
  10 │     3          4
  11 │     3          5

julia> select(groupby(df, :x), eachindex)
11×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     3          1
   2 │     3          2
   3 │     1          1
   4 │     1          2
   5 │     2          1
   6 │     3          3
   7 │     2          2
   8 │     1          3
   9 │     2          3
  10 │     3          4
  11 │     3          5

julia> transform(df, eachindex)
11×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     3          1
   2 │     3          2
   3 │     1          3
   4 │     1          4
   5 │     2          5
   6 │     3          6
   7 │     2          7
   8 │     1          8
   9 │     2          9
  10 │     3         10
  11 │     3         11
```
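The question mentions T-SQL, where partitioned row numbers usually come from `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)`. A minimal sketch of the same idea, assuming a DataFrames.jl version that supports `eachindex` as in the session above: sort first so that the order within each group matches the desired `ORDER BY`, then give the new column an explicit name with `eachindex => :rownumber`. The `sales` data frame and its `store`/`amount` columns are hypothetical.

```julia
using DataFrames

# Hypothetical data: `store` is the partition key, `amount` the ordering key.
sales = DataFrame(store = ["a", "b", "a", "b", "a"],
                  amount = [10, 40, 30, 20, 5])

# Sort so rows within each store appear in the desired order
# (store ascending, then amount descending).
sorted = sort(sales, [:store, order(:amount, rev = true)])

# Number rows within each store; `eachindex => :rownumber` names the new column.
# transform on a GroupedDataFrame keeps the row order of `sorted`.
ranked = transform(groupby(sorted, :store), eachindex => :rownumber)

println(ranked)
```

To partition by several columns, pass a vector of column names to `groupby`, e.g. `groupby(sorted, [:field1, :field2])`.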

See the manual entry:

> column-independent operations `function => target_cols` or just `function` for specific `function`s where the input columns are omitted; without `target_cols` the new column has the same name as `function`, otherwise it must be single name (as a `Symbol` or a string). Supported functions are:
> * `nrow` to efficiently compute the number of rows in each group.
> * `proprow` to efficiently compute the proportion of rows in each group.
> * `eachindex` to return a vector holding the number of each row within each group.
> * `groupindices` to return the group number.
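The four functions in the list can be mixed in one call, each with an explicit target name. A minimal sketch on the same `df` as in the session above, assuming a DataFrames.jl version that supports all four; `sort = true` is passed to `groupby` so that the group numbers from `groupindices` follow the sorted order of `x`, and the target names `:n`, `:share`, `:group` and `:rownumber` are arbitrary choices.

```julia
using DataFrames

df = DataFrame(x = [3, 3, 1, 1, 2, 3, 2, 1, 2, 3, 3])
gdf = groupby(df, :x, sort = true)  # groups ordered as x = 1, 2, 3

# One row per group: group size and share of all rows.
counts = combine(gdf, nrow => :n, proprow => :share)

# One row per original row (original order kept):
# group number and row number within the group.
numbered = select(gdf, groupindices => :group, eachindex => :rownumber)

println(counts)
println(numbered)
```

`combine` collapses each group to the single values returned by `nrow` and `proprow`, while `select` keeps one output row per input row, which is what a row-numbering column needs.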

bkamins closed this as completed Aug 27, 2023