Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve type instability issues in aggregation #2424

Closed
bkamins opened this issue Sep 11, 2020 · 3 comments · Fixed by #2682
Closed

Resolve type instability issues in aggregation #2424

bkamins opened this issue Sep 11, 2020 · 3 comments · Fixed by #2682
Labels
bug non-breaking The proposed change is not breaking performance
Milestone

Comments

@bkamins bkamins added non-breaking The proposed change is not breaking performance labels Sep 11, 2020
@bkamins bkamins added this to the 1.0 milestone Sep 11, 2020
@nalimilan
Copy link
Member

In the following functions, the first two allocate, but not the last one:

function fill_row2!(row, col::AbstractVector,
                    i::Integer,
                    colnames::NTuple{N, Symbol}) where N
    @inbounds for j in 1:length(colnames)
        cn = colnames[j]
        val = row[cn]
        col[i] = val
    end
    return nothing
end

function fill_row2b!(row, col::AbstractVector,
                     i::Integer,
                     colnames::NTuple{N, Symbol}) where N
    @inbounds for cn in colnames
        val = row[cn]
        col[i] = val
    end
    return nothing
end

function fill_row3!(row, col::AbstractVector,
                    i::Integer,
                    colname::Symbol) where N
    val = row[colname]
    col[i] = val
    return nothing
end

A possible workaround is to require column names to be always in the same order: then we could index using an integer, which I just checked doesn't allocate.

@bkamins bkamins mentioned this issue Mar 4, 2021
19 tasks
@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2021

This is a bug. We have to fix it and insist that column names always have the same order. I will fix it. Otherwise we have:

julia> df = DataFrame(x = 1:4)
4×1 DataFrame
 Row │ x     
     │ Int64 
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4

julia> gdf = groupby(df, :x)
GroupedDataFrame with 4 groups based on key: x
First Group (1 row): x = 1
 Row │ x     
     │ Int64 
─────┼───────
   1 │     1
⋮
Last Group (1 row): x = 4
 Row │ x     
     │ Int64 
─────┼───────
   1 │     4

julia> combine(gdf, :x => f => AsTable)
4×3 DataFrame
 Row │ x      b      a     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     11
   2 │     2     12      2
   3 │     3      3     13
   4 │     4     14      4

julia> function f2(x)
           if isodd(x[1])
                  return (a=x[1], b=x[1]+10)
              else
                  return (b=x[1], a=x[1]+10)
              end
       end
f2 (generic function with 1 method)

julia> combine(gdf, :x => f2 => AsTable)
4×3 DataFrame
 Row │ x      a      b     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     11
   2 │     2     12      2
   3 │     3      3     13
   4 │     4     14      4

so it is hard to predict the column order (it will depend on the first row processed, which in threading context is problematic)

@bkamins bkamins added the bug label Mar 27, 2021
@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2021

Also this is what we require in other places:

julia> combine(df, :x => ByRow(x -> iseven(x) ? (a=x, b=x+10) : (b=x, a=x+10)))
4×1 DataFrame
 Row │ x_function      
     │ NamedTupl…      
─────┼─────────────────
   1 │ (b = 1, a = 11)
   2 │ (a = 2, b = 12)
   3 │ (b = 3, a = 13)
   4 │ (a = 4, b = 14)

julia> combine(df, :x => ByRow(x -> iseven(x) ? (a=x, b=x+10) : (b=x, a=x+10)) => AsTable)
ERROR: ArgumentError: keys of the returned elements must be identical

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug non-breaking The proposed change is not breaking performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants