Tests of describe and multithreading fail in Julia-1.10.0-beta3 #3383

Closed
George9000 opened this issue Oct 4, 2023 · 13 comments · Fixed by #3385

Comments

@George9000

versioninfo

tested with ENV["JULIA_USE_FLISP_PARSER"] = 1 and with JuliaSyntax parser

saw similar errors in Julia 1.11.0-DEV.582 (a4562e608f)

describe failures
describe: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:571
  Expression: describe_output[:, [:variable; default_fields]] == describe(df)
   Evaluated: 6×7 DataFrame
 Row │ variable        mean    min         median      max         nmissing  e ⋯
     │ Symbol          Union…  Any         Any         Any         Int64     T ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ number          2.5     1.0         2.5         4.0                0  I ⋯
   2 │ number_missing  2.0     1.0         2.0         3.0                1  U
   3 │ string                  a                       d                  0  S
   4 │ string_missing          a                       c                  1  U
   5 │ dates                   2000-01-01  2002-01-01  2004-01-01         0  D ⋯
   6 │ catarray                1                       2                  0  C
                                                                1 column omitted == 6×7 DataFrame
 Row │ variable        mean    min         median  max         nmissing  eltyp ⋯
     │ Symbol          Union…  Any         Union…  Any         Int64     Type  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ number          2.5     1           2.5     4                  0  Int64 ⋯
   2 │ number_missing  2.0     1           2.0     3                  1  Union
   3 │ string                  a                   d                  0  Strin
   4 │ string_missing          a                   c                  1  Union
   5 │ dates                   2000-01-01          2004-01-01         0  Date  ⋯
   6 │ catarray                1                   2                  0  Categ
                                                                1 column omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:571 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:541

describe: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:577
  Expression: describe_output ≅ describe(df, :all)
   Evaluated: 6×16 DataFrame
 Row │ variable        mean    std      min         q25     median      q75    ⋯
     │ Symbol          Union…  Union…   Any         Union…  Any         Union… ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ number          2.5     1.29099  1.0         1.75    2.5         3.25   ⋯
   2 │ number_missing  2.0     1.0      1.0         1.5     2.0         2.5
   3 │ string                           a
   4 │ string_missing                   a
   5 │ dates                            2000-01-01          2002-01-01         ⋯
   6 │ catarray                         1
                                                               9 columns omitted ≅ 6×16 DataFrame
 Row │ variable        mean    std      min         q25     median  q75     ma ⋯
     │ Symbol          Union…  Union…   Any         Union…  Union…  Union…  An ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ number          2.5     1.29099  1           1.75    2.5     3.25    4  ⋯
   2 │ number_missing  2.0     1.0      1           1.5     2.0     2.5     3
   3 │ string                           a                                   d
   4 │ string_missing                   a                                   c
   5 │ dates                            2000-01-01                          20 ⋯
   6 │ catarray                         1                                   2
                                                               9 columns omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:577 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:541

describe: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:580
  Expression: describe_output[:, [:variable, :mean, :std, :min, :q25, :median, :q75, :max, :nunique, :nmissing, :eltype]] ≅ describe(df, :detailed)
   Evaluated: 6×11 DataFrame
 Row │ variable        mean    std      min         q25     median      q75    ⋯
     │ Symbol          Union…  Union…   Any         Union…  Any         Union… ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ number          2.5     1.29099  1.0         1.75    2.5         3.25   ⋯
   2 │ number_missing  2.0     1.0      1.0         1.5     2.0         2.5
   3 │ string                           a
   4 │ string_missing                   a
   5 │ dates                            2000-01-01          2002-01-01         ⋯
   6 │ catarray                         1
                                                               4 columns omitted ≅ 6×11 DataFrame
 Row │ variable        mean    std      min         q25     median  q75     ma ⋯
     │ Symbol          Union…  Union…   Any         Union…  Union…  Union…  An ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ number          2.5     1.29099  1           1.75    2.5     3.25    4  ⋯
   2 │ number_missing  2.0     1.0      1           1.5     2.0     2.5     3
   3 │ string                           a                                   d
   4 │ string_missing                   a                                   c
   5 │ dates                            2000-01-01                          20 ⋯
   6 │ catarray                         1                                   2
                                                               4 columns omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:580 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:541
Test Summary: | Pass  Fail  Total  Time
describe      |   13     3     16  2.2s
        FAILED: dataframe.jl
LoadError: Some tests did not pass: 13 passed, 3 failed, 0 errored, 0 broken.
in expression starting at /Users/foo/.julia/packages/DataFrames/58MUJ/test/dataframe.jl:1
multithreading disable failures
disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:157
  Expression: combine(gd, [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:157 =#
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:158 =#
                        Threads.threadid()
                    end) => :id2), threads = true) != DataFrame(y = 1:4, id1 = 1, id2 = 1)
   Evaluated: 4×3 DataFrame
 Row │ y      id1    id2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     2      1      1
   3 │     3      1      1
   4 │     4      1      1 != 4×3 DataFrame
 Row │ y      id1    id2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     2      1      1
   3 │     3      1      1
   4 │     4      1      1

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:157 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80

disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:177
  Expression: select(gd, [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:177 =#
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:178 =#
                        Threads.threadid()
                    end) => :id2), threads = true) != DataFrame(y = refdf.y, id1 = 1, id2 = 1)
   Evaluated: 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted != 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:177 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80

disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:181
  Expression: select!(gd, [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:181 =#
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:182 =#
                        Threads.threadid()
                    end) => :id2), threads = true) != DataFrame(y = refdf.y, id1 = 1, id2 = 1)
   Evaluated: 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted != 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:181 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80

disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:201
  Expression: transform(gd, [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:201 =#
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:202 =#
                        Threads.threadid()
                    end) => :id2), threads = true) != [refdf DataFrame(id1 = fill(1, nrow(refdf)), id2 = 1)]
   Evaluated: 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted != 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:201 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80

disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:205
  Expression: transform!(gd, [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:205 =#
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        #= /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:206 =#
                        Threads.threadid()
                    end) => :id2), threads = true) != [refdf DataFrame(id1 = fill(1, nrow(refdf)), id2 = 1)]
   Evaluated: 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted != 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:205 [inlined]
 [3] macro expansion
   @ ~/applications/juliadev/usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80
Test Summary:                                 | Pass  Fail  Total  Time
disabling multithreading via keyword argument |   15     5     20  4.1s
        FAILED: multithreading.jl
LoadError: Some tests did not pass: 15 passed, 5 failed, 0 errored, 0 broken.
in expression starting at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:1
@bkamins
Member

bkamins commented Oct 4, 2023

Thank you for reporting.

describe is already fixed by JuliaStats/Statistics.jl#153
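
For anyone who wants to check a given Julia build, here is a minimal sketch. It assumes the `describe` failures come down to whether `median` works on a vector of dates, which is the column that differs in the two tables above. The `dates` values are hypothetical, chosen only to match the min, median and max shown in the report.

```julia
using Dates, Statistics

# Hypothetical data: an even number of dates, so the median has to
# combine the two middle values.
dates = Date.([2000, 2001, 2003, 2004])

# Whether this succeeds depends on the Statistics version bundled with
# the running Julia, so capture the error instead of letting it propagate.
date_median = try
    median(dates)
catch err
    err
end

println("Julia ", VERSION, ": median(dates) -> ", date_median)
```

If an exception is printed, the running build presumably does not yet include the Statistics fix.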

I will have to look into the multithreading failures (it seems that even with multithreading enabled, DataFrames.jl uses only one thread).
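
A minimal sketch of what the failing tests check, reduced from the test expressions in the report. The data frame here is made up for illustration; only the `combine` call with `threads = true` mirrors the test.

```julia
using DataFrames

# Made-up data: four groups of equal size.
df = DataFrame(x = 1:1000, y = repeat(1:4, 250))
gd = groupby(df, :y)

# Each zero-argument function records the thread it ran on. The test
# expects that, with threads = true, the result is not always thread 1.
res = combine(gd,
              [] => (() -> Threads.threadid()) => :id1,
              [] => (() -> Threads.threadid()) => :id2,
              threads = true)

println(res)
println("distinct thread ids seen: ", unique(vcat(res.id1, res.id2)))
```

If only a single thread id is printed in a session started with several threads, the operations were presumably not spread across threads.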

CC @nalimilan

@bkamins bkamins added this to the patch milestone Oct 4, 2023
@bkamins
Member

bkamins commented Oct 5, 2023

I just ran the tests on CI nightly https://github.com/JuliaData/DataFrames.jl/actions/runs/6421079453/job/17434612269?pr=3372 and the multithreading issue does not show up there. With how many threads was Julia started when you ran these tests?

@George9000
Author

George9000 commented Oct 5, 2023

Julia started as julia -t 4,1

To check if this is an aarch64/m1 issue, let me run test DataFrames on julia 1.9.3 and report back if the multithreading issue is found there. I was just testing DataFrames with dev versions of julia.

@bkamins
Member

bkamins commented Oct 5, 2023

Thanks. Is this the same with julia -t 4?

@George9000
Author

I believe the 4,1 reserves an extra thread for the REPL

@bkamins
Member

bkamins commented Oct 5, 2023

That is my understanding too: one interactive thread is reserved. But I am wondering whether this might affect the result.
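
One way to see how a session's threads are split is a small diagnostic like the sketch below. It assumes Julia 1.9 or later, where threadpools are available, and only prints what the running session reports. Running it once under `julia -t 4` and once under `julia -t 4,1` should make any difference visible.

```julia
# Report how the running session's threads are divided between pools,
# and where the main task itself is running.
println("default pool threads:     ", Threads.nthreads(:default))
println("interactive pool threads: ", Threads.nthreads(:interactive))
println("current thread id:        ", Threads.threadid())
println("current threadpool:       ", Threads.threadpool())
```

If the main task turns out to run in the interactive pool, that might be relevant to which thread ids the tests observe.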

@George9000
Author

Will try both ways and report back for 1.9.3 and 1.10.0-beta3

@George9000
Author

George9000 commented Oct 5, 2023

On 1.9.3, julia -t 4 test DataFrames succeeds.

Test Summary:                                 | Pass  Total  Time
disabling multithreading via keyword argument |   20     20  3.7s
        PASSED: multithreading.jl

However, julia -t 4,1 gives an error as reported in the original post.

multithreading error
disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:157
  Expression: combine(gd, [] => ((()->begin
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        Threads.threadid()
                    end) => :id2), threads = true) != DataFrame(y = 1:4, id1 = 1, id2 = 1)
   Evaluated: 4×3 DataFrame
 Row │ y      id1    id2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     2      1      1
   3 │     3      1      1
   4 │     4      1      1 != 4×3 DataFrame
 Row │ y      id1    id2
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      1      1
   2 │     2      1      1
   3 │     3      1      1
   4 │     4      1      1

Stacktrace:
 [1] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:157 [inlined]
 [3] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80
disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:177
  Expression: select(gd, [] => ((()->begin
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        Threads.threadid()
                    end) => :id2), threads = true) != DataFrame(y = refdf.y, id1 = 1, id2 = 1)
   Evaluated: 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted != 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:177 [inlined]
 [3] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80
disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:181
  Expression: select!(gd, [] => ((()->begin
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        Threads.threadid()
                    end) => :id2), threads = true) != DataFrame(y = refdf.y, id1 = 1, id2 = 1)
   Evaluated: 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted != 1000×3 DataFrame
  Row │ y      id1    id2
      │ Int64  Int64  Int64
──────┼─────────────────────
    1 │     2      1      1
    2 │     3      1      1
    3 │     1      1      1
    4 │     4      1      1
    5 │     2      1      1
    6 │     2      1      1
    7 │     4      1      1
    8 │     4      1      1
  ⋮   │   ⋮      ⋮      ⋮
  994 │     4      1      1
  995 │     2      1      1
  996 │     4      1      1
  997 │     4      1      1
  998 │     1      1      1
  999 │     1      1      1
 1000 │     4      1      1
            985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:181 [inlined]
 [3] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80
disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:201
  Expression: transform(gd, [] => ((()->begin
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        Threads.threadid()
                    end) => :id2), threads = true) != [refdf DataFrame(id1 = fill(1, nrow(refdf)), id2 = 1)]
   Evaluated: 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted != 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:201 [inlined]
 [3] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80
disabling multithreading via keyword argument: Test Failed at /Users/foo/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:205
  Expression: transform!(gd, [] => ((()->begin
                        Threads.threadid()
                    end) => :id1), [] => ((()->begin
                        Threads.threadid()
                    end) => :id2), threads = true) != [refdf DataFrame(id1 = fill(1, nrow(refdf)), id2 = 1)]
   Evaluated: 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted != 1000×4 DataFrame
  Row │ x      y      id1    id2
      │ Int64  Int64  Int64  Int64
──────┼────────────────────────────
    1 │     1      2      1      1
    2 │     2      3      1      1
    3 │     3      1      1      1
    4 │     4      4      1      1
    5 │     5      2      1      1
    6 │     6      2      1      1
    7 │     7      4      1      1
    8 │     8      4      1      1
  ⋮   │   ⋮      ⋮      ⋮      ⋮
  994 │   994      4      1      1
  995 │   995      2      1      1
  996 │   996      4      1      1
  997 │   997      4      1      1
  998 │   998      1      1      1
  999 │   999      1      1      1
 1000 │  1000      4      1      1
                   985 rows omitted

Stacktrace:
 [1] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:205 [inlined]
 [3] macro expansion
   @ ~/applications/julia/usr/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] top-level scope
   @ ~/.julia/packages/DataFrames/58MUJ/test/multithreading.jl:80
Test Summary:                                 | Pass  Fail  Total  Time
disabling multithreading via keyword argument |   15     5     20  4.6s
        FAILED: multithreading.jl

Using 1.10.0-beta3 started with julia -t 4, the multithreading tests pass.

Test Summary:                                 | Pass  Total  Time
disabling multithreading via keyword argument |   20     20  3.4s
        PASSED: multithreading.jl

Isn't the default setting for -t {auto|N[,]|M} auto? If so, isn't the auto setting for M 1? Just wondering if this needs to be entered as an issue on the Julia repo.

@bkamins
Member

bkamins commented Oct 6, 2023

I read the manual entry for the M setting and it is not clear to me either. I will ask for help on Slack.

@nalimilan - maybe you know how M works and how it interacts with DataFrames.jl as you have implemented the multithreaded routines?

x-ref https://docs.julialang.org/en/v1/manual/multi-threading/#man-threadpools, but I think we use the default threadpool, so :default should be used and the M value should not matter.
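
To check that assumption outside of DataFrames.jl, a sketch like the following spawns tasks explicitly on the `:default` pool and records where they ran. This is plain Base Julia (1.9 or later assumed), not the code DataFrames.jl uses internally.

```julia
# Spawn a few tasks on the :default pool and collect, for each one,
# the thread id it ran on and the pool that thread belongs to.
tasks = map(1:8) do _
    Threads.@spawn :default begin
        (Threads.threadid(), Threads.threadpool())
    end
end
results = fetch.(tasks)

for (tid, pool) in results
    println("thread ", tid, " in pool ", pool)
end
println("main task: thread ", Threads.threadid(), " in pool ", Threads.threadpool())
```

Comparing the thread ids of the spawned tasks with that of the main task under `-t 4` and `-t 4,1` may show whether the M value changes which ids the `:default` tasks report.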

@George9000
Author

A coda: On Julia 1.11.0-DEV.648 (a3effa97ee) built with gmake DEPS_GIT=Statistics to ensure the latest changes by @nalimilan were pulled in, a test DataFrames#HEAD still produces an error with describe. I don't completely get the comment here so maybe I'm missing something and 1.11 will just work eventually.

Details

@nalimilan
Member

I don't think DEPS_GIT=Statistics is enough to ensure you get the latest git master. You need to go into stdlib/Statistics and do git pull for that.

@George9000
Author

Did a git pull in stdlib/Statistics. Then built 1.11 both with DEPS_GIT=Statistics and without it. Tested DataFrames in both cases, and both builds throw the describe error. So something is still awry. I'll wait for things to work out.

@George9000
Author

@nalimilan Just tested DataFrames on Julia 1.10.0-rc1
Initially failed testing as posted above.
For success and passing of tests, here's what was necessary:

Julia Build

gmake DEPS_GIT=Statistics

# after the build, in the julia repo
cd stdlib/Statistics
git checkout master
git pull

# edit the Statistics Project.toml and change line 6:
version = "1.10.0"

DataFrames testing

# in the project made to test DataFrames:

pkg> dev DataFrames

# before testing
# in the dev directory where DataFrames is cloned:
edit Project.toml and under Compat add:
[compat]
Statistics = "1.10"

Following this guidance on compat.

Since the commit you applied to Statistics.jl to fix the describe issue is associated with version = "1.11" in Project.toml in the current Statistics master, I needed to manually change the version in the Statistics Project.toml to 1.10 to work with Julia 1.10. I tried 1.11 but got an error message from Pkg when trying to dev DataFrames.
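
To confirm which Statistics version a given build actually loads, a small check such as this sketch may help. It assumes Julia 1.9 or later, where `pkgversion` is available; the result can be `nothing` if the package carries no version entry.

```julia
using Statistics

# Print the running Julia version, the version Statistics reports,
# and the file it was loaded from.
println("Julia:       ", VERSION)
println("Statistics:  ", pkgversion(Statistics))
println("loaded from: ", pathof(Statistics))
```

The printed path also shows whether the bundled stdlib copy is being picked up.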

These steps seem quite involved. I'm not sure what needs to be done so that all of this just "works" seamlessly when Julia 1.10 is released.

KristofferC pushed a commit to JuliaLang/julia that referenced this issue Nov 9, 2023
This bumps Statistics to the latest commit of the release-1.10 branch in
order to backport JuliaStats/Statistics.jl#153.

See JuliaData/DataFrames.jl#3383. Cc: @bkamins
@George9000