update for Cuda 2 #1345
Conversation
@DhairyaLGandhi I had to remove the Manifest since CUDA 2.0 is not compatible with Julia 1.4. I suggest we finally get rid of it (as discussed in #500, #1164, #1234).
@DhairyaLGandhi can we fix or remove the gitlab pipeline?
We need the gitlab pipeline precisely for the GPU tests; best is to merge the Zygote fixes for now and clean up CI.
test/cuda/losses.jl (outdated)

    @@ -36,7 +36,11 @@ end
      y = rand(Float32, 3,3)

      for loss in ALL_LOSSES
        gpu_gradtest(loss, x, y)
        if loss == Flux.Losses.huber_loss
          @test_broken 1 == 2
Why are we adding a fake test here?
Just wanted to remark that huber_loss is broken on GPU and we should fix the test at some point.
Why not just @test_broken the Huber loss test?
Because it is not failing the test; it is throwing an error.
@test_throws then?
What is the error? A CUDA-related error?
> @test_throws then?

That's supposed to test cases where you actually want an error to be thrown.
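For readers following the @test_broken / @test_throws back-and-forth, here is a self-contained sketch of the trade-off (the `gpu_gradtest_stub` helper is hypothetical; the real `gpu_gradtest` lives in test/cuda/losses.jl):

```julia
using Test

# Hypothetical stand-in for the real helper: it runs its own @test calls
# and, for the broken loss, throws instead of returning false.
function gpu_gradtest_stub(loss)
    loss === :huber_loss && error("CUDA kernel error")  # simulate the breakage
    @test true                                          # helper's internal assertion
end

@testset "placeholder pattern" begin
    for loss in (:mse, :crossentropy, :huber_loss)
        if loss === :huber_loss
            # Wrapping the helper in @test_broken doesn't fit, because the
            # failure is an exception raised mid-helper, not a false assertion;
            # @test_throws would pass now but start failing once the bug is fixed.
            @test_broken 1 == 2   # placeholder; remove when fixed on GPU
        else
            gpu_gradtest_stub(loss)
        end
    end
end
```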
Doesn't seem to be a Zygote problem; it doesn't find CUDNN. When running the GPU tests with …
bors try

Build failed:
That was separate: some of the runners had old versions of CUDNN installed. That has since been rectified.
bors try

Build failed:
FWIW, I think committed Manifests are very useful: to reproduce the CI environment locally, to reliably bisect issues, and to make it possible to depend on unreleased dependencies.
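Concretely, reproducing a committed-Manifest environment locally is two standard Pkg calls (generic Pkg usage, nothing specific to this PR):

```julia
using Pkg
Pkg.activate(".")   # use this checkout's Project.toml / Manifest.toml
Pkg.instantiate()   # install exactly the versions pinned in the Manifest
```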
Agreed, we should hold off on any Manifest-related changes for this patch.
Is there a lot holding this up? I just ask because there's a bug on nightly that makes CUDA v1 not work on macOS systems with no CUDA GPUs, so Flux doesn't work on nightly. If there is a lot, it may make sense to backport the fix to CUDA v1.
Bump :)
If I add back the Manifest, CI will fail on Julia 1.4, since it will try to use CUDA 2.0, which is not compatible with Julia 1.4.
bors try

Build failed:
Unless I'm missing something, 1.4 shouldn't need to be supported given that 1.5 is out?
Yes, we can drop 1.4 support if we consider keeping the Manifest more important than supporting recent Julia versions (for the record, I do not, and I dislike some other aspects of keeping the Manifest in general, but I'm probably in the minority here).
Even with the newly released Zygote, I'm experiencing all sorts of errors on my machine, especially with RNNs. Let's see what bors says.

bors try

Build failed:
How are these related? Manifests aren't used by users. And for CI, we could jerry-rig something to have multiple Manifests, one for each Julia version (see the sketch below).
That would be cool. Can it be done?
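One way it could be done, as a hypothetical sketch (the `Manifest-v1.X.toml` naming convention and this setup step are assumptions, not something that exists in this repo):

```julia
# CI setup step: copy the Manifest matching the running Julia version
# into place before instantiating the environment.
manifest = "Manifest-v$(VERSION.major).$(VERSION.minor).toml"  # e.g. "Manifest-v1.5.toml"
if isfile(manifest)
    cp(manifest, "Manifest.toml"; force=true)
end

using Pkg
Pkg.activate(".")
Pkg.instantiate()  # resolves to exactly the versions pinned for this Julia
```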
Could you elaborate on the errors?
This is what I see on my machine (where tests passed last week, dunno why):

[ Info: Testing GPU Support
LayerNorm GPU grad test: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/layers.jl:38
Expression: gs[first(ps)] isa Flux.CUDA.CuArray
Evaluated: Float32[1.2636185f-5] isa CuArray
Stacktrace:
[1] macro expansion at /home/carlo/.julia/dev/Flux/test/cuda/layers.jl:38 [inlined]
[2] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115 [inlined]
[3] macro expansion at /home/carlo/.julia/dev/Flux/test/cuda/layers.jl:26 [inlined]
[4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115 [inlined]
[5] gradtest(::String, ::Array{UnionAll,1}, ::Array{Float32,4}, ::Int64) at /home/carlo/.julia/dev/Flux/test/cuda/layers.jl:24
[ Info: Testing Flux/CUDNN
R = RNN, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
Expression: y ≈ collect(cuy)
Evaluated: [-0.019431099902958646, -0.11841413968698347, 0.4722386310464857, -0.14604490552473026, -0.64691227373787] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = RNN, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
Expression: x̄ ≈ collect(cux̄)
Evaluated: [-0.830899605563262, -0.6662040633027664, -2.1067316948003136, 1.8660190087509037, -0.22096769547077244, 0.6238499990006453, 0.9966454260407338, -1.722732776318945, 1.6458662195438152, 0.86747077146643] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = RNN, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
Expression: (m̄[]).state ≈ collect((cum̄[]).state)
Evaluated: [-0.8405158956849136, 2.061215493843378, 0.14061960905985366, -1.8565078327587845, 0.4424841142453512] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = RNN, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
Expression: y ≈ collect(cuy)
Evaluated: [-0.019431099902958646 0.1618381490680678 … -0.22441343007056594 0.047527729285282455; -0.11841413968698347 0.748564964186107 … 0.25762538051276473 0.565018971059919; … ; -0.14604490552473026 0.44967100141219984 … 0.11898398784283139 -0.20242204874078795; -0.64691227373787 -0.8838368682898494 … -0.8729884936098729 -0.8352227805469519] ≈ Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = RNN, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
Expression: x̄ ≈ collect(cux̄)
Evaluated: [-1.2563565767280167 -0.32043449857558765 … 0.6598185823329246 -0.8743064105599022; 0.4139337826036863 -0.0489961935370902 … 0.7104068283356213 -0.21005488704762065; … ; -1.1329831875497842 0.09261050769960308 … -0.3669623959899771 0.11156129322481852; -0.0898901569255448 -0.3161775390020851 … 0.21676706154085665 -0.09266956050321835] ≈ Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = RNN, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
Expression: (m̄[]).state ≈ collect((cum̄[]).state)
Evaluated: [1.1450011499435349, -3.070202557803463, -1.1615876545036332, -0.04242987457808446, 0.2762399538473071] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = GRU, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
Expression: y ≈ collect(cuy)
Evaluated: [-0.4275777643268079, -0.014423148979310548, 0.007237418485216347, -0.5282777904512245, -0.15764599247777863] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = GRU, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
Expression: x̄ ≈ collect(cux̄)
Evaluated: [-0.22776741748701154, 0.3209347956882381, 0.10693373763558503, -0.41107076672434856, -0.32476707616330297, 0.352650092060774, -0.04047323530121069, 0.1934077904424144, -0.2822926974842711, 0.19332311343347433] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = GRU, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
Expression: (m̄[]).state ≈ collect((cum̄[]).state)
Evaluated: [-0.5141135637065525, -0.5711118387205082, -0.39127310148295213, 0.36650655519300074, -0.14387137441365852] ≈ Float32[-0.7155977, -0.22475724, -0.44065005, 0.53868115, -0.03074057]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = GRU, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
Expression: y ≈ collect(cuy)
Evaluated: [-0.4275777643268079 -0.3869860053459997 … -0.2683960454492888 -0.3939344448830501; -0.014423148979310548 0.0825856908074926 … 0.08759317579596015 0.11528816778371748; … ; -0.5282777904512245 -0.5154974736821768 … -0.38026836831507604 -0.38486394257549705; -0.15764599247777863 -0.1989142030500093 … -0.08267492009419829 -0.10008267859310813] ≈ Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = GRU, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
Expression: x̄ ≈ collect(cux̄)
Evaluated: [-0.167915828645572 0.10664332868172638 … -0.13235601271948444 -0.0846844520822104; 0.36443219445413677 -0.024176883724530952 … 0.14256752722756888 0.01371924757029691; … ; -0.28075880626172206 0.03838631885980828 … -0.24944711109191708 -0.0435247184088787; 0.3739027735371024 -0.05672318160495745 … 0.014029365321461928 0.10232388257206129] ≈ Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = GRU, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
Expression: (m̄[]).state ≈ collect((cum̄[]).state)
Evaluated: [-0.9164938006621115, 1.0211574132258705, -0.3580377108098198, -0.08689351690247682, 0.5065986969549852] ≈ Float32[-0.97730887, 0.828195, -0.37877595, 0.085035026, 0.79239774]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:47
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
Expression: y ≈ collect(cuy)
Evaluated: [0.17133454661393993, 0.08491448663403546, 0.1581792194746248, 0.10928057416312513, -0.00353936869665635] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
Expression: x̄ ≈ collect(cux̄)
Evaluated: [-0.030229336215151753, 0.12456343964566355, -0.01523602769955032, 0.03882078681763432, -0.19705709591272808, -0.05689166506030991, 0.09940324503612268, -0.018838281636579312, 0.09927881419095592, -0.014974470640134938] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
Expression: x ≈ collect(cx)
Evaluated: [-0.08668788044236499, -0.09631148376162414, 0.16450188507074665, 0.16419487349568404, -0.10264999946485072] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 1: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
Expression: x ≈ collect(cx)
Evaluated: [0.19790952785346314, -0.14148924286137277, 0.20355174555572256, -0.23772413178250634, -0.43214533482284356] ≈ Float32[0.15913527, -0.111981004, 0.090632685, -0.26090252, -0.4198485]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
Expression: y ≈ collect(cuy)
Evaluated: [0.17133454661393993 0.11182878930195422 … 0.1548314885630126 0.16292665019358618; 0.08491448663403546 0.06320487690104416 … 0.08889681684473064 0.12589660868354857; … ; 0.10928057416312513 0.06698992894957341 … 0.07453009457766963 0.06410766606551206; -0.00353936869665635 -0.027621339369615545 … -0.013245722894946247 0.008094052575241577] ≈ Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:30
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
Expression: x̄ ≈ collect(cux̄)
Evaluated: [-0.10268096525608542 0.00994827058360619 … 0.16947322399116943 -0.20729710678265328; 0.08206973619000309 -0.01609505278780965 … -0.12169657342862912 0.10296755973293965; … ; 0.058128216231219505 0.16542800217809261 … -0.1275825994643608 0.10090241858739234; -0.030771661228154282 0.0030305073944870728 … -0.045480045439165784 0.12628198308549338] ≈ Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:38
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
Expression: x ≈ collect(cx)
Evaluated: [-0.15736053657821605, -0.1298820980694558, 0.18058347703021838, 0.20431763798467564, -0.17769890918058903] ≈ Float32[0.0, 0.0, 0.0, 0.0, 0.0]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
R = LSTM, batch_size = 5: Test Failed at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
Expression: x ≈ collect(cx)
Evaluated: [0.05868754018291389, -0.12972053368132985, 0.16920241016444965, -0.18323024718071512, -0.6215401990328058] ≈ Float32[-0.0030328631, -0.15407962, 0.08194515, -0.21456608, -0.546141]
Stacktrace:
[1] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:44
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1190
[3] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[5] top-level scope at /home/carlo/.julia/dev/Flux/test/cuda/curnn.jl:16
Test Summary: | Pass Fail Broken Total
CUDA | 92 21 35 148
CUDA | 9 9
onecold gpu | 2 2
restructure gpu | 1 1
GPU functors | 2 2
Losses | 44 1 45
Basic GPU Movement | 2 2
Conv GPU grad tests | 6 1 7
Pooling GPU grad tests | 2 2
AdaptivePooling GPU grad tests | 2 2
Dropout GPU grad tests | 1 1 2
Normalising GPU grad tests | 3 1 4
LayerNorm GPU grad test | 1 1 2
BatchNorm GPU grad test | 2 2
InstanceNorm GPU grad tests | 1 1
GroupNorm GPU grad tests | 1 1
Stateless GPU grad tests | 1 1
CUDNN BatchNorm | 8 8
R = RNN | 1 2 3
R = GRU | 1 2 3
R = LSTM | 1 2 3
RNN | 6 20 24 50
R = RNN, batch_size = 1 | 1 3 4 8
R = RNN, batch_size = 5 | 1 3 4 8
R = GRU, batch_size = 1 | 1 3 4 8
R = GRU, batch_size = 5 | 1 3 4 8
R = LSTM, batch_size = 1 | 1 4 4 9
R = LSTM, batch_size = 5 | 1 4 4 9
ERROR: LoadError: Some tests did not pass: 92 passed, 21 failed, 0 errored, 35 broken.
in expression starting at /home/carlo/.julia/dev/Flux/test/runtests.jl:36
ERROR: Package Flux errored during testing
bors instead still has problems finding CUDNN: https://gitlab.com/JuliaGPU/Flux.jl/-/jobs/796597125
You're using an explicit …
Yeah, I'll have a look.
Since I removed the Manifest, I also added Julia 1.3 to CI, as it should be, given that we support it. I have yet to see a single situation where having a Manifest in Flux provided some advantage, while I have seen several problems it induced. I strongly suggest we give Manifest removal a try and see how it goes; we can then reinstate it at any point, also temporarily while working on a PR.
bors try

Build succeeded:
    if loss == Flux.Losses.huber_loss # remove when fixed
      try
        gpu_gradtest(loss, x, y)
        @test 1 == 2
Still not super sure about this one; maybe we can have a different dispatch on huber_loss.
This is no big deal, we shouldn't overthink it too much. Hopefully we will fix it soon.
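For what the "different dispatch" could look like, a hedged sketch (hypothetical code, not Flux's implementation; the branch-free broadcast formulation is one guess at what might sidestep the GPU failure):

```julia
using Statistics: mean

# Huber loss written with broadcast-only, branch-free operations,
# which tend to map cleanly onto GPU kernels:
function huber_loss_sketch(ŷ, y; δ=1f0, agg=mean)
    err  = abs.(ŷ .- y)
    quad = 0.5f0 .* err .^ 2          # quadratic region, err <= δ
    lin  = δ .* (err .- 0.5f0 * δ)    # linear region, err > δ
    agg(ifelse.(err .<= δ, quad, lin))
end

# usage: huber_loss_sketch(rand(Float32, 3, 3), rand(Float32, 3, 3))
```

If the generic method still failed on GPU, a CuArray-specific method could dispatch to a formulation like this.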
.gitignore (outdated)

    docs/build/
    docs/site/
    deps
    Manifest.toml
Can we re-add the manifest for this PR?
I just want to understand if the environment needed to be updated.
It did, since we moved to CUDA 2.1.
By adding a dependency on a specific revision to the Manifest, you would have been able to test the updated CUDA.jl days ago, without requiring me to tag a release. And this has been done in the past in Flux.jl: 1c0e9ac#diff-6d1205e03b7a43c01ed2ac8b8c83eb58c6f3a3b1360769e18644245abee8c565R95
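That mechanism is an ordinary Pkg rev-pin; a minimal sketch (the `rev` value here is a placeholder, not a revision anyone requested in this thread):

```julia
using Pkg

# Pin an unreleased revision of CUDA.jl; the exact commit SHA is recorded
# in Manifest.toml, so committing that file lets CI test the unreleased
# code before any release is tagged.
Pkg.add(PackageSpec(name="CUDA", rev="master"))  # "master" is a placeholder rev
```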
This can be easily done by temporarily reinstating the Manifest in a PR.
Can we merge this as it is? That is:

a) no Manifest --> CI on Julia 1.3 + always test on an up-to-date ecosystem.

Alternatives are:

b) broken CI;
c) reinstate the Manifest and drop Julia 1.3 from CI.

I strongly favor a). @DhairyaLGandhi, if you really dislike a), tell me if I should go with b) or c); I'd like this to be merged as soon as possible.
We often have cases (even with the CUDNN update) where we were incompatible with NNlib and CUDA at the same time, or even with LoopVectorization. I definitely feel having the Manifest is useful, and managing it on a PR-by-PR basis is untenable, since the dependencies would vary much more for every single PR, compared to having a set of known ones to go to. It's not about disliking Julia 1.3 support (on the contrary, I dislike not supporting the latest versions of packages on LTS versions of the language); it's about moving forward, and this change is not critical to getting the CUDA updates in, which is a bigger deal to me. Julia 1.3 would still fetch the old dependencies when the package is added by a user. The downside is we don't get to test it on CI with every commit.
.gitignore (outdated)

    docs/build/
    docs/site/
    deps
    Manifest.toml
Manifest.toml
Does this work to get the PR merged?
If adding back the Manifest, we should also remove Julia 1.3 CI from Travis; I don't really want to see red lights for all PRs.
These incompatibilities are very rare, surely not common enough to be worth dropping CI or Julia 1.3 support. Anyway, let's reinstate the Manifest. I wouldn't keep Julia 1.3 and 1.4 support without CI (it's not healthy), so I think we should just drop < 1.5 support. Do you agree?
We can drop < 1.5 support for now, since we have some fixes on master that need to be tagged alongside CUDA 2 anyway.
bors r+

Build succeeded:
…and mark as broken a CUDA test on huber_loss that was already failing on master (for reasons I do not understand).