some little problems with the tutorial DINCAE_tutorial.ipynb #9

Closed
jmbeckers opened this issue Jun 7, 2022 · 35 comments

@jmbeckers
Member

using Pkg
Pkg.add(url="https://github.com/gher-ulg/DINCAE.jl", rev="main")
Pkg.add(url="https://github.com/gher-ulg/DINCAE_utils.jl", rev="main")

worked without problems on my Windows Julia 1.7.1 (IJulia), but did NOT install CUDA or Knet.

This was easily corrected by installing those packages by hand.
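For reference, a small helper (hypothetical, not part of DINCAE or DINCAE_utils) that lists which of the packages the tutorial relies on are not yet available in the active environment. It relies on the unexported `Base.find_package`, which returns `nothing` when a package cannot be found:

```julia
# Return the names in `pkgs` that cannot be located from the active environment.
# `Base.find_package` (unexported) returns the path of a package's entry file,
# or `nothing` if the package is not installed.
missing_packages(pkgs) = filter(name -> Base.find_package(name) === nothing, pkgs)
```

With it, `using Pkg; to_add = missing_packages(["CUDA", "Knet"]); isempty(to_add) || Pkg.add(to_add)` installs only what is absent.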

Later, the kernel is killed when running

ds = NCDataset(url)

So I guess it is that NetCDF problem under Windows again?

@jmbeckers
Member Author

Forcing the NetCDF_jll version with

using Pkg
Pkg.add("NetCDF_jll")
Pkg.pin(name="NetCDF_jll", version="400.702.400")

did not resolve the issue. Packages:

  Status `C:\Users\jmbeckers\.julia\environments\v1.7\Project.toml`

[6e4b80f9] BenchmarkTools v1.3.1
[052768ef] CUDA v3.10.1
[479239e8] Catalyst v10.8.0
[7a955b69] CircularArrays v1.3.0
[861a8166] Combinatorics v1.0.2
[0d879ee6] DINCAE v2.0.1 https://github.com/gher-ulg/DINCAE.jl#main
[f57bf84d] DINCAE_utils v0.1.0 https://github.com/gher-ulg/DINCAE_utils.jl#main
[efc8151c] DIVAnd v2.7.7 c:/Users/jmbeckers/Documents/GitHub/DIVAnd.jl
[0c46a032] DifferentialEquations v7.1.0
[31c24e10] Distributions v0.25.53
[7073ff75] IJulia v1.23.2
[916415d5] Images v0.25.2
[a98d9a8b] Interpolations v0.13.5
[1902f260] Knet v1.4.10
[961ee093] ModelingToolkit v8.11.0
[85f8d34a] NCDatasets v0.12.4
[6fe1bfb0] OffsetArrays v1.11.0
[91a5bcdd] Plots v1.28.0
[d330b81b] PyPlot v2.10.0
[860ef19b] StableRNGs v1.0.0
[7243133f] NetCDF_jll v400.702.400+0 ⚲

@ctroupin
Member

ctroupin commented Jun 7, 2022

I previously had problems reading the NetCDF file from the OPeNDAP URL, while it worked with a local file.

Does opening a local file work for you?

@jmbeckers
Member Author

Yep, opening a local file is OK (at least no crash).

@Alexander-Barth
Member

worked without problem on my windows version 1.7.1 IJulia, but did NOT install CUDA nor Knet.

I have now added these instructions:
b7e926c

Later the kernel is killed when trying
ds = NCDataset(url)

Do you get an error message, for instance when running this line in the REPL? I cannot reproduce it on my system:

julia> using NCDatasets
[ Info: Precompiling NCDatasets [85f8d34a-cbdd-5861-8df4-14fed0d494ab]
julia> url = "https://thredds.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/modis/terra/11um/4km/aggregate__MODIS_TERRA_L3_SST_THERMAL_DAILY_4KM_DAYTIME_V2019.0.ncml#fillmismatch"
"https://thredds.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/modis/terra/11um/4km/aggregate__MODIS_TERRA_L3_SST_THERMAL_DAILY_4KM_DAYTIME_V2019.0.ncml#fillmismatch"

julia> ds = NCDataset(url)
NCDataset: https://thredds.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/modis/terra/11um/4km/aggregate__MODIS_TERRA_L3_SST_THERMAL_DAILY_4KM_DAYTIME_V2019.0.ncml#fillmismatch
Group: /

Dimensions
   eightbitcolor = 256
   lat = 4320
   lon = 8640
   rgb = 3
   time = 7602
[...]

I rarely use Jupyter notebooks anymore because troubleshooting is sometimes quite challenging (e.g. a crash without an error message).

@jmbeckers
Member Author

If run in the REPL, it kills the Julia session (there is no time to see the error message).

@Alexander-Barth
Member

What happens if you first open a Windows Cmd window and then type drive:\path\to\bin\julia? Does the Cmd window stay open after the error message?

Does e.g. this URL work?

NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1")

Maybe it is related to #fillmismatch (Unidata/netcdf-c#1614).

@jmbeckers
Member Author

jmbeckers commented Jun 7, 2022

Same crash with NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1") (and I had already removed the #fillmismatch to be sure).

Running julia.exe from a command window does the trick and provides the following error message:

NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1")
state->auth.curlflags.cookiejar != NULL

Assertion failed: ocpanic(("state->auth.curlflags.cookiejar != NULL")), file ocinternal.c, line 566

signal (22): SIGABRT
in expression starting at REPL[2]:1
crt_sig_handler at /cygdrive/c/buildbot/worker/package_win64/build/src\signals-win.c:92
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
assert at C:\Windows\System32\msvcrt.dll (unknown line)
ocset_curlproperties at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
ocopen at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
.text at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
NCD2_open at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
NC_open at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
nc_open at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
nc_open at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\netcdf_c.jl:266
unknown function (ip: 000000005f6f6b8d)
#NCDataset#12 at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\dataset.jl:187
NCDataset at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\dataset.jl:157 [inlined]
NCDataset at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\dataset.jl:157
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
do_call at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:126
eval_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:215
eval_stmt_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:166 [inlined]
eval_body at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:583
jl_interpret_toplevel_thunk at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:731
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:885
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:894 [inlined]
jl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:944
eval at .\boot.jl:373 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:150
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:244
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:229
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:362
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:349
#930 at .\client.jl:394
jfptr_YY.930_35191.clone_1 at C:\Users\jmbeckers\AppData\Local\Programs\Julia-1.7.1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
#invokelatest#2 at .\essentials.jl:716 [inlined]
invokelatest at .\essentials.jl:714 [inlined]
run_main_repl at .\client.jl:379
exec_options at .\client.jl:309
_start at .\client.jl:495
jfptr__start_43221.clone_1 at C:\Users\jmbeckers\AppData\Local\Programs\Julia-1.7.1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:559
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:701
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:42
BaseThreadInitThunk at C:\Windows\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
Allocations: 722840 (Pool: 722461; Big: 379); GC: 1

@jmbeckers
Member Author

Windows seems to have its own curl.exe (in C:/Windows/System32, which is of course in the PATH).

@Alexander-Barth
Member

Alexander-Barth commented Jun 7, 2022

This also fails in NCDatasets' CI:

https://github.com/Alexander-Barth/NCDatasets.jl/runs/6774217910?check_suite_focus=true#step:8:112

(so it is not specific to one machine or one Julia version).

@Alexander-Barth
Member

@Alexander-Barth
Member

@jmbeckers
Member Author

Looks suspicious indeed.

@jmbeckers
Member Author

julia> NCDataset("https://thredds.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/modis/terra/11um/4km/aggregate__MODIS_TERRA_L3_SST_THERMAL_DAILY_4KM_DAYTIME_V2019.0.ncml#fillmismatch")
state->auth.curlflags.cookiejar != NULL

Assertion failed: ocpanic(("state->auth.curlflags.cookiejar != NULL")), file ocinternal.c, line 566

signal (22): SIGABRT
in expression starting at REPL[3]:1
crt_sig_handler at /cygdrive/c/buildbot/worker/package_win64/build/src\signals-win.c:92
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
assert at C:\Windows\System32\msvcrt.dll (unknown line)
ocset_curlproperties at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
ocopen at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
.text at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
NCD2_open at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
NC_open at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
nc_open at C:\Users\jmbeckers\.julia\artifacts\a81fe95ac632a7fa76c5e9cbe522c998aee9fa21\bin\libnetcdf-18.dll (unknown line)
nc_open at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\netcdf_c.jl:266
unknown function (ip: 000000005f6885bd)
#NCDataset#12 at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\dataset.jl:187
NCDataset at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\dataset.jl:157 [inlined]
NCDataset at C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\dataset.jl:157
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
do_call at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:126
eval_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:215
eval_stmt_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:166 [inlined]
eval_body at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:583
jl_interpret_toplevel_thunk at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:731
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:885
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:894 [inlined]
jl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:944
eval at .\boot.jl:373 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:150
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:244
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:229
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:362
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:349
#930 at .\client.jl:394
jfptr_YY.930_35191.clone_1 at C:\Users\jmbeckers\AppData\Local\Programs\Julia-1.7.1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
#invokelatest#2 at .\essentials.jl:716 [inlined]
invokelatest at .\essentials.jl:714 [inlined]
run_main_repl at .\client.jl:379
exec_options at .\client.jl:309
_start at .\client.jl:495
jfptr__start_43221.clone_1 at C:\Users\jmbeckers\AppData\Local\Programs\Julia-1.7.1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:559
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:701
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:42
BaseThreadInitThunk at C:\Windows\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
Allocations: 725617 (Pool: 725237; Big: 380); GC: 1

@Alexander-Barth
Member

I guess it is better to report it at https://github.com/Unidata/netcdf-c/issues
In Julia, we are using the MinGW compiler (x86_64-w64-mingw32-gcc (GCC) 4.8.5) and NetCDF 4.7.4 (for Windows).

@jmbeckers
Member Author

Solution for this problem:
Unidata/netcdf-c#2380

@jmbeckers
Member Author

The process advances to the next problem:
sst_t = copy(sst)
sst_t[(qual .> 3) .& .!ismissing.(qual)] .= missing
sst_t[.!ismissing.(sst) .& (sst_t .> 40)] .= missing

ArgumentError: unable to check bounds for indices of type Missing

Stacktrace:
[1] checkindex(#unused#::Type{Bool}, inds::Base.OneTo{Int64}, i::Missing)
@ Base .\abstractarray.jl:713
[2] checkindex
@ .\abstractarray.jl:728 [inlined]
[3] checkbounds
@ .\abstractarray.jl:641 [inlined]
[4] checkbounds
@ .\abstractarray.jl:656 [inlined]
[5] view
@ .\subarray.jl:177 [inlined]
[6] maybeview
@ .\views.jl:146 [inlined]
[7] dotview(::Array{Union{Missing, Float32}, 3}, ::Array{Union{Missing, Bool}, 3})
@ Base.Broadcast .\broadcast.jl:1200
[8] top-level scope
@ In[11]:3
[9] eval
@ .\boot.jl:373 [inlined]
[10] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base .\loading.jl:1196

@info "number of missing obs

In Jupyter, the notebook continues with:
┌ Info: number of missing observations: 18480900
└ @ Main In[12]:1
┌ Info: number of valid observations: 94430406
└ @ Main In[12]:2
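These two counts add up to 112,911,306 = 149 × 106 × 7149, which matches the array size (149, 106, 7149) reported later in the thread. The `@info` cell is truncated above, so the following is only a sketch, on a hypothetical toy array, of how such counts can be obtained with `count`:

```julia
# Hypothetical toy array standing in for the tutorial's `sst_t`
sst_t = Union{Missing,Float32}[12.5, missing, 18.0, missing, 21.3]

nmissing = count(ismissing, sst_t)   # entries without a valid observation
nvalid   = count(!ismissing, sst_t)  # `!f` negates a predicate function

@info "number of missing observations: $nmissing"
@info "number of valid observations: $nvalid"
```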

@jmbeckers
Member Author

Suggestion: since people will try it on the CPU (as I do), add a progress indicator to the outer epoch loop of DINCAE.reconstruct (and maybe flush stdout so that the progress is actually visible).

@Alexander-Barth
Member

Strange, I don't see this issue on Julia 1.7.2:

sst_t = copy(sst);
sst_t[(qual .> 3) .& .!ismissing.(qual)] .= missing;
sst_t[.!ismissing.(sst) .& (sst_t .> 40)] .= missing;
@show size(sst_t)
# size(sst_t) = (149, 106, 7149)

Maybe it is a bug in Julia 1.7.1 that was fixed in 1.7.2?

I have this:

julia> false & missing
false

Maybe you get missing?
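For reference, Julia's logical operators follow three-valued logic with `missing`, so an element of a combined mask ends up `missing` only when the other operand cannot decide the result on its own (for `&`, when it is `true`):

```julia
# `&` and `|` return `missing` only if the non-missing operand
# cannot decide the result alone.
@show false & missing   # false
@show true & missing    # missing
@show true | missing    # true
@show false | missing   # missing

# Comparisons propagate `missing`, which is how it can enter a mask
@show missing > 40      # missing
```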

progress indicator in the outer loop on epochs of DINCAE.reconstruct (and maybe flush stdout to make sure one can see progress).
There is this already:
https://github.com/gher-ulg/DINCAE.jl/blob/main/src/model.jl#L782

However, running on the CPU will be really, really slow. Do you see "epoch: ??? loss: ???" when running the script from the REPL? Maybe Jupyter is buffering the output? Or maybe it does not even finish the first epoch in a reasonable time?

@Alexander-Barth
Member

Indeed, flush(stdout) seems to be necessary in the notebook (it is not needed for the script, but adding it does no harm).
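A sketch of that pattern, with a hypothetical `step` function standing in for one epoch of training (this is not DINCAE's actual code): the loss is printed every epoch and `flush(stdout)` is called right after, so that a buffering front end such as IJulia shows each line immediately.

```julia
using Printf

# Hypothetical training loop: `step(epoch)` stands in for one epoch of
# training and returns its loss.
function train_with_progress(step, nepochs)
    losses = Float64[]
    for epoch in 1:nepochs
        loss = step(epoch)
        push!(losses, loss)
        @printf("epoch: %d loss %.4f\n", epoch, loss)
        # IJulia may buffer stdout; flushing makes each line appear right away
        flush(stdout)
    end
    return losses
end
```

For example, `train_with_progress(e -> 1.0 / e, 3)` prints lines such as `epoch: 1 loss 1.0000` as soon as each epoch finishes.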

@jmbeckers
Member Author

jmbeckers commented Jun 8, 2022

It's definitely buffering: when I stop the kernel in Jupyter, one sees how "far" it has gone:
epoch: 1 loss 7.2259
epoch: 2 loss -5.2457
epoch: 3 loss -5.5224
epoch: 4 loss -5.8101
epoch: 5 loss -5.8607
epoch: 6 loss -5.9163
epoch: 7 loss -5.4619
epoch: 8 loss -6.1039
epoch: 9 loss -5.9513
epoch: 10 loss -6.0861
epoch: 11 loss -5.9090

I would force
flush(stdout)

Regarding the missing issue: false & missing does give false for me, and running the .jl file in the REPL is still in progress (both on 1.7.1). I will upgrade to 1.7.2 and report back.

Edit: the REPL run finished and the missing problem is missing ;-). So the .jl file works in the REPL with 1.7.1. I will try running it again under Jupyter.

@Alexander-Barth
Member

Alexander-Barth commented Jun 8, 2022

OK, I added flush in the main branch. I am wondering approximately how long it took for 11 epochs on the CPU. Did Julia use multiple threads?

The missing thing gets quite strange:

sst_t = copy(sst)
sst_t[(qual .> 3) .& .!ismissing.(qual)] .= missing
sst_t[.!ismissing.(sst) .& (sst_t .> 40)] .= missing

works for me in Julia 1.7.2 (in the REPL and in a Jupyter notebook).

But this works in the REPL:

julia> d = [missing, 1]; (d .> 3) .& .!ismissing.(d)
2-element BitVector:
 0
 0

but fails in a Jupyter notebook:

d = [missing, 1]; (d .> 3) .& .!ismissing.(d)
MethodError: no method matching &(::Vector{Union{Missing, Int64}}, ::Bool)
Closest candidates are:
  &(::Any, ::Any, ::Any, ::Any...) at /opt/julia-1.7.2/share/julia/base/operators.jl:655
  &(::Bool, ::Bool) at /opt/julia-1.7.2/share/julia/base/bool.jl:38
  &(::T, ::T) where T<:Integer at /opt/julia-1.7.2/share/julia/base/promotion.jl:464

with the same Julia version. But this issue disappears when opening a new notebook.

@jmbeckers
Member Author

jmbeckers commented Jun 8, 2022

d = [missing, 1]; (d .> 3) .& .!ismissing.(d)

works in both the REPL and the notebook (1.7.1), whether I load no packages or all the packages of the tutorial.

@jmbeckers
Member Author

Now the tutorial .jl file loaded into a notebook also works!
I will now try the .ipynb again...

@Alexander-Barth
Member

🤦 OK, I must have accidentally overwritten the > operator by copy-pasting code a bit too ferociously:

@which 1 > 0
# output >(julia, d) in Main at In[4]:1
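The `@which` output hints at how this can happen (a reconstruction, not confirmed in the thread): a pasted line that still starts with the `julia>` prompt parses as `julia > d = [missing, 1]`, i.e. a short-form definition of a method `>(julia, d)`. The sketch below replays this in a throwaway module (the name `Scratch` is arbitrary) so that `Main` and `Base` stay untouched; in that module, the earlier broadcast then fails with the same `MethodError` for `&(::Vector{Union{Missing, Int64}}, ::Bool)`.

```julia
# The pasted line, prompt included, is an assignment whose left-hand side
# is the call `julia > d`: a method definition for `>`.
ex = Meta.parse("julia> d = [missing, 1]")
println(ex)

# Evaluate it in a fresh sandbox module where `>` has not been used yet,
# so a new local `>` is created that shadows `Base.:>`.
scratch = Module(:Scratch)
Core.eval(scratch, ex)

# Every comparison in that module now returns the vector [missing, 1] ...
shadowed = Core.eval(scratch, :(1 > 0))

# ... so the masking expression calls `&` with a Vector and a Bool.
Core.eval(scratch, :(d = [missing, 1]))
err = try
    Core.eval(scratch, :((d .> 3) .& .!ismissing.(d)))
    nothing
catch e
    e
end
println(typeof(err))
```

Restarting the kernel (or simply opening a new notebook, as observed above) discards such an accidental definition.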

@jmbeckers
Member Author

While waiting to see if the .ipynb now works:
I ran 11 epochs overnight, single-threaded...

@ctroupin
Member

ctroupin commented Jun 8, 2022

OK, it's good to have an idea of how long it takes; yesterday I ran it for several hours without a single epoch completing.

Meanwhile, on the GPU I have memory problems.

@Alexander-Barth
Member

I run 11 epochs over a night, single threaded ...

OK, that is very slow (but expected on a CPU). For example, I just tested on a GPU (NVIDIA GeForce RTX 3080): it takes 3 minutes for 11 epochs.

To reduce GPU memory:
924db8f
The example works for me on an older GeForce GTX 1080 (with 8 GB of GPU RAM), but I think I am very close to the limit. It is good to check with nvidia-smi -l that only one process is using the GPU for computing.

dragon2 and hercules2 have (a few) nodes with GPUs.

@jmbeckers
Member Author

Now the notebook works fine. I do not know what happened with the missing thing; maybe I did not restart the kernel and something inconsistent remained.
Anyway, the only thing to decide before closing the issue is how to inform Windows users, or whether to include a .dodsrc workaround for them.

@Alexander-Barth
Member

Anyway, the only thing to decide before closing the issue is how to inform windows users or include a workaround for windows users with the .dodsrc file.

Yesterday, I tried to recompile NetCDF 4.8.1 on Windows. I planned to debug with gdb to see why we have these crashes on Windows, but I could not get the msys2 gdb installed (msys2/MINGW-packages#6196 (comment)). In any case, with NetCDF 4.8.1 this OPeNDAP issue is resolved, but we have the more serious issue of crashes when creating NetCDF4 files on Windows.

I can document the .dodsrc workaround in NCDatasets, and we can link to this issue from the tutorial.
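A minimal sketch of such a `.dodsrc` workaround, under assumptions to check against Unidata/netcdf-c#2380: since the assertion complains that `state->auth.curlflags.cookiejar` is NULL, we assume that defining the `HTTP.COOKIEJAR` key in a `.dodsrc` file that libnetcdf reads (here, one in the directory Julia is started from) is enough. The helper name `write_dodsrc` and the default cookie-jar location are hypothetical.

```julia
# Hypothetical helper: write a `.dodsrc` that sets `HTTP.COOKIEJAR`, so that the
# OPeNDAP client in libnetcdf has a cookie-jar file (assumed workaround).
function write_dodsrc(dir::AbstractString = pwd();
                      cookiejar::AbstractString = joinpath(tempdir(), "netcdf_cookies"))
    path = joinpath(dir, ".dodsrc")
    open(path, "w") do io
        println(io, "HTTP.COOKIEJAR=", cookiejar)
    end
    return path
end
```

Usage would be to call `write_dodsrc()` once from the directory where Julia is started, and then `NCDataset(url)` as before.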

@jmbeckers
Member Author

I'm going crazy: I restarted the kernel and now I get this again:
ArgumentError: unable to check bounds for indices of type Missing

Stacktrace:
[1] checkindex(#unused#::Type{Bool}, inds::Base.OneTo{Int64}, i::Missing)
@ Base .\abstractarray.jl:713
[2] checkindex
@ .\abstractarray.jl:728 [inlined]
[3] checkbounds
@ .\abstractarray.jl:641 [inlined]
[4] checkbounds
@ .\abstractarray.jl:656 [inlined]
[5] view
@ .\subarray.jl:177 [inlined]
[6] maybeview
@ .\views.jl:146 [inlined]
[7] dotview(::Array{Union{Missing, Float32}, 3}, ::Array{Union{Missing, Bool}, 3})
@ Base.Broadcast .\broadcast.jl:1200
[8] top-level scope
@ In[7]:3
[9] eval
@ .\boot.jl:373 [inlined]
[10] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base .\loading.jl:1196

@jmbeckers
Member Author

In any case, maybe it would be better to replace
sst_t[.!ismissing.(sst) .& (sst_t .> 40)] .= missing
with
sst_t[.!ismissing.(sst_t) .& (sst_t .> 40)] .= missing
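One mechanism that produces exactly this `ArgumentError`, sketched on hypothetical toy data (whether it is what happened with the real MODIS arrays is an assumption): after the quality filter, `sst_t` can be `missing` where `sst` is not, so the original mask evaluates `true & missing`, which is `missing`, and an array of `Union{Missing, Bool}` cannot be used for logical indexing. Testing `sst_t` itself gives `false & missing`, which is `false`, so the mask stays a plain `BitVector`. The helper `apply_quality_mask` is hypothetical.

```julia
# Hypothetical toy data, not the tutorial's arrays
sst  = Union{Missing,Float32}[10, 45, missing, 20]
qual = Union{Missing,Int}[0, 0, missing, 5]

# First step, as in the tutorial: drop bad quality flags.
# For a missing `qual`, `missing & false` is `false`, so this mask is all Bool.
function apply_quality_mask(sst, qual)
    sst_t = copy(sst)
    sst_t[(qual .> 3) .& .!ismissing.(qual)] .= missing
    return sst_t
end

# Original second step: missingness tested on `sst`, comparison on `sst_t`.
# Element 4 is valid in `sst` but was removed from `sst_t`: `true & missing`.
sst_t = apply_quality_mask(sst, qual)
bad_mask = .!ismissing.(sst) .& (sst_t .> 40)

failed = try
    sst_t[bad_mask] .= missing
    false
catch err
    println("indexing with a mask containing missing failed: ", typeof(err))
    true
end

# Corrected second step: test missingness of the array being compared.
sst_fixed = apply_quality_mask(sst, qual)
good_mask = .!ismissing.(sst_fixed) .& (sst_fixed .> 40)
sst_fixed[good_mask] .= missing
```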

@Alexander-Barth
Member

OK, done here:

7b41f61

@jmbeckers jmbeckers reopened this Jun 14, 2022
@jmbeckers
Member Author

When reducing to 10 epochs to run it on a CPU, the calculations complete but the plotting fails:

NetCDF error: Variable 'lon' not found in file ~/Data/SST-AlboranSea-example\Results\data-avg.nc (NetCDF error code: -49)

Stacktrace:
[1] nc_inq_varid(ncid::Int32, name::String)
@ NCDatasets C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\netcdf_c.jl:1475
[2] variable
@ C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\variable.jl:76 [inlined]
[3] cfvariable(ds::NCDataset{Nothing}, varname::String)
@ NCDatasets C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\cfvariable.jl:355
[4] getindex
@ C:\Users\jmbeckers\.julia\packages\NCDatasets\XVX8L\src\cfvariable.jl:452 [inlined]
[5] loadbatch(case::NamedTuple{(:fname_orig, :fname_cv, :varname), Tuple{String, String, String}}, fname::String)
@ DINCAE_utils C:\Users\jmbeckers\.julia\packages\DINCAE_utils\IwZgW\src\validation.jl:22
[6] summary(case::NamedTuple{(:fname_orig, :fname_cv, :varname), Tuple{String, String, String}}, fname::String; fnamesummary::String)
@ DINCAE_utils C:\Users\jmbeckers\.julia\packages\DINCAE_utils\IwZgW\src\validation.jl:82
[7] summary
@ C:\Users\jmbeckers\.julia\packages\DINCAE_utils\IwZgW\src\validation.jl:77 [inlined]
[8] cvrms(case::NamedTuple{(:fname_orig, :fname_cv, :varname), Tuple{String, String, String}}, fname::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ DINCAE_utils C:\Users\jmbeckers\.julia\packages\DINCAE_utils\IwZgW\src\validation.jl:122
[9] cvrms(case::NamedTuple{(:fname_orig, :fname_cv, :varname), Tuple{String, String, String}}, fname::String)
@ DINCAE_utils C:\Users\jmbeckers\.julia\packages\DINCAE_utils\IwZgW\src\validation.jl:122
[10] top-level scope
@ In[19]:7
[11] eval
@ .\boot.jl:373 [inlined]
[12] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base .\loading.jl:1196

Looking at the file content:
using NCDatasets
@show fnameavg
dstt = Dataset(fnameavg)

fnameavg = "~/Data/SST-AlboranSea-example\Results\data-avg.nc"

Group: /

Dimensions
epochs = 10

Variables
losses (10)
Datatype: Float64
Dimensions: epochs

Probably related to

save_epochs = 200:10:epochs

which I also forgot to change when reducing epochs to 10...

So maybe add a warning in the code when no epochs are selected for saving (to help identify the problem)?

@Alexander-Barth
Member

Yes, save_epochs was inconsistent with epochs in this test. The updated code now raises an error in such a situation, and the default value of save_epochs is now min(epochs,200):10:epochs, so that your test would have saved the last result (epoch = 10).

By the way, Julia has the nice feature that the default value of a parameter (here save_epochs) can depend on the actual values of other parameters (here epochs). Python might get this in the future: https://peps.python.org/pep-0671/.
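A sketch of such a keyword default together with an emptiness check, using a hypothetical function `check_save_epochs` (DINCAE's actual signature and error message are not shown in this thread). Keyword defaults are evaluated left to right, so `save_epochs` can refer to `epochs`:

```julia
# Hypothetical stand-in for the relevant keywords of DINCAE.reconstruct
function check_save_epochs(; epochs = 1000, save_epochs = min(epochs, 200):10:epochs)
    if isempty(save_epochs)
        throw(ArgumentError("save_epochs = $save_epochs is empty: no epoch would be saved"))
    end
    if !all(e -> 1 <= e <= epochs, save_epochs)
        throw(ArgumentError("save_epochs must lie within 1:$epochs"))
    end
    return collect(save_epochs)
end
```

With `epochs = 10`, the default becomes `10:10:10`, so the last epoch is saved, while passing `save_epochs = 200:10:10` (the empty range from the test above) raises an `ArgumentError` instead of silently writing a file without results.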
