Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use GeoData.jl with staggered grid data? #82

Closed
ali-ramadhan opened this issue Nov 3, 2020 · 6 comments · Fixed by #83
Closed

How to use GeoData.jl with staggered grid data? #82

ali-ramadhan opened this issue Nov 3, 2020 · 6 comments · Fixed by #83

Comments

@ali-ramadhan
Copy link
Collaborator

I'm trying to load some fields from a NetCDF file but they were created on a staggered grid (Arakawa C-grid) so there are six possible dimensions (xC, yC, zC, xF, yF, zF) depending on where the variable is defined but each field one has three dimensions.

Couldn't figure out how to load to the data into an NCDarray or NCDstack without errors though. I thought it would use Dim{:name} for "undetected dimensions". Since it's 3D data I believe I should be using NCDstack.

I'm copy pasting some things I tried in the REPL below (plus what the NetCDF file looks like as an NCDataset + my environment).

PS: Beautiful package (also DimensionalData.jl)! Hoping I can move away from xarray.


Error

julia> using NCDatasets, GeoData

julia> ds = NCDstack("europa_constant_bottom_heat_flux_10W_10days_lat45_meridional.nc")
NCDstack{String,Tuple{},Tuple{},NCDstackMetadata{String,Any},Tuple{}}("europa_constant_bottom_heat_flux_10W_10days_lat45_meridional.nc", (), (), NCDstackMetadata{String,Any}(Pair{String,Any}("coriolis", "NonTraditionalBetaPlane{Float64}: fz = 7.51e-05, fy = 7.51e-05, β = 2.98e-10, γ = -5.96e-10, R = 2.52e+05"),Pair{String,Any}("rotation_period", "118327.40691486979 seconds"),Pair{String,Any}("schedule", "TimeInterval"),Pair{String,Any}("horizontal_diffusivity", "1.0 m²/s"),Pair{String,Any}("vertical_viscosity", "1 m²/s"),Pair{String,Any}("vertical_diffusivity", "1 m²/s"),Pair{String,Any}("Ekman_number", 1.5065913370998117e-5),Pair{String,Any}("interval", 5916.370345743489),Pair{String,Any}("Julia", "This file was generated using Julia Version 1.5.2\nCommit 539f3ce943 (2020-09-23 23:17 UTC)\nPlatform Info:\n  OS: Linux (x86_64-pc-linux-gnu)\n  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz\n  WORD_SIZE: 64\n  LIBM: libopenlibm\n  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)\n  GPU: TITAN V\n"),Pair{String,Any}("output time interval", "Output was saved every 1.643 hours.")…), ())

julia> ds[:u]
ERROR: BoundsError: attempt to access 1-element Array{Float64,1} at index [2]
Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] _ncdspan(::Array{Float64,1}, ::Ordered{ReverseIndex,ReverseArray,ForwardRelation}) at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:422
 [3] _ncdmode(::Array{Float64,1}, ::Type{T} where T, ::EPSG, ::EPSG, ::NCDdimMetadata{String,Any}) at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:403
 [4] dims(::NCDataset{Nothing}, ::Symbol, ::EPSG, ::EPSG) at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:385
 [5] NCDarray(::NCDataset{Nothing}, ::String, ::Symbol; crs::EPSG, dimcrs::EPSG, name::Symbol, dims::Nothing, refdims::Tuple{}, metadata::Nothing, missingval::Missing) at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:213
 [6] (::GeoData.var"#24#25"{NCDstack{String,Tuple{},Tuple{},NCDstackMetadata{String,Any},Tuple{}},Symbol,String})(::NCDataset{Nothing}) at /home/alir/.julia/packages/GeoData/Vnpza/src/stack.jl:195
 [7] NCDataset(::GeoData.var"#24#25"{NCDstack{String,Tuple{},Tuple{},NCDstackMetadata{String,Any},Tuple{}},Symbol,String}, ::String; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/alir/.julia/packages/NCDatasets/HhdCu/src/dataset.jl:286
 [8] NCDataset at /home/alir/.julia/packages/NCDatasets/HhdCu/src/dataset.jl:284 [inlined]
 [9] ncread at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:37 [inlined]
 [10] withsource at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:339 [inlined]
 [11] getindex(::NCDstack{String,Tuple{},Tuple{},NCDstackMetadata{String,Any},Tuple{}}, ::Symbol) at /home/alir/.julia/packages/GeoData/Vnpza/src/stack.jl:194
 [12] top-level scope at REPL[20]:1

NetCDF dataset

julia> ds = NCDataset("europa_constant_bottom_heat_flux_10W_10days_lat45_meridional.nc")
NCDataset: europa_constant_bottom_heat_flux_10W_10days_lat45_meridional.nc
Group: /

Dimensions
   zC = 125
   zF = 126
   xC = 1
   yF = 1251
   xF = 1
   yC = 1250
   time = 156

Variables
  zC   (125)
    Datatype:    Float64
    Dimensions:  zC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the z-direction.

  zF   (126)
    Datatype:    Float64
    Dimensions:  zF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the z-direction.

  xC   (1)
    Datatype:    Float64
    Dimensions:  xC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the x-direction.

  yF   (1251)
    Datatype:    Float64
    Dimensions:  yF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the y-direction.

  xF   (1)
    Datatype:    Float64
    Dimensions:  xF
    Attributes:
     units                = m
     longname             = Locations of the cell faces in the x-direction.

  yC   (1250)
    Datatype:    Float64
    Dimensions:  yC
    Attributes:
     units                = m
     longname             = Locations of the cell centers in the y-direction.

  time   (156)
    Datatype:    Float64
    Dimensions:  time

  v   (1 × 1251 × 125 × 156)
    Datatype:    Float32
    Dimensions:  xC × yF × zC × time
    Attributes:
     units                = m/s
     longname             = Velocity in the y-direction

  w   (1 × 1250 × 126 × 156)
    Datatype:    Float32
    Dimensions:  xC × yC × zF × time
    Attributes:
     units                = m/s
     longname             = Velocity in the z-direction

  T   (1 × 1250 × 125 × 156)
    Datatype:    Float32
    Dimensions:  xC × yC × zC × time
    Attributes:
     units                = °C
     longname             = Temperature

  u   (1 × 1250 × 125 × 156)
    Datatype:    Float32
    Dimensions:  xF × yC × zC × time
    Attributes:
     units                = m/s
     longname             = Velocity in the x-direction

Global attributes
  coriolis             = NonTraditionalBetaPlane{Float64}: fz = 7.51e-05, fy = 7.51e-05, β = 2.98e-10, γ = -5.96e-10, R = 2.52e+05
  rotation_period      = 118327.40691486979 seconds
  schedule             = TimeInterval
  horizontal_diffusivity = 1.0 m²/s
  vertical_viscosity   = 1 m²/s
  vertical_diffusivity = 1 m²/s
  Ekman_number         = 1.5065913370998117e-5
  interval             = 5916.370345743489
  Julia                = This file was generated using Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V

  output time interval = Output was saved every 1.643 hours.
  rotation_rate        = 5.31e-5 s⁻¹
  simulation_rotation_periods = 10 periods
  heat_flux            = 10 W/m²
  date                 = This file was generated on 2020-10-28T07:42:09.758.
  specific_heat_capacity = 4000 J/kg/K
  Buoyancy flux        = 4.5937195715676725e-11 m²/s³
  latitude             = 45 °N
  Temperature flux     = 2.4342745861733204e-6 K m/s
  density              = 1027 kg/m³
  natural_Rossby_number = 0.00024771679708825465
  closure              = AnisotropicDiffusivity: (νx=1.0, νy=1.0, νz=1.0), (κx=(T = 1.0,), κy=(T = 1.0,), κz=(T = 1.0,))
  Oceananigans         = This file was generated using Oceananigans v0.43.0
  horizontal_viscosity = 1.0 m²/s
  europa-hydrothermal-plumes = 
  gravitational_acceleration = 0.113 m/s²

Julia environment

(@v1.5) pkg> st
Status `~/.julia/environments/v1.5/Project.toml`
  [c7e460c6] ArgParse v1.1.0
  [fbb218c0] BSON v0.2.6
  [6e4b80f9] BenchmarkTools v0.5.0
  [052768ef] CUDA v1.3.3
  [8f4d0f93] Conda v1.4.1
  [d58978e5] Dagger v0.9.2
  [68e73e28] DaggerGPU v0.1.1 `https://github.com/jpsamaroo/DaggerGPU.jl.git#master`
  [933edf66] DarkMode v0.1.0 `https://github.com/Pocket-titan/DarkMode#master`
  [a93c6f00] DataFrames v0.21.8
  [9b6fcbb8] GeoData v0.2.1
  [7073ff75] IJulia v1.21.4
  [85f8d34a] NCDatasets v0.10.4
  [9e8cae18] Oceananigans v0.43.0
  [c3e4b0f8] Pluto v0.12.4
  [08abe8d2] PrettyTables v0.9.1 `https://github.com/ronisbr/PrettyTables.jl.git#master`
  [438e738f] PyCall v1.92.1
  [d330b81b] PyPlot v2.9.0
  [295af30f] Revise v3.1.3
  [a759f4b9] TimerOutputs v0.5.6
@ali-ramadhan
Copy link
Collaborator Author

ali-ramadhan commented Nov 3, 2020

Hmmm, not sure if I'm doing this right but trying to convert an NCDatasets.CFVariable to an NCDstack seems to result in infinite recursion (or at least pretty deep recursion, I had to interrupt it):

julia> using NCDatasets, GeoData

julia> ds = NCDataset("europa_constant_bottom_heat_flux_10W_10days_lat45_meridional.nc")

julia> julia> NCDstack(ds["u"])
^CERROR: InterruptException:
Stacktrace:
 [1] NCDstack(::Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{NCDatasets.CFVariable{Float32,4,NCDatasets.Variable{Float32,4,NCDataset},NCDatasets.Attributes{NCDataset{Nothing}},NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor),NTuple{6,Nothing}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:318
 [2] NCDstack(::Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{Tuple{NCDatasets.CFVariable{Float32,4,NCDatasets.Variable{Float32,4,NCDataset},NCDatasets.Attributes{NCDataset{Nothing}},NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor),NTuple{6,Nothing}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}) at /home/alir/.julia/packages/GeoData/Vnpza/src/sources/ncdatasets.jl:318
 ... (the last 2 lines are repeated 203 more times)

@rafaqz
Copy link
Owner

rafaqz commented Nov 3, 2020

Stack/Array is orthogonal to dimensionality. NCDstack lets you indexing into multiple named layers, so indexing is like a NamedTuple, and a GeoArray is returned. NCDarray just uses one layer (either with a key you set, or the first one), so indexing is like an Array (some netcdf only have one layer).

In this case do you want NCDstack as you have variables v, w, T and u.

julia> ds = NCDataset("europa_constant_bottom_heat_flux_10W_10days_lat45_meridional.nc")

You can't do that with NCDataset. What you tried first should work, and seeing it didn't, this is a bug - probably due to the complexity of the file and some mistake I have made. NetCDF is pretty complicated.

It looks like your dimension index for xF and xC have length 1, and _ncspan tries to index into 2, which is BAD. If you can upload the file somewhere, I can check if the fix works in your specific case.

@ali-ramadhan
Copy link
Collaborator Author

Ah thank you for the explanation @rafaqz (and for the quick reply!). I think I got confused between layer and vertical level haha.

Not sure if it's a proper use of NetCDF but the output represents a meridional or x-slice so there's only 1 value in the x dimension.

Will try out your fix on #83!

@rafaqz
Copy link
Owner

rafaqz commented Nov 3, 2020

Ok let me know how it goes.

It's also interesting to hear that you want to move away from xarray to GeoData/DimensionalData. I would really like to hear any comparisons and feedback you have, as I don't actually use xarray.

@rafaqz rafaqz closed this as completed in #83 Nov 3, 2020
@ali-ramadhan
Copy link
Collaborator Author

Ah I've mostly been using xarray to analyze and plot NetCDF output as I really like being able to index by name instead of memorizing which axis to index into, e.g. ds[:u][X=16, Ti=end] instead of ds[:u][16, :, :, end] (gotta remember to reverse the order in Python!). It helps avoid lots of bugs.

Seems this functionality is available through GeoData + NCDatasets so I can do more stuff in Julia now.

Some people need xarray as their datasets are too big to fit in memory and xarray can use dask to distribute huge workloads across cores/nodes, but my datasets aren't that big right now.

@glwagner and I are also considering using DimensionalData.jl to get a lot of the named indexing into Oceananigans.jl fields but that would take some refactoring so we haven't done it yet.

@rafaqz
Copy link
Owner

rafaqz commented Nov 3, 2020

Cool. I think there is a way of using End how you want. I see how you would try that if you can do it in python.

struct End{T} 
    start::T
    step::T
end
End(start) = End(start, nothing)
End() = End(nothing, nothing)

(:)(start, ::Type{End}) = End(start)
(:)(start, step, ::Type{End}) = End(start, step)

Then we just have to handle it in getindex later, and grab lastindex for the dimension. So you could do:

ds[:u][X=16:End, Ti=End]

And that's a good idea to integrate with Oceananigans.jl. ClimateTools.jl builds on DimensionalData.jl as well I think. You can just copy the implementation here, it's fairly simple to extend to a ne type like I do for GeoArray. You mostly just need to add a dims and name field and methods, a rebuild method, and inherit from AbstractDimArray.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants