Implement FilePattern.file_type #322

Merged
merged 11 commits on Mar 11, 2022

cisaacstern
Member

Closes #320

Wanted to prioritize this because it will resolve:

Because FilePattern.file_type defaults to FileType.netcdf4, this PR is backward compatible with recent releases and documentation. We might add a note about this somewhere in the docs (e.g., as a block quote) to the effect of:

Note: By default, we assume your inputs are NetCDF4 files. If they are NetCDF3, simply specify FilePattern(..., file_type="netcdf3").
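To make the default concrete, here is a minimal, hypothetical sketch (not the code in this PR). It uses a string-valued FileType enum and a pared-down SketchFilePattern whose file_type defaults to netcdf4 and also accepts plain strings like "netcdf3". It also assumes a mapping from file type to an xarray backend engine: scipy for NetCDF3 and h5netcdf for NetCDF4. The class name SketchFilePattern and the mapping OPEN_KWARGS_BY_TYPE are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class FileType(str, Enum):
    """Hypothetical stand-in for the FileType enum added by this PR."""

    netcdf3 = "netcdf3"
    netcdf4 = "netcdf4"


# Assumed mapping from file type to xarray.open_dataset keyword arguments:
# the scipy backend reads NetCDF3 files; h5netcdf reads NetCDF4 (HDF5) files.
OPEN_KWARGS_BY_TYPE: Dict[FileType, Dict[str, str]] = {
    FileType.netcdf3: {"engine": "scipy"},
    FileType.netcdf4: {"engine": "h5netcdf"},
}


@dataclass
class SketchFilePattern:
    """Hypothetical, pared-down FilePattern that only models file_type."""

    format_function: Callable[..., str]
    file_type: FileType = FileType.netcdf4  # default preserves prior behavior

    def __post_init__(self) -> None:
        # Normalize strings like "netcdf3" to the enum; unknown values raise ValueError.
        self.file_type = FileType(self.file_type)

    @property
    def open_kwargs(self) -> Dict[str, str]:
        """Keyword arguments this sketch would pass to xr.open_dataset."""
        return dict(OPEN_KWARGS_BY_TYPE[self.file_type])


# Usage: the default is NetCDF4; NetCDF3 inputs opt in with a plain string.
default_pattern = SketchFilePattern(lambda time: f"{time}.nc")
netcdf3_pattern = SketchFilePattern(lambda time: f"{time}.nc", file_type="netcdf3")
```

Because the enum subclasses str, both FileType.netcdf3 and the plain string "netcdf3" are accepted. A typo fails fast with ValueError when the pattern is built, rather than surfacing later inside xr.open_dataset.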

The only other thing I'd like to consider is wrapping the xr.open_dataset call in XarrayZarrRecipe in a try/except, so that we can raise a more descriptive error in situations like the ones that motivated this PR.
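One possible shape for that try/except is shown below as a minimal sketch, not the recipe's actual code. The helper name open_input_with_hint is hypothetical. The opener is passed in as an argument (e.g. xarray.open_dataset) so the sketch itself has no xarray dependency. The exception is caught broadly, on the assumption that different backends raise different error types when handed the wrong format.

```python
from typing import Any, Callable


def open_input_with_hint(
    open_dataset: Callable[..., Any], f: Any, file_type: str, **kwargs: Any
) -> Any:
    """Call open_dataset(f, **kwargs); on failure, re-raise with a file_type hint.

    open_dataset is passed in (e.g. xarray.open_dataset) so this sketch does
    not depend on xarray itself.
    """
    try:
        return open_dataset(f, **kwargs)
    except Exception as e:  # broad on purpose: each backend raises its own error types
        raise ValueError(
            f"Failed to open input {f!r} as file_type={file_type!r} "
            f"with open kwargs {kwargs!r}. Check that file_type matches the "
            "input's actual format; for example, NetCDF3 inputs need "
            "FilePattern(..., file_type='netcdf3')."
        ) from e
```

Inside the recipe this might be called as open_input_with_hint(xr.open_dataset, f, file_type, engine="h5netcdf"). The `from e` chaining keeps the backend's original traceback visible beneath the friendlier message.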

@cisaacstern
Member Author

The test failures are the same upstream error seen in #303 (comment). Setting that aside for the moment...

@rabernat, are you aware of anything about the "classic CDF-1" format that would make it incompatible with appending to an existing Zarr store via ds.to_zarr?

Here's what we know:

Using the procedure recommended in the Unidata docs, we can confirm that this file is in "classic CDF-1" format:

$ wget http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/ZCYL5_20210101v30001.nc
$ od -An -c -N4 ZCYL5_20210101v30001.nc
           C   D   F 001

Those docs don't mention netcdf3 per se, so I'm assuming "classic CDF-1" is what we mean by netcdf3...?
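The od check can also be done from Python, which helps when deciding what to pass as file_type. Below is a minimal stdlib-only sketch. The helper names classify_header and sniff_netcdf_format are hypothetical. The magic-number table reflects the netCDF and HDF5 file signatures, not anything in pangeo-forge-recipes.

```python
# Leading bytes of each on-disk format. NetCDF classic-family files start with
# "CDF" plus a version byte; NetCDF-4 files are HDF5 files, whose signature is
# assumed here to sit at offset 0 (HDF5 also allows it at 512, 1024, ... bytes).
MAGIC_NUMBERS = {
    b"CDF\x01": "netCDF classic (CDF-1)",
    b"CDF\x02": "netCDF 64-bit offset (CDF-2)",
    b"CDF\x05": "netCDF 64-bit data (CDF-5)",
    b"\x89HDF\r\n\x1a\n": "HDF5 (netCDF-4)",
}


def classify_header(header: bytes) -> str:
    """Name the format whose magic number prefixes header, or 'unknown'."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_netcdf_format(path: str) -> str:
    """Read the first 8 bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))
```

Given the od output for ZCYL5_20210101v30001.nc, sniff_netcdf_format on that file should report the classic CDF-1 format.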

With xarray, we can open the file, dump it to Zarr, and get the data back out of Zarr:

import xarray as xr
ds = xr.open_dataset("ZCYL5_20210101v30001.nc", engine="scipy")
print(ds)
<xarray.Dataset>
Dimensions:      (time: 1439, h_num: 50)
Coordinates:
  * time         (time) datetime64[ns] 2021-01-01 ... 2021-01-01T23:59:00
Dimensions without coordinates: h_num
Data variables: (12/38)
    lat          (time) float32 ...
    lon          (time) float32 ...
    PL_HD        (time) float32 ...
    PL_CRS       (time) float32 ...
    DIR          (time) float32 ...
    DIR2         (time) float32 ...
    ...           ...
    RAD_PAR      (time) float32 ...
    RAD_PAR2     (time) float32 ...
    date         (time) int32 ...
    time_of_day  (time) int32 ...
    flag         (time) |S35 ...
    history      (h_num) |S236 ...
Attributes: (12/22)
    title:                       FALKOR Meteorological Data
    site:                        FALKOR
    elev:                        0
    ID:                          ZCYL5
    IMO:                         007928677
    platform:                    unknown at this time
    ...                          ...
    Cruise_id:                   Cruise_id undefined for now
    Data_modification_date:      01/12/2021 13:21:16 EST
    Metadata_modification_date:  01/12/2021 13:21:16 EST
    metadata_retrieved_from:     ZCYL5_20210101v10001.nc
    files_merged:                [ZCYL5_20210101v10001.nc]
    merger_version:              v001
ds.to_zarr("my.zarr")
ds_zarr = xr.open_zarr("my.zarr")
print(ds_zarr)
<xarray.Dataset>
Dimensions:      (time: 1439, h_num: 50)
Coordinates:
  * time         (time) datetime64[ns] 2021-01-01 ... 2021-01-01T23:59:00
Dimensions without coordinates: h_num
Data variables: (12/38)
    CNDC         (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    DIR          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    DIR2         (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    DIR3         (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    P            (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    P2           (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    ...           ...
    date         (time) int32 dask.array<chunksize=(1439,), meta=np.ndarray>
    flag         (time) |S35 dask.array<chunksize=(1439,), meta=np.ndarray>
    history      (h_num) |S236 dask.array<chunksize=(50,), meta=np.ndarray>
    lat          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    lon          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>
    time_of_day  (time) int32 dask.array<chunksize=(1439,), meta=np.ndarray>
Attributes: (12/22)
    Cruise_id:                   Cruise_id undefined for now
    Data_modification_date:      01/12/2021 13:21:16 EST
    EXPOCODE:                    EXPOCODE undefined for now
    ID:                          ZCYL5
    IMO:                         007928677
    Metadata_modification_date:  01/12/2021 13:21:16 EST
    ...                          ...
    platform:                    unknown at this time
    platform_version:            unknown at this time
    receipt_order:               01
    site:                        FALKOR
    start_date_time:             2021/01/01 -- 00:00 UTC
    title:                       FALKOR Meteorological Data

But with pangeo-forge-recipes installed from this feature branch, running the recipe provided in #315 (comment) with the following edit:

import pandas as pd
import xarray as xr
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe, setup_logging

def make_url(time):
    year = time.strftime('%Y')
    year_month_day = time.strftime('%Y%m%d')
    return f'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/{year}/ZCYL5_{year_month_day}v30001.nc'

dates = pd.date_range('2021-01-01','2021-01-03', freq='D')

time_concat_dim = ConcatDim("time", dates, nitems_per_file=1)

pattern = FilePattern(
    make_url,
    time_concat_dim,
    file_type="netcdf3",  # the edit: declare the inputs as NetCDF3
)

recipe = XarrayZarrRecipe(pattern, inputs_per_chunk=30)

setup_logging()

recipe_pruned = recipe.copy_pruned()

run_function = recipe_pruned.to_function()

run_function()

errors with

Logs + Traceback
[03/09/22 10:13:15] INFO     Caching input 'Index({DimIndex(name='time', index=0, sequence_len=2,                          xarray_zarr.py:149
                             operation=<CombineOp.CONCAT: 2>)})'                                                                             
                    INFO     Caching file 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/ZCYL5_20 storage.py:154
                             210101v30001.nc'                                                                                                
                    INFO     Copying remote file 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/Z storage.py:165
                             CYL5_20210101v30001.nc' to cache                                                                                
[03/09/22 10:13:16] DEBUG    entering fs.open context manager for /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmptvm460p6 storage.py:122
                             /oiAAMfqs/777be2b9214151be7e2c4f211c36a334-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_r               
                             esearch_zcyl5_2021_zcyl5_20210101v30001.nc                                                                      
                    DEBUG    FSSpecTarget.open yielding <fsspec.implementations.local.LocalFileOpener object at 0x10bb8afa0>   storage.py:124
                    DEBUG    _copy_btw_filesystems total bytes copied: 305660                                                   storage.py:51
                    DEBUG    avg throughput over 0.01 min: 0.69 MB/sec                                                          storage.py:52
                    DEBUG    FSSpecTarget.open yielded                                                                         storage.py:126
                    DEBUG    _copy_btw_filesystems done                                                                         storage.py:56
                    INFO     Caching input 'Index({DimIndex(name='time', index=1, sequence_len=2,                          xarray_zarr.py:149
                             operation=<CombineOp.CONCAT: 2>)})'                                                                             
                    INFO     Caching file 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/ZCYL5_20 storage.py:154
                             210102v30001.nc'                                                                                                
                    INFO     Copying remote file 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/Z storage.py:165
                             CYL5_20210102v30001.nc' to cache                                                                                
                    DEBUG    entering fs.open context manager for /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmptvm460p6 storage.py:122
                             /oiAAMfqs/8c0e3b3efe320f6d7e72b7b9c38c77e0-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_r               
                             esearch_zcyl5_2021_zcyl5_20210102v30001.nc                                                                      
                    DEBUG    FSSpecTarget.open yielding <fsspec.implementations.local.LocalFileOpener object at 0x17a8418e0>   storage.py:124
                    DEBUG    _copy_btw_filesystems total bytes copied: 305868                                                   storage.py:51
                    DEBUG    avg throughput over 0.00 min: 1.65 MB/sec                                                          storage.py:52
                    DEBUG    FSSpecTarget.open yielded                                                                         storage.py:126
                    DEBUG    _copy_btw_filesystems done                                                                         storage.py:56
/Users/charlesstern/Dropbox/pangeo/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:111: RuntimeWarning: Failed to open Zarr store with consolidated metadata, falling back to try reading non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider:
1. Consolidating metadata in this existing store with zarr.consolidate_metadata().
2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or
3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
  return xr.open_zarr(target.get_mapper())
                    INFO     Creating a new dataset in target                                                              xarray_zarr.py:452
                    INFO     Opening inputs for chunk Index({DimIndex(name='time', index=0, sequence_len=1,                xarray_zarr.py:334
                             operation=<CombineOp.CONCAT: 2>)})                                                                              
                    INFO     Opening input with Xarray Index({DimIndex(name='time', index=0, sequence_len=2,               xarray_zarr.py:249
                             operation=<CombineOp.CONCAT: 2>)}): 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/r                   
                             esearch/ZCYL5/2021/ZCYL5_20210101v30001.nc'                                                                     
                    INFO     Opening 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/ZCYL5_2021010 storage.py:260
                             1v30001.nc' from cache                                                                                          
                    DEBUG    file_opener entering first context for <contextlib._GeneratorContextManager object at             storage.py:275
                             0x10bc96790>                                                                                                    
                    DEBUG    entering fs.open context manager for /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmptvm460p6 storage.py:122
                             /oiAAMfqs/777be2b9214151be7e2c4f211c36a334-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_r               
                             esearch_zcyl5_2021_zcyl5_20210101v30001.nc                                                                      
                    DEBUG    FSSpecTarget.open yielding <fsspec.implementations.local.LocalFileOpener object at 0x17be41e50>   storage.py:124
                    DEBUG    file_opener entering second context for <fsspec.implementations.local.LocalFileOpener object at   storage.py:277
                             0x17be41e50>                                                                                                    
                    DEBUG    about to enter xr.open_dataset context on <fsspec.implementations.local.LocalFileOpener       xarray_zarr.py:303
                             object at 0x17be41e50>                                                                                          
                    DEBUG    successfully opened dataset                                                                   xarray_zarr.py:305
                    DEBUG    <xarray.Dataset>                                                                              xarray_zarr.py:315
                             Dimensions:      (time: 1439, h_num: 50)                                                                        
                             Coordinates:                                                                                                    
                               * time         (time) datetime64[ns] 2021-01-01 ... 2021-01-01T23:59:00                                       
                             Dimensions without coordinates: h_num                                                                           
                             Data variables: (12/38)                                                                                         
                                 lat          (time) float32 ...                                                                             
                                 lon          (time) float32 ...                                                                             
                                 PL_HD        (time) float32 ...                                                                             
                                 PL_CRS       (time) float32 ...                                                                             
                                 DIR          (time) float32 ...                                                                             
                                 DIR2         (time) float32 ...                                                                             
                                 ...           ...                                                                                           
                                 RAD_PAR      (time) float32 ...                                                                             
                                 RAD_PAR2     (time) float32 ...                                                                             
                                 date         (time) int32 ...                                                                               
                                 time_of_day  (time) int32 ...                                                                               
                                 flag         (time) |S35 ...                                                                                
                                 history      (h_num) |S236 ...                                                                              
                             Attributes: (12/22)                                                                                             
                                 title:                       FALKOR Meteorological Data                                                     
                                 site:                        FALKOR                                                                         
                                 elev:                        0                                                                              
                                 ID:                          ZCYL5                                                                          
                                 IMO:                         007928677                                                                      
                                 platform:                    unknown at this time                                                           
                                 ...                          ...                                                                            
                                 Cruise_id:                   Cruise_id undefined for now                                                    
                                 Data_modification_date:      01/12/2021 13:21:16 EST                                                        
                                 Metadata_modification_date:  01/12/2021 13:21:16 EST                                                        
                                 metadata_retrieved_from:     ZCYL5_20210101v10001.nc                                                        
                                 files_merged:                [ZCYL5_20210101v10001.nc]                                                      
                                 merger_version:              v001                                                                           
                    INFO     Opening input with Xarray Index({DimIndex(name='time', index=1, sequence_len=2,               xarray_zarr.py:249
                             operation=<CombineOp.CONCAT: 2>)}): 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/r                   
                             esearch/ZCYL5/2021/ZCYL5_20210102v30001.nc'                                                                     
                    INFO     Opening 'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/2021/ZCYL5_2021010 storage.py:260
                             2v30001.nc' from cache                                                                                          
                    DEBUG    file_opener entering first context for <contextlib._GeneratorContextManager object at             storage.py:275
                             0x17be41040>                                                                                                    
                    DEBUG    entering fs.open context manager for /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmptvm460p6 storage.py:122
                             /oiAAMfqs/8c0e3b3efe320f6d7e72b7b9c38c77e0-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_r               
                             esearch_zcyl5_2021_zcyl5_20210102v30001.nc                                                                      
                    DEBUG    FSSpecTarget.open yielding <fsspec.implementations.local.LocalFileOpener object at 0x17be93f70>   storage.py:124
                    DEBUG    file_opener entering second context for <fsspec.implementations.local.LocalFileOpener object at   storage.py:277
                             0x17be93f70>                                                                                                    
                    DEBUG    about to enter xr.open_dataset context on <fsspec.implementations.local.LocalFileOpener       xarray_zarr.py:303
                             object at 0x17be93f70>                                                                                          
                    DEBUG    successfully opened dataset                                                                   xarray_zarr.py:305
                    DEBUG    <xarray.Dataset>                                                                              xarray_zarr.py:315
                             Dimensions:      (time: 1440, h_num: 50)                                                                        
                             Coordinates:                                                                                                    
                               * time         (time) datetime64[ns] 2021-01-02 ... 2021-01-02T23:59:00                                       
                             Dimensions without coordinates: h_num                                                                           
                             Data variables: (12/38)                                                                                         
                                 lat          (time) float32 ...                                                                             
                                 lon          (time) float32 ...                                                                             
                                 PL_HD        (time) float32 ...                                                                             
                                 PL_CRS       (time) float32 ...                                                                             
                                 DIR          (time) float32 ...                                                                             
                                 DIR2         (time) float32 ...                                                                             
                                 ...           ...                                                                                           
                                 RAD_PAR      (time) float32 ...                                                                             
                                 RAD_PAR2     (time) float32 ...                                                                             
                                 date         (time) int32 ...                                                                               
                                 time_of_day  (time) int32 ...                                                                               
                                 flag         (time) |S35 ...                                                                                
                                 history      (h_num) |S236 ...                                                                              
                             Attributes: (12/22)                                                                                             
                                 title:                       FALKOR Meteorological Data                                                     
                                 site:                        FALKOR                                                                         
                                 elev:                        0                                                                              
                                 ID:                          ZCYL5                                                                          
                                 IMO:                         007928677                                                                      
                                 platform:                    unknown at this time                                                           
                                 ...                          ...                                                                            
                                 Cruise_id:                   Cruise_id undefined for now                                                    
                                 Data_modification_date:      01/12/2021 13:41:04 EST                                                        
                                 Metadata_modification_date:  01/12/2021 13:41:04 EST                                                        
                                 metadata_retrieved_from:     ZCYL5_20210102v10002.nc                                                        
                                 files_merged:                [ZCYL5_20210102v10001.nc, ZCYL5_20210102v100...                                
                                 merger_version:              v001                                                                           
                    INFO     Combining inputs for chunk 'Index({DimIndex(name='time', index=0, sequence_len=1,             xarray_zarr.py:352
                             operation=<CombineOp.CONCAT: 2>)})'                                                                             
[03/09/22 10:13:17] DEBUG    <xarray.Dataset>                                                                              xarray_zarr.py:368
                             Dimensions:      (time: 2879, h_num: 50)                                                                        
                             Coordinates:                                                                                                    
                               * time         (time) datetime64[ns] 2021-01-01 ... 2021-01-02T23:59:00                                       
                             Dimensions without coordinates: h_num                                                                           
                             Data variables: (12/38)                                                                                         
                                 lat          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 lon          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 PL_HD        (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 PL_CRS       (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 DIR          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 DIR2         (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 ...           ...                                                                                           
                                 RAD_PAR      (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 RAD_PAR2     (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 date         (time) int32 dask.array<chunksize=(1439,), meta=np.ndarray>                                    
                                 time_of_day  (time) int32 dask.array<chunksize=(1439,), meta=np.ndarray>                                    
                                 flag         (time) |S35 dask.array<chunksize=(1439,), meta=np.ndarray>                                     
                                 history      (time, h_num) |S236 dask.array<chunksize=(1439, 50), meta=np.ndarray>                          
                             Attributes: (12/22)                                                                                             
                                 title:                       FALKOR Meteorological Data                                                     
                                 site:                        FALKOR                                                                         
                                 elev:                        0                                                                              
                                 ID:                          ZCYL5                                                                          
                                 IMO:                         007928677                                                                      
                                 platform:                    unknown at this time                                                           
                                 ...                          ...                                                                            
                                 Cruise_id:                   Cruise_id undefined for now                                                    
                                 Data_modification_date:      01/12/2021 13:21:16 EST                                                        
                                 Metadata_modification_date:  01/12/2021 13:21:16 EST                                                        
                                 metadata_retrieved_from:     ZCYL5_20210101v10001.nc                                                        
                                 files_merged:                [ZCYL5_20210101v10001.nc]                                                      
                                 merger_version:              v001                                                                           
                    DEBUG    Setting variable time encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable lat encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable lon encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable PL_HD encoding chunks to (2879,)                                             xarray_zarr.py:482
                    DEBUG    Setting variable PL_CRS encoding chunks to (2879,)                                            xarray_zarr.py:482
                    DEBUG    Setting variable DIR encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable DIR2 encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable DIR3 encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable PL_WDIR encoding chunks to (2879,)                                           xarray_zarr.py:482
                    DEBUG    Setting variable PL_WDIR2 encoding chunks to (2879,)                                          xarray_zarr.py:482
                    DEBUG    Setting variable PL_WDIR3 encoding chunks to (2879,)                                          xarray_zarr.py:482
                    DEBUG    Setting variable PL_SPD encoding chunks to (2879,)                                            xarray_zarr.py:482
                    DEBUG    Setting variable SPD encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable SPD2 encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable SPD3 encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable PL_WSPD encoding chunks to (2879,)                                           xarray_zarr.py:482
                    DEBUG    Setting variable PL_WSPD2 encoding chunks to (2879,)                                          xarray_zarr.py:482
                    DEBUG    Setting variable PL_WSPD3 encoding chunks to (2879,)                                          xarray_zarr.py:482
                    DEBUG    Setting variable P encoding chunks to (2879,)                                                 xarray_zarr.py:482
                    DEBUG    Setting variable P2 encoding chunks to (2879,)                                                xarray_zarr.py:482
                    DEBUG    Setting variable P3 encoding chunks to (2879,)                                                xarray_zarr.py:482
                    DEBUG    Setting variable T encoding chunks to (2879,)                                                 xarray_zarr.py:482
                    DEBUG    Setting variable T2 encoding chunks to (2879,)                                                xarray_zarr.py:482
                    DEBUG    Setting variable T3 encoding chunks to (2879,)                                                xarray_zarr.py:482
                    DEBUG    Setting variable RH encoding chunks to (2879,)                                                xarray_zarr.py:482
                    DEBUG    Setting variable RH2 encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable RH3 encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable TS encoding chunks to (2879,)                                                xarray_zarr.py:482
                    DEBUG    Setting variable TS2 encoding chunks to (2879,)                                               xarray_zarr.py:482
                    DEBUG    Setting variable SSPS encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable CNDC encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable RAD_SW encoding chunks to (2879,)                                            xarray_zarr.py:482
                    DEBUG    Setting variable RAD_LW encoding chunks to (2879,)                                            xarray_zarr.py:482
                    DEBUG    Setting variable RAD_PAR encoding chunks to (2879,)                                           xarray_zarr.py:482
                    DEBUG    Setting variable RAD_PAR2 encoding chunks to (2879,)                                          xarray_zarr.py:482
                    DEBUG    Setting variable date encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable time_of_day encoding chunks to (2879,)                                       xarray_zarr.py:482
                    DEBUG    Setting variable flag encoding chunks to (2879,)                                              xarray_zarr.py:482
                    DEBUG    Setting variable history encoding chunks to (2879, 50)                                        xarray_zarr.py:482
                    INFO     Storing dataset in /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmptvm460p6/xtIlMDcU      xarray_zarr.py:494
                    DEBUG    <xarray.Dataset>                                                                              xarray_zarr.py:495
                             Dimensions:      (time: 2879, h_num: 50)                                                                        
                             Coordinates:                                                                                                    
                               * time         (time) datetime64[ns] 2021-01-01 ... 2021-01-02T23:59:00                                       
                             Dimensions without coordinates: h_num                                                                           
                             Data variables: (12/38)                                                                                         
                                 lat          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 lon          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 PL_HD        (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 PL_CRS       (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 DIR          (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 DIR2         (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 ...           ...                                                                                           
                                 RAD_PAR      (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 RAD_PAR2     (time) float32 dask.array<chunksize=(1439,), meta=np.ndarray>                                  
                                 date         (time) int32 dask.array<chunksize=(1439,), meta=np.ndarray>                                    
                                 time_of_day  (time) int32 dask.array<chunksize=(1439,), meta=np.ndarray>                                    
                                 flag         (time) |S35 dask.array<chunksize=(1439,), meta=np.ndarray>                                     
                                 history      (time, h_num) |S236 dask.array<chunksize=(1439, 50), meta=np.ndarray>                          
                             Attributes: (12/22)                                                                                             
                                 title:                       FALKOR Meteorological Data                                                     
                                 site:                        FALKOR                                                                         
                                 elev:                        0                                                                              
                                 ID:                          ZCYL5                                                                          
                                 IMO:                         007928677                                                                      
                                 platform:                    unknown at this time                                                           
                                 ...                          ...                                                                            
                                 Cruise_id:                   Cruise_id undefined for now                                                    
                                 Data_modification_date:      01/12/2021 13:21:16 EST                                                        
                                 Metadata_modification_date:  01/12/2021 13:21:16 EST                                                        
                                 metadata_retrieved_from:     ZCYL5_20210101v10001.nc                                                        
                                 files_merged:                [ZCYL5_20210101v10001.nc]                                                      
                                 merger_version:              v001                                                                           
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/mapping.py:135, in FSMap.__getitem__(self, key, default)
    134 try:
--> 135     result = self.fs.cat(k)
    136 except self.missing_exceptions:

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/spec.py:739, in AbstractFileSystem.cat(self, path, recursive, on_error, **kwargs)
    738 else:
--> 739     return self.cat_file(paths[0], **kwargs)

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/spec.py:649, in AbstractFileSystem.cat_file(self, path, start, end, **kwargs)
    648 # explicitly set buffering off?
--> 649 with self.open(path, "rb", **kwargs) as f:
    650     if start is not None:

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/spec.py:1009, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1008 ac = kwargs.pop("autocommit", not self._intrans)
-> 1009 f = self._open(
   1010     path,
   1011     mode=mode,
   1012     block_size=block_size,
   1013     autocommit=ac,
   1014     cache_options=cache_options,
   1015     **kwargs,
   1016 )
   1017 if compression is not None:

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/implementations/local.py:155, in LocalFileSystem._open(self, path, mode, block_size, **kwargs)
    154     self.makedirs(self._parent(path), exist_ok=True)
--> 155 return LocalFileOpener(path, mode, fs=self, **kwargs)

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/implementations/local.py:250, in LocalFileOpener.__init__(self, path, mode, autocommit, fs, compression, **kwargs)
    249 self.blocksize = io.DEFAULT_BUFFER_SIZE
--> 250 self._open()

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/implementations/local.py:255, in LocalFileOpener._open(self)
    254 if self.autocommit or "w" not in self.mode:
--> 255     self.f = open(self.path, mode=self.mode)
    256     if self.compression:

FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmptvm460p6/xtIlMDcU/.zmetadata'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/zarr.py:348, in ZarrStore.open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
    347 try:
--> 348     zarr_group = zarr.open_consolidated(store, **open_kwargs)
    349 except KeyError:

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/convenience.py:1188, in open_consolidated(store, metadata_key, mode, **kwargs)
   1187 # setup metadata store
-> 1188 meta_store = ConsolidatedMetadataStore(store, metadata_key=metadata_key)
   1190 # pass through

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/storage.py:2645, in ConsolidatedMetadataStore.__init__(self, store, metadata_key)
   2644 # retrieve consolidated metadata
-> 2645 meta = json_loads(store[metadata_key])
   2647 # check format of consolidated metadata

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/storage.py:546, in KVStore.__getitem__(self, key)
    545 def __getitem__(self, key):
--> 546     return self._mutable_mapping[key]

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/fsspec/mapping.py:139, in FSMap.__getitem__(self, key, default)
    138         return default
--> 139     raise KeyError(key)
    140 return result

KeyError: '.zmetadata'

During handling of the above exception, another exception occurred:

GroupNotFoundError                        Traceback (most recent call last)
File ~/Dropbox/pangeo/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:443, in prepare_target(config)
    442 try:
--> 443     ds = open_target(config.storage_config.target)
    444     logger.info("Found an existing dataset in target")

File ~/Dropbox/pangeo/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:111, in open_target(target)
    110 def open_target(target: FSSpecTarget) -> xr.Dataset:
--> 111     return xr.open_zarr(target.get_mapper())

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/zarr.py:752, in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
    743 backend_kwargs = {
    744     "synchronizer": synchronizer,
    745     "consolidated": consolidated,
   (...)
    749     "stacklevel": 4,
    750 }
--> 752 ds = open_dataset(
    753     filename_or_obj=store,
    754     group=group,
    755     decode_cf=decode_cf,
    756     mask_and_scale=mask_and_scale,
    757     decode_times=decode_times,
    758     concat_characters=concat_characters,
    759     decode_coords=decode_coords,
    760     engine="zarr",
    761     chunks=chunks,
    762     drop_variables=drop_variables,
    763     backend_kwargs=backend_kwargs,
    764     decode_timedelta=decode_timedelta,
    765     use_cftime=use_cftime,
    766 )
    767 return ds

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
   (...)
    510     **kwargs,
    511 )

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/zarr.py:800, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
    799 filename_or_obj = _normalize_path(filename_or_obj)
--> 800 store = ZarrStore.open_group(
    801     filename_or_obj,
    802     group=group,
    803     mode=mode,
    804     synchronizer=synchronizer,
    805     consolidated=consolidated,
    806     consolidate_on_close=False,
    807     chunk_store=chunk_store,
    808     storage_options=storage_options,
    809     stacklevel=stacklevel + 1,
    810 )
    812 store_entrypoint = StoreBackendEntrypoint()

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/zarr.py:365, in ZarrStore.open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
    350         warnings.warn(
    351             "Failed to open Zarr store with consolidated metadata, "
    352             "falling back to try reading non-consolidated metadata. "
   (...)
    363             stacklevel=stacklevel,
    364         )
--> 365         zarr_group = zarr.open_group(store, **open_kwargs)
    366 elif consolidated:
    367     # TODO: an option to pass the metadata_key keyword

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/hierarchy.py:1182, in open_group(store, mode, cache_attrs, synchronizer, path, chunk_store, storage_options)
   1181             raise ContainsArrayError(path)
-> 1182         raise GroupNotFoundError(path)
   1184 elif mode == 'w':

GroupNotFoundError: group not found at path ''

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 29>()
     25 recipe_pruned = recipe.copy_pruned()
     27 run_function = recipe_pruned.to_function()
---> 29 run_function()

File ~/Dropbox/pangeo/pangeo-forge-recipes/pangeo_forge_recipes/executors/python.py:46, in FunctionPipelineExecutor.compile.<locals>.function()
     44         stage.function(m, config=pipeline.config)
     45 else:
---> 46     stage.function(config=pipeline.config)

File ~/Dropbox/pangeo/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:500, in prepare_target(config)
    496             with warnings.catch_warnings():
    497                 warnings.simplefilter(
    498                     "ignore"
    499                 )  # suppress the warning that comes with safe_chunks
--> 500                 ds.to_zarr(target_mapper, mode="a", compute=False, safe_chunks=False)
    502 # Regardless of whether there is an existing dataset or we are creating a new one,
    503 # we need to expand the concat_dim to hold the entire expected size of the data
    504 input_sequence_lens = calculate_sequence_lens(
    505     config.nitems_per_input, config.file_pattern, config.storage_config.metadata,
    506 )

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/dataset.py:2036, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2033 if encoding is None:
   2034     encoding = {}
-> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1406, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1391 zstore = backends.ZarrStore.open_group(
   1392     store=mapper,
   1393     mode=mode,
   (...)
   1402     stacklevel=4,  # for Dataset.to_zarr()
   1403 )
   1405 if mode in ["a", "r+"]:
-> 1406     _validate_datatypes_for_zarr_append(dataset)
   1407     if append_dim is not None:
   1408         existing_dims = zstore.get_dimensions()

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1301, in _validate_datatypes_for_zarr_append(dataset)
   1292         raise ValueError(
   1293             "Invalid dtype for data variable: {} "
   1294             "dtype must be a subtype of number, "
   (...)
   1297             "object".format(var)
   1298         )
   1300 for k in dataset.data_vars.values():
-> 1301     check_dtype(k)

File ~/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/api.py:1292, in _validate_datatypes_for_zarr_append.<locals>.check_dtype(var)
   1283 def check_dtype(var):
   1284     if (
   1285         not np.issubdtype(var.dtype, np.number)
   1286         and not np.issubdtype(var.dtype, np.datetime64)
   (...)
   1290     ):
   1291         # and not re.match('^bytes[1-9]+$', var.dtype.name)):
-> 1292         raise ValueError(
   1293             "Invalid dtype for data variable: {} "
   1294             "dtype must be a subtype of number, "
   1295             "datetime, bool, a fixed sized string, "
   1296             "a fixed size unicode string or an "
   1297             "object".format(var)
   1298         )

ValueError: Invalid dtype for data variable: <xarray.DataArray 'flag' (time: 2879)>
dask.array<concatenate, shape=(2879,), dtype=|S35, chunksize=(1440,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2021-01-01 ... 2021-01-02T23:59:00
Attributes:
    long_name:                quality control flags
    A:                        Units added
    B:                        Data out of range
    C:                        Non-sequential time
    D:                        Failed T>=Tw>=Td
    E:                        True wind error
    F:                        Velocity unrealistic
    G:                        Value > 4 s. d. from climatology
    H:                        Discontinuity
    I:                        Interesting feature
    J:                        Erroneous
    K:                        Suspect - visual
    L:                        Ocean platform over land
    M:                        Instrument malfunction
    N:                        In Port
    O:                        Multiple original units
    P:                        Movement uncertain
    Q:                        Pre-flagged as suspect
    R:                        Interpolated data
    S:                        Spike - visual
    T:                        Time duplicate
    U:                        Suspect - statistial
    V:                        Spike - statistical
    X:                        Step - statistical
    Y:                        Suspect between X-flags
    Z:                        Good data
    metadata_retrieved_from:  ZCYL5_20210101v10001.nc dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object

@rabernat
Contributor

rabernat commented Mar 9, 2022

The zarr errors are a red herring. The problem is that xarray cannot open and decode the file because it has an invalid dtype. Edit: that was wrong.

@cisaacstern
Member Author

Why does it work outside pangeo-forge-recipes with the simplified ds = xr.open_dataset(path); ds.to_zarr() ... because the zarr store has not been pre-initialized?

And also, in terms of a solution, this would be resolved by fixing the dtype with an XarrayZarrRecipe.process_input callable?
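A minimal sketch of that idea, assuming the `process_input` callable receives `(ds, fname)` and returns a dataset, and assuming that casting the fixed-width bytes variables (`flag` is `|S35` and `history` is `|S236` in the log above) to NumPy unicode is enough to get past xarray's append-time dtype check. The function name `bytes_to_unicode` is hypothetical.

```python
import numpy as np
import xarray as xr


def bytes_to_unicode(ds: xr.Dataset, fname: str) -> xr.Dataset:
    """Hypothetical process_input hook: cast fixed-width bytes variables to unicode."""
    # Collect data variables whose dtype kind is "S" (fixed-width bytes, e.g. |S35).
    byte_vars = [name for name, var in ds.data_vars.items() if var.dtype.kind == "S"]
    # NumPy decodes bytes as ASCII when casting to str, e.g. |S35 -> <U35.
    # assign() returns a new Dataset, so the input is left untouched.
    return ds.assign({name: ds[name].astype(str) for name in byte_vars})
```

It would then be passed as `XarrayZarrRecipe(..., process_input=bytes_to_unicode)`. Whether a unicode `flag` variable round-trips the way downstream users expect is a separate question.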

@cisaacstern
Member Author

xref pydata/xarray#6345

@cisaacstern
Member Author

With 154fa6a, I've added descriptive errors so that future users with mismatched xarray backend + FilePattern.file_type configurations are told how to resolve them. For the case of the motivating recipe from #315 (comment), if run ...

  • ... exactly as provided. The error now tells the user to pass file_type="netcdf3":

    OSError: Unable to open file /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmpdbit5d22/anv73EoN/777be2b9214151be7e2c4f211c36a334-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_research_zcyl5_2021_zcyl5_20210101v30001.nc with `{engine: h5netcdf}`, which was set automatically based on the fact that `FilePattern.file_type` is using the default value of 'netcdf4'. It seems likely that this input file is in NetCDF3 format. If that is the case, please re-instantiate your `FilePattern` with `FilePattern(..., file_type="netcdf3")`.
  • ... with just file_type='netcdf3' added (the desired configuration, following the suggestion from the first error), no error related to the xarray backend is raised, but for this particular recipe we then hit the unrelated issue tracked in pydata/xarray#6345 ("to_zarr raises ValueError: Invalid dtype with mode='a' (but not with mode='w')").
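The pattern behind the first error above can be sketched as a thin wrapper that catches `OSError` from the opener and re-raises it with a configuration hint. This is an illustration only: `open_with_hint` and its arguments are hypothetical, and `opener` stands in for something like `xr.open_dataset`.

```python
def open_with_hint(opener, path, file_type="netcdf4"):
    """Hypothetical wrapper: open `path`, re-raising OSError with a hint about file_type."""
    engine = {"netcdf3": "scipy", "netcdf4": "h5netcdf"}[file_type]
    try:
        return opener(path, engine=engine)
    except OSError as err:
        msg = f"Unable to open file {path} with `{{engine: {engine}}}`."
        if file_type == "netcdf4":
            # h5netcdf only reads HDF5-based files, so a classic NetCDF3 input is a likely cause.
            msg += (
                " It seems likely that this input file is in NetCDF3 format. If that is"
                " the case, please re-instantiate your `FilePattern` with"
                ' `FilePattern(..., file_type="netcdf3")`.'
            )
        # Chain the original error so the underlying backend message is not lost.
        raise OSError(msg) from err
```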

Some other failure modes:

With the motivating recipe...

  • ... as provided plus xarray_open_kwargs=dict(engine="h5netcdf"). The error tells the user to remove those kwargs:

    OSError: Unable to open file /var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmpdbit5d22/MOF4vvgt/777be2b9214151be7e2c4f211c36a334-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_research_zcyl5_2021_zcyl5_20210101v30001.nc with `{engine: h5netcdf}`, which was set explicitly via `xarray_open_kwargs`. Please remove `{engine: h5netcdf}` from `xarray_open_kwargs`.
  • ... as provided plus xarray_open_kwargs=dict(engine="scipy"). The error tells the user that this is mismatched with the default file_type:

    ValueError: pangeo-forge-recipes will automatically set the xarray backend for files of type 'netcdf4' to '{'engine': 'h5netcdf'}', which is different from the value you have passed via `xarray_open_kwargs`. If this input file is actually of type 'netcdf4', please remove `{'engine': 'scipy'}` from `xarray_open_kwargs`. If this input file is not of type 'netcdf4', please update this recipe by passing a different value to `FilePattern.file_type`.
  • ... with file_type='netcdf3' and xarray_open_kwargs=dict(engine="h5netcdf"). Error notes the incompatibility:

    ValueError: pangeo-forge-recipes will automatically set the xarray backend for files of type 'netcdf3' to '{'engine': 'scipy'}', which is different from the value you have passed via `xarray_open_kwargs`. If this input file is actually of type 'netcdf3', please remove `{'engine': 'h5netcdf'}` from `xarray_open_kwargs`. If this input file is not of type 'netcdf3', please update this recipe by passing a different value to `FilePattern.file_type`.
  • ... with file_type='netcdf3' and xarray_open_kwargs=dict(engine="scipy"). No error but the redundancy is noted in a warning:

    /pangeo_forge_recipes/recipes/xarray_zarr.py:326: UserWarning: pangeo-forge-recipes will automatically set the xarray backend for files of type 'netcdf3' to '{'engine': 'scipy'}', which is the same value you have passed via `xarray_open_kwargs`. If this input file is actually of type 'netcdf3', you can remove `{'engine': 'scipy'}` from `xarray_open_kwargs`. If this input file is not of type 'netcdf3', please update this recipe by passing a different value to `FilePattern.file_type`.
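The four cases above boil down to comparing any user-supplied `engine` with the one implied by `file_type`. Here is a minimal sketch of that check, assuming an `OPENER_MAP` like the one quoted later in this thread and skipping file types (such as OPeNDAP) that have no entry in it. `resolve_open_kwargs` is a hypothetical name, the `FileType` enum is a stand-in, and the messages are abbreviated.

```python
import warnings
from enum import Enum


class FileType(Enum):
    # Stand-in for pangeo-forge-recipes' FileType; only the values matter here.
    netcdf3 = "netcdf3"
    netcdf4 = "netcdf4"
    opendap = "opendap"


OPENER_MAP = {
    FileType.netcdf3: dict(engine="scipy"),
    FileType.netcdf4: dict(engine="h5netcdf"),
}


def resolve_open_kwargs(file_type: FileType, xarray_open_kwargs: dict) -> dict:
    """Hypothetical helper: merge the engine implied by file_type into the user's kwargs."""
    kwargs = dict(xarray_open_kwargs)
    if file_type not in OPENER_MAP:
        # e.g. OPeNDAP: leave the engine choice to xarray.
        return kwargs
    expected = OPENER_MAP[file_type]
    if "engine" in kwargs:
        if kwargs["engine"] != expected["engine"]:
            # Mismatch between file_type and an explicit engine: refuse to guess.
            raise ValueError(
                f"Files of type {file_type.value!r} are opened with {expected}, "
                f"which conflicts with engine={kwargs['engine']!r} in `xarray_open_kwargs`."
            )
        # Same engine given twice: allowed, but flagged as redundant.
        warnings.warn(
            f"engine={kwargs['engine']!r} is redundant for files of type {file_type.value!r}.",
            UserWarning,
        )
    kwargs.update(expected)
    return kwargs
```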

@cisaacstern
Member Author

@rabernat, things which are unresolved:

Things which IMO are resolved:

Look forward to your review.

Comment on lines -117 to -118
:param is_opendap: If True, assume all input fnames represent opendap endpoints.
Cannot be used with caching.
Contributor

If we wanted to be nice to our users, we would not just remove this but deprecate it. Now that we have a few users, do we want to be more conservative about breaking changes? Or do we just want to move fast and not worry about that?

@martindurant
Contributor

kerchunk upstream-dev issue

Sorry, did I miss something?

@rabernat
Contributor

Martin, see #305.

Contributor

@rabernat rabernat left a comment

Great!

One more suggestion then LGTM.

Can we add a test in test_patterns that makes sure you can initialize a FilePattern with every valid FileType and that specifying an unsupported type raises an error?

Comment on lines 158 to 164
if is_opendap:
    _deprecation_message = (
        "`FilePattern(..., is_opendap=True)` will be deprecated in v0.9.0. "
        "Please use `FilePattern(..., file_type='opendap')` instead."
    )
    warnings.warn(_deprecation_message, DeprecationWarning)
    self.file_type = FileType("opendap")
Contributor

👌
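For readers unfamiliar with this pattern, here is a self-contained sketch of the same shim: the old boolean keyword keeps working but emits a `DeprecationWarning` and maps onto the new enum, while unsupported strings are rejected by the enum itself. `Pattern` is a hypothetical stand-in for `FilePattern`, not its real signature.

```python
import warnings
from enum import Enum


class FileType(Enum):
    netcdf3 = "netcdf3"
    netcdf4 = "netcdf4"
    opendap = "opendap"


class Pattern:
    """Hypothetical stand-in for FilePattern, showing only the file_type logic."""

    def __init__(self, file_type: str = "netcdf4", is_opendap: bool = False):
        # FileType(...) raises ValueError for strings that are not enum values.
        self.file_type = FileType(file_type)
        if is_opendap:
            # The legacy flag still works, but warns and overrides file_type.
            warnings.warn(
                "`is_opendap=True` will be deprecated in v0.9.0. "
                "Please use `file_type='opendap'` instead.",
                DeprecationWarning,
            )
            self.file_type = FileType("opendap")
```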

Member Author

In doing this I realized that we never actually followed through on

_deprecation_message = (
"This method will be deprecated in v0.8.0. "
"Please call the equivalent function directly from the xarray_zarr module."
)

# Below lie convenience methods that help users develop and debug the recipe
# They will all be deprecated

but for clarity that's probably best as a separate PR.

@rabernat
Contributor

/run-test-tutorials

@rabernat
Contributor

Some of the tutorial notebooks appear to be failing as well. See https://github.com/pangeo-forge/pangeo-forge-recipes/actions/runs/1964614213

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@cisaacstern
Member Author

/run-test-tutorials

@cisaacstern
Member Author

Logging output makes notebook diffs hard to read, even with ReviewNB, so enumerating the tutorial notebook fixes here:

  • reference_cmip6.ipynb hadn't been updated for the storage_config refactor
  • netcdf_zarr_sequential.ipynb deliberately raises a FileNotFoundError to demonstrate what happens if you don't cache inputs. I'm confused as to how (if?) this ever passed /run-test-tutorials. The fix was to capture and print the str() representation of the error rather than raise it.
  • opendap_subset_recipe.ipynb actually caught a mistake in this PR: xarray doesn't need an engine kwarg for opendap inputs, therefore we should only set that keyword for file types which are keys of OPENER_MAP and otherwise skip that step.
  • terraclimate.ipynb was hitting an obscure UnicodeEncodeError, which appears to have something to do with the _repr_html_ for the dataset in that notebook. Using the text __repr__ works around that problem for now, but I'd like to reproduce this issue and raise it upstream once I know what's causing it.

Comment on lines 118 to 121
OPENER_MAP = {
    FileType.netcdf3: dict(engine="scipy"),
    FileType.netcdf4: dict(engine="h5netcdf"),
}
Contributor

Final nit I promise! 😇

Can we move OPENER_MAP to recipes/xarray_zarr.py?

Member Author

else:
    with pytest.raises(ValueError) as excinfo:
        fp = make_concat_merge_pattern(**file_type_kwargs)[0]
        assert f"'{file_type_value}' is not a valid FileType" in str(excinfo.value)
Contributor

I'm not sure this assertion ever gets hit. You want to use the match= option in pytest.raises to check the error message.
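A sketch of the suggested fix, assuming a `FileType` enum like the one in this PR (the enum below is a stand-in, not an import from pangeo-forge-recipes). An `assert` placed inside the `with` block after the raising call never runs; passing `match=` makes pytest check the message itself (via `re.search`, hence `re.escape`). Parametrizing over the enum also covers the earlier request to exercise every valid `FileType`.

```python
import re
from enum import Enum

import pytest


class FileType(Enum):
    netcdf3 = "netcdf3"
    netcdf4 = "netcdf4"
    opendap = "opendap"


@pytest.mark.parametrize("file_type", [ft.value for ft in FileType])
def test_valid_file_types(file_type):
    # Every declared value should round-trip through the enum constructor.
    assert FileType(file_type).value == file_type


def test_invalid_file_type():
    # match= is applied with re.search, so escape the literal text.
    with pytest.raises(ValueError, match=re.escape("'zarr' is not a valid")):
        FileType("zarr")
```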

Member Author

@cisaacstern
Member Author

I'm confused as to why we are still seeing upstream-dev fail with

AttributeError: 'ReferenceFileSystem' object has no attribute 'fs'

when this was solved by fsspec/kerchunk#132 and we are installing kerchunk from GitHub.

@rabernat
Contributor

Is it possible that the environment is cached? Can you double check the exact version / hash of kerchunk that is getting installed? Can you make the tests pass in a local env?

@cisaacstern
Member Author

Is it possible that the environment is cached?

IIUC we never cache upstream dev versions

- name: 🧑‍💻 Maybe update to upstream dev versions
  if: matrix.dependencies == 'upstream-dev'
  run: mamba env update -n pangeo-forge-recipes -f ci/upstream-dev.yml

Can you double check the exact version / hash of kerchunk that is getting installed?

Referencing the latest 3.8 upstream dev env build here:

kerchunk                  0.0.5+54.g3e9de53          pypi_0    pypi

Does the trailing +54.g3e9de53 point us to a particular commit? I'm not clear on that.
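The trailing label appears to follow the versioneer / git-describe convention TAG+DISTANCE.gHASH, which would read as 54 commits after the 0.0.5 tag, at the commit whose abbreviated hash is 3e9de53 (the "g" marks a git hash). Below is a small sketch that splits such a string apart; it assumes that format and is not part of kerchunk or conda. The same string can also be read from inside the environment with importlib.metadata.version("kerchunk").

```python
import re

# Assumed format: TAG+DISTANCE.gHASH, optionally followed by ".dirty"
VERSION_RE = re.compile(
    r"^(?P<tag>[^+]+)\+(?P<distance>\d+)\.g(?P<hash>[0-9a-f]+)(?P<dirty>\.dirty)?$"
)


def parse_dev_version(version: str) -> dict:
    """Split a git-describe style version into tag, commit distance, and hash."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"not a TAG+DISTANCE.gHASH version: {version!r}")
    return {
        "tag": m["tag"],
        "commits_since_tag": int(m["distance"]),
        "commit": m["hash"],
        "dirty": m["dirty"] is not None,  # uncommitted changes at build time
    }


print(parse_dev_version("0.0.5+54.g3e9de53"))
# {'tag': '0.0.5', 'commits_since_tag': 54, 'commit': '3e9de53', 'dirty': False}
```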

Can you make the tests pass in a local env?

Can you clarify what you mean by this?

@cisaacstern
Member Author

Can you make the tests pass in a local env?

Can you clarify what you mean by this?

Sorry, I misread this as "can you pass the tests a local env" ... I will check if the tests pass in a local env now.

@cisaacstern
Member Author

/run-test-tutorials

@cisaacstern
Member Author

All checks (including Tutorial Notebooks) pass with the exception of upstream-dev, which fails due to kerchunk's MultiZarrToZarr signature change noted in #324 (comment). We can leave this broken until the next kerchunk release.

All other notes have been addressed here, so I'm going to merge.

Noting that the only docs change I made was to update the release notes for 0.8.3 with a bullet about the new file_type attribute. This feature is fully backwards-compatible with 0.8.2, and we don't test against any input types aside from the default "netcdf4" and OPeNDAP, so I don't think we're leaving users under-informed. The OPeNDAP tutorial still uses the old is_opendap syntax (which is supported until deprecation in 0.9.0). I propose that we update this tutorial sometime just prior to the 0.9.0 release, as that will allow us to incorporate any other breaking changes that arise before then into that rewrite. Users who run it now will receive a DeprecationWarning when using the old syntax.
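The backwards-compatible deprecation path described above can be sketched as a keyword shim that still accepts is_opendap, emits a DeprecationWarning, and maps it onto file_type. This is a hypothetical stand-in for illustration only: SketchFilePattern is not the real FilePattern, and the "opendap" value name is an assumption.

```python
import warnings
from enum import Enum
from typing import Optional


class FileType(Enum):
    # Stand-in enum; member names assumed for illustration
    netcdf3 = "netcdf3"
    netcdf4 = "netcdf4"
    opendap = "opendap"


class SketchFilePattern:
    """Hypothetical stand-in showing only the file_type / is_opendap handling."""

    def __init__(self, file_type: str = "netcdf4", is_opendap: Optional[bool] = None):
        if is_opendap is not None:
            warnings.warn(
                "is_opendap is deprecated; pass file_type='opendap' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            if is_opendap:
                file_type = "opendap"
        # Unrecognized strings raise ValueError("'...' is not a valid FileType")
        self.file_type = FileType(file_type)


# Default stays netcdf4, so existing code that never passes file_type is unaffected
assert SketchFilePattern().file_type is FileType.netcdf4

# Old syntax still works but warns
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fp = SketchFilePattern(is_opendap=True)
assert fp.file_type is FileType.opendap
assert issubclass(caught[0].category, DeprecationWarning)
```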

@cisaacstern merged commit 3379754 into pangeo-forge:master on Mar 11, 2022
@rabernat mentioned this pull request on Mar 12, 2022

Successfully merging this pull request may close these issues.

Replace FilePattern.is_opendap with generalized FilePattern.file_type
3 participants