enable compact storage for netcdf-4 vars #1570
@@ -452,9 +452,25 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
/**
Define chunking parameters for a variable

The function nc_def_var_chunking sets the chunking parameters for a
variable in a netCDF-4 file. It can set the chunk sizes to get chunked
storage, or it can set the contiguous flag to get contiguous storage.
The function nc_def_var_chunking sets the storage and, optionally,
the chunking parameters for a variable in a netCDF-4 file.

The storage may be set to NC_CONTIGUOUS, NC_COMPACT, or NC_CHUNKED.

Contiguous storage means the variable is stored as one block of
data in the file.

Compact storage means the variable is stored in the header record
of the file. This can have large performance benefits on HPC systems
running many processors. Compact storage is only available for
variables whose data are 64 KB or less. Attempting to turn on
compact storage for a variable that is too large will result in the
::NC_EVARSIZE error.

Chunked storage means the data are stored as chunks of
user-configurable size. Chunked storage is required for variables
with one or more unlimited dimensions, or variables that use
compression.

> May want to also document that each mpi rank must output the same data to the variable if compact storage is used.

> Can you elaborate further?

> Never mind. I was confused. The datasets that I can declare as compact in Exodus are all "metadata" types that are the same on all ranks, but that isn't a requirement on the HDF5 side; confused myself (and others).

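The storage rules described above can be condensed into a single predicate. The sketch below is standalone C and does not call the netCDF library: `enum sketch_layout`, `layout_allowed()`, and the 65536-byte threshold are hypothetical stand-ins chosen to mirror the documented constraints (chunking needs at least one dimension, unlimited dimensions and compression rule out contiguous and compact storage, and compact data must fit in 64 KB). Overflow of the byte count is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three netCDF-4 storage layouts. */
enum sketch_layout { SKETCH_CHUNKED, SKETCH_CONTIGUOUS, SKETCH_COMPACT };

/* Compact-storage limit taken from the documentation: 64 KB. */
#define SKETCH_COMPACT_MAX_BYTES ((size_t)65536)

/* Return true if a variable may use `layout`. dimlens[d] is the current
 * length of dimension d, unlimited[d] marks unlimited dimensions, and
 * compressed is true if any compression filter is in use. */
static bool layout_allowed(enum sketch_layout layout, int ndims,
                           const size_t *dimlens, const bool *unlimited,
                           bool compressed, size_t type_size)
{
    bool has_unlimited = false;
    size_t nbytes = type_size;  /* total data size; overflow not handled */

    assert(ndims == 0 || (dimlens != NULL && unlimited != NULL));
    for (int d = 0; d < ndims; d++) {
        if (unlimited[d])
            has_unlimited = true;
        nbytes *= dimlens[d];
    }

    switch (layout) {
    case SKETCH_CHUNKED:
        /* Scalars cannot be chunked. */
        return ndims > 0;
    case SKETCH_CONTIGUOUS:
        /* Unlimited dimensions and compression require chunking. */
        return !has_unlimited && !compressed;
    case SKETCH_COMPACT:
        /* Same restrictions, plus the 64 KB size limit. */
        return !has_unlimited && !compressed &&
               nbytes <= SKETCH_COMPACT_MAX_BYTES;
    }
    return false;  /* unknown layout */
}
```

Chunked storage is the only layout that accepts unlimited dimensions or compression, which is why the predicate returns `true` for it whenever the variable is non-scalar.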
The total size of a chunk must be less than 4 GiB. That is, the
product of all chunksizes and the size of the data (or the size of

@@ -467,20 +483,21 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
Note that this does not work for scalar variables. Only non-scalar
variables can have chunking.

@param[in] ncid NetCDF ID, from a previous call to nc_open or
nc_create.
@param ncid NetCDF ID, from a previous call to nc_open() or
nc_create().

@param[in] varid Variable ID.
@param varid Variable ID.

@param[in] storage If ::NC_CONTIGUOUS, then contiguous storage is used
for this variable. Variables with one or more unlimited dimensions
cannot use contiguous storage. If contiguous storage is turned on, the
chunksizes parameter is ignored. If ::NC_CHUNKED, then chunked storage
is used for this variable. Chunk sizes may be specified with the
chunksizes parameter or default sizes will be used if that parameter
is NULL.
@param storage If ::NC_CONTIGUOUS or ::NC_COMPACT, then contiguous
or compact storage is used for this variable. Variables with one or
more unlimited dimensions cannot use contiguous or compact
storage. If contiguous or compact storage is turned on, the
chunksizes parameter is ignored. If ::NC_CHUNKED, then chunked
storage is used for this variable. Chunk sizes may be specified
with the chunksizes parameter or default sizes will be used if that
parameter is NULL.

@param[in] chunksizesp A pointer to an array list of chunk sizes. The
@param chunksizesp A pointer to an array list of chunk sizes. The
array must have one chunksize for each dimension of the variable. If
::NC_CONTIGUOUS storage is set, then the chunksizes parameter is
ignored.

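The chunk-size requirements for `nc_def_var_chunking()` (one size per dimension, no size larger than the length of a fixed dimension, and a total chunk size under 4 GiB) can be checked as in the sketch below. It is not netCDF's `check_chunksizes()`: the name `chunksizes_ok()`, the rejection of zero sizes, and treating "the size of the data" as the element size in bytes are assumptions (the documentation's sentence on this is cut off in the diff), while the per-dimension test copies the `NC_EBADCHUNK` condition from `nc_def_var_extra()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4 GiB: a single chunk must be strictly smaller than this. */
#define SKETCH_MAX_CHUNK_BYTES ((uint64_t)4 << 30)

/* Hypothetical validator for one chunk size per dimension. */
static bool chunksizes_ok(int ndims, const size_t *chunksizes,
                          const size_t *dimlens, const bool *unlimited,
                          size_t type_size)
{
    uint64_t chunk_bytes = type_size;

    assert(ndims > 0 && chunksizes && dimlens && unlimited && type_size > 0);
    for (int d = 0; d < ndims; d++) {
        /* Assumption: a zero chunk size is invalid. */
        if (chunksizes[d] == 0)
            return false;
        /* Same condition as the NC_EBADCHUNK check in the diff. */
        if (!unlimited[d] && dimlens[d] > 0 && chunksizes[d] > dimlens[d])
            return false;
        /* Keep chunk_bytes below 4 GiB without overflowing the product. */
        if (chunksizes[d] > (SKETCH_MAX_CHUNK_BYTES - 1) / chunk_bytes)
            return false;
        chunk_bytes *= chunksizes[d];
    }
    return true;
}
```

Dividing before multiplying keeps the running product from overflowing, so a chunk of exactly 4 GiB (for example 65536 × 16384 four-byte values) is rejected while one element fewer per row is accepted.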
@@ -500,6 +517,10 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
@return ::NC_EBADCHUNK Returns if the chunk size specified for a
variable is larger than the length of the dimensions associated with
variable.
@return ::NC_EVARSIZE Compact storage attempted for a variable larger
than 64 KB.
@return ::NC_EINVAL Attempt to set contiguous or compact storage
for a variable with one or more unlimited dimensions.

@section nc_def_var_chunking_example Example

@@ -539,6 +560,7 @@ nc_def_var_fletcher32(int ncid, int varid, int fletcher32)
if (chunksize[d] != chunksize_in[d]) ERR;
if (storage_in != NC_CHUNKED) ERR;
@endcode
@author Ed Hartnett, Dennis Heimbigner
*/
int
nc_def_var_chunking(int ncid, int varid, int storage,

@@ -22,6 +22,9 @@
* order. */
#define NC_TEMP_NAME "_netcdf4_temporary_variable_name_for_rename"

/** Number of bytes in 64 KB. */
#define SIXTY_FOUR_KB (65536)

> The limit for a compact data set is 64 KiB, not 64 MiB.

> Thanks, I will fix.

#ifdef LOGGING
/**
* Report the chunksizes selected for a variable.

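`SIXTY_FOUR_KB` is 64 × 1024 = 65536 bytes. Because the compact-storage test in `nc_def_var_extra()` rejects only `ndata * var->type_info->size > SIXTY_FOUR_KB`, data of exactly 65536 bytes still qualify. The hypothetical helper `max_compact_elements()` below (not part of netCDF) turns that test into a maximum element count for a given type size.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as SIXTY_FOUR_KB in the diff: 64 * 1024 bytes. */
#define SKETCH_SIXTY_FOUR_KB ((size_t)64 * 1024)

/* Largest element count n with n * type_size <= 64 KB, i.e. the largest
 * count that the diff's `ndata * size > SIXTY_FOUR_KB` test accepts. */
static size_t max_compact_elements(size_t type_size)
{
    assert(type_size > 0);
    return SKETCH_SIXTY_FOUR_KB / type_size;
}
```

With 8-byte doubles this gives 8192 elements (for instance a 64 × 128 array); with 4-byte ints it gives 16384.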
@@ -707,41 +710,65 @@ nc_def_var_extra(int ncid, int varid, int *shuffle, int *deflate,
var->contiguous = NC_FALSE;
}

/* Does the user want a contiguous dataset? Not so fast! Make sure
* that there are no unlimited dimensions, and no filters in use
* for this data. */
if (contiguous && *contiguous)
/* Handle storage settings. */
if (contiguous)
{
if (var->deflate || var->fletcher32 || var->shuffle)
return NC_EINVAL;

for (d = 0; d < var->ndims; d++)
if (var->dim[d]->unlimited)
/* Does the user want a contiguous or compact dataset? Not so
* fast! Make sure that there are no unlimited dimensions, and
* no filters in use for this data. */
if (*contiguous)
{
if (var->deflate || var->fletcher32 || var->shuffle)
return NC_EINVAL;
var->contiguous = NC_TRUE;
}

/* Chunksizes anyone? */
if (contiguous && *contiguous == NC_CHUNKED)
{
var->contiguous = NC_FALSE;
for (d = 0; d < var->ndims; d++)
if (var->dim[d]->unlimited)
return NC_EINVAL;
}

/* If the user provided chunksizes, check that they are not too
* big, and that their total size of chunk is less than 4 GB. */
if (chunksizes)
/* Handle chunked storage settings. */
if (*contiguous == NC_CHUNKED)
{
var->contiguous = NC_FALSE;

if ((retval = check_chunksizes(grp, var, chunksizes)))
return retval;
/* If the user provided chunksizes, check that they are not too
* big, and that their total size of chunk is less than 4 GB. */
if (chunksizes)
{
/* Check the chunksizes for validity. */
if ((retval = check_chunksizes(grp, var, chunksizes)))
return retval;

/* Ensure chunksize is smaller than dimension size */
for (d = 0; d < var->ndims; d++)
if(!var->dim[d]->unlimited && var->dim[d]->len > 0 && chunksizes[d] > var->dim[d]->len)
return NC_EBADCHUNK;
/* Ensure chunksize is smaller than dimension size */
for (d = 0; d < var->ndims; d++)
if (!var->dim[d]->unlimited && var->dim[d]->len > 0 &&
chunksizes[d] > var->dim[d]->len)
return NC_EBADCHUNK;

/* Set the chunksizes for this variable. */
for (d = 0; d < var->ndims; d++)
var->chunksizes[d] = chunksizes[d];
}
}
else if (*contiguous == NC_CONTIGUOUS)
{
var->contiguous = NC_TRUE;
}
else if (*contiguous == NC_COMPACT)
{
size_t ndata = 1;

/* Set the chunksizes for this variable. */
/* Find the number of elements in the data. */
for (d = 0; d < var->ndims; d++)
var->chunksizes[d] = chunksizes[d];
ndata *= var->dim[d]->len;

/* Ensure var is small enough to fit in compact
* storage. It must be <= 64 KB. */
if (ndata * var->type_info->size > SIXTY_FOUR_KB)
return NC_EVARSIZE;

var->contiguous = NC_FALSE;
var->compact = NC_TRUE;
}
}

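With the netCDF internals stripped away, the new storage branch of `nc_def_var_extra()` has the shape sketched below. `struct sketch_var`, the `SK_*` constants, and their (arbitrary) error values are hypothetical stand-ins for the library's internal variable struct, the storage constants, `NC_EINVAL`, and `NC_EVARSIZE`; chunk-size validation is left out. The chunked constant is zero because the diff's `if (*contiguous)` test relies on that: only contiguous and compact requests go through the filter and unlimited-dimension checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the chunked setting must be zero, as the
 * `if (*contiguous)` test in the diff implies. */
enum { SK_CHUNKED = 0, SK_CONTIGUOUS = 1, SK_COMPACT = 2 };
enum { SK_NOERR = 0, SK_EINVAL = -1, SK_EVARSIZE = -2 };
#define SK_SIXTY_FOUR_KB ((size_t)65536)
#define SK_MAX_DIMS 8  /* arbitrary bound for the sketch */

struct sketch_var {
    int ndims;
    size_t dimlen[SK_MAX_DIMS];
    bool unlimited[SK_MAX_DIMS];
    size_t type_size;
    bool filtered;    /* deflate, fletcher32 or shuffle in use */
    bool contiguous;
    bool compact;
};

/* Mirrors the storage handling of the new nc_def_var_extra(). */
static int sketch_set_storage(struct sketch_var *var, const int *storage)
{
    assert(var != NULL && var->ndims <= SK_MAX_DIMS);
    if (!storage)
        return SK_NOERR;  /* caller did not ask to change storage */

    /* Contiguous or compact: no filters, no unlimited dimensions. */
    if (*storage) {
        if (var->filtered)
            return SK_EINVAL;
        for (int d = 0; d < var->ndims; d++)
            if (var->unlimited[d])
                return SK_EINVAL;
    }

    if (*storage == SK_CHUNKED) {
        var->contiguous = false;  /* chunk-size checks omitted here */
    } else if (*storage == SK_CONTIGUOUS) {
        var->contiguous = true;
    } else if (*storage == SK_COMPACT) {
        size_t ndata = 1;
        for (int d = 0; d < var->ndims; d++)
            ndata *= var->dimlen[d];
        if (ndata * var->type_size > SK_SIXTY_FOUR_KB)
            return SK_EVARSIZE;
        var->contiguous = false;
        var->compact = true;
    }
    return SK_NOERR;
}
```

As in the diff, the chunked and contiguous branches do not clear `compact`, so in this sketch a variable first set to compact and then to another layout keeps that flag; whether the library resets it elsewhere is not visible in these hunks.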
> I'm not sure if this matters in C, but if this were C++, it would save space in the struct (due to alignment concerns) to have all the `nc_bool_t` together instead of having the `int parallel_access` in between. I haven't measured whether it makes any difference in C.

> It looks like the `nc_bool_t` size is 4 bytes, which is the same size as an `int`, so intermingling `int` and `nc_bool_t` doesn't change the size of the struct in this case. If the `nc_bool_t` were changed to the `stdbool`-defined `bool`, the size of the struct would drop by 40 bytes (something for the future...)

> `nc_bool_t` is just an `int`. If it ever changes, then whoever makes the change will be in charge of making the correct change. I don't generally code based on this kind of thinking. I can only code correctly, and hope that future netCDF programmers will do the same. ;-)
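The alignment question in this thread can be checked directly with `sizeof`. In the sketch below, `sketch_bool_t` stands in for `nc_bool_t` on the thread's statement that it is an `int`; the struct names and the helper `bool_grouping_saving()` are hypothetical, and exact sizes depend on the platform ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from the thread: nc_bool_t is a plain int. */
typedef int sketch_bool_t;

/* int-sized flags, interleaved with an int member vs. grouped. */
struct flags_int_interleaved  { sketch_bool_t a; int n; sketch_bool_t b; sketch_bool_t c; };
struct flags_int_grouped      { sketch_bool_t a; sketch_bool_t b; sketch_bool_t c; int n; };

/* The same layouts using C99 bool (typically 1 byte). */
struct flags_bool_interleaved { bool a; int n; bool b; bool c; };
struct flags_bool_grouped     { bool a; bool b; bool c; int n; };

/* Bytes saved by moving the bool flags next to each other. */
static size_t bool_grouping_saving(void)
{
    size_t mixed = sizeof(struct flags_bool_interleaved);
    size_t grouped = sizeof(struct flags_bool_grouped);
    assert(grouped <= mixed);  /* holds on common ABIs */
    return mixed - grouped;
}
```

On a typical x86-64 build both `int`-flag structs are 16 bytes, so ordering makes no difference, which matches the second comment; with 1-byte `bool` the interleaved struct is typically 12 bytes and the grouped one 8, because padding is needed before the `int` member.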