Add experimental support for fixed length string type to nczarr. #2467

Closed


DennisHeimbigner
Collaborator

re: Issue #2465
re: Issue #2259

[Note: It also tangentially affects PR https://github.com//pull/2466 since this PR requires that PR to be merged before this one and actually includes that PR here.]

The primary issue to be addressed is to provide a way for the user to
specify the size of fixed-length strings. This is handled by providing
the following new special attributes:

  1. _nczarr_default_maxstrlen
    This is an attribute of the root group. It specifies the default
    maximum string length for string types. If not specified, then
    it has the value of 64 characters.
  2. _nczarr_maxstrlen
    This is a per-variable attribute. It specifies the maximum
    string length for the string type associated with the variable.
    If not specified, then it is assigned the value of
    _nczarr_default_maxstrlen.
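
The fallback order described above can be sketched in a few lines. This is only an illustration, not the libnczarr code (which is written in C): the function name `effective_maxstrlen` and the use of plain dictionaries for the root-group and variable attributes are assumptions made for the example.

```python
# Hypothetical sketch of the lookup order; not the actual libnczarr logic.
DEFAULT_MAXSTRLEN = 64  # fallback value stated in the PR description


def effective_maxstrlen(root_attrs, var_attrs):
    """Return the maximum string length that applies to one variable.

    root_attrs: attributes of the root group (assumed to be a dict)
    var_attrs:  attributes of the variable (assumed to be a dict)
    """
    # A per-variable setting takes precedence.
    if "_nczarr_maxstrlen" in var_attrs:
        return int(var_attrs["_nczarr_maxstrlen"])
    # Otherwise use the root-group default, or 64 if that is also absent.
    return int(root_attrs.get("_nczarr_default_maxstrlen", DEFAULT_MAXSTRLEN))
```

For instance, a variable with no `_nczarr_maxstrlen` in a dataset whose root group sets `_nczarr_default_maxstrlen` to 128 would get 128.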

This PR also requires some hacking to handle the existing netcdf-c NC_CHAR
type, which does not exist in zarr. The goal was to choose numpy types for
both the netcdf-c NC_STRING type and the netcdf-c NC_CHAR type such that
if a pure zarr implementation read them, it would still work and an
NC_CHAR type would be handled by zarr as a string of length 1.

For writing variables and NCZarr attributes, the type mapping is as follows:

  • "|S1" for NC_CHAR.
  • ">S1" for NC_STRING && MAXSTRLEN==1
  • ">Sn" for NC_STRING && MAXSTRLEN==n

Note that it is a bit of a hack to use endianness, but it should be ok since for
string/char, the endianness has no meaning.
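
As a rough sketch of the mapping listed above, the write side and its inverse might look like the following. The function names and the use of strings such as "NC_CHAR" to stand for the netcdf-c type constants are assumptions for illustration only; the real implementation lives in the C library.

```python
# Hypothetical sketch of the type mapping; names are illustrative only.

def zarr_dtype_for(nctype, maxstrlen=None):
    """Map a netcdf-c type name to the dtype string written to zarr."""
    if nctype == "NC_CHAR":
        return "|S1"
    if nctype == "NC_STRING":
        if maxstrlen is None or maxstrlen < 1:
            raise ValueError("NC_STRING needs a positive maxstrlen")
        # The '>' byte-order character is what marks this as NC_STRING.
        return f">S{maxstrlen}"
    raise ValueError(f"not a char/string type: {nctype}")


def nctype_for(dtype):
    """Inverse mapping: return (nctype, maxstrlen) for a dtype string."""
    if dtype == "|S1":
        return ("NC_CHAR", 1)
    if dtype.startswith(">S") and dtype[2:].isdigit():
        return ("NC_STRING", int(dtype[2:]))
    raise ValueError(f"unrecognized char/string dtype: {dtype}")
```

Note that NC_CHAR and an NC_STRING with MAXSTRLEN==1 differ only in the byte-order character, which is why the round trip depends on preserving it.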

When reading attributes from pure zarr (i.e. with no nczarr
attribute types defined), they will always be interpreted as being of
type NC_CHAR.

Misc. Other Changes

  1. Convert the nczarr special attributes and keys to be all lower case. So "_NCZARR_ATTR" is now "_nczarr_attr". Backward compatibility is supported for the upper case names.
  2. Clean up my too-clever-by-half handling of scalars in libnczarr.
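
To illustrate the backward compatibility in item 1, a reader could look for the lower case name first and fall back to the legacy upper case one. This is a sketch under assumptions: the helper name `lookup_special` is hypothetical, and preferring the lower case key when both are present is a guess, not something the PR states.

```python
# Hypothetical sketch of a backward-compatible key lookup.

def lookup_special(attrs, name):
    """Find an nczarr special attribute or key under either spelling.

    attrs: a dict of attributes/keys as read from the metadata
    name:  the special name in any case, e.g. "_nczarr_attr"
    """
    lower = name.lower()
    if lower in attrs:
        # New-style lower case name (assumed to take precedence).
        return attrs[lower]
    # Fall back to the legacy upper case name; None if neither exists.
    return attrs.get(lower.upper())
```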

@DennisHeimbigner DennisHeimbigner marked this pull request as draft August 1, 2022 20:25
@DennisHeimbigner DennisHeimbigner changed the title Add support for fixed length string type to nczarr. Add experimental support for fixed length string type to nczarr. Aug 5, 2022
@DennisHeimbigner
Collaborator Author

This PR introduces some memory leaks, so I am closing it in favor
of a more general PR.

@DennisHeimbigner
Collaborator Author

Subsumed by PR #2492
