Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle NULL values in features correctly #177

Closed
evetion opened this issue Mar 5, 2021 · 6 comments · Fixed by #238
Closed

Handle NULL values in features correctly #177

evetion opened this issue Mar 5, 2021 · 6 comments · Fixed by #238
Assignees
Labels
Milestone

Comments

@evetion
Copy link
Collaborator

evetion commented Mar 5, 2021

If I create the following geopackage (shown with ogrinfo)

Geometry Column = geom
date: DateTime (0.0)
test: Integer(Boolean) (0.0)
name: String (255.0)
number: Integer (0.0)
OGRFeature(null):1
  date (DateTime) = (null)         <-- note these nulls
  test (Integer(Boolean)) = 0
  name (String) = test
  number (Integer) = 1
  POINT (0.029535864978903 0.578059071729958)

OGRFeature(null):2
  date (DateTime) = 2021/03/12 00:00:00
  test (Integer(Boolean)) = 1
  name (String) = (null)         <-- note these nulls
  number (Integer) = (null)    <-- note these nulls
  POINT (-0.355836849507736 0.20393811533052)

and apply getfield to it, ArchGDAL fails with:

ERROR: Failed to fetch datetime at index 0
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] asdatetime(::ArchGDAL.Feature, ::Int32) at /Users/evetion/.julia/packages/ArchGDAL/j9NPL/src/ogr/feature.jl:318
 [3] getfield(::ArchGDAL.Feature, ::Int32) at /Users/evetion/.julia/packages/ArchGDAL/j9NPL/src/ogr/feature.jl:378
 [4] getfield(::ArchGDAL.Feature, ::Symbol) at /Users/evetion/.julia/packages/ArchGDAL/j9NPL/src/ogr/feature.jl:384

It seems that isfieldset doesn't trigger correctly for NULL DateTime fields. If we actually fill it, we get a result, but ArchGDAL uses getdefault, which returns actual data values (name and number should both be null). These should be either nothing or missing.

(date = Dates.DateTime("2021-03-12T00:00:00"), test = 1, name = "", number = 0, geom = Geometry: POINT (-0.355836849507736 0.20393811533052))

On a last note, we don't parse the width of integers, so we don't parse the test=1 as the bool it is.

Ping to @vernimmen who ran into this when opening a GeoPackage with NULLs via GeoDataFrames.jl.

@yeesian
Copy link
Owner

yeesian commented Apr 26, 2021

Sorry for being late to respond here: is it possible to provide a small working example here?

@evetion
Copy link
Collaborator Author

evetion commented Apr 26, 2021

Sure, I've uploaded a small test file.

using ArchGDAL

fn = download("https://github.com/evetion/GeoDataFrames.jl/raw/null-test-data/test/null.gpkg")
ds = ArchGDAL.read(fn)
layer = ArchGDAL.getlayer(ds, 0)

Purely the show method will already crash there:

Layer: null
Geometry 0 (geom): [wkbPoint], POINT (0.02953586497...), ...
Error showing value of type ArchGDAL.IFeatureLayer:
ERROR: Failed to fetch datetime at index 0

@yeesian
Copy link
Owner

yeesian commented Apr 27, 2021

In #177 (comment), I'm also curious about "On a last note, we don't parse the width of integers, so we don't parse the test=1 as the bool it is."

Geometry Column = geom
date: DateTime (0.0)
test: Integer(Boolean) (0.0)         <-- note the Boolean (rather than the width of 0)
name: String (255.0)
number: Integer (0.0)
OGRFeature(null):1
  date (DateTime) = (null)
  test (Integer(Boolean)) = 0
  name (String) = test
  number (Integer) = 1
  POINT (0.029535864978903 0.578059071729958)

...

I suspect I should be going by the fieldsubtype (per https://github.com/OSGeo/gdal/blob/fec15b146f8a750c23c5e765cac12ed5fc9c2b85/gdal/apps/ogrinfo.cpp#L396). What do you think?

@yeesian
Copy link
Owner

yeesian commented Apr 27, 2021

I dug into the code for ogrfeature.cpp (which is very instructive), and realize that it checks for nullity of the fields.

We should wrap the corresponding method from GDAL.jl and use it to check for nullity in ArchGDAL.jl

@evetion
Copy link
Collaborator Author

evetion commented Jul 6, 2021

I suspect I should be going by the fieldsubtype (per https://github.com/OSGeo/gdal/blob/fec15b146f8a750c23c5e765cac12ed5fc9c2b85/gdal/apps/ogrinfo.cpp#L396). What do you think?

Agreed. It also means our current method of a single convert (https://github.com/yeesian/ArchGDAL.jl/blob/master/src/tables.jl#L7) isn't good enough, as we need two converts and some extra logic.

@yeesian
Copy link
Owner

yeesian commented Aug 31, 2021

Also linking to Toblerity/Fiona#460 (comment) here.

@yeesian yeesian self-assigned this Aug 31, 2021
yeesian added a commit that referenced this issue Aug 31, 2021
Fixes #177 .

I think this aligns the behavior a lot better across the different file formats (see the test sets in test/test_tables.jl)

This behavior is implemented through `getfield(feature, i)` which makes use of `getfieldtype(feature, i)`. That way, we are aligned in behavior for field subtypes even for e.g. displaying the fields.

Because a lot hinges on distinguishing between whether a field is set versus whether it is null (see Toblerity/Fiona#460 (comment)), I have also added support for `isfieldnull()`, `isfieldsetandnotnull()`, and `setfieldnull!()`.
yeesian added a commit that referenced this issue Oct 8, 2021
* Return `missing` if the field is set but null.

Fixes #177 .

I think this aligns the behavior a lot better across the different file formats (see the test sets in test/test_tables.jl)

This behavior is implemented through `getfield(feature, i)` which makes use of `getfieldtype(feature, i)`. That way, we are aligned in behavior for field subtypes even for e.g. displaying the fields.

Because a lot hinges on distinguishing between whether a field is set versus whether it is null (see Toblerity/Fiona#460 (comment)), I have also added support for `isfieldnull()`, `isfieldsetandnotnull()`, and `setfieldnull!()`.

* Add docstring references for GDAL/OGR RFCs

* Document the intended ArchGDAL behavior for NULL semantics

* Add tests and implementation for getfield with missing values

As mathieu has observed, `OGRUnsetMarker` and `OGRNullMarkerare` are mutually exclusive. We implement that case to return nothing if it ever comes up, but it is not possible for us to the corresponding code path.

* Update the implementation

* Implement setting fields to missing or nothing.

* Update the documentation to indicate the methods being discussed

* remove assertions

* Map unset fields to `nothing` and null values to `missing`

Because getdefault() is meant to be used only when setting fields for notnullable columns with missing values, we make it return `nothing` instead of `missing` to unset the field.

* Turn references into hyperlinks

* use isfieldset() and isfieldnull() when getting a field

isfieldsetandnotnull() is just a convenience function.

* raise an error instead of getting default if we do not recognize the fieldtype when reading the field

(in the future, we will switch to dispatch on the fieldtype rather than the current dictionary approach, but that is out of the scope of this commit.)

* Fix docstring for isfieldsetandnotnull()

* Update to reflect NA for reading unset & null fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants