Joining on DBF sometimes results in 0 length char output #541

Closed
nvkelso opened this issue May 7, 2022 · 4 comments


nvkelso commented May 7, 2022

In Natural Earth v5.1.0, I'm hearing reports that the DBF component of the resulting Shapefile now sometimes has character fields with a declared length of zero. In v5.0.1 and earlier, the output respected the input DBF, which always set a character length of, say, 30, even if the values were always empty.

When these shapefiles are imported into PostGIS, it complains about string columns with length zero.

https://github.com/nvkelso/natural-earth-vector/blob/v5.1.0/Makefile#L689-L702

build_a5_ne_10m_admin_0_countries_iso: 10m_cultural/ne_10m_admin_0_scale_rank.shp \
	housekeeping/ne_admin_0_details_iso_countries.dbf \
	housekeeping/ne_admin_0_details_level_5_disputed.dbf
	mapshaper -i 10m_cultural/ne_10m_admin_0_scale_rank.shp \
		-join housekeeping/ne_admin_0_details_level_5_disputed.dbf encoding=utf8 keys=sr_brk_a3,BRK_A3 fields=ADM0_ISO \
		-dissolve 'ADM0_ISO' calc='featurecla="Admin-0 country", scalerank = min(scalerank)' \
		-filter 'scalerank !== null' + \
		-filter 'scalerank <= 6' + \
		-join housekeeping/ne_admin_0_details_iso_countries.dbf encoding=utf8 keys=ADM0_ISO,ADM0_ISO fields=* \
		-filter 'ADM0_ISO !== "-99"' + \
		-each 'delete sr_adm0_a3' \
		-each 'NAME=BRK_NAME' \
		-each 'NAME_LONG=BRK_NAME' \
		-o 10m_cultural/ne_10m_admin_0_countries_iso.shp \

In the input DBF "housekeeping/ne_admin_0_details_iso_countries.dbf", the column format is set as FCLASS_DE,C,30, while in the resulting 10m_cultural/ne_10m_admin_0_countries_iso.dbf it's set to FCLASS_DE,C,0. It should continue to be FCLASS_DE,C,30. Or, if you need to reset it because you don't see any values, use FCLASS_DE,C,1 so it's interoperable with PostGIS. I'm verifying this in OpenOffice by opening the DBF files and looking at the column headings, or in QGIS by looking at the field definitions (via OGR).
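For anyone who wants to check the declared widths without OpenOffice or QGIS, here is a minimal sketch that reads the field descriptors straight from the DBF header. It assumes the common dBASE III-style layout (a 32-byte file header followed by 32-byte field descriptors, ended by a 0x0D byte); the function names `dbf_field_widths` and `zero_width_char_fields` are made up for this example and are not part of Mapshaper or OGR.

```python
def dbf_field_widths(data):
    """Map field name -> (type code, declared length) from raw DBF bytes.

    Assumes a dBASE III-style header: 32-byte file header, then one
    32-byte descriptor per field, terminated by a 0x0D byte.
    """
    fields = {}
    offset = 32  # field descriptors start right after the file header
    while offset < len(data) and data[offset] != 0x0D:
        desc = data[offset:offset + 32]
        if len(desc) < 32:
            break  # truncated header, stop rather than guess
        # Bytes 0-10: field name, NUL-padded
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii", "replace")
        ftype = chr(desc[11])  # byte 11: type code, e.g. "C" for character
        length = desc[16]      # byte 16: declared field length
        fields[name] = (ftype, length)
        offset += 32
    return fields


def zero_width_char_fields(data):
    """Return names of type-C fields whose declared length is 0."""
    return [name for name, (ftype, length) in dbf_field_widths(data).items()
            if ftype == "C" and length == 0]
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `zero_width_char_fields` should list the columns that would end up as `varchar(0)` on import.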

PostGIS error:

ERROR:  length for type varchar must be at least 1
LINE 139: "fclass_us" varchar(0),
                      ^
ERROR:  current transaction is aborted, commands ignored until end of transaction block
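As a stopgap on the import side, a loader that builds its own column definitions could clamp the width before emitting DDL. This is only a sketch: `varchar_column` is a hypothetical helper, and it is not how shp2pgsql or ogr2ogr generate their SQL.

```python
def varchar_column(name, dbf_width):
    """Build a varchar column definition from a DBF field width.

    PostgreSQL rejects varchar(0), so the width is clamped to at least 1.
    """
    width = max(1, dbf_width)
    return f'"{name.lower()}" varchar({width})'
```

With this guard, a zero-width field such as FCLASS_US would be declared as `varchar(1)` rather than the `varchar(0)` that triggers the error above.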

I don't recall updating Mapshaper (currently on version 0.5.53) between v5.0.0 and v5.1.0, so I'm flummoxed as to why the output changed. The input DBF did change, though.

Owner

mbloch commented May 7, 2022

I'll look into this right away, thanks for reporting

Author

nvkelso commented May 7, 2022

Confirmed this is still a problem after upgrading to 0.5.115.

Thanks @mbloch !

Owner

mbloch commented May 7, 2022

I just released 0.5.116, which sets the minimum size of type-C DBF fields to 1 (so if a field only contains empty strings, the size will be 1 not 0).

This problem occurred because Mapshaper adapts each field's size to the size of its content, so a field containing only empty strings was given size 0. Some software instead sets all the DBF string fields to the maximum width (254 bytes), which makes for bloated output files.
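To illustrate the sizing rule described here, a minimal sketch follows. It is not Mapshaper's actual code (Mapshaper is written in JavaScript); `char_field_width` and the two constants are names invented for this example, and measuring width in encoded bytes is an assumption.

```python
DBF_MAX_CHAR_WIDTH = 254  # maximum width mentioned above
DBF_MIN_CHAR_WIDTH = 1    # floor introduced by the fix


def char_field_width(values, encoding="utf-8"):
    """Pick a type-C field width that fits the longest encoded value.

    Missing values (None) count as empty strings. The result is clamped
    so an all-empty column gets width 1 instead of 0.
    """
    longest = max((len((v or "").encode(encoding)) for v in values), default=0)
    return max(DBF_MIN_CHAR_WIDTH, min(longest, DBF_MAX_CHAR_WIDTH))
```

Without the lower clamp, a column holding only empty strings would come out with width 0, which is the behavior reported in this issue.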

mbloch closed this as completed May 7, 2022

Author

nvkelso commented May 7, 2022

Confirmed 0.5.116 produces 1-char fields instead of 0-char fields now. Thanks!

I'll push out v5.1.1 of Natural Earth with the fix.

(It will take me till Monday to see if PostGIS is happy, but I assume we're set there now too :)
