ogr2ogr: speed-up reprojection in Arrow code path #10717

Merged: @rouault merged 10 commits into OSGeo:master from rouault:ogr2ogr_arrow_reproj on Sep 18, 2024

@rouault (Member) commented on Sep 2, 2024

(on top of PR #10716)

Use the newly added OGRWKBTransform(), when possible, to speed up -t_srs in the Arrow code path. That path is triggered for Parquet/GPKG -> anything when no option, or no option other than -t_srs, is specified. Also use multi-threaded coordinate transformation by splitting the features of each batch into several sub-batches, each processed in its own thread.
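The sub-batch splitting described above can be sketched as follows. This is a minimal sketch under stated assumptions, not GDAL's implementation. `transform_feature`, `split_into_sub_batches` and `transform_batch` are hypothetical names. Features are plain lists of (x, y) tuples. The "transform" is a placeholder affine shift, not a real EPSG:2193 to EPSG:4326 projection. Under CPython's GIL this pure-Python work does not actually run in parallel, so the sketch only illustrates the partitioning. In native code, each sub-batch thread can run on its own core.

```python
import os
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholder for a per-feature coordinate transform.
# A real pipeline would call a projection library here; this affine
# shift only keeps the sketch self-contained.
def transform_feature(coords):
    return [(x * 0.5 + 10.0, y * 0.5 - 10.0) for (x, y) in coords]


def split_into_sub_batches(features, n_threads):
    """Split a list into at most n_threads contiguous, near-equal slices."""
    n = len(features)
    n_threads = max(1, min(n_threads, n))
    size = -(-n // n_threads)  # ceiling division
    return [features[i:i + size] for i in range(0, n, size)]


def transform_batch(features, n_threads=None):
    """Transform one batch, processing each sub-batch in its own thread."""
    if not features:
        return []
    n_threads = n_threads or os.cpu_count() or 1
    sub_batches = split_into_sub_batches(features, n_threads)
    with ThreadPoolExecutor(max_workers=len(sub_batches)) as pool:
        # map() yields results in submission order, so the original
        # feature order within the batch is preserved.
        results = list(pool.map(
            lambda sub: [transform_feature(f) for f in sub], sub_batches))
    return [f for sub in results for f in sub]


if __name__ == "__main__":
    batch = [[(2.0, 4.0), (6.0, 8.0)] for _ in range(10)]
    out = transform_batch(batch, n_threads=6)
    print(out[0])  # [(11.0, -8.0), (13.0, -6.0)]
```

Contiguous slices give each thread a disjoint range of features, so no locking is needed. Rejoining the slices in order keeps the output batch aligned with the input batch.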

This can result in a 3.3x speed-up for the ogr2ogr Parquet->Parquet -t_srs use case on a 3.2-million-feature dataset:

No reprojection:

$ time ogr2ogr out.parquet nz-building-outlines.parquet
real	0m5,687s
user	0m5,254s
sys	0m1,032s

Reprojection from EPSG:2193 ("NZGD2000 / New Zealand Transverse Mercator 2000") to EPSG:4326 without the optimization (Arrow code path disabled):

$ time ogr2ogr out.parquet nz-building-outlines.parquet -t_srs EPSG:4326 --config OGR2OGR_USE_ARROW_API=NO
real	0m24,038s
user	0m23,472s
sys	0m1,314s

Reprojection with the optimization, using 6 threads for reprojection:

$ time ogr2ogr out.parquet nz-building-outlines.parquet -t_srs EPSG:4326
real	0m7,100s
user	0m14,376s
sys	0m1,044s

Reprojection with the optimization, using 1 thread for reprojection:

$ time ogr2ogr out.parquet nz-building-outlines.parquet -t_srs EPSG:4326 --config GDAL_NUM_THREADS=1
real	0m13,773s
user	0m12,786s
sys	0m1,002s

The optimization also applies to GPKG -> GPKG, with a 2x speed-up:

$ time ogr2ogr out.gpkg nz-building-outlines.gpkg -t_srs EPSG:4326
real	0m17,976s
user	0m29,195s
sys	0m1,802s

Without the optimization (Arrow code path disabled):

$ time ogr2ogr out.gpkg nz-building-outlines.gpkg -t_srs EPSG:4326 --config OGR2OGR_USE_ARROW_API=NO
real	0m36,688s
user	0m35,809s
sys	0m2,269s
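The speed-up figures quoted above are ratios of wall-clock ("real") times. A small sketch of that arithmetic follows, using the timings from the runs above. They are written with a decimal point instead of the comma in the `time` output. The `speedup` helper and the dictionary keys are illustrative names only.

```python
# Wall-clock ("real") times in seconds, taken from the runs above.
TIMINGS = {
    "parquet_no_arrow": 24.038,        # -t_srs, OGR2OGR_USE_ARROW_API=NO
    "parquet_arrow_6_threads": 7.100,  # -t_srs, 6 reprojection threads
    "parquet_arrow_1_thread": 13.773,  # -t_srs, GDAL_NUM_THREADS=1
    "gpkg_no_arrow": 36.688,           # -t_srs, OGR2OGR_USE_ARROW_API=NO
    "gpkg_arrow": 17.976,              # -t_srs, Arrow code path
}


def speedup(baseline_s, optimized_s):
    """Speed-up factor = baseline wall time / optimized wall time."""
    return baseline_s / optimized_s


if __name__ == "__main__":
    t = TIMINGS
    print(f"{speedup(t['parquet_no_arrow'], t['parquet_arrow_6_threads']):.2f} "
          f"{speedup(t['parquet_no_arrow'], t['parquet_arrow_1_thread']):.2f} "
          f"{speedup(t['gpkg_no_arrow'], t['gpkg_arrow']):.2f}")
```

This prints `3.39 1.75 2.04`. That is roughly 3.4x for Parquet with 6 threads (in line with the quoted 3.3x), about 1.75x for Parquet with a single thread, and about 2x for GPKG.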

@rouault added this to the 3.10.0 milestone on Sep 2, 2024
@rouault force-pushed the ogr2ogr_arrow_reproj branch 12 times, most recently from 3bac315 to 8957171 on September 3, 2024 00:48
@coveralls (Collaborator): coverage: 69.363% (+0.01%) from 69.353% when pulling d0131e4 on rouault:ogr2ogr_arrow_reproj into 4ca4d62 on OSGeo:master.

@rouault merged commit 94ede75 into OSGeo:master on Sep 18, 2024 (36 checks passed)