diff --git a/doc/source/development/rfc/index.rst b/doc/source/development/rfc/index.rst
index 73f24bcf634e..fc7714e67617 100644
--- a/doc/source/development/rfc/index.rst
+++ b/doc/source/development/rfc/index.rst
@@ -107,3 +107,4 @@ RFC list
    rfc98_build_requirements_gdal_3_9
    rfc99_geometry_coordinate_precision
    rfc101_raster_dataset_threadsafety
+   rfc102_embedded_resources
diff --git a/doc/source/development/rfc/rfc102_embedded_resources.rst b/doc/source/development/rfc/rfc102_embedded_resources.rst
new file mode 100644
index 000000000000..ceee9bc2422f
--- /dev/null
+++ b/doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -0,0 +1,170 @@
+.. _rfc-102:
+
+===================================================================
+RFC 102: Embedding resource files into libgdal
+===================================================================
+
+============== =============================================
+Author:        Even Rouault
+Contact:       even.rouault @ spatialys.com
+Started:       2024-Oct-01
+Status:        Draft
+Target:        GDAL 3.10 or 3.11
+============== =============================================
+
+Summary
+-------
+
+This RFC uses the C23 ``#embed`` pre-processor directive, when available,
+to be able to embed GDAL resource files directly into libgdal. It is also
+intended to be used for PROJ, in particular for its :file:`proj.db` file.
+
+Motivation
+----------
+
+Some parts of GDAL core, but mostly drivers, depend on a number of resource
+files for correct execution. Locating those resource files on the filesystem
+can be painful in some use cases of GDAL that involve relocating the GDAL
+binary at installation time. One such case could be the GDAL embedded in Rasterio
+or Fiona binary wheels, where :config:`GDAL_DATA` must currently be set correctly.
+Web-assembly (WASM) use cases also come to mind as users of GDAL builds where
+resources are directly included in libgdal.
+
+Technical solution
+------------------
+
+The C23 standard includes a `#embed "filename" `__
+pre-processor directive that ingests the specified filename and returns its
+content as tokens that can be stored in an unsigned char or char array.
+
+Getting the content of a file into a variable is as simple as the following
+(which also demonstrates adding a nul-terminating character when this is needed):
+
+.. code-block:: c
+
+  static const char szPDS4Template[] = {
+  #embed "data/pds4_template.xml"
+  , '\0'};
+
+Compiler support
+----------------
+
+Support for that directive is still very new. clang 19.1 is the
+first compiler which has a release including it, and has an efficient
+implementation of it, able to embed very large files with minimum RAM and CPU
+usage.
+
+The development version of GCC 15 also supports it, but in a non-optimized way
+for now, i.e. trying to include large files of several tens of megabytes could
+cause significant compilation time, but without impact on runtime. This is not
+an issue for GDAL use cases, and there is intent from GCC developers to improve
+this in the future.
+
+Embedding PROJ's :file:`proj.db` of size 9.1 MB with GCC 15dev at time of writing takes
+18 seconds and 1.7 GB RAM, compared to 0.4 second and 400 MB RAM for clang 19.
+The GCC figures are still reasonable (generating :file:`proj.db` itself from its source .sql files
+takes one minute on the same system).
+
+There is no timeline for Visual Studio C/C++ at time of writing (it has been
+`requested by users `__).
+
+Note that clang 19.1 currently only supports ``#embed`` in .c files, not
+C++ ones (the C++ standard has not yet adopted this feature). So embedding
+resources must be done in a .c file, which is not a problem since
+we can easily export symbols/functions from a .c file to make them available to C++.
+
+New CMake options
+-----------------
+
+Resources will only be embedded if the new ``EMBED_RESOURCE_FILES`` CMake option
+is set to ``ON``. This option will default to ``ON`` for static library builds
+and if C23 ``#embed`` is detected to be available. Users might also turn it to ``ON`` for
+shared library builds. A CMake error is emitted if the option is turned on but
+the compiler lacks support for it.
+
+A complementary CMake option ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` will also
+be added. It will default to ``OFF``. When set to ``ON``, GDAL will not try to
+locate resource files in the GDAL_DATA directory burnt at build time into libgdal
+(``${install_prefix}/share/gdal``), or pointed to by the :config:`GDAL_DATA` configuration option.
+
+In other words, if ``EMBED_RESOURCE_FILES=ON`` but ``USE_ONLY_EMBEDDED_RESOURCE_FILES=OFF``,
+GDAL will first try to locate resource files from the file system, and
+fall back to the embedded version if not found.
+
+The resource files will still be installed in ``${install_prefix}/share/gdal``,
+unless ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` is set to ``ON``.
+
+Impacted code
+-------------
+
+- gcore: embedding LICENSE.TXT, and tms_*.json files
+- frmts/grib: embedding GRIB2 CSV files
+- frmts/hdf5: embedding bag_template.xml
+- frmts/nitf: embedding nitf_spec.xml
+- frmts/pdf: embedding pdf_composition.xml
+- frmts/pds: embedding pds4_template.xml
+- ogr/ogrsf_frmts/dgn: embedding seed_2d.dgn and seed_3d.dgn
+- ogr/ogrsf_frmts/dxf: embedding header.dxf and leader.dxf
+- ogr/ogrsf_frmts/gml: embedding .gfs files and gml_registry.xml
+- ogr/ogrsf_frmts/gmlas: embedding gmlasconf.xml
+- ogr/ogrsf_frmts/miramon: embedding MM_m_idofic.csv
+- ogr/ogrsf_frmts/osm: embedding osm_conf.ini
+- ogr/ogrsf_frmts/plscenes: embedding plscenesconf.json
+- ogr/ogrsf_frmts/s57: embedding s57*.csv files
+- ogr/ogrsf_frmts/sxf: embedding default.rsc
+- ogr/ogrsf_frmts/vdv: embedding vdv452.xml
+
+PROJ specificities
+------------------
+
+Loading of the embedded :file:`proj.db` will involve using the
+`SQLite3 memvfs `__,
+as done by
+`DuckDB Spatial `__.
+
+Considered alternatives
+-----------------------
+
+Including resource files into libraries has been a long-wished-for feature of C/C++.
+Different workarounds have emerged over the years, such as the use of the
+``od -x`` utility, GNU ``ld`` linker ``-b`` mode, or CMake-based solutions such
+as https://jonathanhamberg.com/post/cmake-file-embedding/
+
+We could potentially use the latter to address non-C23 capable compilers, but
+we have chosen not to do that, for the sake of implementation simplicity. And,
+if considering using the CMake trick as the only solution, we should note that
+C23 ``#embed`` has the potential for better compile time, as demonstrated by the clang
+implementation.
+
+Backward compatibility
+----------------------
+
+Fully backwards compatible.
+
+C23 is not required if ``EMBED_RESOURCE_FILES`` is not enabled.
+
+Documentation
+-------------
+
+The two new CMake variables will be documented.
+
+Testing
+-------
+
+The existing fedora:rawhide continuous integration target, which now has clang
+19.1 available, will be modified to test the effect of the new variables.
+
+Local builds using the GCC 15dev builds from https://jwakely.github.io/pkg-gcc-latest/
+have also been successfully done during the development of the candidate implementation.
+
+Related issues and PRs
+----------------------
+
+- https://github.com/OSGeo/gdal/issues/10780
+
+- Candidate implementation (in progress): https://github.com/OSGeo/gdal/compare/master...rouault:gdal:embedded_resources?expand=1
+
+Voting history
+--------------
+
+TBD
diff --git a/doc/source/spelling_wordlist.txt b/doc/source/spelling_wordlist.txt
index 21e25ccf2846..c68096f85efb 100644
--- a/doc/source/spelling_wordlist.txt
+++ b/doc/source/spelling_wordlist.txt
@@ -1353,6 +1353,7 @@ IdentificationTolerance
 identificator
 Identificator
 IDisposable
+idofic
 Idrisi
 iDriver
 idx
@@ -1782,6 +1783,7 @@ minY
 MinZ
 Mipmaps
 MiraD
+miramon
 mis
 mitab
 mkdir
@@ -2409,6 +2411,7 @@ Placemark
 plaintext
 Plessis
 plmosaic
+plscenes
 plscenesconf
 pM
 pnBufferSize
@@ -2840,6 +2843,7 @@ RPF
 rpr
 RRaster
 rrd
+rsc
 rsiz
 rst
 rsync
@@ -3140,6 +3144,7 @@ swi
 Swif
 swiftclient
 swq
+sxf
 sym
 symlinked
 syntaxes
@@ -3411,6 +3416,7 @@ vcpkg
 vct
 vcvars
 vdc
+vdv
 vecror
 VectorInfo
 VectorInfoOptions