Add RFC 102 text: Embedding resource files into libgdal
rouault committed Oct 1, 2024
1 parent 7babe99 commit b236908
Showing 3 changed files with 169 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/development/rfc/index.rst
@@ -107,3 +107,4 @@ RFC list
rfc98_build_requirements_gdal_3_9
rfc99_geometry_coordinate_precision
rfc101_raster_dataset_threadsafety
rfc102_embedded_resources
162 changes: 162 additions & 0 deletions doc/source/development/rfc/rfc102_embedded_resources.rst
@@ -0,0 +1,162 @@
.. _rfc-102:

===================================================================
RFC 102: Embedding resource files into libgdal
===================================================================

============== =============================================
Author:        Even Rouault
Contact:       even.rouault @ spatialys.com
Started:       2024-Oct-01
Status:        Draft
Target:        GDAL 3.10 or 3.11
============== =============================================

Summary
-------

This RFC proposes using the C23 ``#embed`` pre-processor directive, when
available, to embed GDAL resource files directly into libgdal. The same
approach is also intended to be used for PROJ, in particular for its
``proj.db`` file.

Motivation
----------

Some parts of GDAL core, but mostly drivers, depend on a number of resource
files for correct execution. Locating those resource files on the file system
can be painful in use cases that involve relocating the GDAL binary at
installation time. One such case is the GDAL embedded in Rasterio or Fiona
binary wheels, where :config:`GDAL_DATA` must currently be set correctly.
WebAssembly (WASM) use cases also come to mind as potential users of GDAL
builds where resources are directly included in libgdal.

Technical solution
------------------

The C23 standard includes a `#embed "filename" <https://en.cppreference.com/w/c/preprocessor/embed>`__
pre-processor directive that ingests the specified filename and returns its
content as tokens that can be stored in an ``unsigned char`` or ``char`` array.

Getting the content of a file into a variable is as simple as the following
(which also demonstrates adding a nul-terminating character when this is needed):

.. code-block:: c

    static const char szPDS4Template[] = {
    #embed "data/pds4_template.xml"
    , '\0'};

Compiler support
----------------

Support for that directive is still very new. clang 19.1 is the first
compiler release to include it, with an efficient implementation that is able
to embed very large files with minimal RAM and CPU usage.

The development version of GCC 15 also supports it, but in a non-optimized way
for now: including large files of several tens of megabytes can significantly
increase compilation time, although without any impact on runtime. This is not
an issue for GDAL use cases, and GCC developers intend to improve this in the
future.

At the time of writing, embedding PROJ's 9.1 MB ``proj.db`` with GCC 15dev
takes 18 seconds and 1.7 GB of RAM, compared to 0.4 second and 400 MB of RAM
with clang 19. The GCC figures are still reasonable: generating ``proj.db``
itself from its source .sql files takes one minute on the same system.

There is no timeline for Visual Studio C/C++ at time of writing (it has been
`requested by users <https://developercommunity.visualstudio.com/t/Add-support-for-embed-as-voted-into-the/10451640?ru=https%3A%2F%2Fdevelopercommunity.visualstudio.com%2Fcontent%2Fproblem%2F132967%2Fcopycut-commands-doesnt-work.html%3FchildToView%3D133119>`__)
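
Whether a given compiler accepts the directive can also be probed from the
source itself, since C23 defines a ``__has_embed`` pre-processor expression
alongside ``#embed``. The following is only a minimal sketch, not GDAL code:
the ``HAVE_C23_EMBED`` macro and the ``HasEmbedSupport()`` function are
hypothetical names, and probing with ``__FILE__`` is an assumption made
because that file is guaranteed to exist at compile time.

.. code-block:: c

    /* __has_embed may only appear inside #if, so first check that it is
     * defined; compilers lacking it skip the inner #if without parsing it. */
    #ifdef __has_embed
    #if __has_embed(__FILE__)
    #define HAVE_C23_EMBED 1
    #endif
    #endif

    #ifndef HAVE_C23_EMBED
    #define HAVE_C23_EMBED 0
    #endif

    /* Hypothetical helper: returns 1 when this translation unit was compiled
     * by a compiler that can process #embed, 0 otherwise. */
    int HasEmbedSupport(void)
    {
        return HAVE_C23_EMBED;
    }

A build system would more typically run an equivalent check once at configure
time rather than in every source file.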

Note that clang 19.1 currently only supports ``#embed`` in .c files, not in
C++ ones (the C++ standard has not yet adopted this feature). Embedding
resources must therefore be done in a .c file, which is not a problem since
symbols and functions can easily be exported from a .c file and made
available to C++ code.
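
As a rough illustration of that split, a .c file can hold the embedded array
and expose it through an accessor that C++ code declares with C linkage. In
this sketch the function name ``GetEmbeddedTemplate()`` is hypothetical, and a
short string literal stands in for the ``#embed`` line so that the snippet
compiles with any C compiler.

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    /* In a real build the initializer would be produced by #embed; a short
     * literal is used here as a stand-in (assumption for illustration). */
    static const char szTemplate[] = "<Product_Observational/>";

    /* Hypothetical accessor. A C++ caller would declare it as:
     *   extern "C" const char *GetEmbeddedTemplate(size_t *pnSize);
     * Returns the nul-terminated content and, optionally, its length. */
    const char *GetEmbeddedTemplate(size_t *pnSize)
    {
        if (pnSize != NULL)
            *pnSize = strlen(szTemplate);
        return szTemplate;
    }

Keeping the array ``static`` and exporting only the accessor means the C++
side never needs to see the ``#embed`` directive.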

New CMake options
-----------------

Resources will only be embedded if the new ``EMBED_RESOURCE_FILES`` CMake option
is set to ``ON``. This option will default to ``ON`` for static library builds
when C23 ``#embed`` is detected to be available. Users may also set it to
``ON`` for shared library builds. A CMake error is emitted if the option is
turned on but the compiler lacks support for ``#embed``.

A complementary CMake option ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` will also
be added. It will default to ``OFF``. When set to ``ON``, GDAL will not try to
locate resource files in the GDAL_DATA directory burnt into libgdal at build
time (``${install_prefix}/share/gdal``), or in the directory specified by the
:config:`GDAL_DATA` configuration option.

In other words, if ``EMBED_RESOURCE_FILES=ON`` but ``USE_ONLY_EMBEDDED_RESOURCE_FILES=OFF``,
GDAL will first try to locate resource files on the file system, and
fall back to the embedded version if they are not found.
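
The lookup order described above could be sketched as follows. This is not the
candidate implementation: ``LoadResource()``, the ``szEmbedded`` stand-in and
reading ``GDAL_DATA`` from the environment with ``getenv()`` are assumptions
made for illustration (GDAL resolves configuration options through its own
API), and the pre-processor macro merely mirrors the name of the CMake option.

.. code-block:: c

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in for content that a real build would pull in with #embed. */
    static const char szEmbedded[] = "embedded copy";

    /* Hypothetical helper: returns a malloc()'ed, nul-terminated copy of the
     * resource, or NULL on allocation failure. Unless
     * USE_ONLY_EMBEDDED_RESOURCE_FILES is defined, the file system is tried
     * first and the embedded copy is only the fallback. */
    char *LoadResource(const char *pszBasename)
    {
    #ifndef USE_ONLY_EMBEDDED_RESOURCE_FILES
        const char *pszDir = getenv("GDAL_DATA");
        if (pszDir != NULL)
        {
            char szPath[1024];
            snprintf(szPath, sizeof(szPath), "%s/%s", pszDir, pszBasename);
            FILE *fp = fopen(szPath, "rb");
            if (fp != NULL)
            {
                char *pszContent = NULL;
                if (fseek(fp, 0, SEEK_END) == 0)
                {
                    const long nSize = ftell(fp);
                    if (nSize >= 0)
                        pszContent = malloc((size_t)nSize + 1);
                    if (pszContent != NULL)
                    {
                        rewind(fp);
                        const size_t nRead =
                            fread(pszContent, 1, (size_t)nSize, fp);
                        pszContent[nRead] = '\0';
                    }
                }
                fclose(fp);
                if (pszContent != NULL)
                    return pszContent;
            }
        }
    #else
        (void)pszBasename;
    #endif
        /* Not found on disk, or disk lookup disabled: use the embedded copy. */
        char *pszCopy = malloc(sizeof(szEmbedded));
        if (pszCopy != NULL)
            memcpy(pszCopy, szEmbedded, sizeof(szEmbedded));
        return pszCopy;
    }

The caller owns the returned buffer and releases it with ``free()``, whichever
source it came from.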

The resource files will still be installed in ``${install_prefix}/share/gdal``,
unless ``USE_ONLY_EMBEDDED_RESOURCE_FILES`` is set to ``ON``.

Impacted code
-------------

- gcore: embedding LICENSE.TXT, and tms_*.json files
- frmts/grib: embedding GRIB2 CSV files
- frmts/hdf5: embedding bag_template.xml
- frmts/nitf: embedding nitf_spec.xml
- frmts/pdf: embedding pdf_composition.xml
- frmts/pds: embedding pds4_template.xml
- ogr/ogrsf_frmts/dgn: embedding seed_2d.dgn and seed_3d.dgn
- ogr/ogrsf_frmts/dxf: embedding header.dxf and leader.dxf
- ogr/ogrsf_frmts/gml: embedding .gfs files and gml_registry.xml
- ogr/ogrsf_frmts/gmlas: embedding gmlasconf.xml
- ogr/ogrsf_frmts/miramon: embedding MM_m_idofic.csv
- ogr/ogrsf_frmts/osm: embedding osm_conf.ini
- ogr/ogrsf_frmts/plscenes: embedding plscenesconf.json
- ogr/ogrsf_frmts/s57: embedding s57*.csv files
- ogr/ogrsf_frmts/sxf: embedding default.rsc
- ogr/ogrsf_frmts/vdv: embedding vdv452.xml

Considered alternatives
-----------------------

Embedding resource files into libraries has long been a wished-for feature of C/C++.
Different workarounds have emerged over the years, such as the use of the
``od -x`` utility, GNU ``ld`` linker ``-b`` mode, or CMake-based solutions such
as https://jonathanhamberg.com/post/cmake-file-embedding/

We could potentially use the latter to address non-C23 capable compilers, but
we have chosen not to do that, for the sake of implementation simplicity. And,
if considering using the CMake trick as the only solution, we should note that
C23 ``#embed`` has the potential for better compile time, as demonstrated by
the clang implementation.

Backward compatibility
----------------------

Fully backwards compatible.

C23 is not required if ``EMBED_RESOURCE_FILES`` is not enabled.

Documentation
-------------

The two new CMake options will be documented.

Testing
-------

The existing fedora:rawhide continuous integration target, which now has clang
19.1 available, will be modified to test the effect of the new options.

Local builds using the GCC 15dev builds from https://jwakely.github.io/pkg-gcc-latest/
have also been successfully done during the development of the candidate implementation.

Related issues and PRs
----------------------

- https://github.com/OSGeo/gdal/issues/10780

- Candidate implementation (in progress): https://github.com/OSGeo/gdal/compare/master...rouault:gdal:embedded_resources?expand=1

Voting history
--------------

TBD
6 changes: 6 additions & 0 deletions doc/source/spelling_wordlist.txt
@@ -1353,6 +1353,7 @@ IdentificationTolerance
identificator
Identificator
IDisposable
idofic
Idrisi
iDriver
idx
@@ -1782,6 +1783,7 @@ minY
MinZ
Mipmaps
MiraD
miramon
mis
mitab
mkdir
@@ -2409,6 +2411,7 @@ Placemark
plaintext
Plessis
plmosaic
plscenes
plscenesconf
pM
pnBufferSize
@@ -2840,6 +2843,7 @@ RPF
rpr
RRaster
rrd
rsc
rsiz
rst
rsync
@@ -3140,6 +3144,7 @@ swi
Swif
swiftclient
swq
sxf
sym
symlinked
syntaxes
@@ -3411,6 +3416,7 @@ vcpkg
vct
vcvars
vdc
vdv
vecror
VectorInfo
VectorInfoOptions