gh-37399: Developer Guide: Document practices for data files
    
Topics here:
- ext_data is legacy location only, see #33037
- separate repos for large data files
- importlib_resources

Follow-ups:
- use of Features

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

URL: #37399
Reported by: Matthias Köppe
Reviewer(s): gmou3, Gonzalo Tornaría, Matthias Köppe, Sebastian Oehms
Release Manager committed Mar 29, 2024
2 parents 3ad892a + d3a4313 commit e6a377e
Showing 2 changed files with 61 additions and 11 deletions.
71 changes: 60 additions & 11 deletions src/doc/en/developer/coding_basics.rst
@@ -89,9 +89,9 @@ In particular,
Files and directory structure
=============================

-Roughly, the Sage directory tree is layout like this. Note that we use
-``SAGE_ROOT`` in the following as a shortcut for the (arbitrary) name
-of the directory containing the Sage sources:
+Roughly, the Sage directory tree is laid out like this. Note that we
+use ``SAGE_ROOT`` in the following as a shortcut for the name of the
+directory containing the Sage sources:

.. CODE-BLOCK:: text
@@ -104,7 +104,7 @@ of the directory containing the Sage sources:
setup.py
...
sage/ # Sage library
-ext_data/ # extra Sage resources (formerly src/ext)
+ext_data/ # extra Sage resources (legacy)
bin/ # the scripts in local/bin that are tracked
upstream/ # tarballs of upstream sources
local/ # installed binaries
@@ -149,15 +149,36 @@ Adding new top-level packages below :mod:`sage` should be done
sparingly. It is often better to create subpackages of existing
packages.

-Non-Python Sage source code and supporting files can be included in one
-of the following places:
+Non-Python Sage source code and small supporting files can be
+included in one of the following places:

 - In the directory of the Python code that uses that file. When the
Sage library is installed, the file will be installed in the same
-location as the Python code. For example,
-``SAGE_ROOT/src/sage/interfaces/maxima.py`` needs to use the file
-``SAGE_ROOT/src/sage/interfaces/maxima.lisp`` at runtime, so it refers
-to it as ::
+location as the Python code. This is referred to as "package data".
+
+The preferred way to access the data from Python is to use the
+`importlib.resources API
+<https://importlib-resources.readthedocs.io/en/latest/using.html>`_,
+in particular the function :func:`importlib.resources.files`.
+Using it, you can:
+
+- open a resource for text reading: ``fd = files(package).joinpath(resource).open('rt')``
+- open a resource for binary reading: ``fd = files(package).joinpath(resource).open('rb')``
+- read a resource as text: ``text = files(package).joinpath(resource).read_text()``
+- read a resource as bytes: ``data = files(package).joinpath(resource).read_bytes()``
+- open an xz-compressed resource for text reading: ``fd = lzma.open(files(package).joinpath(resource).open('rb'), 'rt')``
+- open an xz-compressed resource for binary reading: ``fd = lzma.open(files(package).joinpath(resource).open('rb'), 'rb')``
+
+If the file needs to be used outside of Python, then the
+preferred way is to use the context manager
+:func:`importlib.resources.as_file`, which is imported from
+:mod:`importlib.resources` just like :func:`~importlib.resources.files`.
+
+- Older code in the Sage library accesses
+the package data in more direct ways. For example,
+``SAGE_ROOT/src/sage/interfaces/maxima.py`` uses the file
+``SAGE_ROOT/src/sage/interfaces/sage-maxima.lisp`` at runtime, so it
+refers to it as::

os.path.join(os.path.dirname(__file__), 'sage-maxima.lisp')
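
The `files()` idioms listed in the hunk above can be tried out with the following minimal sketch, assuming Python 3.9 or later. It builds a throwaway package `demo_data_pkg` with a data file `primes.txt` (both hypothetical, standing in for a Sage subpackage and its package data) in a temporary directory, reads the file in the ways the bullets describe, and compares the result with the legacy `os.path.join(os.path.dirname(__file__), ...)` pattern shown just above.

```python
import importlib
import os
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

PKG = "demo_data_pkg"  # hypothetical stand-in for a Sage subpackage

with tempfile.TemporaryDirectory() as tmp:
    # Create the package and one data file so the sketch runs on its own.
    pkg_dir = Path(tmp) / PKG
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "primes.txt").write_bytes(b"2 3 5 7\n11 13\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        resource = files(PKG).joinpath("primes.txt")

        text = resource.read_text()      # whole resource as str
        raw = resource.read_bytes()      # whole resource as bytes
        with resource.open("rt") as fd:  # text stream, read line by line
            primes = [int(tok) for line in fd for tok in line.split()]

        # Legacy pattern for comparison: it only works when the package
        # lives in an ordinary directory on disk.
        module = importlib.import_module(PKG)
        legacy = os.path.join(os.path.dirname(module.__file__), "primes.txt")
        with open(legacy) as f:
            legacy_text = f.read()
    finally:
        sys.path.remove(tmp)

print(text.splitlines())    # ['2 3 5 7', '11 13']
print(len(raw))             # 14
print(primes)               # [2, 3, 5, 7, 11, 13]
print(legacy_text == text)  # True
```

Both approaches give the same text here; the difference is that `files()` does not assume that `__file__` points into a real directory.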

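For the case in the hunk above where a package data file must be handed to something outside Python, here is a sketch of `importlib.resources.as_file`, assuming Python 3.9 or later. To show why the context manager matters, the hypothetical package `demo_zip_pkg` (with a hypothetical file `init.lisp`) is placed inside a zip archive, so its resource is not an ordinary file on disk. A child Python process launched via `sys.executable` stands in for an external program, for example Maxima loading a `.lisp` file.

```python
import importlib
import subprocess
import sys
import tempfile
import zipfile
from importlib.resources import as_file, files
from pathlib import Path

# Stand-in "external program": it can only work with a real file name.
READER = "import sys; print(open(sys.argv[1]).read().strip())"

with tempfile.TemporaryDirectory() as tmp:
    # Put the (hypothetical) package inside a zip archive.
    archive = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_zip_pkg/__init__.py", "")
        zf.writestr("demo_zip_pkg/init.lisp", "(print 'hello)\n")
    sys.path.insert(0, str(archive))
    importlib.invalidate_caches()
    try:
        resource = files("demo_zip_pkg").joinpath("init.lisp")
        # as_file() yields a real filesystem path; for a zipped resource it
        # extracts a temporary copy that is removed when the block exits.
        with as_file(resource) as path:
            result = subprocess.run(
                [sys.executable, "-c", READER, str(path)],
                capture_output=True, text=True, check=True,
            )
            existed_inside = path.exists()
        existed_after = path.exists()
    finally:
        sys.path.remove(str(archive))

print(result.stdout.strip())          # (print 'hello)
print(existed_inside, existed_after)  # True False
```

For a package installed as an ordinary directory, `as_file()` simply yields the existing path and removes nothing afterwards.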
@@ -169,11 +190,39 @@ of the following places:
from sage.env import SAGE_EXTCODE
file = os.path.join(SAGE_EXTCODE, 'directory', 'file')

-In both cases, the files must be listed (explicitly or via wildcards) in
+This practice is deprecated, see :issue:`33037`.
+
+In all cases, the files must be listed (explicitly or via wildcards) in
the section ``options.package_data`` of the file
``SAGE_ROOT/pkgs/sagemath-standard/setup.cfg.m4`` (or the corresponding
file of another distribution).

+Large data files should not be added to the Sage source tree. Instead,
+the recommended approach is to:
+
+- create a separate git repository and upload them there [2]_,
+
+- add metadata to the repository that makes it a pip-installable
+package (distribution package), as explained for example in the
+`Python Packaging User Guide
+<https://packaging.python.org/en/latest/tutorials/packaging-projects/>`_,
+
+- `upload it to PyPI
+<https://packaging.python.org/en/latest/tutorials/packaging-projects/#uploading-the-distribution-archives>`_,
+
+- create metadata in ``SAGE_ROOT/build/pkgs`` that makes your new
+pip-installable package known to Sage; see :ref:`chapter-packaging`.
+
+For guiding examples of external repositories that host large data
+files, see https://github.com/sagemath/conway-polynomials and
+https://github.com/gmou3/matroid-database.
+
+.. [2]
+It is also suggested to compress the files, e.g., with the
+command ``xz -e``. They can then be read with a call such as
+``lzma.open(file, 'rt')``.
Learn by copy/paste
===================
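
The compression advice in footnote [2] can be sketched with Python's standard `lzma` module, which writes the same `.xz` container format as the `xz` command-line tool. This is a minimal sketch; the file name `squares.txt.xz` and its contents are hypothetical sample data. Since `xz` defaults to preset level 6 and `-e` selects the slower "extreme" variant of the current level, `preset=6 | lzma.PRESET_EXTREME` is used as the counterpart of `xz -e`.

```python
import lzma
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for a large data file.
lines = [f"{n} {n * n}" for n in range(1000)]
payload = "\n".join(lines) + "\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "squares.txt.xz"  # hypothetical file name

    # Level 6 plus PRESET_EXTREME mirrors `xz -e`: slower, and it may
    # compress slightly better than the plain level.
    with lzma.open(path, "wt", preset=6 | lzma.PRESET_EXTREME) as out:
        out.write(payload)

    # Read it back as text, as the footnote suggests: lzma.open(file, 'rt').
    with lzma.open(path, "rt") as fd:
        restored = [line.rstrip("\n") for line in fd]

    compressed_size = path.stat().st_size

print(restored == lines)                        # True
print(compressed_size < len(payload.encode()))  # True
```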
1 change: 1 addition & 0 deletions src/doc/en/developer/coding_in_python.rst
@@ -7,6 +7,7 @@ Coding in Python for Sage
This chapter discusses some issues with, and advice for, coding in
Sage.

+.. _section-python-language-standard:

Python language standard
========================

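Back in `coding_basics.rst`, the new guidance recommends shipping large data files as a separate pip-installable package. The consuming side could look roughly like the following sketch, which is an assumption rather than Sage's actual mechanism (the commit lists "use of Features" as a follow-up, and that is not shown here). The package name `demo_big_data` and the file `table.txt.xz` are hypothetical, and a temporary directory simulates `pip install`. It uses the xz-via-`importlib.resources` idiom from the bullet list, but closes the underlying binary stream explicitly, because `lzma.open()` does not close a file object it was given.

```python
import contextlib
import importlib
import importlib.util
import lzma
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

DATA_PACKAGE = "demo_big_data"  # hypothetical separate data package


@contextlib.contextmanager
def open_compressed(resource):
    """Yield a text stream for an xz-compressed file shipped in DATA_PACKAGE."""
    if importlib.util.find_spec(DATA_PACKAGE) is None:
        raise ModuleNotFoundError(
            f"{DATA_PACKAGE!r} is not installed; install it with pip first"
        )
    # Close both layers: lzma.open() leaves a passed-in file object open.
    with files(DATA_PACKAGE).joinpath(resource).open("rb") as raw:
        with lzma.open(raw, "rt") as fd:
            yield fd


# Simulate "pip install demo_big_data" by creating the package on the fly.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / DATA_PACKAGE
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    with lzma.open(pkg / "table.txt.xz", "wt") as out:
        out.write("x0 1\nx1 2\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        with open_compressed("table.txt.xz") as fd:
            rows = [line.split() for line in fd]
    finally:
        sys.path.remove(tmp)

print(rows)  # [['x0', '1'], ['x1', '2']]
```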