Skip to content

Commit

Permalink
fix: isolate required and optional dependencies (#237)
Browse files Browse the repository at this point in the history
Allow more minimal installs by making certain packages optional, namely the backend engines (`h5netcdf, `netcdf4`, and `zarr`) used for saving and loading models, plus `statsmodels` which is used for CPCCA-derived models, and `numba` which is used for GWPCA.
  • Loading branch information
slevang authored Oct 7, 2024
1 parent a6a32a0 commit a6c05e6
Show file tree
Hide file tree
Showing 31 changed files with 177 additions and 67 deletions.
15 changes: 12 additions & 3 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ on:

jobs:
test:
name: py${{ matrix.versions.python-version }} ${{ matrix.versions.resolution }}
name: py${{ matrix.versions.python-version }} ${{ matrix.versions.resolution }} ${{ matrix.deps.name}}
runs-on: ubuntu-latest
strategy:
matrix:
Expand All @@ -22,6 +22,14 @@ jobs:
resolution: highest
- python-version: '3.12'
resolution: highest
deps:
- name: minimal
value: '[dev]'
doctest: '' # doctest runs MCA and requires statsmodels
- name: complete
value: '[dev,complete]'
doctest: '--doctest-glob=README.md'

steps:
- uses: actions/checkout@v4

Expand All @@ -33,11 +41,12 @@ jobs:
- name: Install dependencies
run: |
pip install uv
uv pip install . -r pyproject.toml --system --extra dev --resolution ${{ matrix.versions.resolution }}
uv pip install .${{ matrix.deps.value }} -r pyproject.toml \
--system --resolution ${{ matrix.versions.resolution }}
- name: Execute Tests
run: |
coverage run -m pytest -n auto --doctest-glob="README.md"
coverage run -m pytest -n auto ${{ matrix.deps.doctest }}
coverage report -m
coverage xml
Expand Down
16 changes: 3 additions & 13 deletions docs/content/contributing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -48,19 +48,9 @@ Using the commands below, prepare your environment:
conda create -n xeofs python=3.11 rpy2 pandoc
conda activate xeofs
pip install -e .[docs,dev]
pip install -e .[complete,docs,dev]
This will install all necessary dependencies, including those for development and documentation. If you're only updating the code (without modifying online documentation), you can skip the docs dependency:

.. code-block:: bash
pip install -e .[dev]
On the other hand, if you're just updating documentation:

.. code-block:: bash
pip install -e .[docs]
This will install both core and optional dependencies, including those for specialized models, documentation, and development. Alternatively, you can skip some of the optional dependency sets (``[complete,docs,dev]``) depending on which components of the package you're working on.

Additionally, install the pre-commit hooks:

Expand All @@ -81,7 +71,7 @@ Before diving into your contribution, ensure your local main branch is updated:
git fetch upstream
git merge upstream/main
This syncs your local main branch with the latest from the primary `xeofs` repository.
This syncs your local main branch with the latest from the primary ``xeofs`` repository.

4. Create a new branch
----------------------
Expand Down
50 changes: 31 additions & 19 deletions docs/content/user_guide/installation.rst
Original file line number Diff line number Diff line change
@@ -1,35 +1,33 @@
Installation
------------

Required Dependencies
Dependencies
~~~~~~~~~~~~~~~~~~~~~

The following packages are required dependencies:
The following packages are dependencies of ``xeofs``:

**Core Dependencies**
**Core Dependencies (Required)**

* Python (3.10 or higher)
* `numpy <https://www.numpy.org/>`__
* `pandas <https://pandas.pydata.org/>`__
* `xarray <http://xarray.pydata.org/>`__
* `scikit-learn <https://scikit-learn.org/stable/>`__
* `statsmodels <https://www.statsmodels.org/stable/index.html>`__
* `numpy <https://www.numpy.org/>`__
* `pandas <https://pandas.pydata.org/>`__
* `xarray <http://xarray.pydata.org/>`__
* `dask <https://dask.org/>`__
* `scikit-learn <https://scikit-learn.org/stable/>`__
* `typing-extensions <https://pypi.org/project/typing-extensions/>`__
* `tqdm <https://tqdm.github.io/>`__

**For Performance**
**For Specialized Models (Optional)**

* `dask <https://dask.org/>`__
* `numba <https://numba.pydata.org/>`__
* `numba <https://numba.pydata.org/>`__
* `statsmodels <https://www.statsmodels.org/stable/index.html>`__

**For I/O**
**For I/O (Optional)**

* `netCDF4 <https://unidata.github.io/netcdf4-python/netCDF4/index.html>`__
* `zarr <https://zarr.readthedocs.io/en/stable/>`__
* `xarray-datatree <https://github.com/xarray-contrib/datatree>`__
* `h5netcdf <https://h5netcdf.org/>`__
* `netCDF4 <https://unidata.github.io/netcdf4-python/netCDF4/index.html>`__
* `zarr <https://zarr.readthedocs.io/en/stable/>`__

**Miscellaneous**

* `typing-extensions <https://pypi.org/project/typing-extensions/>`__
* `tqdm <https://tqdm.github.io/>`__

Instructions
~~~~~~~~~~~~
Expand All @@ -46,3 +44,17 @@ or the Python package installer `pip <https://pip.pypa.io/en/stable/getting-star
.. code-block:: bash
pip install xeofs
Several optional dependencies are required for certain functionality and are not installed by default:

* ``zarr``, ``h5netcdf``, or ``netcdf4`` are necessary for saving and loading models to disk
* ``statsmodels`` is required for all models that inherit from ``CPCCA`` including ``CCA``, ``MCA`` and ``RDA``
* ``numba`` is required for the ``GWPCA`` model

These extras can be automatically included when installing with pip:

.. code-block:: bash
pip install xeofs[complete]
# or using individual groups
pip install xeofs[io,etc]
2 changes: 1 addition & 1 deletion docs/environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,4 @@ dependencies:
- pandoc
- pip
- pip:
- -e ../.[docs]
- -e ../.[complete,docs]
14 changes: 10 additions & 4 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,15 +17,12 @@ dependencies = [
"scikit-learn>=1.0.2",
"tqdm>=4.64.0",
"dask>=2023.0.1",
"statsmodels>=0.14.0",
"netCDF4>=1.5.8",
"numba>=0.57",
"typing-extensions>=4.8.0",
"zarr>=2.14.0",
"xarray-datatree>=0.0.12",
]

[project.optional-dependencies]
complete = ["xeofs[etc,io]"]
dev = [
"build>=1.0.0",
"ruff>=0.3",
Expand Down Expand Up @@ -53,6 +50,15 @@ docs = [
"ipython>=8.14",
"ipykernel>=6.23",
]
etc = [
"numba>=0.57",
"statsmodels>=0.14.0",
]
io = [
"h5netcdf>=1.0.0",
"netcdf4>=1.5.8",
"zarr>=2.14.0",
]

[project.urls]
homepage = "https://github.com/xarray-contrib/xeofs"
Expand Down
3 changes: 3 additions & 0 deletions tests/models/cross/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
import pytest

pytest.importorskip("statsmodels")
6 changes: 5 additions & 1 deletion tests/models/cross/test_cca.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

from xeofs.cross import CCA

from ...utilities import skip_if_missing_engine


def generate_random_data(shape, lazy=False, seed=142):
rng = np.random.default_rng(seed)
Expand Down Expand Up @@ -226,11 +228,13 @@ def test_predict():
_ = cca.inverse_transform(Y=Ry_pred)


@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
def test_save_load(tmp_path, engine):
"""Test save/load methods in MCA class, ensuring that we can
roundtrip the model and get the same results when transforming
data."""
skip_if_missing_engine(engine)

X = generate_random_data((200, 10), seed=123)
Y = generate_random_data((200, 20), seed=321)

Expand Down
10 changes: 8 additions & 2 deletions tests/models/cross/test_cpcca.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

from xeofs.cross import CPCCA

from ...utilities import skip_if_missing_engine


def generate_random_data(shape, lazy=False, seed=142):
rng = np.random.default_rng(seed)
Expand Down Expand Up @@ -274,12 +276,14 @@ def test_predict():
_ = cpcca.inverse_transform(Y=Ry_pred)


@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
@pytest.mark.parametrize("alpha", [0.0, 0.5, 1.0])
def test_save_load(tmp_path, engine, alpha):
"""Test save/load methods in MCA class, ensuring that we can
roundtrip the model and get the same results when transforming
data."""
skip_if_missing_engine(engine)

X = generate_random_data((200, 10), seed=123)
Y = generate_random_data((200, 20), seed=321)

Expand Down Expand Up @@ -319,11 +323,13 @@ def test_save_load(tmp_path, engine, alpha):
assert np.allclose(XYr_o[1], XYr_l[1])


@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
@pytest.mark.parametrize("alpha", [0.0, 0.5, 1.0])
def test_save_load_with_data(tmp_path, engine, alpha):
"""Test save/load methods in CPCCA class, ensuring that we can
roundtrip the model and get the same results for SCF."""
skip_if_missing_engine(engine)

X = generate_random_data((200, 10), seed=123)
Y = generate_random_data((200, 20), seed=321)

Expand Down
6 changes: 5 additions & 1 deletion tests/models/cross/test_hilbert_cpcca.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

from xeofs.cross import HilbertCPCCA

from ...utilities import skip_if_missing_engine


def generate_random_data(shape, lazy=False, seed=142):
rng = np.random.default_rng(seed)
Expand Down Expand Up @@ -65,11 +67,13 @@ def test_singular_values(use_pca):


# Currently, netCDF4 does not support complex numbers, so skip this test
@pytest.mark.parametrize("engine", ["zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "zarr"])
@pytest.mark.parametrize("alpha", [0.0, 0.5, 1.0])
def test_save_load_with_data(tmp_path, engine, alpha):
"""Test save/load methods in CPCCA class, ensuring that we can
roundtrip the model and get the same results."""
skip_if_missing_engine(engine)

X = generate_random_data((200, 10), seed=123)
Y = generate_random_data((200, 20), seed=321)

Expand Down
6 changes: 5 additions & 1 deletion tests/models/cross/test_hilbert_mca_rotator.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
# Import the classes from your modules
from xeofs.cross import HilbertMCA, HilbertMCARotator

from ...utilities import skip_if_missing_engine


@pytest.fixture
def mca_model(mock_data_array, dim):
Expand Down Expand Up @@ -242,10 +244,12 @@ def test_scores_phase(mca_model, mock_data_array, dim):
],
)
# Currently, netCDF4 does not support complex numbers, so skip this test
@pytest.mark.parametrize("engine", ["zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "zarr"])
def test_save_load_with_data(tmp_path, engine, mca_model):
"""Test save/load methods in HilbertMCARotator class, ensuring that we can
roundtrip the model and get the same results."""
skip_if_missing_engine(engine)

original = HilbertMCARotator(n_modes=2)
original.fit(mca_model)

Expand Down
6 changes: 4 additions & 2 deletions tests/models/cross/test_mca.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

from xeofs.cross import MCA

from ...utilities import data_is_dask
from ...utilities import data_is_dask, skip_if_missing_engine


@pytest.fixture
Expand Down Expand Up @@ -376,11 +376,13 @@ def test_compute(mock_dask_data_array, dim, compute):
(("lon", "lat")),
],
)
@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
def test_save_load(dim, mock_data_array, tmp_path, engine):
"""Test save/load methods in MCA class, ensuring that we can
roundtrip the model and get the same results when transforming
data."""
skip_if_missing_engine(engine)

original = MCA()
original.fit(mock_data_array, mock_data_array, dim)

Expand Down
6 changes: 4 additions & 2 deletions tests/models/cross/test_mca_rotator.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
# Import the classes from your modules
from xeofs.cross import MCA, MCARotator

from ...utilities import data_is_dask
from ...utilities import data_is_dask, skip_if_missing_engine


@pytest.fixture
Expand Down Expand Up @@ -230,11 +230,13 @@ def test_compute(mca_model_delayed, compute):
(("lon", "lat")),
],
)
@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
def test_save_load(dim, mock_data_array, tmp_path, engine):
"""Test save/load methods in MCA class, ensuring that we can
roundtrip the model and get the same results when transforming
data."""
skip_if_missing_engine(engine)

original_unrotated = MCA()
original_unrotated.fit(mock_data_array, mock_data_array, dim)

Expand Down
6 changes: 5 additions & 1 deletion tests/models/cross/test_rda.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

from xeofs.cross import RDA

from ...utilities import skip_if_missing_engine


def generate_random_data(shape, lazy=False, seed=142):
rng = np.random.default_rng(seed)
Expand Down Expand Up @@ -226,11 +228,13 @@ def test_predict():
_ = rda.inverse_transform(Y=Ry_pred)


@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
def test_save_load(tmp_path, engine):
"""Test save/load methods in MCA class, ensuring that we can
roundtrip the model and get the same results when transforming
data."""
skip_if_missing_engine(engine)

X = generate_random_data((200, 10), seed=123)
Y = generate_random_data((200, 20), seed=321)

Expand Down
6 changes: 5 additions & 1 deletion tests/models/single/test_eof.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@

from xeofs.single import EOF

from ...utilities import skip_if_missing_engine


def test_init():
"""Tests the initialization of the EOF class"""
Expand Down Expand Up @@ -494,11 +496,13 @@ def test_inverse_transform(dim, mock_data_array, normalized):
(("lon", "lat")),
],
)
@pytest.mark.parametrize("engine", ["netcdf4", "zarr"])
@pytest.mark.parametrize("engine", ["h5netcdf", "netcdf4", "zarr"])
def test_save_load(dim, mock_data_array, tmp_path, engine):
"""Test save/load methods in EOF class, ensuring that we can
roundtrip the model and get the same results when transforming
data."""
skip_if_missing_engine(engine)

original = EOF()
original.fit(mock_data_array, dim)

Expand Down
Loading

0 comments on commit a6c05e6

Please sign in to comment.