bump enda to version 0.1.0 #11

Merged
merged 9 commits into from
Jan 9, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -148,3 +148,4 @@ data/



.vscode/settings.json
29 changes: 29 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,29 @@
---
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-yaml
        args: ['--unsafe']
      - id: trailing-whitespace
      - id: detect-private-key
      - id: name-tests-test
        args: ["--pytest-test-first"]
  - repo: https://github.com/psf/black
    rev: 22.12.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.11.5
    hooks:
      - id: isort
        name: isort (python)
        args: ["--profile", "black", "--filter-files"]
  - repo: https://github.com/charliermarsh/ruff-pre-commit
Contributor

Ruff is not one of our requirements (we try to use Pylint), so using it in a pre-commit hook should probably be discussed first?

Contributor Author

Ruff is faster, but it does not cover all the rules pylint covers, see astral-sh/ruff#689

In most of our repos, we enforce pre-commit during CI tasks and we use ruff. I don't know how those two trade off accuracy against speed, so I don't have a strong opinion, really.

    # Ruff version.
    rev: "v0.0.261"
    hooks:
      - id: ruff
fail_fast: false
files: ".*"
exclude: "desktop.ini" # Windows
4 changes: 4 additions & 0 deletions .prettierrc
@@ -0,0 +1,4 @@
{
  "tabWidth": 2,
  "useTabs": false
}
62 changes: 62 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,62 @@
# Contributing

The project can be built using [Poetry](https://python-poetry.org/). You can install it following the instructions here: <https://python-poetry.org/docs/#installation>.

To develop `enda`, you can clone the repository and install the dependencies with poetry:

```sh
git clone https://github.com/enercoop/enda.git
cd enda
poetry install --with dev
```

If you are not using `Poetry` you can install the dependencies with `pip`:

```sh
pip install enda[dev]
```

To run the tests, you can use the following command:

```sh
poetry run pytest # or just pytest if you have activated the virtual environment
```

To run tests using tox, you can use the following command:

```sh
poetry run tox # or just tox if you have activated the virtual environment
```

## Building and publishing

After you have built a new feature, upgraded a dependency or fixed a bug, you need to update the version number in `pyproject.toml` and `enda/__init__.py` and publish the new version to PyPI.
Contributor

We're not that fast at publishing new versions to PyPI (as you saw ^^). Maybe it's something we should discuss too.

Contributor Author

Sure! I'm not sure what it is that you want to talk about in this case. That bit in the documentation covers the how and the when, not the "how often". That's relative, I'd say.


If you are using `Poetry`, you can add the poetry plugin `poetry-bumpversion` to help you with this:

```sh
poetry self add poetry-bumpversion
```

and then you can use the following commands to update the version number:

```sh
poetry version patch # or minor or major
```

See the [`poetry-bumpversion`](https://pypi.org/project/poetry-bumpversion/) documentation for more details.
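
Since the version number lives in two files, a small check can help confirm they stay in sync after a bump. The following is a minimal sketch under stated assumptions: `read_version` and `versions_match` are hypothetical helpers (not part of `enda` or Poetry), and the sample strings stand in for the contents of `pyproject.toml` and `enda/__init__.py`.

```python
import re

# Hypothetical helper pattern, not part of enda: matches a line that starts
# with `version = "..."` (pyproject.toml) or `__version__ = "..."` (__init__.py).
VERSION_RE = re.compile(r'^(?:__version__|version)\s*=\s*"([^"]+)"', re.MULTILINE)


def read_version(text: str) -> str:
    """Return the first version string assigned in `text`."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group(1)


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """Return True when both texts declare the same version string."""
    return read_version(pyproject_text) == read_version(init_text)


# Stand-in file contents; in practice you would read the two files from disk.
pyproject_example = '[tool.poetry]\nname = "enda"\nversion = "0.1.0"\n'
init_example = '__version__ = "0.1.0"\n'
print(versions_match(pyproject_example, init_example))  # prints True
```

In a real checkout, the two strings could be read with `pathlib.Path("pyproject.toml").read_text()` and `pathlib.Path("enda/__init__.py").read_text()`.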

After that you will need to build the package with:

```sh
poetry build
```

and then you can publish the new version to PyPI:

```sh
poetry publish
```

You will need to set up your PyPI credentials for this to work. See the [Poetry documentation](https://python-poetry.org/docs/repositories/#configuring-credentials) for more details.

68 changes: 52 additions & 16 deletions README.md
@@ -1,22 +1,25 @@
# enda

![PyPI](https://img.shields.io/pypi/v/enda?link=https%3A%2F%2Fpypi.org%2Fproject%2Fenda%2F) [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

## What is it?

**enda** is a Python package that provides tools to manipulate **timeseries** data in conjunction with **contracts** data for analysis and **forecasts**.
**enda** is a Python package that provides tools to manipulate **timeseries** data in conjunction with **contracts** data for analysis and **forecasts**.

Its main goal is to help [Rescoop.eu](https://www.rescoop.eu/) members build various applications, such as short-term electricity load and production forecasts, specifically for the [RescoopVPP](https://www.rescoopvpp.eu/) project. Hence some tools in this package perform TSO (transmission network operator) and DNO (distribution network operator) data wrangling as well as weather data management. enda is mainly developed by [Enercoop](https://www.enercoop.fr/).

## Main Features
Here are some things **enda** does well :

- Provide robust machine learning algorithms for short-term electricity load and production forecasts, developed by Enercoop. The load forecast was originally based on Komi Nagbe's thesis (http://www.theses.fr/s148364).
- Manipulate **contracts** data coming from your ERP and turn it into timeseries you can use for analysis, visualisation and machine learning.
- Timeseries-specific detection of missing data, like time gaps and frequency changes.
- Date-time feature engineering robust to timezone hazards.
Here are some things **enda** does well:

- Provide robust machine learning algorithms for short-term electricity load and production forecasts, developed by Enercoop. The load forecast was originally based on Komi Nagbe's thesis (<http://www.theses.fr/s148364>).
- Manipulate **contracts** data coming from your ERP and turn it into timeseries you can use for analysis, visualisation and machine learning.
- Timeseries-specific detection of missing data, like time gaps and frequency changes.
- Date-time feature engineering robust to timezone hazards.

## Where to get it
The source code is currently hosted on GitHub at : https://github.com/enercoop/enda

The source code is currently hosted on GitHub at: <https://github.com/enercoop/enda>

Binary installers for the latest released version are available at the [Python
Package Index (PyPI)](https://pypi.org/project/enda) (for now it is not directly on [Conda](https://docs.conda.io/en/latest/)).
@@ -26,36 +29,69 @@ Package Index (PyPI)](https://pypi.org/project/enda) (for now it is not directly
pip install enda
```

## How to get started ?
If you wish to install the dependencies needed to run the examples, you can install `enda` with the `examples` extra:

```sh
pip install enda[examples]
```

You can install all the optional dependencies with the `all` extra:

```sh
pip install enda[all]
```

or using poetry:

```sh
poetry add enda[all]
```

## How to get started?

Check out the guides : https://github.com/enercoop/enda/tree/main/guides .
Check out the guides: <https://github.com/enercoop/enda/tree/main/guides>.

## Hard dependencies

- [Pandas - the main dataframe manipulation tool for python, advanced timeseries management included.](https://pandas.pydata.org/)
- Pandas itself has hard dependencies and optional dependencies, checkout https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html . Hard dependencies of pandas include : `setuptools`, `NumPy`, `python-dateutil`, `pytz`.
- Pandas itself has hard dependencies and optional dependencies; check out <https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html>. Hard dependencies of pandas include: `setuptools`, `NumPy`, `python-dateutil`, `pytz`.

## Optional dependencies
## Optional dependencies
Contributor

That may not be consistent with the poetry installer, which seems to require h2o and scikit-learn?

Contributor Author

When trying the code, I was not able to run anything without these dependencies. The code launches a h2o instance at runtime, and calls fit() on subclasses of scikit-learn models.

These were mentioned as optional for reasons I don't understand, so I left that there. Perhaps a rewrite of the README.md is needed?

Contributor

If we enforce them as required dependencies, then yes, a rewrite will be necessary :)


Optional dependencies are used only for specific methods. Enda will give an error if the method called requires a dependency that is not installed.
Optional dependencies are used only for specific methods. Enda will give an error if the method called requires a dependency that is not installed.
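
One common way to produce such an error is to import the optional package only inside the method that needs it. The sketch below illustrates that pattern and is not enda's actual code: `require_optional` is a hypothetical helper, and the install hint assumes the pip package name equals the module name, which is not always true (for example `sklearn` is installed as `scikit-learn`).

```python
import importlib


def require_optional(module_name: str, feature: str):
    """Import an optional dependency, or raise an ImportError naming the feature.

    Hypothetical helper for illustration; enda may handle this differently.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumption: the pip package name matches the module name.
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


# The standard-library `json` module stands in for an optional package here.
json_module = require_optional("json", "JSON export")
print(json_module.dumps({"ok": True}))
```

Calling the helper at the top of a method keeps `import enda` working even when the optional backend is absent, and defers the failure to the point where the backend is actually used.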

Enda can work with different machine learning "backends" :

- [Scikit-learn](https://scikit-learn.org/stable/)
- [H2O - an efficient machine learning framework](https://docs.h2o.ai/)

You can also easily implement your own ml-backend by implementing enda's ModelInterface. Checkout `enda.ml_backends.sklearn_linreg.py` for an example with `SKLearnLinearRegression`.
You can also easily implement your own ml-backend by implementing enda's `ModelInterface`. Check out `enda.ml_backends.sklearn_linreg.py` for an example with `SKLearnLinearRegression`.

Other optional dependencies:

Other optional dependencies :
- [statsmodels](https://pypi.org/project/statsmodels/)

Furthermore, don't hesitate to install pandas "Recommended dependencies" for speed-ups : `numexpr` and `bottleneck`.

If you want to save your trained models, we recommend `joblib`. See Scikit-learn's recommendations here : https://scikit-learn.org/stable/modules/model_persistence.html .
If you want to save your trained models, we recommend `joblib`. See Scikit-learn's recommendations here: <https://scikit-learn.org/stable/modules/model_persistence.html>.

All these dependencies can be installed along `enda` with the following command:

An almost complete install looks like :
```sh
pip install enda[examples]
```

Or you can install them manually:

```sh
pip install numexpr bottleneck pandas enda jupyter h2o scikit-learn statsmodels joblib matplotlib
```

## About `numpy` support for python 3.7

Support for `numpy` on Python 3.7 follows <https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table>
and <https://github.com/scipy/oldest-supported-numpy>.

## License

[MIT](LICENSE)
22 changes: 14 additions & 8 deletions enda/__init__.py
@@ -1,3 +1,5 @@
__version__ = "0.1.0"

# Let users know if they're missing any of our hard dependencies
hard_dependencies = ("pandas",)
# note that we also need python-dateutil and pytz but pandas already depends on them, so importing pandas is enough
@@ -6,21 +8,25 @@
for dependency in hard_dependencies:
    try:
        __import__(dependency)
    except ImportError as e:
    except ImportError:
        missing_dependencies.append(dependency)

if len(missing_dependencies) > 0:
    raise ImportError(
        "Unable to import required dependencies:\n" + "\n".join(missing_dependencies)
        "Unable to import required dependencies:\n"
        + "\n".join(
            missing_dependencies,
        )
    )
del hard_dependencies, dependency, missing_dependencies

from enda.backtesting import BackTesting # noqa

# import some subclasses here so users can use for instance :
# 'enda.Contracts' without knowing the internal structure.
# Do not import classes that need a specific packages like "H2OModel".
from enda.contracts import (Contracts)
from enda.feature_engineering.datetime_features import (DatetimeFeature)
from enda.timeseries import (TimeSeries)
from enda.backtesting import (BackTesting)
from enda.power_stations import (PowerStations)
from enda.power_predictor import (PowerPredictor)
from enda.contracts import Contracts # noqa
from enda.feature_engineering.datetime_features import DatetimeFeature # noqa
from enda.power_predictor import PowerPredictor # noqa
from enda.power_stations import PowerStations # noqa
from enda.timeseries import TimeSeries # noqa
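
For readers skimming the hunk above, the hard-dependency check in `enda/__init__.py` can be restated as a standalone sketch. The function names `find_missing` and `check_hard_dependencies` are hypothetical and do not appear in the PR; the sketch also uses `importlib.import_module` where the module itself calls `__import__`.

```python
import importlib


def find_missing(dependencies):
    """Return the names in `dependencies` that cannot be imported."""
    missing = []
    for name in dependencies:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def check_hard_dependencies(dependencies):
    """Raise a single ImportError listing every missing dependency."""
    missing = find_missing(dependencies)
    if missing:
        raise ImportError(
            "Unable to import required dependencies:\n" + "\n".join(missing)
        )


# `json` ships with Python, so this call passes without raising.
check_hard_dependencies(("json",))
```

Collecting all missing names before raising, as the original module does, reports every absent package in one error instead of one per failed import.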