Public repository containing the material for the 2022 data2day conference.
More on the program: From PoCs to Large Scale ML Operationalization Covering the End-to-End Pipeline.
This repository is owned and maintained by E-Breuninger Developer Team.
For any feedback or inquiries related to this repository, you can contact Olivier Bénard (Data Software Engineer).
The dependencies are managed via `poetry`. We recommend using and integrating this tool in your process. However, we also provide the list of necessary requirements in the `requirements.txt` file if you decide otherwise.

Note: You might have to switch your Python version. We recommend using `pyenv` as a Python version manager; it can be installed via `brew install pyenv`.
To install all the dependencies and rapidly start getting your hands dirty:

- Create a `settings.toml` file based on the following template:

```toml
[default]
LOG_LEVEL = "DEBUG"
LATITUDE = "<google-map-latitude>"
LONGITUDE = "<google-map-longitude>"
APP_PATH = "/absolute/path/to/the/local/repository/"
```

- Create a `.secrets.toml` file based on the following template (you can leave the default if you have no key):

```toml
[default]
google_map_api_key = "<your-google-map-api-key>"
```

- Install all the dependencies in the virtual environment via `poetry`:

```shell
poetry install
```

- You are ready to go and can start the `jupyter notebook` kernel:

```shell
make notebook
```

The only thing left to do is to navigate through `notebooks/` and play with the notebooks.
Bonus: If you want to publish some changes, you first need to install pre-commit:

```shell
make pre-commit-install
```

This guarantees that the code you push meets software development standards and that the GitHub CI/CD pipeline succeeds, i.e. your code will be accepted.
Notes:

- You need to install `poetry` via `brew install poetry` if you do not have it already.
- The Google Map API key is used to display the weather stations on Google Map. However, you do not need it: if you do not provide a key (or not a valid one), the developer mode is activated by default, which, even though it grants fewer capabilities, also does the job.
- The `data2day_2022/` folder contains reusable parts of the code, such as the `sql` queries and the `helpers` package.
- The `datasets/` folder contains the template you have to fill in to make the forecast.
- The `notebooks/` folder contains a couple of Jupyter notebooks where the main logic of the code lies.
- The `results/` folder contains the results to be generated by the notebooks.
- The `slides/` folder contains the anonymised presentation in `.pdf` format.
- The `tests/` folder contains a couple of unit tests to test our code.
- The `.pre-commit-config.yaml` file contains a couple of checks to be executed at commit time before the code can be pushed.
- The `Makefile` contains a series of frequently used commands, e.g. `make check` or `make notebook`.
- The `.secrets.toml` and `settings.toml` files are parametrisation files containing the variables used in the code.
- You can parametrise the series you want to predict using the `datasets/customer_frequentation.csv` file. Fill it with your own data, respecting the following template:

| date | quantity |
|---|---|
| `<YYYY-MM-DD>` | `<float>` |
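As a rough sketch of how such a file can be produced programmatically (the column names follow the template above; the sample values are made up):

```python
import csv
from pathlib import Path

# Hypothetical sample rows; replace them with your own data,
# keeping the <YYYY-MM-DD> date format and float quantities.
rows = [
    {"date": "2018-01-01", "quantity": 120.0},
    {"date": "2018-01-02", "quantity": 98.5},
]

path = Path("datasets/customer_frequentation.csv")
path.parent.mkdir(parents=True, exist_ok=True)

with path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "quantity"])
    writer.writeheader()
    writer.writerows(rows)
```

Any tool that writes a two-column CSV with this header works just as well; the snippet only illustrates the expected layout.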
- Rainfall data for Stuttgart in 2018 has been retrieved and collected in the `results/weather_prpc.csv` file. You can, however, query the initial tables on BigQuery using `notebooks/weather_data_on_biqguery.ipynb`. Results will be captured under the `results/` folder.
The troubleshooting section is empty so far but should you encounter any issue not stated in the current documentation, please contact us.