Template for Data Team ETL repos.
- Create a repo from the template
- Check out a development branch
- Rename the placeholder files and { } variables
- Start developing, install the packages you require, and add them to requirements.in
- Once everything is done, test locally and in a local Docker image
- Install invoke, pip-tools and the linter
- Run inv req-compile and inv pylint
- Raise a PR
- Create a new repo by clicking "Use this template" on GitHub.
- Clone the new repo locally.
- Check out the develop branch.
- Rename all __name__ placeholder files and folders to something of your choosing.
- Make sure you create a new virtual environment whenever you start a new project.
- Make sure you set src as a sources root directory:
  a. If you are using PyCharm, right-click on the src directory, click Mark Directory as, and choose Sources Root.
  b. If you are using Spyder, set your PYTHONPATH to the absolute path of src.
  (A small sanity-check sketch of this setup follows this list.)
- To use the helper functions, make sure you have set up a GitHub SSH key on your local machine or that you have a GitHub token.
- If you have a GitHub token, run this command in your PyCharm terminal or in an Anaconda prompt while the environment for this project is active:
  pip install git+https://{your_token_here}@github.com/energyaspects/helper_functions.git@latest
- If you have an SSH key set up with your GitHub account, run this command in your PyCharm terminal or in an Anaconda prompt while the environment for this project is active:
  pip install git+ssh://git@github.com/energyaspects/helper_functions.git@latest

Useful links:
- Creating your personal GitHub token: link
- Creating your own SSH key and linking it to your GitHub account: link
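If you want to double-check the sources-root setup from the list above outside your IDE, here is a small sanity-check sketch. The package name is hypothetical, standing in for whatever you renamed __name__ to; adjust the path if you run it from somewhere other than the repo root.

```python
# Sanity-check sketch only: confirms the renamed package under src/
# resolves once src/ is on the Python path.
import sys
from pathlib import Path

src_dir = Path(__file__).resolve().parent / "src"  # adjust if this file lives elsewhere
sys.path.insert(0, str(src_dir))

import my_etl_package  # hypothetical name for the renamed __name__ package
print(my_etl_package.__file__)
```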
To set up your requirements.txt file correctly, list all your core packages in requirements/requirements.in. The requirements.in file should list the packages without dependencies or version pins, unless you want to persist a certain package version, in which case pinning it there will persist that version in your requirements.txt.
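As an illustration (the package names and the pinned version are arbitrary examples, not the template's dependencies), a requirements.in could look like:

```text
pandas
requests
sqlalchemy==1.4.46  # pinned because we want to persist this exact version
```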
Remember, you should NOT include helper_functions_ea in your requirements.in, as it is not installable from PyPI; it is installed separately, either in your Docker image or in the ea-data image. Once you have written the requirements.in file, run the following commands in your console:
pip install invoke pip-tools
inv req-compile
The second command uses the invoke task req_compile, located in tasks.py, which generates the requirements.txt file.
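The tasks.py shipped with the template is the source of truth for what req-compile actually does. Purely as a sketch of the typical shape of such a task built on pip-tools (the paths and flags below are assumptions, not the template's exact implementation):

```python
# Illustrative sketch only -- defer to the tasks.py shipped with the template.
from invoke import task

@task
def req_compile(c):
    """Compile requirements/requirements.in into requirements/requirements.txt."""
    c.run("pip-compile requirements/requirements.in "
          "--output-file requirements/requirements.txt")

@task
def req_upgrade(c):
    """Recompile, upgrading every package that requirements.in allows."""
    c.run("pip-compile --upgrade requirements/requirements.in "
          "--output-file requirements/requirements.txt")
```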
NOTE: If you have updated a package and you want to reflect that in the requirements.txt file, you should run req-upgrade instead of req-compile. req-upgrade will upgrade all packages that can be upgraded from the requirements.in file. If you only want to change specific packages, persist the versions of the other packages in your requirements.in and then run req-upgrade. If you want to force specific versions for all packages, just set the versions in the requirements.txt and run inv req-compile normally.
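For example (package names and versions are only placeholders), a requirements.in set up for a selective upgrade might look like the snippet below; running inv req-upgrade would then only move the unpinned package:

```text
pandas==1.5.3       # pinned, so req-upgrade leaves it at this version
sqlalchemy==1.4.46  # pinned, so req-upgrade leaves it at this version
requests            # unpinned, so req-upgrade bumps it to the latest allowed version
```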
- Run the linter to check that you have structured your code following PEP standards.
- Try running the setup.py file in your terminal and check that it installs the package correctly. You can do this by opening an Anaconda prompt, creating a new environment, navigating to the location of your setup.py file and running this command:
  python setup.py install
- Once you have installed your package, check that it runs properly by running, in your prompt, the name of the console script you created in setup.py. (For this example, setup.py creates the console script with the name we give to the command_name_ variable; typing that name in the prompt and running it should execute the function we automated.) A hedged sketch of the relevant setup.py section is shown after this list.
- Configure a local Docker image and run the ETL console script there, and see if everything runs successfully. You can follow the instructions here
- Create a PR and only merge when you have 2 approvals.
- If you managed to perform all the above with no issues, you can create a DAG in a branch of the ea-data project and automate your code using Airflow. Please make sure that you also ask someone to review your DAG addition.
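For reference, the console script mentioned above comes from the entry_points section of setup.py. A minimal, hypothetical sketch (package, module and command names are placeholders, not the template's actual values):

```python
# Hypothetical sketch of the relevant part of setup.py.
from setuptools import setup, find_packages

setup(
    name="my_etl_package",              # placeholder package name
    package_dir={"": "src"},
    packages=find_packages(where="src"),
    entry_points={
        "console_scripts": [
            # Typing `my-etl-run` in your prompt calls main() in my_etl_package.cli
            "my-etl-run = my_etl_package.cli:main",
        ]
    },
)
```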
NOTE: If you are planning to create your own Docker image, which will be used in your Airflow DAG, then please remember to configure your Cloud Trigger here, and ensure that your trigger executes successfully before you proceed to adding the Airflow DAG.
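The DAG itself lives in the ea-data project and should follow the conventions of the existing DAGs there. Purely as an illustration of the general shape (DAG id, schedule, operator choice and command are assumptions, not the team's standards):

```python
# Illustrative sketch only -- copy the patterns used by existing DAGs in ea-data.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="my_etl_dag",               # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Runs the console script defined in setup.py inside the worker / Docker image.
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="my-etl-run",     # the console script name from setup.py
    )
```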