Originally forked from https://github.com/robintw/BankClassify and then modified to match my needs.
conda create --prefix ./envs python=3.9
conda activate /Users/jangie/Projects/DataScience/BankClassify/envs
docker run -p 8888:8888 -v $(pwd):/home/jovyan/work -v /Users/jangie/Documents/Finanzen/Detailed_Data:/home/jovyan/data jupyter/scipy-notebook
Raw exported data is put into the raw folder. The function add_raw_data reads in new data, checks for duplicates against the existing data, and appends the new rows to the clean data, where each year is kept in an individual file for each source.
The classifier is then trained on the clean data.
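
The import step described above can be sketched roughly as follows. This is not the actual add_raw_data implementation, only a minimal standard-library illustration of the idea. The function name add_raw_data_sketch, the date_column parameter, the ISO date format (year in the first four characters), and the exact-row duplicate check are all assumptions made for the example.

```python
import csv
from collections import defaultdict
from pathlib import Path


def add_raw_data_sketch(data_dir, source, date_column="date"):
    """Hypothetical sketch: merge data/raw/<source>/*.csv into data/clean/<year>_<source>.csv.

    Returns the number of rows that were appended to the clean files.
    """
    data_dir = Path(data_dir)
    clean_dir = data_dir / "clean"
    clean_dir.mkdir(parents=True, exist_ok=True)

    # Collect the raw rows per year across all export files of this source.
    rows_by_year = defaultdict(list)
    for raw_file in sorted((data_dir / "raw" / source).glob("*.csv")):
        with raw_file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Assumption: dates are ISO formatted, e.g. 2020-03-14.
                rows_by_year[row[date_column][:4]].append(row)

    added = 0
    for year, rows in rows_by_year.items():
        clean_file = clean_dir / f"{year}_{source}.csv"

        # Load what is already in the clean file for this year, if any.
        existing = []
        if clean_file.exists():
            with clean_file.open(newline="") as handle:
                existing = list(csv.DictReader(handle))

        # Assumption: a row is a duplicate only if every column matches exactly.
        seen = {tuple(sorted(row.items())) for row in existing}
        new_rows = []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                new_rows.append(row)

        if new_rows:
            # Assumption: raw and clean files share the same columns.
            fieldnames = list((existing or new_rows)[0].keys())
            with clean_file.open("w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(existing + new_rows)
            added += len(new_rows)

    return added
```

Calling add_raw_data_sketch("data", "paypal") twice in a row should add nothing the second time, because every row is already in the clean files. Note that an exact-row check also drops two genuinely separate transactions that happen to be identical in every column, so a real implementation may want a stricter key such as a transaction ID.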
data
├── raw
│   ├── paypal
│   │   ├── raw_data_1.csv
│   │   ├── raw_data_2.csv
│   │   └── ...
│   └── another_source
│       ├── raw_data_1.csv
│       ├── raw_data_2.csv
│       └── ...
├── clean
│   ├── 2020_paypal.csv
│   └── ...