This is a collection of notebooks for reading in and analysing credit card and bank account data from DKB (Deutsche Kreditbank) accounts.
They work with the standard PDF documents from the "Postfach" (mailbox) rather than with manually exported CSV files of transactions. The code can therefore aggregate and analyse all past financial data, as long as the corresponding bank account and credit card statements are available.
The tool is separated into three parts:
- Read data from PDF and convert to pandas DataFrame
- Annotate the data so that each expense has a category
- Analyse the annotated data (you might want to implement your own analysis code tailored to your requirements)
Each of these three steps has its own notebook file. For steps 1 and 2, there is one file for credit cards and one for account statements: their tables in the PDFs have different formats (affecting step 1), and their subjects often look different as well (credit card subjects are more concise and hence easier to assign to categories, affecting step 2).
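As a rough illustration of step 1, the sketch below pulls table rows out of a PDF with `pdfplumber` and turns them into a DataFrame. It is not the code from the notebooks. The three-column layout (date, subject, amount), the `dd.mm.yyyy` date format, the German number format and all function names are assumptions made for the example. Whether `extract_table()` detects the table at all depends on the statement layout.

```python
from datetime import datetime

import pandas as pd


def parse_german_amount(text):
    """Convert a German-formatted amount such as '-1.234,56' to a float."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)


def rows_to_dataframe(rows):
    """Build a DataFrame from (date, subject, amount) string rows.

    Rows that do not parse (headers, empty cells, wrong width) are skipped.
    The three-column layout is an assumption, not the real DKB format.
    """
    records = []
    for row in rows:
        if row is None or len(row) != 3 or None in row:
            continue
        date_text, subject, amount_text = row
        try:
            date = datetime.strptime(date_text.strip(), "%d.%m.%Y")
            amount = parse_german_amount(amount_text)
        except ValueError:
            continue  # e.g. a header row such as "Datum"
        # Collapse line breaks and repeated spaces inside the subject cell.
        records.append(
            {"date": date, "subject": " ".join(subject.split()), "amount": amount}
        )
    return pd.DataFrame(records, columns=["date", "subject", "amount"])


def read_statement(pdf_path):
    """Extract the first detected table of every page and parse its rows."""
    import pdfplumber  # imported here so the helpers above work without it

    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            table = page.extract_table()  # None if no table is detected
            if table:
                rows.extend(table)
    return rows_to_dataframe(rows)
```

Keeping the row parsing separate from the PDF access makes it possible to try the parsing logic on hand-written rows before pointing `read_statement` at real statements.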
NB: If you fork this or upload your own version anywhere else, make sure to remove all sensitive information from the source code! This includes, among other things, bank account numbers and (partial) credit card numbers!
In order to use this tool, do the following:
- Install `pdfplumber` (it's the only requirement not included in anaconda)
- Run `cc_pdfs_to_dataframe.ipynb` and `statement_pdfs_to_dataframe.ipynb`, after adjusting the file paths to the respective statements.
- Go to the annotation files (`cc_data_annotation.ipynb` and `statement_data_annotation.ipynb`) and add meaningful substrings to each category. Repeat this step as instructed in the file until no entries are left unassigned (any that remain unassigned are ignored in step 4). If you add or remove categories, adjust the relevant cells in `data_analysis.ipynb` as well, if you intend to use that file.
- You now have the annotated data in a DataFrame. You can either write code to analyse it yourself, or use `data_analysis.ipynb` to see monthly rolling costs for each category and plot pie charts of average costs for different time frames.
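If you prefer to write the annotation and analysis yourself, the last two steps could look roughly like the sketch below. It is a minimal example, not the code from the notebooks. The column names (`date`, `subject`, `amount`), the category mapping and the function names are all hypothetical, and "monthly rolling costs" is interpreted here as a rolling mean over monthly totals.

```python
import pandas as pd

# Hypothetical mapping from category to lower-case substrings of the subject.
CATEGORY_SUBSTRINGS = {
    "groceries": ["rewe", "edeka"],
    "transport": ["bvg", "deutsche bahn"],
}


def assign_category(subject, mapping=CATEGORY_SUBSTRINGS):
    """Return the first category with a substring contained in the subject."""
    lowered = subject.lower()
    for category, substrings in mapping.items():
        if any(s in lowered for s in substrings):
            return category
    return None  # entry stays unassigned


def monthly_costs(df):
    """Sum amounts per calendar month and category, ignoring unassigned rows."""
    annotated = df.assign(category=df["subject"].map(assign_category))
    annotated = annotated.dropna(subset=["category"])
    month = annotated["date"].dt.to_period("M")
    totals = annotated.groupby([month, "category"])["amount"].sum()
    # One row per month, one column per category, 0.0 where nothing was spent.
    return totals.unstack(fill_value=0.0)


def rolling_monthly_mean(costs, window=3):
    """Average each category over the last `window` rows (months with data)."""
    return costs.rolling(window, min_periods=1).mean()
```

Months without any assigned transaction do not appear in the result of `monthly_costs`, so the rolling window then spans the neighbouring months that do have data. Reindexing to a full monthly range first would avoid that.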
Requirements:
- `pdfplumber`
- `pandas`, `numpy` and some other common packages shipped with anaconda