Commit

fix: create example ‘auto’ correlation notebook
jtook committed Oct 28, 2022
1 parent 3099352 commit 09aab0b
Showing 1 changed file with 119 additions and 0 deletions.
119 changes: 119 additions & 0 deletions examples/features/correlation_auto_example.ipynb
@@ -0,0 +1,119 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Correlation \"Auto\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The \"Auto\" correlation is an easily interpretable pairwise column metric. The method it applies depends on the types of the two columns being compared:\n",
"\n",
"- Variable_type-Variable_type : Method, **Range** \n",
"- Categorical-Categorical : Cramer's V, **[0,1]**\n",
"- Numerical-Categorical : Cramer's V, **[0,1]** (using a discretized numerical column)\n",
"- Numerical-Numerical : Spearman's Rho, **[-1,1]**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example is based on the script at `examples/bank_marketing_data/banking_data.py`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"# Run the bank marketing example\n",
"from pathlib import Path\n",
"\n",
"import pandas as pd\n",
"\n",
"from pandas_profiling import ProfileReport\n",
"from pandas_profiling.utils.cache import cache_zipped_file\n",
"\n",
"# Download (and cache) the UCI Bank Marketing Dataset\n",
"file_name = cache_zipped_file(\n",
"    \"bank-full.csv\",\n",
"    \"https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip\",\n",
")\n",
"\n",
"df = pd.read_csv(file_name, sep=\";\")\n",
"\n",
"profile = ProfileReport(\n",
"    df, title=\"Profile Report of the UCI Bank Marketing Dataset\", explorative=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simplest way to change the number of bins is to set it on the profile's config in your script or notebook. The number of bins controls how finely numerical columns are discretized, which changes the granularity of the association measure for Numerical-Categorical column pairs."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Change the number of bins from 10 (the default value) to 8\n",
"profile.config.correlations[\"auto\"].n_bins = 8"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 'auto' correlation matrix is displayed with the other correlation matrices in the report."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"%%capture cap --no-stdout\n",
"profile.to_file(Path(\"uci_bank_marketing_report.html\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.6 ('pp_test')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "1af52722583f346ec227bb297de332430c519562ffdabd22d0ef7652ab6213c7"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
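
The type-to-method mapping described in the notebook's second cell can be sketched in plain Python as follows. This is a minimal standard-library illustration, not pandas-profiling's implementation: the function names (`cramers_v`, `discretize`, `spearman_rho`, `auto_correlation`) are hypothetical, the Cramér's V shown is the plain uncorrected statistic, and equal-width binning is an assumption. The library may apply a bias correction and a different binning strategy.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    k = min(len(row), len(col))
    if n == 0 or k < 2:
        return 0.0  # assumption: report 0 when a column is constant
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


def discretize(values, n_bins=10):
    """Equal-width binning: map each number to a bin index in [0, n_bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            out[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant column
    return cov / (sx * sy)


def auto_correlation(xs, ys, x_is_numeric, y_is_numeric, n_bins=10):
    """Pick the method from the column types, following the notebook's mapping."""
    if x_is_numeric and y_is_numeric:
        return spearman_rho(xs, ys)
    if x_is_numeric:
        xs = discretize(xs, n_bins)
    if y_is_numeric:
        ys = discretize(ys, n_bins)
    return cramers_v(xs, ys)
```

In this sketch, `n_bins` plays the same role as `profile.config.correlations["auto"].n_bins` in the notebook: fewer bins give a coarser discretization of the numerical column before Cramér's V is computed.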
