Updates example section in the Readme (#48)
* removes jupyter artifacts
* adds extended description of the fruits example and clarifies use for larger datasets
* updates status badge and gives ci workflow the name build to match with travis
* moved plot creation script to .github as it has nothing to do with the package and is only used for the repo
* simplifies data aggregation in the preprocessing example
* removes old png files from the package folder (the ones still used can be found in the .github/ folder)

Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>
Trybnetic and Pierre-Sassoulas authored Oct 30, 2024
1 parent c1bc4fb commit 14a58b1
Showing 10 changed files with 123 additions and 254 deletions.
53 changes: 53 additions & 0 deletions .github/create_documentation_plots.py
@@ -0,0 +1,53 @@
import matplotlib.pyplot as plt
import pandas as pd
from pysankey import sankey

df = pd.read_csv("../pysankey/fruits.txt", sep=" ", names=["true", "predicted"])

colorDict = {
    "apple": "#f71b1b",
    "blueberry": "#1b7ef7",
    "banana": "#f3f71b",
    "lime": "#12e23f",
    "orange": "#f78c1b",
    "kiwi": "#9BD937",
}

labels = list(colorDict.keys())
leftLabels = [label for label in labels if label in df["true"].values]
rightLabels = [label for label in labels if label in df["predicted"].values]

ax = sankey(
    left=df["true"],
    right=df["predicted"],
    leftLabels=leftLabels,
    rightLabels=rightLabels,
    colorDict=colorDict,
    aspect=20,
    fontsize=12,
)

plt.savefig("img/fruits.png")
plt.close()


# This calculates how often the different combinations of "true" and
# "predicted" co-occur
df = df.groupby(["true", "predicted"]).size().reset_index()
weights = df[0].astype(float)


ax = sankey(
    left=df["true"],
    right=df["predicted"],
    rightWeight=weights,
    leftWeight=weights,
    leftLabels=leftLabels,
    rightLabels=rightLabels,
    colorDict=colorDict,
    aspect=20,
    fontsize=12,
)


plt.savefig("img/fruits_weighted.png")
Binary file added .github/img/fruits.png
Binary file added .github/img/fruits_weighted.png
2 changes: 2 additions & 0 deletions .github/workflows/ci.yaml
@@ -1,3 +1,5 @@
name: build

on:
push:
branches:
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
# Jupyter
.ipynb_checkpoints/

#Eclipse/pydev
.project
189 changes: 0 additions & 189 deletions .ipynb_checkpoints/plotFruit-checkpoint.ipynb

This file was deleted.

131 changes: 66 additions & 65 deletions README.md
@@ -4,14 +4,17 @@ Uses matplotlib to create simple <a href="https://en.wikipedia.org/wiki/Sankey_d
Sankey diagrams</a> flowing only from left to right.

[![PyPI version](https://badge.fury.io/py/pySankeyBeta.svg)](https://badge.fury.io/py/pySankeyBeta)
[![Build Status](https://travis-ci.org/Pierre-Sassoulas/pySankey.svg?branch=master)](https://travis-ci.org/Pierre-Sassoulas/pySankey)
[![Build Status](https://github.com/Pierre-Sassoulas/pySankey/actions/workflows/ci.yaml/badge.svg)](https://github.com/Pierre-Sassoulas/pySankey/actions/workflows/ci.yaml)
[![Coverage Status](https://coveralls.io/repos/github/Pierre-Sassoulas/pySankey/badge.svg?branch=master)](https://coveralls.io/github/Pierre-Sassoulas/pySankey?branch=master)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

## Example
## Examples

With fruits.txt :
### Simple expected/predicted example with fruits.txt:

`pysankey` contains a simple expected/predicted dataset called `fruits.txt` which looks
like the following:

<div>
<table border="1" class="dataframe">
@@ -80,9 +83,13 @@ import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt


df = pd.read_csv(
'pysankey/fruits.txt', sep=' ', names=['true', 'predicted']
'fruits.txt',
sep=' ',
names=['true', 'predicted']
)

colorDict = {
'apple':'#f71b1b',
'blueberry':'#1b7ef7',
@@ -92,83 +99,77 @@ colorDict = {
'kiwi':'#9BD937'
}

labels = list(colorDict.keys())
leftLabels = [label for label in labels if label in df['true'].values]
rightLabels = [label for label in labels if label in df['predicted'].values]

# Create the sankey diagram
ax = sankey(
df['true'], df['predicted'], aspect=20, colorDict=colorDict,
leftLabels=['banana','orange','blueberry','apple','lime'],
rightLabels=['orange','banana','blueberry','apple','lime','kiwi'],
left=df['true'],
right=df['predicted'],
leftLabels=leftLabels,
rightLabels=rightLabels,
colorDict=colorDict,
aspect=20,
fontsize=12
)

plt.savefig('fruit.png', bbox_inches='tight') # to save (before show(), which may clear the figure)
plt.show() # to display
```

![Fruity Alchemy](pysankey/fruit.png)
![Fruity Alchemy](.github/img/fruits.png)

You could also use weight:
### Plotting preprocessed data using weights

```
,customer,good,revenue
0,John,fruit,5.5
1,Mike,meat,11.0
2,Betty,drinks,7.0
3,Ben,fruit,4.0
4,Betty,bread,2.0
5,John,bread,2.5
6,John,drinks,8.0
7,Ben,bread,2.0
8,Mike,bread,3.5
9,John,meat,13.0
```
However, the data may not always be available in the format mentioned in the previous
example (for instance, if the dataset is too large). In such cases, the weights between
the true and predicted labels can be calculated in advance and used to create the Sankey
diagram. In this example, we will continue working with the data that was loaded in the
previous example:

```python
import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt

df = pd.read_csv(
'pysankey/customers-goods.csv', sep=',',
names=['id', 'customer', 'good', 'revenue']
)
weight = df['revenue'].values[1:].astype(float)
# Calculate the weights from the fruits dataframe
df = df.groupby(["true", "predicted"]).size().reset_index()
weights = df[0].astype(float)

ax = sankey(
left=df['customer'].values[1:], right=df['good'].values[1:],
rightWeight=weight, leftWeight=weight, aspect=20, fontsize=20
left=df['true'],
right=df['predicted'],
rightWeight=weights,
leftWeight=weights,
leftLabels=leftLabels,
rightLabels=rightLabels,
colorDict=colorDict,
aspect=20,
fontsize=12
)

plt.savefig('customers-goods.png', bbox_inches='tight') # to save (before show(), which may clear the figure)
plt.show() # to display
```

![Customer goods](pysankey/customers-goods.png)

Similar to seaborn, you can pass a matplotlib `Axes` to `sankey` function:

```python
import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt

df = pd.read_csv(
'pysankey/fruits.txt',
sep=' ', names=['true', 'predicted']
)
colorDict = {
'apple': '#f71b1b',
'blueberry': '#1b7ef7',
'banana': '#f3f71b',
'lime': '#12e23f',
'orange': '#f78c1b'
}

ax1 = plt.axes()

sankey(
df['true'], df['predicted'], aspect=20, colorDict=colorDict,
fontsize=12, ax=ax1
)

plt.show()
```
![Fruity Alchemy](.github/img/fruits_weighted.png)
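
The pre-aggregation step in the weighted example can also be sketched without pandas. The snippet below is only an illustration using the standard library: the `pairs` sample is made up (it is not the content of `fruits.txt`), and the resulting `left`, `right` and `weights` lists are meant to play the same role as the columns produced by the `groupby(...).size()` call in the example above.

```python
from collections import Counter

# Hypothetical sample of (true, predicted) pairs; not the real fruits.txt data.
pairs = [
    ("apple", "apple"),
    ("apple", "orange"),
    ("banana", "banana"),
    ("apple", "apple"),
    ("banana", "lime"),
    ("banana", "banana"),
]

# Count how often each (true, predicted) combination occurs.
counts = Counter(pairs)

# Sort for a deterministic order, then split into parallel columns.
rows = sorted(counts.items())
left = [true for (true, _), _ in rows]
right = [predicted for (_, predicted), _ in rows]
weights = [float(n) for _, n in rows]

for row in zip(left, right, weights):
    print(row)
```

Each unique combination now appears once, and its weight is the number of original rows it stands for, so the weights sum to the size of the original dataset.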

### pysankey function overview

> `sankey(left, right, leftWeight=None, rightWeight=None, colorDict=None, leftLabels=None, rightLabels=None, aspect=4, rightColor=False, fontsize=14, ax=None, color_gradient=False, alphaDict=None)`
>
> **left**, **right** : NumPy array of object labels on the left and right of the
> diagram
>
> **leftWeight**, **rightWeight** : NumPy arrays of the weights for each strip
>
> **colorDict** : Dictionary of colors to use for each label
>
> **leftLabels**, **rightLabels** : order of the left and right labels in the diagram
>
> **aspect** : vertical extent of the diagram in units of horizontal extent
>
> **rightColor** : If true, each strip in the diagram will be colored according to
> its left label
>
> **fontsize** : Fontsize to be used for the labels
>
> **ax** : matplotlib axes to plot on, otherwise uses current axes.
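
As a rough illustration of how the `leftLabels`/`rightLabels` order used in the examples is derived, the following sketch reproduces the list comprehensions from the fruits example in plain Python. The helper `ordered_labels` and the short sample lists are hypothetical and not part of `pysankey`; the sketch also checks that every label has an entry in the color dictionary, on the assumption that a missing color would be a problem when plotting.

```python
# Colors keyed by label; the dict's insertion order defines the label order.
color_dict = {
    "apple": "#f71b1b",
    "blueberry": "#1b7ef7",
    "banana": "#f3f71b",
    "lime": "#12e23f",
    "orange": "#f78c1b",
    "kiwi": "#9BD937",
}

# Hypothetical label columns standing in for df["true"] and df["predicted"].
true_values = ["banana", "apple", "lime", "apple"]
predicted_values = ["kiwi", "apple", "lime", "orange"]


def ordered_labels(order, values):
    """Keep only the labels that occur in `values`, in the order given by `order`."""
    present = set(values)
    return [label for label in order if label in present]


left_labels = ordered_labels(color_dict, true_values)
right_labels = ordered_labels(color_dict, predicted_values)

# Labels that occur in the data but have no color assigned.
missing_colors = (set(true_values) | set(predicted_values)) - set(color_dict)

print(left_labels)
print(right_labels)
print(missing_colors)
```

Deriving both lists from one shared order keeps the two sides of the diagram consistent even when a label (here `kiwi`) appears on only one side.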
## Important information

Binary file removed pysankey/customers-goods.png
Binary file not shown.
Binary file removed pysankey/fruit.png
Binary file not shown.
Binary file removed pysankey/fruits.png
Binary file not shown.
