A visualization tool designed to help data scientists better examine their data sets.
pip install dshelper
import dshelper
dshelper.dshelp(df)
- ✅ Default view with raw data and its statistics info
- ✅ Drag on the header to re-arrange columns
- ✅ Left click on the right panel to show/hide columns
- ✅ Plots: histogram, heatmap, correlation, scatter, box, violin, pair
- ✅ Bottom right buttons to hide panels and focus on data set
- ✅ Easy to see memory usage and logs in bottom status bar
- ✅ Easy to use from the command line, Jupyter Notebook and Docker
- ✅ Histogram
- ✅ Heatmap
- ✅ Correlation
- ✅ Scatter Plot
- ✅ Box Plot
- ✅ Violin Plot
- ✅ Pair plot
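
The README does not show how these plots are produced internally. As a rough sketch only, assuming pandas, matplotlib and seaborn (all listed as dependencies), a correlation heatmap could be drawn as below. The function names `numeric_correlation` and `plot_correlation_heatmap` are hypothetical and are not part of dshelper.

```python
import pandas as pd


def numeric_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation of the numeric columns only."""
    return df.select_dtypes(include="number").corr()


def plot_correlation_heatmap(df: pd.DataFrame) -> None:
    """Draw the correlation matrix as an annotated heatmap (illustrative only)."""
    # Imported lazily so the correlation helper works without a display backend.
    import matplotlib.pyplot as plt
    import seaborn as sns

    corr = numeric_correlation(df)
    ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_title("Correlation")
    plt.show()
```

Non-numeric columns are dropped before computing the correlation, which is one common choice; the tool itself may treat them differently.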
In the default view, the main panel displays the dataset. The bottom panel displays the statistics of the dataset. The right panel has two tabs: the first displays the stats for all the columns, and the second displays the system logs.
The bottom and right panels can be hidden by clicking the buttons located on the bottom right of the window. This lets data scientists focus on the dataset and plots.
You can also drag and drop column headers to re-arrange the column order, and click on the right column tab to hide columns in the main view.
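
For readers who want comparable per-column statistics outside the GUI, here is a minimal sketch using plain pandas. It is an assumption that the panel shows similar figures; `column_stats` is a hypothetical helper, not a dshelper function.

```python
import pandas as pd


def column_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Summary statistics with one row per column of the input dataframe."""
    # include="all" covers both numeric and non-numeric columns;
    # statistics that do not apply to a column are left as NaN.
    return df.describe(include="all").transpose()
```

Calling `column_stats(df)` returns a table with entries such as `count`, `unique`, `mean`, `min` and `max` for each column.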
And below are a few plots:
- wxpython
- matplotlib
- seaborn
- pandas
- numpy
- scikit-learn
- scipy
- statsmodels
- git clone git@github.com:zmcddn/Data-Science-Helper.git
- conda create -n py36 python=3.6 (or use virtualenv or pipenv)
- activate py36 (windows) or source activate py36 (mac, linux)
- conda install --yes --file requirements.txt or pip install -r requirements.txt
- In case PyPubSub is not installed with conda, you can do pip install PyPubSub
- cd dshelper
- python main_gui.py (windows, linux) or pythonw main_gui.py (mac)
To inspect any dataframe, follow these steps:
import dshelper
dshelper.dshelp(df)
- For running in Jupyter Notebook you need to add %gui wx at the top of the file for the GUI to display properly
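
The usage steps above can be sketched end to end as follows, assuming pandas is installed and a display is available for the GUI. The sample data and the helpers `make_sample_frame` and `open_viewer` are hypothetical; only `dshelper.dshelp(df)` comes from this README.

```python
import pandas as pd


def make_sample_frame() -> pd.DataFrame:
    """Build a small, made-up dataframe purely for illustration."""
    return pd.DataFrame(
        {
            "age": [23, 35, 41, 29],
            "income": [42000.0, 58000.0, 61000.0, 39000.0],
            "city": ["Toronto", "Calgary", "Toronto", "Halifax"],
        }
    )


def open_viewer(df: pd.DataFrame) -> None:
    """Open the dshelper window for the given dataframe."""
    # Imported lazily because dshelper needs wxPython and a GUI environment.
    import dshelper

    dshelper.dshelp(df)
```

Running `open_viewer(make_sample_frame())` should open the viewer with the three sample columns.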
- make build: build the project
- make runlinux: run in Linux
- WIP for mac
- next version
  - Sort by columns
  - Import file (csv, excel)
  - Add menu
  - Export file
  - Ability to change cells
  - Standalone version
- next big version
  - Correlation analysis
  - Feature importance
  - Support large files (sampling)
- next next big version
  - Support for multiple index
  - Time series analysis
  - Optimization
If you like this project, please share it and star it so more people can find it. Suggestions and contributions are very welcome.
ALL RIGHTS RESERVED