Loaders for classic datasets commonly used in Machine Learning:
Dataset | # Samples | # Features | # Classes | Balance |
---|---|---|---|---|
Ionosphere | 351 | 34 | 2 | 0.56 |
Letter Recognition | 20000 | 16 | 26 | 0.91 |
Telescope | 19020 | 10 | 2 | 0.54 |
Pen Digits | 10992 | 16 | 10 | 0.92 |
Robot Navigation | 5456 | 24 | 4 | 0.15 |
Segmentation | 2310 | 16 | 7 | 1.00 |
USPS | 9298 | 256 | 10 | 0.46 |
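
The Balance column is not defined above. One reading that matches the Ionosphere row is the size of the smallest class divided by the size of the largest. The sketch below computes that ratio. The function name `class_balance` is hypothetical and not part of classicdata, and the definition itself is an assumption.

```python
from collections import Counter


def class_balance(labels):
    """Ratio of the smallest class size to the largest (assumed definition)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# The UCI Ionosphere data has 225 "good" and 126 "bad" radar returns.
ionosphere_labels = ["g"] * 225 + ["b"] * 126
print(round(class_balance(ionosphere_labels), 2))  # 0.56
```

A value of 1.00 means all classes are equally large; values near 0 mean at least one class is rare.
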

```sh
pip install classicdata
```

Run `python -m classicdata.info` to list all implemented datasets.

```python
from classicdata import Ionosphere

ionosphere = Ionosphere()

# Use ionosphere.points and ionosphere.labels...
```

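
As a rough illustration of working with `points` and `labels`, the sketch below recomputes the sample, feature, and class counts from the table. It assumes `points` is a sequence with one row of features per sample and `labels` is a flat sequence of class labels; the exact types classicdata returns are not stated here. The `describe` helper is hypothetical, and a stand-in object replaces the real loader so that the example runs without downloading anything.

```python
from types import SimpleNamespace


def describe(dataset):
    """Summarize an object exposing `points` (rows of features) and `labels`."""
    n_samples = len(dataset.points)
    n_features = len(dataset.points[0]) if n_samples else 0
    n_classes = len(set(dataset.labels))
    return {"samples": n_samples, "features": n_features, "classes": n_classes}


# Stand-in for a loaded dataset (assumption: same attribute names as above).
toy = SimpleNamespace(
    points=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    labels=[0, 1, 0],
)
print(describe(toy))  # {'samples': 3, 'features': 2, 'classes': 2}
```

With classicdata installed, `describe(Ionosphere())` should work the same way if the assumptions about `points` and `labels` hold.
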
There are other projects. They are more mature, more robust, more better. That is why this project is called classicdata. Sometimes you need small, simple datasets. Other times, consider the following projects.
- OpenML: better, faster, stronger; more complex, though
- sklearn.datasets: limited selection; no metadata
- torchvision.datasets: limited selection; datasets too modern (big)
- TensorFlow Datasets: datasets too modern (big)