adaa-polsl/imcp

IMCP: Imbalanced Multiclass Classification Performance Curve

ROC curves are a well-known tool for comparing the performance of multiple classifiers. However, they do not work with multiclass datasets (more than two labels for the target variable). Moreover, the ROC curve is sensitive to imbalance in the class distribution.
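As a brief illustration of the multiclass limitation: a ROC curve is defined for a binary problem, so a common workaround (not part of this package) is a one-vs-rest decomposition, which yields one binary problem, and hence one curve, per class instead of a single curve. The sketch below uses only the Python standard library. The helper name `one_vs_rest_labels` and the label vector are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def one_vs_rest_labels(y):
    """Return {class_label: binary label list} for a one-vs-rest decomposition."""
    classes = sorted(set(y))
    return {c: [1 if label == c else 0 for label in y] for c in classes}


# Hypothetical, imbalanced three-class label vector (illustrative data only).
y = ["a"] * 8 + ["b"] * 3 + ["c"] * 1

binary_problems = one_vs_rest_labels(y)
class_counts = Counter(y)

# Each class becomes its own binary problem; the share of positives
# differs strongly between classes when the dataset is imbalanced.
for c, labels in binary_problems.items():
    positive_share = sum(labels) / len(labels)
    print(f"class {c!r}: {class_counts[c]} samples, positive share {positive_share:.2f}")
```

With three classes this produces three separate binary label vectors, each with a different share of positives, which is why a single summary curve for the whole problem is attractive.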

The package provides a tool, the Imbalanced Multiclass Classification Performance (IMCP) curve, that addresses both weaknesses of ROC: it applies to multiclass datasets and to imbalanced datasets.

With the IMCP curve, classification performance can be shown graphically for both multiclass and imbalanced datasets.

The package provides methods for plotting the IMCP curve and for computing the area under it.
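This README does not show how the package computes the area. As a generic illustration of what "area under a curve" means for a curve given as sampled points, here is a minimal sketch of the trapezoidal rule. `area_under_curve` is a hypothetical helper written for this example, not part of the imcp API, and the sample points are made up.

```python
def area_under_curve(xs, ys):
    """Trapezoidal-rule area under a piecewise-linear curve through (xs[i], ys[i])."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two points are required")
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        if width < 0:
            raise ValueError("xs must be sorted in non-decreasing order")
        # Area of the trapezoid between two consecutive points.
        area += width * (ys[i] + ys[i - 1]) / 2.0
    return area


# Illustrative curves on the unit square (made-up points).
diagonal = area_under_curve([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
flat_top = area_under_curve([0.0, 1.0], [1.0, 1.0])
print(diagonal, flat_top)
```

On the unit square, a curve that stays at 1 encloses the whole square, while the diagonal encloses half of it.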

Installation

IMCP can be installed from PyPI:

pip install imcp

Alternatively, clone the repository and run the following from its root directory:

pip install .

Sample usage

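from imcp import plot_mcp_curve
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load the iris dataset (three classes)
X, y = load_iris(return_X_y=True)

# Fit a classifier, then plot the curve from the true labels and the
# predicted class probabilities (evaluated on the training data for brevity)
clf = LogisticRegression(solver="liblinear").fit(X, y)
plot_mcp_curve(y, clf.predict_proba(X))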

Citation

The methodology is described in detail in:

[1] J. S. Aguilar-Ruiz and M. Michalak, "Classification performance assessment for imbalanced multiclass data," Scientific Reports, 14:10759, 2024, doi: 10.1038/s41598-024-61365-z.

The mathematical background of the multiclass classification performance curve can be found in:

[2] J. S. Aguilar-Ruiz and M. Michalak, "Multiclass Classification Performance Curve," IEEE Access, 10:68915-68921, 2022, doi: 10.1109/ACCESS.2022.3186444.

Documentation

Full documentation is available here