
Implementation of several preprocessing techniques for Association Rule Mining (ARM)


firefly-cpp/arm-preprocessing


arm-preprocessing


  • Free software: MIT license
  • Documentation: http://arm-preprocessing.readthedocs.io
  • Python: 3.9.x, 3.10.x, 3.11.x, 3.12.x
  • Tested OS: Windows, Ubuntu, Fedora, Alpine, Arch, macOS (other operating systems may also work but have not been tested)

About 📋

arm-preprocessing is a lightweight Python library that supports several key steps of data preparation, manipulation, and discretisation for Association Rule Mining (ARM). 🧠 Embrace its minimalistic design, which prioritises simplicity. 💡 The framework is designed to be fully extensible and integrates seamlessly with related ARM libraries (e.g., NiaARM). 🔗

Why arm-preprocessing?

While numerous libraries facilitate data mining preprocessing tasks, this library is designed specifically to integrate with association rule mining. It works well with the NiaARM library, a robust numerical association rule mining framework. Its primary aim is to bridge the gap between preprocessing and rule mining, simplifying the overall workflow (pipeline). Additionally, its design makes it easy to add new preprocessing methods and to benchmark them quickly.

Key features ✨

  • Loading datasets in various formats (CSV, JSON, TXT, TCX) 📊
  • Converting datasets to different formats 🔄
  • Loading different types of datasets (numerical, discrete, time-series, text, etc.) 📉
  • Dataset identification (detecting which type a dataset is) 🔍
  • Dataset statistics 📈
  • Discretisation methods 📏
  • Data squashing methods 🤏
  • Feature scaling methods ⚖️
  • Feature selection methods 🎯
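To give a feel for what dataset identification involves, here is a rough sketch of one plausible heuristic. It is not the library's own detection logic: it assumes rows arrive as string dictionaries (as produced by csv.DictReader), and the helper names is_number, identify_column_types and identify_dataset_type are hypothetical. A column counts as numerical when every non-empty value parses as a number, and the dataset type follows from the mix of column types.

```python
from typing import Dict, List


def is_number(value: str) -> bool:
    """Return True if the string parses as a float (e.g. '3', '-2.5', '1e3')."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def identify_column_types(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: label a column 'numerical' if all its non-empty values are numbers."""
    types = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row[column] != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        types[column] = "numerical" if numeric else "categorical"
    return types


def identify_dataset_type(rows: List[Dict[str, str]]) -> str:
    """Hypothetical helper: classify the whole dataset as 'numerical', 'discrete' or 'mixed'."""
    kinds = set(identify_column_types(rows).values())
    if kinds == {"numerical"}:
        return "numerical"
    if kinds == {"categorical"}:
        return "discrete"
    return "mixed"


rows = [{"age": "25", "sex": "F"}, {"age": "31", "sex": "M"}]
print(identify_column_types(rows))  # {'age': 'numerical', 'sex': 'categorical'}
print(identify_dataset_type(rows))  # mixed
```

A real implementation would also need to recognise time-series and text data, which this heuristic does not attempt.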

Installation 📦

pip

To install arm-preprocessing with pip, use:

pip install arm-preprocessing

To install arm-preprocessing on Alpine Linux, please use:

$ apk add py3-arm-preprocessing

To install arm-preprocessing on Arch Linux, please use an AUR helper:

$ yay -Syyu python-arm-preprocessing

Usage 🚀

Data loading

The following example demonstrates how to load a dataset from a file (csv, json, txt). More examples can be found in the examples/data_loading directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with the file path (without extension) and format (csv, json, txt)
dataset = Dataset('path/to/datasets', format='csv')

# Load dataset
dataset.load_data()
df = dataset.data
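The comment in the example says the path is given without its format. As an illustration only, the sketch assumes this means the format is appended as the file extension ('path/to/datasets' plus 'csv' gives 'path/to/datasets.csv') and that JSON files hold a list of row objects. It uses only the standard library and is not the library's loader. The names resolve_filename, parse_records and load_records are hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List


def resolve_filename(path: str, format: str) -> str:
    """Assumed convention: append the format as the extension."""
    return f"{path}.{format}"


def parse_records(text: str, format: str) -> List[Dict[str, Any]]:
    """Parse CSV or JSON text into a list of row dictionaries."""
    if format == "csv":
        # CSV values stay as strings; type conversion is left to later steps.
        return list(csv.DictReader(io.StringIO(text)))
    if format == "json":
        # Assumes the JSON file holds a list of objects, one per row.
        return json.loads(text)
    raise ValueError(f"unsupported format: {format}")


def load_records(path: str, format: str = "csv") -> List[Dict[str, Any]]:
    """Hypothetical stand-in for Dataset(path, format=...) plus loading; not the library's code."""
    with open(resolve_filename(path, format), newline="", encoding="utf-8") as handle:
        return parse_records(handle.read(), format)
```

For example, parse_records("a,b\n1,2\n", "csv") returns [{"a": "1", "b": "2"}].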

Missing values

The following example demonstrates how to handle missing values in a dataset using imputation. More examples can be found in the examples/missing_values directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
dataset.load()

# Impute missing data
dataset.missing_values(method='impute')
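The README does not state which statistic method='impute' uses to fill gaps. The sketch below is a minimal illustration of imputation in general. It assumes per-column mean imputation, with missing entries represented as None. The function impute_mean is hypothetical and not part of the library.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def impute_mean(rows: List[Row]) -> List[Row]:
    """Replace None in each column with the mean of that column's observed values."""
    columns = list(rows[0].keys())
    means = {}
    for column in columns:
        observed = [row[column] for row in rows if row[column] is not None]
        # A column with no observed values is left unfilled (mean stays None).
        means[column] = mean(observed) if observed else None
    return [{c: (means[c] if row[c] is None else row[c]) for c in columns} for row in rows]


rows = [{"a": 1.0, "b": None}, {"a": None, "b": 4.0}, {"a": 3.0, "b": 6.0}]
print(impute_mean(rows))
# [{'a': 1.0, 'b': 5.0}, {'a': 2.0, 'b': 4.0}, {'a': 3.0, 'b': 6.0}]
```

Mean imputation keeps every row, which matters for ARM because dropping rows would also drop the transactions that rules are mined from.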

Data discretisation

The following example demonstrates how to discretise a dataset using the equal width method. More examples can be found in the examples/discretisation directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with the file path (without extension) and format (csv, json, txt)
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load_data()

# Discretise dataset using equal width discretisation
dataset.discretise(method='equal_width', num_bins=5, columns=['calories'])
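Equal-width discretisation splits the range [min, max] of a column into num_bins intervals of identical width and assigns each value to the interval it falls in. The following is a minimal standalone sketch of that idea for a single column. It is not the library's implementation, and it returns integer bin indices rather than interval labels. The function name equal_width_bins is hypothetical.

```python
from typing import List


def equal_width_bins(values: List[float], num_bins: int) -> List[int]:
    """Assign each value a bin index in [0, num_bins - 1] using bins of equal width."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)  # constant column: everything falls in one bin
    width = (high - low) / num_bins
    # min() clamps the maximum value into the last bin instead of index num_bins.
    return [min(int((v - low) / width), num_bins - 1) for v in values]


print(equal_width_bins([0, 2, 5, 7, 10], num_bins=5))  # [0, 1, 2, 3, 4]
```

Here the range 0 to 10 with five bins gives a width of 2, so 5 lands in bin 2 and the maximum, 10, is clamped into the last bin.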

Data squashing

The following example demonstrates how to squash a dataset using Euclidean similarity. More examples can be found in the examples/squashing directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
dataset.load()

# Squash dataset
dataset.squash(threshold=0.75, similarity='euclidean')
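The exact squashing procedure of the library is not shown in this README. The sketch below is a simplified greedy variant meant only to show what threshold and similarity control. It assumes that Euclidean distance d between two rows becomes a similarity 1 / (1 + d), so identical rows score 1. It also assumes that each group of rows at least threshold-similar to an anchor row collapses into its column-wise mean. The names euclidean_similarity and squash are hypothetical stand-ins.

```python
import math
from typing import List

Vector = List[float]


def euclidean_similarity(a: Vector, b: Vector) -> float:
    """Assumed mapping: Euclidean distance d becomes a similarity 1 / (1 + d) in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def squash(rows: List[Vector], threshold: float) -> List[Vector]:
    """Greedily merge each unused row with all later rows at least `threshold` similar to it."""
    used = [False] * len(rows)
    squashed = []
    for i, anchor in enumerate(rows):
        if used[i]:
            continue
        group = [anchor]
        used[i] = True
        for j in range(i + 1, len(rows)):
            if not used[j] and euclidean_similarity(anchor, rows[j]) >= threshold:
                group.append(rows[j])
                used[j] = True
        # Replace the whole group with its column-wise mean.
        squashed.append([sum(column) / len(group) for column in zip(*group)])
    return squashed


print(squash([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]], threshold=0.75))
# [[0.1, 0.0], [5.0, 5.0]]
```

A higher threshold merges fewer rows, so the squashed dataset stays closer to the original. A lower one shrinks the dataset more aggressively before rule mining.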

Feature scaling

The following example demonstrates how to scale the dataset's features. More examples can be found in the examples/scaling directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
dataset.load()

# Scale dataset using normalisation
dataset.scale(method='normalisation')
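The README does not define method='normalisation'. Assuming it refers to the common min-max variant, each value x is mapped to (x - min) / (max - min), so a feature ends up in [0, 1]. A minimal single-column sketch follows. The function min_max_normalise is hypothetical and is not the library's code.

```python
from typing import List


def min_max_normalise(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)  # avoid division by zero
    return [(v - low) / (high - low) for v in values]


print(min_max_normalise([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Putting all features on the same scale keeps distance-based steps such as squashing from being dominated by features with large numeric ranges.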

Feature selection

The following example demonstrates how to select features from a dataset. More examples can be found in the examples/feature_selection directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load()

# Feature selection
dataset.feature_selection(
    method='kendall', threshold=0.15, class_column='calories')
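The example keeps features according to their Kendall rank correlation with class_column. The sketch below shows one way such a filter could work, under two assumptions: the Kendall tau-b variant is used (it accounts for ties), and a feature is kept when the absolute correlation reaches threshold. Both are guesses about the library's behaviour, and the names kendall_tau_b and select_features are hypothetical.

```python
import math
from typing import Dict, List


def kendall_tau_b(x: List[float], y: List[float]) -> float:
    """Kendall's tau-b rank correlation via an O(n^2) pairwise scan."""
    concordant = discordant = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    denominator = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denominator if denominator else 0.0


def select_features(columns: Dict[str, List[float]], class_column: str,
                    threshold: float) -> List[str]:
    """Keep features whose absolute tau-b with the class column is at least `threshold`."""
    target = columns[class_column]
    return [
        name for name, values in columns.items()
        if name != class_column and abs(kendall_tau_b(values, target)) >= threshold
    ]


columns = {"duration": [1, 2, 3, 4], "noise": [1, 2, 2, 1], "calories": [10, 20, 30, 40]}
print(select_features(columns, class_column="calories", threshold=0.15))  # ['duration']
```

Here duration rises monotonically with calories (tau-b = 1.0), while noise has as many concordant as discordant pairs (tau-b = 0.0) and is dropped.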

Related frameworks 🔗

[1] NiaARM: A minimalistic framework for Numerical Association Rule Mining

[2] uARMSolver: universal Association Rule Mining Solver

References 📚

[1] I. Fister, I. Fister Jr., D. Novak and D. Verber, Data squashing as preprocessing in association rule mining, 2022 IEEE Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 1720-1725, doi: 10.1109/SSCI51031.2022.10022240.

[2] I. Fister Jr. and I. Fister, A brief overview of swarm intelligence-based algorithms for numerical association rule mining, arXiv preprint arXiv:2010.15524, 2020.

License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!
