fim

fim is a collection of some popular frequent itemset mining algorithms implemented in Go.

fim contains the implementations of the following algorithms:

apriori: Based on Fast Algorithms for Mining Associations Rules. As an optimization, the approach to determine the support of the candidates with a hash-tree and a transaction bitmap was replaced with an approach based on a trie. As a further optimization, the generation of candidates of length 2 is omitted in favor of counting each occurring item pair in a 2D map.
eclat: As described in Scalable Algorithms for Association Mining. The paper describes and analyses several approaches. The eclat algorithm is implemented based on the approach named Eclat. For the determination of frequent itemsets of length 2, a 2D map is used instead of a 2D array as described in the paper.
fpgrowth: Based on Mining Frequent Patterns without Candidate Generation. Implemented mostly as described in the paper. The only difference is that there is no special-casing implemented for FP-trees that are paths.

Build

Execute the following commands to build the fim tool:

git clone https://github.com/paulfedorow/fim.git
cd fim
make

Usage

To see which arguments fim supports execute the following command:

build/fim -h

The following example finds all frequent itemsets with a minimal support of 1% in the dataset contained in datasets/retail.dat:

build/fim -a fpgrowth -i datasets/retail.dat -s 0.01

Dataset format

fims dataset file format is as follows:

File = {Transaction}
Transaction = Item {" " Item} "\n"
Item = {"0" ... "9"}

Each line in the file is a transaction. A line is expected to be a series of integers separated by a single space. Each integer is an item of the corresponding transaction.

Performance

To determine which algorithm is the most efficient, the runtime of each algorithm was measured with different datasets and decreasing minimal support. The datasets that were used are retail.dat and chess.dat from FIMI Dataset Repository. The datasets are respectively sparse and dense. The results are shown below. The algorithm fpgrowth is best in terms of runtime.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.github/workflows		.github/workflows
bench		bench
charts		charts
dataset		dataset
datasets		datasets
fim		fim
ints		ints
mine		mine
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
go.mod		go.mod

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

fim

Build

Usage

Dataset format

Performance

About

Languages

License

paulfedorow/fim

Folders and files

Latest commit

History

Repository files navigation

fim

Build

Usage

Dataset format

Performance

About

Topics

Resources

License

Stars

Watchers

Forks

Languages