bagofpatternsr

The goal of bagofpatternsr is to provide a simple implementation of the ‘Bag of Patterns’ time series classification algorithm described by Lin et al (2012). It’s based on the description at timeseriesclassification.com. There are other implementations available to R users, but some are built on Java, which can be tricky to get running.

It uses the seewave:: implementation of Symbolic Aggregate approXimation (SAX) for the patterns, data.table:: for data munging and FNN:: for fast K-Nearest Neighbours matching.
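
For intuition, SAX turns a numeric window into a short ‘word’: the window is averaged down to word_size segments (Piecewise Aggregate Approximation), and each segment mean is binned into one of alphabet_size letters using quantile breakpoints (the package default). The toy function below is purely illustrative - it is not part of bagofpatternsr, which delegates the real work to seewave::.

# Illustrative sketch only - not the seewave:: implementation the package uses
sax_word <- function(x, word_size = 4, alphabet_size = 2) {
  # Piecewise Aggregate Approximation: the mean of each of `word_size` segments
  paa <- vapply(split(x, cut(seq_along(x), word_size, labels = FALSE)),
                mean, numeric(1))
  # Quantile breakpoints divide the value range into `alphabet_size` bins
  breaks <- quantile(x, probs = seq(0, 1, length.out = alphabet_size + 1))
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  paste(letters[cut(paa, breaks = breaks, labels = FALSE)], collapse = "")
}

sax_word(sin(seq(0, 2 * pi, length.out = 40)))
#> [1] "bbaa"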

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mattsq/bagofpatternsr")

Example

Using it is easy! Models are fit like so:

library(bagofpatternsr)
data("FaceAll_TRAIN")
data("FaceAll_TEST")
model <- bagofpatternsr::bagofpatterns_knn(FaceAll_TRAIN, 
                                           target = "target",
                                           window_size = 20,
                                           verbose = FALSE,
                                           normalize = TRUE,
                                           alphabet_size = 2, 
                                           word_size = 4,
                                           k = 1)
print(model)
#> A trained Bag Of Patterns object with:
#> A time series of length 131 converted into a word histogram with 17 entries, predicting class: target 
#> The object has the following hyperparameters:
#>   Window Size: 20 
#>   Alphabet Size: 2 
#>   Word Size: 4 
#>   SAX breakpoint method: quantiles 
#>   Term weighting method: term frequency 
#>   Sparsity constraint: 1.00 
#>   Trained with sparse windows:  
#>   Windows Z-normalized before creating words: TRUE 
#> 
#> Examples of words in dictionary include: aaab, aaba, aabb, abaa, abab 
#>   K-Nearest Neighbours to be used: 1

Predictions on new datasets can be retrieved like so:

new_preds <- predict(model, 
                     newdata = FaceAll_TEST,
                     verbose = FALSE)
table(new_preds, FaceAll_TEST$target)
#>          
#> new_preds  1 10 11 12 13 14  2  3  4  5  6  7  8  9
#>        1  44  2  0  0 17  0  2  6  5  5  4  6 11  5
#>        10  3 26  0  3 37  3  3 18  3 11  6  3 59  8
#>        11  2  3  3  1 55  1  6 11  8  2  8  4  9  0
#>        12  0  4  0 31 10  1 12  9  6  7 13  3 24  5
#>        13  4  6  1  0 51  4  9 10  1  4  2  3 10  1
#>        14  1  3  2  1  2  3 18  1  6  4  5  7  3 15
#>        2   0  3  0  5  5  3 20  0  9  3  8  4  5  5
#>        3   3 16  0 12 80  1  7 62  1 12  2  1 34  1
#>        4   2  1  0  1  1  1  9  2 26  1 15  2  4  2
#>        5   6  9  0  0  5  3  5  6  4 55 13  5 15 10
#>        6   0  0  0  1  1  4 10  0 18  5 29 10  5  3
#>        7   4  3  1  4  5  6 15  1 18  7 20 50  5  8
#>        8   0  8  0  6 10  2 12 10  9 17 14  5 36  1
#>        9   3 11  1  1  8  0 10  0 17  3  8  6 13 36
mean(new_preds == FaceAll_TEST$target)
#> [1] 0.2792899

There’s support for the entirely atheoretical idea of ‘sparse windows’ - essentially, rather than generating a dictionary out of every single window, we take inspiration from Time Series Forest and draw sqrt(m) random windows from each series. This speeds up training dramatically, though as the comparison below shows, it can come at a cost in accuracy.
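
As a rough, hypothetical sketch of the idea (the series length m below is made up, and this is not the package’s internal code), it just shrinks the set of window start points:

# Hypothetical illustration, not code from the package
m <- 301                                              # made-up series length
window_size <- 20
starts_all <- seq_len(m - window_size + 1)            # dense: every window start
starts_sparse <- sample(starts_all, floor(sqrt(m)))   # sparse: sqrt(m) random starts
length(starts_all)
#> [1] 282
length(starts_sparse)
#> [1] 17

First, with dense windows: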

library(tsforest)
data("FreezerRegularTrain_TRAIN")
data("FreezerRegularTrain_TEST")

model <- bagofpatterns_knn(FreezerRegularTrain_TRAIN, 
                           window_size = 20, 
                           sparse_windows = FALSE,
                           normalize = TRUE,
                           verbose = FALSE, 
                           alphabet_size = 3,
                           word_size = 8,
                           k = 1)

preds <- predict(model, FreezerRegularTrain_TEST, verbose = FALSE)

table(preds, FreezerRegularTrain_TEST$target)
#>      
#> preds    1    2
#>     1 1006  510
#>     2  419  915
mean(preds == FreezerRegularTrain_TEST$target)
#> [1] 0.6740351

And with sparse windows enabled:

data("FreezerRegularTrain_TRAIN")

model <- bagofpatterns_knn(FreezerRegularTrain_TRAIN, 
                           window_size = 20, 
                           sparse_windows = TRUE, 
                           verbose = FALSE, 
                           alphabet_size = 3,
                           word_size = 8,
                           k = 1)

preds <- predict(model, FreezerRegularTrain_TEST, verbose = FALSE)

table(preds, FreezerRegularTrain_TEST$target)
#>      
#> preds   1   2
#>     1 994 880
#>     2 431 545
mean(preds == FreezerRegularTrain_TEST$target)
#> [1] 0.54

You can also use fit_bagofpatterns and bake_bagofpatterns to feed the underlying Bag of Patterns representation into other models. Here’s a simple example using logistic regression instead of K-Nearest Neighbours:

data("FreezerRegularTrain_TRAIN")
data("FreezerRegularTrain_TEST")

bop_obj <- fit_bagofpatterns(FreezerRegularTrain_TRAIN,
                             window_size = 20, 
                             sparse_windows = FALSE, 
                             verbose = FALSE, 
                             alphabet_size = 2,
                             word_size = 3)

FreezerRegularTrain_TRAIN_conv <- bake_bagofpatterns(bop_obj)

glm_model <- glm(target ~ ., data = FreezerRegularTrain_TRAIN_conv, family = "binomial")
summary(glm_model)
#> 
#> Call:
#> glm(formula = target ~ ., family = "binomial", data = FreezerRegularTrain_TRAIN_conv)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -1.89414  -1.01502   0.06302   1.04366   2.05352  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)  5.69392    2.02425   2.813  0.00491 **
#> aaa         -0.02959    0.07189  -0.412  0.68063   
#> aab         -0.07574    0.09457  -0.801  0.42320   
#> aba          0.01853    0.09922   0.187  0.85187   
#> abb          0.05246    0.09540   0.550  0.58240   
#> baa         -0.01049    0.07977  -0.132  0.89533   
#> bab         -0.20666    0.10068  -2.053  0.04010 * 
#> bba         -0.23673    0.07919  -2.989  0.00280 **
#> bbb          0.11444    0.07018   1.631  0.10298   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 207.94  on 149  degrees of freedom
#> Residual deviance: 186.61  on 141  degrees of freedom
#> AIC: 204.61
#> 
#> Number of Fisher Scoring iterations: 4

We can generate predictions by calling bake_bagofpatterns on a new dataset:

FreezerRegularTrain_TEST_conv  <- bake_bagofpatterns(bop_obj, FreezerRegularTrain_TEST)

preds <- predict(glm_model, FreezerRegularTrain_TEST_conv, type = "response")

preds <- as.numeric(preds >= 0.5) + 1  # map predicted probabilities back to class labels 1 and 2

table(preds, FreezerRegularTrain_TEST$target)
#>      
#> preds   1   2
#>     1 877 452
#>     2 548 973
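
As with the earlier examples, overall accuracy follows from the confusion matrix above - (877 + 973) / 2850:

mean(preds == FreezerRegularTrain_TEST$target)
#> [1] 0.6491228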
