lang : 'cs', 'de', 'es' ...
    Language to process (subfolder in "pickle" folder)
method : 'sticho', 'word', 'lemma', '3gram_t' ...
    Which data to use for attribution (file in pickle > lang folder)
reduce_features(filters)
    Only for method == 'sticho'
    Filter the features (columns) to be used for attribution,
    e.g. drop all statistics on rhyme, or keep only the stress profile.
    filters: conditions to filter features
        (format accepted by pandas .query method)
        default: None

mfi(n)
    Only for method != 'sticho'
    Select how many of the most frequent items (words, lemmata, n-grams)
    will be analyzed.
    n: int
        number of most frequent items
        default: 500

reduce_sets(filters, n_min, remove_singles)
    Filter datasets (rows) according to specified conditions.
    filters: conditions to filter datasets
        (format accepted by pandas .query method)
        default: None
    n_min: int
        minimum number of all features required to keep a dataset
        default: 0
    remove_singles: boolean
        whether to drop datasets whose author is not the author
        of any other dataset
        default: True
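The selection of most frequent items and the removal of single-author datasets can be illustrated with a minimal standard-library sketch. It is not the package's implementation: the toy corpus and the helper names `most_frequent_items` and `drop_single_author_sets` are hypothetical and only mirror the behaviour described for mfi(n) and remove_singles.

```python
from collections import Counter

# Hypothetical toy corpus: (dataset id, author, tokens).
# These values are illustrative and not taken from the project data.
datasets = [
    ("d1", "A", ["the", "sea", "the", "sky"]),
    ("d2", "A", ["the", "sea", "and", "the"]),
    ("d3", "B", ["a", "sky", "the", "sea"]),
]


def most_frequent_items(datasets, n):
    """Return the n most frequent items counted across all datasets."""
    counts = Counter()
    for _, _, tokens in datasets:
        counts.update(tokens)
    return [item for item, _ in counts.most_common(n)]


def drop_single_author_sets(datasets):
    """Drop datasets whose author has no other dataset."""
    per_author = Counter(author for _, author, _ in datasets)
    return [d for d in datasets if per_author[d[1]] > 1]


top_items = most_frequent_items(datasets, n=2)
kept = drop_single_author_sets(datasets)
print(top_items)
print([dataset_id for dataset_id, _, _ in kept])
```

Here author "B" has only one dataset, so "d3" is dropped. This is the case remove_singles is meant to handle, since such a dataset has no other dataset by the same author to be matched against.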
zscores()
    Normalize data to z-scores across datasets.
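As a rough illustration of z-score normalization across datasets, the sketch below standardizes each feature (column) over all datasets (rows). It assumes the population standard deviation and maps constant features to 0.0. Both choices are assumptions, as the description above does not specify them, and the function is a stand-alone helper rather than the package's own zscores() method.

```python
from statistics import mean, pstdev


def zscores(rows):
    """Standardize each column of a list of equal-length rows.

    Assumption: population standard deviation (pstdev); features with
    zero variance are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Three hypothetical datasets with two features each.
rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
z = zscores(rows)
for row in z:
    print(row)
```

After normalization every non-constant feature has mean 0 and unit standard deviation, so features on large scales do not dominate distance-based classification.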
nearest_neighbour()
    Classification by nearest neighbour (various distance metrics)

svm(multiclass, **kwargs)
    Classification by support vector machine
    multiclass: boolean
        whether to perform multiclass or binary classification
        when 'True', each dataset is assigned to one author
        when 'False', a one-vs.-rest classifier is trained for every
        author, resulting in:
        (a) assigning an author to the dataset if precisely one
            classifier gives a decision other than 'rest'
        (b) an "I don't know" answer in other cases
        default: True
    **kwargs: parameters for sklearn.svm.SVC (e.g. kernel, gamma...)

random_forest(multiclass, **kwargs)
    Classification by random forest
    multiclass: boolean
        whether to perform multiclass or binary classification
        when 'True', each dataset is assigned to one author
        when 'False', a one-vs.-rest classifier is trained for every
        author, resulting in:
        (a) assigning an author to the dataset if precisely one
            classifier gives a decision other than 'rest'
        (b) an "I don't know" answer in other cases
        default: True
    **kwargs: parameters for sklearn.ensemble.RandomForestClassifier
        (e.g. n_estimators, class_weight...)
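The one-vs.-rest decision rule used when multiclass is 'False' can be sketched independently of the underlying classifier. The helper `combine_one_vs_rest` and its input format are hypothetical. It only shows how per-author binary decisions might be combined into a single answer, with None standing in for "I don't know".

```python
def combine_one_vs_rest(decisions):
    """Combine per-author binary decisions into one attribution.

    decisions: dict mapping author -> True if that author's binary
    classifier claimed the dataset (i.e. did not answer 'rest').
    Returns the author if exactly one classifier claimed the dataset,
    otherwise None, which stands for the "I don't know" answer.
    """
    claimed = [author for author, is_author in decisions.items() if is_author]
    if len(claimed) == 1:
        return claimed[0]
    return None


# Exactly one classifier claims the dataset: the author is assigned.
print(combine_one_vs_rest({"A": True, "B": False, "C": False}))
# Two classifiers claim it, or none does: no attribution is made.
print(combine_one_vs_rest({"A": True, "B": True, "C": False}))
print(combine_one_vs_rest({"A": False, "B": False, "C": False}))
```

Unlike the multiclass setting, this rule lets the classifier abstain, which trades some coverage for fewer forced attributions.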
evaluate()
    Print an evaluation of the particular methods that were applied

dendrograms()
    Plot dendrograms (only if nearest_neighbour has been applied)

complete_results(pickle, filename)
    Return a dictionary with complete results
    pickle: boolean
        whether to pickle the dict into a file
        (stored in 'pickle' folder)
        default: True
    filename: specifies the name of the pickled file
        default: method name (e.g. sticho, word...)