Describe new modules and classes
piroor committed Dec 18, 2017
1 parent f08768d commit 01528ca
Showing 4 changed files with 10 additions and 0 deletions.
1 change: 1 addition & 0 deletions lib/classifier-reborn/extensions/token_filter/stemmer.rb
@@ -5,6 +5,7 @@

module ClassifierReborn
module TokenFilter
# This filter converts the given tokens to their stemmed versions for the language.
module Stemmer
module_function

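For context, a minimal usage sketch of the Stemmer filter documented above. The module-level call method is an assumption suggested by module_function; it is not part of this hunk.

  tokens = [
    ClassifierReborn::Tokenizer::Token.new('tokenizing', stemmable: true,  maybe_stopword: true),
    ClassifierReborn::Tokenizer::Token.new('C++',         stemmable: false, maybe_stopword: false)
  ]
  ClassifierReborn::TokenFilter::Stemmer.call(tokens)
  # => the stemmable token is replaced by its stem; the un-stemmable one passes through unchanged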
1 change: 1 addition & 0 deletions lib/classifier-reborn/extensions/token_filter/stopword.rb
@@ -5,6 +5,7 @@

module ClassifierReborn
module TokenFilter
# This filter removes stopwords for the language from the given tokens.
module Stopword
STOPWORDS_PATH = [File.expand_path(File.dirname(__FILE__) + '/../../../../data/stopwords')]

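Similarly, a hedged sketch of the Stopword filter, again assuming a module-level call method that is not shown in this hunk. The stopword lists themselves are loaded from STOPWORDS_PATH.

  tokens = [
    ClassifierReborn::Tokenizer::Token.new('the',     stemmable: true, maybe_stopword: true),
    ClassifierReborn::Tokenizer::Token.new('message', stemmable: true, maybe_stopword: true)
  ]
  ClassifierReborn::TokenFilter::Stopword.call(tokens)
  # => tokens marked maybe_stopword that appear in the loaded stopword list are dropped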
6 changes: 6 additions & 0 deletions lib/classifier-reborn/extensions/tokenizer/token.rb
@@ -6,6 +6,12 @@
module ClassifierReborn
module Tokenizer
class Token < String
# A Token is created from a single token string plus extra attributes. E.g.,
#   t = ClassifierReborn::Tokenizer::Token.new 'Tokenize', stemmable: true, maybe_stopword: false
#
# Available attributes are:
#   stemmable: true      Whether the token can be stemmed. Set this to false for un-stemmable terms; otherwise it should be true.
#   maybe_stopword: true Whether the token may be a stopword. Set this to false for terms that can never be stopwords; otherwise it should be true.
def initialize(string, stemmable: true, maybe_stopword: true)
super(string)
@stemmable = stemmable
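To make the constructor's defaults concrete, a short sketch based only on the documented keyword arguments and on Token being a String subclass:

  plain  = ClassifierReborn::Tokenizer::Token.new('tokenize')   # stemmable and maybe_stopword default to true
  symbol = ClassifierReborn::Tokenizer::Token.new('+', stemmable: false, maybe_stopword: false)
  plain.upcase   # => "TOKENIZE" -- Token inherits from String, so ordinary String methods still work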
2 changes: 2 additions & 0 deletions lib/classifier-reborn/extensions/tokenizer/whitespace.rb
@@ -7,6 +7,8 @@

module ClassifierReborn
module Tokenizer
# This tokenizes the given input into whitespace-separated terms.
# It is mainly intended for sentences written with spaces between words, as in English, French, and similar languages.
module Whitespace
module_function

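Finally, a hedged usage sketch of the Whitespace tokenizer; a module-level call method returning Token objects is assumed from module_function and the Token class above, not shown in this hunk.

  tokens = ClassifierReborn::Tokenizer::Whitespace.call('The quick brown fox')
  # => an array of Token objects, one per whitespace-separated word,
  #    ready to be passed through the Stopword and Stemmer filters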
