
Releases: oduwsdl/sumgram

sumgram-v1.0.1

14 Mar 18:55
3963338

Major updates:

  • Support for reading text from STDIN:
    $ cat path/to/collection/of/text/files/*.txt | sumgram -
  • Sumgram uses an English stopwords list by default (switch it off with --no-default-stopwords). --add-stopwords supplies additional stopwords, either:
    • listed directly on the command line:
      $ sumgram --add-stopwords stopword1 stopword2 -t 10 path/to/collection/of/text/files/
    • read from a text file (one stopword per line):
      $ sumgram --add-stopwords my_stopwords_file.txt -t 10 path/to/collection/of/text/files/
  • Extracting/processing text from URLs:
    $ sumgram "http://example.com/news/article-1.html" "http://example.com/news/article-2.html"
    The default news article boilerplate removal method is boilerpy3.ArticleExtractor. To change it, set --boilerplate-rm-method to one of 'boilerpy3.DefaultExtractor', 'boilerpy3.ArticleSentencesExtractor', 'boilerpy3.LargestContentExtractor', 'boilerpy3.CanolaExtractor', 'boilerpy3.KeepEverythingExtractor', 'boilerpy3.NumWordsRulesExtractor', or 'nltk' (strips all HTML tags with a regular expression).
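A minimal sketch that combines the stopwords file and STDIN options above. The file name my_stopwords_file.txt, the collection/ directory, and its contents are hypothetical placeholders. The script assumes that --add-stopwords with a file can be used together with reading from STDIN (-), which these notes do not state explicitly, and it only invokes sumgram if it is found on the PATH.

```shell
#!/bin/sh
# Hypothetical stopwords file: one stopword per line, as the release notes require.
printf '%s\n' stopword1 stopword2 > my_stopwords_file.txt

# Hypothetical sample collection of plain-text files.
mkdir -p collection
printf 'first sample document\n' > collection/doc1.txt
printf 'second sample document\n' > collection/doc2.txt

# Assumption: --add-stopwords <file> can be combined with reading from STDIN ("-").
if command -v sumgram >/dev/null 2>&1; then
  cat collection/*.txt | sumgram --add-stopwords my_stopwords_file.txt -t 10 -
else
  echo "sumgram not found; would run: cat collection/*.txt | sumgram --add-stopwords my_stopwords_file.txt -t 10 -"
fi
```

Piping through cat mirrors the STDIN example above; passing the directory path directly, as in the stopwords examples, should also work.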

sumgram-v0.0.19

19 May 21:22

sumgram-v0.0.18

10 Jul 11:34
ef48d75

Minor changes

  • Added -v --version command-line option
  • Made regex the default --sentence-tokenizer