This repository records the tools we like to use for Natural Language Processing.
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Just as the title says.
Interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.
POS tagging, NER, parsing, coreference resolution, sentiment analysis, bootstrapped pattern learning, information extraction, and basic dependencies
https://stanfordnlp.github.io/CoreNLP/
tf-idf, LSA, LDA, word2vec
https://radimrehurek.com/gensim/
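To make the first item in that list concrete, here is a minimal pure-Python sketch of tf-idf weighting. It assumes raw term frequency multiplied by `log(N / df)`; the function name `tf_idf` and the toy documents are made up for illustration, and gensim's own weighting and normalization may differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
weights = tf_idf(docs)
```

In the first document, "cat" occurs in only one document and so outweighs "the", which occurs in two, even though "the" appears twice.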
word embeddings (GloVe, an alternative to word2vec)
https://nlp.stanford.edu/projects/glove/
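The GloVe downloads are plain-text files with one word per line followed by its vector components, separated by spaces. The sketch below parses that layout and compares words by cosine similarity. The helper names `load_vectors` and `cosine` are hypothetical, and the three-dimensional vectors are invented toy values, not real GloVe numbers.

```python
import math


def load_vectors(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [float, ...]} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, made up for illustration.
sample = [
    "king 0.9 0.1 0.4",
    "queen 0.8 0.2 0.5",
    "apple 0.1 0.9 0.0",
]
vectors = load_vectors(sample)
sim_royal = cosine(vectors["king"], vectors["queen"])
sim_fruit = cosine(vectors["king"], vectors["apple"])
```

With a downloaded file, passing the open file object (for example `open(path, encoding="utf-8")`, where `path` is wherever you unpacked the vectors) to `load_vectors` should work the same way, since it only iterates over lines.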
contextualized word representation
sentence encoding
https://bert-as-service.readthedocs.io/en/latest/index.html
https://github.com/google/sentencepiece
Chinese text segmentation, POS, NER, dependency parsing, etc.
https://github.com/fxsjy/jieba
Chinese text segmentation, POS, NER, dependency parsing, etc.
https://github.com/HIT-SCIR/pyltp
Chinese text segmentation, POS, NER, dependency parsing, etc.
https://github.com/hankcs/HanLP
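Chinese text has no spaces between words, which is why segmentation comes first in the three toolkits listed here. As a rough illustration of the task only, the sketch below uses forward maximum matching, a simple dictionary-based baseline. This is a swapped-in technique, not the algorithm used by any of these libraries, and `forward_max_match`, `max_len`, and the toy vocabulary are all hypothetical.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match segmentation against a word list."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink; a single
        # character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary; real tools ship far larger dictionaries.
vocab = {"自然", "语言", "自然语言", "处理", "我", "喜欢"}
tokens = forward_max_match("我喜欢自然语言处理", vocab)
# tokens == ["我", "喜欢", "自然语言", "处理"]
```

Greedy matching picks "自然语言" over the shorter "自然" because it tries longer candidates first. It cannot resolve ambiguous cases, which is one reason to prefer the statistical toolkits above.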
Chinese word embeddings: 100+ Chinese word vectors trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora.
https://github.com/Embedding/Chinese-Word-Vectors
machine learning
https://scikit-learn.org/stable/index.html
linear programming
https://pythonhosted.org/PuLP/
numerical integration, interpolation, optimization, linear algebra, and statistics, etc.
https://www.scipy.org/scipylib/index.html
implementation of CRFs (conditional random fields)
https://taku910.github.io/crfpp/
State-of-the-art NLP for TensorFlow 2.0 and PyTorch, including BERT, GPT, XLNet, etc.
https://github.com/huggingface/transformers
implementations of high quality models for almost any NLP problem
a high-level library to help with training neural networks in PyTorch
https://pytorch.org/ignite/index.html
an open source ecosystem for neural machine translation and neural sequence learning
Generic data loaders, abstractions, and iterators for text
https://github.com/pytorch/text
Google Translate API (unofficial)
distributed search engine
https://elasticsearch-py.readthedocs.io/en/master/
Chinese analyzer: IK analysis for Elasticsearch
An open source and collaborative framework for extracting the data you need from websites.
a general purpose, document-based, distributed database