
How to use NLTK: basics of tokenization, stemming, lemmatization, Bag of Words, TF-IDF and custom word embeddings.


ravis2114/NLP-tools


NLP-Tools


  • Basics of NLTK
  • Tokenization (Sentence & Word)
  • Stemming
  • Lemmatization
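Here is a minimal NLTK sketch of tokenization, stemming and lemmatization. It is a sketch, not the repository's notebook code, and the example text is an assumption. It avoids downloaded models where it can. `TreebankWordTokenizer` is the Penn Treebank-style tokenizer family that `nltk.word_tokenize` builds on. An untrained `PunktSentenceTokenizer` splits on sentence-final punctuation. By contrast, `nltk.sent_tokenize` and `nltk.word_tokenize` load the pretrained Punkt model through `nltk.download`. `WordNetLemmatizer` needs the WordNet corpus (`nltk.download("wordnet")`), so the sketch falls back to `None` when that corpus is missing.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

text = "Hello world. The striped bats are hanging on their feet."  # example input (assumed)

# Sentence tokenization: an untrained Punkt tokenizer splits on sentence-final
# punctuation; nltk.sent_tokenize would load the pretrained 'punkt' model instead.
sentences = PunktSentenceTokenizer().tokenize(text)

# Word tokenization with Penn Treebank rules (no downloaded data needed).
words = TreebankWordTokenizer().tokenize(sentences[1])

# Stemming: rule-based suffix stripping; stems need not be dictionary words.
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in words]

# Lemmatization: WordNet dictionary lookup (part of speech defaults to noun).
# Needs nltk.download("wordnet"); without it, lemmatize() raises LookupError.
lemmatizer = WordNetLemmatizer()
try:
    lemmas = [lemmatizer.lemmatize(word) for word in words]
except LookupError:
    lemmas = None  # WordNet corpus not installed

print(sentences)
print(words)
print(stems)
print(lemmas)
```

Stemming chops suffixes by rule (for example `bats` becomes `bat`). Lemmatization maps a word to its dictionary form, and `WordNetLemmatizer` treats every word as a noun unless you pass a `pos` argument.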

Word Vectorization Techniques: NLP Tools


  • Bag of Words (Scikit-Learn implementation): BOW
  • TF-IDF (Scikit-Learn implementation): TFidf (term frequency is normalized here)

Custom Word Embedding: Custom_Word_Embedding


Steps:

  1. Build a vocabulary of the unique words in the corpus, ranked by frequency.
  2. Choose the vocabulary size (this is the length of each word's one-hot vector).
  3. Using the vocabulary, one-hot encode every word in each sentence. With tf.keras, each sentence becomes a list of integer indices, each marking the "hot" position of that word in the vocabulary.
  4. Pad every sentence to the same length, for example with pad_sequences in tf.keras.
  5. Create an Embedding layer with an embedding dimension of your choice.
  6. The Embedding layer takes the vocabulary size (input_dim), the embedding dimension (output_dim) and the padded sentence length (input_length, in tf.keras 2.x).
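The steps above can be sketched without a deep-learning framework, which makes the data at each step visible. This is a minimal NumPy sketch under a few assumptions:

- Index 0 is reserved for padding and index 1 for out-of-vocabulary words (a hypothetical convention).
- Words are split on whitespace.
- The embedding matrix is randomly initialized, whereas a Keras Embedding layer would learn it during training.

Unlike tf.keras `one_hot`, which hashes words to indices, this sketch uses the frequency-ranked dictionary from step 1. The helpers `build_vocab`, `encode` and `pad` are hypothetical names, and `pad` mimics the default pre-padding and pre-truncation of `pad_sequences`.

```python
from collections import Counter

import numpy as np

PAD, OOV = 0, 1  # reserved indices (hypothetical convention, not from the original)


def build_vocab(sentences, vocab_size):
    """Steps 1-2: rank words by frequency and keep the vocab_size - 2 most common."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    top_words = [word for word, _ in counts.most_common(vocab_size - 2)]
    return {word: index for index, word in enumerate(top_words, start=2)}


def encode(sentence, vocab):
    """Step 3: map each word to its 'hot' index (a compact form of one-hot)."""
    return [vocab.get(word, OOV) for word in sentence.lower().split()]


def pad(sequences, maxlen):
    """Step 4: pre-pad and pre-truncate with PAD, like pad_sequences' defaults."""
    out = np.full((len(sequences), maxlen), PAD, dtype=np.int64)
    for row, seq in enumerate(sequences):
        seq = seq[-maxlen:]  # keep the last maxlen indices
        if seq:  # an empty slice would break the negative-index assignment
            out[row, -len(seq):] = seq
    return out


sentences = ["the glass of milk", "the glass of juice", "the cup of tea", "I am a good boy"]
vocab_size, embedding_dim, maxlen = 10, 4, 6

vocab = build_vocab(sentences, vocab_size)
padded = pad([encode(s, vocab) for s in sentences], maxlen)  # shape (4, 6)

# Steps 5-6: one embedding_dim-sized row per vocabulary index. Random here;
# an Embedding layer would learn these weights during training.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))
vectors = embedding_matrix[padded]  # shape (4, 6, 4): sentence x position x dimension

print(padded)
print(vectors.shape)
```

Indexing `embedding_matrix` with the padded integer matrix is the lookup an Embedding layer performs: row i of the matrix is the vector for word index i. This is why the layer only needs the vocabulary size and the embedding dimension to size its weights.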

License

MIT
