- Basics of NLTK
- Tokenization (Sentence & Word)
- Stemming
- Lemmatization
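
A minimal sketch of these NLTK basics, assuming the required NLTK resources (e.g. `punkt`, `wordnet`) have been downloaded; the sample text and variable names are illustrative, not taken from this repo:

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time resource downloads (uncomment on first run)
# nltk.download("punkt"); nltk.download("wordnet"); nltk.download("omw-1.4")

text = "The cats were running faster than the dogs. History repeats itself."

# Sentence and word tokenization
sentences = sent_tokenize(text)
words = word_tokenize(text)

# Stemming: crude suffix stripping ("running" -> "run", "History" -> "histori")
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]

# Lemmatization: maps words to dictionary lemmas ("cats" -> "cat")
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(w) for w in words]

print(sentences)
print(stems)
print(lemmas)
```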
Word Vectorization Techniques (NLP Tools)
- Bag of Words (Scikit-Learn Implementation)
- TF-IDF (Scikit-Learn Implementation; term frequency is normalized here), as shown in the sketch below
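
A minimal sketch of both vectorizers with scikit-learn; the toy corpus below is illustrative, not from this repo (scikit-learn >= 1.0 is assumed for `get_feature_names_out`):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs can be friends",
]

# Bag of Words: raw term counts per document
bow = CountVectorizer()
X_counts = bow.fit_transform(corpus)
print(bow.get_feature_names_out())
print(X_counts.toarray())

# TF-IDF: term counts normalized and weighted by inverse document frequency
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)
print(X_tfidf.toarray().round(2))
```

Note that `TfidfVectorizer` L2-normalizes each document vector by default, which is the normalization referred to above.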
Custom Word Embedding (Custom_Word_Embedding)
Steps:
- Build a vocabulary of unique words from the corpus, based on their frequency.
- Choose the vocabulary size (this will be the one-hot dimension of each word).
- Based on the vocabulary, one-hot encode every word in a sentence. If this is done with tf.keras, each sentence becomes a list of integer indices, each index denoting that word's "hot" position in the vocabulary.
- Make every sentence the same length by padding; this can be done with pad_sequences in tf.keras.
- Create an Embedding layer with an embedding dimension of your choice.
- The Embedding layer is configured with the vocabulary size, the embedding dimension, and the padded sentence length (see the sketch after these steps).
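
A minimal sketch of these steps using the legacy tf.keras text-preprocessing utilities (`one_hot`, `pad_sequences`), which are available in TF 2.x; newer Keras releases recommend `TextVectorization` instead. The sentences, `vocab_size`, `sent_length`, and `embedding_dim` values are illustrative choices, not taken from this repo:

```python
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding
from tensorflow.keras.models import Sequential

sentences = ["the glass of milk", "the cup of tea", "I am a good boy"]

vocab_size = 500      # chosen vocabulary size (range of one-hot indices)
sent_length = 8       # every sentence is padded/truncated to this many tokens
embedding_dim = 10    # dimension of each word vector

# One-hot encode: each word becomes an integer index in [1, vocab_size)
encoded = [one_hot(s, vocab_size) for s in sentences]

# Pad so every sentence has the same length
padded = pad_sequences(encoded, maxlen=sent_length, padding="pre")

# Embedding layer: configured with vocab_size (input_dim) and embedding_dim (output_dim);
# older Keras versions also accept input_length=sent_length
model = Sequential([Embedding(input_dim=vocab_size, output_dim=embedding_dim)])

# Each padded integer sequence maps to a (sent_length, embedding_dim) matrix
vectors = model.predict(padded)
print(padded)
print(vectors.shape)  # (3, 8, 10)
```

Note that tf.keras' `one_hot` assigns indices by hashing rather than via an explicit dictionary, so occasional index collisions are possible; a `Tokenizer` or `TextVectorization` layer gives an explicit word-to-index vocabulary if that matters.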
License: MIT