This repository contains the code for the Introduction to Natural Language Processing presentation given at the Quera Data Science Bootcamp in November 2023.
You can find the slides here.
- Text processing: `NLP_preprocessing.ipynb` contains the code for the text processing part of the presentation.
- Text representation: `NLP_text_representation.ipynb` contains the code for the text representation part of the presentation.
- Naive Bayes: a from-scratch implementation of the Naive Bayes classifier can be found in `NLP_naive_bayes.ipynb`.
- Word embeddings: `NLP_word_embeddings.ipynb` contains an exploration of word embeddings.
- Song recommender: we built a song recommender system using the idea behind word2vec. The code for this part can be found in `NLP_songs_recommandation.ipynb`.
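For readers who want a quick picture of what "Naive Bayes from scratch" involves before opening the notebook, here is a minimal sketch of a multinomial Naive Bayes text classifier. It is not the notebook's code: it assumes naive lowercase-and-split-on-whitespace tokenization and add-alpha (Laplace) smoothing, and the class name `NaiveBayes` and the toy movie-review data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over whitespace-tokenized text."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha == 1.0 gives Laplace (add-one) smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n_docs = len(docs)
        # log P(c): fraction of training documents labeled c
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        # Total number of tokens seen per class (denominator of P(w | c))
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # log P(w | c) = log((count(w, c) + alpha) / (total(c) + alpha * |V|))
        num = self.word_counts[c][word] + self.alpha
        den = self.total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        # Words never seen in training are simply ignored
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy (hypothetical) training data
docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayes().fit(docs, labels)
print(clf.predict("loved the great plot"))    # → pos
print(clf.predict("hated the boring movie"))  # → neg
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word/class pair from zeroing out a whole class score.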
The presentation covers the following topics:

- A brief introduction to NLP, NLU, and NLG
- Tokenization techniques
  - Word-based
  - Character-based
  - Subword-based (specifically BPE)
- Text normalization
  - Stemming
  - Lemmatization
  - Numerical normalization
- Text representation
  - One-hot encoding
  - Bag of words
  - TF-IDF
- Naive Bayes
- Embedding methods
  - Word2Vec
  - Doc2Vec
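To make the bag-of-words and TF-IDF items above concrete, the sketch below builds both representations by hand. It is an illustrative assumption, not the code from `NLP_text_representation.ipynb`: tokenization is a naive lowercase split, TF is the raw count divided by document length, and IDF is the unsmoothed `log(N / df)` (libraries such as scikit-learn use a smoothed variant by default). The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(doc):
    # Naive word-based tokenization: lowercase, then split on whitespace
    return doc.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def tf_idf(docs):
    """TF = count / document length, IDF = log(N / df) (unsmoothed)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each vocabulary word
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for vec in counts:
        length = sum(vec) or 1  # guard against division by zero for empty docs
        weights.append([(c / length) * idf[j] for j, c in enumerate(vec)])
    return vocab, weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)

print(vocab)   # → ['cat', 'dog', 'ran', 'sat', 'the']
print(bow[0])  # → [1, 0, 0, 1, 1]
# "the" appears in every document, so its IDF (and TF-IDF weight) is 0.0
print([w[vocab.index("the")] for w in weights])  # → [0.0, 0.0, 0.0]
```

The output illustrates the main design point of TF-IDF: a word that occurs in every document carries no discriminative information and is weighted down to zero, while rarer words such as "dog" receive the highest weights.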
You are free to use any of the materials in this repository for educational purposes. 😊 Feel free to contact me if you have any questions.