retro style tokenization for language models
Updated May 30, 2024 - Python
This repository contains the code and the PLODv2 dataset for training character-level language models (CLM) for abbreviation and long-form detection, released with our LREC-COLING 2024 publication.
A diacritization model for the Arabic language, trained on Tashkeela, the Arabic diacritization corpus available on Kaggle.
In this project, I worked with a small corpus consisting of simple sentences. I tokenized the words using n-grams from the NLTK library and performed word-level and character-level one-hot encoding. Additionally, I utilized the Keras Tokenizer to tokenize the sentences and implemented word embedding using the Embedding layer. For sentiment analysis
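As a rough illustration of the character-level n-grams and one-hot encoding mentioned in this description, here is a minimal sketch in plain Python. It does not use NLTK or Keras and is not the repository's code; the toy corpus and the helper names `char_ngrams`, `build_vocab`, and `one_hot` are assumptions made for the example.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(sentences):
    """Map each distinct character in the corpus to an integer index."""
    chars = sorted({ch for sentence in sentences for ch in sentence})
    return {ch: idx for idx, ch in enumerate(chars)}


def one_hot(text, vocab):
    """Encode `text` as one row per character, with a single 1 per row."""
    rows = []
    for ch in text:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1
        rows.append(row)
    return rows


# Hypothetical toy corpus, standing in for the "simple sentences" above.
corpus = ["the cat sat", "the dog ran"]
vocab = build_vocab(corpus)

bigrams = char_ngrams("cat", 2)   # character bigrams of one word
encoded = one_hot("cat", vocab)   # 3 rows, each of length len(vocab)
```

Word-level one-hot encoding follows the same pattern with a vocabulary built over `sentence.split()` instead of individual characters.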
A causal intervention framework to learn robust and interpretable character representations inside subword-based language models
Recurrent neural network for building a character-level language model and its application to generating new dinosaur names
Aims to generate new sentences by learning from text at the character level with an RNN. A collection of Shakespeare's works was used as training data.
Sequence Models coding assignments
Official code for Group-Transformer (Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model, COLING-2020).
An implementation of "Character-level Convolutional Networks for Text Classification" in TensorFlow. See https://arxiv.org/pdf/1509.01626.pdf.
Notebooks for the programming assignments of the Sequence Models course by deeplearning.ai on Coursera (May 2020)
Name generation using an RNN. This model was trained to generate Indian names. Built with Keras.
Lyrics generation using LSTM, word2vec analysis, and more
Text article generator using a character-level LSTM network.
Build a character-level language model to generate new dinosaur names
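To make the idea of a character-level language model concrete, the sketch below trains a simple count-based bigram model over characters and samples a new name from it. This is a deliberately simpler stand-in for the RNN and LSTM models the repositories above describe, not code from any of them; the name list, the `^` and `$` boundary markers, and the function names are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigram(names):
    """Count how often each character follows another across all names."""
    counts = defaultdict(Counter)
    for name in names:
        # "^" marks the start and "$" the end of a name (assumed not to
        # occur inside the names themselves).
        seq = "^" + name.lower() + "$"
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_name(counts, rng, max_len=20):
    """Sample one character at a time until "$" or `max_len` is reached."""
    out, prev = [], "^"
    while len(out) < max_len:
        chars, weights = zip(*counts[prev].items())
        nxt = rng.choices(chars, weights=weights, k=1)[0]
        if nxt == "$":
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Hypothetical training data: a handful of dinosaur names.
names = ["tyrannosaurus", "triceratops", "velociraptor", "stegosaurus"]
model = train_bigram(names)
generated = sample_name(model, random.Random(0))
```

An RNN replaces the count table with a learned hidden state, so each prediction can depend on more than the single previous character, but the sampling loop has the same shape.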