
Semester VI Minor Project

1.1. Objective of the Project

Modularizing Multi-Source Text Summarization with Transformer-based Deep Learning and NLP techniques in a Full-Stack Web Application.

A text summarizer with two input modes: one accepts a blob of raw text and the other accepts documents. The backend summarizes the input with each candidate model, evaluates the resulting summaries, and displays the best-scoring summary to the user.
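As a rough illustration of that selection loop, here is a minimal sketch using Hugging Face `transformers` pipelines and the `rouge-score` package. The checkpoint names are assumptions, and it presumes a reference summary is available for scoring (as during validation); at inference time a reference-free metric could stand in.

```python
# Minimal sketch of backend model selection. Assumes Hugging Face
# `transformers` and `rouge-score`; checkpoint names are illustrative.
from transformers import pipeline
from rouge_score import rouge_scorer

CANDIDATES = ["t5-base", "facebook/bart-large-cnn"]  # assumed checkpoints

def best_summary(text: str, reference: str) -> str:
    """Summarize `text` with each candidate and return the summary
    with the highest ROUGE-L F1 against `reference`."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    scored = []
    for checkpoint in CANDIDATES:
        summarizer = pipeline("summarization", model=checkpoint)
        summary = summarizer(text, max_length=128, min_length=32)[0]["summary_text"]
        rouge_l = scorer.score(reference, summary)["rougeL"].fmeasure
        scored.append((rouge_l, summary))
    return max(scored)[1]  # highest-scoring summary
```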

1.2. TODO

  • Working with T5
    • Import model
    • Fine-tune
    • Evaluate it on the test set
  • Create Streamlit application (see the sketch after this list)
    • Inputs
      • Keywords
        • Speech recognition
        • Text input
          • Take notes from the NER tagging video
      • Tab for text
      • Another for PDFs
      • Both together if possible
    • Drop-down to select models
      • Rank them by evaluation metric
    • Output
      • Click-to-copy function
  • Database connectivity
    • Create login system
    • Record history of summaries
  • Presentation on the implementation
    • Comparative analysis for models
      • Training loss
      • Validation loss
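
A minimal Streamlit sketch of the planned tab layout. The `summarize` function here is a stand-in for the real model backend; `st.code` is used because its built-in copy button covers the click-to-copy item.

```python
# Minimal Streamlit sketch of the planned UI. `summarize` is a placeholder
# for the real model backend.
import streamlit as st

def summarize(text: str, model_name: str) -> str:
    # Placeholder: the real backend would run the selected model here.
    return f"[{model_name}] " + text[:200]

st.title("Splice and Dice")
model_name = st.selectbox("Model", ["T5", "BART", "GPT-2", "Pegasus", "ProphetNet"])

text_tab, pdf_tab = st.tabs(["Text", "PDF"])
with text_tab:
    text = st.text_area("Paste text to summarize")
    if st.button("Summarize") and text:
        st.code(summarize(text, model_name))  # st.code has a built-in copy button
with pdf_tab:
    pdf = st.file_uploader("Upload a PDF", type="pdf")
    if pdf is not None:
        st.info("Extract text from the PDF, then summarize as above.")
```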

1.3. Models we're using

  1. T5: T5 (Text-to-Text Transfer Transformer) casts every NLP task, including summarization, as text-to-text generation and has achieved state-of-the-art results on several NLP benchmarks. It can be fine-tuned on the XSum dataset for summarizing research papers.

  2. BART: BART is a transformer-based denoising sequence-to-sequence model well suited to abstractive summarization. It has achieved state-of-the-art results on several benchmark datasets and can be fine-tuned on the CNN/Daily Mail dataset for summarizing research papers.

  3. GPT-2: GPT-2 is a large-scale, decoder-only transformer language model pre-trained on a diverse range of text. It can be fine-tuned on the CNN/Daily Mail (CNNDM) dataset for summarizing research papers.

  4. Pegasus: Pegasus is a sequence-to-sequence transformer designed specifically for abstractive summarization, pre-trained with a gap-sentence generation objective. It has achieved state-of-the-art results on several benchmark datasets and can be fine-tuned on the XSum dataset for summarizing research papers.

  5. ProphetNet: ProphetNet is a transformer-based sequence-to-sequence model whose pre-training objective predicts future n-grams rather than only the next token. It has achieved state-of-the-art results on several benchmark datasets and can be fine-tuned on the XSum dataset for summarizing research papers. (A loading sketch for these models follows the list.)
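
A minimal sketch of loading the models above with Hugging Face `transformers`. The checkpoint names are assumptions based on this list; GPT-2 is decoder-only, so it would load through `AutoModelForCausalLM` with prompt-style input instead.

```python
# Minimal loading/generation sketch. Checkpoint names are assumed; GPT-2 is
# decoder-only and is omitted here (it needs AutoModelForCausalLM).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINTS = {
    "T5": "t5-base",
    "BART": "facebook/bart-large-cnn",
    "Pegasus": "google/pegasus-xsum",
    "ProphetNet": "microsoft/prophetnet-large-uncased",
}

def summarize_with(model_key: str, text: str) -> str:
    name = CHECKPOINTS[model_key]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    if model_key == "T5":
        text = "summarize: " + text  # T5 expects a task prefix
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, num_beams=4, max_length=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```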

Questions to prepare for

Training the models

  1. The training corpora for each model and their significance
  2. Why we've implemented a dataset loader class for the train-test split
    1. What does the encode_plus() function do? (sketched after this list)
    2. What are its parameters?
    3. What is the attention mask and what does it signify?
  3. Testing and validation
    1. What is ROUGE? (also sketched below)
      1. ROUGE-N, which counts matching n-grams
      2. ROUGE-L, which scores the longest common subsequence (LCS)
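
A minimal sketch of `encode_plus()` and the attention mask, using the Hugging Face tokenizer API (`t5-base` is an assumed checkpoint):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoded = tokenizer.encode_plus(
    "summarize: transformers are sequence models.",
    max_length=16,             # pad or truncate to a fixed length
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    return_tensors="pt",
)
# `attention_mask` is 1 for real tokens and 0 for padding, telling the
# attention layers which positions to ignore.
print(encoded["input_ids"].shape, encoded["attention_mask"])
```

And a small ROUGE example with the `rouge-score` package, showing ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score("the cat sat on the mat", "the cat lay on the mat")
print({name: round(s.fmeasure, 3) for name, s in scores.items()})
```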

1.4. References
