Text Summarizer

This project implements a machine-learning text summarizer that takes news articles as input and generates their highlights.

Dataset Identification

The dataset for this project comes from Kaggle: Newspaper Text Summarization (CNN/DailyMail).

The dataset consists of articles and their corresponding highlights, which are used as the source text and target summaries, respectively. An ID is included for each entry but is not used in the summarization process.

Features

Article: The main text of the news article.
Highlights: The summarized text or key points of the article.
ID: A unique identifier for each entry (not used in the summarization process).
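
To make the column layout concrete, here is a minimal sketch that reads a table with these three fields and keeps only the two used for summarization. The lowercase column names `id`, `article`, and `highlights`, the inline sample rows, and the `load_pairs` helper are all assumptions for illustration; dropping incomplete rows is just one possible way to handle missing values, not necessarily what this project does.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the Kaggle CSV; the column
# names `id`, `article`, and `highlights` are assumed, not confirmed.
SAMPLE_CSV = """id,article,highlights
a1,"The city council approved the new budget on Monday.","Council approves budget."
a2,"Heavy rain flooded several streets overnight.",
a3,"The home team won the final 2-1.","Home team wins final."
"""


def load_pairs(csv_source):
    """Return a DataFrame holding only complete article/highlights pairs."""
    df = pd.read_csv(csv_source)
    # The ID is not used for summarization, so drop it if present.
    df = df.drop(columns=["id"], errors="ignore")
    # One simple way to handle missing values: discard incomplete pairs.
    df = df.dropna(subset=["article", "highlights"])
    return df.reset_index(drop=True)


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
print(pairs.columns.tolist())
print(len(pairs))
```

In practice `load_pairs` would be given the path to the downloaded CSV file instead of the in-memory sample.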

Preprocessing Steps

Loading the Dataset: The dataset is loaded into a Pandas DataFrame.
Cleaning the Data: Unnecessary columns, such as the ID, are removed. Any missing values are handled appropriately.
Tokenization: The text is tokenized into individual words or subwords, depending on the summarization model requirements.
Padding/Truncating: Shorter sequences are padded and longer ones are truncated so that all model inputs have a uniform length.
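
The tokenization and padding/truncating steps can be sketched in plain Python as follows. This is a simplified illustration, not the project's code: it assumes whitespace tokenization and a toy vocabulary, and the names `PAD_ID`, `UNK_ID`, `build_vocab`, and `encode` are hypothetical. A real pipeline would normally use the tokenizer that ships with the chosen model.

```python
PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id for words missing from the vocabulary


def build_vocab(texts):
    """Map each distinct lowercase word to an integer id (2 and up)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def encode(text, vocab, max_len):
    """Tokenize on whitespace, then truncate or right-pad to max_len ids."""
    ids = [vocab.get(word, UNK_ID) for word in text.lower().split()]
    ids = ids[:max_len]                     # truncate long sequences
    ids += [PAD_ID] * (max_len - len(ids))  # pad short sequences
    return ids


articles = ["the cat sat on the mat", "dogs bark"]
vocab = build_vocab(articles)
batch = [encode(a, vocab, max_len=4) for a in articles]
print(batch)  # [[2, 3, 4, 5], [7, 8, 0, 0]]
```

Every row in `batch` has the same length, which is what lets the sequences be stacked into a single tensor for the model.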

Model Training

The summarizer model is trained using the preprocessed dataset. Various machine learning models and techniques can be applied, such as:

Transformer-based models (e.g., BART, as suggested by the repository's Custom_BART_Model directory)
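
As a rough illustration of what one training step looks like for a Transformer sequence-to-sequence model, the sketch below runs a single optimizer update on a BART-style model from the Hugging Face `transformers` library. It assumes `torch` and `transformers` are installed. The model is deliberately tiny and randomly initialised so nothing is downloaded, and the token ids are random stand-ins for a tokenized batch; the sizes, learning rate, and the choice of BART itself are assumptions, not this project's actual configuration.

```python
import torch
from transformers import BartConfig, BartForConditionalGeneration

torch.manual_seed(0)

# Tiny, randomly initialised configuration so the sketch runs without
# downloading weights; every size below is an illustrative assumption.
config = BartConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=64,
)
model = BartForConditionalGeneration(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Random token ids standing in for a tokenized article/highlights batch.
input_ids = torch.randint(3, 100, (2, 12))   # 2 articles, 12 tokens each
attention_mask = torch.ones_like(input_ids)  # no padding in this toy batch
labels = torch.randint(3, 100, (2, 5))       # 2 summaries, 5 tokens each

model.train()
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs.loss  # cross-entropy over the summary tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"training loss: {loss.item():.3f}")
```

A full training run would loop this step over batches of real tokenized articles and highlights, usually starting from pretrained weights.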

Results

The trained model's performance on the test set is presented, showing the effectiveness of the summarization. Examples of generated summaries versus reference summaries are provided for qualitative analysis.
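
The text above does not name an evaluation metric. Summarization is commonly scored with ROUGE, so the sketch below shows a simplified unigram-overlap F1 in the spirit of ROUGE-1 for comparing a generated summary with its reference. It is an illustrative approximation, not the official ROUGE implementation; the `unigram_f1` helper and the example sentences are hypothetical.

```python
from collections import Counter


def unigram_f1(generated, reference):
    """F1 over overlapping lowercase words, in the spirit of ROUGE-1."""
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((gen_counts & ref_counts).values())  # clipped matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "council approves new city budget"
generated = "city council approves budget"
print(round(unigram_f1(generated, reference), 3))  # 0.889
```

Here all four generated words appear in the five-word reference, giving a precision of 1.0 and a recall of 0.8.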
