Project track A.Y. 2024-2025 DATA VISUALIZATION & TEXT MINING

The team has to build a text processing pipeline that performs text classification on the given corpus. All the assigned datasets refer to Entity Extraction use cases, which can be solved by applying a text classification approach at token level (Token-based Classification).
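
To make the token-level framing concrete, here is a minimal sketch of how entity annotations can be turned into one label per token. The BIO scheme, the example sentence, the entity types and the helper name `spans_to_bio` are all illustrative assumptions; the assigned datasets may use a different tagging scheme.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans into BIO labels, one label per token.

    Each span is (start_token, end_token_exclusive, entity_type).
    The BIO scheme is an assumption made for this sketch.
    """
    labels = ["O"] * len(tokens)  # every token starts as "outside any entity"
    for start, end, ent_type in spans:
        labels[start] = f"B-{ent_type}"  # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{ent_type}"  # remaining tokens of the same entity
    return labels


# Hypothetical example sentence, not taken from the project datasets.
tokens = ["Ada", "Lovelace", "worked", "in", "London"]
spans = [(0, 2, "PER"), (4, 5, "LOC")]
labels = spans_to_bio(tokens, spans)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

With this representation, extracting entities reduces to predicting a label for each token, which is the task both the neural network and the Transformer-based model are asked to solve.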

The project MUST show:

Data Exploratory Analysis (DEA)
- Data preparation and cleaning of the raw dataset provided.
- Exploratory Data Analysis using Data Visualization tools to show data variables, from statistical distributions (frequency, coverage) to linguistic information (part-of-speech tags, dependency parses, lemmas).
- LDA or NMF can be used, if needed, for studying the text distribution.
Neural Network approach
- Use one Neural Network type to classify the data (feed-forward, RNN, LSTM, BiLSTM, GRU)
- Show metrics for the implementation strategy
Transformer-based Approach
- Use a Transformer-based Language Model to classify the data (*BERT)
- Show metrics for the implementation strategy
A comparison of the models
Dashboard
- the project MUST implement an interactive dashboard that combines
  - the Data Exploratory Analysis, with dynamic charts about the dataset
  - the metrics for the different strategies applied
  - an input box to test the classifier and see how it works, with the ability to switch from one model to another.
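
As an illustration of the kind of metrics that can be shown and compared across the two approaches, the sketch below computes token-level precision, recall and F1 over non-`O` labels. The function name `token_prf`, the label names and the two prediction lists are hypothetical; the brief does not prescribe which metrics to use, and entity-level scores are a common alternative.

```python
from typing import Dict, List


def token_prf(gold: List[str], pred: List[str], outside: str = "O") -> Dict[str, float]:
    """Token-level precision, recall and F1, ignoring the 'outside' label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # A true positive is a non-outside token whose predicted label matches exactly.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions from two placeholder models.
gold = ["B-PER", "I-PER", "O", "O", "B-LOC"]
pred_model_a = ["B-PER", "O", "O", "O", "B-LOC"]
pred_model_b = ["B-PER", "I-PER", "O", "B-LOC", "B-LOC"]

scores_a = token_prf(gold, pred_model_a)
scores_b = token_prf(gold, pred_model_b)

for name, scores in [("model A", scores_a), ("model B", scores_b)]:
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Computing the same function on the outputs of both models gives directly comparable numbers, which can then feed the comparison and the metrics section of the dashboard.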

Project Artifacts

The project MUST be developed on Jupyter or Colab, in a customer-ready form, which means:

- well-documented
- with descriptions of all the steps
- with all the materials needed to reproduce them, such as data and models, and instructions to run the dashboard; a GitHub repository is more than welcome

Datasets

You can find the dataset in your team folder, available in this repository.

Schedule

As all the exams are on Thursdays, the project has to be delivered by 8 pm CET on the preceding Tuesday.


nluninja/text-mining-dataviz-project-2024
