Italian reviews dataset

A dataset of Italian reviews for training sentiment classification models.

This dataset was collected from the internet using web scraping. For further details, see the scraper code:

https://github.com/AlessandroGianfelici/trustpilot_spider.git

Data

For each data point, the dataset contains the company name (hashed for privacy reasons), the title of the review, the text of the review and the number of stars (from 1 to 5).

As far as I know, this is the largest freely available sentiment classification dataset for the Italian language.

Usage

The data are stored in a text file with comma-separated fields. For example, in Python you can load it with pandas:

import pandas as pd

data = pd.read_csv('raw_data.txt')
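
Once loaded, the star ratings can be turned into sentiment labels for training a classifier. The snippet below is only a minimal sketch: the column names (`company`, `title`, `text`, `stars`), the three in-memory sample rows, and the thresholds (1-2 negative, 3 neutral, 4-5 positive) are all assumptions made for illustration, so check the actual header of raw_data.txt and adjust them before use.

```python
import io

import pandas as pd

# Hypothetical sample standing in for raw_data.txt.
# The column names and rows are assumptions, not taken from the real file.
sample = io.StringIO(
    "company,title,text,stars\n"
    "a1b2c3,Ottimo,Servizio eccellente,5\n"
    "d4e5f6,Pessimo,Non lo consiglio,1\n"
    "a1b2c3,Nella media,Niente di speciale,3\n"
)
data = pd.read_csv(sample)


def stars_to_sentiment(stars: int) -> str:
    """Map a 1-5 star rating to a coarse label (thresholds are an assumption)."""
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Add a label column derived from the star rating.
data["sentiment"] = data["stars"].apply(stars_to_sentiment)
print(data[["stars", "sentiment"]])
```

To try this on the real data, replace `sample` with `'raw_data.txt'` and rename the columns to match the file. Dropping the neutral class is a common alternative when a binary classifier is enough.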
