A collection of datasets that we used to fine-tune our Turkish NLP scenario model.

HEZARTECH/datasets
Hezartech Datasets

Welcome to the Hezartech Datasets repository. This repository contains various datasets curated and maintained by Hezartech for machine learning, data science, and AI research purposes.

About

Hezartech is committed to advancing machine learning and AI by providing high-quality datasets. These datasets can be used for various applications, including sentiment analysis, named entity recognition, and more.

Datasets

  1. Amazon: Amazon product reviews, as text, with 1-, 4-, and 5-star ratings.
  2. X_Twitter: X posts sent to a company (tweets were filtered via search queries).
  3. SikayetVar: Article data from SikayetVar.
  4. GenerativeAI_Datasets: Datasets whose data was generated with generative AI.
  5. Mixin_Datasets: Mixed datasets for general-purpose training and fine-tuning of our model.

P.S.: 15,000 posts were collected from X (formerly Twitter). However, following the jury's recommendation, that dataset was not uploaded to GitHub.
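Since the Amazon dataset keeps only 1-, 4-, and 5-star reviews, one plausible way to turn it into sentiment labels is a simple star-to-label mapping. The sketch below is an assumption, not the repository's documented method: the `stars` key, the `label` key, the sample rows, and the negative/positive mapping are all hypothetical.

```python
# Assumed mapping (not confirmed by the repository):
# 1 star -> negative, 4 and 5 stars -> positive.
STAR_TO_LABEL = {1: "negative", 4: "positive", 5: "positive"}


def label_reviews(rows, star_key="stars"):
    """Return new row dicts with a 'label' key derived from the star rating.

    `star_key` is a hypothetical column name; adjust it to the real file.
    Ratings outside the mapping get a label of None.
    """
    labeled = []
    for row in rows:
        stars = int(row[star_key])  # CSV readers yield strings, so cast first
        labeled.append({**row, "label": STAR_TO_LABEL.get(stars)})
    return labeled


# Hypothetical sample rows, for illustration only.
sample = [
    {"text": "Ürün bozuk geldi.", "stars": "1"},
    {"text": "Gayet iyi, memnunum.", "stars": "4"},
    {"text": "Harika bir ürün!", "stars": "5"},
]
labeled = label_reviews(sample)
```

The input rows are left unmodified, so the same function can be applied to rows read from any of the files without side effects.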

Usage

To use these datasets, you can clone the repository and load the datasets using your preferred programming language or tool.

Cloning the Repository

$ git clone https://github.com/hezartech/datasets.git

Loading a Dataset (Python Example)

import pandas as pd

# Load the sentiment analysis dataset
df = pd.read_csv('datasets/sentiment_analysis.csv')

Dataset Formats

CSV: Commonly used for tabular data.

JSON: Used for structured data with nested fields.
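Because the datasets come in two formats, a small loader that picks the parser from the file extension can be convenient. The following is a minimal sketch using only the standard library; the function name `load_records` is hypothetical, and it assumes that JSON files contain either a list of objects or a single object.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a dataset file into a list of dicts, choosing the parser by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # DictReader uses the first row as column names; all values are strings.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: a list of objects, or a single object that we wrap in a list.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported dataset format: {suffix}")
```

With pandas, `pd.read_csv` and `pd.read_json` play the same role and return a DataFrame instead of a list of dicts.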

Contributing

We welcome contributions to enhance and expand the dataset collection. To contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes.
  4. Submit a pull request.

Please make sure to adhere to the contribution guidelines.

License

This repository is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions, suggestions, or collaborations, please contact us at:

Email: hezartech@gmail.com
