Description

Digikala, an online marketplace, recently published open datasets in several categories.

I had long wanted to do an NLP project, so I put together some Python tutorials for newcomers. I hope you find them useful.

I am still updating the package and will share links to the related video and article soon!

If you like the content

If you like the content, please star the repository. 😏

Before you run models

First, run 0 - Data Wrangling.ipynb to preprocess the data before opening the other notebooks and building your models.
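The run order can be read off the notebook file names, since each starts with a number. The following is a minimal sketch, assuming the notebooks sit in the current directory and that `jupyter nbconvert` is installed. The helper names `notebook_order` and `preprocessing_command` are hypothetical and not part of this repository.

```python
import re
import shlex
from pathlib import Path


def notebook_order(names):
    """Sort notebook file names by their leading number, then by name."""
    def key(name):
        match = re.match(r"\s*(\d+)", name)
        # Names without a numeric prefix are pushed to the end.
        number = int(match.group(1)) if match else float("inf")
        return (number, name.lower())
    return sorted(names, key=key)


def preprocessing_command(notebook):
    """Build (but do not run) a command that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


if __name__ == "__main__":
    found = [p.name for p in Path(".").glob("*.ipynb")]
    ordered = notebook_order(found)
    for name in ordered:
        print(name)
    if ordered:
        # The first entry should be the data wrangling notebook.
        print(shlex.join(preprocessing_command(ordered[0])))
```

The script only prints the command, so nothing is executed until you copy it into a terminal yourself.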

Requirements

Use this conda command to install the packages into your environment:

conda install -c conda-forge --file requirements.txt
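After installing, it can help to confirm that the packages are visible to Python. The sketch below uses only the standard library (Python 3.8+). The function names and the sample requirement lines are hypothetical; the real contents of requirements.txt are not reproduced here. Note also that a conda package name does not always match the distribution name Python reports, so treat a "missing" result as a hint rather than proof.

```python
import re
from importlib import metadata


def parse_requirements(text):
    """Return package names from requirements-style text, skipping comments and options."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Keep only the name: cut at the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def missing_packages(names):
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical sample; replace with open("requirements.txt").read().
    sample = "numpy>=1.17\npandas\n# a comment\n"
    print(missing_packages(parse_requirements(sample)))
```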

Dataset

I used the mini version of the Digikala customer comments dataset from here:

🔗 www.quera.ir

It was uploaded for an AI competition on 1398/08/16 (Solar Hijri calendar; 7 November 2019) and can be found here:

🔗 dataset download.

(Of course, downloading it requires authentication 😎.)

The full version is available at these links:

🔗 Source 1

🔗 Source 2

For more studies:

Text preprocessing:

🔗 https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing

🔗 https://www.kaggle.com/kernels/scriptcontent/19201884/download

TF-IDF:

🔗 https://towardsdatascience.com/multi-label-text-classification-with-scikit-learn-30714b7819c5

🔗 https://kavita-ganesan.com/tfidftransformer-tfidfvectorizer-usage-differences/#.Xc3OG67ngRY
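For orientation before following the links, TF-IDF weighting can be sketched with the standard library alone. This is one common textbook variant (relative term frequency times log(N/df)). It is not the smoothed, normalized variant that scikit-learn uses by default, and it is not necessarily what the notebooks in this repository do. The function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Toy, already-tokenized comments (hypothetical data).
    docs = [["good", "phone"], ["bad", "phone"]]
    for row in tf_idf(docs):
        print(row)
```

A term that occurs in every document, such as "phone" in the toy data, gets weight zero under this variant, which is the intuition behind down-weighting uninformative words.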

basic word2vec:

🔗 https://medium.com/explore-artificial-intelligence/word2vec-a-baby-step-in-deep-learning-but-a-giant-leap-towards-natural-language-processing-40fe4e8602ba

gensim:

🔗 https://towardsdatascience.com/machine-learning-word-embedding-sentiment-classification-using-keras-b83c28087456

Keras with gensim:

🔗 https://www.depends-on-the-definition.com/guide-to-word-vectors-with-gensim-and-keras/

LSTM:

🔗 https://medium.com/free-code-camp/applied-introduction-to-lstms-for-text-generation-380158b29fb3

data/original
fonts
pic
0 - Data Wrangling.ipynb
1 - Dense Classifier.ipynb
1 - gensim word to vector.ipynb
2 - Simple Sklearn methods.ipynb
3 - Convolutional Classifier.ipynb
4 - LSTM Classifier.ipynb
5 - Bidirectional LSTM.ipynb
6 - Stacked LSTM.ipynb
7 - Stacked Conv - LSTM.ipynb
8 - Parallel - Convolutional - Network.ipynb
9 - Parallel - LSTM-CONV.ipynb
README.md
_config.yml
digikala_tsne_word_model.csv
digikala_words.w2v
requirements.txt


masouduut94/Digikala_comments_verification
