TKINTER_PYTHON --> PREPROCESSING / TEXT CLEANSING:

Text cleansing cleans the extracted text, and the preprocessing module ensures that the data is ready for analysis. Several preprocessing techniques from the literature can be applied at this step; once they have been applied, the most interesting terms can be found in the data. Promine uses the following preprocessing methods for text cleaning.

Tokenization:

Tokenization is applied first. It converts a stream of characters into a stream of words, which are the processing units for the later steps.
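The idea can be sketched with the standard library alone. This is a minimal illustration, not the tokenizer the application itself uses; the function name `tokenize` and the sample sentence are assumptions made for the example.

```python
import re


def tokenize(text):
    """Split raw text into lowercase word tokens."""
    # \w+ matches runs of letters, digits and underscores, so punctuation
    # and whitespace act as separators and are dropped.
    return re.findall(r"\w+", text.lower())


tokens = tokenize("The clerk checks the invoice, then approves it.")
print(tokens)
# ['the', 'clerk', 'checks', 'the', 'invoice', 'then', 'approves', 'it']
```

A library tokenizer handles contractions, hyphens and abbreviations more carefully than this regular expression does.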

Stop Words Filtering:

A stop word filter is applied to reduce the dimensionality of the tokenized data. It removes words that are very frequent but carry little meaning.
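A small sketch of such a filter follows. The stop word set here is hypothetical and deliberately tiny; the list the application actually uses is not stated in this README, and real lists usually hold a few hundred entries.

```python
# Hypothetical, deliberately tiny stop word list used only for illustration.
STOP_WORDS = {"the", "a", "an", "is", "it", "then", "and", "of", "to", "in"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = ["the", "clerk", "checks", "the", "invoice", "then", "approves", "it"]
filtered = remove_stop_words(tokens)
print(filtered)
# ['clerk', 'checks', 'invoice', 'approves']
```

Using a set keeps each membership check constant-time, which matters once the token list grows.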

Lemmatization:

Lemmatization is the process of grouping together the different inflected forms of a word so they can be analysed as a single item. It is similar to stemming, but it takes the context of a word into account, so it links words with similar meaning to one base form. Text preprocessing can include both stemming and lemmatization, and the two terms are often confused or treated as the same. Lemmatization is generally preferred over stemming because it performs a morphological analysis of the words.
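The difference between the two can be illustrated with a toy example. Both functions below are simplified stand-ins written for this sketch: `naive_stem` is a crude suffix stripper, and `toy_lemmatize` looks words up in a small hypothetical table where a real lemmatizer would consult a morphological dictionary.

```python
def naive_stem(word):
    """Crude stemmer: strip the first matching suffix, ignoring meaning."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        # Require at least three remaining characters to avoid over-stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma table standing in for a real morphological dictionary.
LEMMAS = {
    "studies": "study",
    "running": "run",
    "ran": "run",
    "mice": "mouse",
    "better": "good",
}


def toy_lemmatize(word):
    """Return the dictionary base form, or the word itself if unknown."""
    return LEMMAS.get(word, word)


for w in ["studies", "running", "ran", "mice"]:
    print(w, "->", naive_stem(w), "|", toy_lemmatize(w))
# studies -> stud | study
# running -> runn | run
# ran -> ran | run
# mice -> mice | mouse
```

The stemmer produces fragments such as "stud" and misses irregular forms entirely, while the dictionary lookup returns real words and maps "ran" and "running" to the same item.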

TF-IDF:

In information retrieval, tf–idf (TF-IDF), short for term frequency–inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. It weighs how often a word occurs in a document against how many documents in the corpus contain it. We calculate it using built-in functions in Python.
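To make the statistic concrete, here is a minimal standard-library sketch. It assumes the plain textbook weighting, tf = count / document length and idf = log(N / df); the exact variant and library used by the application are not stated here, and library implementations often add smoothing or normalisation, so their numbers will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            # Term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    ["invoice", "approved", "invoice"],
    ["invoice", "rejected"],
    ["order", "shipped"],
]
scores = tf_idf(docs)
for term, score in sorted(scores[0].items()):
    print(f"{term}: {score:.3f}")
# approved: 0.366
# invoice: 0.270
```

In the first document "invoice" occurs twice but also appears in another document, so it scores lower than "approved", which occurs once and nowhere else. A term present in every document gets a score of zero under this weighting.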

WordNet:

At the end of the preprocessing step, a list of keywords is created. These words come from a file that is generated by a process model. A single file cannot provide enough information to generate the knowledge elements needed for a domain-specific ontology. Therefore, for every keyword we get a set of synonyms from WordNet and generate a list of related words for that keyword.
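The expansion step can be sketched as follows. To keep the example runnable without any downloads, a small hypothetical thesaurus stands in for WordNet; `expand_keywords`, `TOY_THESAURUS` and `toy_lookup` are names invented for this illustration and are not taken from the application.

```python
def expand_keywords(keywords, synonym_lookup):
    """Map each keyword to a sorted list holding itself and its synonyms."""
    expanded = {}
    for keyword in keywords:
        synonyms = set(synonym_lookup(keyword))
        synonyms.add(keyword)  # always keep the original keyword
        expanded[keyword] = sorted(synonyms)
    return expanded


# Hypothetical stand-in for WordNet, used only so the sketch is self-contained.
TOY_THESAURUS = {
    "invoice": ["bill", "account"],
    "order": ["purchase order"],
}


def toy_lookup(word):
    """Return the synonyms of a word, or an empty list if it is unknown."""
    return TOY_THESAURUS.get(word, [])


expanded = expand_keywords(["invoice", "clerk"], toy_lookup)
print(expanded)
# {'invoice': ['account', 'bill', 'invoice'], 'clerk': ['clerk']}
```

Passing the lookup as a function keeps the expansion logic independent of the synonym source. With NLTK and its WordNet corpus installed, a lookup could instead collect `lemma_names()` from each synset returned by `nltk.corpus.wordnet.synsets(word)`.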

ABOUT SCREEN

TEXT ANALYSIS SCREEN

PROCESSING CORPUS SCREEN

WORKING ON TEXT ANALYSIS PHASE

Open Directory Clicked

While loading file:

After file loaded:

Applied Tokenization

Stopword Removal applied

Applied POS TAGGER:

After applying Lemmatization

WordNet Applied

Applied TF-IDF

Applied Clear Text:

Applied RESET:

Validation when there is no item in the Analysis text area:

Working on processing Corpus

Selecting Corpus File:

Loading File for corpus:

File Loaded in Text Area for Corpus Generation:

File Saved In Corpus:

Corpus file validation if the file already exists:

Corpus Files in Explorer:
