GOKTU_NLP - TURKISH NLP SYNTACTIC PARSER WITH CKY ALGORITHM

Note: This toolbox was prepared for the CMPE561 Natural Language Processing course given by Prof. Dr. Tunga Gungor at Boğaziçi University.

Syntactic parsing is the process of analyzing a sentence or a piece of text and determining its grammatical structure. This includes identifying the constituent phrases and the dependencies between words, as well as determining the role each word plays in the sentence (such as subject, verb, or object). In this project, we have developed a Turkish CKY parser that takes as input a set of grammar rules in Chomsky Normal Form (CNF) and a lexicon. The parsing process and the generation of the CNF rules and the lexicon are described.
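To make the idea concrete, here is a minimal sketch of CKY chart filling over a CNF grammar and a lexicon. It is not the code in tr_syntactic_parser: the rule table, the lexicon, and the function names `cky_chart` and `is_grammatical` are hypothetical, and the three-word toy grammar is only inferred from the example sentence shown under Usage.

```python
# Toy CNF grammar (an assumption for illustration, not the repository's grammar.txt).
# Each key is a pair of right-hand-side symbols; the value is the set of
# left-hand-side symbols that can produce that pair.
BINARY_RULES = {
    ("PRO1", "VPPAST1"): {"S"},
    ("DAT", "VPPAST1"): {"VPPAST1"},
}

# Toy lexicon (also an assumption): word -> set of POS tags.
LEXICON = {
    "ben": {"PRO1"},
    "okula": {"DAT"},
    "gittim": {"VPPAST1"},
}


def cky_chart(tokens, binary_rules, lexicon):
    """Return an n x n chart where chart[i][j] holds the symbols spanning tokens[i..j]."""
    n = len(tokens)
    chart = [[set() for _ in range(n)] for _ in range(n)]

    # Diagonal: POS tags of the individual words.
    for i, token in enumerate(tokens):
        chart[i][i] = set(lexicon.get(token, ()))

    # Longer spans: combine two adjacent sub-spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: [i..k] and [k+1..j]
                for left in chart[i][k]:
                    for right in chart[k + 1][j]:
                        chart[i][j] |= binary_rules.get((left, right), set())
    return chart


def is_grammatical(tokens, binary_rules, lexicon, start="S"):
    """A sentence is accepted if the start symbol spans the whole input."""
    if not tokens:
        return False
    chart = cky_chart(tokens, binary_rules, lexicon)
    return start in chart[0][len(tokens) - 1]


if __name__ == "__main__":
    words = ["ben", "okula", "gittim"]
    for row in cky_chart(words, BINARY_RULES, LEXICON):
        print([sorted(cell) for cell in row])
    print(is_grammatical(words, BINARY_RULES, LEXICON))
```

The chart is upper triangular: row i, column j holds every symbol that can derive the words from position i to position j, so the top-right cell decides whether the whole sentence is accepted.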

Installation

1- Download the code.

$ git clone https://github.com/GoktugOcal/turkish-syntactic-parser.git

2- Install required packages.

$ pip install -r requirements.txt

Usage

Use from CLI

$ python tr_parse.py -s [<string>]

$ python tr_parse.py -s "Ben okula gittim."

Tokens : ['ben', 'okula', 'gittim']
POS Tags : [['PRO1'], ['DAT'], ['VPPAST1']]
Sentence is grammatically correct.

######### CKY CHART #########
--------  -------  -----------
ben       okula    gittim
['PRO1']  []       ['S']
[]        ['DAT']  ['VPPAST1']
[]        []       ['VPPAST1']
--------  -------  -----------
##### BEST SENTENCE STRUCTURE #####
(S(PRO1 ben ) (VPPAST1(DAT okula ) (VPPAST1 gittim ) ))
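The bracketed structure printed above can be read back into a nested Python structure for further processing. The following is a small sketch that assumes the output always has the form `(LABEL child child ...)`, where each child is either a word or another bracketed node; `read_bracketed` and `leaves` are hypothetical helper names, not part of tr_syntactic_parser.

```python
import re


def read_bracketed(text):
    """Parse a string like '(S(PRO1 ben ) ...)' into (label, children) tuples."""
    # Split into '(' , ')' and runs of any other non-space characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def leaves(node):
    """Return (tag, word) pairs in sentence order."""
    label, children = node
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((label, child))
    return pairs


if __name__ == "__main__":
    structure = "(S(PRO1 ben ) (VPPAST1(DAT okula ) (VPPAST1 gittim ) ))"
    tree = read_bracketed(structure)
    print(tree)
    print(leaves(tree))
```

Unbalanced input is not handled gracefully in this sketch (it raises `IndexError`), which is acceptable for strings produced by the parser itself but not for arbitrary text.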

Use in Python

from tr_syntactic_parser.tools.helper import *
from tr_syntactic_parser.tr_parser import TurkishCKYParser

sentence = "..." # put your sentence
sentence = preprocess(sentence) # preprocess the sentence

filename = "tr_syntactic_parser/grammar/grammar.txt" # specify the location of the CNF grammar
parser = TurkishCKYParser(filename) # initialize the parser

parser.parse(sentence) # parse
parser.show_cky_chart() # show filled CKY chart
print("##### BEST SENTENCE STRUCTURE #####")
parser.show_sentence_structure() # show best possible sentence structure

Visualization

A parse visualizer class has been implemented using Plotly and Spacy. The parse visualizer has three components.

First, run the parser.

from tr_syntactic_parser.tools.helper import *
from tr_syntactic_parser.tr_parser import TurkishCKYParser
sentence = "..." # put your sentence
sentence = preprocess(sentence) # preprocess the sentence
filename = "tr_syntactic_parser/grammar/grammar.txt" # location of the CNF grammar
parser = TurkishCKYParser(filename) # initialize the parser
parser.parse(sentence) # parse before requesting the tree

terminals = parser.get_terminal_nodes(parser.get_tree()) # Get terminal nodes

All of the visualizations can be displayed easily in a Jupyter Notebook.

POS tag visualizer (powered by Spacy)

Show on Notebook

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 
visualizer.pos_vis(sentence, terminals)

Sentence structure visualizer (powered by Spacy)

Show on Notebook

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 
visualizer.pos_tree_vis(sentence, parser.tokens, parser.get_tree()) # we need tokens of sentence and root of the tree in that case

Save Spacy output as PNG

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize

svg = visualizer.pos_vis(sentence, terminals, jupyter=False) # set jupyter=False
# or
svg = visualizer.pos_tree_vis(sentence, parser.tokens, parser.get_tree(), jupyter=False)

from tr_syntactic_parser.tools.helper import spacy_svg2png_save # import function from helpers
spacy_svg2png_save(svg, sentence, output_path = "./") # convert svg to png

Structure tree visualizer (powered by Plotly)

Show on Notebook

from tr_syntactic_parser.tools.visualizer import parse_visualizer

visualizer = parse_visualizer() # Initialize 
visualizer.tree_vis(sentence, parser.tokens, parser.get_tree()) # we need tokens of sentence and root of the tree in that case

Save as an image file

output_file = "..."
visualizer.tree_vis(sentence, parser.tokens, parser.get_tree()).write_image(output_file)

Acknowledgement

Some parts of this tool were created using Zeyrek Morphology Analyzer and NLTK.
