NeSy-Code-Generation-Workflow

A neuro-symbolic workflow for generating controlled synthetic data for a code comment dataset.

This is the official code repository for the paper "NeSy is alive and well: A LLM-driven symbolic approach for better code comment data generation and classification".

data

This directory contains three data files:

Seed data: The data provided by the IRSE 2023 shared task organizers to train the ML models.
ChatGPT-generated data: The data generated directly by an LLM assistant (ChatGPT in this case), used to evaluate the overall increase in model performance after data augmentation.
Symbolic-generated data: The data generated by a script that ChatGPT wrote after learning symbolic rules, used to evaluate the overall increase in model performance after data augmentation.

experiments

This directory contains the code for training and evaluating ML models on all datasets. It also includes data augmentation techniques that use the synthetic data.
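
As a rough illustration of what augmenting with synthetic data can look like, here is a minimal sketch. It is not the code in this directory. The row fields (`comment`, `code`, `label`) and the function names `split_seed` and `augment` are assumptions made for the example. The idea shown is to hold out a test set drawn from the seed data only, then add synthetic rows to the training split while skipping duplicates and anything that would leak held-out rows.

```python
import random


def split_seed(rows, test_fraction=0.2, seed=0):
    """Shuffle the seed rows and hold out a test set made of seed data only."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def augment(train_rows, synthetic_rows, test_rows=()):
    """Append synthetic rows to the training rows.

    Rows already present in the training split, or present in the
    held-out test split, are skipped so the evaluation stays clean.
    """
    seen = {(r["comment"], r["code"]) for r in train_rows}
    held_out = {(r["comment"], r["code"]) for r in test_rows}
    combined = list(train_rows)
    for row in synthetic_rows:
        key = (row["comment"], row["code"])
        if key in seen or key in held_out:
            continue
        seen.add(key)
        combined.append(row)
    return combined
```

Training one model on the seed split alone and another on the augmented split, then scoring both on the same held-out seed rows, gives a like-for-like measure of what the synthetic data contributes.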

src

This directory contains the source material such as the symbolic rules framework and the symbolic script for synthetic data generation.
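
To give a feel for rule-based synthetic generation, the sketch below fills slots in comment templates that are grouped by label. It is only an illustration under stated assumptions: the rules, slot values, label names, and the functions `generate_comment` and `generate_dataset` are invented here and are not the symbolic rules framework or the script in this directory.

```python
import random

# Hypothetical rules: each label maps to comment templates with named slots.
RULES = {
    "Useful": [
        "Returns the {noun} computed from {arg}.",
        "Frees {arg} to avoid a memory leak.",
    ],
    "Not Useful": [
        "{noun}",
        "TODO",
    ],
}

# Hypothetical slot vocabulary shared by all templates.
SLOTS = {
    "noun": ["length", "checksum", "index"],
    "arg": ["buf", "ptr", "n"],
}


def generate_comment(label, rng):
    """Pick a template for the label and fill each slot with a random value."""
    template = rng.choice(RULES[label])
    values = {name: rng.choice(options) for name, options in SLOTS.items()}
    # str.format ignores keyword arguments the template does not use.
    return template.format(**values)


def generate_dataset(n_per_label, seed=0):
    """Generate n_per_label labeled comments for every label in RULES."""
    rng = random.Random(seed)
    return [
        {"comment": generate_comment(label, rng), "label": label}
        for label in RULES
        for _ in range(n_per_label)
    ]
```

Because every row comes from an explicit rule, the label balance and the phrasing patterns are controlled directly, and a fixed seed makes the output reproducible.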

HannaAbiAkl/NeSy-Code-Generation-Workflow

Folders and files

Latest commit

History

Repository files navigation

NeSy-Code-Generation-Workflow

data

experiments

src

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages