PythonGenerator

This system generates fake Python code that is human-readable and, at first glance, appears to be relatively legitimate. In particular, we feel that the generated code closely mirrors code used in industry, since there are no comments.

We read in a corpus of Python training files, parse them, and extract CFG rules from the parsed files. We then induce a PCFG from the relative frequencies of those CFG rules, and attempt to stochastically generate our own Python "code" from the PCFG rules with the help of the Unparser class.
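The pipeline described above can be illustrated with a minimal sketch. It is not the code in this repo: it assumes Python 3 and the standard-library ast module, treats each AST node type as a nonterminal whose production is the sequence of its child node types, and uses hypothetical function names (extract_rules, induce_pcfg, sample_derivation). It stops at sampling a derivation of node-type names and leaves out the unparsing step.

```python
import ast
import random
from collections import Counter, defaultdict


def extract_rules(source):
    """Yield (parent_type, child_types) productions from one source string."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        yield type(node).__name__, children


def induce_pcfg(sources):
    """Count productions over a corpus and normalize counts per left-hand side."""
    counts = defaultdict(Counter)
    for src in sources:
        for lhs, rhs in extract_rules(src):
            counts[lhs][rhs] += 1
    pcfg = {}
    for lhs, rhs_counts in counts.items():
        total = sum(rhs_counts.values())
        pcfg[lhs] = {rhs: n / total for rhs, n in rhs_counts.items()}
    return pcfg


def sample_derivation(pcfg, symbol="Module", rng=random, max_depth=10):
    """Sample a derivation as nested (symbol, [children]) tuples.

    Expansion stops at unknown symbols or when max_depth is exhausted,
    which guarantees termination even for recursive rules.
    """
    if max_depth == 0 or symbol not in pcfg:
        return (symbol, [])
    expansions = list(pcfg[symbol].keys())
    weights = list(pcfg[symbol].values())
    rhs = rng.choices(expansions, weights=weights)[0]
    children = [sample_derivation(pcfg, s, rng, max_depth - 1) for s in rhs]
    return (symbol, children)


# Tiny illustrative corpus (not the repo's training data).
corpus = ["x = 1\n", "y = x + 2\n"]
pcfg = induce_pcfg(corpus)
derivation = sample_derivation(pcfg, rng=random.Random(0))
```

A real generator would also need to fill in terminals such as identifiers and literals, and then turn the sampled tree back into source text, which is the role the Unparser plays here.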

The training data is included in the repo since it's very small, about 15 MB.

To run the code, just run

./main.py [NUM_FILES]

where NUM_FILES is the number of fake Python files you want to generate; it is an optional argument, with the default being 5.

Fetching the Python files (corpus)

If you don't have the Python training files in data/, you can easily fetch them by first running

python data/fetch_python_ids.py

and then

python data/fetch_python_files.py

We use the searchcode API to fetch data.

Generating fake Python files

The entry point for generating the Python files is main.py. The training files in data/ must exist before running it.

As stated above, it takes an optional argument, the number of files to generate.

To run it, you can either call

python main.py [NUM_FILES]

or

./main.py [NUM_FILES]

when inside the repo.
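For readers curious how an optional positional argument with a default of 5 can be handled, here is a minimal sketch using argparse. It only illustrates the command-line behaviour described above; the function name parse_args and the attribute name num_files are assumptions, and main.py itself may parse its argument differently.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional NUM_FILES positional argument (defaults to 5)."""
    parser = argparse.ArgumentParser(description="Generate fake Python files.")
    parser.add_argument(
        "num_files",
        nargs="?",          # makes the positional argument optional
        type=int,
        default=5,
        metavar="NUM_FILES",
        help="number of fake Python files to generate (default: 5)",
    )
    return parser.parse_args(argv)
```

Under this sketch, calling parse_args([]) gives num_files equal to 5, and parse_args(["12"]) gives 12.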

jakebarnwell/PythonGenerator
