Code Semantic Enrichment for Deep Code Search
Tested on Ubuntu 16.04 with:
- Python 3.6
- Keras 2.1.3
- tensorflow-gpu 1.7.0
- Lucene 7.7.1
The datasets used in our paper can be found at: https://drive.google.com/drive/folders/1j-0xukLQWGrJ8-Lxw7vFAbubFTyXJT2C?usp=sharing
If you want to reprocess the data, you can convert it into a form the model can use with the following steps:
1. Build a corpus for each feature (i.e., description, tokens):
python createCorpus.py
python createVocab.py
python vocab2pkl.py
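The vocabulary step can be pictured roughly as follows. This is a minimal sketch, not the code in createVocab.py or vocab2pkl.py: it assumes whitespace-tokenized input, a frequency cutoff, and hypothetical "<pad>"/"<unk>" special tokens, and simply pickles a token-to-id dictionary.

```python
# Hypothetical sketch of building and pickling a vocabulary.
# The special tokens, max_size and file path are assumptions, not the repo's settings.
import pickle
from collections import Counter


def build_vocab(lines, max_size=10000, specials=("<pad>", "<unk>")):
    """Count whitespace-separated tokens and map the most frequent ones to integer ids."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    # Reserve the first ids for the special tokens, fill the rest by frequency.
    for tok, _ in counts.most_common(max_size - len(specials)):
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def save_vocab(vocab, path):
    """Serialize the token-to-id mapping with pickle."""
    with open(path, "wb") as f:
        pickle.dump(vocab, f)


if __name__ == "__main__":
    corpus = ["read file line", "open file", "sort list"]
    print(build_vocab(corpus, max_size=5))
```

Keeping a separate vocabulary per feature (description, tokens) lets each one use its own size limit.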
2. Process the training and test data according to the corpus:
python txt2pkl.py
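Converting text into model input usually means mapping each token to its vocabulary id and padding or truncating to a fixed length. The sketch below illustrates that idea only. The padding id 0, unknown id 1, max_len value and the txt_to_pkl helper are assumptions and may not match what txt2pkl.py actually does.

```python
# Hypothetical sketch: turn text lines into fixed-length id sequences and pickle them.
# pad_id, unk_id and max_len are assumptions, not values taken from the repo.
import pickle


def encode(line, vocab, max_len=20, pad_id=0, unk_id=1):
    """Map tokens to ids (unknown tokens -> unk_id), truncate, then right-pad to max_len."""
    ids = [vocab.get(tok, unk_id) for tok in line.split()][:max_len]
    return ids + [pad_id] * (max_len - len(ids))


def txt_to_pkl(txt_path, vocab, pkl_path, max_len=20):
    """Encode every line of a text file and store the list of sequences as a pickle."""
    with open(txt_path, encoding="utf-8") as f:
        encoded = [encode(line, vocab, max_len) for line in f]
    with open(pkl_path, "wb") as f:
        pickle.dump(encoded, f)
    return encoded


if __name__ == "__main__":
    vocab = {"<pad>": 0, "<unk>": 1, "open": 2, "file": 3}
    print(encode("open new file", vocab, max_len=5))
```

Fixed-length sequences make it straightforward to batch the data as Keras input arrays.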
Build the retrieval base: python Index.py
Perform the search: python Search.py
Remove stop words: python deleteStopWords.py
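Lucene 7.7.1 is listed as a dependency, so Index.py and Search.py likely rely on a Lucene index. The pure-Python sketch below does not use Lucene. It only illustrates the general idea: drop stop words, build an inverted index over text, and rank documents for a query with a simple tf-idf score. The stop-word list, the scoring formula and the InvertedIndex class are all illustrative assumptions.

```python
# Illustrative inverted index with stop-word removal and tf-idf ranking.
# Not the repo's Lucene-based implementation; stop words and scoring are assumptions.
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "for", "is", "how"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = {}

    def add(self, doc_id, text):
        """Index a document by recording each term's frequency."""
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=3):
        """Return up to k doc ids ranked by the summed tf * idf of matching query terms."""
        n = len(self.docs)
        scores = Counter()
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + n / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common(k)]


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(0, "Read the contents of a file")
    index.add(1, "Sort a list of integers")
    index.add(2, "Write a string to a file")
    print(index.search("how to read a file"))
```

Removing stop words before indexing and querying keeps very common words such as "the" or "to" from dominating the match.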
Put the dataset into the data/github directory under keras.
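If you prefer to script this step, a small helper can create the directory and copy the downloaded files into it. This is a sketch that assumes the target is keras/data/github relative to the repository root. The place_dataset function and the demo file name train.txt are hypothetical.

```python
# Hypothetical helper: copy downloaded dataset files into <repo_root>/keras/data/github.
# The directory layout is inferred from this README; file names are placeholders.
import shutil
import tempfile
from pathlib import Path


def place_dataset(src_dir, repo_root="."):
    """Copy every regular file in src_dir into keras/data/github and return their names."""
    target = Path(repo_root) / "keras" / "data" / "github"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in Path(src_dir).iterdir():
        if path.is_file():
            shutil.copy2(str(path), str(target / path.name))
            copied.append(path.name)
    return sorted(copied)


if __name__ == "__main__":
    # Demo with throwaway directories; replace them with your download folder and repo root.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as repo:
        (Path(src) / "train.txt").write_text("example\n")
        print(place_dataset(src, repo))
```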
Edit the hyperparameters and settings in config.py, then train and evaluate the model:
python main.py --mode train
python main.py --mode eval
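This README does not say which metrics --mode eval reports. Code-search work commonly uses SuccessRate@k and Mean Reciprocal Rank (MRR), so the sketch below shows how those two metrics can be computed from the 1-based rank of the correct snippet for each query. Treat the choice of metrics and the helper names as assumptions, not a description of main.py.

```python
# Sketch of common code-search metrics (assumed; not necessarily what main.py reports).


def rank_of_correct(similarities, correct_idx):
    """1-based rank of the correct candidate when candidates are sorted by descending similarity."""
    target = similarities[correct_idx]
    return 1 + sum(1 for s in similarities if s > target)


def success_at_k(ranks, k):
    """Fraction of queries whose correct snippet appears within the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr(ranks):
    """Mean reciprocal rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    ranks = [1, 3, 10]
    print(success_at_k(ranks, 1), success_at_k(ranks, 5), mrr(ranks))
```

Counting only strictly higher scores when ranking gives ties the optimistic rank. A stricter evaluation could count ties against the correct snippet instead.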