[WIP] A tool for C++ code modification to augment data for clone detection tools
Clone or download repo from GitHub
git clone https://github.com/JetBrains-Research/gorshochek.git
cd gorshochek
It is more preferable to run gorshochek
using Docker
.
To build docker image run:
sudo docker build -t gorshochek .
Then to run a container from just created image run:
sh ./scripts/run.sh input_folder output_folder
Note that input_folder
will be traversed recursively and every .cpp
file from it will be transformed.
To specify which exact transformations to apply edit file config.yaml
, which should have the following structure:
n_transformations: 3
transformations:
- identity transform:
p: 0.99
- add comments:
...
The output will have structure as follows:
output_path
├── log.txt
├── file1
| ├── description.txt
│ ├── transformation_0.cpp
| ├── transformation_1.cpp
│ └── transformation_2.cpp
├── file2
| ├── description.txt
│ ├── transformation_0.cpp
| ├── transformation_1.cpp
│ └── transformation_2.cpp
...
Here log.txt
file contains data on how many errors appeared after applying each transformation from config.yaml
.
Logs are stored in the following format:
- transformation_1
file1/transformation_1.cpp 1
file2/transformation_1.cpp 1
file3/transformation_1.cpp 1
- transformation_2
file1/transformation_1.cpp 1
...
The log.txt
file is split in several blocks, each titled as follows: - transformation_i
.
Each block belongs to a certain transformation. After block title each line corresponds to a
transformed file and the number of errors found in that file, for instance: file1/transformation_1.cpp 1
.
More examples can be found in tests
folder
- Identity transformation
- Add, remove comments
- Rename variables, functions
- Swap
if
andelse
blocks and change the corresponding condition insideif
- Rearranging function declarations
- Replace
for
withwhile
and vice versa - Replace
printf
withstd::cout
- Open macros
- Random change between
x++
,++x
,x+=1
,x=x+1
- Change the signature of functions by making variables global
- Replace
std::cout
withprintf
- Useless variables, functions, defines
More detailed documentation can be found in DOCS.md
If want to contribute to the project and add new transformation (e.g. Example
, note that Example
is just a name of a transformation) the following classes should be implemented:
ExampleTransformation
derived fromITransformation
-- class that aggregates all the sufficient information fromconfig.yaml
and creates instances ofExampleASTConsumer
usinggetConsumer
methodExampleASTConsumer
derived fromASTConsumer
ExampleASTVisitor
derived fromRecursiveASTVisitor<ExampleASTVisitor>