A collection of code-domain datasets for instruction fine-tuning.
Note: The datasets listed below have only been collected, not processed.
Dataset | Release date | Scale | Language | Programming language | Task |
---|---|---|---|---|---|
Instruct-to-Code | Mar 28, 2023 | 451k | Mul | Python… etc. | etc. |
godot_dodo_4x_60k | Apr 27, 2023 | 62533 | EN | GDScript | Code Generation |
TSSB-3M-instructions | Apr 28, 2023 | 3M | EN | Python… etc. | Code Bugfix |
Codegen | May 4, 2023 | 4535 | EN | C++, Node.js, Python, shell script, Java, JavaScript, etc. | Code Generation, Code Summary, QA, etc. |
codealpaca | May 13, 2023 | 20k | EN | HTML, CSS, Java, SQL, Python, JavaScript, JSX, C++, Swift, Ruby, PHP, etc. | Code Generation, Code Search, etc. |
CodeGPT | May 10, 2023 | 32k | CN | C#, C, C++, Go, Java, JavaScript, PHP, Python, Ruby, etc. | Code Generation, Code Search, QA, etc. |