Instruction_Code_Datasets

A collection of code-domain datasets for instruction fine-tuning.

Datasets

Note: The datasets listed below have only been collected; they have not been processed.

| Dataset | Release date | Scale | Language | Programming language | Task |
| --- | --- | --- | --- | --- | --- |
| Instruct-to-Code | Mar 28, 2023 | 451k | Mul | python…et al. | et al. |
| godot_dodo_4x_60k | Apr 27, 2023 | 62533 | EN | GDScript | Code Generation |
| TSSB-3M-instructions | Apr 28, 2023 | 3M | EN | python…et al. | Code bugfix |
| Codegen | May 4, 2023 | 4535 | EN | C++, Node.js, Python, shell script, Java, JavaScript, et al. | Code Generation, Code Summary, QA, et al. |
| codealpaca | May 13, 2023 | 20k | EN | HTML, CSS, Java, SQL, Python, JavaScript, JSX, C++, Swift, Ruby, PHP, et al. | Code Generation, Code Search, et al. |
| CodeGPT | May 10, 2023 | 32k | CN | C#, C, C++, Go, Java, JavaScript, PHP, Python, Ruby, et al. | Code Generation, Code Search, QA, et al. |