Running Reflexion on Human Eval

This repository contains a methodology for building Reflexion as a flow-engineering technique for HumanEval, the problem-solving dataset described in the paper "Evaluating Large Language Models Trained on Code". It is not an exact replication of the Reflexion methodology, but it works on the same principle: the LLM is made self-reflective by using test results as its evaluation signal.
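The generate, test, reflect, retry loop can be sketched in Python as follows. This is a minimal illustration, not the code in main.py: `generate` is a hypothetical stand-in for whatever LLM call you use (it receives the prompt plus the reflections gathered so far), `run_tests` and `reflexion_loop` are illustrative names, and the tests are assumed to follow HumanEval's convention of a `check(candidate)` function. Which tests to use (HumanEval's own or model-written ones) is left to the caller.

```python
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, test: str, entry_point: str) -> Optional[str]:
    """Run candidate code against a HumanEval-style `check(candidate)` test.

    Returns None if every assertion passes, otherwise a short error message
    that can be fed back to the model as verbal feedback.
    WARNING: exec() runs untrusted code in this process; use a sandbox for real runs.
    """
    env: dict = {}
    try:
        exec(code, env)   # define the candidate function
        exec(test, env)   # define check(candidate)
        env["check"](env[entry_point])
        return None
    except AssertionError as e:
        return f"AssertionError: {e}" if str(e) else "AssertionError: a test case failed"
    except Exception as e:  # syntax errors, missing entry point, runtime errors
        return f"{type(e).__name__}: {e}"


def reflexion_loop(
    prompt: str,
    test: str,
    entry_point: str,
    generate: Callable[[str, List[str]], str],  # hypothetical LLM wrapper
    max_iters: int = 3,
) -> Tuple[str, bool, List[str]]:
    """Generate code, test it, and retry with accumulated reflections on failure."""
    reflections: List[str] = []
    code = ""
    for _ in range(max_iters):
        code = generate(prompt, reflections)
        error = run_tests(code, test, entry_point)
        if error is None:
            return code, True, reflections
        # The failure message becomes a verbal reflection for the next attempt.
        reflections.append(f"Previous attempt failed with {error}. Fix the bug.")
    return code, False, reflections
```

Feeding the raw test failure back as text is what makes the loop "self-reflective": the model never sees a gradient, only a description of what went wrong on its previous attempt.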

This repo adapts code heavily from OpenAI's human-eval repository, so kudos to them for making evaluation on HumanEval this easy.

Installation

Make sure to use Python 3.7 or later:

$ conda create -n codex python=3.7
$ conda activate codex

Clone and install OpenAI's human-eval package:

$ git clone https://github.com/openai/human-eval
$ pip install -e human-eval

Usage

Run Reflexion

$ python3 main.py
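evaluate_functional_correctness reads a JSONL file with one JSON object per line, each holding a `task_id` (for example `HumanEval/0`) and a `completion`; the harness appends the completion to the problem's prompt before running the tests, so the completion should not repeat the prompt. The sketch below shows one way to produce such a file with the standard library. The helper names `to_jsonl` and `write_samples` are hypothetical, and main.py may write the file differently (human-eval also ships `write_jsonl` in `human_eval.data`).

```python
import json
from typing import Dict, Iterable, List


def to_jsonl(samples: Iterable[Dict[str, str]]) -> str:
    """Serialize samples as one {"task_id", "completion"} object per line."""
    lines: List[str] = []
    for s in samples:
        # Keep only the two fields the evaluator needs.
        lines.append(json.dumps({"task_id": s["task_id"], "completion": s["completion"]}))
    return "\n".join(lines) + "\n" if lines else ""


def write_samples(path: str, samples: Iterable[Dict[str, str]]) -> None:
    """Write samples to `path`, e.g. samples_reflexion.jsonl."""
    with open(path, "w") as f:
        f.write(to_jsonl(samples))
```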

To evaluate the samples, run

$ evaluate_functional_correctness samples_reflexion.jsonl
Reading samples...
32800it [00:01, 23787.50it/s]
Running test suites...
100%|...| 32800/32800 [16:11<00:00, 33.76it/s]
Writing results to samples_reflexion.jsonl_results.jsonl...
100%|...| 32800/32800 [00:00<00:00, 42876.84it/s]
{'pass@1': ..., 'pass@10': ..., 'pass@100': ...}
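The pass@k values come from the unbiased estimator defined in "Evaluating Large Language Models Trained on Code": for a problem with n generated samples, of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), and the reported score is the mean over problems. A minimal standard-library sketch (the function name is illustrative) uses the product form of that ratio to avoid huge binomial coefficients:

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: number of samples generated, c: number that passed, k: sample budget.
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # C(n-c, k) / C(n, k) == prod_{i=n-c+1}^{n} (1 - k / i)
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved. In the human-eval harness a pass@k entry is only reported when every problem has at least k samples.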

Citation

@article{chen2021codex,
  title={Evaluating Large Language Models Trained on Code},
  author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
  year={2021},
  eprint={2107.03374},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@misc{shinn2023reflexion,
  title={Reflexion: Language Agents with Verbal Reinforcement Learning},
  author={Noah Shinn and Federico Cassano and Edward Berman and Ashwin Gopinath and Karthik Narasimhan and Shunyu Yao},
  year={2023},
  eprint={2303.11366},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

Repository: shubchat/easyreflexion