[EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code
Updated Jun 16, 2024 - Python
MONSERRATE is a dataset specifically created to evaluate Question Generation systems. It has, on average, 26 questions associated with each source sentence, aiming to serve as an “exhaustive” reference.
Automatic Evaluation of Textual Answers on the famous Kaggle Automated Essay Scoring (AES) dataset.
Multidimensional Evaluation for Text Style Transfer Using ChatGPT. Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer (HumEval 2022)
Success and Failure Linguistic Simplification Annotation 💃
An AI expert system to automatically evaluate subjective answers submitted in online assessments.