This repository contains a benchmark for evaluating large language models (LLMs) on Faroese grammar and translation accuracy. It includes tests for grammar rule application, morphological understanding, and translation tasks.
The Faroese LLM Benchmark assesses how proficient language models are in Faroese. It focuses on three key areas:
- Grammar Rule Application: Tests the model's ability to apply grammatical rules correctly.
- Morphological Understanding: Evaluates the model's understanding of morphological forms.
- Translation Accuracy: Assesses the model's ability to translate between Faroese and English.
The datasets used for testing are located in the data/ directory:
- File: test_grammar_rules.json
- Example:
{ "type": "Fill-in-the-Blank", "question": "Complete the sentence: 'Eg síggi ___ (the elephant - acc.).'", "sentence": "Eg síggi ___ (the elephant - acc.).", "expected_answer": "fílin", "grammar_rule": "accusative definite singular" }
- File: test_morphological_understanding.json
- Example:
{ "question": "What is the accusative singular indefinite form of 'fílur'?", "word": "fílur", "expected_answer": "fíl", "grammatical_context": { "case": "accusative", "number": "singular", "definiteness": "indefinite" } }
- File: test_translation_accuracy.json
- Example:
{ "type": "Faroese to English", "question": "Translate: 'Fílarnir ganga í skóginum.'", "faroese": "Fílarnir ganga í skóginum.", "expected_answer": "The elephants walk in the forest.", "grammar_focus": ["definite plural", "present tense"] }
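The entries above share an expected_answer field, so a model's reply can be checked against it with a few lines of Python. The following is only a minimal sketch: the helper names normalize and is_correct are hypothetical, and the exact-match comparison is an assumption, not necessarily how the scripts in this repository score answers. The sample entry is copied from test_morphological_understanding.json above.

```python
import json

# Sample entry copied from the morphological example above.
SAMPLE = r'''
{ "question": "What is the accusative singular indefinite form of 'fílur'?",
  "word": "fílur",
  "expected_answer": "fíl",
  "grammatical_context": { "case": "accusative", "number": "singular", "definiteness": "indefinite" } }
'''


def normalize(text: str) -> str:
    """Trim whitespace and surrounding punctuation/quotes, then casefold."""
    return text.strip().strip(".!?\"'").casefold()


def is_correct(entry: dict, model_answer: str) -> bool:
    """Hypothetical scorer: exact match after normalization (an assumption)."""
    return normalize(model_answer) == normalize(entry["expected_answer"])


entry = json.loads(SAMPLE)
print(is_correct(entry, " Fíl. "))   # matches after normalization
print(is_correct(entry, "fílin"))    # definite form, does not match
```

Exact matching suits the short answers in the grammar and morphology sets. Translation entries may have several acceptable renderings, so a stricter or looser comparison would likely be needed there.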
- Python 3.10 or later
- An OpenAI API key
- Clone the repository:
  git clone https://github.com/TrygviZL/faroese-llm-benchmark
  cd faroese-llm-benchmark
- Install the required packages:
  poetry shell
  poetry install
- Set up the OpenAI API key by adding it to a .env file in the root of the project:
  OPENAI_API_KEY='your-api-key'
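To check that the key in the .env file can be read before running anything, a small standard-library sketch like the one below may help. The functions load_env_file and get_api_key are hypothetical helpers written for illustration; how the repository's scripts actually load the key is not shown here, and they may rely on a dedicated package instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines; skip blanks, comments, and malformed lines."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in OPENAI_API_KEY='your-api-key'.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment or in " + path)
    return key
```

Calling get_api_key() from the project root returns the key if it is configured and raises a RuntimeError with a readable message otherwise.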
To run all the tests and then generate the plots, execute the following commands in your terminal:
python scripts/run_tests.py
python scripts/generate_plots.py
The results are stored in the results/ directory:
- grammar_results.json: Contains the results of grammar rule application tests.
- morphological_results.json: Contains the results of morphological understanding tests.
- translation_results.json: Contains the results of translation accuracy tests.