Release 2.5.0
This release adds a fourth evaluation task, IfEval, to the current competition starting at block 4,344,030. From that block onward, the task weighting will be 85% MMLU, 5% Word Sorting, 5% Fineweb, and 5% IfEval.
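For quick reference, the new weighting can be written down as a simple mapping. This is an illustration only; the task labels below are informal names, not the subnet's actual identifiers.

```python
# Illustrative task weighting effective at block 4,344,030.
# Task labels are informal names, not the subnet's actual identifiers.
EVAL_TASK_WEIGHTS = {
    "MMLU": 0.85,
    "WORD_SORTING": 0.05,
    "FINEWEB": 0.05,
    "IF_EVAL": 0.05,
}
assert abs(sum(EVAL_TASK_WEIGHTS.values()) - 1.0) < 1e-9
```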
Subnet
- Added new IfEval (Instruction Following) evaluation task.
- This evaluation scores models on how well they follow generated rules about their responses. Initially, the rules cover casing, comma usage, word count, and sentence count (see the sketch after this list).
- Includes a check that models are generating reasonable output, i.e., that they are not reusing the same response for the same rules when asked different questions.
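To make the rule-based scoring concrete, here is a minimal sketch of how checks of this kind can be implemented and aggregated. The rule functions, the scoring formula, and the distinctness check are illustrative assumptions, not the subnet's actual implementation.

```python
import re

# Hypothetical rule checks illustrating the four initial rule categories.
# These are assumptions for illustration, not the subnet's actual rules.
def is_all_lowercase(response: str) -> bool:
    """Casing rule: the response must contain no uppercase letters."""
    return response == response.lower()

def has_no_commas(response: str) -> bool:
    """Comma-usage rule: the response must not contain commas."""
    return "," not in response

def word_count_at_most(response: str, limit: int) -> bool:
    """Word-count rule: the response must contain at most `limit` words."""
    return len(response.split()) <= limit

def sentence_count_at_least(response: str, minimum: int) -> bool:
    """Sentence-count rule: the response must contain at least `minimum` sentences."""
    return len(re.findall(r"[.!?]+", response)) >= minimum

def score(response: str) -> float:
    """Score a response as the fraction of rules it satisfies."""
    rules = [
        is_all_lowercase(response),
        has_no_commas(response),
        word_count_at_most(response, 50),
        sentence_count_at_least(response, 2),
    ]
    return sum(rules) / len(rules)

# A degenerate model could memorize one response per rule set, so responses
# to different questions under the same rules must not be identical.
def responses_are_distinct(responses: list[str]) -> bool:
    return len(set(responses)) == len(responses)
```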
Validators
- The expected time per evaluation cycle has increased due to the new evaluation task.
- TTLs have been adjusted: each model must now complete all evaluation tasks within 12 minutes (see the first sketch after this list).
- Alpha has also been adjusted. Models should first receive weight after 2 cycles (~360 blocks) and will receive all weight after 17 cycles (~3060 blocks) of consecutive wins (see the second sketch after this list).
- Output width is set explicitly to improve readability of pm2 rich tables in logging. Thanks coldint!
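A validator-side TTL can be enforced by bounding how long the evaluation result is waited on. The sketch below is a minimal illustration using the standard library; `run_all_eval_tasks` is a hypothetical callable standing in for the subnet's real evaluation pipeline.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

EVAL_TTL_SECONDS = 12 * 60  # each model must finish all tasks within 12 minutes

def evaluate_with_ttl(run_all_eval_tasks, model):
    """Run all evaluation tasks for `model`, returning None on timeout.

    `run_all_eval_tasks` is a hypothetical callable standing in for the
    subnet's actual evaluation pipeline.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_all_eval_tasks, model)
        try:
            return future.result(timeout=EVAL_TTL_SECONDS)
        except TimeoutError:
            return None  # model exceeded its TTL and receives no score
```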
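As a rough sketch of how consecutive wins translate into weight, assume an exponential-moving-average update of the form w ← w + alpha * (1 - w) for the winning model. The `ALPHA` value and threshold below are illustrative assumptions chosen to match the stated cycle counts, not the subnet's actual constants.

```python
# Illustrative EMA weight accumulation for a model winning consecutive cycles.
# ALPHA and FULL_WEIGHT_THRESHOLD are assumed values, not the subnet's
# actual constants.
ALPHA = 0.35
FULL_WEIGHT_THRESHOLD = 0.999  # weight is treated as full past this point

weight = 0.0
for cycle in range(1, 18):
    weight += ALPHA * (1.0 - weight)  # EMA step toward full weight
    status = "full" if weight >= FULL_WEIGHT_THRESHOLD else f"{weight:.4f}"
    print(f"cycle {cycle:2d}: weight = {status}")
# With these assumed constants, the weight is already meaningful after 2
# cycles (~0.58) and crosses the full-weight threshold at cycle 17.
```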
Miners
- The new dataset loader for the if_eval task can be found at https://github.com/macrocosm-os/finetuning/blob/main/finetune/datasets/generated/if_eval_loader.py.
- As mentioned above, this task will be incorporated into the existing competition starting at block 4,344,030, so please take it into account in your training (a usage sketch follows this list).
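Loading samples from the new dataset might look something like the following. The class name `IFEvalLoader`, its constructor, and the sample shape are all guesses for illustration; consult the linked file for the actual interface.

```python
# Hypothetical usage sketch: the class name and arguments below are
# assumptions; see finetune/datasets/generated/if_eval_loader.py for the
# real interface.
from finetune.datasets.generated.if_eval_loader import IFEvalLoader  # assumed name

loader = IFEvalLoader()  # constructor arguments, if any, are assumptions
for sample in loader:
    # Each sample is expected to pair a prompt with the rules the response
    # must satisfy; inspect the loader's output to confirm its shape.
    print(sample)
    break
```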
This release requires running `pip install -e .` to pick up the latest dependencies.