𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓 is a model pre-trained on implicit rationales extracted from web text and reasoning datasets to provide process supervision. 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓 generalizes over reasoning tasks with little human intervention while beating much larger models!
We assume the use of slurm for model training and inference.
To install other requirements: pip install -r requirements.txt
Due to the size of The Pile, we implement a pre-filtering process to identify reasoning-rich documents by (1) computing the average semantic embedding of representative reasoning training sets using a paragraph embedding model, and (2) selecting documents from unlabelled datasets that exceed a cosine similarity threshold
cd sampling/prefilter_code
sh start_multiple_server.sh
This process will break The Pile into different chunks and run pre-filtering in parallel
Implicit rationales are often embedded in unlabelled text, reflecting natural thought processes in daily communication. Our extraction process aims to make these rationales explicit.
cd sampling/sampling_code
python sampling_rationales_all_datasets.py/sampling_rationales_c4.py
python calculate_perplexity_sampled_rationale_new_method.py
python filter_rationale.py
The rationale extraction process requires a vllm server to be running elsewhere. Please update the URL in the script to connect to the correct vllm server. The prompts for extracting rationales are already included in the script.
Rationales extracted from GSM8K can be found here
The goal of 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓 training is to develop a model that can generate implicit rationales to guide stepwise problem-solving during inference time. We choose to fine-tune a LLaMa-3-8B model on extracted rationales from step 2.
python parse_sampled_rationale.py
sbatch sbatch_llama_finetune.sh
This training script is adapted from stanford-alpaca
Trained RATIONALYST can be found here
How does 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓 work during inference time?
It's a stepwise process: 1. 𝐑𝐀𝐓𝐈𝐎𝐍𝐀𝐋𝐘𝐒𝐓 generates rationale based on the current trajectory. 2. Agent LLM proposes multiple next reasoning steps. 3. Implicit rationales help estimate which step is most probable. 4. The best step is chosen, and the process repeats.
cd model_inference/evaluation_scripts
The entrance of all task inference is inference.py
Test on different task: change dataset name
With RATIONALYST (using implicit supervision): change heuristic to world_model
Without RATIONALYST: change heuristic to random