OnlineDPO: ValueError "You must specify exactly one of input_ids or inputs_embeds" during evaluation
System Info

Copy-paste the following information when reporting an issue:

Information

Tasks

An officially supported task in the examples folder

Reproduction
from pathlib import Path

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
)
from trl import OnlineDPOConfig, OnlineDPOTrainer

# Load model to be trained
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Load reward model
reward_model = AutoModelForSequenceClassification.from_pretrained("trl-lib/Qwen2-0.5B-Reward", num_labels=1)
reward_tokenizer = AutoTokenizer.from_pretrained("trl-lib/Qwen2-0.5B-Reward")

# Load dataset
ds = load_dataset("trl-lib/ultrafeedback-prompt", split="train")
ds = ds.train_test_split(test_size=0.1, seed=42)

training_args = OnlineDPOConfig(
    output_dir=str(Path(__file__).parent / "online_dpo_checkpoints"),
    num_train_epochs=10,
    overwrite_output_dir=True,
    eval_strategy="steps",
    save_strategy="steps",
    report_to="none",
    save_steps=5,
    eval_steps=5,
    # weight_decay=0.01,  # Add weight decay
    # warmup_steps=2,  # Add warmup
    gradient_checkpointing=True,  # Great for memory saving
    gradient_checkpointing_kwargs={"use_reentrant": False},  # To remove the warning
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=2,
    save_total_limit=1,
    bf16=True,
    logging_steps=1,
    dataloader_num_workers=8,  # Use multiple processes for data loading
    dataloader_pin_memory=True,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    ddp_find_unused_parameters=False,  # Disable DDP's unused-parameter search
    remove_unused_columns=False,
    max_new_tokens=128,
    missing_eos_penalty=1.0,
    max_grad_norm=1.0,
    eval_delay=0.1,
    optim="adamw_torch_fused",
)

trainer = OnlineDPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_model=reward_model,
    reward_processing_class=reward_tokenizer,
    args=training_args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    # data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
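The script above stops at trainer construction; the error appears once the evaluation loop is reached. A hedged sketch of the last step (calling evaluate() directly should exercise the same code path):

# Training triggers evaluation every `eval_steps` steps, which is where the
# error below is raised; evaluate() should reach the same path directly.
trainer.train()
# trainer.evaluate()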
outputs:

    raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
ValueError: You must specify exactly one of input_ids or inputs_embeds

This problem happens when training reaches the evaluation step.
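For reference, the message itself comes from the model's forward pass being called with neither input_ids nor inputs_embeds. A minimal sketch (reusing the model object from the reproduction script above) that produces the same error:

# Calling the causal LM forward with no inputs raises the same ValueError,
# which suggests the evaluation batch handed to the model has no input_ids.
try:
    model()  # neither input_ids nor inputs_embeds
except ValueError as err:
    print(err)  # You must specify exactly one of input_ids or inputs_embeds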
Expected behavior

Evaluation should run without raising this error.
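Until this is resolved, a possible temporary workaround (an assumption on my part, not a confirmed fix) is to skip the evaluation loop entirely and drop the settings that depend on it (load_best_model_at_end, metric_for_best_model, and the EarlyStoppingCallback):

# Hypothetical workaround, not a fix: never enter the evaluation loop, so
# training can proceed while this issue is open.
training_args = OnlineDPOConfig(
    output_dir="online_dpo_checkpoints",
    eval_strategy="no",            # skip evaluation entirely
    save_strategy="steps",
    save_steps=5,
    load_best_model_at_end=False,  # requires evaluation, so disable it
    bf16=True,
    max_new_tokens=128,
    missing_eos_penalty=1.0,
)
trainer = OnlineDPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_model=reward_model,
    reward_processing_class=reward_tokenizer,
    args=training_args,
    train_dataset=ds["train"],
    # eval_dataset and EarlyStoppingCallback removed: both need evaluation
)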