[RAFT] Fix Datapoint Field in Formatter for Data Generation (ShishirPatil#535)

This PR fixes the datapoint field that the OpenAI completion formatter uses
during data generation. Specifically, it corrects the column renaming in
`raft/format.py` (line 107), which mapped the wrong source column to `prompt`.
The line:

```python
newds = ds.rename_columns({'question': 'prompt', 'cot_answer': 'completion'})
```

has been updated to:

```python
newds = ds.rename_columns({'instruction': 'prompt', 'cot_answer': 'completion'})
```
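To make the effect of the one-word change concrete, here is a minimal pure-Python sketch that mimics the renaming on column-oriented data (a dict of lists standing in for a Hugging Face `Dataset`). The helper `rename_columns` and the sample row are hypothetical illustrations, not code from `raft/format.py`.

```python
from typing import Dict, List

# Column-oriented stand-in for a Dataset: column name -> list of values.
Columns = Dict[str, List[str]]


def rename_columns(ds: Columns, mapping: Dict[str, str]) -> Columns:
    """Hypothetical stand-in for Dataset.rename_columns (old name -> new name).

    Raises ValueError if a source column is missing or if renaming would
    make two columns share a name.
    """
    missing = set(mapping) - set(ds)
    if missing:
        raise ValueError(f"columns not found: {sorted(missing)}")
    renamed = {mapping.get(name, name): values for name, values in ds.items()}
    if len(renamed) != len(ds):
        raise ValueError("renaming would create duplicate column names")
    return renamed


# One hypothetical generated row: "instruction" = tagged documents + question.
ds = {
    "question": ["What is RAFT?"],
    "instruction": [
        "<DOCUMENT>doc A</DOCUMENT>\n<DOCUMENT>doc B</DOCUMENT>\nWhat is RAFT?"
    ],
    "cot_answer": ["placeholder answer"],
}

old = rename_columns(ds, {"question": "prompt", "cot_answer": "completion"})
new = rename_columns(ds, {"instruction": "prompt", "cot_answer": "completion"})

print("<DOCUMENT>" in old["prompt"][0])  # False: the documents are dropped
print("<DOCUMENT>" in new["prompt"][0])  # True: the documents are kept
```

Both mappings produce a `prompt` column, so the bug does not raise an error; it silently trains on prompts that lack the context documents.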

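The change is necessary because the "instruction" field already includes
the question, preceded by the context documents wrapped in `<DOCUMENT>`
tags. Mapping `question` to `prompt` therefore dropped those documents from
the prompt. Here is the relevant code snippet that sets the "instruction"
field: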

```python
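# Build the instruction: each context document is wrapped in <DOCUMENT> tags,
# one per line, and the question q is appended after all of them.
context = ""
for doc in docs:
    context += "<DOCUMENT>" + str(doc) + "</DOCUMENT>\n"
context += q  # the question comes last, after every document
datapt["instruction"] = context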
```
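Wrapping the snippet in a function makes the relationship between the two fields easy to check: the question is always the tail of the instruction, so `instruction` carries strictly more information than `question`. `build_instruction` and the sample documents are hypothetical; the original code runs inline while each datapoint is generated, and the `join` below is equivalent to its `+=` loop.

```python
from typing import Iterable


def build_instruction(docs: Iterable[object], q: str) -> str:
    """Hypothetical helper mirroring the snippet: tagged documents, then the question."""
    return "".join("<DOCUMENT>" + str(doc) + "</DOCUMENT>\n" for doc in docs) + q


docs = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
q = "What is the capital of France?"
datapt = {"question": q, "instruction": build_instruction(docs, q)}
print(datapt["instruction"])
# <DOCUMENT>Paris is the capital of France.</DOCUMENT>
# <DOCUMENT>Berlin is the capital of Germany.</DOCUMENT>
# What is the capital of France?
```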

We want to thank @HuiyingLi for bringing this up. 
Fixes ShishirPatil#534.

---------

Co-authored-by: Charlie Cheng-Jie Ji <charliechengjieji@berkeley.edu>
HuanzhiMao and CharlieJCJ authored Jul 20, 2024
1 parent 181cbef commit 7b230df
Showing 1 changed file with 1 addition and 1 deletion.
raft/format.py: 1 addition, 1 deletion
```diff
@@ -104,7 +104,7 @@ class OpenAiCompletionDatasetFormatter(DatasetFormatter):
     https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
     """
     def format(self, ds: Dataset, params: Dict[str, str]) -> Dataset:
-        newds = ds.rename_columns({'question': 'prompt', 'cot_answer': 'completion'})
+        newds = ds.rename_columns({'instruction': 'prompt', 'cot_answer': 'completion'})
         return _remove_all_columns_but(newds, ['prompt', 'completion'])
 
 class OpenAiChatDatasetFormatter(OpenAiCompletionDatasetFormatter):
```
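The fixed `format` method renames two columns and then keeps only `prompt` and `completion`. Below is a minimal end-to-end sketch of that flow on dict-of-lists data; `remove_all_columns_but` and `format_completion` are hypothetical analogues of `_remove_all_columns_but` and `OpenAiCompletionDatasetFormatter.format`, since the real helper's body is not shown in this diff.

```python
from typing import Dict, List, Sequence

# Column-oriented stand-in for a Dataset: column name -> list of values.
Columns = Dict[str, List[str]]


def remove_all_columns_but(ds: Columns, keep: Sequence[str]) -> Columns:
    """Hypothetical analogue of _remove_all_columns_but: keep only the named columns."""
    missing = [name for name in keep if name not in ds]
    if missing:
        raise KeyError(f"expected columns not found: {missing}")
    return {name: ds[name] for name in keep}


def format_completion(ds: Columns) -> Columns:
    """Sketch of the fixed formatter: rename, then drop every other column."""
    mapping = {"instruction": "prompt", "cot_answer": "completion"}
    renamed = {mapping.get(name, name): values for name, values in ds.items()}
    return remove_all_columns_but(renamed, ["prompt", "completion"])


ds = {
    "question": ["What is the capital of France?"],
    "instruction": [
        "<DOCUMENT>Paris is the capital of France.</DOCUMENT>\n"
        "What is the capital of France?"
    ],
    "cot_answer": ["Paris"],
}
out = format_completion(ds)
print(sorted(out))  # ['completion', 'prompt']
```

Because the unused `question` column is pruned after renaming, the output schema is the same before and after the fix; only the content of `prompt` changes.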
