Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] Adapter Implementation of MM inputs in OAI friendly format #1495

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

isaacbmiller
Copy link
Collaborator

@isaacbmiller isaacbmiller commented Sep 15, 2024

Adds Image multimodal support. Currently buggy with parsing where you get prompts and outputs like the following:

[{'role': 'system', 'content': 'Your input fields are:
1. `question` (str): A question about the image(s)
2. `image_1` (Optional): An image of a math problem
3. `image_2` (Optional): An image of a math problem
4. `options` (List): The options to the question

Your output fields are:
1. `rationale` (str): ${produce the answer}. We ...
2. `answer` (str): The answer to the question

All interactions will be structured in the following way, with the appropriate values filled in.

[[[[ #### question #### ]]]]
{question}

[[[[ #### image_1 #### ]]]]
{image_1}

[[[[ #### image_2 #### ]]]]
{image_2}

[[[[ #### options #### ]]]]
{options}

[[[[ #### rationale #### ]]]]
{rationale}

[[[[ #### answer #### ]]]]
{answer}

[[[[ #### completed #### ]]]]

You will receive some input fields in each interaction. Respond only with the corresponding output fields, starting with the field `rationale`, then `answer`, and then ending with the marker for `completed`.

In adhering to this structure, your objective is: 
        Output a rationale and the answer to a multiple choice question about an image.'}, 
{'role': 'user', 'content': [{'type': 'image_url', 'image_url': {'url': 'data:image/jpeg;base64,XXX'}}, 
{'type': 'text', 'text': "[[[[ #### question #### ]]]]
Paper Submarine Manufacturing is investigating a lockbox system to reduce its collection time. It has determined the following:<image 1> The total collection time will be reduced by three days if the lockbox system is adopted.What is the net cash flow per check from adopting? 

[[[[ #### options #### ]]]]
['$.02', '$7.79', '$8.65']"}]}] 

Output

The net cash flow per check from adopting the lockbox system is the difference between the average value of a payment and the variable lockbox fee. The average value of a payment is $865 and the variable lockbox fee is $.50. Therefore, the net cash flow per check is $865 - $.50 = $8.65.```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants