In Structured Output, a JSON schema with a date string format will yield invalid JSON #392

Closed
2 of 4 tasks
oscarjohansson94 opened this issue Apr 5, 2024 · 3 comments

@oscarjohansson94

System Info

lorax 0.9.0, running with docker.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I am getting invalid JSON when using Structured Output with a JSON schema that contains a string with a date format.

This might be because LoRAX structured output uses Outlines, and Outlines does not support all JSON Schema string formats (dottxt-ai/outlines#215). But I would still expect LoRAX to output a valid JSON object, or to clearly document this behavior.

curl --request POST \
  --url <lorax_url>/generate \
  --data '{
  "inputs": "set Today to 20222",
  "parameters": {
    "response_format": {
          "type": "json_object",
          "schema": {
            "properties": {
              "today": {
                "format": "date",
                "title": "Today",
                "type": "string"
              }
            },
            "required": ["today"],
            "title": "Test",
            "type": "object"
          }
    }
  }
}'

This results in output such as:

{
  "generated_text": "{\n\n    \"today\": 2022-02-22\n}"
}

This JSON output is invalid because the date is not quoted.
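The failure can be checked with Python's standard `json` module. This is a minimal sketch: the string is the `generated_text` value from the response above after unescaping, and the `is_valid_json` helper is a hypothetical name used only for illustration.

```python
import json

# generated_text from the response above, with JSON escapes decoded.
generated_text = '{\n\n    "today": 2022-02-22\n}'


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The unquoted date is read as the number 2022 followed by stray characters.
unquoted_ok = is_valid_json(generated_text)

# The same payload with the date quoted as a string.
quoted_ok = is_valid_json('{\n\n    "today": "2022-02-22"\n}')

print(unquoted_ok, quoted_ok)
```

The parser reads `2022` as a number and then fails on the `-` that follows, which is why the value has to be emitted as a quoted string.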

If you remove the format from the schema, the result is a valid JSON object:

curl --request POST \
  --url <lorax_url>/generate \
  --data '{
  "inputs": "set Today to 20222",
  "parameters": {
    "response_format": {
          "type": "json_object",
          "schema": {
            "properties": {
              "today": {
                "title": "Today",
                "type": "string"
              }
            },
            "required": ["today"],
            "title": "Test",
            "type": "object"
          }
    }
  }
}'

This returns:

{
  "generated_text": "{\n\n    \"today\": \"2022-02-22\"\n}"
}

However, I guess that without the format constraint the LLM could output a string that is not a valid date.
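Until the format is enforced during generation, one possible workaround is to validate the date on the client after parsing the response. The sketch below assumes the field is named `today` as in the schema above and that dates should be `YYYY-MM-DD`; the `is_iso_date` and `extract_today` helpers are hypothetical and not part of LoRAX or Outlines.

```python
import json
import re
from datetime import date
from typing import Optional

# Strict YYYY-MM-DD shape; the calendar check is done separately below.
_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: object) -> bool:
    """Return True only for strings that are real YYYY-MM-DD calendar dates."""
    if not isinstance(value, str) or not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        # Right shape but not a real date, e.g. 2022-02-30.
        return False
    return True


def extract_today(generated_text: str) -> Optional[str]:
    """Parse generated_text and return the 'today' field if it is a valid date."""
    try:
        payload = json.loads(generated_text)
    except json.JSONDecodeError:
        return None
    value = payload.get("today") if isinstance(payload, dict) else None
    return value if is_iso_date(value) else None


good = extract_today('{\n\n    "today": "2022-02-22"\n}')
bad_value = extract_today('{"today": "20222"}')
bad_json = extract_today('{\n\n    "today": 2022-02-22\n}')
```

Returning `None` for both invalid JSON and invalid dates keeps the caller's handling simple; a caller that needs to tell the two cases apart could raise distinct exceptions instead.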

Expected behavior

I expect the structured output to be valid JSON that follows the provided JSON schema.

@prd-tuong-nguyen

I'm seeing the same problem.

@oscarjohansson94
Author

@jeffreyftang I see that you are assigned to this. Looks like this is solved by #567, and all that is needed is an upgrade to the latest version of outlines.

@tgaddair
Contributor

Thanks @oscarjohansson94, we updated Outlines to v0.0.40 in #447, which should address this issue. Closing for now.
