Fixed sonnet json formatting issue #293

Merged
merged 2 commits into from
Jun 25, 2024

Conversation

Collaborator

@whitead whitead commented Jun 25, 2024

Fixes #292

@whitead whitead requested a review from jamesbraza June 25, 2024 15:33
Collaborator

@jamesbraza jamesbraza left a comment

LGTM, added some comments

        return match.group(0).replace("\n", "\\n")

    pattern = r'"(?:[^"\\]|\\.)*"'
    text = re.sub(pattern, replace_newlines, text)
Collaborator

TIL repl arg can be a function
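For anyone else new to this, here is a minimal sketch of passing a function as the `repl` argument of `re.sub`, applied to the newline problem this PR addresses. The pattern and `replace_newlines` mirror the diff; the name `STRING_PATTERN` and the sample `raw` string are made up for illustration.

```python
import json
import re

# Matches a double-quoted string literal, allowing backslash escapes inside it.
# [^"\\] also matches a raw newline, so multi-line strings are captured whole.
STRING_PATTERN = r'"(?:[^"\\]|\\.)*"'


def replace_newlines(match: re.Match) -> str:
    # Called once per matched string literal; the return value replaces the match.
    return match.group(0).replace("\n", "\\n")


# Hypothetical LLM output with a raw newline inside a JSON string value.
raw = '{"summary": "line one\nline two", "score": 5}'

try:
    json.loads(raw)
    raw_is_valid = True
except json.JSONDecodeError:
    # Raw control characters are not allowed inside JSON strings by default.
    raw_is_valid = False

fixed = re.sub(STRING_PATTERN, replace_newlines, raw)
parsed = json.loads(fixed)
```

Because the replacement only runs on text matched as a string literal, newlines between keys and values (ordinary JSON whitespace) are left untouched.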

    def replace_newlines(match):
        return match.group(0).replace("\n", "\\n")

    pattern = r'"(?:[^"\\]|\\.)*"'
Collaborator

Bonus points for regex 101 link

Collaborator Author

Added!


    # escape new lines within strings
    def replace_newlines(match):
        return match.group(0).replace("\n", "\\n")
Collaborator

@jamesbraza jamesbraza Jun 25, 2024

Is there a latent NoneType error here? Can we type hint match as re.Match, and the return as str?

Collaborator Author

Added a type hint. I don't think re.sub will call the function with a None match.
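A small sketch that supports this: `re.sub` invokes the replacement function once per actual match, passing an `re.Match` object each time, and does not invoke it at all when nothing matches. The `shout` helper, the `seen` list, and the sample strings are hypothetical and exist only for this demonstration.

```python
import re

seen = []  # records every argument the replacement function receives


def shout(match: re.Match) -> str:
    seen.append(match)
    return match.group(0).upper()


# No occurrence of the pattern: the function is never called.
unchanged = re.sub(r"\bcat\b", shout, "no felines here")
calls_without_match = len(seen)

# Two occurrences: the function is called twice, each time with a Match object.
changed = re.sub(r"\bcat\b", shout, "cat and cat")
all_are_matches = all(isinstance(m, re.Match) for m in seen)
```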

@@ -184,4 +184,12 @@ def llm_read_json(text: str) -> dict:
    text = "{" + text.split("{", 1)[-1]
    # split anything after the last }
    text = text.rsplit("}", 1)[0] + "}"

    # escape new lines within strings
Collaborator

What do you think of re.escape?

Collaborator Author

Don't see the connection
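One possible reading of this exchange, offered as an assumption rather than as what either reviewer had in mind: `re.escape` prepares text for use inside a regex pattern, which is a different job from producing a JSON escape sequence. The sketch below compares the two on a made-up sample string.

```python
import json
import re

text = "a\nb"  # hypothetical string value containing a raw newline

# re.escape puts a backslash in front of the newline but keeps the raw newline.
regex_escaped = re.escape(text)
still_has_raw_newline = "\n" in regex_escaped

# The JSON-style escape replaces the raw newline with the two characters \ and n.
json_escaped = text.replace("\n", "\\n")
round_tripped = json.loads('"' + json_escaped + '"')

try:
    json.loads('"' + regex_escaped + '"')
    regex_escaped_is_valid_json = True
except json.JSONDecodeError:
    regex_escaped_is_valid_json = False
```

Under these assumptions, only the explicit `replace` yields a string that `json.loads` accepts.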

tests/test_paperqa.py (resolved)
@whitead whitead merged commit f16240a into main Jun 25, 2024
1 check passed
@whitead whitead deleted the json-sonnet-issues branch June 25, 2024 20:17
Successfully merging this pull request may close these issues.

Parsing JSON with newlines
2 participants