Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Add encoding config to json for load_prompt() #26935

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

zakki
Copy link

@zakki zakki commented Sep 27, 2024

  • Description:

This PR lets you specify file encoding of "*_path" in the config.
If an encoding is set in the config, it will be used when opening the file.
Otherwise, the default encoding will be applied.

The default encoding in Python differs depending on the environment.
On Windows, it is the ANSI code page (ex: "cp932").

Copy link

vercel bot commented Sep 27, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Sep 27, 2024 9:30am

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: core Related to langchain-core labels Sep 27, 2024
@zakki
Copy link
Author

zakki commented Oct 2, 2024

@hwchase17 @efriis Could you please review this?
This patch addresses an issue similar to #21559.
While #21559 fixes json encoding, this patch focuses on the encoding of text files as specified within a prompt json.

@eyurtsev
Copy link
Collaborator

eyurtsev commented Oct 8, 2024

@zakki i don't think that this we should be supporting multiple encodings. I'd rather just specify the encoding explicitly to be utf-8 and remove all the OS dependence

Copy link
Collaborator

@eyurtsev eyurtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should hard-code the encoding as utf-8 in open I think.

Just need to figure out what to do this as it could be a breaking change to folks on windows

@zakki
Copy link
Author

zakki commented Oct 9, 2024

I don't think hard-coding UTF-8 is a bad idea, and in the long term, the problem will be solved by Python 3.15 with PEP-686.
If it's my personal execution environment, I can also handle it with environment variables PYTHONUTF8=1.

However, in the short term, the problem is the environment dependency when distributing applications.

@eyurtsev
Copy link
Collaborator

eyurtsev commented Oct 9, 2024

@zakki also I wanted to confirm the current PR only handles the decoding path? What code path takes care of the encoding path so users on windows that may use cp1252 or some other encoding end up creating appropriate json files?

However, in the short term, the problem is the environment dependency when distributing applications.

Sorry could you paraphrase, are you arguing for your solution vs. the hard-coding UTF-8 everywhere?

@zakki
Copy link
Author

zakki commented Oct 15, 2024

@zakki also I wanted to confirm the current PR only handles the decoding path? What code path takes care of the encoding path so users on windows that may use cp1252 or some other encoding end up creating appropriate json files?

For my use case, I create the output file in my own python code, not within langchain framework code.

Considering completeness for other use cases, it may be necessary to explicitly specify the encoding of the output file.
Perhaps this creates cp1252 json on windows for example.

https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/callbacks/sagemaker_callback.py#L24

Sorry could you paraphrase, are you arguing for your solution vs. the hard-coding UTF-8 everywhere?

I think it would be reasonable to enforce UTF-8 as a breaking change in version 0.4.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ɑ: core Related to langchain-core size:M This PR changes 30-99 lines, ignoring generated files.
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

2 participants