UTF16 and possibly other UTF encodings support #5

Closed
phiresky opened this issue Jun 16, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@phiresky
Owner

UTF16 support is currently broken. At least UTF16LE with a BOM should be auto-detected and parsed correctly, as in normal ripgrep. The breakage is caused by my own binary file detection:

if fourk.contains(&0u8) {

Probably need to ask encoding_rs about encoding before checking for null bytes?
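
A minimal sketch of that idea, using only the standard library: look for a byte order mark first, and only fall back to the null-byte test when no UTF16 BOM is present. The names `Bom`, `sniff_bom` and `looks_binary` are hypothetical and not taken from this project. In real code, `encoding_rs::Encoding::for_bom` could take the place of the hand-written BOM check.

```rust
/// Encodings that announce themselves with a byte order mark.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the encoding indicated by a leading BOM, if there is one.
fn sniff_bom(buf: &[u8]) -> Option<Bom> {
    if buf.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom::Utf8)
    } else if buf.starts_with(&[0xFF, 0xFE]) {
        Some(Bom::Utf16Le)
    } else if buf.starts_with(&[0xFE, 0xFF]) {
        Some(Bom::Utf16Be)
    } else {
        None
    }
}

/// Binary heuristic that consults the BOM before looking for null bytes.
/// UTF16 text is full of 0x00 bytes, so a bare `contains(&0u8)` check
/// would misclassify it as binary.
fn looks_binary(fourk: &[u8]) -> bool {
    match sniff_bom(fourk) {
        Some(Bom::Utf16Le) | Some(Bom::Utf16Be) => false,
        _ => fourk.contains(&0u8),
    }
}
```

With this ordering, a buffer starting with `FF FE` is treated as text even though every second byte is zero, while a buffer without a BOM that contains a null byte is still flagged as binary.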

phiresky changed the title from "UTF16 support" to "UTF16 and possibly other UTF encodings support" on Jun 16, 2019
@phiresky
Owner Author

phiresky commented Jun 16, 2019

Other encodings apart from UTF8 and UTF16 with BOM are not automatically detected by ripgrep either, though they can be parsed using --encoding=xyz. Maybe useful reference: BurntSushi/ripgrep#1

@phiresky
Owner Author

Decoding should now be at parity with ripgrep: 29b8f1d
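
For illustration, transcoding BOM-marked UTF16LE input to UTF8 before searching could look roughly like the following standard-library sketch. This is not the code from the commit above. `decode_utf16le` is a hypothetical helper, and a real implementation would more likely stream through a decoder such as the one provided by `encoding_rs` rather than buffer the whole input.

```rust
/// Decodes UTF16LE bytes into a UTF8 `String`.
/// A leading BOM (FF FE) is skipped, unpaired surrogates become U+FFFD,
/// and a trailing odd byte is silently dropped.
/// (Hypothetical helper for illustration only.)
fn decode_utf16le(bytes: &[u8]) -> String {
    // Skip the byte order mark if the input starts with one.
    let body = if bytes.starts_with(&[0xFF, 0xFE]) {
        &bytes[2..]
    } else {
        bytes
    };

    // Reassemble little-endian byte pairs into 16-bit code units.
    let units = body
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    // Turn code units into chars, replacing invalid sequences.
    std::char::decode_utf16(units)
        .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect()
}
```

Once the content is a `String`, the null bytes that tripped the binary detection are gone and the text can be searched like any UTF8 input.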

There is the chardet crate that should be able to detect more types of encodings, but I don't think I care enough about other encodings right now to integrate it.

Also, it's currently not possible to disable BOM parsing / transcoding, pending BurntSushi/ripgrep#1305
