--encoding auto #1103

roblourens · 2018-11-07T18:23:46Z

It's not really clear how the automatic encoding detection is supposed to work. Which encodings should it be able to detect? Do you have any test cases that I can look at, or can you point to where the code is? As best I can tell, encoding_rs isn't responsible for this.

If I have a better idea of how it should work, I can file a better issue (or not file one).

BurntSushi · 2018-11-07T18:28:27Z

Yeah, the man page is pretty light on details here, but I think the guide explains it a bit better: https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#file-encoding That said, the guide never mentions the auto value. Instead, it just talks about what ripgrep does by "default."

In summary, by default, ripgrep looks for a BOM. If it sees a UTF-16 BOM, then it does UTF-16 to UTF-8 transcoding automatically and searches the UTF-8. I don't think anything else is done.
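
As a rough illustration of the default behavior described above, BOM sniffing followed by transcoding could be sketched as below. This is not ripgrep's actual code (ripgrep is written in Rust); the function names `sniff_utf16_bom` and `to_searchable_utf8` are hypothetical, and the sketch only checks the two UTF-16 byte order marks.

```python
import codecs


def sniff_utf16_bom(data: bytes):
    """Return the UTF-16 flavor indicated by a leading BOM, or None.

    Hypothetical helper: only the two UTF-16 BOMs are checked, so a
    UTF-32 LE BOM (which begins with the same two bytes) would be
    misreported as UTF-16 LE.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "utf-16-be"
    return None


def to_searchable_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 input to UTF-8; pass everything else through."""
    encoding = sniff_utf16_bom(data)
    if encoding is None:
        # No UTF-16 BOM: search the bytes as they are.
        return data
    # Skip the 2-byte BOM, decode, and re-encode as UTF-8.
    # Invalid sequences become U+FFFD rather than raising.
    text = data[2:].decode(encoding, errors="replace")
    return text.encode("utf-8")
```

With this sketch, a file without a BOM is never transcoded, which matches the "nothing else is done" summary above.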

Folks have filed issues in the past about being more aggressive in encoding detection, but I'd rather not dive into those waters if possible. It is possible for one to use --pre and probably --pre-glob to implement one's own detection & transcoding though.
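
For illustration only, a preprocessor along those lines might look like the sketch below. It assumes ripgrep invokes the `--pre` command with the file path as its first argument and searches whatever the command writes to stdout. The detection strategy (try strict UTF-8, then a fixed list of candidate encodings) is a naive assumption made for this example, not something ripgrep provides, and `transcode_to_utf8` is a hypothetical name.

```python
#!/usr/bin/env python3
import sys

# Assumed candidate list for this sketch; adjust to the encodings you expect.
DEFAULT_CANDIDATES = ("shift_jis", "cp1252")


def transcode_to_utf8(data: bytes, candidates=DEFAULT_CANDIDATES) -> bytes:
    """Return data as UTF-8, guessing the source encoding naively."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        pass
    for encoding in candidates:
        try:
            # Strict decoding: the first candidate that decodes cleanly wins.
            return data.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    return data  # no candidate fits: hand the bytes back unchanged


def main(path: str) -> None:
    with open(path, "rb") as f:
        sys.stdout.buffer.write(transcode_to_utf8(f.read()))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved as an executable script (say, a hypothetical `./transcode.py`), it could be wired up with something like `rg --pre ./transcode.py --pre-glob '*.txt' PATTERN`, so that only files matching the glob pay the cost of spawning the preprocessor.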

roblourens · 2018-11-07T18:35:01Z

Got it, yeah I think the man page implies that it might do more than it currently does.

BurntSushi added the "doc" label (An issue with or an improvement to documentation.) on Nov 7, 2018

BurntSushi closed this as completed in 6d5dba8 Jan 26, 2019
