How to work with projects using non-UTF-8 coding systems? #135
Comments
I would not call this anyone's fault (neither eglot's nor ccls's). GBK-encoded files will not work. You could suggest that your employer switch to Unicode. Fortunately, I think the encoding issue only affects comments and, on rare occasions, string literals. Just ignore them. |
It might be either eglot's or the jsonrpc library's fault. I quickly looked at the sources and couldn't find anything related to encoding conversion there. |
Does it stem from the spec only supporting UTF-8?
I noticed that if the comment for a function (its "docstring") contains GBK text, eglot is unable to show the completion candidates, so it hurts usability. |
Is it related to make-process? I've tried changing it to |
With the current implementations on both sides, the problem arises as follows:
I can think of two workarounds to this:
I tend to prefer the second one, considering the spec. I've tried to convert the encoding to UTF-8 with |
For option 2, ccls uses naive UTF-8 transcoding to convert between clang's byte-based SourceLocation/SourceRange and UTF-8-measured line/character positions. In practice, these clang-based language servers (ccls, clangd, cquery) may read files from the file system (i.e. they do not treat the file contents sent by the language client as the single source) for indexing (and completion/diagnostics). A couple of months ago someone asked on cfe-dev whether it is reasonable to have native UTF-16 support; the answer was basically that MS should fix their stuff.

    // ccls/src/working_files.cc
    int GetOffsetForPosition(lsPosition pos, std::string_view content) {
      size_t i = 0;
      for (; pos.line > 0 && i < content.size(); i++)
        if (content[i] == '\n')
          pos.line--;
      for (; pos.character > 0 && i < content.size() && content[i] != '\n';
           pos.character--)
        if (uint8_t(content[i++]) >= 128) {
          // Skip 0b10xxxxxx
          while (i < content.size() && uint8_t(content[i]) >= 128 &&
                 uint8_t(content[i]) < 192)
            i++;
        }
      return int(i);
    }
    // and also in src/message_handler.cc src/messages/textDocument_formatting.cc

The line number will be correct, but the character can be inaccurate for other encodings that retain the property that low bytes are 1-byte characters. If you don't put indexable identifiers on a line after GBK text, this should work fine:

    long RIP; // 金庸

This will break character measurement:

    int 紅顏彈指老, 剎那芳華;

I think you only care about Chinese in comments... I guess |
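To make the column drift concrete, here is a minimal sketch that applies the same skip rule as the quoted GetOffsetForPosition to a single line of bytes and counts how many "characters" it sees. CountCharactersUtf8Rule is a hypothetical helper written for illustration, not a function in ccls, and the byte strings are the UTF-8 and GBK encodings of 中文.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not part of ccls): counts how many "characters" the
// UTF-8 skip rule from GetOffsetForPosition sees in one line of raw bytes.
int CountCharactersUtf8Rule(std::string_view line) {
  int characters = 0;
  std::size_t i = 0;
  while (i < line.size() && line[i] != '\n') {
    // Every byte starts a character; after a byte >= 128, any following
    // bytes of the form 0b10xxxxxx (128..191) are skipped as continuations.
    if (static_cast<std::uint8_t>(line[i++]) >= 128) {
      while (i < line.size() && static_cast<std::uint8_t>(line[i]) >= 128 &&
             static_cast<std::uint8_t>(line[i]) < 192)
        i++;
    }
    characters++;
  }
  return characters;
}
```

Under this rule the UTF-8 bytes of 中文 (E4 B8 AD E6 96 87) count as 2 characters, while the GBK bytes (D6 D0 CE C4) count as 4, because none of the GBK trail bytes here fall in the 128..191 range. Anything on the same line after such GBK text would therefore be measured at a shifted column.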
Thanks for pointing out I may need some time to see whether my workaround fork works well or not.
There is no code like the second case above in our projects, so it should work fine, I guess. |
@whatacold can you check if using
|
Thanks @whatacold. What LSP server are you using (name and version)? |
I use clangd with version:
|
Looks like clangd version 6. If you switch to version 7 or 8 you should be OK, I think. @mkcms? |
It seems that if we want to support non-UTF-8 coding systems, we need to
I tried to work around it there, but it did not seem that elegant, so I gave up. I also managed to convert the JSON string to UTF-8 on the ccls side. In the end, I came up with a workaround using a thin translator written in Python, |
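A translator of this kind has to decide whether a payload needs transcoding at all. The sketch below is one way to make that decision, assuming a structural check is good enough; it is not taken from the translator mentioned above. LooksLikeUtf8 is a hypothetical name, and the check is deliberately simplified (it does not reject every overlong form or surrogate code points).

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: structural UTF-8 check. It verifies that each lead
// byte is followed by the right number of 0b10xxxxxx continuation bytes.
// Simplified: not every overlong form or surrogate code point is rejected.
bool LooksLikeUtf8(std::string_view bytes) {
  std::size_t i = 0;
  while (i < bytes.size()) {
    const std::uint8_t lead = static_cast<std::uint8_t>(bytes[i]);
    std::size_t extra = 0;
    if (lead < 0x80) {
      extra = 0;  // ASCII
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      extra = 1;  // 2-byte sequence
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      extra = 2;  // 3-byte sequence
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      extra = 3;  // 4-byte sequence
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + extra >= bytes.size() && extra > 0) return false;  // truncated
    for (std::size_t k = 1; k <= extra; k++) {
      if ((static_cast<std::uint8_t>(bytes[i + k]) & 0xC0) != 0x80)
        return false;  // expected a continuation byte
    }
    i += extra + 1;
  }
  return true;
}
```

With the GBK bytes of 中文 (D6 D0 CE C4), the first byte looks like a 2-byte lead but the next byte is not a continuation byte, so the check fails and the payload would be routed through a GBK-to-UTF-8 conversion. Some GBK byte pairs do happen to form valid UTF-8, so this is a heuristic rather than a reliable detector.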
Is it related to the clangd version? I'll try to get clangd 7/8 to verify this. ---UPDATE---
but it seems clangd doesn't include the comment for the function under point:
|
I'm sorry, perhaps I misread the whole discussion, but at some point it was related to misreported character/column positions. Text communication between client and LSP server as specified by the standard (Base Protocol, Content Part) is
completion? I gave you a solution for incorrect column reporting. Everything else would be a problem with the server. |
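For reference, the Base Protocol frames each message as a header part followed by a content part, where Content-Length is the size of the content in bytes and the content is UTF-8 encoded by default. A minimal sketch of that framing, assuming the JSON body is already UTF-8 (FrameLspMessage is a hypothetical helper name, not an eglot or ccls function):

```cpp
#include <string>

// Hypothetical helper: wraps an already UTF-8 encoded JSON body in the LSP
// base-protocol header. Content-Length counts bytes, not characters.
std::string FrameLspMessage(const std::string& json_utf8) {
  return "Content-Length: " + std::to_string(json_utf8.size()) + "\r\n\r\n" +
         json_utf8;
}
```

For example, a body consisting of the JSON string "中" is 5 bytes long in UTF-8 (two quotes plus three bytes for the character), so its header reads Content-Length: 5. A body carrying raw GBK bytes would still get a consistent length, but the receiver would fail or produce garbage when decoding it as UTF-8.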
I'm sorry that I haven't made the issue clear.
I totally understand that, so I'm going to close this.
I'll verify this if I have some time, and report back here. |
Hi,
At work I have some cpp projects using chinese-gbk to encode source files, and eglot together with ccls fails to work when doing completion:
And if I change the coding to utf-8 temporarily, it works fine.
Is it a bug in eglot or ccls? The spec says:
Software info:
ccls as LS