-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix corrupted config file header for non-ASCII package names #5804
Conversation
A test-case would be nice to have but I ran out of time over the weekend @edsko can you verify whether this also fixes your issue? |
@hvr I just checked. It does not. The first time I build the package it works, but if I then make a change and recompile again I get
|
@edsko would it be possible for you to create a small (or not so small -- whatever's easier) repro-case? |
Sure, it's not hard. Create a new project that has some unicode in the name, compile, then recompile. The attached will do the trick. Compile, make a modification to Main.hs (insert blank line), recompile. |
@edsko I see the problem now... yet another place where we're using the wrong char encoding... fix on the way... |
@edsko I added a fix for your observed problem; you should be good as long as you don't enable the standard (if you need CPP, you're gonna require a custom |
I tried to confirm, but couldn't. I probably did something wrong. No matter how much I removed, and rebuilt, problem still persists. So I am at commit |
@edsko here's how it behaves for me with a
|
Any suggestions for what I should try? |
@edsko what OS are you on, and what's your current locale (e.g. I used |
|
@edsko look into the |
Sorry, I'm being stupid. I had copied the recompiled binary to the wrong place, rather than overriding the existing one 🤦♂️ I can confirm it works :) |
@edsko in any case, I think we need a section in the user's guide to document all gotchas re Unicode support (issues with standard CPP processors, requirements for filesystem and locale, etc). But otherwise I think this PR makes unicode support in package names usable -- unless you run into yet other issues :-). |
Yup, agreed on both counts. Nice work :) Next up: fix syntax highlighting in my editor, which gets equally confused by unicode :D |
The config-state header is a human readable line prepended to the binary serialisation which looks like Saved package config for pkgname-1.2.3 written by Cabal-2.5.0.0 using ghc-8.6 However, the functions generating and parsing this header didn't take into account that package names are not limited to the ASCII subset and blindly used the ByteString `pack` function which truncates away the high bits of the `Char` code point resulting in a corrupted header with a non-sensical package-name. The fix is simply to serialise the package-name with the UTF-8 encoding which works nicely with the rest of the UTF-8 unaware string handling functions. Hence the fix is a lot shorter than this commit message. Fixes haskell#2557
…encoding This takes care of knock-off effects of haskell#2557 Specifically, the `Paths_*.hs` and `cabal_macros.h` files would result being incorrectly by a `rewriteFileEx` which isn't UTF-8 capable. Now the `cabal_macros.h` file is written out exactly like the `.h` file generated internally by `ghc` is generated; note however that standard CPP doesn't support non-ASCII characters in CPP symbols and will thus not work with a standard CPP preprocessor.
The config-state header is a human readable line prepended to the
binary serialisation which looks like
However, the functions generating and parsing this header didn't take into
account that package names are not limited to the ASCII subset and blindly used
the ByteString
pack
function which truncates away the high bits of theChar
code point resulting in a corrupted header with a non-sensical package-name.
The fix is simply to serialise the package-name with the UTF-8 encoding which
works nicely with the rest of the UTF-8 unaware string handling functions.
Hence the fix is a lot shorter than this commit message.
Fixes #2557