Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle utf8 characters in the dune files(#9728) #10113

Merged
merged 5 commits into from
Mar 20, 2024

Conversation

moyodiallo
Copy link
Collaborator

@moyodiallo moyodiallo commented Feb 22, 2024

This PR is about principally handling utf8 characters in quoted strings.

@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch from 920470b to 4f73860 Compare February 22, 2024 13:04
@rgrinberg
Copy link
Member

Why do we need to copy paste code from the stdlib?

@moyodiallo
Copy link
Collaborator Author

Why do we need to copy paste code from the stdlib?

No need, we could write our own functions. But I did this because I used functions introduced in OCaml.4.14 (later version).

@rgrinberg
Copy link
Member

rgrinberg commented Feb 22, 2024 via email

@moyodiallo
Copy link
Collaborator Author

I've just tested uutf, I'm going to use vendor/uutf present in Dune project. That is sufficient for this PR.

@moyodiallo
Copy link
Collaborator Author

Indeed uutf use Uchar which is present in OCaml 4.08.

@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch from 4f73860 to 0907f5f Compare February 24, 2024 12:27
@moyodiallo
Copy link
Collaborator Author

The copy paste is removed and uutf is used instead.

@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch from 0907f5f to 2b45467 Compare February 24, 2024 12:33
@rgrinberg
Copy link
Member

It's a bit sad, but this PR has a non trivial increase in pressure on the GC. Since this fix only affects a tiny fraction of users, is it possible to implement this in a less intrusive way?

@moyodiallo
Copy link
Collaborator Author

Yes it is possible. I'll try to fix it.

@emillon
Copy link
Collaborator

emillon commented Mar 11, 2024

I don't think that the lexer changes are useful for this change. Just changing Escape will fix the formatter, which was the original issue. The parsing error for %{...} will continue refering to individual bytes, but I think that's fine compared to changing the lexer.

src/dune_sexp/escape.ml Outdated Show resolved Hide resolved
@moyodiallo
Copy link
Collaborator Author

I don't think that the lexer changes are useful for this change. Just changing Escape will fix the formatter, which was the original issue. The parsing error for %{...} will continue refering to individual bytes, but I think that's fine compared to changing the lexer.

I get it but it won't be coherent, accepting utf8 char and printing escaped characters.

@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch 4 times, most recently from f255718 to 509b6f5 Compare March 15, 2024 09:12
@moyodiallo
Copy link
Collaborator Author

It's a bit sad, but this PR has a non trivial increase in pressure on the GC. Since this fix only affects a tiny fraction of users, is it possible to implement this in a less intrusive way?

It's fixed.

@moyodiallo moyodiallo requested a review from emillon March 15, 2024 09:41
Copy link
Collaborator

@emillon emillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also require a changelog entry

src/dune_sexp/escape.mli Outdated Show resolved Hide resolved
src/dune_sexp/lexer.mll Outdated Show resolved Hide resolved
@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch from 509b6f5 to 1d34180 Compare March 15, 2024 17:11
@moyodiallo moyodiallo requested a review from emillon March 15, 2024 17:12
@moyodiallo
Copy link
Collaborator Author

this will also require a changelog entry

The changelog entry added.

src/dune_sexp/escape.ml Outdated Show resolved Hide resolved
@emillon
Copy link
Collaborator

emillon commented Mar 19, 2024

Thanks. I'll adapt the API so that it's more in line with the stdlib API and thus easier to remove later.

@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch from 1d34180 to b52cef4 Compare March 19, 2024 10:46
@moyodiallo moyodiallo requested a review from rgrinberg March 19, 2024 10:47
@moyodiallo moyodiallo force-pushed the allow-utf8-characters branch from b52cef4 to 8cb27fc Compare March 19, 2024 10:49
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Etienne Millon <me@emillon.org>
moyodiallo and others added 3 commits March 19, 2024 13:10
Signed-off-by: Alpha DIALLO <moyodiallo@gmail.com>
Signed-off-by: Etienne Millon <me@emillon.org>
Also leaving a note about using the stdlib later.

Signed-off-by: Etienne Millon <me@emillon.org>
Signed-off-by: Etienne Millon <me@emillon.org>
@emillon emillon force-pushed the allow-utf8-characters branch from 8cb27fc to f413a42 Compare March 19, 2024 12:10
@emillon
Copy link
Collaborator

emillon commented Mar 19, 2024

I tried to use the stdlib way, but doing so and adding polyfills for 4.08 meant that stdune would depend on uutf. So instead I adapted the code to use uutf decoder (that work on 4.08) in dune_sexp with a note on how to migrate to stdlib when we bump the compat.

@emillon
Copy link
Collaborator

emillon commented Mar 19, 2024

@rgrinberg is the library structure (dune_sexp depending on dune_uutf) sensible?

Copy link
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. Make sure to double check the benchmarks before merging.

@emillon
Copy link
Collaborator

emillon commented Mar 20, 2024

Benchmarks look good so I'm merging this, thanks.

@emillon emillon merged commit 171c231 into ocaml:main Mar 20, 2024
24 of 27 checks passed
@moyodiallo moyodiallo deleted the allow-utf8-characters branch March 20, 2024 09:46
@moyodiallo moyodiallo restored the allow-utf8-characters branch March 20, 2024 09:46
@moyodiallo moyodiallo deleted the allow-utf8-characters branch March 20, 2024 09:47
@rgrinberg rgrinberg added this to the 3.15.0 milestone Mar 20, 2024
Leonidas-from-XIV added a commit to Leonidas-from-XIV/opam-repository that referenced this pull request Mar 26, 2024
CHANGES:

### Added

- Add link flags to to `ocamlmklib` for ctypes stubs (ocaml/dune#8784, @frejsoya)

- Remove some unnecessary limitations in the expansions of percent forms in
  install stanza. For example, the `%{env:..}` form can be used to select files
  to be installed. (ocaml/dune#10160, @rgrinberg)

- Allow artifact expansion percent forms (`%{cma:..}`, `%{cmo:..}`, etc.) in
  more contexts. Previously, they would be randomly forbidden in some fields.
  (ocaml/dune#10169, @rgrinberg)

- Allow `%{inline_tests}` in more contexts (ocaml/dune#10191, @rgrinberg)

- Remove limitations on percent forms in the `(enabled_if ..)` field of
  libraries (ocaml/dune#10250, @rgrinberg)

- Support dialects in `dune describe pp` (ocaml/dune#10283, @emillon)

- Allow defining executables or melange emit stanzas with the same name in the
  same folder under different contexts. (ocaml/dune#10220, @rgrinberg, @jchavarri)

### Fixed

- coq: Delay Coq rule setup checks so OCaml-only packages can build in hybrid
  Coq/OCaml projects when `coqc` is not present. Thanks to @vzaliva for the
  test case and report (ocaml/dune#9845, fixes ocaml/dune#9818, @rgrinberg, @ejgallego)

- Fix conditional source selection with `select` on `bigarray` in OCaml 5
  (ocaml/dune#10011, @moyodiallo)

- melange: fix inconsistency in virtual library implementation. Concrete
  modules within a virtual library can now refer to its virtual modules too
  (ocaml/dune#10051, fixes ocaml/dune#7104, @anmonteiro)

- melange: fix a bug that would cause stale `import` paths to be emitted when
  moving source files within `(include_subdirs ..)` (ocaml/dune#10286, fixes ocaml/dune#9190,
  @anmonteiro)

- Dune file formatting: output utf8 if input is correctly encoded (ocaml/dune#10113,
  fixes ocaml/dune#9728, @moyodiallo)

- Fix expanding dependencies and locks specified in the cram stanza.
  Previously, they would be installed in the context of the cram test, rather
  than the cram stanza itself (ocaml/dune#10165, @rgrinberg)

- Fix bug with `dune exec --watch` where the working directory would always be
  set to the project root rather than the directory where the command was run
  (ocaml/dune#10262, @gridbugs)

- Regression fix: sign executables that are promoted into the source tree
  (ocaml/dune#10263, fixes ocaml/dune#9272, @emillon)

- Fix crash when decoding dune-package for libraries with `(include_subdirs
  qualified)` (ocaml/dune#10269, fixes ocaml/dune#10264, @emillon)

### Changed

- Remove the `--react-to-insignificant-changes` option. (ocaml/dune#10083, @rgrinberg)
Leonidas-from-XIV added a commit to Leonidas-from-XIV/opam-repository that referenced this pull request Apr 3, 2024
CHANGES:

### Added

- Add link flags to to `ocamlmklib` for ctypes stubs (ocaml/dune#8784, @frejsoya)

- Remove some unnecessary limitations in the expansions of percent forms in
  install stanza. For example, the `%{env:..}` form can be used to select files
  to be installed. (ocaml/dune#10160, @rgrinberg)

- Allow artifact expansion percent forms (`%{cma:..}`, `%{cmo:..}`, etc.) in
  more contexts. Previously, they would be randomly forbidden in some fields.
  (ocaml/dune#10169, @rgrinberg)

- Allow `%{inline_tests}` in more contexts (ocaml/dune#10191, @rgrinberg)

- Remove limitations on percent forms in the `(enabled_if ..)` field of
  libraries (ocaml/dune#10250, @rgrinberg)

- Support dialects in `dune describe pp` (ocaml/dune#10283, @emillon)

- Allow defining executables or melange emit stanzas with the same name in the
  same folder under different contexts. (ocaml/dune#10220, @rgrinberg, @jchavarri)

### Fixed

- coq: Delay Coq rule setup checks so OCaml-only packages can build in hybrid
  Coq/OCaml projects when `coqc` is not present. Thanks to @vzaliva for the
  test case and report (ocaml/dune#9845, fixes ocaml/dune#9818, @rgrinberg, @ejgallego)

- Fix conditional source selection with `select` on `bigarray` in OCaml 5
  (ocaml/dune#10011, @moyodiallo)

- melange: fix inconsistency in virtual library implementation. Concrete
  modules within a virtual library can now refer to its virtual modules too
  (ocaml/dune#10051, fixes ocaml/dune#7104, @anmonteiro)

- melange: fix a bug that would cause stale `import` paths to be emitted when
  moving source files within `(include_subdirs ..)` (ocaml/dune#10286, fixes ocaml/dune#9190,
  @anmonteiro)

- Dune file formatting: output utf8 if input is correctly encoded (ocaml/dune#10113,
  fixes ocaml/dune#9728, @moyodiallo)

- Fix expanding dependencies and locks specified in the cram stanza.
  Previously, they would be installed in the context of the cram test, rather
  than the cram stanza itself (ocaml/dune#10165, @rgrinberg)

- Fix bug with `dune exec --watch` where the working directory would always be
  set to the project root rather than the directory where the command was run
  (ocaml/dune#10262, @gridbugs)

- Regression fix: sign executables that are promoted into the source tree
  (ocaml/dune#10263, fixes ocaml/dune#9272, @emillon)

- Fix crash when decoding dune-package for libraries with `(include_subdirs
  qualified)` (ocaml/dune#10269, fixes ocaml/dune#10264, @emillon)

### Changed

- Remove the `--react-to-insignificant-changes` option. (ocaml/dune#10083, @rgrinberg)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants