Add Uchar module to the standard library. #80

dbuenzli · 2014-07-09T19:01:28Z

As I already made clear in previous discussions on the caml-list, I find that OCaml's current support for Unicode is outstanding (both literally and figuratively).

I don't think introducing a Unicode string data structure and a corresponding syntax for literals would be a good thing to do. If one wanted to do that in a correct and useful way, it would entail importing a good deal of the Unicode processing machinery (e.g. normalization) into the compiler, and I really think it's better to leave that outside the compiler. Unicode processing can perfectly well be left to a set of modularized, external libraries. I also think it's actually a good idea to proceed that way, as libraries are in a better position to evolve with the standard (e.g. newly encoded characters in Unicode standard updates may imply changes to normalization results, which would otherwise entail updates to the compiler).

There is however one thing that I really find missing to get utterly excellent Unicode support in OCaml: an abstract datatype, in the standard library, to represent a Unicode scalar value (by abusing terminology: a Unicode character). A Unicode scalar value is simply an integer in the ranges 0x0000…0xD7FF or 0xE000…0x10FFFF.

Such a data type would allow independent libraries dealing with Unicode characters (e.g. ulex, camomile, uutf, uunf, uucp, uucd) to interchange data without relying on ints, and as such strengthen the abstractions and guarantees a bit: it avoids documentation warnings that the given ints need to be in the above range, and it avoids needless (re)checks when data flows among modules. Well, you get the idea, the basic advantages of data abstraction...

This proposal simply adds such a minimal data type along with a few functions which by themselves don't do much except integrating with the standard library; doing real Unicode processing is left to external libraries, as it should be.
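
To make the shape of the proposal concrete, here is a minimal sketch of such a module. It is not the text of the patch: the name `Uchar_sketch` and the selection of functions are assumptions for illustration, derived only from the scalar value ranges given above.

```ocaml
(* Hypothetical sketch, not the actual patch: an abstract type for
   Unicode scalar values, i.e. integers in 0x0000..0xD7FF or
   0xE000..0x10FFFF. *)
module Uchar_sketch : sig
  type t
  val is_valid : int -> bool
  val of_int : int -> t      (* raises Invalid_argument if out of range *)
  val to_int : t -> int
  val equal : t -> t -> bool
  val compare : t -> t -> int
end = struct
  type t = int

  (* True iff [i] is a scalar value (surrogates are excluded). *)
  let is_valid i =
    (0x0000 <= i && i <= 0xD7FF) || (0xE000 <= i && i <= 0x10FFFF)

  let of_int i =
    if is_valid i then i else invalid_arg "Uchar_sketch.of_int"

  let to_int u = u
  let equal (u : int) (u' : int) = u = u'
  let compare (u : int) (u' : int) = compare u u'
end
```

Because `t` is abstract, the only way to obtain a value is through `of_int`, so every function receiving a `Uchar_sketch.t` can rely on the range invariant without checking it again.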

One question is whether a Pervasives.uchar type equal to Uchar.t should be introduced (not part of this proposal). I don't think it's essential, it could be a nice touch though.

Chris00 · 2014-07-10T09:19:49Z

stdlib/uchar.mli

+
+    @since 4.03 *) 
+
+type t


Should it be abstract or a private int?

Good question. I personally have no problem in doing

match Uchar.to_int u with | 0x000A -> ... ...

I don't know what the dev team's stance is on using so-called "Language extensions" in the stdlib.

You would still need to write Uchar.to_int u or write a coercion if t was defined as private int. Having t = private int is more an optimization: if the compiler knows that Uchar.t is always represented by an immediate value, the code generator can skip calls to caml_modify and/or float array checks.
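
The difference between the two options can be seen in a small sketch. The module names `Abstract` and `Private` are hypothetical and the range checks are left out; the only point is what client code has to write in each case.

```ocaml
(* Hypothetical modules contrasting the two signatures discussed above. *)
module Abstract : sig
  type t
  val of_int : int -> t
  val to_int : t -> int
end = struct
  type t = int
  let of_int i = i   (* range check omitted for brevity *)
  let to_int u = u
end

module Private : sig
  type t = private int
  val of_int : int -> t
end = struct
  type t = int
  let of_int i = i   (* range check omitted for brevity *)
end

(* With the abstract type, the only way back to an int is the accessor. *)
let is_newline_abstract (u : Abstract.t) =
  match Abstract.to_int u with
  | 0x000A -> true
  | _ -> false

(* With [private int], an explicit coercion is enough, but it is still
   required: [match u with 0x000A -> ...] would not type-check. *)
let is_newline_private (u : Private.t) =
  match (u :> int) with
  | 0x000A -> true
  | _ -> false
```

In both cases clients cannot forge a value without going through `of_int`; the private version additionally tells the compiler that the representation is an immediate integer.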

c-cube · 2014-07-10T10:07:43Z

I like this idea of only adding standard types in the compiler library. It makes interoperability much easier and still doesn't require Inria people to support and maintain such complicated things as comprehensive unicode support... I don't see any drawback to this PR.

whitequark · 2014-07-10T18:45:57Z

I think this is an excellent idea!

bobot · 2014-07-13T09:02:12Z

stdlib/uchar.mli

+(** [equal u u'] is [u = u']. *)
+
+val compare : t -> t -> int 
+(** [compare u u'] is [Pervasives.compare u u']. *)


Could you add a hash function? It is just an alias for to_int, but it is useful when applying Hashtbl.Make.

Right, added a hash function.
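
A sketch of the use case behind this request. It assumes the `Uchar` module as it later shipped in the standard library (OCaml 4.03 and later), which provides `type t`, `equal` and `hash`; the table contents and the name `name_of` are made up for illustration.

```ocaml
(* Assumes OCaml >= 4.03, where the stdlib Uchar module provides
   [type t], [equal] and [hash], which is all Hashtbl.Make needs. *)
module Uchar_tbl = Hashtbl.Make (Uchar)

(* Illustrative table from scalar values to character names. *)
let names : string Uchar_tbl.t = Uchar_tbl.create 16

let () =
  Uchar_tbl.replace names (Uchar.of_int 0x00E9) "LATIN SMALL LETTER E WITH ACUTE";
  Uchar_tbl.replace names (Uchar.of_int 0x03A9) "GREEK CAPITAL LETTER OMEGA"

(* Look a scalar value up, returning [None] when it is not in the table. *)
let name_of u = Uchar_tbl.find_opt names u
```

Without `hash` in the interface, each library would have to write its own wrapper module around `to_int` before it could apply the functor.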

bobot · 2014-07-14T10:32:40Z

I agree it is a nice idea to add the abstract datatype in the standard library, and only that. What is the opinion of other unicode ocaml library makers? @yoriyuki @alainfrisch

yoriyuki · 2014-07-14T11:57:09Z

I do not see the point of adding a Uchar module without a standard Unicode string data type and literals. They are needed for precisely the same reason as Uchar: interoperability between Unicode processing libraries. We do not need normalization etc. inside the stdlib.

That said, adding Uchar is a good step toward more satisfactory Unicode support in OCaml. I have only minor comments.

Code points like 0xFFFF are also noncharacters. Should we raise an error for them or not?
Should we mark functions that raise an exception with, say, an _exn suffix? (I know it is a controversial point)

dbuenzli · 2014-07-14T13:20:00Z

On Monday, 14 July 2014 at 12:57, Yoriyuki Yamagata wrote:

I do not see the point of adding a Uchar module without a standard Unicode string data type and literals. They are needed for precisely the same reason as Uchar: interoperability between Unicode processing libraries. We do not need normalization etc. inside the stdlib.

I disagree with that: if you introduce a Unicode string data type and literals, then you most likely also want pattern matching on them. And if you want pattern matching on them you need to take normalization into account, in particular you want to be able to specify in which normalisation form your literal is supposed to be, otherwise it is useless, deceiving and could even be the source of a new class of potential security bugs. Formal unicode string literals without normalisation would be irresponsible IMHO.

It is currently perfectly possible to write unnormalized UTF-8 literals in OCaml which is entirely sufficient for many programs out there and a function away to translate into the representation of your particular library at the cost of a negligible initial runtime cost. Introducing the Uchar module greatly enhances the possibility of modular implementations of Unicode and allows, for example, ulex to talk to uunf with strong invariants guaranteed by the abstraction.

Code points like 0xFFFF are also noncharacters. Should we raise an error for them or not?

I would say no, for the following reasons (references are to the PDF of Unicode 6.2):

Applications are allowed to use noncharacters internally (D12 p. 68 Coded character sequence, bullets 2+3). Also, on page 24 we have:

"Noncharacter code points are reserved for internal use, such as for sentinel values. They should never be interchanged. They do, however, have well-formed representations in Unicode encoding forms and survive conversions between encoding forms. This allows sentinel values to be preserved internally across Unicode encoding forms, even though they are not designed to be used in open interchange."

Applications should not interchange (serialize to UTF-X) noncharacters (D14 p. 68), but stricto sensu these code points may occur in interchange as they do not produce invalid sequences of bytes: the UTF-X are explicitly defined as maps from scalar values to byte code units (see 3.9 p. 89, D79 p. 90), and noncharacters are scalar values. More specifically, on p. 560 we have:

"Applications are free to use any of these noncharacter code points internally but should never attempt to exchange them. If a noncharacter is received in open interchange, an application is not required to interpret it in any way. It is good practice, however, to recognize it as a noncharacter and to take appropriate action, such as replacing it with U+FFFD replacement character, to indicate the problem in the text. It is not recommended to simply delete noncharacter code points from such text, because of the potential security issues caused by deleting uninterpreted characters."

As such it's better if we have a way to represent these characters since UTF-X decoders can then pass them to the application which is then free to take the appropriate context dependent action.
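
A sketch of what this leaves to the application, assuming the stdlib `Uchar` of OCaml 4.03 and later. The function names `is_noncharacter` and `sanitize` are hypothetical, and replacing with U+FFFD is only one of the actions an application might choose.

```ocaml
(* Noncharacters are U+FDD0..U+FDEF plus the last two code points of
   each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF). *)
let is_noncharacter (u : Uchar.t) =
  let i = Uchar.to_int u in
  (0xFDD0 <= i && i <= 0xFDEF) || (i land 0xFFFE = 0xFFFE)

(* Hypothetical policy for text received in open interchange: the
   decoder hands over the scalar value unchanged, and the application
   decides here to replace noncharacters with U+FFFD. *)
let sanitize (u : Uchar.t) =
  if is_noncharacter u then Uchar.of_int 0xFFFD else u
```

Since noncharacters remain representable as `Uchar.t`, a decoder does not have to make this decision on behalf of its caller.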

Should we mark functions that raise an exception with, say, an _exn suffix? (I know it is a controversial point)

I would say no. a) That's for people who like Hungarian notation. b) Remember that Invalid_argument means a programming error: you are not supposed to behave in a way that raises this exception and should not try to catch it, except at the toplevel as part of a general recovery procedure, see [1].
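
In practice this convention means testing before converting rather than catching the exception. A small sketch, assuming the stdlib `Uchar` of OCaml 4.03 and later; the function name `uchar_of_untrusted_int` is hypothetical.

```ocaml
(* Assumes the stdlib Uchar of OCaml >= 4.03. [Uchar.of_int] raises
   Invalid_argument on integers that are not scalar values; the
   convention described above is to test first instead of catching. *)
let uchar_of_untrusted_int (i : int) : Uchar.t option =
  if Uchar.is_valid i then Some (Uchar.of_int i) else None
```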

Best,

Daniel

[1] https://sympa.inria.fr/sympa/arc/caml-list/2007-10/msg00475.html

yoriyuki · 2014-07-14T15:08:39Z

For the latter two points, I now concur. I am not against merging your patch.

For the first point,

I disagree with that: if you introduce a Unicode string data type and literals, then you most likely also want
pattern matching on them. And if you want pattern matching on them you need to take normalization into
account, in particular you want to be able to specify in which normalisation form your literal is supposed to
be, otherwise it is useless, deceiving and could even be the source of a new class of potential security bugs.
Formal unicode string literals without normalisation would be irresponsible IMHO.

If you mean that comparison and pattern matching should always respect canonical equivalence, and that all string literals should be in normal form, then I disagree with you. Code-point comparison has a place, like comparison which is used in binary trees, say, OCaml's Set. String literals in non-normalized form also have their place, for example when passing strings to legacy encodings. Unicode security is a complex issue. Leave it to the programmer; we should be satisfied that the necessary tools are provided by the compiler and libraries.

It is currently perfectly possible to write unnormalized UTF-8 literals in OCaml which is entirely sufficient for
many programs out there and a function away to translate into the representation of your particular library
at the cost of a negligible initial runtime cost.

Using a raw byte string encoded in UTF-8 as an alternative to a proper Unicode string is a troubling tendency. The UTF-8 encoding can be broken, which creates serious security issues. It is much worse than your normalization apocalypse.

But this topic (whether we need a standard Unicode string or not) is not related to your patch. If you want to continue the discussion, let us move to the caml-list.

dbuenzli · 2014-07-14T15:39:57Z

On Monday, 14 July 2014 at 16:08, Yoriyuki Yamagata wrote:

If you mean that comparison and pattern matching should always respect canonical equivalence,

That's exactly not what I said. First I never talked about comparison at all, pattern matching is about equality and what I was precisely suggesting is that the equality you'd like (i.e. the underlying unicode equivalence) depends on context, which is why literals should be able to indicate the normal form you want them to be in, in order to be useful in pattern matching. You could say we want the literal notation without the pattern matching but that would feel odd as this would mismatch all other literal notations we have in the language.

Code-point comparison has a place, like comparison which is used in binary trees, say, OCaml's Set.
Again, never talked about comparison here. Pay attention to the words I use.

Unicode security is a complex issue. Leave it to the programmer; we should be satisfied that the necessary tools are provided by the compiler and libraries.

That's precisely the aim of this proposal.

Using a raw byte string encoded in UTF-8 as an alternative to a proper Unicode string is a troubling tendency. The UTF-8 encoding can be broken, which creates serious security issues.

I don't think so. You are not supposed to, and can't, use them blindly: if you do any processing with them you must have them go through some validating function (which will detect malformed sequences), if only to be able to normalize them so that you can match them against normalized user-provided input.

Best,

Daniel

yoriyuki · 2014-07-14T16:48:54Z

That's exactly not what I said. First I never talked about comparison at all, pattern matching is about
equality and what I was precisely suggesting is that the equality you'd like (i.e. the underlying unicode
equivalence) depends on context, which is why literals should be able to indicate the normal form you want
them to be in, in order to be useful in pattern matching. You could say we want the literal notation without the
pattern matching but that would feel odd as this would mismatch all other literal notations we have in the
language.

Comparison has a broader meaning, which includes equality tests, I think. Although my example of Set uses comparison in the narrow sense, there are plenty of cases where code-point equality tests are used (say, hash tables).

As for pattern matching, code-point comparison is enough. If you need canonical equivalence or something else, you can preprocess the input and make a normal form for literals by hand, or use when clauses.

I don't think so. You are not supposed to, and can't, use them blindly: if you do any processing with them you must have them go through some validating function (which will detect malformed sequences), if only to be able to normalize them so that you can match them against normalized user-provided input.

Of course we must have them validated, but the type system gives no guarantee that such validation has been performed. Having an abstract Unicode string type enforces validation and increases safety.
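
A minimal sketch of this argument. The module `Utf8_string` is hypothetical and is not a stdlib API; it assumes OCaml 4.14 or later for `String.is_valid_utf_8`, which did not exist at the time of this discussion.

```ocaml
(* Hypothetical module, not a stdlib API: a string type whose values
   can only be built from validated UTF-8.
   Assumes OCaml >= 4.14 for String.is_valid_utf_8. *)
module Utf8_string : sig
  type t
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string

  (* The only constructor: malformed byte sequences are rejected here. *)
  let of_string s = if String.is_valid_utf_8 s then Some s else None

  let to_string s = s
end
```

Any function taking a `Utf8_string.t` knows the bytes were validated once, at construction; with a plain `string` that knowledge lives only in documentation.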

dbuenzli · 2014-07-14T18:00:54Z

On Monday, 14 July 2014 at 17:48, Yoriyuki Yamagata wrote:

Comparison has a broader meaning, which includes equality tests, I think. Although my example of Set uses comparison in the narrow sense, there are plenty of cases where code-point equality tests are used (say, hash tables).

I think you are making this discussion more confusing than it should be. Binary comparison which includes binary equality has its uses, especially when you have normalized your inputs including your string literals and you actually know in which normal form they are.

As for pattern matching, code-point comparison is enough. If you need canonical equivalence or something else, you can preprocess the input and make a normal form for literals by hand, or use when clauses.

Well it's enough if you want people to write broken Unicode programs. Making a normal form by hand is certainly painful and when clauses are impossible: you need to normalize the literal constant of the pattern, otherwise you are just acting on variables which you can already perfectly do right now:

let ustr nf s = (* function that validates the UTF-8 encoded s and normalizes to nf *)
let cst = ustr `NFD "Éole"

match ustr `NFD x with
| x when x = cst -> ...

Overall I think that unicode string literals without pattern matching and normalization is just a waste of time for everybody.

Daniel

yoriyuki · 2014-07-14T18:34:19Z

I think you miss my points.

I think you are making this discussion more confusing than it should be. Binary comparison which includes binary equality has its uses, especially when you have normalized your inputs including your string literals and you actually know in which normal form they are.

My point here is that there are cases where binary comparison and equality are enough, or even necessary, without normalization.

The first examples of this kind are data structures which only require a consistent equality or ordering over Unicode strings. The second example is interacting with a legacy encoding which, say, distinguishes Ω (the unit) from the Greek Ω.

Well it's enough if you want people to write broken Unicode programs. Making a normal form by hand is certainly painful and when clauses are impossible: you need to normalize the literal constant of the pattern, otherwise you are just acting on variables which you can already perfectly do right now:

let ustr nf s = (* function that validates the UTF-8 encoded s and normalizes to nf *)
let cst = ustr `NFD "Éole"

match ustr `NFD x with
| x when x = cst -> ...

Overall I think that unicode string literals without pattern matching and normalization is just a waste of time for everybody.

Again, you miss my point. My point is that, by introducing an abstract Unicode string type, we can have the type system enforce that the internal representation of a Unicode string (say, UTF-8) is valid. We need string literals just as a convenience for writing down values of such an abstract data type. We do not need pattern matching for this purpose.

Besides, if you use a UTF-8 encoded byte string to represent a Unicode string, a.[0], a.[1]... are bytes of the UTF-8 encoded string, not the first and second Unicode characters. I think it is conceptually ugly.
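
The indexing behaviour referred to here can be checked directly. A tiny sketch; the variable names are illustrative only.

```ocaml
(* "é" as U+00E9 is encoded in UTF-8 as the two bytes 0xC3 0xA9. *)
let s = "\xC3\xA9"

let byte_length = String.length s   (* counts bytes, not characters *)
let first_byte = Char.code s.[0]    (* a UTF-8 byte, not a character *)
```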

dbuenzli · 2014-07-14T21:14:39Z

My point here is that there are cases where binary comparison and equality are enough, or even necessary, without normalization.

They are certainly not the average case. There may be a few specific cases, or some data sets may give you the illusion that this is the case, until you stumble on a damned decomposed é. Even if you want to deal with something "relatively simple" like latin1 characters it's not going to be enough. Better not to lure programmers into fallacies; they already seem to have a hard enough time understanding all of this. I think you miss both the social and the technical point here.
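
The decomposed é case can be shown with two string literals. This sketch only demonstrates the mismatch; actually normalizing the strings would need an external library and is not shown.

```ocaml
(* Two encodings of the same abstract character "é":
   precomposed U+00E9, and "e" followed by U+0301 COMBINING ACUTE ACCENT. *)
let precomposed = "\xC3\xA9"
let decomposed = "e\xCC\x81"

(* Byte-wise (binary) equality treats them as different strings, although
   they are canonically equivalent. *)
let binary_equal = String.equal precomposed decomposed
```

Here `binary_equal` is `false`, which is the trap a program falls into when it compares user input against a literal without normalizing both to the same form.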

Again, you miss my point. My point is that, by introducing an abstract Unicode string type, we can have the type system enforce that the internal representation of a Unicode string (say, UTF-8) is valid.

I perfectly get that point: it has the same basis as this very proposal, on which we agree. Sure, it would be useful. But it's much more contentious: for example, I expect there would already be disagreement over the actual internal representation (e.g. I would make them immutable arrays of ints, not UTF-8 encoded strings) and over what the minimal support should be (as we have right at the moment). Then, if you want to introduce literals, you will need to hook a UTF-8 decoder into the compiler and find an actual syntax in the very crowded surface syntax of OCaml, and all this for not much gain in my opinion, unless we get pattern matching and normalization, which, unlike what you suggest, are a basic need in most cases to perform correct Unicode processing. I prefer nothing to broken things that will confuse everyone. I prefer small things that improve my coding life to nothing because the change was too invasive.

We need string literals just as a convenience for writing down values of such an abstract data type. We do not need pattern matching for this purpose.

I don't like the idea of having literals on which you cannot pattern match. This is conceptually ugly.

Besides, if you use a UTF-8 encoded byte string to represent a Unicode string, a.[0], a.[1]... are bytes of the UTF-8 encoded string, not the first and second Unicode characters.

As I already said on the caml-list, indexing Unicode characters is worthless in general. From an abstract character point of view, for layout purposes, etc., direct indexing doesn't bring you anything, so I don't really care about that, and in real programs it has never been a problem for me not to have direct indexing. UTF-8 encoded source files/strings may not be a perfect solution, but it works well enough in real programs. Having that as a basis we can move to consolidate it, step by step.

I think it is conceptually ugly.

It's not a concept! It was not made for that… It's a way to move forward. Progress is made in small steps. I'm already glad we don't have the conceptual mess other languages have with their Unicode support. Again, I'd rather have nothing than broken things. The actual literal notation you'd like is a function call away; from a pragmatic point of view I'd say it is not at the moment (if ever) worth pursuing the idea (that is, unless the dev team is willing to commit to some form of useful Unicode string support in the compiler).

chambart · 2014-11-04T10:49:43Z

stdlib/uchar.mli

+(** [to_int u] is [u] as an integer. *)
+
+val is_char : t -> bool 
+(** [is_char u] is [true] iff [u] is a latin1 OCaml character. *) 


It was suggested that this function should be named is_valid, because we don't want to encourage opening this module and Uchar.is_char is ugly.

I don't see the connection to opening the module. Why not another name, but Uchar.is_valid wouldn't make sense at all: we are talking about a function that checks whether [u] can be represented by a char. Maybe is_latin1? That would make it less consistent with Uchar.of_char and Uchar.to_char, but why not. What do you think?

I think the question was rather on is_uchar.

Ah! Makes more sense. OK, I'll rename it.

Oops, sorry for the misleading typo...

mshinwell · 2014-11-04T10:50:31Z

Daniel, in your first comment, you emphasized "in the standard library". Can you provide some more justification for that? (In particular, with the advent of OPAM simplifying the writing of new libraries, could we put this in a "base Unicode" library that the other Unicode libraries all depend on?)

dbuenzli · 2014-11-04T11:38:31Z

On Tuesday, 4 November 2014 at 11:50, Mark Shinwell wrote:

Daniel, in your first comment, you emphasized "in the standard library". Can you provide some more justification for that? (In particular, with the advent of OPAM simplifying the writing of new libraries, could we put this in a "base Unicode" library that the other Unicode libraries all depend on?)

We could of course publish this module separately, but it would be a real maintenance burden (not code-wise, infrastructure-wise) for such a small piece of functionality: 31 loc which are basically cast in stone. In the end every program using some form of Unicode character (and which doesn't these days?) would end up with this tiny package in its dependency list, and the only benefit would be, in my opinion, to introduce noise in the whole infrastructure; e.g. if you take uutf, uucp, ulex or camomile, they don't have any dependencies at the moment. Having it in the standard library is also a better way of enforcing use of that representation for such a fundamental type.

Best,

Daniel

dbuenzli · 2014-11-04T12:12:58Z

Renamed Uchar.is_uchar to Uchar.is_valid.

dbuenzli · 2014-12-06T13:43:31Z

Removed UTF-8 comment as per request.

damiendoligez · 2014-12-08T19:59:19Z

I'm in favor of adding this to the stdlib.

avsm · 2015-02-15T14:41:49Z

Is there anything blocking this from being merged into trunk now? It would be useful to be able to start depending on it, and to put a transitional package into OPAM for older compiler revisions (as we did for bytes).

gasche · 2015-02-15T14:44:02Z

I wouldn't mind merging it if there was a clear consensus in favor, but right now I'm not sure there is -- apparently it wasn't discussed at the last developer meeting? Maybe you could ask other developers for their opinion.

dbuenzli · 2015-04-14T22:03:45Z

It seems this PR goes against the very idea of the stdlib. So let's just close this.

lpw25 · 2015-04-15T01:01:48Z

Reopening. Whilst I appreciate Daniel's frustration, this is a pull request with fairly broad support that I would very much like to see merged.

murmour · 2015-12-02T03:47:36Z

stdlib/uchar.mli

+(** [compare u u'] is [Pervasives.compare u u']. *)
+
+val hash : t -> int
+(** [hash u] associates a non negative integer to [u]. *)


I knew something was wrong with this otherwise stellar pull request: "non negative" should be either "non-negative" or "nonnegative" (in case you find hyphens outrageous). Thank God we caught this early!

Thanks. Dash added.

damiendoligez · 2015-12-04T15:32:28Z

Let's merge it now.

gasche · 2015-12-04T21:03:45Z

@damiendoligez any reason not to merge it yourself?

alainfrisch · 2015-12-09T17:05:45Z

Minor nitpicks: can you add an entry to Changes and update copyright headers to 2015 for new files?

hcarty · 2016-01-06T15:09:15Z

@alainfrisch @damiendoligez If the Changes and copyright changes are holding a merge, I can submit a separate PR with those changes after this gets in.

alainfrisch · 2016-01-06T15:21:26Z

I can submit a separate PR with those changes after this gets in.

That would be very nice of you!

dbuenzli · 2016-01-06T15:28:57Z

I don't see why copyright dates should be changed; they all correspond to the year when the code was written.

alainfrisch · 2016-01-06T15:33:52Z

Yeah ok, what matters is really the Changes file.


Chris00 reviewed Jul 10, 2014

bobot reviewed Jul 13, 2014

chambart reviewed Nov 4, 2014

dbuenzli force-pushed the uchar branch from bd595a3 to b0844db Compare November 4, 2014 12:12

dbuenzli force-pushed the uchar branch from b0844db to 428512d Compare December 6, 2014 13:40

Drup mentioned this pull request Mar 2, 2015

added result type #147

Closed

dbuenzli closed this Apr 14, 2015

murmour reviewed Dec 2, 2015

Add Uchar module to the standard library.

940144f

dbuenzli force-pushed the uchar branch from 74ba533 to 940144f Compare December 2, 2015 14:39

damiendoligez self-assigned this Dec 21, 2015

alainfrisch added a commit that referenced this pull request Jan 6, 2016

Merge pull request #80 from dbuenzli/uchar

4b59df8

Add Uchar module to the standard library.

alainfrisch merged commit 4b59df8 into ocaml:trunk Jan 6, 2016

dbuenzli deleted the uchar branch January 6, 2016 15:43

hcarty mentioned this pull request Jan 6, 2016

Add Changes entry for GPR#80 (Uchar module) #400

Merged

dbuenzli mentioned this pull request Mar 9, 2017

Add Buffer.add_utf_{8,16be,16le}_uchar and Uchar.{bom,rep} #1091

Merged

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 14, 2017

Merge pull request ocaml#80 from ocamllabs/alloc-api

d849db4

Alloc API change (3/4)

hgouraud mentioned this pull request Feb 26, 2018

capitalize_ascii don’t work for accentuated letters geneweb/geneweb#593

Closed

This was referenced Mar 14, 2019

add a UChar module to the standard library #6525

Closed

OCaml SIGSEGV in invert_pointer_at() (OCaml Garbage Collector / Compaction) #7431

Closed

Remove Uchar.dump #7500

Closed

dbuenzli mentioned this pull request Oct 25, 2019

Add String.hash and String.seeded_hash #8878

Merged

nojb mentioned this pull request Jul 15, 2021

Unicode support #10518

Closed

chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Aug 4, 2021

Merge pull request ocaml#80 from mshinwell/fb-backport-pr10205

6c5e154

Backport PR#10205 from upstream

stedolan added a commit to stedolan/ocaml that referenced this pull request Dec 13, 2021

flambda-backend: Merge pull request ocaml#80 from mshinwell/fb-backport-pr10205

d9017ae

Backport PR#10205 from upstream

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Feb 21, 2023

Regenerate parser (ocaml#80)

2b1fa24

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Add opam users (ocaml#80)

1db6cce

hyphenrf mentioned this pull request Jun 17, 2024

Bring Uchar hashing on par with other base types like Int, Char, ... #13240

Merged

dbuenzli commented Jul 9, 2014

c-cube commented Jul 10, 2014

whitequark commented Jul 10, 2014

bobot commented Jul 14, 2014

yoriyuki commented Jul 14, 2014

dbuenzli commented Jul 14, 2014

yoriyuki commented Jul 14, 2014

dbuenzli commented Jul 14, 2014

yoriyuki commented Jul 14, 2014

dbuenzli commented Jul 14, 2014

yoriyuki commented Jul 14, 2014

dbuenzli commented Jul 14, 2014

mshinwell commented Nov 4, 2014

dbuenzli commented Nov 4, 2014

dbuenzli commented Nov 4, 2014

dbuenzli commented Dec 6, 2014

damiendoligez commented Dec 8, 2014

avsm commented Feb 15, 2015

gasche commented Feb 15, 2015

dbuenzli commented Apr 14, 2015

lpw25 commented Apr 15, 2015

damiendoligez commented Dec 4, 2015

gasche commented Dec 4, 2015

alainfrisch commented Dec 9, 2015

hcarty commented Jan 6, 2016

alainfrisch commented Jan 6, 2016

dbuenzli commented Jan 6, 2016

alainfrisch commented Jan 6, 2016