From 97bf8e9ac068b0ca3c9f002a33e06978cef4d4b5 Mon Sep 17 00:00:00 2001
From: Simon Sapin
Date: Tue, 18 Mar 2014 12:19:18 +0800
Subject: [PATCH] `char`: s/character/Unicode scalar value/

Tweak the definition of `char` to use the appropriate Unicode terminology.
---
 src/doc/rust.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/doc/rust.md b/src/doc/rust.md
index 7233288a81328..39b6261553618 100644
--- a/src/doc/rust.md
+++ b/src/doc/rust.md
@@ -3136,8 +3136,12 @@ machine.
 
 The types `char` and `str` hold textual data.
 
-A value of type `char` is a Unicode character,
-represented as a 32-bit unsigned word holding a UCS-4 codepoint.
+A value of type `char` is a [Unicode scalar value](
+http://www.unicode.org/glossary/#unicode_scalar_value)
+(i.e. a code point that is not a surrogate),
+represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
+or 0xE000 to 0x10FFFF range.
+A `[char]` vector is effectively a UCS-4 / UTF-32 string.
 
 A value of type `str` is a Unicode string,
 represented as a vector of 8-bit unsigned bytes holding a sequence of UTF-8 codepoints.
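To illustrate the new definition, here is a minimal sketch that checks the two ranges the patch gives against the standard library. It assumes present-day (post-1.0) Rust, not the 2014 compiler this patch targets; `char::from_u32` and `char::MAX` are real std items, while `is_scalar_value` is a hypothetical helper written only for this example.

```rust
/// Hypothetical helper: true if `v` lies in one of the two ranges the
/// patch gives for `char`, 0x0000..=0xD7FF or 0xE000..=0x10FFFF.
fn is_scalar_value(v: u32) -> bool {
    v <= 0xD7FF || (0xE000..=0x10FFFF).contains(&v)
}

fn main() {
    // A `char` occupies one 32-bit word, and its maximum is U+10FFFF.
    assert_eq!(std::mem::size_of::<char>(), 4);
    assert_eq!(char::MAX as u32, 0x10FFFF);

    // Sample code points: ordinary ones, both ends of the surrogate
    // block (0xD800..=0xDFFF), and one value past the maximum.
    let samples = [0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x1F600, 0x10FFFF, 0x110000];
    for &v in &samples {
        // `char::from_u32` returns `Some` exactly for Unicode scalar values.
        let converted = char::from_u32(v);
        assert_eq!(converted.is_some(), is_scalar_value(v));
        println!("U+{:04X}: {:?}", v, converted);
    }

    // A `Vec<char>` holds one 32-bit scalar value per element (UTF-32),
    // while a `str` holds the same text as UTF-8 bytes.
    let text = "h\u{e9}llo\u{1F600}";
    let utf32: Vec<char> = text.chars().collect();
    assert_eq!(utf32.len(), 6);
    assert_eq!(text.len(), 10); // 1 + 2 + 1 + 1 + 1 + 4 bytes
}
```

Surrogates are excluded because they only have meaning as UTF-16 code units; as the sketch shows, `char::from_u32` rejects them rather than producing a value that could not be encoded as UTF-8.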