Describe character combining table layout
eschnett committed Dec 28, 2024
1 parent 384b9ee commit 23ccf2b
Showing 1 changed file with 32 additions and 0 deletions.
32 changes: 32 additions & 0 deletions utf8proc.h
@@ -255,6 +255,38 @@ typedef struct utf8proc_property_struct {
utf8proc_uint16_t uppercase_seqindex;
utf8proc_uint16_t lowercase_seqindex;
utf8proc_uint16_t titlecase_seqindex;
/**
* Character combining table.
*
* The character combining table is formally indexed by two
* characters, the first and second character that might form a
* combining pair. The table entry then contains the combined
* character. Most character pairs cannot be combined. There are
* about 1,000 characters that can be the first character in a
* combining pair, and for most, there are only a handful of
* possible second characters.
*
* The combining table is stored as `utf8proc_uint32_t
* utf8proc_combinations[][2]`. That is, for every character that
* can be a first combining character, it contains a contiguous run
* of pairs `(second combining character, combined character)`.
*
* - `comb_index`: Index of this character's first entry in the
* combining table if it is the first character in a combining
* pair, else 0x3ff
*
* - `comb_length`: Number of table entries for this first character
*
* - `comb_issecond`: As an optimization, we also record whether this
* character is the second combining character in any pair. If
* not, we can skip the table lookup.
*
* A table lookup starts from a given character pair. It first
* checks whether the first character is stored in the table
* (its `comb_index` is not 0x3ff) and whether the second
* character can be a second combining character (its
* `comb_issecond` is set). If so, the `comb_length` table entries
* starting at `comb_index` are checked sequentially for a match.
*/
utf8proc_uint16_t comb_index:10;
utf8proc_uint16_t comb_length:5;
utf8proc_uint16_t comb_issecond:1;
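
For readers of this diff, the lookup described in the comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: `demo_property`, `demo_combinations`, `demo_get_property`, and `demo_combine` are hypothetical names, the table holds only five hand-picked Latin pairs, and plain `unsigned` bit-fields stand in for the `utf8proc_uint16_t` ones. Only the three fields and the `0x3ff` sentinel come from the comment itself.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel from the header comment: "not a first combining character". */
#define DEMO_NO_COMB_INDEX 0x3ff

/* Hypothetical cut-down property record with only the three fields
 * discussed in the comment. */
typedef struct {
  unsigned comb_index:10;
  unsigned comb_length:5;
  unsigned comb_issecond:1;
} demo_property;

/* Toy stand-in for the combining table: each row is
 * (second combining character, combined character). Rows that share a
 * first character are stored next to each other. */
const uint32_t demo_combinations[][2] = {
  {0x0300, 0x00C0}, /* A + combining grave      -> U+00C0 */
  {0x0301, 0x00C1}, /* A + combining acute      -> U+00C1 */
  {0x0302, 0x00C2}, /* A + combining circumflex -> U+00C2 */
  {0x0300, 0x00C8}, /* E + combining grave      -> U+00C8 */
  {0x0301, 0x00C9}, /* E + combining acute      -> U+00C9 */
};

/* Hypothetical property lookup covering just the demo characters. */
demo_property demo_get_property(uint32_t cp) {
  demo_property none = {DEMO_NO_COMB_INDEX, 0, 0};
  demo_property p = none;
  switch (cp) {
    case 0x0041: p.comb_index = 0; p.comb_length = 3; break; /* 'A' */
    case 0x0045: p.comb_index = 3; p.comb_length = 2; break; /* 'E' */
    case 0x0300: case 0x0301: case 0x0302:
      p.comb_issecond = 1; /* only ever a second character here */
      break;
    default: break;
  }
  return p;
}

/* Returns the combined code point, or -1 if the pair does not combine. */
int32_t demo_combine(uint32_t first, uint32_t second) {
  demo_property p1 = demo_get_property(first);
  demo_property p2 = demo_get_property(second);
  unsigned i;

  /* Cheap rejections before touching the table. */
  if (p1.comb_index == DEMO_NO_COMB_INDEX || !p2.comb_issecond) return -1;

  /* Sequential scan over this first character's comb_length entries. */
  for (i = 0; i < p1.comb_length; i++) {
    const uint32_t *entry = demo_combinations[p1.comb_index + i];
    if (entry[0] == second) return (int32_t)entry[1];
  }
  return -1;
}
```

With this toy data, `demo_combine(0x0041, 0x0301)` scans the three entries for `A` and finds the pair for U+00C1, while `demo_combine(0x0045, 0x0302)` passes both quick checks but finds no match among the two entries for `E`. The linear scan is cheap because, as the comment notes, most first characters have only a handful of possible second characters.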