Implement full Unicode 16.0.0 extended grapheme breaking. #719
Includes rule GB9c (based on the Indic_Conjunct_Break property).
This change has a significant cost in size, since the information needed per character no longer fits in 4 bits. The base table is therefore twice as big (one byte per entry rather than half a byte).
The number of states in the state automatons has also increased slightly, but by comparison that is a negligible change.
Tests have been made more thorough, testing not only the tests provided by the Unicode Consortium, but also variants of those with representative characters for each category that are either in or not in the BMP, to test that surrogate pair decoding works correctly.
Tests also check that the created automatons are minimal, in that no state is unreachable and no two states are indistinguishable.
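The minimality check described above can be illustrated with a small sketch. It is written in Python for brevity and is not the PR's Dart test code: the table layout (each state maps to a list of `(next_state, is_break)` pairs, one per input category) and the names `reachable_states`, `indistinguishable_groups` and `is_minimal` are assumptions made for illustration only.

```python
from collections import deque


def reachable_states(transitions, start):
    """Return every state that can be reached from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for target, _is_break in transitions[state]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def indistinguishable_groups(transitions):
    """Group states that no input sequence can tell apart.

    Partition refinement: start with all states in one class and keep
    splitting until states in the same class agree on every output and on
    the class of every target state.
    """
    states = sorted(transitions)
    class_of = {state: 0 for state in states}
    class_count = 1
    while True:
        ids = {}
        refined = {}
        for state in states:
            signature = (
                class_of[state],
                tuple((class_of[t], brk) for t, brk in transitions[state]),
            )
            refined[state] = ids.setdefault(signature, len(ids))
        if len(ids) == class_count:  # No class was split: partition is stable.
            break
        class_of, class_count = refined, len(ids)
    groups = {}
    for state in states:
        groups.setdefault(class_of[state], []).append(state)
    return [group for group in groups.values() if len(group) > 1]


def is_minimal(transitions, start):
    """True if no state is unreachable and no two states are equivalent."""
    dead = set(transitions) - reachable_states(transitions, start)
    return not dead and not indistinguishable_groups(transitions)


# Toy automatons over two input categories (hypothetical data).
minimal = {
    0: [(1, True), (0, True)],
    1: [(1, False), (0, True)],
}
redundant = {
    0: [(1, True), (2, True)],
    1: [(1, False), (0, True)],
    2: [(1, False), (0, True)],  # Behaves exactly like state 1.
}
with_dead_state = {
    0: [(0, True), (0, True)],
    1: [(0, False), (1, True)],  # Nothing ever transitions to state 1.
}

print(is_minimal(minimal, 0))          # True
print(is_minimal(redundant, 0))        # False
print(is_minimal(with_dead_state, 0))  # False
```

A check of this kind catches both ways a generated automaton can be larger than necessary: states that are never entered, and states that could be merged without changing any observable result.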
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.
PR Health
Breaking changes ✔️
Coverage
File | Coverage |
---|---|
pkgs/characters/benchmark/benchmark.dart | 💔 Not covered |
pkgs/characters/lib/characters.dart | 💔 Not covered |
pkgs/characters/lib/src/characters.dart | 💚 100 % |
pkgs/characters/lib/src/characters_impl.dart | 💚 89 % |
pkgs/characters/lib/src/grapheme_clusters/breaks.dart | 💚 97 % |
pkgs/characters/lib/src/grapheme_clusters/constants.dart | 💔 Not covered |
pkgs/characters/lib/src/grapheme_clusters/table.dart | 💚 100 % |
pkgs/characters/tool/benchmark.dart | 💔 Not covered |
pkgs/characters/tool/bin/generate_tables.dart | 💔 Not covered |
pkgs/characters/tool/bin/generate_tests.dart | 💔 Not covered |
pkgs/characters/tool/generate.dart | 💔 Not covered |
pkgs/characters/tool/src/args.dart | 💔 Not covered |
pkgs/characters/tool/src/atsp.dart | 💔 Not covered |
pkgs/characters/tool/src/automaton_builder.dart | 💔 Not covered |
pkgs/characters/tool/src/data_files.dart | 💔 Not covered |
pkgs/characters/tool/src/debug_names.dart | 💚 12 % |
pkgs/characters/tool/src/graph.dart | 💔 Not covered |
pkgs/characters/tool/src/grapheme_category_loader.dart | 💔 Not covered |
pkgs/characters/tool/src/indirect_table.dart | 💔 Not covered |
pkgs/characters/tool/src/list_overlap.dart | 💔 Not covered |
pkgs/characters/tool/src/shared.dart | 💔 Not covered |
pkgs/characters/tool/src/string_literal_writer.dart | 💔 Not covered |
pkgs/characters/tool/src/table_builder.dart | 💔 Not covered |
This check for test coverage is informational (issues shown here will not fail the PR).
This check can be disabled by tagging the PR with `skip-coverage-check`.
API leaks ✔️
The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.
Package | Leaked API symbols |
---|
License Headers ⚠️
// Copyright (c) 2024, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files |
---|
pkgs/characters/lib/src/grapheme_clusters/breaks.dart |
All source files should start with a license header.
This check can be disabled by tagging the PR with `skip-license-check`.
Health check is wrong. The changelog is correct, since the version wasn't changed and the existing changelog didn't list the missing part that is now implemented.
Out of these comments my primary concern is the `return` statements in loops in tests.
Should some of those be `continue` instead of `return`?
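To illustrate why this distinction matters, here is a minimal sketch in Python (not the PR's Dart test code; the function names and test data are hypothetical). A `return` inside a loop ends the whole function, so every later case is silently skipped, while `continue` skips only the current case.

```python
def checked_with_return(cases):
    """Hypothetical test loop that uses `return` to skip a case."""
    checked = []
    for case in cases:
        if not case:
            return checked  # Leaves the function: later cases never run.
        checked.append(case)
    return checked


def checked_with_continue(cases):
    """Same loop, but `continue` skips only the current case."""
    checked = []
    for case in cases:
        if not case:
            continue  # Move on to the next case.
        checked.append(case)
    return checked


cases = ["a", "", "b"]
print(checked_with_return(cases))    # ['a']
print(checked_with_continue(cases))  # ['a', 'b']
```

A test written in the first style still passes, which is what makes the problem easy to miss: it simply checks fewer cases than intended.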
I think I broke |
Add direct test for `isGrahphemeClusterBoundary`.
6158e75 to e382ab4
Remove `StateId` abstraction, just have one number per state.
They could be |
Think I'm done now. Much cleaned up (and nicer, IMO).
Revisions updated by `dart tools/rev_sdk_deps.dart`.
core (https://github.com/dart-lang/core/compare/6af0821..1de8372):
1de83727 2024-11-20 Lasse R.H. Nielsen Implement full Unicode 16.0.0 extended grapheme breaking. (dart-lang/core#719)
dartdoc (https://github.com/dart-lang/dartdoc/compare/f8a55e4..c7f1160):
c7f11603 2024-11-20 Sam Rawlins Fix sidebars via correct web API for anchor href values (dart-lang/dartdoc#3934)
http (https://github.com/dart-lang/http/compare/e37093f..79470d0):
79470d0 2024-11-19 Brian Quinlan Include names in argument lists (dart-lang/http#1408)
shelf (https://github.com/dart-lang/shelf/compare/0bb44cb..a2708cd):
a2708cd 2024-11-21 Devon Carew shorten the issue badges (dart-lang/shelf#456)
Change-Id: Iee20d2300d1bf0e43a57b352b73235ae24fa5e51
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/396960
Auto-Submit: Devon Carew <devoncarew@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>