Enable unicode normalization by default for speech #17017

LeonarddeR · 2024-08-17T17:46:40Z

Link to issue number:

Fixes #16616

Summary of the issue:

In #16521, Unicode normalization was added, but it is disabled by default.

Description of user facing changes

Unicode normalization is now enabled by default for Speech.

Description of development approach

Change default values

Testing strategy:

Ensured that the default is now enabled for speech.

Known issues with pull request:

Unicode normalization is still disabled by default for Braille. For speech, if someone defines symbol pronunciation for a character, it is never normalized, even when normalization is on. For Braille however, we are unable to find out pre-translation whether liblouis knows about a character. It is possible that an unnormalized character translates correctly, whereas the normalized equivalent does not.

The normalization may have unwanted effects in some situations, such as the Portuguese abbreviation "nº" being misread by some synthesizers after normalization, or making it difficult to tell the meaning of some mathematical symbols such as "ⁱ" (superscript i) or "ₙ" (subscript n).

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

Summary by CodeRabbit

New Features
- Unicode normalization is now enabled by default, enhancing text-to-speech output.
- eSpeak NG upgraded to version 1.52-dev, adding support for new languages: Faroese and Xextan.
- Improved support for multi-line braille displays.
Bug Fixes
- Command line options -c/--config-path and --disable-addons now respected during updates.
Documentation
- Updated user guide to reflect the new default setting for Unicode normalization.
Chores
- Enhanced stability of NVDA's Poedit support, increasing the minimum required version to 3.5.

AppVeyorBot · 2024-08-17T18:18:41Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/a2ux3vcqk1gvbk4x/artifacts/output/nvda_snapshot_pr17017-33419,84782cfb.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.2,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 11.3,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 16.7,
FINISH_END 0.1

See test results for failed build of commit 84782cfb4e

LeonarddeR · 2024-08-29T08:42:49Z

@seanbudd Based on #16616 (comment) by @ruifontes, I think we should reconsider this. What do you think?

seanbudd · 2024-08-29T23:35:01Z

@LeonarddeR - we think that we should go ahead, and that normalization is generally preferred behaviour to that example. However, if cases like this begin to stack up, we can reconsider switching it back to disabled by default

user_docs/en/changes.md

seanbudd

Is this ready for review / merge?

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

# Conflicts: # user_docs/en/changes.md

coderabbitai · 2024-09-04T05:43:56Z

Walkthrough

The changes involve updating the NVDA application to enable Unicode normalization by default for speech output. This modification affects the configuration settings, command line options, and documentation, ensuring that Unicode characters are consistently processed unless explicitly disabled by the user.

Changes

File(s)	Change Summary
`source/config/configSpec.py`	Changed `unicodeNormalization` default from "disabled" to "enabled."
`user_docs/en/changes.md`	Documented updates on command line options, eSpeak NG upgrade, and default Unicode normalization.
`user_docs/en/userGuide.md`	Updated documentation to reflect the change in default setting for Unicode normalization.

Assessment against linked issues

Objective	Addressed	Explanation
Enable normalization by default for speech output (#[16616])	✅
Reconsider character navigation behavior with normalization (#[16616])	❌	No implementation for character navigation.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (4)

user_docs/en/changes.md (4)
92-93: Consider clarifying the Unicode Normalization feature.

The description of the new Unicode Normalization feature could be clarified:

Explain what kinds of characters have "a compatible alternative" that can be read by speech synthesizers and braille displays.

Give some examples of what the normalized output would be.

Clarify if this impacts reading by character, word, etc. or just when reading the full text.

This would help users better understand the behavior and benefits of this new feature.

Line range hint 2059-2060: Explain the architectural changes in more detail for add-on developers.

The note about NVDA now being compiled with Visual Studio 2017 and the Windows 10 SDK is very brief. Add-on developers would benefit from more details such as:

Specific versions of Visual Studio 2017 and the Windows 10 SDK that are required.

Any changes required in add-on projects to support the new build environment.

Guidance on when add-ons need to drop support for older NVDA versions.

Consider expanding this note or linking to more detailed documentation for add-on developers to ease the transition.

Line range hint 2466-2468: Example of the new NVDA+d command for annotations is incorrect.

The description states:

For example, when text has a comment and a footnote associated with it.

However, the example command is:

E.g. NVDA+d now cycles through reporting the summary of each annotation target for origins with multiple annotation targets.

The NVDA+d command is not mentioned before this, so introducing it as "NVDA+d now cycles" is confusing.

Rewrite this to introduce and explain the NVDA+d command first:
- E.g. NVDA+d now cycles through reporting the summary of each annotation target for origins with multiple annotation targets.
+ You can press NVDA+d to cycle through reporting the summary of each annotation associated with the current text, such as comments and footnotes.
Line range hint 2999-3002: Clarify the description of the new NVDA notification behavior.

The item about NVDA notification behavior could be clarified:

NVDA will no longer make the user wait while particular messages are spoken. Fixes some crashes/freezes with particular speech synthesizers. (#117)

What kinds of messages did this apply to before?

What is the new behavior - are the messages skipped entirely or spoken non-blocking?

What caused the crashes and freezes in synthesizers?

Providing a bit more detail would help users better understand the change.

CyrilleB79 · 2024-09-04T07:12:54Z

(#16616 (comment)) was ment to be here; copying it below:

Could we add at least in the "Known issues" of this PR what is described in #16616 (comment), i.e. that the normalization may have unwanted effect in some situation / with some characters (e.g. "nº", common indices or exponents in math expressions such as "ⁱ" or "ₙ")

Moreover, I wonder if we could add a third option for normalization which would be "When suitable", forcing normalization in Word equations and other situations where it would be required and keeping it disabled in other cases. But that would be for another PR in any case.

SaschaCowley · 2024-09-05T01:21:50Z

@seanbudd, @Qchristensen what do you think about #17017 (comment)?

seanbudd · 2024-09-05T02:33:48Z

I agree that it should be listed in the known issues section of the PR description. I think a third option for "when suitable" should be proposed in a separate issue

user_docs/en/changes.md

SaschaCowley · 2024-09-05T05:51:13Z

@coderabbitai Please create an issue for adding a "when suitable" option to the unicode normalization setting of NVDA's speech options. Please remember to use the NVDA feature request template, and use the discussion in this PR to inform the reasoning in the issue you create.

coderabbitai · 2024-09-05T05:52:22Z

Tip

For best results, initiate chat on the files or code changes.

@SaschaCowley, the issue for adding a "when suitable" option to the Unicode normalization setting in NVDA's speech options has been created. However, the "feature request" label could not be added because it was not found in the repository. You may want to manually add the correct label if needed.

Analysis chain

@SaschaCowley, I will create the issue for adding a "when suitable" option to the Unicode normalization setting in NVDA's speech options, using the NVDA feature request template and incorporating the discussion from this PR.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Create a new issue using the NVDA feature request template.
gh issue create --title "Add 'When suitable' option to Unicode normalization setting in NVDA's speech options" --body "### Feature Request

**Description:**
Currently, Unicode normalization is either enabled or disabled by default in NVDA's speech options. However, there are situations where normalization may have unwanted effects, such as with certain characters (e.g., 'nº', common indices or exponents in math expressions like 'ⁱ' or 'ₙ'). To address this, we propose adding a third option, 'When suitable,' which would enable normalization only in specific contexts, such as Word equations, where it is required.

**Rationale:**
This feature would provide users with more granular control over Unicode normalization, ensuring that it is applied only when necessary, thereby avoiding potential issues in other contexts. The discussion in PR #17017 highlighted the need for this option, and it was agreed that it should be proposed in a separate issue.

**References:**
- PR #17017: Enable Unicode normalization by default for speech
- Issue #16616: Enable Unicode normalization by default for speech output in NVDA

**Proposed Solution:**
- Add a 'When suitable' option to the Unicode normalization setting in NVDA's speech options.
- Ensure that this option enables normalization only in contexts where it is required, such as Word equations.

**Impact:**
This change would improve the user experience by preventing unwanted normalization effects while still providing the benefits of normalization in appropriate contexts.

**Additional Notes:**
This feature request is based on the discussion in PR #17017 and the related issue #16616." --assignee SaschaCowley --label "feature request"

Length of output: 1790

AppVeyorBot · 2024-09-05T06:39:25Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/ka53ocxh8qei9247/artifacts/output/nvda_snapshot_pr17017-33834,6f8674d1.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.1,
INSTALL_END 1.1,
BUILD_START 0.0,
BUILD_END 12.6,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 17.2,
FINISH_END 0.2

See test results for failed build of commit 6f8674d156

user_docs/en/changes.md

Qchristensen

Just looking at the large Userguide here and that looks fine :)

AppVeyorBot · 2024-09-06T01:42:18Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/k0cu6n5hq4yh7520/artifacts/output/nvda_snapshot_pr17017-33845,e36c0087.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.2,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 12.6,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 17.2,
FINISH_END 0.2

See test results for failed build of commit e36c008753

seanbudd · 2024-09-06T01:50:38Z

@LeonarddeR - this PR seems to break system tests. Can you investigate

LeonarddeR · 2024-09-06T05:44:48Z

I think this is because speech.speech.processText uses rstrip instead of strip. strip is too aggressive for normalization though.
For now, I disabled normalization in the system tests. I don't think system tests have to test normalization, that's more something for unit tests. The way NVDA introduces spaces in speech sequences is really messy and should be investigated at some point, but that's out of scope for this pr and changing things will likely break other things.

XLTechie · 2024-09-06T06:10:54Z

@LeonarddeR

The way NVDA introduces spaces in speech sequences is really messy and should be investigated at some point

Can you open an issue or discussion on that, so it can be tagged for someone to work on in the future, when you're not available to explain the problems you noticed with it? I haven't looked at that in particular, so maybe it's obvious, but one person's "obvious" is often another's "what is he talking about?" :)

AppVeyorBot · 2024-09-06T06:30:35Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/ukteoymhkh2w0nnt/artifacts/output/nvda_snapshot_pr17017-33859,664174e8.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.3,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 12.6,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 18.9,
FINISH_END 0.2

See test results for failed build of commit 664174e83e

Enable unicode normalization by default for speech

b723346

LeonarddeR added this to the 2025.1 milestone Aug 17, 2024

seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Aug 20, 2024

Update changes

4807534

seanbudd reviewed Sep 2, 2024

View reviewed changes

user_docs/en/changes.md Outdated Show resolved Hide resolved

seanbudd reviewed Sep 2, 2024

View reviewed changes

user_docs/en/changes.md Outdated Show resolved Hide resolved

seanbudd reviewed Sep 2, 2024

View reviewed changes

SaschaCowley added the merge-early Merge Early in a developer cycle label Sep 4, 2024

SaschaCowley and others added 2 commits September 4, 2024 15:38

Apply suggestions from code review

d04c74a

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Merge branch 'master' into defaultNormalize

9cc014e

# Conflicts: # user_docs/en/changes.md

SaschaCowley approved these changes Sep 4, 2024

View reviewed changes

SaschaCowley marked this pull request as ready for review September 4, 2024 05:43

SaschaCowley requested review from a team as code owners September 4, 2024 05:43

SaschaCowley requested review from Qchristensen and seanbudd September 4, 2024 05:43

coderabbitai bot reviewed Sep 4, 2024

View reviewed changes

seanbudd reviewed Sep 5, 2024

View reviewed changes

user_docs/en/changes.md Outdated Show resolved Hide resolved

Moved changes to 2025.1

354ba37

SaschaCowley marked this pull request as draft September 5, 2024 05:59

SaschaCowley marked this pull request as ready for review September 5, 2024 06:06

SaschaCowley requested a review from seanbudd September 5, 2024 06:06

seanbudd reviewed Sep 5, 2024

View reviewed changes

user_docs/en/changes.md Outdated Show resolved Hide resolved

SaschaCowley added 2 commits September 6, 2024 10:36

Restored accidentally deleted 2024.4 changes entry

9dd2655

Merge branch 'master' into defaultNormalize

54d3bfa

SaschaCowley requested a review from seanbudd September 6, 2024 00:40

seanbudd approved these changes Sep 6, 2024

View reviewed changes

Qchristensen approved these changes Sep 6, 2024

View reviewed changes

seanbudd marked this pull request as draft September 6, 2024 01:50

Disable unicode normalization in system tests

7e54bcd

seanbudd marked this pull request as ready for review September 6, 2024 05:47

seanbudd merged commit 6610e4b into nvaccess:master Sep 9, 2024
4 checks passed

coderabbitai bot mentioned this pull request Sep 27, 2024

Set default input and translation tables according to NVDA's language #17222

Merged

5 tasks

coderabbitai bot mentioned this pull request Oct 15, 2024

When spelling by line, ensure that only normalized characters report as such #17295

Merged

5 tasks

coderabbitai bot mentioned this pull request Nov 1, 2024

Provide speech symbols file name path on creation #17346

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable unicode normalization by default for speech #17017

Enable unicode normalization by default for speech #17017

LeonarddeR commented Aug 17, 2024 •

edited by SaschaCowley

Loading

AppVeyorBot commented Aug 17, 2024

LeonarddeR commented Aug 29, 2024

seanbudd commented Aug 29, 2024

seanbudd left a comment

coderabbitai bot commented Sep 4, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

coderabbitai bot left a comment

CyrilleB79 commented Sep 4, 2024

SaschaCowley commented Sep 5, 2024

seanbudd commented Sep 5, 2024

SaschaCowley commented Sep 5, 2024

coderabbitai bot commented Sep 5, 2024

AppVeyorBot commented Sep 5, 2024

Qchristensen left a comment

AppVeyorBot commented Sep 6, 2024

seanbudd commented Sep 6, 2024

LeonarddeR commented Sep 6, 2024

XLTechie commented Sep 6, 2024 via email

AppVeyorBot commented Sep 6, 2024

Enable unicode normalization by default for speech #17017

Enable unicode normalization by default for speech #17017

Conversation

LeonarddeR commented Aug 17, 2024 • edited by SaschaCowley Loading

Link to issue number:

Summary of the issue:

Description of user facing changes

Description of development approach

Testing strategy:

Known issues with pull request:

Code Review Checklist:

Summary by CodeRabbit

AppVeyorBot commented Aug 17, 2024

LeonarddeR commented Aug 29, 2024

seanbudd commented Aug 29, 2024

seanbudd left a comment

Choose a reason for hiding this comment

coderabbitai bot commented Sep 4, 2024 • edited Loading

Walkthrough

Changes

Assessment against linked issues

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

CyrilleB79 commented Sep 4, 2024

SaschaCowley commented Sep 5, 2024

seanbudd commented Sep 5, 2024

SaschaCowley commented Sep 5, 2024

coderabbitai bot commented Sep 5, 2024

AppVeyorBot commented Sep 5, 2024

Qchristensen left a comment

Choose a reason for hiding this comment

AppVeyorBot commented Sep 6, 2024

seanbudd commented Sep 6, 2024

LeonarddeR commented Sep 6, 2024

XLTechie commented Sep 6, 2024 via email

AppVeyorBot commented Sep 6, 2024

LeonarddeR commented Aug 17, 2024 •

edited by SaschaCowley

Loading

coderabbitai bot commented Sep 4, 2024 •

edited

Loading