Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable unicode normalization by default for speech #17017

Merged
merged 8 commits into from
Sep 9, 2024

Conversation

LeonarddeR
Copy link
Collaborator

@LeonarddeR LeonarddeR commented Aug 17, 2024

Link to issue number:

Fixes #16616

Summary of the issue:

In #16521, Unicode normalization was added, but it is disabled by default.

Description of user facing changes

Unicode normalization is now enabled by default for Speech.

Description of development approach

Change default values

Testing strategy:

Ensured that the default is now enabled for speech.

Known issues with pull request:

Unicode normalization is still disabled by default for Braille. For speech, if someone defines symbol pronunciation for a character, it is never normalized, even when normalization is on. For Braille however, we are unable to find out pre-translation whether liblouis knows about a character. It is possible that an unnormalized character translates correctly, whereas the normalized equivalent does not.

The normalization may have unwanted effects in some situations, such as the Portuguese abbreviation "nº" being misread by some synthesizers after normalization, or making it difficult to tell the meaning of some mathematical symbols such as "ⁱ" (superscript i) or "ₙ" (subscript n).

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

Summary by CodeRabbit

  • New Features

    • Unicode normalization is now enabled by default, enhancing text-to-speech output.
    • eSpeak NG upgraded to version 1.52-dev, adding support for new languages: Faroese and Xextan.
    • Improved support for multi-line braille displays.
  • Bug Fixes

    • Command line options -c/--config-path and --disable-addons now respected during updates.
  • Documentation

    • Updated user guide to reflect the new default setting for Unicode normalization.
  • Chores

    • Enhanced stability of NVDA's Poedit support, increasing the minimum required version to 3.5.

@LeonarddeR LeonarddeR added this to the 2025.1 milestone Aug 17, 2024
@AppVeyorBot
Copy link

See test results for failed build of commit 84782cfb4e

@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Aug 20, 2024
@LeonarddeR
Copy link
Collaborator Author

@seanbudd Based on #16616 (comment) by @ruifontes, I think we should reconsider this. What do you think?

@seanbudd
Copy link
Member

@LeonarddeR - we think that we should go ahead, and that normalization is generally preferred behaviour to that example. However, if cases like this begin to stack up, we can reconsider switching it back to disabled by default

user_docs/en/changes.md Outdated Show resolved Hide resolved
user_docs/en/changes.md Outdated Show resolved Hide resolved
Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ready for review / merge?

@SaschaCowley SaschaCowley added the merge-early Merge Early in a developer cycle label Sep 4, 2024
SaschaCowley and others added 2 commits September 4, 2024 15:38
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
# Conflicts:
#	user_docs/en/changes.md
@SaschaCowley SaschaCowley marked this pull request as ready for review September 4, 2024 05:43
@SaschaCowley SaschaCowley requested review from a team as code owners September 4, 2024 05:43
Copy link
Contributor

coderabbitai bot commented Sep 4, 2024

Walkthrough

The changes involve updating the NVDA application to enable Unicode normalization by default for speech output. This modification affects the configuration settings, command line options, and documentation, ensuring that Unicode characters are consistently processed unless explicitly disabled by the user.

Changes

File(s) Change Summary
source/config/configSpec.py Changed unicodeNormalization default from "disabled" to "enabled."
user_docs/en/changes.md Documented updates on command line options, eSpeak NG upgrade, and default Unicode normalization.
user_docs/en/userGuide.md Updated documentation to reflect the change in default setting for Unicode normalization.

Assessment against linked issues

Objective Addressed Explanation
Enable normalization by default for speech output (#[16616])
Reconsider character navigation behavior with normalization (#[16616]) No implementation for character navigation.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (4)
user_docs/en/changes.md (4)

92-93: Consider clarifying the Unicode Normalization feature.

The description of the new Unicode Normalization feature could be clarified:

  • Explain what kinds of characters have "a compatible alternative" that can be read by speech synthesizers and braille displays.
  • Give some examples of what the normalized output would be.
  • Clarify if this impacts reading by character, word, etc. or just when reading the full text.

This would help users better understand the behavior and benefits of this new feature.


Line range hint 2059-2060: Explain the architectural changes in more detail for add-on developers.

The note about NVDA now being compiled with Visual Studio 2017 and the Windows 10 SDK is very brief. Add-on developers would benefit from more details such as:

  • Specific versions of Visual Studio 2017 and the Windows 10 SDK that are required.
  • Any changes required in add-on projects to support the new build environment.
  • Guidance on when add-ons need to drop support for older NVDA versions.

Consider expanding this note or linking to more detailed documentation for add-on developers to ease the transition.


Line range hint 2466-2468: Example of the new NVDA+d command for annotations is incorrect.

The description states:

For example, when text has a comment and a footnote associated with it.

However, the example command is:

E.g. NVDA+d now cycles through reporting the summary of each annotation target for origins with multiple annotation targets.

The NVDA+d command is not mentioned before this, so introducing it as "NVDA+d now cycles" is confusing.

Rewrite this to introduce and explain the NVDA+d command first:

- E.g. NVDA+d now cycles through reporting the summary of each annotation target for origins with multiple annotation targets.
+ You can press NVDA+d to cycle through reporting the summary of each annotation associated with the current text, such as comments and footnotes.

Line range hint 2999-3002: Clarify the description of the new NVDA notification behavior.

The item about NVDA notification behavior could be clarified:

NVDA will no longer make the user wait while particular messages are spoken. Fixes some crashes/freezes with particular speech synthesizers. (#117)

  • What kinds of messages did this apply to before?
  • What is the new behavior - are the messages skipped entirely or spoken non-blocking?
  • What caused the crashes and freezes in synthesizers?

Providing a bit more detail would help users better understand the change.

@CyrilleB79
Copy link
Collaborator

(#16616 (comment)) was ment to be here; copying it below:

Could we add at least in the "Known issues" of this PR what is described in #16616 (comment), i.e. that the normalization may have unwanted effect in some situation / with some characters (e.g. "nº", common indices or exponents in math expressions such as "ⁱ" or "ₙ")

Moreover, I wonder if we could add a third option for normalization which would be "When suitable", forcing normalization in Word equations and other situations where it would be required and keeping it disabled in other cases. But that would be for another PR in any case.

@SaschaCowley
Copy link
Member

@seanbudd, @Qchristensen what do you think about #17017 (comment)?

@seanbudd
Copy link
Member

seanbudd commented Sep 5, 2024

I agree that it should be listed in the known issues section of the PR description. I think a third option for "when suitable" should be proposed in a separate issue

user_docs/en/changes.md Outdated Show resolved Hide resolved
@SaschaCowley
Copy link
Member

@coderabbitai Please create an issue for adding a "when suitable" option to the unicode normalization setting of NVDA's speech options. Please remember to use the NVDA feature request template, and use the discussion in this PR to inform the reasoning in the issue you create.

Copy link
Contributor

coderabbitai bot commented Sep 5, 2024

Tip

For best results, initiate chat on the files or code changes.

@SaschaCowley, the issue for adding a "when suitable" option to the Unicode normalization setting in NVDA's speech options has been created. However, the "feature request" label could not be added because it was not found in the repository. You may want to manually add the correct label if needed.

Analysis chain

@SaschaCowley, I will create the issue for adding a "when suitable" option to the Unicode normalization setting in NVDA's speech options, using the NVDA feature request template and incorporating the discussion from this PR.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Create a new issue using the NVDA feature request template.
gh issue create --title "Add 'When suitable' option to Unicode normalization setting in NVDA's speech options" --body "### Feature Request

**Description:**
Currently, Unicode normalization is either enabled or disabled by default in NVDA's speech options. However, there are situations where normalization may have unwanted effects, such as with certain characters (e.g., 'nº', common indices or exponents in math expressions like 'ⁱ' or 'ₙ'). To address this, we propose adding a third option, 'When suitable,' which would enable normalization only in specific contexts, such as Word equations, where it is required.

**Rationale:**
This feature would provide users with more granular control over Unicode normalization, ensuring that it is applied only when necessary, thereby avoiding potential issues in other contexts. The discussion in PR #17017 highlighted the need for this option, and it was agreed that it should be proposed in a separate issue.

**References:**
- PR #17017: Enable Unicode normalization by default for speech
- Issue #16616: Enable Unicode normalization by default for speech output in NVDA

**Proposed Solution:**
- Add a 'When suitable' option to the Unicode normalization setting in NVDA's speech options.
- Ensure that this option enables normalization only in contexts where it is required, such as Word equations.

**Impact:**
This change would improve the user experience by preventing unwanted normalization effects while still providing the benefits of normalization in appropriate contexts.

**Additional Notes:**
This feature request is based on the discussion in PR #17017 and the related issue #16616." --assignee SaschaCowley --label "feature request"

Length of output: 1790

@SaschaCowley SaschaCowley marked this pull request as draft September 5, 2024 05:59
@SaschaCowley SaschaCowley marked this pull request as ready for review September 5, 2024 06:06
@AppVeyorBot
Copy link

See test results for failed build of commit 6f8674d156

user_docs/en/changes.md Outdated Show resolved Hide resolved
Copy link
Member

@Qchristensen Qchristensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking at the large Userguide here and that looks fine :)

@AppVeyorBot
Copy link

See test results for failed build of commit e36c008753

@seanbudd
Copy link
Member

seanbudd commented Sep 6, 2024

@LeonarddeR - this PR seems to break system tests. Can you investigate

@seanbudd seanbudd marked this pull request as draft September 6, 2024 01:50
@LeonarddeR
Copy link
Collaborator Author

I think this is because speech.speech.processText uses rstrip instead of strip. strip is too aggressive for normalization though.
For now, I disabled normalization in the system tests. I don't think system tests have to test normalization, that's more something for unit tests. The way NVDA introduces spaces in speech sequences is really messy and should be investigated at some point, but that's out of scope for this pr and changing things will likely break other things.

@seanbudd seanbudd marked this pull request as ready for review September 6, 2024 05:47
@XLTechie
Copy link
Collaborator

XLTechie commented Sep 6, 2024 via email

@AppVeyorBot
Copy link

See test results for failed build of commit 664174e83e

@seanbudd seanbudd merged commit 6610e4b into nvaccess:master Sep 9, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. merge-early Merge Early in a developer cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode normalization: Enable by default for speech and reconsider character navigation
7 participants