Add Unicode Normalization to speech and braille #16521

LeonarddeR · 2024-05-10T07:04:35Z

Link to issue number:

Summary of the issue:

Several speech synthesizers and braille tables are unable to speak or braille some characters, such as ligatures ("ĳ") or decomposed characters (Latin letters followed by a combining modifier that adds an acute, diaeresis, etc.). Also, italic or bold Unicode characters can't be spoken or brailled by default.
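
For illustration, here is a minimal sketch of what normalization does to the kinds of characters mentioned above, using Python's standard `unicodedata` module. The NFKC form and the helper name `normalize_for_output` are assumptions made for this example; they are not taken from this description.

```python
import unicodedata


def normalize_for_output(text: str, form: str = "NFKC") -> str:
    """Hypothetical helper: normalize text before it reaches a synthesizer or braille table.

    NFKC is an assumption for this sketch.
    """
    return unicodedata.normalize(form, text)


samples = {
    "ligature ij": "\u0133",              # single code point U+0133
    "decomposed e + acute": "e\u0301",    # base letter + combining acute accent
    "mathematical bold b": "\U0001D41B",  # styled letter from the math alphanumerics block
}

for label, text in samples.items():
    normalized = normalize_for_output(text)
    # Show the code points before and after so the change is visible even in plain ASCII logs.
    before = " ".join(f"U+{ord(c):04X}" for c in text)
    after = " ".join(f"U+{ord(c):04X}" for c in normalized)
    print(f"{label}: {before} -> {after}")
```

Under this assumption the ligature becomes two plain letters, the decomposed sequence becomes one precomposed letter, and the styled letter becomes its plain counterpart.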

Description of user facing changes

None by default. If Unicode normalization is enabled for speech, speech output for objects and text navigation is normalized. For braille, normalization is applied to all braille output.

Speech deliberately applies normalization only for objects and text navigation. For individual character navigation or text selection, we really want to pass the original character to the synthesizer. If we don't, Unicode bold and italic characters are read as their normalized counterparts, which makes it impossible to distinguish them. This problem is less relevant when working with braille.

Description of development approach

- Added UnicodeNormalizationOffsetConverter to textUtils, with tests. It stores the normalized version of a given string and, based on diffing, calculates offset mappings between the original and normalized strings.
- When braille is translated with normalization on, the output is processed using UnicodeNormalizationOffsetConverter. The various mappings (braille to raw position, raw to braille position) are adjusted to account for normalization.
- Added normalization to getTextInfoSpeech and getPropertiesSpeech.
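
The diff-based offset mapping can be sketched roughly as follows. This is not the code of UnicodeNormalizationOffsetConverter; `build_offset_map` is a hypothetical name, and the use of `difflib.SequenceMatcher` and of NFKC are assumptions chosen to illustrate the idea of mapping original offsets to normalized offsets.

```python
import difflib
import unicodedata


def build_offset_map(original: str, form: str = "NFKC") -> tuple[str, list[int]]:
    """Hypothetical sketch: return the normalized string and, for every offset
    in the original string, the corresponding offset in the normalized string."""
    normalized = unicodedata.normalize(form, original)
    matcher = difflib.SequenceMatcher(None, original, normalized, autojunk=False)
    mapping: list[int] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: offsets advance one to one.
            mapping.extend(range(j1, j2))
        else:
            # Changed run: point every original offset at the start of the
            # normalized run. For "insert", i1 == i2, so nothing is added.
            mapping.extend([j1] * (i2 - i1))
    return normalized, mapping


# A ligature expands to two letters, so the offset of "s" shifts by one.
normalized, mapping = build_offset_map("\u0133s")
print(normalized, mapping)
```

A mapping like this is what lets a cursor position or routing key that refers to the original text be translated to the normalized text that was actually sent to the braille translator, and back.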

Testing strategy:

Unit tests
Manual testing with braille: checked cursor positioning and cursor routing using several strings from the unit tests.

Known issues with pull request:

None known.

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

Summary by CodeRabbit

New Features
- Added support for Unicode Normalization in speech and braille settings.
- Users can enable Unicode Normalization via the NVDA Settings dialog for both speech and braille.
Documentation
- Updated user guide with a new section on Unicode normalization for speech and braille settings.
- Added details on the benefits of Unicode normalization, including character standardization and compatibility improvements.

Simon818 · 2024-05-12T03:07:58Z

I'm running a build that contains this PR and am receiving an error pretty frequently when trying to navigate with speech normalization enabled.

Could not execute function event_gainFocus defined in appModules.nvda module; kwargs: {}
Traceback (most recent call last):
File "eventHandler.pyc", line 127, in next
File "NVDAObjects\__init__.pyc", line 1306, in event_gainFocus
File "NVDAObjects\__init__.pyc", line 1157, in reportFocus
File "speech\speech.pyc", line 694, in speakObject
File "speech\speech.pyc", line 733, in getObjectSpeech
File "speech\speech.pyc", line 660, in getObjectPropertiesSpeech
File "speech\speech.pyc", line 1975, in getPropertiesSpeech
File "speech\speech.pyc", line 1975, in <listcomp>
File "textUtils.pyc", line 568, in unicodeNormalize
TypeError: normalize() argument 2 must be str, not CharacterModeCommand

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "eventHandler.pyc", line 127, in next
File "appModules\nvda.pyc", line 211, in event_gainFocus
File "eventHandler.pyc", line 134, in next
File "extensionPoints\util.pyc", line 216, in callWithSupportedKwargs
File "NVDAObjects\__init__.pyc", line 1306, in event_gainFocus
File "NVDAObjects\__init__.pyc", line 1157, in reportFocus
File "speech\speech.pyc", line 694, in speakObject
File "speech\speech.pyc", line 733, in getObjectSpeech
File "speech\speech.pyc", line 660, in getObjectPropertiesSpeech
File "speech\speech.pyc", line 1975, in getPropertiesSpeech
File "speech\speech.pyc", line 1975, in <listcomp>
File "textUtils.pyc", line 568, in unicodeNormalize
TypeError: normalize() argument 2 must be str, not CharacterModeCommand

CC @codeofdusk

…nOffsetConverter

LeonarddeR · 2024-05-13T05:53:23Z

It looks like there is an erroneous type annotation in getPropertiesSpeech that misled me. getKeyboardShortcutsSpeech can now also return character mode commands, so getPropertiesSpeech no longer returns only strings.
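
A minimal sketch of the failure mode and the guard, assuming a speech sequence that mixes strings with command objects. The `CharacterModeCommand` class below is a simplified stand-in written for this example, not NVDA's class, and `normalize_sequence` is a hypothetical helper name.

```python
import unicodedata
from typing import Union


class CharacterModeCommand:
    """Simplified stand-in for a non-string item in a speech sequence."""

    def __init__(self, state: bool) -> None:
        self.state = state


SpeechItem = Union[str, CharacterModeCommand]


def normalize_sequence(sequence: list[SpeechItem], form: str = "NFKC") -> list[SpeechItem]:
    """Normalize only the string items; pass commands through untouched."""
    return [
        unicodedata.normalize(form, item) if isinstance(item, str) else item
        for item in sequence
    ]


command = CharacterModeCommand(True)
sequence: list[SpeechItem] = ["\u0133", command, "s"]

# Passing a non-string straight to unicodedata.normalize raises TypeError,
# which is the kind of error shown in the traceback above.
try:
    unicodedata.normalize("NFKC", command)
except TypeError as error:
    print(f"unguarded call failed: {error}")

print(normalize_sequence(sequence))
```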

@Simon818

…a speech sequence with only strings. Thanks @Simon818

LeonarddeR · 2024-05-14T10:15:31Z

@Qchristensen Could you have a look at the proposed documentation changes?

CyrilleB79

Here are my first comments. I may have other comments later since I have not yet tested this PR.

source/gui/settingsDialogs.py

user_docs/en/userGuide.md

CyrilleB79 · 2024-05-14T11:24:16Z

user_docs/en/userGuide.md

+
+1. Decomposition of some ligatures, Including "ĳ" (ligature ij) to their two letter form ("ij").
+
+1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.


Suggested change

1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.

1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.

Could you clarify the impact of this on speech?

The modifiers will be spoken in normalized order rather than in the order in which they appear in the text. For speech, this could be a bit confusing, but that's why this is optional. It has no impact on speech with the Hebrew Vocalizer voice, for example. I'm afraid there is no normalization form where we can disable this behavior. On the other hand, for braille this can be of great benefit, as it requires a braille table to account for the normalized form of the modifiers only.
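
To illustrate the stable ordering, here is a small sketch using Python's `unicodedata` module. The choice of letter and points, the NFKC form and the helper name `canonical` are assumptions for this example only.

```python
import unicodedata

BET = "\u05D1"     # HEBREW LETTER BET
PATAH = "\u05B7"   # HEBREW POINT PATAH
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ


def canonical(text: str) -> str:
    """Hypothetical helper: NFKC is assumed as the normalization form."""
    return unicodedata.normalize("NFKC", text)


# The same letter with the same two points, stored in two different orders.
order_a = BET + DAGESH + PATAH
order_b = BET + PATAH + DAGESH

# The raw strings differ, but normalization reorders the points by their
# canonical combining class, so both end up identical.
print(order_a == order_b)
print(canonical(order_a) == canonical(order_b))

# Combining classes that drive the reordering.
for mark in (PATAH, DAGESH):
    print(unicodedata.name(mark), unicodedata.combining(mark))
```

A braille table then only has to handle one ordering of the points, while a synthesizer receives them in that normalized order regardless of how the text was typed.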

Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>

user_docs/en/userGuide.md

Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>

seanbudd · 2024-05-15T01:12:06Z

Hi - testing code rabbit on this PR for an AI review.
I've tried to adjust it to only pick up serious issues rather than linting.
Apologies for the noise if it picks up junk.

@coderabbitai review

coderabbitai · 2024-05-15T01:14:22Z

@seanbudd, I have identified the issue causing the error reported by Simon818. The getPropertiesSpeech function attempts to normalize all items in the textList, but some items are not strings, leading to a type error. Here is the updated code for the getPropertiesSpeech function to ensure only strings are passed to unicodeNormalize:

def getPropertiesSpeech(  # noqa: C901
        reason: OutputReason = OutputReason.QUERY,
        **propertyValues
) -> SpeechSequence:
    textList: SpeechSequence = []
    name: Optional[str] = propertyValues.get('name')
    if name:
        textList.append(name)
    if 'role' in propertyValues:
        role: controlTypes.Role = propertyValues['role']
        speakRole=True
    elif '_role' in propertyValues:
        speakRole=False
        role: controlTypes.Role = propertyValues['_role']
    else:
        speakRole=False
        role=controlTypes.Role.UNKNOWN
    role = controlTypes.Role(role)
    value: Optional[str] = propertyValues.get('value') if role not in controlTypes.silentValuesForRoles else None
    cellCoordsText: Optional[str] = propertyValues.get('cellCoordsText')
    rowNumber = propertyValues.get('rowNumber')
    columnNumber = propertyValues.get('columnNumber')
    includeTableCellCoords = propertyValues.get('includeTableCellCoords', True)

    if role == controlTypes.Role.CHARTELEMENT:
        speakRole = False
    roleText: Optional[str] = propertyValues.get('roleText')
    if (
        speakRole
        and (
            roleText
            or reason not in (
                OutputReason.SAYALL,
                OutputReason.CARET,
                OutputReason.FOCUS,
                OutputReason.QUICKNAV
            )
            or not (
                name
                or value
                or cellCoordsText
                or rowNumber
                or columnNumber
            )
            or role not in controlTypes.silentRolesOnFocus
        )
        and (
            role != controlTypes.Role.MATH
            or reason not in (
                OutputReason.CARET,
                OutputReason.SAYALL
            )
        )):
        textList.append(roleText if roleText else role.displayString)
    if value:
        textList.append(value)
    states = propertyValues.get('states')
    realStates=propertyValues.get('_states',states)
    negativeStates=propertyValues.get('negativeStates',set())
    if states is not None or negativeStates:
        if states is None:
            states = set()
        labelStates = controlTypes.processAndLabelStates(role, realStates, reason, states, negativeStates)
        textList.extend(labelStates)
    description: Optional[str] = propertyValues.get('description')
    if description:
        textList.append(description)
    keyboardShortcut: Optional[str] = propertyValues.get('keyboardShortcut')
    textList.extend(getKeyboardShortcutsSpeech(keyboardShortcut))
    if includeTableCellCoords and cellCoordsText:
        textList.append(cellCoordsText)
    if cellCoordsText or rowNumber or columnNumber:
        tableID = propertyValues.get("_tableID")
        sameTable = (tableID and tableID == _speechState.oldTableID)
        if tableID and not sameTable:
            _speechState.oldTableID = tableID
        rowSpan = propertyValues.get("rowSpan") or 1
        columnSpan = propertyValues.get("columnSpan") or 1
        if rowNumber and (
            not sameTable
            or rowNumber != _speechState.oldRowNumber
            or rowSpan != _speechState.oldRowSpan
        ):
            rowHeaderText: Optional[str] = propertyValues.get("rowHeaderText")
            if rowHeaderText:
                textList.append(rowHeaderText)
            if includeTableCellCoords and not cellCoordsText: 
                rowNumberTranslation: str = _("row %s") % rowNumber
                textList.append(rowNumberTranslation)
                if rowSpan>1 and columnSpan<=1:
                    rowSpanAddedTranslation: str = _("through {endRow}").format(endRow=rowNumber + rowSpan - 1)
                    textList.append(rowSpanAddedTranslation)
            _speechState.oldRowNumber = rowNumber
            _speechState.oldRowSpan = rowSpan
        if columnNumber and (
            not sameTable
            or columnNumber != _speechState.oldColumnNumber
            or columnSpan != _speechState.oldColumnSpan
        ):
            columnHeaderText: Optional[str] = propertyValues.get("columnHeaderText")
            if columnHeaderText:
                textList.append(columnHeaderText)
            if includeTableCellCoords and not cellCoordsText:
                colNumberTranslation: str = _("column %s") % columnNumber
                textList.append(colNumberTranslation)
                if columnSpan>1 and rowSpan<=1:
                    colSpanAddedTranslation: str = _("through {endCol}").format(endCol=columnNumber + columnSpan - 1)
                    textList.append(colSpanAddedTranslation)
            _speechState.oldColumnNumber = columnNumber
            _speechState.oldColumnSpan = columnSpan
        if includeTableCellCoords and not cellCoordsText and rowSpan>1 and columnSpan>1:
            rowColSpanTranslation: str = _("through row {row} column {column}").format(
                row=rowNumber + rowSpan - 1,
                column=columnNumber + columnSpan - 1
            )
            textList.append(rowColSpanTranslation)
    rowCount=propertyValues.get('rowCount',0)
    columnCount=propertyValues.get('columnCount',0)
    rowAndColumnCountText = _rowAndColumnCountText(rowCount, columnCount)
    if rowAndColumnCountText:
        textList.append(rowAndColumnCountText)
    if rowCount or columnCount:
        _speechState.oldTableID = None

    isCurrent = propertyValues.get('current', controlTypes.IsCurrent.NO)
    if isCurrent != controlTypes.IsCurrent.NO:
        textList.append(isCurrent.displayString)

    hasDetails = propertyValues.get('hasDetails', False)
    if hasDetails:
        detailsRoles: _AnnotationRolesT = propertyValues.get("detailsRoles", tuple())
        if detailsRoles:
            roleStrings = (role.displayString if role else _("details") for role in detailsRoles)
            for roleString in roleStrings:
                textList.append(
                    _("has %s") % roleString
                )
        else:
            textList.append(
                _("has details")
            )

    placeholder: Optional[str] = propertyValues.get('placeholder', None)
    if placeholder:
        textList.append(placeholder)
    indexInGroup=propertyValues.get('positionInfo_indexInGroup',0)
    similarItemsInGroup=propertyValues.get('positionInfo_similarItemsInGroup',0)
    if 0<indexInGroup<=similarItemsInGroup:
        itemPosTranslation: str = _("{number} of {total}").format(
            number=indexInGroup,
            total=similarItemsInGroup
        )
        textList.append(itemPosTranslation)
    if 'positionInfo_level' in propertyValues:
        level=propertyValues.get('positionInfo_level',None)
        role=propertyValues.get('role',None)
        if level is not None:
            levelTranslation: str = _('level %s') % level
            if (
                role in (controlTypes.Role.TREEVIEWITEM, controlTypes.Role.LISTITEM)
                and level != _speechState.oldTreeLevel
            ):
                textList.insert(0, levelTranslation)
                _speechState.oldTreeLevel = level
            else:
                textList.append(levelTranslation)
    
    errorMessage: str | None = propertyValues.get("errorMessage", None)
    if errorMessage:
        textList.append(errorMessage)
    if config.conf["speech"]["unicodeNormalization"]:
        textList = [
            unicodeNormalize(t) if isinstance(t, str) else t
            for t in textList
        ]
    types.logBadSequenceTypes(textList)
    return textList

This update ensures that only strings are passed to unicodeNormalize, preventing the type error.

Analysis chain

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the definition of the getPropertiesSpeech function in the codebase.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n    $$$' 

# Search for the usage of the unicodeNormalize function in the codebase.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n    $$$' 

# Search for the CharacterModeCommand class or type in the codebase.
ast-grep --lang python --pattern $'class CharacterModeCommand($_):\n    $$$'

Length of output: 1517

Script:

#!/bin/bash
# Search for the definition of the getPropertiesSpeech function in the speech module.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n    $$$' source/speech

# Search for the definition of the unicodeNormalize function in the textUtils module.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n    $$$' source/textUtils

Length of output: 239

Script:

#!/bin/bash
# Locate the textUtils module in the repository.
fd textUtils.py

# Locate the getPropertiesSpeech function in the repository.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n    $$$'

Length of output: 140

Script:

#!/bin/bash
# Search for the definition of the unicodeNormalize function in the source/textUtils.py file.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n    $$$' source/textUtils.py

# Locate the getPropertiesSpeech function or method in the repository.
rg 'def getPropertiesSpeech' --glob '*.py'

Length of output: 198

Script:

#!/bin/bash
# Display the implementation of the unicodeNormalize function in source/textUtils.py.
sed -n '/def unicodeNormalize/,/^$/p' source/textUtils.py

# Display the implementation of the getPropertiesSpeech function in source/speech/speech.py.
sed -n '/def getPropertiesSpeech/,/^$/p' source/speech/speech.py

Length of output: 47901

Actions Performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2024-05-15T01:14:31Z

Walkthrough

The recent changes introduce Unicode normalization to NVDA's speech and braille output, enhancing the handling of complex characters like ligatures and composed characters. This feature can be enabled via the settings dialog for both speech and braille, ensuring compatibility with various synthesizers and braille tables.

Changes

| File Path | Change Summary |
|---|---|
| `source/braille.py` | Added Unicode normalization handling for braille translation, including offset processing. |
| `source/config/configSpec.py` | Introduced a `unicodeNormalization` feature flag for different sections, including speech and braille. |
| `source/gui/settingsDialogs.py` | Added `unicodeNormalizationCombo` control for managing Unicode normalization settings in the GUI. |
| `source/speech/speech.py` | Integrated Unicode normalization in speech processing based on configuration settings. |
| `source/textUtils.py` | Added utilities for Unicode normalization, including `UnicodeNormalizationOffsetConverter` class. |
| `tests/unit/test_textUtils.py` | Added tests for `UnicodeNormalizationOffsetConverter` to verify normalization scenarios. |
| `user_docs/en/changes.md` | Documented the addition of Unicode normalization support for speech and braille. |
| `user_docs/en/userGuide.md` | Added sections explaining Unicode normalization settings for speech and braille. |

Assessment against linked issues

| Objective | Addressed | Explanation |
|---|---|---|
| Add optional Unicode normalization before passing strings to speech or braille (#16466) | ✅ | |
| Ensure normalization handles ligatures and composed characters correctly (#16466) | ✅ | |
| Provide settings to enable/disable Unicode normalization for both speech and braille (#16466) | ✅ | |


coderabbitai

Actionable comments posted: 12

Out of diff range and nitpick comments (78)

source/speech/speech.py (71)
Line range hint 95-102: Undefined name pgettext. Ensure pgettext is imported or defined.
+ from gettext import pgettext

@property
def _displayStringLabels(self) -> dict[Self, str]:
    return {
        # Translators: Name of the speech mode which disables speech output.
        self.off: pgettext("speechModes", "off"),
        # Translators: Name of the speech mode which will cause NVDA to beep instead of speaking.
        self.beeps: pgettext("speechModes", "beeps"),
        # Translators: Name of the speech mode which causes NVDA to speak normally.
        self.talk: pgettext("speechModes", "talk"),
        # Translators: Name of the on-demand speech mode,
        # in which NVDA only speaks in response to commands that report content.
        self.onDemand: pgettext("speechModes", "on-demand"),
    }
Line range hint 204-204: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 345-345: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 385-385: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 923-923: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1013-1013: Multiple statements on one line. Separate them for better readability.
- if sectionBreakType == "0" : text=_("continuous section break")
+ if sectionBreakType == "0":
+     text = _("continuous section break")
Line range hint 1018-1018: Multiple statements on one line. Separate them for better readability.
- elif sectionBreakType == "1" : text=_("new column section break")
+ elif sectionBreakType == "1":
+     text = _("new column section break")
Line range hint 1084-1084: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1103-1103: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1128-1128: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
Line range hint 1202-1202: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1209-1209: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1212-1212: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1672-1672: Multiple statements on one line. Separate them for better readability.
- if not curLanguage or (not autoDialectSwitching and curLanguage.split('_')[0]==defaultLanguageRoot): curLanguage=defaultLanguage
+ if not curLanguage or (not autoDialectSwitching and curLanguage.split('_')[0] == defaultLanguageRoot):
+     curLanguage = defaultLanguage
Line range hint 1707-1707: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1876-1876: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1880-1880: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1894-1894: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1898-1898: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1905-1905: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1929-1929: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1934-1934: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1939-1939: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1951-1951: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1961-1961: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1990-1990: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 1996-1996: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
Line range hint 1999-1999: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
Line range hint 2004-2004: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
Line range hint 2014-2014: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
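Line range hint 2050-2050: Negated subset test. On sets, <= is a subset check rather than a membership test, so rewriting it with not in would change behavior; an explicit issubset call keeps the semantics.
- if not (set(t) <= LINE_END_CHARS)
+ if not set(t).issubset(LINE_END_CHARS)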
Line range hint 2207-2207: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
Line range hint 2349-2349: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2415-2415: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2422-2422: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2436-2436: Comparison to None should be cond is None.
- if oldTableInfo == None:
+ if oldTableInfo is None:
Line range hint 2441-2441: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2446-2446: Undefined name ngettext. Ensure ngettext is imported or defined.
+ from gettext import ngettext
Line range hint 2450-2450: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2456-2456: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2458-2458: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2460-2460: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2462-2462: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2464-2464: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2470-2470: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2488-2488: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2498-2498: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2502-2502: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2512-2512: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2543-2543: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2548-2548: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2554-2554: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2558-2558: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2565-2565: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2566-2566: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2573-2573: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2581-2581: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2583-2583: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2589-2589: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2591-2591: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2598-2598: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2601-2601: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2609-2609: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2611-2611: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2620-2620: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2622-2622: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2630-2630: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2632-2632: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2639-2639: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2641-2641: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
Line range hint 2648-2648: Undefined name _. Ensure _ is imported or defined.
+ from gettext import gettext as _
source/braille.py (1)
Line range hint 582-582: Add missing import for _ to handle translations.
+ from gettext import gettext as _
source/gui/settingsDialogs.py (6)
Line range hint 4640-4640: Remove unnecessary f-string.

The f-string on this line does not contain any placeholders and can be simplified to a regular string.
- f"Current sizerDict: {self.sizerDict!r}"
+ "Current sizerDict: {!r}".format(self.sizerDict)
Line range hint 4773-4773: Ensure pgettext is defined before usage.

The pgettext function is used for translations but is not defined or imported in this context. Ensure that pgettext is imported from the appropriate module, typically gettext.
from gettext import pgettext
Line range hint 4952-4953: Ensure _ is defined before usage.

The _ function is used for translations but is not defined or imported in this context. Ensure that _ is imported from the appropriate module, typically gettext.
from gettext import gettext as _
Line range hint 79-79: Remove unused import.

The keyLabels import is not used anywhere in the file and should be removed to clean up the code.
- import keyLabels
Line range hint 108-108: Avoid multiple statements on one line.

Multiple statements on one line can reduce code readability. Consider splitting them into separate lines.
- except RuntimeError: updateCheck = None
+ except RuntimeError:
+     updateCheck = None
Line range hint 425-425: Ensure _ is defined before usage.

The _ function is used for translations but is not defined or imported in this context. Ensure that _ is imported from the appropriate module, typically gettext.
from gettext import gettext as _

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2d86502 and 8d0f95e.

Files selected for processing (8)

source/braille.py (2 hunks)
source/config/configSpec.py (2 hunks)
source/gui/settingsDialogs.py (4 hunks)
source/speech/speech.py (4 hunks)
source/textUtils.py (4 hunks)
tests/unit/test_textUtils.py (3 hunks)
user_docs/en/changes.md (1 hunks)
user_docs/en/userGuide.md (2 hunks)

Additional Context Used

Ruff (547)

source/braille.py (114)

8-8: os imported but unused

81-81: Undefined name _

84-84: Undefined name _

87-87: Undefined name _

90-90: Undefined name _

93-93: Undefined name _

96-96: Undefined name _

99-99: Undefined name _

102-102: Undefined name _

105-105: Undefined name _

108-108: Undefined name _

111-111: Undefined name _

114-114: Undefined name _

117-117: Undefined name _

120-120: Undefined name _

123-123: Undefined name _

126-126: Undefined name _

129-129: Undefined name _

132-132: Undefined name _

135-135: Undefined name _

138-138: Undefined name _

141-141: Undefined name _

144-144: Undefined name _

147-147: Undefined name _

150-150: Undefined name _

153-153: Undefined name _

159-159: Undefined name _

162-162: Undefined name _

165-165: Undefined name _

168-168: Undefined name _

171-171: Undefined name _

174-174: Undefined name _

177-177: Undefined name _

180-180: Undefined name _

183-183: Undefined name _

186-186: Undefined name _

189-189: Undefined name _

192-192: Undefined name _

195-195: Undefined name _

198-198: Undefined name _

201-201: Undefined name _

204-204: Undefined name _

207-207: Undefined name _

210-210: Undefined name _

212-212: Undefined name _

214-214: Undefined name _

216-216: Undefined name _

218-218: Undefined name _

220-220: Undefined name _

222-222: Undefined name _

224-224: Undefined name _

226-226: Undefined name _

228-228: Undefined name _

230-230: Undefined name _

232-232: Undefined name _

237-237: Undefined name _

247-247: Undefined name _

249-249: Undefined name _

251-251: Undefined name _

253-253: Undefined name _

255-255: Undefined name _

257-257: Undefined name _

259-259: Undefined name _

261-261: Undefined name _

263-263: Undefined name _

265-265: Undefined name _

267-267: Undefined name _

269-269: Undefined name _

271-271: Undefined name _

273-273: Undefined name _

275-275: Undefined name _

281-281: Undefined name _

292-292: Undefined name pgettext

294-294: Undefined name pgettext

296-296: Undefined name pgettext

298-298: Undefined name pgettext

300-300: Undefined name pgettext

302-302: Undefined name pgettext

304-304: Undefined name pgettext

310-310: Undefined name _

312-312: Undefined name _

314-314: Undefined name _

346-346: Undefined name _

350-350: Undefined name _

354-354: Undefined name _

364-364: Undefined name _

376-376: Undefined name _

380-380: Undefined name _

432-432: Do not use bare except

582-582: Undefined name _

592-592: Undefined name _

630-630: Undefined name _

636-636: Undefined name _

681-681: Undefined name _

685-685: Undefined name _

691-691: Undefined name _

695-695: Undefined name _

705-705: Undefined name _

709-709: Undefined name _

918-918: Undefined name _

1069-1069: Undefined name _

1082-1082: Undefined name _

1085-1085: Undefined name _

1088-1088: Undefined name _

1096-1096: Undefined name _

1133-1133: Do not use bare except

2433-2433: Do not use bare except

2844-2844: Do not use bare except

3194-3194: Undefined name _

3207-3207: Undefined name _

3217-3217: Undefined name _

3369-3369: Undefined name _

3407-3407: Undefined name _

3414-3414: Undefined name _

source/gui/settingsDialogs.py (322)

79-79: keyLabels imported but unused

108-108: Multiple statements on one line (colon)

425-425: Undefined name _

437-437: Undefined name _

494-494: Multiple statements on one line (colon)

545-545: Undefined name _

771-771: Undefined name _

775-775: Undefined name _

777-777: Undefined name _

779-779: Undefined name _

781-781: Undefined name _

783-783: Undefined name _

804-804: Undefined name _

810-810: Undefined name _

825-825: Undefined name _

833-833: Undefined name _

839-839: Undefined name _

847-847: Undefined name _

864-864: Undefined name _

876-876: Undefined name _

886-886: Undefined name _

902-902: Undefined name _

911-911: Undefined name _

920-920: Undefined name _

933-933: Undefined name _

942-942: Undefined name _

949-949: Undefined name _

952-952: Undefined name _

962-962: Undefined name _

964-964: Undefined name _

969-969: Do not use bare except

977-977: Undefined name _

977-977: Undefined name _

980-980: Undefined name _

980-980: Undefined name _

1002-1002: Undefined name _

1002-1002: Undefined name _

1024-1024: Undefined name _

1028-1028: Undefined name _

1032-1032: Undefined name _

1037-1037: Undefined name _

1054-1054: Undefined name _

1060-1060: Undefined name _

1082-1082: Undefined name _

1137-1137: Undefined name _

1145-1145: Undefined name _

1163-1163: Do not use bare except

1518-1518: Undefined name _

1544-1544: Undefined name _

1558-1558: Undefined name _

1569-1569: Undefined name _

1585-1585: Undefined name _

1593-1593: Undefined name _

1603-1603: Undefined name _

1630-1630: Undefined name _

1644-1644: Undefined name _

1655-1655: Undefined name _

1669-1669: Undefined name _

1683-1683: Undefined name _

1697-1697: Undefined name _

1740-1740: Undefined name _

1747-1747: Undefined name _

1762-1762: Undefined name _

1765-1765: Undefined name _

1774-1774: Undefined name _

1781-1781: Undefined name _

1790-1790: Do not use bare except

1795-1795: Undefined name _

1808-1808: Undefined name _

1818-1818: Undefined name _

1825-1825: Undefined name _

1832-1832: Undefined name _

1839-1839: Undefined name _

1847-1847: Undefined name _

1854-1854: Undefined name _

1861-1861: Undefined name _

1870-1870: Undefined name _

1881-1881: Undefined name _

1884-1884: Undefined name _

1907-1907: Undefined name _

1915-1915: Undefined name _

1922-1922: Undefined name _

1929-1929: Undefined name _

1937-1937: Do not use bare except

1943-1943: Undefined name _

1952-1952: Undefined name _

1959-1959: Undefined name _

1966-1966: Undefined name _

1985-1985: Undefined name _

1991-1991: Undefined name _

1997-1997: Undefined name _

2003-2003: Undefined name _

2009-2009: Undefined name _

2023-2023: Undefined name _

2029-2029: Undefined name _

2035-2035: Undefined name _

2041-2041: Undefined name _

2050-2050: Undefined name _

2059-2059: Undefined name _

2077-2077: Undefined name _

2085-2085: Undefined name _

2091-2091: Undefined name _

2095-2095: Undefined name _

2099-2099: Undefined name _

2103-2103: Undefined name _

2116-2116: Undefined name _

2123-2123: Undefined name _

2130-2130: Undefined name _

2137-2137: Undefined name _

2144-2144: Undefined name _

2151-2151: Undefined name _

2158-2158: Undefined name _

2171-2171: Undefined name _

2181-2181: Undefined name _

2191-2191: Undefined name _

2213-2213: Undefined name _

2220-2220: Undefined name _

2229-2229: Undefined name _

2238-2238: Undefined name _

2245-2245: Undefined name _

2252-2252: Undefined name _

2259-2259: Undefined name _

2266-2266: Undefined name _

2276-2276: Undefined name _

2286-2286: Undefined name _

2296-2296: Undefined name _

2306-2306: Undefined name _

2336-2336: Undefined name _

2340-2340: Undefined name _

2349-2349: Undefined name _

2357-2357: Undefined name _

2363-2363: Undefined name _

2369-2369: Undefined name _

2375-2375: Undefined name _

2385-2385: Undefined name _

2391-2391: Undefined name _

2401-2401: Undefined name _

2407-2407: Undefined name _

2413-2413: Undefined name _

2421-2421: Undefined name _

2427-2427: Undefined name _

2433-2433: Undefined name _

2439-2439: Undefined name _

2445-2445: Undefined name _

2453-2453: Undefined name _

2459-2459: Undefined name _

2467-2467: Undefined name _

2481-2481: Undefined name _

2493-2493: Undefined name _

2500-2500: Undefined name _

2507-2507: Undefined name _

2513-2513: Undefined name _

2521-2521: Undefined name _

2528-2528: Undefined name _

2536-2536: Undefined name _

2544-2544: Undefined name _

2552-2552: Undefined name _

2560-2560: Undefined name _

2565-2565: Undefined name _

2570-2570: Undefined name _

2575-2575: Undefined name _

2580-2580: Undefined name _

2586-2586: Undefined name _

2592-2592: Undefined name _

2598-2598: Undefined name _

2603-2603: Undefined name _

2609-2609: Undefined name _

2614-2614: Undefined name _

2619-2619: Undefined name _

2673-2673: Undefined name _

2679-2679: Undefined name _

2696-2696: Undefined name _

2699-2699: Undefined name _

2706-2706: Undefined name _

2714-2714: Undefined name _

2719-2719: Undefined name _

2729-2729: Undefined name _

2743-2743: Undefined name _

2751-2751: Undefined name _

2762-2762: Undefined name _

2776-2776: Undefined name _

2797-2797: Undefined name _

2818-2818: Undefined name _

2826-2826: Undefined name _

2888-2888: Undefined name _

2891-2891: Undefined name _

2899-2899: Undefined name _

2912-2912: Undefined name _

2919-2919: Undefined name _

2925-2925: Undefined name _

2937-2937: Undefined name _

2955-2955: Undefined name _

2965-2965: Undefined name _

2996-2996: Undefined name _

3004-3004: Undefined name _

3017-3017: Undefined name _

3027-3027: Undefined name _

3036-3036: Undefined name _

3041-3041: Undefined name _

3045-3045: Undefined name _

3049-3049: Undefined name _

3075-3075: Undefined name pgettext

3083-3083: Undefined name pgettext

3085-3085: Undefined name pgettext

3087-3087: Undefined name pgettext

3089-3089: Undefined name pgettext

3098-3098: Undefined name _

3106-3106: Undefined name _

3111-3111: Undefined name _

3114-3114: Undefined name _

3118-3118: Undefined name _

3137-3137: Undefined name pgettext

3146-3146: Undefined name pgettext

3148-3148: Undefined name pgettext

3150-3150: Undefined name pgettext

3152-3152: Undefined name pgettext

3160-3160: Undefined name _

3171-3171: Undefined name _

3180-3180: Undefined name _

3187-3187: Undefined name _

3196-3196: Undefined name _

3203-3203: Undefined name _

3212-3212: Undefined name _

3219-3219: Undefined name _

3226-3226: Undefined name _

3237-3237: Undefined name _

3242-3242: Undefined name _

3246-3246: Undefined name _

3250-3250: Undefined name _

3270-3270: Undefined name _

3282-3282: Undefined name _

3290-3290: Undefined name _

3293-3293: Undefined name _

3296-3296: Undefined name _

3300-3300: Undefined name _

3316-3316: Undefined name _

3322-3322: Undefined name _

3333-3333: Undefined name _

3340-3340: Undefined name _

3353-3353: Undefined name _

3361-3361: Undefined name _

3374-3374: Undefined name _

3376-3376: Local variable audioBox is assigned to but never used

3381-3381: Undefined name _

3392-3392: Undefined name _

3416-3416: Undefined name _

3434-3434: Undefined name _

3437-3437: Undefined name pgettext

3439-3439: Undefined name pgettext

3448-3448: Undefined name _

3468-3468: Undefined name _

3583-3583: Undefined name _

3588-3588: Undefined name _

3590-3590: Undefined name _

3617-3617: Undefined name _

3630-3630: Undefined name _

3670-3670: Undefined name _

3677-3677: Undefined name _

3692-3692: Undefined name _

3746-3746: Undefined name _

3755-3755: Undefined name _

3761-3761: Undefined name _

3770-3770: Undefined name _

3804-3804: Do not use bare except

3875-3875: Undefined name _

3878-3878: Undefined name _

3912-3912: Undefined name _

3921-3921: Do not use bare except

3930-3930: Undefined name _

3938-3938: Do not use bare except

3946-3946: Undefined name _

3962-3962: Undefined name _

3970-3970: Undefined name _

3979-3979: Undefined name _

3990-3990: Undefined name _

4010-4010: Undefined name _

4020-4020: Do not use bare except

4026-4026: Undefined name _

4036-4036: Do not use bare except

4045-4045: Undefined name _

4063-4063: Undefined name _

4079-4079: Undefined name _

4098-4098: Undefined name _

4116-4116: Undefined name _

4124-4124: Undefined name _

4135-4135: Do not use bare except

4140-4140: Undefined name _

4155-4155: Undefined name _

4161-4161: Undefined name _

4172-4172: Undefined name _

4246-4246: Undefined name _

4253-4253: Undefined name _

4259-4259: Undefined name _

4276-4276: Undefined name _

4281-4281: Undefined name _

4290-4290: Undefined name _

4380-4380: Undefined name _

4384-4384: Undefined name _

4557-4557: Undefined name _

4564-4564: Undefined name _

4610-4610: Undefined name _

4613-4613: Undefined name _

4640-4640: f-string without any placeholders

4653-4653: Undefined name _

4690-4690: Undefined name _

4726-4726: Undefined name _

4731-4731: Undefined name _

4759-4759: Undefined name _

4773-4773: Undefined name pgettext

4782-4782: Undefined name _

4792-4792: Undefined name _

4794-4794: Undefined name _

4796-4796: Undefined name _

4799-4799: Undefined name _

4803-4803: Undefined name _

4820-4820: Undefined name _

4829-4829: Undefined name _

4836-4836: Undefined name _

4844-4844: Undefined name _

4847-4847: Undefined name _

4952-4952: Undefined name _

4953-4953: Undefined name _

source/speech/speech.py (111)

31-31: Redefinition of unused speechCanceled from line 30

95-95: Undefined name pgettext

97-97: Undefined name pgettext

99-99: Undefined name pgettext

102-102: Undefined name pgettext

204-204: Undefined name _

345-345: Undefined name _

385-385: Undefined name _

923-923: Undefined name _

1013-1013: Multiple statements on one line (colon)

1018-1018: Multiple statements on one line (colon)

1084-1084: Undefined name _

1103-1103: Undefined name _

1128-1128: Undefined name ngettext

1202-1202: Undefined name _

1209-1209: Undefined name _

1212-1212: Undefined name _

1672-1672: Multiple statements on one line (colon)

1707-1707: Undefined name _

1876-1876: Undefined name _

1880-1880: Undefined name _

1894-1894: Undefined name _

1898-1898: Undefined name _

1905-1905: Undefined name _

1929-1929: Undefined name _

1934-1934: Undefined name _

1939-1939: Undefined name _

1951-1951: Undefined name _

1961-1961: Undefined name _

1990-1990: Undefined name _

1996-1996: Undefined name ngettext

1999-1999: Undefined name ngettext

2004-2004: Undefined name ngettext

2014-2014: Undefined name ngettext

2050-2050: Test for membership should be not in

2207-2207: Undefined name ngettext

2349-2349: Undefined name _

2415-2415: Undefined name _

2422-2422: Undefined name _

2436-2436: Comparison to None should be cond is None

2441-2441: Undefined name _

2446-2446: Undefined name ngettext

2450-2450: Undefined name _

2456-2456: Undefined name _

2458-2458: Undefined name _

2460-2460: Undefined name _

2462-2462: Undefined name _

2464-2464: Undefined name _

2470-2470: Undefined name _

2488-2488: Undefined name _

2498-2498: Undefined name _

2502-2502: Undefined name _

2512-2512: Undefined name _

2543-2543: Undefined name _

2548-2548: Undefined name _

2554-2554: Undefined name _

2558-2558: Undefined name _

2565-2565: Undefined name _

2566-2566: Undefined name _

2573-2573: Undefined name _

2581-2581: Undefined name _

2583-2583: Undefined name _

2589-2589: Undefined name _

2591-2591: Undefined name _

2598-2598: Undefined name _

2601-2601: Undefined name _

2609-2609: Undefined name _

2611-2611: Undefined name _

2620-2620: Undefined name _

2622-2622: Undefined name _

2630-2630: Undefined name _

2632-2632: Undefined name _

2639-2639: Undefined name _

2641-2641: Undefined name _

2648-2648: Undefined name _

2650-2650: Undefined name _

2656-2656: Undefined name _

2658-2658: Undefined name _

2666-2666: Undefined name _

2669-2669: Undefined name _

2673-2673: Undefined name _

2679-2679: Undefined name _

2681-2681: Undefined name _

2688-2688: Undefined name _

2690-2690: Undefined name _

2721-2721: Undefined name _

2723-2723: Undefined name _

2727-2727: Undefined name _

2729-2729: Undefined name _

2733-2733: Undefined name _

2735-2735: Undefined name _

2739-2739: Undefined name _

2741-2741: Undefined name _

2757-2757: Undefined name _

2762-2762: Undefined name _

2762-2762: Undefined name _

2762-2762: Undefined name _

2771-2771: Undefined name _

2774-2774: Undefined name _

2777-2777: Undefined name _

2781-2781: Undefined name _

2789-2789: Undefined name _

2793-2793: Undefined name _

2801-2801: Undefined name _

2804-2804: Undefined name _

2814-2814: Undefined name _

2817-2817: Undefined name _

2848-2848: Undefined name _

2863-2863: Undefined name _

2871-2871: Undefined name _

2875-2875: Undefined name _

Path-based Instructions (8)

source/config/configSpec.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

tests/unit/test_textUtils.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/textUtils.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/speech/speech.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/braille.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/gui/settingsDialogs.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

user_docs/en/userGuide.md (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

user_docs/en/changes.md (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

Additional comments not posted (21)

source/config/configSpec.py (2)
38-38: Ensure the unicodeNormalization feature flag is correctly integrated with the speech processing logic.

Verification successful

The unicodeNormalization feature flag is correctly integrated with the speech processing logic in source/speech/speech.py. The flag is used to conditionally apply the unicodeNormalize function to commands and text lists, ensuring that Unicode normalization is performed when the flag is enabled.

source/speech/speech.py, lines around if config.conf["speech"]["unicodeNormalization"]:

The flag is checked before applying unicodeNormalize to command.

The flag is checked before applying unicodeNormalize to elements in textList.

This confirms that the feature flag is properly utilized in the speech processing logic.
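As a rough illustration of the gating described above, the sketch below normalizes the string items of a speech sequence only when the flag is on. The `conf` dictionary and the `normalizeSequence` helper are hypothetical stand-ins, not NVDA's actual `config.conf` object or speech code; only `unicodedata.normalize` is a real standard-library call.

```python
import unicodedata

# Hypothetical stand-in for config.conf; NVDA's real config object is richer.
conf = {"speech": {"unicodeNormalization": True}}


def normalizeSequence(sequence, conf):
    """Return a copy of sequence with str items NFKC-normalized if the flag is on."""
    if not conf["speech"]["unicodeNormalization"]:
        # Flag off: pass everything through untouched.
        return list(sequence)
    return [
        unicodedata.normalize("NFKC", item) if isinstance(item, str) else item
        for item in sequence
    ]


ligature = "\u0133"  # LATIN SMALL LIGATURE IJ
enabled = normalizeSequence([ligature, 42], conf)
disabled = normalizeSequence(
    [ligature, 42], {"speech": {"unicodeNormalization": False}}
)
```

Non-string items (here the integer, standing in for a speech command) are left alone in both cases.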
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the speech processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the speech processing code.
rg --type python 'config\.conf\["speech"\]\["unicodeNormalization"\]'
Length of output: 102

Script:
#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the speech processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the speech processing code.
rg --type py 'config\.conf\["speech"\]\["unicodeNormalization"\]'
Length of output: 221

Script:
#!/bin/bash
# Description: Extract the relevant lines from `source/speech/speech.py` where `unicodeNormalization` is used.

# Test: Display the lines around the usage of the `unicodeNormalization` feature flag in the speech processing code.
rg --type py --context 5 'config\.conf\["speech"\]\["unicodeNormalization"\]' source/speech/speech.py
Length of output: 846
86-86: Ensure the unicodeNormalization feature flag is correctly integrated with the braille processing logic.

Verification successful

unicodeNormalization feature flag is correctly integrated with the braille processing logic

The unicodeNormalization feature flag is used in the braille processing code as expected.

Location:

source/braille.py: if config.conf["braille"]["unicodeNormalization"] and not isUnicodeNormalized(self.rawText):
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the braille processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the braille processing code.
rg --type python 'config\.conf\["braille"\]\["unicodeNormalization"\]'
Length of output: 103

Script:
#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the braille processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the braille processing code.
rg --type-add 'python:*.py' 'config\.conf\["braille"\]\["unicodeNormalization"\]'
Length of output: 196
tests/unit/test_textUtils.py (4)

10-10: The import statement for UnicodeNormalizationOffsetConverter is correct.

235-237: The TestUnicodeNormalizationOffsetConverter class is well-defined and appropriately named.

238-255: The test_normalizedOfssetsSentence method is comprehensive and covers various normalization scenarios.

256-270: The test_normalizedOfssetsMixed and test_normalizedOfssetsDifferentOrder methods are well-implemented and cover edge cases effectively.

source/textUtils.py (9)

13-18: New imports for Unicode normalization functionality.

420-420: Setting the default Unicode normalization algorithm to "NFKC" is appropriate.

435-439: The __init__ method correctly initializes the UnicodeNormalizationOffsetConverter class and calculates offsets.

517-520: The encodedStringLength property correctly returns the length of the normalized string.

522-539: The strToEncodedOffsets method correctly converts string offsets to normalized string offsets using precomputed values.

541-558: The encodedToStrOffsets method correctly converts normalized string offsets to original string offsets using precomputed values.

561-563: The isUnicodeNormalized function correctly wraps unicodedata.is_normalized with a default normalization form.

566-568: The unicodeNormalize function correctly wraps unicodedata.normalize with a default normalization form.

570-573: The ENCODINGS_TO_CONVERTERS dictionary and getOffsetConverter function are correctly implemented to handle different encodings.
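For readers unfamiliar with NFKC, here is a minimal sketch of what such wrappers might look like and how they treat the character classes mentioned in the PR description. The parameter names and the `DEFAULT_FORM` constant are assumptions for illustration; the real signatures in `textUtils` may differ.

```python
import unicodedata

DEFAULT_FORM = "NFKC"  # assumed constant name, for illustration only


def isUnicodeNormalized(text, form=DEFAULT_FORM):
    """Thin wrapper around unicodedata.is_normalized (Python 3.8+)."""
    return unicodedata.is_normalized(form, text)


def unicodeNormalize(text, form=DEFAULT_FORM):
    """Thin wrapper around unicodedata.normalize."""
    return unicodedata.normalize(form, text)


samples = {
    "ligature": "\u0133",      # LATIN SMALL LIGATURE IJ
    "decomposed": "e\u0301",   # e followed by COMBINING ACUTE ACCENT
    "bold": "\U0001d407",      # MATHEMATICAL BOLD CAPITAL H
}
results = {name: unicodeNormalize(text) for name, text in samples.items()}
```

Note that normalization can change the string length in either direction: the ligature grows from one character to two, while the decomposed letter shrinks from two to one. That is why an offset converter is needed at all.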

source/speech/speech.py (1)

1572-1573: Ensure Unicode normalization is applied only when the configuration is enabled.

Please verify that config.conf["speech"]["unicodeNormalization"] is correctly set and used.

source/braille.py (4)

500-510: Ensure converter is correctly initialized and used for Unicode normalization.

The logic for initializing and using the UnicodeNormalizationOffsetConverter looks correct. The normalization is applied conditionally based on the configuration.

514-519: Ensure louisHelper.translate parameters are correctly set.

The parameters for louisHelper.translate are correctly set, and the cursor position is handled appropriately.

521-527: Ensure braille and raw text positions are correctly processed using normalization offsets.

The logic for processing and mapping braille and raw text positions using normalization offsets is correctly implemented.

528-529: Ensure brailleToRawPos and rawToBraillePos lists are correctly updated.

The lists brailleToRawPos and rawToBraillePos are correctly updated based on the normalization offsets.
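A simplified sketch of the position remapping these comments refer to, assuming the translation ran on the normalized text and the position lists therefore need to be composed with the normalization offsets. The `composeMaps` helper and the hand-written offset lists are hypothetical; this is not the code in `source/braille.py`.

```python
def composeMaps(outer, inner):
    """For each position i, take outer[i] and look it up in inner."""
    return [inner[pos] for pos in outer]


# "\u0133s" (2 characters) normalizes to "ijs" (3 characters) under NFKC.
rawToNormalized = [0, 2]      # raw index -> normalized index
normalizedToRaw = [0, 0, 1]   # normalized index -> raw index

# Assumption: the braille table emits exactly one cell per normalized character.
normalizedToBraille = [0, 1, 2]
brailleToNormalized = [0, 1, 2]

# Positions exposed to the rest of the system are expressed in raw-text terms.
rawToBraillePos = composeMaps(rawToNormalized, normalizedToBraille)
brailleToRawPos = composeMaps(brailleToNormalized, normalizedToRaw)
```

With these inputs, routing from either of the first two braille cells lands on the ligature in the raw text, and the raw "s" maps to the third cell.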

user_docs/en/userGuide.md (1)

2080-2080: Ensure the link fragment is valid and points to the correct section.

Please verify that the link fragment #SpeechUnicodeNormalization points to the correct section in the document.

user_docs/en/changes.md

coderabbitai · 2024-05-15T01:17:26Z

user_docs/en/userGuide.md

@@ -1806,6 +1806,30 @@ This option applies to all synthesizers, not just the currently active synthesiz
 On by default, this option tells NVDA if the current voice's language can be trusted when processing symbols and characters.
 If you find that NVDA is reading punctuation in the wrong language for a particular synthesizer or voice, you may wish to turn this off to force NVDA to use its global language setting instead.

+##### Unicode normalization {#SpeechUnicodeNormalization}


Add a blank line before the heading for better readability.

+ ##### Unicode normalization {#SpeechUnicodeNormalization}


user_docs/en/userGuide.md

source/gui/settingsDialogs.py

coderabbitai · 2024-05-15T01:17:27Z

source/textUtils.py

+	def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
+		diff = list(ndiff(self.decoded, self.encoded))
+		diff.append("!")  # Closing the diff
+		iOrigin = iNormalized = 0
+		originBuffer = ""
+		normalizedBuffer = ""
+		originToNormalizedDict = defaultdict(list)
+		normalizedToOriginDict = defaultdict(list)
+		originPending = normalizedPending = False
+		for char in diff:
+			if char[0] == "?":
+				raise RuntimeError("Unexpected entry in diff")
+			elif char[0] == "-":
+				originBuffer += char[2:]
+				originPending = True
+			elif char[0] == "+":
+				normalizedBuffer += char[2:]
+				normalizedPending = True
+			elif char[0] == " " and (
+				(not originPending and normalizedPending) or (originPending and not normalizedPending)
+			):
+				originBuffer += char[2:]
+				normalizedBuffer += char[2:]
+			else:
+				while originBuffer and normalizedBuffer:
+					originPart = ""
+					originPartLen = 0
+					normalizedPart = ""
+					normalizedPartLen = 0
+					for i in range(len(originBuffer)):
+						originPart = originBuffer[: (i + 1)]
+						normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
+						if (
+							originPart == normalizedPart
+							or not normalizedBuffer.startswith(normalizedPart)
+						):
+							continue
+						originPartLen = len(originPart)
+						originBuffer = originBuffer[originPartLen:]
+						normalizedPartLen = len(normalizedPart)
+						normalizedBuffer = normalizedBuffer[normalizedPartLen:]
+						break
+					originMultiplier = min(originPartLen / normalizedPartLen, 1)
+					normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
+					for i in range(max(originPartLen, normalizedPartLen)):
+						tempOrigin = iOrigin + int(i * originMultiplier)
+						tempNormalized = iNormalized + int(i * normalizedMultiplier)
+						originC = originPart[i] if i < originPartLen else None
+						if originC:
+							normalizedIndex = normalizedPart.find(originC)
+							if normalizedIndex != -1:
+								tempNormalized = iNormalized + normalizedIndex
+						normalizedC = normalizedPart[i] if i < normalizedPartLen else None
+						if normalizedC:
+							originIndex = originPart.find(normalizedC)
+							if originIndex != -1:
+								tempOrigin = iOrigin + originIndex
+						originToNormalizedDict[tempOrigin].append(tempNormalized)
+						normalizedToOriginDict[tempNormalized].append(tempOrigin)
+					iOrigin += originPartLen
+					iNormalized += normalizedPartLen
+				originPending = normalizedPending = False
+				if char[0] == " ":
+					originToNormalizedDict[iOrigin].append(iNormalized)
+					normalizedToOriginDict[iNormalized].append(iOrigin)
+					iOrigin += 1
+					iNormalized += 1
+		originResult = tuple(map(min, originToNormalizedDict.values()))
+		assert len(originResult) == len(self.decoded)
+		normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
+		assert len(normalizedResult) == len(self.encoded)
+		return tuple((
+			originResult,
+			normalizedResult
+		))


The _calculateOffsets method correctly calculates offsets between the original and normalized strings. Consider adding comments to explain the logic for better readability.


I agree that _calculateOffsets needs more commenting. @coderabbitai Could you please generate a suggestion with commenting?

@seanbudd, here's the _calculateOffsets method with added comments to enhance readability and understanding:

def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
	# Initialize a diff list between the decoded original and the normalized strings
	diff = list(ndiff(self.decoded, self.encoded))
	diff.append("!")  # Append a closing marker to the diff list
	# Initialize indices and buffers for tracking positions and changes
	iOrigin = iNormalized = 0
	originBuffer = ""
	normalizedBuffer = ""
	originToNormalizedDict = defaultdict(list)
	normalizedToOriginDict = defaultdict(list)
	originPending = normalizedPending = False
	# Iterate over each character in the diff list
	for char in diff:
		if char[0] == "?":
			raise RuntimeError("Unexpected entry in diff")
		elif char[0] == "-":
			# Accumulate deleted characters in the origin buffer
			originBuffer += char[2:]
			originPending = True
		elif char[0] == "+":
			# Accumulate added characters in the normalized buffer
			normalizedBuffer += char[2:]
			normalizedPending = True
		elif char[0] == " " and (
			(not originPending and normalizedPending) or (originPending and not normalizedPending)
		):
			# Accumulate unchanged characters in both buffers
			originBuffer += char[2:]
			normalizedBuffer += char[2:]
		else:
			# Process accumulated characters in the buffers
			while originBuffer and normalizedBuffer:
				originPart = ""
				originPartLen = 0
				normalizedPart = ""
				normalizedPartLen = 0
				# Find the smallest part that can be normalized and still match the beginning of the normalized buffer
				for i in range(len(originBuffer)):
					originPart = originBuffer[: (i + 1)]
					normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
					if (
						originPart == normalizedPart
						or not normalizedBuffer.startswith(normalizedPart)
					):
						continue
					originPartLen = len(originPart)
					originBuffer = originBuffer[originPartLen:]
					normalizedPartLen = len(normalizedPart)
					normalizedBuffer = normalizedBuffer[normalizedPartLen:]
					break
				# Map the original indices to the normalized indices
				originMultiplier = min(originPartLen / normalizedPartLen, 1)
				normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
				for i in range(max(originPartLen, normalizedPartLen)):
					tempOrigin = iOrigin + int(i * originMultiplier)
					tempNormalized = iNormalized + int(i * normalizedMultiplier)
					originC = originPart[i] if i < originPartLen else None
					if originC:
						normalizedIndex = normalizedPart.find(originC)
						if normalizedIndex != -1:
							tempNormalized = iNormalized + normalizedIndex
					normalizedC = normalizedPart[i] if i < normalizedPartLen else None
					if normalizedC:
						originIndex = originPart.find(normalizedC)
						if originIndex != -1:
							tempOrigin = iOrigin + originIndex
					originToNormalizedDict[tempOrigin].append(tempNormalized)
					normalizedToOriginDict[tempNormalized].append(tempOrigin)
				iOrigin += originPartLen
				iNormalized += normalizedPartLen
			originPending = normalizedPending = False
			if char[0] == " ":
				# Map indices directly for unchanged characters
				originToNormalizedDict[iOrigin].append(iNormalized)
				normalizedToOriginDict[iNormalized].append(iOrigin)
				iOrigin += 1
				iNormalized += 1
	# Finalize the mapping by selecting the minimum index for each original position
	originResult = tuple(map(min, originToNormalizedDict.values()))
	assert len(originResult) == len(self.decoded)
	normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
	assert len(normalizedResult) == len(self.encoded)
	return tuple((
		originResult,
		normalizedResult
	))

This version includes detailed comments that explain each part of the method, making it easier to understand how the offsets are calculated between the original and normalized strings.
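For readers unfamiliar with the diff entries the method iterates over, here is a minimal standalone sketch of what `difflib.ndiff` yields when given two strings character by character. The sample string is an assumption chosen for illustration; it is not taken from the PR's unit tests.

```python
import unicodedata
from difflib import ndiff

# Hypothetical sample: the "ij" ligature (U+0133) followed by a plain letter.
original = "\u0133s"
normalized = unicodedata.normalize("NFKC", original)

# Each entry is a two-character code ("- ", "+ ", "  " or "? ") followed by one character.
diff = list(ndiff(original, normalized))
for entry in diff:
	print(repr(entry))

# Entries marked "-" or " " rebuild the original string,
# entries marked "+" or " " rebuild the normalized string.
fromOrigin = "".join(e[2:] for e in diff if e[0] in "- ")
fromNormalized = "".join(e[2:] for e in diff if e[0] in "+ ")
```

This is why the method can treat "-" entries as pending original characters and "+" entries as pending normalized characters, and only resolve the mapping once both buffers are filled.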

@LeonarddeR - can you add comments? When that's done and #16521 (comment) is addressed I think this is ready

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@coderabbitai I have just reworded some of your comments in a new commit. Do you think it's appropriate now?

@LeonarddeR, I have reviewed the reworded comments in the _calculateOffsets method. The comments now provide clear explanations for each part of the method, enhancing readability and understanding. The logic and functionality of the method are well-documented, making it easier for future developers to follow the code.

The comments are appropriate and effectively describe the purpose and process of each section of the method. Great job on the rewording!

If there are any further adjustments or additional comments needed, feel free to let me know.

Analysis chain

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Fetch the latest version of the `_calculateOffsets` method from the `source/textUtils.py` file
cat source/textUtils.py | sed -n '/def _calculateOffsets/,/return tuple((/p'

Length of output: 4735

New Learnings Added

User: LeonarddeR
PR: nvaccess/nvda#16521
File: source/textUtils.py:0-0
Timestamp: 2024-05-18T11:21:34.925Z
Learning: LeonarddeR reworded the comments in the `_calculateOffsets` method to enhance readability and understanding. The comments now provide clear explanations for each part of the method.

Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.

source/speech/speech.py

LeonarddeR · 2024-05-15T04:48:07Z

Honestly, this mostly looks like noise to me.

user_docs/en/userGuide.md

tests/unit/test_textUtils.py

source/braille.py

seanbudd · 2024-05-17T02:45:57Z

source/textUtils.py

+	def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
+		diff = list(ndiff(self.decoded, self.encoded))
+		diff.append("!")  # Closing the diff
+		iOrigin = iNormalized = 0
+		originBuffer = ""
+		normalizedBuffer = ""
+		originToNormalizedDict = defaultdict(list)
+		normalizedToOriginDict = defaultdict(list)
+		originPending = normalizedPending = False
+		for char in diff:
+			if char[0] == "?":
+				raise RuntimeError("Unexpected entry in diff")
+			elif char[0] == "-":
+				originBuffer += char[2:]
+				originPending = True
+			elif char[0] == "+":
+				normalizedBuffer += char[2:]
+				normalizedPending = True
+			elif char[0] == " " and (
+				(not originPending and normalizedPending) or (originPending and not normalizedPending)
+			):
+				originBuffer += char[2:]
+				normalizedBuffer += char[2:]
+			else:
+				while originBuffer and normalizedBuffer:
+					originPart = ""
+					originPartLen = 0
+					normalizedPart = ""
+					normalizedPartLen = 0
+					for i in range(len(originBuffer)):
+						originPart = originBuffer[: (i + 1)]
+						normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
+						if (
+							originPart == normalizedPart
+							or not normalizedBuffer.startswith(normalizedPart)
+						):
+							continue
+						originPartLen = len(originPart)
+						originBuffer = originBuffer[originPartLen:]
+						normalizedPartLen = len(normalizedPart)
+						normalizedBuffer = normalizedBuffer[normalizedPartLen:]
+						break
+					originMultiplier = min(originPartLen / normalizedPartLen, 1)
+					normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
+					for i in range(max(originPartLen, normalizedPartLen)):
+						tempOrigin = iOrigin + int(i * originMultiplier)
+						tempNormalized = iNormalized + int(i * normalizedMultiplier)
+						originC = originPart[i] if i < originPartLen else None
+						if originC:
+							normalizedIndex = normalizedPart.find(originC)
+							if normalizedIndex != -1:
+								tempNormalized = iNormalized + normalizedIndex
+						normalizedC = normalizedPart[i] if i < normalizedPartLen else None
+						if normalizedC:
+							originIndex = originPart.find(normalizedC)
+							if originIndex != -1:
+								tempOrigin = iOrigin + originIndex
+						originToNormalizedDict[tempOrigin].append(tempNormalized)
+						normalizedToOriginDict[tempNormalized].append(tempOrigin)
+					iOrigin += originPartLen
+					iNormalized += normalizedPartLen
+				originPending = normalizedPending = False
+				if char[0] == " ":
+					originToNormalizedDict[iOrigin].append(iNormalized)
+					normalizedToOriginDict[iNormalized].append(iOrigin)
+					iOrigin += 1
+					iNormalized += 1
+		originResult = tuple(map(min, originToNormalizedDict.values()))
+		assert len(originResult) == len(self.decoded)
+		normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
+		assert len(normalizedResult) == len(self.encoded)
+		return tuple((
+			originResult,
+			normalizedResult
+		))


I agree that _calculateOffsets needs more commenting. @coderabbitai Could you please generate a suggestion with commenting?

…nOffsetConverter

seanbudd · 2024-05-21T05:52:18Z

source/textUtils.py

+		assert len(originResult) == len(self.decoded)
+		normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
+		assert len(normalizedResult) == len(self.encoded)
+		return tuple((


Isn't this a tuple as is?

Suggested change

return tuple((

return (

I think you are right. Personally, I prefer the more explicit syntax, because bare parentheses can also produce a generator if there is a comprehension between them. But as far as I'm concerned, the final judgment is up to you.
@coderabbitai what do you think about the suggestion made here?
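To make the point about parentheses concrete, here is a small sketch with placeholder values (the tuples below are assumptions, not real offsets). Both spellings of the return value build the same tuple; parentheses only yield a generator when they wrap a comprehension-style expression.

```python
import types

# Placeholder values standing in for the two offset tuples.
originResult = (0, 1, 2)
normalizedResult = (0, 1)

explicit = tuple((originResult, normalizedResult))  # spelling used in the PR
bare = (originResult, normalizedResult)  # spelling from the suggestion

# Parentheses around a "for" expression produce a generator, not a tuple.
gen = (len(r) for r in (originResult, normalizedResult))

print(explicit == bare, type(bare).__name__, type(gen).__name__)
```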

Qchristensen

Looks good, thanks Leonard!

burmancomp · 2024-05-22T12:20:33Z

I am getting an empty braille line when reading a given message in Thunderbird. I likely encountered the same problem earlier with SquirrelMail (web mail).

This happens when braille is tethered to review and unicode normalization is enabled for braille.

Here are log lines (running from source with current main branch code):

Input: kb(laptop):downArrow
DEBUGWARNING - NVDAObjects.IAccessible.normalizeIA2TextFormatField (15:11:47.970) - MainThread (13568):
Unsupported value for text-align attribute: "start"
IO - speech.speech.speak (15:11:47.971) - MainThread (13568):
Speaking ['removed original text\xa0\n']
DEBUGWARNING - NVDAObjects.IAccessible.normalizeIA2TextFormatField (15:11:47.992) - MainThread (13568):
Unsupported value for text-align attribute: "start"
DEBUGWARNING - braille.BrailleHandler._handlePendingUpdate (15:11:47.992) - MainThread (13568):
Region update failed for ReviewCursorManagerRegion ('removed original text\xa0 '), object probably died
Traceback (most recent call last):
  File "braille.py", line 2621, in _handlePendingUpdate
    region.update()
  File "braille.py", line 1378, in update
    super(TextInfoRegion, self).update()
  File "braille.py", line 503, in update
    converter = UnicodeNormalizationOffsetConverter(self.rawText)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "textUtils.py", line 439, in __init__
    self.computedStrToEncodedOffsets, self.computedEncodedToStrOffsets = self._calculateOffsets()
                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "textUtils.py", line 496, in _calculateOffsets
    originMultiplier = min(originPartLen / normalizedPartLen, 1)
                           ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
IO - braille.BrailleBuffer.update (15:11:47.993) - MainThread (13568):
Braille regions text: ['removed original text\xa0 ']
IO - braille.BrailleHandler.update (15:11:47.993) - MainThread (13568):
Braille window dots: -

LeonarddeR · 2024-05-22T12:29:44Z

Thanks for reporting. Easily reproducible on the python console:
textUtils.UnicodeNormalizationOffsetConverter("removed original text\xa0 ")
I will handle this as part of #16584
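The failing string can also be inspected without NVDA. The sketch below only uses the standard library and assumes NFKC, the form named later in this thread. It shows that the no-break space is replaced by a plain space while the trailing space is left unchanged. One plausible reading of the traceback is that the unchanged space ends up in the buffers, no part of it differs from its normalized form, and both part lengths stay at zero.

```python
import unicodedata

raw = "removed original text\xa0 "
normalized = unicodedata.normalize("NFKC", raw)

# Positions where the normalized string differs from the original.
changed = [(i, a, b) for i, (a, b) in enumerate(zip(raw, normalized)) if a != b]

print(repr(normalized))
print(changed)
```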

ABuffEr · 2024-05-22T12:39:46Z

Hi,
While viewing a FB post starting with styled chars, I noticed that when I ask NVDA for the window title, I get the non-normalized string.
Anyway, the post is read correctly by voice, which is a great, great enhancement!

LeonarddeR · 2024-05-22T12:59:44Z

We only normalize objects and TextInfo, not messages. This can be changed if we agree that moving by character should report normalized characters instead, because then it is much easier to apply normalization over every speech sequence, including character by character navigation. However I'm still reluctant to do so.
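As an illustration of why character navigation keeps the original character, the following sketch builds a styled string from the Mathematical Bold block and normalizes it. `toMathBold` is a hypothetical helper written for this example, and NFKC is assumed as the normalization form. After normalization, the bold and plain variants are indistinguishable.

```python
import unicodedata


def toMathBold(text: str) -> str:
	"""Hypothetical helper: map ASCII letters to Mathematical Bold letters."""
	out = []
	for c in text:
		if "A" <= c <= "Z":
			out.append(chr(0x1D400 + ord(c) - ord("A")))
		elif "a" <= c <= "z":
			out.append(chr(0x1D41A + ord(c) - ord("a")))
		else:
			out.append(c)
	return "".join(out)


styled = toMathBold("Hello")
plain = unicodedata.normalize("NFKC", styled)

print(styled, "->", plain)
```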

XLTechie · 2024-05-23T02:18:35Z

@LeonarddeR Can you edit the PR description to add that it closes #4631 and #11570, in case this gets reverted?

LeonarddeR · 2024-05-23T05:57:12Z

These are not yet closed. #16584 will probably close them.

@burmancomp

Fixup of #16521
Fixes #11570
Partial fix for #4631

Summary of the issue:
It turns out that rawTextTypeforms on a region may be None, this was an oversight on my end. cursorPos may also be None. @burmancomp reported a zero division error in case a string ended with a non breaking space and a space.

Description of user facing changes
No longer errors in the log when getting flash messages in Thunderbird and/or reading messages in WhatsApp UWP.

Description of development approach
Explicitly check for None typeforms and cursorPos, thereby improving readability as well. Improve the calculateOffsets method in textUtils to ensure it can handle the case as reported by @burmancomp

…ch panel (#16810)
Fix-up of #16521.

Summary of the issue:
When opening the speech settings panel, the checkbox "Report 'Normalized' when navigating by character" is enabled, no matter the value of the "Unicode normalization" combo-box.

Description of user facing changes
When opening the speech settings panel, the checkbox "Report 'Normalized' when navigating by character" will be enabled or disabled (greyed out), depending on the value of the "Unicode normalization" combo-box, as it is already the case when the selection of this combo-box is modified.

Description of development approach
As done for "Ignore blank lines for line indentation reporting" checkbox in Document formatting settings, explicitly call the .Enable method when the panel is initialized.

mush42 · 2024-07-22T21:23:28Z

@LeonarddeR

What form of unicode normalization is used?

Both NFKC and NFKD cause a big issue with vocalized (aka diacritized) Arabic text.
The issue is that both forms of normalization switch the order of some diacritical marks. eSpeak-ng, at least, will mispronounce normalized diacritized Arabic text. Sonata will face the same issue, since it relies on eSpeak-ng for phonemization.

Best
Musharraf
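The reordering described here can be reproduced with the standard library. In this minimal sketch, the letter and marks are chosen for illustration and are not taken from the comment. A shadda typed before a fatha comes out after it, because normalization sorts combining marks by their canonical combining class.

```python
import unicodedata

BEH, SHADDA, FATHA = "\u0628", "\u0651", "\u064E"

typed = BEH + SHADDA + FATHA  # shadda first, then fatha
normalized = unicodedata.normalize("NFKC", typed)

# Canonical combining classes decide the order of the marks after normalization.
classes = [unicodedata.combining(c) for c in (SHADDA, FATHA)]

print([f"U+{ord(c):04X}" for c in typed])
print([f"U+{ord(c):04X}" for c in normalized])
print(classes)
```

A synthesizer that expects the typed order would therefore receive the marks swapped, which matches the mispronunciation reported above.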

SaschaCowley · 2024-07-22T23:24:58Z

@mush42 do you know if an issue has been filed with eSpeak NG for that problem?

LeonarddeR · 2024-07-23T05:53:23Z

@mush42 In NVDA, we're using the NFKC algorithm.
I agree with @SaschaCowley that this is probably something that should be fixed upstream, i.e. eSpeak should definitely support speaking normalized Arabic text.
That said, there's a reason why this is not the default yet.
If you feel the discussion should continue here, please create a new issue.

Fixes #16616

Summary of the issue:
In #16521, Unicode normalization was added, but it is disabled by default.

Description of user facing changes
Unicode normalization is now enabled by default for Speech.

Description of development approach
Change default values

LeonarddeR added 6 commits May 9, 2024 12:50

Add unicode normalization offset converter

0291159

Add support to speech and braille

e7581b1

user guide

32cffd1

More consistent speech output

83933da

Documentation

f41256e

Add comments

c9fadee

LeonarddeR requested review from a team as code owners May 10, 2024 07:04

LeonarddeR requested review from Qchristensen and SaschaCowley May 10, 2024 07:04

Add convenience functions for normalization

ebde9ef

LeonarddeR marked this pull request as draft May 10, 2024 07:10

Merge remote-tracking branch 'origin/master' into unicodeNormalizatio…

df5f03d

…nOffsetConverter

Fixed incorrect assumption that getPropertiesSpeech would generate …

f6caff8

…a speech sequence with only strings. Thanks @Simon818

seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label May 14, 2024

LeonarddeR marked this pull request as ready for review May 14, 2024 04:47

CyrilleB79 reviewed May 14, 2024

Apply suggestions from code review

3d51c24

Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>

CyrilleB79 reviewed May 14, 2024

Update user_docs/en/userGuide.md

8d0f95e

Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>

coderabbitai bot reviewed May 15, 2024

seanbudd reviewed May 17, 2024

seanbudd marked this pull request as draft May 17, 2024 02:48

seanbudd and others added 3 commits May 17, 2024 12:49

Apply suggestions from code review

161b259

Merge remote-tracking branch 'origin/master' into unicodeNormalizatio…

e33c4d6

…nOffsetConverter

Add comments to normalizeOffsets

b8fff11

LeonarddeR marked this pull request as ready for review May 20, 2024 09:57

LeonarddeR mentioned this pull request May 20, 2024

glyph vs diaeresis liblouis/liblouis#98

Open

seanbudd reviewed May 21, 2024

seanbudd approved these changes May 21, 2024

Qchristensen approved these changes May 21, 2024

seanbudd merged commit 57ce236 into nvaccess:master May 21, 2024
1 check passed

LeonarddeR mentioned this pull request May 21, 2024

Several fixups for unicode normalization #16584

Merged

5 tasks

This was referenced May 21, 2024

Support mathematical alphanumeric symbols #11570

Closed

NVDA does not read the symbols inserted by the word 2013 equation editor #4631

Open

LeonarddeR deleted the unicodeNormalizationOffsetConverter branch May 22, 2024 05:21

seanbudd mentioned this pull request May 29, 2024

Normalization of unicode cahracter: allow excluding the symbols in the symbols.dic file from the normalization #16624

Closed

CyrilleB79 mentioned this pull request Jul 3, 2024

Enable/disable the 'Report normalized' checkbox when opening the speech panel #16810

Merged

5 tasks

LeonarddeR mentioned this pull request Aug 17, 2024

Enable unicode normalization by default for speech #17017

Merged

5 tasks

coderabbitai bot mentioned this pull request Sep 27, 2024

Set default input and translation tables according to NVDA's language #17222

Merged

5 tasks

coderabbitai bot mentioned this pull request Oct 15, 2024

When spelling by line, ensure that only normalized characters report as such #17295

Merged

5 tasks


		1. Decomposition of some ligatures, including "ĳ" (ligature ij), to their two-letter form ("ij").

		1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.
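Both effects listed above can be checked with a short standard-library sketch. NFKC is assumed as the normalization form, the helper name `nfkc` is introduced here for brevity, and the Hebrew letter and points are arbitrary examples.

```python
import unicodedata


def nfkc(text: str) -> str:
	"""Shorthand used in this sketch only."""
	return unicodedata.normalize("NFKC", text)


# Ligature decomposition: U+0133 becomes the two letters "i" and "j".
ligature = nfkc("\u0133")

# Stable ordering: the same letter with the same two points, typed in either order.
BET, DAGESH, PATAH = "\u05D1", "\u05BC", "\u05B7"
first = nfkc(BET + DAGESH + PATAH)
second = nfkc(BET + PATAH + DAGESH)

print(ligature, first == second)
```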

	##### Unicode normalization {#SpeechUnicodeNormalization}
