Add Unicode Normalization to speech and braille #16521

Merged

LeonarddeR
Collaborator

@LeonarddeR LeonarddeR commented May 10, 2024

Link to issue number:

Closes #16466

Summary of the issue:

Several speech synthesizers and braille tables are unable to speak or braille some characters, such as ligatures ("ĳ") or decomposed characters (Latin letters followed by a combining modifier that adds an acute accent, diaeresis, etc.). Also, italic or bold Unicode characters can't be spoken or brailled by default.
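
To illustrate the kind of characters involved, here is a minimal sketch using the standard library's `unicodedata.normalize`. It assumes the NFKC form, which this description does not name explicitly, and `normalize_for_output` is a hypothetical helper, not a function from this PR.

```python
import unicodedata


def normalize_for_output(text: str, form: str = "NFKC") -> str:
    """Hypothetical helper: fold text into a form most synthesizers handle.

    NFKC is an assumption here; the PR description does not state the form.
    """
    return unicodedata.normalize(form, text)


samples = {
    "ligature ij": "\u0133",           # single code point for the ij ligature
    "decomposed e-acute": "e\u0301",   # "e" followed by a combining acute accent
    "mathematical bold A": "\U0001D400",
}

for label, text in samples.items():
    normalized = normalize_for_output(text)
    # Show the code points before and after normalization.
    before = " ".join(f"U+{ord(c):04X}" for c in text)
    after = " ".join(f"U+{ord(c):04X}" for c in normalized)
    print(f"{label}: {before} -> {after}")
```

With this form, the ligature becomes the two letters "ij", the decomposed sequence is composed into a single "é", and the bold letter becomes a plain "A".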

Description of user facing changes

None by default. If Unicode normalization is enabled for speech, speech output for objects and text navigation is normalized. For braille, normalization is applied to all braille output.

Speech deliberately applies normalization only to objects and text navigation. For individual character navigation or text selection, we really want to pass the original character to the synthesizer. If we don't, Unicode bold and italic characters are read as their normalized counterparts, which makes it impossible to distinguish them. This problem is less relevant when working with braille.

Description of development approach

  1. Added UnicodeNormalizationOffsetConverter to textUtils with tests. It stores the normalized version of a given string and, based on diffing, calculates offset mappings between the original and normalized strings.
  2. Processed output using UnicodeNormalizationOffsetConverter when translating braille with normalization on. The various mappings (braille to raw pos, raw to braille pos) are processed to account for normalization.
  3. Added normalization to getTextInfoSpeech and getPropertiesSpeech.
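
The diff-based offset mapping from step 1 can be sketched roughly as follows. This is a simplified illustration, not the UnicodeNormalizationOffsetConverter implementation: the function name `build_offset_maps`, the use of `difflib.SequenceMatcher`, and the NFKC form are all assumptions made for the example.

```python
import difflib
import unicodedata


def build_offset_maps(original: str, form: str = "NFKC"):
    """Hypothetical sketch: map offsets between a string and its normalized form.

    Returns (normalized, orig_to_norm, norm_to_orig), where each list holds,
    for every offset in one string, a corresponding offset in the other.
    """
    normalized = unicodedata.normalize(form, original)
    orig_to_norm = [0] * len(original)
    norm_to_orig = [0] * len(normalized)
    matcher = difflib.SequenceMatcher(None, original, normalized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: offsets correspond one to one.
            for k in range(i2 - i1):
                orig_to_norm[i1 + k] = j1 + k
                norm_to_orig[j1 + k] = i1 + k
        else:
            # Changed run: point every offset at the start of the other run,
            # clamped so the result is always a valid index.
            for i in range(i1, i2):
                orig_to_norm[i] = min(j1, len(normalized) - 1)
            for j in range(j1, j2):
                norm_to_orig[j] = min(i1, len(original) - 1)
    return normalized, orig_to_norm, norm_to_orig


# "a", ligature ij, "b": the ligature expands to two characters.
normalized, orig_to_norm, norm_to_orig = build_offset_maps("a\u0133b")
print(normalized, orig_to_norm, norm_to_orig)
```

In this example both "i" and "j" in the normalized string map back to the single ligature at offset 1, which is the kind of information braille cursor routing needs. A plain character diff can misalign when the same character occurs both inside and outside a changed run, so a real converter needs more care than this sketch.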

Testing strategy:

  1. Unit tests
  2. Manual testing with braille: cursor positioning and cursor routing on the strings used in the unit tests.

Known issues with pull request:

None known.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

Summary by CodeRabbit

  • New Features

    • Added support for Unicode Normalization in speech and braille settings.
    • Users can enable Unicode Normalization via the NVDA Settings dialog for both speech and braille.
  • Documentation

    • Updated user guide with a new section on Unicode normalization for speech and braille settings.
    • Added details on the benefits of Unicode normalization, including character standardization and compatibility improvements.

@LeonarddeR LeonarddeR requested review from a team as code owners May 10, 2024 07:04
@LeonarddeR LeonarddeR marked this pull request as draft May 10, 2024 07:10
@Simon818

I'm running a build that contains this PR and am receiving an error pretty frequently when trying to navigate with speech normalization enabled.

Could not execute function event_gainFocus defined in appModules.nvda module; kwargs: {}
Traceback (most recent call last):
File "eventHandler.pyc", line 127, in next
File "NVDAObjects_init_.pyc", line 1306, in event_gainFocus
File "NVDAObjects_init_.pyc", line 1157, in reportFocus
File "speech\speech.pyc", line 694, in speakObject
File "speech\speech.pyc", line 733, in getObjectSpeech
File "speech\speech.pyc", line 660, in getObjectPropertiesSpeech
File "speech\speech.pyc", line 1975, in getPropertiesSpeech
File "speech\speech.pyc", line 1975, in
File "textUtils.pyc", line 568, in unicodeNormalize
TypeError: normalize() argument 2 must be str, not CharacterModeCommand

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "eventHandler.pyc", line 127, in next
File "appModules\nvda.pyc", line 211, in event_gainFocus
File "eventHandler.pyc", line 134, in next
File "extensionPoints\util.pyc", line 216, in callWithSupportedKwargs
File "NVDAObjects_init_.pyc", line 1306, in event_gainFocus
File "NVDAObjects_init_.pyc", line 1157, in reportFocus
File "speech\speech.pyc", line 694, in speakObject
File "speech\speech.pyc", line 733, in getObjectSpeech
File "speech\speech.pyc", line 660, in getObjectPropertiesSpeech
File "speech\speech.pyc", line 1975, in getPropertiesSpeech
File "speech\speech.pyc", line 1975, in
File "textUtils.pyc", line 568, in unicodeNormalize
TypeError: normalize() argument 2 must be str, not CharacterModeCommand

CC @codeofdusk

@LeonarddeR
Collaborator Author

It looks like there is an erroneous type annotation in getPropertiesSpeech that misled me. getKeyboardShortcutsSpeech can now also return character mode commands, so getPropertiesSpeech no longer returns only strings.
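
For reference, a minimal reproduction of the failure and one way to guard against it. The `CharacterModeCommand` class below is a hypothetical stand-in for the NVDA speech command of the same name, `normalize_sequence` is an illustrative helper rather than code from this PR, and NFKC is assumed as the normalization form.

```python
import unicodedata


class CharacterModeCommand:
    """Hypothetical stand-in for NVDA's speech command of the same name."""

    def __init__(self, state: bool):
        self.state = state


def normalize_sequence(sequence, form: str = "NFKC"):
    """Normalize only the str items; pass commands through untouched."""
    return [
        unicodedata.normalize(form, item) if isinstance(item, str) else item
        for item in sequence
    ]


# A speech sequence mixing text and commands; "\ufb01" is the "fi" ligature.
sequence = ["\ufb01le", CharacterModeCommand(True), "f", CharacterModeCommand(False)]

try:
    # Normalizing every item blindly fails on the first non-string.
    [unicodedata.normalize("NFKC", item) for item in sequence]
except TypeError as error:
    print(f"Unguarded normalization failed: {error}")

safe = normalize_sequence(sequence)
print(safe[0])
```

The guarded version leaves command objects in place and in order, so the sequence can still be passed on to the synthesizer.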

@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label May 14, 2024
@LeonarddeR LeonarddeR marked this pull request as ready for review May 14, 2024 04:47
@LeonarddeR
Collaborator Author

@Qchristensen Could you have a look at the proposed documentation changes?

Collaborator

@CyrilleB79 CyrilleB79 left a comment

Here are my first comments. I may have other comments later since I have not yet tested this PR.

1. Decomposition of some ligatures, including "ĳ" (ligature ij), to their two-letter form ("ij").

1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.
Collaborator

Suggested change
1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.
1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.

Could you clarify the impact of this on speech?

Collaborator Author

The modifiers will be spoken in normalized order rather than in the order in which they appear in the text. For speech, this could be a bit confusing, but that's why this is optional. It has no impact on speech with the Hebrew Vocalizer voice, for example. I'm afraid there is no normalization form where we can disable this behavior. On the other hand, for braille this can be of great benefit, as it requires a braille table to account for the normalized form of the modifiers only.
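
A small sketch of what stable ordering means in practice, using Latin combining marks rather than Hebrew for brevity. NFKC is assumed here; the canonical reordering shown applies to all four standard normalization forms.

```python
import unicodedata

# The same letter with two combining marks, typed in different orders.
dot_above_first = "q\u0307\u0323"   # q, combining dot above, combining dot below
dot_below_first = "q\u0323\u0307"   # q, combining dot below, combining dot above

normalized_a = unicodedata.normalize("NFKC", dot_above_first)
normalized_b = unicodedata.normalize("NFKC", dot_below_first)

# The raw strings differ, but normalization puts the marks in one fixed order.
print(dot_above_first == dot_below_first)
print(normalized_a == normalized_b)
print([f"U+{ord(c):04X}" for c in normalized_a])
```

Both inputs normalize to the same sequence, with the dot below (lower combining class) placed before the dot above, so a braille table only has to define that one ordering.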

Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>
Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>
@seanbudd
Copy link
Member

Hi - testing code rabbit on this PR for an AI review.
I've tried to adjust it to only pick up serious issues rather than linting.
Apologies for the noise if it picks up junk.

@coderabbitai review

Contributor

coderabbitai bot commented May 15, 2024

@seanbudd, I have identified the issue causing the error reported by Simon818. The getPropertiesSpeech function attempts to normalize all items in the textList, but some items are not strings, leading to a type error. Here is the updated code for the getPropertiesSpeech function to ensure only strings are passed to unicodeNormalize:

def getPropertiesSpeech(  # noqa: C901
        reason: OutputReason = OutputReason.QUERY,
        **propertyValues
) -> SpeechSequence:
    textList: SpeechSequence = []
    name: Optional[str] = propertyValues.get('name')
    if name:
        textList.append(name)
    if 'role' in propertyValues:
        role: controlTypes.Role = propertyValues['role']
        speakRole=True
    elif '_role' in propertyValues:
        speakRole=False
        role: controlTypes.Role = propertyValues['_role']
    else:
        speakRole=False
        role=controlTypes.Role.UNKNOWN
    role = controlTypes.Role(role)
    value: Optional[str] = propertyValues.get('value') if role not in controlTypes.silentValuesForRoles else None
    cellCoordsText: Optional[str] = propertyValues.get('cellCoordsText')
    rowNumber = propertyValues.get('rowNumber')
    columnNumber = propertyValues.get('columnNumber')
    includeTableCellCoords = propertyValues.get('includeTableCellCoords', True)

    if role == controlTypes.Role.CHARTELEMENT:
        speakRole = False
    roleText: Optional[str] = propertyValues.get('roleText')
    if (
        speakRole
        and (
            roleText
            or reason not in (
                OutputReason.SAYALL,
                OutputReason.CARET,
                OutputReason.FOCUS,
                OutputReason.QUICKNAV
            )
            or not (
                name
                or value
                or cellCoordsText
                or rowNumber
                or columnNumber
            )
            or role not in controlTypes.silentRolesOnFocus
        )
        and (
            role != controlTypes.Role.MATH
            or reason not in (
                OutputReason.CARET,
                OutputReason.SAYALL
            )
        )):
        textList.append(roleText if roleText else role.displayString)
    if value:
        textList.append(value)
    states = propertyValues.get('states')
    realStates=propertyValues.get('_states',states)
    negativeStates=propertyValues.get('negativeStates',set())
    if states is not None or negativeStates:
        if states is None:
            states = set()
        labelStates = controlTypes.processAndLabelStates(role, realStates, reason, states, negativeStates)
        textList.extend(labelStates)
    description: Optional[str] = propertyValues.get('description')
    if description:
        textList.append(description)
    keyboardShortcut: Optional[str] = propertyValues.get('keyboardShortcut')
    textList.extend(getKeyboardShortcutsSpeech(keyboardShortcut))
    if includeTableCellCoords and cellCoordsText:
        textList.append(cellCoordsText)
    if cellCoordsText or rowNumber or columnNumber:
        tableID = propertyValues.get("_tableID")
        sameTable = (tableID and tableID == _speechState.oldTableID)
        if tableID and not sameTable:
            _speechState.oldTableID = tableID
        rowSpan = propertyValues.get("rowSpan") or 1
        columnSpan = propertyValues.get("columnSpan") or 1
        if rowNumber and (
            not sameTable
            or rowNumber != _speechState.oldRowNumber
            or rowSpan != _speechState.oldRowSpan
        ):
            rowHeaderText: Optional[str] = propertyValues.get("rowHeaderText")
            if rowHeaderText:
                textList.append(rowHeaderText)
            if includeTableCellCoords and not cellCoordsText: 
                rowNumberTranslation: str = _("row %s") % rowNumber
                textList.append(rowNumberTranslation)
                if rowSpan>1 and columnSpan<=1:
                    rowSpanAddedTranslation: str = _("through {endRow}").format(endRow=rowNumber + rowSpan - 1)
                    textList.append(rowSpanAddedTranslation)
            _speechState.oldRowNumber = rowNumber
            _speechState.oldRowSpan = rowSpan
        if columnNumber and (
            not sameTable
            or columnNumber != _speechState.oldColumnNumber
            or columnSpan != _speechState.oldColumnSpan
        ):
            columnHeaderText: Optional[str] = propertyValues.get("columnHeaderText")
            if columnHeaderText:
                textList.append(columnHeaderText)
            if includeTableCellCoords and not cellCoordsText:
                colNumberTranslation: str = _("column %s") % columnNumber
                textList.append(colNumberTranslation)
                if columnSpan>1 and rowSpan<=1:
                    colSpanAddedTranslation: str = _("through {endCol}").format(endCol=columnNumber + columnSpan - 1)
                    textList.append(colSpanAddedTranslation)
            _speechState.oldColumnNumber = columnNumber
            _speechState.oldColumnSpan = columnSpan
        if includeTableCellCoords and not cellCoordsText and rowSpan>1 and columnSpan>1:
            rowColSpanTranslation: str = _("through row {row} column {column}").format(
                row=rowNumber + rowSpan - 1,
                column=columnNumber + columnSpan - 1
            )
            textList.append(rowColSpanTranslation)
    rowCount=propertyValues.get('rowCount',0)
    columnCount=propertyValues.get('columnCount',0)
    rowAndColumnCountText = _rowAndColumnCountText(rowCount, columnCount)
    if rowAndColumnCountText:
        textList.append(rowAndColumnCountText)
    if rowCount or columnCount:
        _speechState.oldTableID = None

    isCurrent = propertyValues.get('current', controlTypes.IsCurrent.NO)
    if isCurrent != controlTypes.IsCurrent.NO:
        textList.append(isCurrent.displayString)

    hasDetails = propertyValues.get('hasDetails', False)
    if hasDetails:
        detailsRoles: _AnnotationRolesT = propertyValues.get("detailsRoles", tuple())
        if detailsRoles:
            roleStrings = (role.displayString if role else _("details") for role in detailsRoles)
            for roleString in roleStrings:
                textList.append(
                    _("has %s") % roleString
                )
        else:
            textList.append(
                _("has details")
            )

    placeholder: Optional[str] = propertyValues.get('placeholder', None)
    if placeholder:
        textList.append(placeholder)
    indexInGroup=propertyValues.get('positionInfo_indexInGroup',0)
    similarItemsInGroup=propertyValues.get('positionInfo_similarItemsInGroup',0)
    if 0<indexInGroup<=similarItemsInGroup:
        itemPosTranslation: str = _("{number} of {total}").format(
            number=indexInGroup,
            total=similarItemsInGroup
        )
        textList.append(itemPosTranslation)
    if 'positionInfo_level' in propertyValues:
        level=propertyValues.get('positionInfo_level',None)
        role=propertyValues.get('role',None)
        if level is not None:
            levelTranslation: str = _('level %s') % level
            if (
                role in (controlTypes.Role.TREEVIEWITEM, controlTypes.Role.LISTITEM)
                and level != _speechState.oldTreeLevel
            ):
                textList.insert(0, levelTranslation)
                _speechState.oldTreeLevel = level
            else:
                textList.append(levelTranslation)
    
    errorMessage: str | None = propertyValues.get("errorMessage", None)
    if errorMessage:
        textList.append(errorMessage)
    if config.conf["speech"]["unicodeNormalization"]:
        textList = [
            unicodeNormalize(t) if isinstance(t, str) else t
            for t in textList
        ]
    types.logBadSequenceTypes(textList)
    return textList

This update ensures that only strings are passed to unicodeNormalize, preventing the type error.

Analysis chain



Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for the definition of the getPropertiesSpeech function in the codebase.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n    $$$' 

# Search for the usage of the unicodeNormalize function in the codebase.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n    $$$' 

# Search for the CharacterModeCommand class or type in the codebase.
ast-grep --lang python --pattern $'class CharacterModeCommand($_):\n    $$$' 

Length of output: 1517



Script:

#!/bin/bash
# Search for the definition of the getPropertiesSpeech function in the speech module.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n    $$$' source/speech

# Search for the definition of the unicodeNormalize function in the textUtils module.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n    $$$' source/textUtils

Length of output: 239



Script:

#!/bin/bash
# Locate the textUtils module in the repository.
fd textUtils.py

# Locate the getPropertiesSpeech function in the repository.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n    $$$'

Length of output: 140



Script:

#!/bin/bash
# Search for the definition of the unicodeNormalize function in the source/textUtils.py file.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n    $$$' source/textUtils.py

# Locate the getPropertiesSpeech function or method in the repository.
rg 'def getPropertiesSpeech' --glob '*.py'

Length of output: 198



Script:

#!/bin/bash
# Display the implementation of the unicodeNormalize function in source/textUtils.py.
sed -n '/def unicodeNormalize/,/^$/p' source/textUtils.py

# Display the implementation of the getPropertiesSpeech function in source/speech/speech.py.
sed -n '/def getPropertiesSpeech/,/^$/p' source/speech/speech.py

Length of output: 47901

Actions Performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

coderabbitai bot commented May 15, 2024

Walkthrough

The recent changes introduce Unicode normalization to NVDA's speech and braille output, enhancing the handling of complex characters like ligatures and composed characters. This feature can be enabled via the settings dialog for both speech and braille, ensuring compatibility with various synthesizers and braille tables.

Changes

| File Path | Change Summary |
| --- | --- |
| source/braille.py | Added Unicode normalization handling for braille translation, including offset processing. |
| source/config/configSpec.py | Introduced a unicodeNormalization feature flag for different sections, including speech and braille. |
| source/gui/settingsDialogs.py | Added unicodeNormalizationCombo control for managing Unicode normalization settings in the GUI. |
| source/speech/speech.py | Integrated Unicode normalization in speech processing based on configuration settings. |
| source/textUtils.py | Added utilities for Unicode normalization, including UnicodeNormalizationOffsetConverter class. |
| tests/unit/test_textUtils.py | Added tests for UnicodeNormalizationOffsetConverter to verify normalization scenarios. |
| user_docs/en/changes.md | Documented the addition of Unicode normalization support for speech and braille. |
| user_docs/en/userGuide.md | Added sections explaining Unicode normalization settings for speech and braille. |

Assessment against linked issues

Objective Addressed Explanation
Add optional Unicode normalization before passing strings to speech or braille (#16466)
Ensure normalization handles ligatures and composed characters correctly (#16466)
Provide settings to enable/disable Unicode normalization for both speech and braille (#16466)


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

Out of diff range and nitpick comments (78)
source/speech/speech.py (71)

Line range hint 95-102: Undefined name pgettext. Ensure pgettext is imported or defined.

+ from gettext import pgettext

@property
def _displayStringLabels(self) -> dict[Self, str]:
    return {
        # Translators: Name of the speech mode which disables speech output.
        self.off: pgettext("speechModes", "off"),
        # Translators: Name of the speech mode which will cause NVDA to beep instead of speaking.
        self.beeps: pgettext("speechModes", "beeps"),
        # Translators: Name of the speech mode which causes NVDA to speak normally.
        self.talk: pgettext("speechModes", "talk"),
        # Translators: Name of the on-demand speech mode,
        # in which NVDA only speaks in response to commands that report content.
        self.onDemand: pgettext("speechModes", "on-demand"),
    }

Line range hint 204-204: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 345-345: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 385-385: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 923-923: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1013-1013: Multiple statements on one line. Separate them for better readability.

- if sectionBreakType == "0" : text=_("continuous section break")
+ if sectionBreakType == "0":
+     text = _("continuous section break")

Line range hint 1018-1018: Multiple statements on one line. Separate them for better readability.

- elif sectionBreakType == "1" : text=_("new column section break")
+ elif sectionBreakType == "1":
+     text = _("new column section break")

Line range hint 1084-1084: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1103-1103: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1128-1128: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 1202-1202: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1209-1209: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1212-1212: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1672-1672: Multiple statements on one line. Separate them for better readability.

- if not curLanguage or (not autoDialectSwitching and curLanguage.split('_')[0]==defaultLanguageRoot): curLanguage=defaultLanguage
+ if not curLanguage or (not autoDialectSwitching and curLanguage.split('_')[0] == defaultLanguageRoot):
+     curLanguage = defaultLanguage

Line range hint 1707-1707: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1876-1876: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1880-1880: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1894-1894: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1898-1898: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1905-1905: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1929-1929: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1934-1934: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1939-1939: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1951-1951: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1961-1961: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1990-1990: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 1996-1996: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 1999-1999: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 2004-2004: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 2014-2014: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 2050-2050: Test for membership should be not in.

- if not (set(t) <= LINE_END_CHARS)
+ if set(t) not in LINE_END_CHARS
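
A note on the suggestion above: `set(t) <= LINE_END_CHARS` is a subset test rather than a membership test, so rewriting it with `not in` would likely change behavior. A minimal sketch, assuming `LINE_END_CHARS` is a frozenset of line-ending characters (its definition is not shown in this thread); the helper names are hypothetical:

```python
# Assumed definition; the real constant is not shown in this conversation.
LINE_END_CHARS = frozenset(("\r", "\n"))


def original_check(t: str) -> bool:
    """True when t contains at least one character that is not a line end."""
    return not (set(t) <= LINE_END_CHARS)


def suggested_check(t: str) -> bool:
    """The proposed rewrite: asks whether the whole set is an element."""
    return set(t) not in LINE_END_CHARS


for text in ("\r\n", "abc"):
    print(repr(text), original_check(text), suggested_check(text))
```

For a string made up only of line endings, the original check is false while the rewritten one is true, because the elements of `LINE_END_CHARS` are individual characters and never sets.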

Line range hint 2207-2207: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 2349-2349: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2415-2415: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2422-2422: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2436-2436: Comparison to None should be cond is None.

- if oldTableInfo == None:
+ if oldTableInfo is None:

Line range hint 2441-2441: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2446-2446: Undefined name ngettext. Ensure ngettext is imported or defined.

+ from gettext import ngettext

Line range hint 2450-2450: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2456-2456: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2458-2458: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2460-2460: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2462-2462: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2464-2464: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2470-2470: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2488-2488: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2498-2498: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2502-2502: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2512-2512: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2543-2543: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2548-2548: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2554-2554: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2558-2558: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2565-2565: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2566-2566: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2573-2573: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2581-2581: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2583-2583: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2589-2589: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2591-2591: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2598-2598: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2601-2601: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2609-2609: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2611-2611: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2620-2620: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2622-2622: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2630-2630: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2632-2632: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2639-2639: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2641-2641: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _

Line range hint 2648-2648: Undefined name _. Ensure _ is imported or defined.

+ from gettext import gettext as _
source/braille.py (1)

Line range hint 582-582: Add missing import for _ to handle translations.

+ from gettext import gettext as _
source/gui/settingsDialogs.py (6)

Line range hint 4640-4640: Remove unnecessary f-string.

The f-string on this line does not contain any placeholders and can be simplified to a regular string.

- f"Current sizerDict: {self.sizerDict!r}"
+ "Current sizerDict: {!r}".format(self.sizerDict)

Line range hint 4773-4773: Ensure pgettext is defined before usage.

The pgettext function is used for translations but is not defined or imported in this context. Ensure that pgettext is imported from the appropriate module, typically gettext.

from gettext import pgettext

Line range hint 4952-4953: Ensure _ is defined before usage.

The _ function is used for translations but is not defined or imported in this context. Ensure that _ is imported from the appropriate module, typically gettext.

from gettext import gettext as _

Line range hint 79-79: Remove unused import.

The keyLabels import is not used anywhere in the file and should be removed to clean up the code.

- import keyLabels

Line range hint 108-108: Avoid multiple statements on one line.

Multiple statements on one line can reduce code readability. Consider splitting them into separate lines.

- except RuntimeError: updateCheck = None
+ except RuntimeError:
+     updateCheck = None

Line range hint 425-425: Ensure _ is defined before usage.

The _ function is used for translations but is not defined or imported in this context. Ensure that _ is imported from the appropriate module, typically gettext.

from gettext import gettext as _
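
A note on the repeated "Undefined name _" hints: a static linter reports a name as undefined whenever it cannot see an import or assignment in the module, and that also happens when the translation function is installed into builtins at runtime. Whether this applies to the files listed here is an assumption, not something the review confirms. A minimal sketch using the standard library's gettext.install, with a hypothetical domain name:

```python
import gettext

# "exampleDomain" is a hypothetical domain. When no catalog is found,
# install() falls back to NullTranslations, which returns messages unchanged.
gettext.install("exampleDomain")

# `_` now lives in builtins, so this module can call it without importing it.
# A linter that only inspects this file would still flag it as undefined.
greeting = _("Unicode normalization")  # noqa: F821
```

If the project does install `_` this way, adding a per-module import as suggested above would shadow the installed function rather than fix a real error.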
Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 2d86502 and 8d0f95e.
Files selected for processing (8)
  • source/braille.py (2 hunks)
  • source/config/configSpec.py (2 hunks)
  • source/gui/settingsDialogs.py (4 hunks)
  • source/speech/speech.py (4 hunks)
  • source/textUtils.py (4 hunks)
  • tests/unit/test_textUtils.py (3 hunks)
  • user_docs/en/changes.md (1 hunks)
  • user_docs/en/userGuide.md (2 hunks)
Additional Context Used
Ruff (547)
source/braille.py (114)

8-8: os imported but unused


81-81: Undefined name _


84-84: Undefined name _


87-87: Undefined name _


90-90: Undefined name _


93-93: Undefined name _


96-96: Undefined name _


99-99: Undefined name _


102-102: Undefined name _


105-105: Undefined name _


108-108: Undefined name _


111-111: Undefined name _


114-114: Undefined name _


117-117: Undefined name _


120-120: Undefined name _


123-123: Undefined name _


126-126: Undefined name _


129-129: Undefined name _


132-132: Undefined name _


135-135: Undefined name _


138-138: Undefined name _


141-141: Undefined name _


144-144: Undefined name _


147-147: Undefined name _


150-150: Undefined name _


153-153: Undefined name _


159-159: Undefined name _


162-162: Undefined name _


165-165: Undefined name _


168-168: Undefined name _


171-171: Undefined name _


174-174: Undefined name _


177-177: Undefined name _


180-180: Undefined name _


183-183: Undefined name _


186-186: Undefined name _


189-189: Undefined name _


192-192: Undefined name _


195-195: Undefined name _


198-198: Undefined name _


201-201: Undefined name _


204-204: Undefined name _


207-207: Undefined name _


210-210: Undefined name _


212-212: Undefined name _


214-214: Undefined name _


216-216: Undefined name _


218-218: Undefined name _


220-220: Undefined name _


222-222: Undefined name _


224-224: Undefined name _


226-226: Undefined name _


228-228: Undefined name _


230-230: Undefined name _


232-232: Undefined name _


237-237: Undefined name _


247-247: Undefined name _


249-249: Undefined name _


251-251: Undefined name _


253-253: Undefined name _


255-255: Undefined name _


257-257: Undefined name _


259-259: Undefined name _


261-261: Undefined name _


263-263: Undefined name _


265-265: Undefined name _


267-267: Undefined name _


269-269: Undefined name _


271-271: Undefined name _


273-273: Undefined name _


275-275: Undefined name _


281-281: Undefined name _


292-292: Undefined name pgettext


294-294: Undefined name pgettext


296-296: Undefined name pgettext


298-298: Undefined name pgettext


300-300: Undefined name pgettext


302-302: Undefined name pgettext


304-304: Undefined name pgettext


310-310: Undefined name _


312-312: Undefined name _


314-314: Undefined name _


346-346: Undefined name _


350-350: Undefined name _


354-354: Undefined name _


364-364: Undefined name _


376-376: Undefined name _


380-380: Undefined name _


432-432: Do not use bare except


582-582: Undefined name _


592-592: Undefined name _


630-630: Undefined name _


636-636: Undefined name _


681-681: Undefined name _


685-685: Undefined name _


691-691: Undefined name _


695-695: Undefined name _


705-705: Undefined name _


709-709: Undefined name _


918-918: Undefined name _


1069-1069: Undefined name _


1082-1082: Undefined name _


1085-1085: Undefined name _


1088-1088: Undefined name _


1096-1096: Undefined name _


1133-1133: Do not use bare except


2433-2433: Do not use bare except


2844-2844: Do not use bare except


3194-3194: Undefined name _


3207-3207: Undefined name _


3217-3217: Undefined name _


3369-3369: Undefined name _


3407-3407: Undefined name _


3414-3414: Undefined name _

source/gui/settingsDialogs.py (322)

79-79: keyLabels imported but unused


108-108: Multiple statements on one line (colon)


425-425: Undefined name _


437-437: Undefined name _


494-494: Multiple statements on one line (colon)


545-545: Undefined name _


771-771: Undefined name _


775-775: Undefined name _


777-777: Undefined name _


779-779: Undefined name _


781-781: Undefined name _


783-783: Undefined name _


804-804: Undefined name _


810-810: Undefined name _


825-825: Undefined name _


833-833: Undefined name _


839-839: Undefined name _


847-847: Undefined name _


864-864: Undefined name _


876-876: Undefined name _


886-886: Undefined name _


902-902: Undefined name _


911-911: Undefined name _


920-920: Undefined name _


933-933: Undefined name _


942-942: Undefined name _


949-949: Undefined name _


952-952: Undefined name _


962-962: Undefined name _


964-964: Undefined name _


969-969: Do not use bare except


977-977: Undefined name _


977-977: Undefined name _


980-980: Undefined name _


980-980: Undefined name _


1002-1002: Undefined name _


1002-1002: Undefined name _


1024-1024: Undefined name _


1028-1028: Undefined name _


1032-1032: Undefined name _


1037-1037: Undefined name _


1054-1054: Undefined name _


1060-1060: Undefined name _


1082-1082: Undefined name _


1137-1137: Undefined name _


1145-1145: Undefined name _


1163-1163: Do not use bare except


1518-1518: Undefined name _


1544-1544: Undefined name _


1558-1558: Undefined name _


1569-1569: Undefined name _


1585-1585: Undefined name _


1593-1593: Undefined name _


1603-1603: Undefined name _


1630-1630: Undefined name _


1644-1644: Undefined name _


1655-1655: Undefined name _


1669-1669: Undefined name _


1683-1683: Undefined name _


1697-1697: Undefined name _


1740-1740: Undefined name _


1747-1747: Undefined name _


1762-1762: Undefined name _


1765-1765: Undefined name _


1774-1774: Undefined name _


1781-1781: Undefined name _


1790-1790: Do not use bare except


1795-1795: Undefined name _


1808-1808: Undefined name _


1818-1818: Undefined name _


1825-1825: Undefined name _


1832-1832: Undefined name _


1839-1839: Undefined name _


1847-1847: Undefined name _


1854-1854: Undefined name _


1861-1861: Undefined name _


1870-1870: Undefined name _


1881-1881: Undefined name _


1884-1884: Undefined name _


1907-1907: Undefined name _


1915-1915: Undefined name _


1922-1922: Undefined name _


1929-1929: Undefined name _


1937-1937: Do not use bare except


1943-1943: Undefined name _


1952-1952: Undefined name _


1959-1959: Undefined name _


1966-1966: Undefined name _


1985-1985: Undefined name _


1991-1991: Undefined name _


1997-1997: Undefined name _


2003-2003: Undefined name _


2009-2009: Undefined name _


2023-2023: Undefined name _


2029-2029: Undefined name _


2035-2035: Undefined name _


2041-2041: Undefined name _


2050-2050: Undefined name _


2059-2059: Undefined name _


2077-2077: Undefined name _


2085-2085: Undefined name _


2091-2091: Undefined name _


2095-2095: Undefined name _


2099-2099: Undefined name _


2103-2103: Undefined name _


2116-2116: Undefined name _


2123-2123: Undefined name _


2130-2130: Undefined name _


2137-2137: Undefined name _


2144-2144: Undefined name _


2151-2151: Undefined name _


2158-2158: Undefined name _


2171-2171: Undefined name _


2181-2181: Undefined name _


2191-2191: Undefined name _


2213-2213: Undefined name _


2220-2220: Undefined name _


2229-2229: Undefined name _


2238-2238: Undefined name _


2245-2245: Undefined name _


2252-2252: Undefined name _


2259-2259: Undefined name _


2266-2266: Undefined name _


2276-2276: Undefined name _


2286-2286: Undefined name _


2296-2296: Undefined name _


2306-2306: Undefined name _


2336-2336: Undefined name _


2340-2340: Undefined name _


2349-2349: Undefined name _


2357-2357: Undefined name _


2363-2363: Undefined name _


2369-2369: Undefined name _


2375-2375: Undefined name _


2385-2385: Undefined name _


2391-2391: Undefined name _


2401-2401: Undefined name _


2407-2407: Undefined name _


2413-2413: Undefined name _


2421-2421: Undefined name _


2427-2427: Undefined name _


2433-2433: Undefined name _


2439-2439: Undefined name _


2445-2445: Undefined name _


2453-2453: Undefined name _


2459-2459: Undefined name _


2467-2467: Undefined name _


2481-2481: Undefined name _


2493-2493: Undefined name _


2500-2500: Undefined name _


2507-2507: Undefined name _


2513-2513: Undefined name _


2521-2521: Undefined name _


2528-2528: Undefined name _


2536-2536: Undefined name _


2544-2544: Undefined name _


2552-2552: Undefined name _


2560-2560: Undefined name _


2565-2565: Undefined name _


2570-2570: Undefined name _


2575-2575: Undefined name _


2580-2580: Undefined name _


2586-2586: Undefined name _


2592-2592: Undefined name _


2598-2598: Undefined name _


2603-2603: Undefined name _


2609-2609: Undefined name _


2614-2614: Undefined name _


2619-2619: Undefined name _


2673-2673: Undefined name _


2679-2679: Undefined name _


2696-2696: Undefined name _


2699-2699: Undefined name _


2706-2706: Undefined name _


2714-2714: Undefined name _


2719-2719: Undefined name _


2729-2729: Undefined name _


2743-2743: Undefined name _


2751-2751: Undefined name _


2762-2762: Undefined name _


2776-2776: Undefined name _


2797-2797: Undefined name _


2818-2818: Undefined name _


2826-2826: Undefined name _


2888-2888: Undefined name _


2891-2891: Undefined name _


2899-2899: Undefined name _


2912-2912: Undefined name _


2919-2919: Undefined name _


2925-2925: Undefined name _


2937-2937: Undefined name _


2955-2955: Undefined name _


2965-2965: Undefined name _


2996-2996: Undefined name _


3004-3004: Undefined name _


3017-3017: Undefined name _


3027-3027: Undefined name _


3036-3036: Undefined name _


3041-3041: Undefined name _


3045-3045: Undefined name _


3049-3049: Undefined name _


3075-3075: Undefined name pgettext


3083-3083: Undefined name pgettext


3085-3085: Undefined name pgettext


3087-3087: Undefined name pgettext


3089-3089: Undefined name pgettext


3098-3098: Undefined name _


3106-3106: Undefined name _


3111-3111: Undefined name _


3114-3114: Undefined name _


3118-3118: Undefined name _


3137-3137: Undefined name pgettext


3146-3146: Undefined name pgettext


3148-3148: Undefined name pgettext


3150-3150: Undefined name pgettext


3152-3152: Undefined name pgettext


3160-3160: Undefined name _


3171-3171: Undefined name _


3180-3180: Undefined name _


3187-3187: Undefined name _


3196-3196: Undefined name _


3203-3203: Undefined name _


3212-3212: Undefined name _


3219-3219: Undefined name _


3226-3226: Undefined name _


3237-3237: Undefined name _


3242-3242: Undefined name _


3246-3246: Undefined name _


3250-3250: Undefined name _


3270-3270: Undefined name _


3282-3282: Undefined name _


3290-3290: Undefined name _


3293-3293: Undefined name _


3296-3296: Undefined name _


3300-3300: Undefined name _


3316-3316: Undefined name _


3322-3322: Undefined name _


3333-3333: Undefined name _


3340-3340: Undefined name _


3353-3353: Undefined name _


3361-3361: Undefined name _


3374-3374: Undefined name _


3376-3376: Local variable audioBox is assigned to but never used


3381-3381: Undefined name _


3392-3392: Undefined name _


3416-3416: Undefined name _


3434-3434: Undefined name _


3437-3437: Undefined name pgettext


3439-3439: Undefined name pgettext


3448-3448: Undefined name _


3468-3468: Undefined name _


3583-3583: Undefined name _


3588-3588: Undefined name _


3590-3590: Undefined name _


3617-3617: Undefined name _


3630-3630: Undefined name _


3670-3670: Undefined name _


3677-3677: Undefined name _


3692-3692: Undefined name _


3746-3746: Undefined name _


3755-3755: Undefined name _


3761-3761: Undefined name _


3770-3770: Undefined name _


3804-3804: Do not use bare except


3875-3875: Undefined name _


3878-3878: Undefined name _


3912-3912: Undefined name _


3921-3921: Do not use bare except


3930-3930: Undefined name _


3938-3938: Do not use bare except


3946-3946: Undefined name _


3962-3962: Undefined name _


3970-3970: Undefined name _


3979-3979: Undefined name _


3990-3990: Undefined name _


4010-4010: Undefined name _


4020-4020: Do not use bare except


4026-4026: Undefined name _


4036-4036: Do not use bare except


4045-4045: Undefined name _


4063-4063: Undefined name _


4079-4079: Undefined name _


4098-4098: Undefined name _


4116-4116: Undefined name _


4124-4124: Undefined name _


4135-4135: Do not use bare except


4140-4140: Undefined name _


4155-4155: Undefined name _


4161-4161: Undefined name _


4172-4172: Undefined name _


4246-4246: Undefined name _


4253-4253: Undefined name _


4259-4259: Undefined name _


4276-4276: Undefined name _


4281-4281: Undefined name _


4290-4290: Undefined name _


4380-4380: Undefined name _


4384-4384: Undefined name _


4557-4557: Undefined name _


4564-4564: Undefined name _


4610-4610: Undefined name _


4613-4613: Undefined name _


4640-4640: f-string without any placeholders


4653-4653: Undefined name _


4690-4690: Undefined name _


4726-4726: Undefined name _


4731-4731: Undefined name _


4759-4759: Undefined name _


4773-4773: Undefined name pgettext


4782-4782: Undefined name _


4792-4792: Undefined name _


4794-4794: Undefined name _


4796-4796: Undefined name _


4799-4799: Undefined name _


4803-4803: Undefined name _


4820-4820: Undefined name _


4829-4829: Undefined name _


4836-4836: Undefined name _


4844-4844: Undefined name _


4847-4847: Undefined name _


4952-4952: Undefined name _


4953-4953: Undefined name _

source/speech/speech.py (111)

31-31: Redefinition of unused speechCanceled from line 30


95-95: Undefined name pgettext


97-97: Undefined name pgettext


99-99: Undefined name pgettext


102-102: Undefined name pgettext


204-204: Undefined name _


345-345: Undefined name _


385-385: Undefined name _


923-923: Undefined name _


1013-1013: Multiple statements on one line (colon)


1018-1018: Multiple statements on one line (colon)


1084-1084: Undefined name _


1103-1103: Undefined name _


1128-1128: Undefined name ngettext


1202-1202: Undefined name _


1209-1209: Undefined name _


1212-1212: Undefined name _


1672-1672: Multiple statements on one line (colon)


1707-1707: Undefined name _


1876-1876: Undefined name _


1880-1880: Undefined name _


1894-1894: Undefined name _


1898-1898: Undefined name _


1905-1905: Undefined name _


1929-1929: Undefined name _


1934-1934: Undefined name _


1939-1939: Undefined name _


1951-1951: Undefined name _


1961-1961: Undefined name _


1990-1990: Undefined name _


1996-1996: Undefined name ngettext


1999-1999: Undefined name ngettext


2004-2004: Undefined name ngettext


2014-2014: Undefined name ngettext


2050-2050: Test for membership should be not in


2207-2207: Undefined name ngettext


2349-2349: Undefined name _


2415-2415: Undefined name _


2422-2422: Undefined name _


2436-2436: Comparison to None should be cond is None


2441-2441: Undefined name _


2446-2446: Undefined name ngettext


2450-2450: Undefined name _


2456-2456: Undefined name _


2458-2458: Undefined name _


2460-2460: Undefined name _


2462-2462: Undefined name _


2464-2464: Undefined name _


2470-2470: Undefined name _


2488-2488: Undefined name _


2498-2498: Undefined name _


2502-2502: Undefined name _


2512-2512: Undefined name _


2543-2543: Undefined name _


2548-2548: Undefined name _


2554-2554: Undefined name _


2558-2558: Undefined name _


2565-2565: Undefined name _


2566-2566: Undefined name _


2573-2573: Undefined name _


2581-2581: Undefined name _


2583-2583: Undefined name _


2589-2589: Undefined name _


2591-2591: Undefined name _


2598-2598: Undefined name _


2601-2601: Undefined name _


2609-2609: Undefined name _


2611-2611: Undefined name _


2620-2620: Undefined name _


2622-2622: Undefined name _


2630-2630: Undefined name _


2632-2632: Undefined name _


2639-2639: Undefined name _


2641-2641: Undefined name _


2648-2648: Undefined name _


2650-2650: Undefined name _


2656-2656: Undefined name _


2658-2658: Undefined name _


2666-2666: Undefined name _


2669-2669: Undefined name _


2673-2673: Undefined name _


2679-2679: Undefined name _


2681-2681: Undefined name _


2688-2688: Undefined name _


2690-2690: Undefined name _


2721-2721: Undefined name _


2723-2723: Undefined name _


2727-2727: Undefined name _


2729-2729: Undefined name _


2733-2733: Undefined name _


2735-2735: Undefined name _


2739-2739: Undefined name _


2741-2741: Undefined name _


2757-2757: Undefined name _


2762-2762: Undefined name _


2762-2762: Undefined name _


2762-2762: Undefined name _


2771-2771: Undefined name _


2774-2774: Undefined name _


2777-2777: Undefined name _


2781-2781: Undefined name _


2789-2789: Undefined name _


2793-2793: Undefined name _


2801-2801: Undefined name _


2804-2804: Undefined name _


2814-2814: Undefined name _


2817-2817: Undefined name _


2848-2848: Undefined name _


2863-2863: Undefined name _


2871-2871: Undefined name _


2875-2875: Undefined name _

Path-based Instructions (8)
source/config/configSpec.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

tests/unit/test_textUtils.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/textUtils.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/speech/speech.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/braille.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

source/gui/settingsDialogs.py (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

user_docs/en/userGuide.md (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

user_docs/en/changes.md (1)

Pattern **/*: focus on code smells, logic errors, edge cases, and serious issues.
Avoid minor issues such as linting, formatting and style issues.

Additional comments not posted (21)
source/config/configSpec.py (2)

38-38: Ensure the unicodeNormalization feature flag is correctly integrated with the speech processing logic.

Verification successful

The unicodeNormalization feature flag is correctly integrated with the speech processing logic in source/speech/speech.py. The flag is used to conditionally apply the unicodeNormalize function to commands and text lists, ensuring that Unicode normalization is performed when the flag is enabled.

  • source/speech/speech.py, lines around if config.conf["speech"]["unicodeNormalization"]:
    • The flag is checked before applying unicodeNormalize to command.
    • The flag is checked before applying unicodeNormalize to elements in textList.

This confirms that the feature flag is properly utilized in the speech processing logic.
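
The gating described above can be sketched as follows. This is an illustration under assumptions, not the code from source/speech/speech.py: the `conf` dictionary is a hypothetical stand-in for config.conf, and `normalizeSpeechSequence` is an invented helper name. Only the key names and the use of Unicode normalization come from the review.

```python
import unicodedata

# Hypothetical stand-in for config.conf; only the key names come from the review.
conf = {"speech": {"unicodeNormalization": True}}


def normalizeSpeechSequence(sequence: list, form: str = "NFKC") -> list:
    """Normalize string items when the flag is on; pass other items through."""
    if not conf["speech"]["unicodeNormalization"]:
        # Flag disabled: the sequence is returned untouched.
        return sequence
    return [
        unicodedata.normalize(form, item) if isinstance(item, str) else item
        for item in sequence
    ]


# U+0133 (LATIN SMALL LIGATURE IJ) becomes the two letters "ij" under NFKC;
# the integer stands in for a non-text speech command and is left alone.
enabledResult = normalizeSpeechSequence(["\u0133", 5])

conf["speech"]["unicodeNormalization"] = False
disabledResult = normalizeSpeechSequence(["\u0133", 5])
```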

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the speech processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the speech processing code.
rg --type python 'config\.conf\["speech"\]\["unicodeNormalization"\]'

Length of output: 102



Script:

#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the speech processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the speech processing code.
rg --type py 'config\.conf\["speech"\]\["unicodeNormalization"\]'

Length of output: 221



Script:

#!/bin/bash
# Description: Extract the relevant lines from `source/speech/speech.py` where `unicodeNormalization` is used.

# Test: Display the lines around the usage of the `unicodeNormalization` feature flag in the speech processing code.
rg --type py --context 5 'config\.conf\["speech"\]\["unicodeNormalization"\]' source/speech/speech.py

Length of output: 846


86-86: Ensure the unicodeNormalization feature flag is correctly integrated with the braille processing logic.

Verification successful

unicodeNormalization feature flag is correctly integrated with the braille processing logic

The unicodeNormalization feature flag is used in the braille processing code as expected.

  • Location:
    • source/braille.py: if config.conf["braille"]["unicodeNormalization"] and not isUnicodeNormalized(self.rawText):
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the braille processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the braille processing code.
rg --type python 'config\.conf\["braille"\]\["unicodeNormalization"\]'

Length of output: 103



Script:

#!/bin/bash
# Description: Verify the integration of the `unicodeNormalization` feature flag with the braille processing logic.

# Test: Search for the usage of the `unicodeNormalization` feature flag in the braille processing code.
rg --type-add 'python:*.py' 'config\.conf\["braille"\]\["unicodeNormalization"\]'

Length of output: 196

tests/unit/test_textUtils.py (4)

10-10: The import statement for UnicodeNormalizationOffsetConverter is correct.


235-237: The TestUnicodeNormalizationOffsetConverter class is well-defined and appropriately named.


238-255: The test_normalizedOfssetsSentence method is comprehensive and covers various normalization scenarios.


256-270: The test_normalizedOfssetsMixed and test_normalizedOfssetsDifferentOrder methods are well-implemented and cover edge cases effectively.

source/textUtils.py (9)

13-18: New imports for Unicode normalization functionality.


420-420: Setting the default Unicode normalization algorithm to "NFKC" is appropriate.


435-439: The __init__ method correctly initializes the UnicodeNormalizationOffsetConverter class and calculates offsets.


517-520: The encodedStringLength property correctly returns the length of the normalized string.


522-539: The strToEncodedOffsets method correctly converts string offsets to normalized string offsets using precomputed values.


541-558: The encodedToStrOffsets method correctly converts normalized string offsets to original string offsets using precomputed values.


561-563: The isUnicodeNormalized function correctly wraps unicodedata.is_normalized with a default normalization form.


566-568: The unicodeNormalize function correctly wraps unicodedata.normalize with a default normalization form.


570-573: The ENCODINGS_TO_CONVERTERS dictionary and getOffsetConverter function are correctly implemented to handle different encodings.
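
To make the two wrappers mentioned above concrete, here is a minimal sketch built directly on the standard library. The function and constant names are illustrative and need not match those in source/textUtils.py; the only details taken from the review are the NFKC default and the underlying unicodedata calls.

```python
import unicodedata

DEFAULT_FORM = "NFKC"  # default form named in the review; constant name is illustrative


def isNormalized(text: str, form: str = DEFAULT_FORM) -> bool:
    """Thin wrapper around unicodedata.is_normalized (Python 3.8+)."""
    return unicodedata.is_normalized(form, text)


def normalize(text: str, form: str = DEFAULT_FORM) -> str:
    """Thin wrapper around unicodedata.normalize."""
    return unicodedata.normalize(form, text)


ligature = "\u0133"      # LATIN SMALL LIGATURE IJ
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
boldA = "\U0001d400"     # MATHEMATICAL BOLD CAPITAL A

normalizedLigature = normalize(ligature)      # two code points: "ij"
normalizedDecomposed = normalize(decomposed)  # one code point: U+00E9
normalizedBold = normalize(boldA)             # plain "A"
```

Note that the ligature grows from one code point to two, while the decomposed character shrinks from two to one. These length changes are why offsets in the original and normalized strings have to be mapped onto each other.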

source/speech/speech.py (1)

1572-1573: Ensure Unicode normalization is applied only when the configuration is enabled.

Please verify that config.conf["speech"]["unicodeNormalization"] is correctly set and used.

source/braille.py (4)

500-510: Ensure converter is correctly initialized and used for Unicode normalization.

The logic for initializing and using the UnicodeNormalizationOffsetConverter looks correct. The normalization is applied conditionally based on the configuration.


514-519: Ensure louisHelper.translate parameters are correctly set.

The parameters for louisHelper.translate are correctly set, and the cursor position is handled appropriately.


521-527: Ensure braille and raw text positions are correctly processed using normalization offsets.

The logic for processing and mapping braille and raw text positions using normalization offsets is correctly implemented.


528-529: Ensure brailleToRawPos and rawToBraillePos lists are correctly updated.

The lists brailleToRawPos and rawToBraillePos are correctly updated based on the normalization offsets.
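
The position remapping discussed above can be sketched as a composition of two lookups. This is a simplified illustration, not the code from source/braille.py: `mapBraillePositions` is a hypothetical helper, and the offset tuples are written by hand rather than produced by UnicodeNormalizationOffsetConverter. It assumes the braille translation ran on the normalized text, so its position lists refer to normalized offsets.

```python
def mapBraillePositions(
    brailleToRawPos: list[int],
    rawToBraillePos: list[int],
    origToNorm: tuple[int, ...],
    normToOrig: tuple[int, ...],
) -> tuple[list[int], list[int]]:
    """Re-express braille/raw position lists in terms of the original text."""
    # Each braille cell points at a normalized offset; translate it back.
    brailleToOrig = [normToOrig[pos] for pos in brailleToRawPos]
    # Each original offset maps to a normalized offset, which maps to a cell.
    origToBraille = [rawToBraillePos[pos] for pos in origToNorm]
    return brailleToOrig, origToBraille


# Original text "e\u0301x" (3 code points) normalizes to "\u00e9x" (2 code points).
origToNorm = (0, 0, 1)  # "e" and the combining accent both map to offset 0
normToOrig = (0, 2)     # the composed character maps to 0, "x" maps to 2

# Assume, for the example only, one braille cell per normalized character.
brailleToOrig, origToBraille = mapBraillePositions([0, 1], [0, 1], origToNorm, normToOrig)
```

With these inputs, the second braille cell routes back to offset 2 in the original text, and both code points of the decomposed character route to the first cell.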

user_docs/en/userGuide.md (1)

2080-2080: Ensure the link fragment is valid and points to the correct section.

Please verify that the link fragment #SpeechUnicodeNormalization points to the correct section in the document.

@@ -1806,6 +1806,30 @@ This option applies to all synthesizers, not just the currently active synthesiz
On by default, this option tells NVDA if the current voice's language can be trusted when processing symbols and characters.
If you find that NVDA is reading punctuation in the wrong language for a particular synthesizer or voice, you may wish to turn this off to force NVDA to use its global language setting instead.

##### Unicode normalization {#SpeechUnicodeNormalization}

Add a blank line before the heading for better readability.

+ 
##### Unicode normalization {#SpeechUnicodeNormalization}

Comment on lines 441 to 515
def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
    diff = list(ndiff(self.decoded, self.encoded))
    diff.append("!")  # Closing the diff
    iOrigin = iNormalized = 0
    originBuffer = ""
    normalizedBuffer = ""
    originToNormalizedDict = defaultdict(list)
    normalizedToOriginDict = defaultdict(list)
    originPending = normalizedPending = False
    for char in diff:
        if char[0] == "?":
            raise RuntimeError("Unexpected entry in diff")
        elif char[0] == "-":
            originBuffer += char[2:]
            originPending = True
        elif char[0] == "+":
            normalizedBuffer += char[2:]
            normalizedPending = True
        elif char[0] == " " and (
            (not originPending and normalizedPending) or (originPending and not normalizedPending)
        ):
            originBuffer += char[2:]
            normalizedBuffer += char[2:]
        else:
            while originBuffer and normalizedBuffer:
                originPart = ""
                originPartLen = 0
                normalizedPart = ""
                normalizedPartLen = 0
                for i in range(len(originBuffer)):
                    originPart = originBuffer[: (i + 1)]
                    normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
                    if (
                        originPart == normalizedPart
                        or not normalizedBuffer.startswith(normalizedPart)
                    ):
                        continue
                    originPartLen = len(originPart)
                    originBuffer = originBuffer[originPartLen:]
                    normalizedPartLen = len(normalizedPart)
                    normalizedBuffer = normalizedBuffer[normalizedPartLen:]
                    break
                originMultiplier = min(originPartLen / normalizedPartLen, 1)
                normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
                for i in range(max(originPartLen, normalizedPartLen)):
                    tempOrigin = iOrigin + int(i * originMultiplier)
                    tempNormalized = iNormalized + int(i * normalizedMultiplier)
                    originC = originPart[i] if i < originPartLen else None
                    if originC:
                        normalizedIndex = normalizedPart.find(originC)
                        if normalizedIndex != -1:
                            tempNormalized = iNormalized + normalizedIndex
                    normalizedC = normalizedPart[i] if i < normalizedPartLen else None
                    if normalizedC:
                        originIndex = originPart.find(normalizedC)
                        if originIndex != -1:
                            tempOrigin = iOrigin + originIndex
                    originToNormalizedDict[tempOrigin].append(tempNormalized)
                    normalizedToOriginDict[tempNormalized].append(tempOrigin)
                iOrigin += originPartLen
                iNormalized += normalizedPartLen
            originPending = normalizedPending = False
            if char[0] == " ":
                originToNormalizedDict[iOrigin].append(iNormalized)
                normalizedToOriginDict[iNormalized].append(iOrigin)
                iOrigin += 1
                iNormalized += 1
    originResult = tuple(map(min, originToNormalizedDict.values()))
    assert len(originResult) == len(self.decoded)
    normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
    assert len(normalizedResult) == len(self.encoded)
    return tuple((
        originResult,
        normalizedResult
    ))

@coderabbitai coderabbitai bot May 15, 2024

The _calculateOffsets method correctly calculates offsets between the original and normalized strings. Consider adding comments to explain the logic for better readability.

def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
    diff = list(ndiff(self.decoded, self.encoded))
    diff.append("!")  # Closing the diff
    iOrigin = iNormalized = 0
    originBuffer = ""
    normalizedBuffer = ""
    originToNormalizedDict = defaultdict(list)
    normalizedToOriginDict = defaultdict(list)
    originPending = normalizedPending = False
    for char in diff:
        if char[0] == "?":
            raise RuntimeError("Unexpected entry in diff")
        elif char[0] == "-":
            originBuffer += char[2:]
            originPending = True
        elif char[0] == "+":
            normalizedBuffer += char[2:]
            normalizedPending = True
        elif char[0] == " " and (
            (not originPending and normalizedPending) or (originPending and not normalizedPending)
        ):
            originBuffer += char[2:]
            normalizedBuffer += char[2:]
        else:
            while originBuffer and normalizedBuffer:
                originPart = ""
                originPartLen = 0
                normalizedPart = ""
                normalizedPartLen = 0
                for i in range(len(originBuffer)):
                    originPart = originBuffer[: (i + 1)]
                    normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
                    if (
                        originPart == normalizedPart
                        or not normalizedBuffer.startswith(normalizedPart)
                    ):
                        continue
                    originPartLen = len(originPart)
                    originBuffer = originBuffer[originPartLen:]
                    normalizedPartLen = len(normalizedPart)
                    normalizedBuffer = normalizedBuffer[normalizedPartLen:]
                    break
                originMultiplier = min(originPartLen / normalizedPartLen, 1)
                normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
                for i in range(max(originPartLen, normalizedPartLen)):
                    tempOrigin = iOrigin + int(i * originMultiplier)
                    tempNormalized = iNormalized + int(i * normalizedMultiplier)
                    originC = originPart[i] if i < originPartLen else None
                    if originC:
                        normalizedIndex = normalizedPart.find(originC)
                        if normalizedIndex != -1:
                            tempNormalized = iNormalized + normalizedIndex
                    normalizedC = normalizedPart[i] if i < normalizedPartLen else None
                    if normalizedC:
                        originIndex = originPart.find(normalizedC)
                        if originIndex != -1:
                            tempOrigin = iOrigin + originIndex
                    originToNormalizedDict[tempOrigin].append(tempNormalized)
                    normalizedToOriginDict[tempNormalized].append(tempOrigin)
                iOrigin += originPartLen
                iNormalized += normalizedPartLen
            originPending = normalizedPending = False
            if char[0] == " ":
                originToNormalizedDict[iOrigin].append(iNormalized)
                normalizedToOriginDict[iNormalized].append(iOrigin)
                iOrigin += 1
                iNormalized += 1
    originResult = tuple(map(min, originToNormalizedDict.values()))
    assert len(originResult) == len(self.decoded)
    normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
    assert len(normalizedResult) == len(self.encoded)
    return tuple((
        originResult,
        normalizedResult
    ))

Member

I agree that _calculateOffsets needs more commenting. @coderabbitai Could you please generate a suggestion with commenting?

Contributor

@seanbudd, here's the _calculateOffsets method with added comments to enhance readability and understanding:

def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
    # Initialize a diff list between the decoded original and the normalized strings
    diff = list(ndiff(self.decoded, self.encoded))
    diff.append("!")  # Append a closing marker to the diff list

    # Initialize indices and buffers for tracking positions and changes
    iOrigin = iNormalized = 0
    originBuffer = ""
    normalizedBuffer = ""
    originToNormalizedDict = defaultdict(list)
    normalizedToOriginDict = defaultdict(list)
    originPending = normalizedPending = False

    # Iterate over each character in the diff list
    for char in diff:
        if char[0] == "?":
            raise RuntimeError("Unexpected entry in diff")
        elif char[0] == "-":
            # Accumulate deleted characters in the origin buffer
            originBuffer += char[2:]
            originPending = True
        elif char[0] == "+":
            # Accumulate added characters in the normalized buffer
            normalizedBuffer += char[2:]
            normalizedPending = True
        elif char[0] == " " and (
            (not originPending and normalizedPending) or (originPending and not normalizedPending)
        ):
            # Accumulate unchanged characters in both buffers
            originBuffer += char[2:]
            normalizedBuffer += char[2:]
        else:
            # Process accumulated characters in the buffers
            while originBuffer and normalizedBuffer:
                originPart = ""
                originPartLen = 0
                normalizedPart = ""
                normalizedPartLen = 0
                # Find the smallest part that can be normalized and still match the beginning of the normalized buffer
                for i in range(len(originBuffer)):
                    originPart = originBuffer[: (i + 1)]
                    normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
                    if (
                        originPart == normalizedPart
                        or not normalizedBuffer.startswith(normalizedPart)
                    ):
                        continue
                    originPartLen = len(originPart)
                    originBuffer = originBuffer[originPartLen:]
                    normalizedPartLen = len(normalizedPart)
                    normalizedBuffer = normalizedBuffer[normalizedPartLen:]
                    break
                # Map the original indices to the normalized indices
                originMultiplier = min(originPartLen / normalizedPartLen, 1)
                normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
                for i in range(max(originPartLen, normalizedPartLen)):
                    tempOrigin = iOrigin + int(i * originMultiplier)
                    tempNormalized = iNormalized + int(i * normalizedMultiplier)
                    originC = originPart[i] if i < originPartLen else None
                    if originC:
                        normalizedIndex = normalizedPart.find(originC)
                        if normalizedIndex != -1:
                            tempNormalized = iNormalized + normalizedIndex
                    normalizedC = normalizedPart[i] if i < normalizedPartLen else None
                    if normalizedC:
                        originIndex = originPart.find(normalizedC)
                        if originIndex != -1:
                            tempOrigin = iOrigin + originIndex
                    originToNormalizedDict[tempOrigin].append(tempNormalized)
                    normalizedToOriginDict[tempNormalized].append(tempOrigin)
                iOrigin += originPartLen
                iNormalized += normalizedPartLen
            originPending = normalizedPending = False
            if char[0] == " ":
                # Map indices directly for unchanged characters
                originToNormalizedDict[iOrigin].append(iNormalized)
                normalizedToOriginDict[iNormalized].append(iOrigin)
                iOrigin += 1
                iNormalized += 1
    # Finalize the mapping by selecting the minimum index for each original position
    originResult = tuple(map(min, originToNormalizedDict.values()))
    assert len(originResult) == len(self.decoded)
    normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
    assert len(normalizedResult) == len(self.encoded)
    return tuple((
        originResult,
        normalizedResult
    ))

This version includes detailed comments that explain each part of the method, making it easier to understand how the offsets are calculated between the original and normalized strings.
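
For readers unfamiliar with `difflib.ndiff`, the marker characters that the method branches on can be seen in a small standalone sketch. This is not code from the PR: the sample string and the helper name `diffEntries` are assumptions made for illustration, and NFKC is assumed as the normalization form.

```python
import unicodedata
from difflib import ndiff


def diffEntries(origin: str, form: str = "NFKC") -> tuple[str, list[str]]:
    """Hypothetical helper: normalize origin and return per-character ndiff entries."""
    normalized = unicodedata.normalize(form, origin)
    # Passing two strings makes ndiff compare them character by character.
    return normalized, list(ndiff(origin, normalized))


origin = "cafe\u0301 \u0133s"  # decomposed e + acute accent, and the "ij" ligature
normalized, entries = diffEntries(origin)
for entry in entries:
    # entry[0] is the marker ("-", "+" or " "), entry[2:] is the character itself.
    print(repr(entry))

# Characters marked "-" or " " rebuild the original string,
# characters marked "+" or " " rebuild the normalized string.
rebuiltOrigin = "".join(e[2:] for e in entries if e[0] in "- ")
rebuiltNormalized = "".join(e[2:] for e in entries if e[0] in "+ ")
```

The "-" and "+" runs are what the method accumulates in `originBuffer` and `normalizedBuffer` before it maps offsets between them.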

Member

@LeonarddeR - can you add comments? When that's done and #16521 (comment) is addressed I think this is ready

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Collaborator Author

@coderabbitai I have just reworded some of your comments in a new commit. Do you think it's appropriate now?

Contributor

@LeonarddeR, I have reviewed the reworded comments in the _calculateOffsets method. The comments now provide clear explanations for each part of the method, enhancing readability and understanding. The logic and functionality of the method are well-documented, making it easier for future developers to follow the code.

The comments are appropriate and effectively describe the purpose and process of each section of the method. Great job on the rewording!

If there are any further adjustments or additional comments needed, feel free to let me know.

Analysis chain
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Fetch the latest version of the `_calculateOffsets` method from the `source/textUtils.py` file
cat source/textUtils.py | sed -n '/def _calculateOffsets/,/return tuple((/p'

Length of output: 4735


New Learnings Added
User: LeonarddeR
PR: nvaccess/nvda#16521
File: source/textUtils.py:0-0
Timestamp: 2024-05-18T11:21:34.925Z
Learning: LeonarddeR reworded the comments in the `_calculateOffsets` method to enhance readability and understanding. The comments now provide clear explanations for each part of the method.

Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.

@LeonarddeR
Collaborator Author

Honestly, it looks mainly like noise to me.


@seanbudd seanbudd marked this pull request as draft May 17, 2024 02:48
@LeonarddeR LeonarddeR marked this pull request as ready for review May 20, 2024 09:57
assert len(originResult) == len(self.decoded)
normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
assert len(normalizedResult) == len(self.encoded)
return tuple((
Member

Isn't this a tuple as is?

Suggested change
- return tuple((
+ return (

Collaborator Author

I think you are right. I personally prefer the more explicit syntax, because bare parentheses produce a generator rather than a tuple when they enclose a comprehension. But as far as I'm concerned, the final judgment is up to you.
@coderabbitai what do you think about the suggestion made here?
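
The distinction under discussion can be checked with a short sketch. The sample offset tuples are made up for illustration and are not output of the converter.

```python
import types

# Illustrative values only, standing in for the two offset mappings.
originResult = (0, 1, 1)
normalizedResult = (0, 1)

# Wrapping a tuple literal in tuple() yields an equal tuple.
explicit = tuple((originResult, normalizedResult))
bare = (originResult, normalizedResult)
sameValue = explicit == bare and type(bare) is tuple

# Parentheses around a comprehension build a generator, not a tuple.
lazy = (offset + 1 for offset in originResult)
isGenerator = isinstance(lazy, types.GeneratorType)

# tuple() is only required when the contents come from a comprehension.
shifted = tuple(offset + 1 for offset in originResult)
```

For a plain pair of names, as in the `return` statement under review, both spellings therefore behave the same.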

Member

@Qchristensen Qchristensen left a comment

Looks good, thanks Leonard!

@seanbudd seanbudd merged commit 57ce236 into nvaccess:master May 21, 2024
1 check passed
@LeonarddeR LeonarddeR deleted the unicodeNormalizationOffsetConverter branch May 22, 2024 05:21
@burmancomp
Contributor

I am getting an empty braille line when reading a given message in Thunderbird. I encountered a similar problem earlier with SquirrelMail (webmail).

This happens when braille is tethered to review and Unicode normalization is enabled for braille.

Here are log lines (running from source with current main branch code):

Input: kb(laptop):downArrow
DEBUGWARNING - NVDAObjects.IAccessible.normalizeIA2TextFormatField (15:11:47.970) - MainThread (13568):
Unsupported value for text-align attribute: "start"
IO - speech.speech.speak (15:11:47.971) - MainThread (13568):
Speaking ['removed original text\xa0\n']
DEBUGWARNING - NVDAObjects.IAccessible.normalizeIA2TextFormatField (15:11:47.992) - MainThread (13568):
Unsupported value for text-align attribute: "start"
DEBUGWARNING - braille.BrailleHandler._handlePendingUpdate (15:11:47.992) - MainThread (13568):
Region update failed for ReviewCursorManagerRegion ('removed original text\xa0 '), object probably died
Traceback (most recent call last):
  File "braille.py", line 2621, in _handlePendingUpdate
    region.update()
  File "braille.py", line 1378, in update
    super(TextInfoRegion, self).update()
  File "braille.py", line 503, in update
    converter = UnicodeNormalizationOffsetConverter(self.rawText)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "textUtils.py", line 439, in __init__
    self.computedStrToEncodedOffsets, self.computedEncodedToStrOffsets = self._calculateOffsets()
                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "textUtils.py", line 496, in _calculateOffsets
    originMultiplier = min(originPartLen / normalizedPartLen, 1)
                           ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
IO - braille.BrailleBuffer.update (15:11:47.993) - MainThread (13568):
Braille regions text: ['removed original text\xa0 ']
IO - braille.BrailleHandler.update (15:11:47.993) - MainThread (13568):
Braille window dots: -

@LeonarddeR
Collaborator Author

Thanks for reporting. This is easily reproducible in the Python console:
textUtils.UnicodeNormalizationOffsetConverter("removed original text\xa0 ")
I will handle this as part of #16584
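
Outside NVDA, the input side of this report can be examined with the standard library alone. The sketch below only shows what NFKC does to the reported string; `boundedRatio` is a hypothetical guard that illustrates how the failing division could be protected, and it is not the fix that was made in the follow-up PR.

```python
import unicodedata

raw = "removed original text\xa0 "
normalized = unicodedata.normalize("NFKC", raw)

# NFKC maps NO-BREAK SPACE (U+00A0) to a regular space, so the two strings
# have equal length but differ at exactly one position.
changed = [i for i, (a, b) in enumerate(zip(raw, normalized)) if a != b]


def boundedRatio(numeratorLen: int, denominatorLen: int) -> float:
    """Hypothetical guard: a length ratio capped at 1 that tolerates a zero denominator."""
    if denominatorLen == 0:
        # The traceback shows normalizedPartLen was 0 when the division ran.
        return 1.0
    return min(numeratorLen / denominatorLen, 1)
```

The traceback establishes that `normalizedPartLen` was still 0 at the division; returning 1.0 in that case is an arbitrary choice made for this sketch.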

@ABuffEr
Contributor

ABuffEr commented May 22, 2024

Hi,
While viewing an FB post that starts with styled characters, I noticed that when I ask NVDA for the window title, I get the non-normalized string.
Anyway, the post is read correctly by speech, which is a great, great enhancement!
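
As a rough illustration of what "styled characters" means here, the sketch below applies NFKC to mathematical bold letters and to the "ij" ligature using only the standard library. The helper name `normalizeText` and the sample strings are assumptions; this does not show NVDA's own code path for window titles.

```python
import unicodedata

# "Bold" written with MATHEMATICAL BOLD letters (U+1D401, U+1D428, U+1D425, U+1D41D).
styled = "\U0001d401\U0001d428\U0001d425\U0001d41d"
ligature = "\u0133"  # LATIN SMALL LIGATURE IJ


def normalizeText(text: str, form: str = "NFKC") -> str:
    """Hypothetical helper: return text in the given normalization form."""
    return unicodedata.normalize(form, text)


plainFromStyled = normalizeText(styled)
plainFromLigature = normalizeText(ligature)
# Plain ASCII text is already normalized and passes through unchanged.
untouched = normalizeText("Bold")
```

Text that reaches the synthesizer without going through a step like this keeps its styled code points, which would be consistent with the window title observation above.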

@LeonarddeR
Collaborator Author

LeonarddeR commented May 22, 2024 via email

@XLTechie
Collaborator

@LeonarddeR Can you edit the PR description to add that it closes #4631 and #11570, in case this gets reverted?

@LeonarddeR
Collaborator Author

These are not yet closed. #16584 will probably close them.

seanbudd pushed a commit that referenced this pull request May 27, 2024
Fixup of #16521
Fixes #11570
Partial fix for #4631

Summary of the issue:
It turns out that rawTextTypeforms on a region may be None; this was an oversight on my end.
cursorPos may also be None.
@burmancomp reported a zero division error when a string ended with a non-breaking space followed by a space.
Description of user facing changes
Errors are no longer logged when getting flash messages in Thunderbird and/or reading messages in WhatsApp UWP.

Description of development approach
Explicitly check for None typeforms and cursorPos, thereby improving readability as well.
Improve the _calculateOffsets method in textUtils to ensure it can handle the case reported by @burmancomp
seanbudd pushed a commit that referenced this pull request Jul 3, 2024
…ch panel (#16810)

Fix-up of #16521.

Summary of the issue:
When opening the speech settings panel, the checkbox "Report 'Normalized' when navigating by character" is enabled, no matter the value of the "Unicode normalization" combo-box.

Description of user facing changes
When opening the speech settings panel, the checkbox "Report 'Normalized' when navigating by character" will be enabled or disabled (greyed out), depending on the value of the "Unicode normalization" combo-box, as it is already the case when the selection of this combo-box is modified.

Description of development approach
As done for "Ignore blank lines for line indentation reporting" checkbox in Document formatting settings, explicitly call the .Enable method when the panel is initialized.
@mush42

mush42 commented Jul 22, 2024

@LeonarddeR

What form of Unicode normalization is used?

Both NFKC and NFKD cause a big issue with vocalized (aka diacritized) Arabic text.
The issue is that both forms of normalization switch the order of some diacritical marks. eSpeak-ng, at least, will mispronounce normalized diacritized Arabic text. Sonata will face the same issue, since it relies on eSpeak-ng for phonemization.

Best
Musharraf

@SaschaCowley
Member

@mush42 do you know if an issue has been filed with eSpeak NG for that problem?

@LeonarddeR
Collaborator Author

@mush42 In NVDA, we're using the NFKC algorithm.
I agree with @SaschaCowley that this is probably something that should be fixed upstream, i.e. eSpeak should definitely support speaking normalized Arabic text.
That said, there's a reason why this is not the default yet.
If you feel the discussion should continue here, please create a new issue.
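
The reordering described in this exchange can be reproduced with the standard library. The sketch assumes the text was typed as letter, then shadda, then fatha; that input order is an assumption for illustration, and whether a given synthesizer mispronounces the result is not something this snippet can show.

```python
import unicodedata

BEH, SHADDA, FATHA = "\u0628", "\u0651", "\u064e"

# Assumed input order: base letter, then shadda, then fatha.
typed = BEH + SHADDA + FATHA
normalized = unicodedata.normalize("NFKC", typed)

# Canonical ordering sorts adjacent combining marks by combining class,
# so the mark with the lower class (fatha) moves in front of shadda.
classes = [unicodedata.combining(mark) for mark in (SHADDA, FATHA)]
reordered = normalized == BEH + FATHA + SHADDA
```

The same reordering happens under NFC and NFD, since canonical ordering is part of every normalization form.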

seanbudd pushed a commit that referenced this pull request Sep 9, 2024
Fixes #16616

Summary of the issue:
In #16521, Unicode normalization was added, but it is disabled by default.

Description of user facing changes
Unicode normalization is now enabled by default for Speech.

Description of development approach
Change default values
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional unicode normalization before passing strings to speech or braille
10 participants