Add Unicode Normalization to speech and braille #16521
I'm running a build that contains this PR and frequently get an error when navigating with speech normalization enabled:
Could not execute function event_gainFocus defined in appModules.nvda module; kwargs: {}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
CC @codeofdusk
It looks like there is an erroneous type annotation in …
…a speech sequence with only strings. Thanks @Simon818
@Qchristensen, could you have a look at the proposed documentation changes?
Here are my first comments. I may have other comments later since I have not yet tested this PR.
1. Decomposition of some ligatures, including "ĳ" (ligature ij), into their two-letter form ("ij").
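The ligature decomposition in this suggestion is a Unicode compatibility mapping, so it only happens with the compatibility normalization forms (NFKC/NFKD); NFC leaves "ĳ" alone. A quick check with Python's standard `unicodedata` module. The helper name `normalizeForOutput` is hypothetical, and NFKC as the form NVDA uses is an assumption, not something stated in this thread:

```python
import unicodedata

LIGATURE_IJ = "\u0133"  # LATIN SMALL LIGATURE IJ


def normalizeForOutput(text: str, form: str = "NFKC") -> str:
    # NFKC applies compatibility decompositions (such as the ij ligature)
    # and then recomposes canonically equivalent sequences.
    return unicodedata.normalize(form, text)


print(normalizeForOutput(LIGATURE_IJ))
# NFC only applies canonical mappings, so the ligature survives unchanged.
print(unicodedata.normalize("NFC", LIGATURE_IJ) == LIGATURE_IJ)
```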
1. Stable ordering of modifiers in composite characters, for example in ancient Hebrew.
Could you clarify the impact of this on speech?
The modifiers will be spoken in normalized order rather than in the order in which they appear in the text. For speech this could be a bit confusing, but that's why the feature is optional. It has no impact on speech with the Hebrew Vocalizer voice, for example. I'm afraid there is no normalization form that lets us disable this behavior. For braille, on the other hand, this can be of great benefit, because a braille table then only needs to account for the normalized order of the modifiers.
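The reordering described here is Unicode canonical ordering: during normalization, combining marks are sorted by their canonical combining class, so two spellings that differ only in mark order become identical. A small illustration with Hebrew points, assuming NFKC (NFC gives the same result for this input, because the precomposed Hebrew presentation forms are excluded from composition):

```python
import unicodedata

BET = "\u05d1"     # HEBREW LETTER BET
DAGESH = "\u05bc"  # HEBREW POINT DAGESH OR MAPIQ, combining class 21
QAMATS = "\u05b8"  # HEBREW POINT QAMATS, combining class 18

typedDageshFirst = BET + DAGESH + QAMATS
typedQamatsFirst = BET + QAMATS + DAGESH

for text in (typedDageshFirst, typedQamatsFirst):
    normalized = unicodedata.normalize("NFKC", text)
    # Marks come out sorted by combining class: qamats (18) before dagesh (21).
    print([hex(ord(ch)) for ch in normalized])
```

Both inputs print the same code point sequence, which is why a braille table only needs entries for one order, while speech hears the marks in that order regardless of how they were typed.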
Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>
Hi, testing CodeRabbit on this PR for an AI review. @coderabbitai review
@seanbudd, I have identified the issue causing the error reported by Simon818. The updated `getPropertiesSpeech` function:

```python
def getPropertiesSpeech(  # noqa: C901
		reason: OutputReason = OutputReason.QUERY,
		**propertyValues
) -> SpeechSequence:
	textList: SpeechSequence = []
	name: Optional[str] = propertyValues.get('name')
	if name:
		textList.append(name)
	if 'role' in propertyValues:
		role: controlTypes.Role = propertyValues['role']
		speakRole=True
	elif '_role' in propertyValues:
		speakRole=False
		role: controlTypes.Role = propertyValues['_role']
	else:
		speakRole=False
		role=controlTypes.Role.UNKNOWN
	role = controlTypes.Role(role)
	value: Optional[str] = propertyValues.get('value') if role not in controlTypes.silentValuesForRoles else None
	cellCoordsText: Optional[str] = propertyValues.get('cellCoordsText')
	rowNumber = propertyValues.get('rowNumber')
	columnNumber = propertyValues.get('columnNumber')
	includeTableCellCoords = propertyValues.get('includeTableCellCoords', True)
	if role == controlTypes.Role.CHARTELEMENT:
		speakRole = False
	roleText: Optional[str] = propertyValues.get('roleText')
	if (
		speakRole
		and (
			roleText
			or reason not in (
				OutputReason.SAYALL,
				OutputReason.CARET,
				OutputReason.FOCUS,
				OutputReason.QUICKNAV
			)
			or not (
				name
				or value
				or cellCoordsText
				or rowNumber
				or columnNumber
			)
			or role not in controlTypes.silentRolesOnFocus
		)
		and (
			role != controlTypes.Role.MATH
			or reason not in (
				OutputReason.CARET,
				OutputReason.SAYALL
			)
		)):
		textList.append(roleText if roleText else role.displayString)
	if value:
		textList.append(value)
	states = propertyValues.get('states')
	realStates=propertyValues.get('_states',states)
	negativeStates=propertyValues.get('negativeStates',set())
	if states is not None or negativeStates:
		if states is None:
			states = set()
		labelStates = controlTypes.processAndLabelStates(role, realStates, reason, states, negativeStates)
		textList.extend(labelStates)
	description: Optional[str] = propertyValues.get('description')
	if description:
		textList.append(description)
	keyboardShortcut: Optional[str] = propertyValues.get('keyboardShortcut')
	textList.extend(getKeyboardShortcutsSpeech(keyboardShortcut))
	if includeTableCellCoords and cellCoordsText:
		textList.append(cellCoordsText)
	if cellCoordsText or rowNumber or columnNumber:
		tableID = propertyValues.get("_tableID")
		sameTable = (tableID and tableID == _speechState.oldTableID)
		if tableID and not sameTable:
			_speechState.oldTableID = tableID
		rowSpan = propertyValues.get("rowSpan") or 1
		columnSpan = propertyValues.get("columnSpan") or 1
		if rowNumber and (
			not sameTable
			or rowNumber != _speechState.oldRowNumber
			or rowSpan != _speechState.oldRowSpan
		):
			rowHeaderText: Optional[str] = propertyValues.get("rowHeaderText")
			if rowHeaderText:
				textList.append(rowHeaderText)
			if includeTableCellCoords and not cellCoordsText:
				rowNumberTranslation: str = _("row %s") % rowNumber
				textList.append(rowNumberTranslation)
				if rowSpan>1 and columnSpan<=1:
					rowSpanAddedTranslation: str = _("through {endRow}").format(endRow=rowNumber + rowSpan - 1)
					textList.append(rowSpanAddedTranslation)
			_speechState.oldRowNumber = rowNumber
			_speechState.oldRowSpan = rowSpan
		if columnNumber and (
			not sameTable
			or columnNumber != _speechState.oldColumnNumber
			or columnSpan != _speechState.oldColumnSpan
		):
			columnHeaderText: Optional[str] = propertyValues.get("columnHeaderText")
			if columnHeaderText:
				textList.append(columnHeaderText)
			if includeTableCellCoords and not cellCoordsText:
				colNumberTranslation: str = _("column %s") % columnNumber
				textList.append(colNumberTranslation)
				if columnSpan>1 and rowSpan<=1:
					colSpanAddedTranslation: str = _("through {endCol}").format(endCol=columnNumber + columnSpan - 1)
					textList.append(colSpanAddedTranslation)
			_speechState.oldColumnNumber = columnNumber
			_speechState.oldColumnSpan = columnSpan
		if includeTableCellCoords and not cellCoordsText and rowSpan>1 and columnSpan>1:
			rowColSpanTranslation: str = _("through row {row} column {column}").format(
				row=rowNumber + rowSpan - 1,
				column=columnNumber + columnSpan - 1
			)
			textList.append(rowColSpanTranslation)
	rowCount=propertyValues.get('rowCount',0)
	columnCount=propertyValues.get('columnCount',0)
	rowAndColumnCountText = _rowAndColumnCountText(rowCount, columnCount)
	if rowAndColumnCountText:
		textList.append(rowAndColumnCountText)
	if rowCount or columnCount:
		_speechState.oldTableID = None
	isCurrent = propertyValues.get('current', controlTypes.IsCurrent.NO)
	if isCurrent != controlTypes.IsCurrent.NO:
		textList.append(isCurrent.displayString)
	hasDetails = propertyValues.get('hasDetails', False)
	if hasDetails:
		detailsRoles: _AnnotationRolesT = propertyValues.get("detailsRoles", tuple())
		if detailsRoles:
			roleStrings = (role.displayString if role else _("details") for role in detailsRoles)
			for roleString in roleStrings:
				textList.append(
					_("has %s") % roleString
				)
		else:
			textList.append(
				_("has details")
			)
	placeholder: Optional[str] = propertyValues.get('placeholder', None)
	if placeholder:
		textList.append(placeholder)
	indexInGroup=propertyValues.get('positionInfo_indexInGroup',0)
	similarItemsInGroup=propertyValues.get('positionInfo_similarItemsInGroup',0)
	if 0<indexInGroup<=similarItemsInGroup:
		itemPosTranslation: str = _("{number} of {total}").format(
			number=indexInGroup,
			total=similarItemsInGroup
		)
		textList.append(itemPosTranslation)
	if 'positionInfo_level' in propertyValues:
		level=propertyValues.get('positionInfo_level',None)
		role=propertyValues.get('role',None)
		if level is not None:
			levelTranslation: str = _('level %s') % level
			if (
				role in (controlTypes.Role.TREEVIEWITEM, controlTypes.Role.LISTITEM)
				and level != _speechState.oldTreeLevel
			):
				textList.insert(0, levelTranslation)
				_speechState.oldTreeLevel = level
			else:
				textList.append(levelTranslation)
	errorMessage: str | None = propertyValues.get("errorMessage", None)
	if errorMessage:
		textList.append(errorMessage)
	if config.conf["speech"]["unicodeNormalization"]:
		# Normalize only the string items; speech commands in the sequence pass through unchanged.
		textList = [
			unicodeNormalize(t) if isinstance(t, str) else t
			for t in textList
		]
	types.logBadSequenceTypes(textList)
	return textList
```

This update ensures that only strings are passed to …

Analysis chain

Scripts executed

The following scripts were executed for the analysis:

Script:
```shell
#!/bin/bash
# Search for the definition of the getPropertiesSpeech function in the codebase.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n $$$'
# Search for the definition of the unicodeNormalize function in the codebase.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n $$$'
# Search for the CharacterModeCommand class or type in the codebase.
ast-grep --lang python --pattern $'class CharacterModeCommand($_):\n $$$'
```
Length of output: 1517

Script:
```shell
#!/bin/bash
# Search for the definition of the getPropertiesSpeech function in the speech module.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n $$$' source/speech
# Search for the definition of the unicodeNormalize function in the textUtils module.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n $$$' source/textUtils
```
Length of output: 239

Script:
```shell
#!/bin/bash
# Locate the textUtils module in the repository.
fd textUtils.py
# Locate the getPropertiesSpeech function in the repository.
ast-grep --lang python --pattern $'def getPropertiesSpeech($_):\n $$$'
```
Length of output: 140

Script:
```shell
#!/bin/bash
# Search for the definition of the unicodeNormalize function in the source/textUtils.py file.
ast-grep --lang python --pattern $'def unicodeNormalize($_):\n $$$' source/textUtils.py
# Locate the getPropertiesSpeech function or method in the repository.
rg 'def getPropertiesSpeech' --glob '*.py'
```
Length of output: 198

Script:
```shell
#!/bin/bash
# Display the implementation of the unicodeNormalize function in source/textUtils.py.
sed -n '/def unicodeNormalize/,/^$/p' source/textUtils.py
# Display the implementation of the getPropertiesSpeech function in source/speech/speech.py.
sed -n '/def getPropertiesSpeech/,/^$/p' source/speech/speech.py
```
Length of output: 47901

Actions performed

Review triggered.
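The essence of the fix in `getPropertiesSpeech`, stripped of NVDA specifics: a speech sequence can mix plain strings with command objects, and `unicodedata.normalize` only accepts `str`, so non-string items must be passed through untouched. `FakeCommand` and `normalizeSpeechSequence` below are hypothetical stand-ins (not NVDA's speech command classes or `unicodeNormalize`), and NFKC is an assumed normalization form:

```python
import unicodedata
from dataclasses import dataclass
from typing import Union


@dataclass
class FakeCommand:
    """Hypothetical stand-in for a speech command such as a pitch change."""
    name: str


SpeechItem = Union[str, FakeCommand]


def normalizeSpeechSequence(sequence: list[SpeechItem], form: str = "NFKC") -> list[SpeechItem]:
    # Normalize only the text items; commands keep their identity and position.
    return [
        unicodedata.normalize(form, item) if isinstance(item, str) else item
        for item in sequence
    ]


sequence: list[SpeechItem] = ["\u0133s", FakeCommand("pitch"), "level 2"]
print(normalizeSpeechSequence(sequence))

# Without the isinstance check, the command object reaches unicodedata.normalize,
# which rejects any argument that is not a str.
try:
    unicodedata.normalize("NFKC", FakeCommand("pitch"))  # type: ignore[arg-type]
except TypeError as error:
    print("TypeError:", error)
```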
Walkthrough
The recent changes introduce Unicode normalization to NVDA's speech and braille output, improving the handling of complex characters such as ligatures and composed characters. The feature can be enabled separately for speech and for braille in the settings dialog, ensuring compatibility with various synthesizers and braille tables.
Actionable comments posted: 12
Out of diff range and nitpick comments (78)
source/speech/speech.py (71)
Line range hint 95-102: Undefined name `pgettext`. Ensure `pgettext` is imported or defined.
```diff
+ from gettext import pgettext

 @property
 def _displayStringLabels(self) -> dict[Self, str]:
 	return {
 		# Translators: Name of the speech mode which disables speech output.
 		self.off: pgettext("speechModes", "off"),
 		# Translators: Name of the speech mode which will cause NVDA to beep instead of speaking.
 		self.beeps: pgettext("speechModes", "beeps"),
 		# Translators: Name of the speech mode which causes NVDA to speak normally.
 		self.talk: pgettext("speechModes", "talk"),
 		# Translators: Name of the on-demand speech mode,
 		# in which NVDA only speaks in response to commands that report content.
 		self.onDemand: pgettext("speechModes", "on-demand"),
 	}
```
Line range hint 204-204: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 345-345: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 385-385: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 923-923: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1013-1013: Multiple statements on one line. Separate them for better readability.
```diff
- if sectionBreakType == "0" : text=_("continuous section break")
+ if sectionBreakType == "0":
+ 	text = _("continuous section break")
```
Line range hint 1018-1018: Multiple statements on one line. Separate them for better readability.
```diff
- elif sectionBreakType == "1" : text=_("new column section break")
+ elif sectionBreakType == "1":
+ 	text = _("new column section break")
```
Line range hint 1084-1084: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1103-1103: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1128-1128: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 1202-1202: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1209-1209: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1212-1212: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1672-1672: Multiple statements on one line. Separate them for better readability.
```diff
- if not curLanguage or (not autoDialectSwitching and curLanguage.split('_')[0]==defaultLanguageRoot): curLanguage=defaultLanguage
+ if not curLanguage or (not autoDialectSwitching and curLanguage.split('_')[0] == defaultLanguageRoot):
+ 	curLanguage = defaultLanguage
```
Line range hint 1707-1707: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1876-1876: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1880-1880: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1894-1894: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1898-1898: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1905-1905: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1929-1929: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1934-1934: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1939-1939: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1951-1951: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1961-1961: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1990-1990: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 1996-1996: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 1999-1999: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 2004-2004: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 2014-2014: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 2050-2050: Test for membership should be `not in`.
```diff
- if not (set(t) <= LINE_END_CHARS)
+ if set(t) not in LINE_END_CHARS
```
Line range hint 2207-2207: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 2349-2349: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2415-2415: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2422-2422: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2436-2436: Comparison to `None` should be `cond is None`.
```diff
- if oldTableInfo == None:
+ if oldTableInfo is None:
```
Line range hint 2441-2441: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2446-2446: Undefined name `ngettext`. Ensure `ngettext` is imported or defined. `+ from gettext import ngettext`
Line range hint 2450-2450: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2456-2456: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2458-2458: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2460-2460: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2462-2462: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2464-2464: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2470-2470: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2488-2488: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2498-2498: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2502-2502: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2512-2512: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2543-2543: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2548-2548: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2554-2554: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2558-2558: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2565-2565: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2566-2566: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2573-2573: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2581-2581: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2583-2583: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2589-2589: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2591-2591: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2598-2598: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2601-2601: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2609-2609: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2611-2611: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2620-2620: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2622-2622: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2630-2630: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2632-2632: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2639-2639: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2641-2641: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
Line range hint 2648-2648: Undefined name `_`. Ensure `_` is imported or defined. `+ from gettext import gettext as _`
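The line range hint 2050 suggestion is worth checking before applying, because the two expressions are not interchangeable: `set(t) <= LINE_END_CHARS` is a subset test, whereas `set(t) not in LINE_END_CHARS` asks whether the set itself is an element of `LINE_END_CHARS`. The sketch below assumes `LINE_END_CHARS` is a frozenset of line-ending characters (its real definition is not shown in this review); `isOnlyLineEnds` and `suggestedRewrite` are hypothetical names:

```python
LINE_END_CHARS = frozenset(("\r", "\n"))  # assumed definition, for illustration only


def isOnlyLineEnds(t: str) -> bool:
    # Subset test: True when every character of t is a line ending.
    return set(t) <= LINE_END_CHARS


def suggestedRewrite(t: str) -> bool:
    # Membership test: Python looks for frozenset(set(t)) as an *element*
    # of LINE_END_CHARS, which never holds for a set of single characters.
    return set(t) not in LINE_END_CHARS


for sample in ("\r\n", "text\n"):
    # Compare the original condition `not (set(t) <= LINE_END_CHARS)` with the rewrite.
    print(repr(sample), not isOnlyLineEnds(sample), suggestedRewrite(sample))
```

For `"\r\n"` the original condition is false but the rewrite is true, so applying the suggestion as written would change behavior.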
source/braille.py (1)
Line range hint 582-582: Add missing import for `_` to handle translations. `+ from gettext import gettext as _`
source/gui/settingsDialogs.py (6)
Line range hint 4640-4640: Remove unnecessary f-string. The f-string on this line does not contain any placeholders and can be simplified to a regular string.
```diff
- f"Current sizerDict: {self.sizerDict!r}"
+ "Current sizerDict: {!r}".format(self.sizerDict)
```
Line range hint 4773-4773: Ensure `pgettext` is defined before usage. The `pgettext` function is used for translations but is not defined or imported in this context. Ensure that `pgettext` is imported from the appropriate module, typically `gettext`. `from gettext import pgettext`
Line range hint 4952-4953: Ensure `_` is defined before usage. The `_` function is used for translations but is not defined or imported in this context. Ensure that `_` is imported from the appropriate module, typically `gettext`. `from gettext import gettext as _`
Line range hint 79-79: Remove unused import. The `keyLabels` import is not used anywhere in the file and should be removed to clean up the code. `- import keyLabels`
Line range hint 108-108: Avoid multiple statements on one line. Multiple statements on one line can reduce code readability. Consider splitting them into separate lines.
```diff
- except RuntimeError: updateCheck = None
+ except RuntimeError:
+ 	updateCheck = None
```
Line range hint 425-425: Ensure `_` is defined before usage. The `_` function is used for translations but is not defined or imported in this context. Ensure that `_` is imported from the appropriate module, typically `gettext`. `from gettext import gettext as _`
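Most of the findings above are "undefined name" reports for `_`, `pgettext` and `ngettext`. A static linter reports these whenever the functions are injected into `builtins` at runtime instead of being imported. That NVDA installs its translation functions this way is an assumption here, not something this review states; the mechanism itself is the standard library's `gettext.install`. A minimal sketch with a hypothetical domain name:

```python
import builtins
import gettext

# No catalog exists for this hypothetical domain, so gettext falls back to
# NullTranslations, whose functions return the original message unchanged.
gettext.install("hypotheticalDomain", names=["pgettext", "ngettext"])


def speechModeLabel() -> str:
    # Neither `pgettext` nor `_` is imported in this module; they are looked
    # up in builtins at call time, which static analysis cannot see.
    return pgettext("speechModes", "off")


print(hasattr(builtins, "_"), speechModeLabel(), ngettext("row", "rows", 2))
```

Under that assumption, adding `from gettext import gettext as _` as suggested would bind the module to the default `gettext` domain instead of the application's installed catalog, so declaring the extra builtins in the linter configuration (Ruff has a `builtins` setting for this) may be the safer fix.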
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- source/braille.py (2 hunks)
- source/config/configSpec.py (2 hunks)
- source/gui/settingsDialogs.py (4 hunks)
- source/speech/speech.py (4 hunks)
- source/textUtils.py (4 hunks)
- tests/unit/test_textUtils.py (3 hunks)
- user_docs/en/changes.md (1 hunks)
- user_docs/en/userGuide.md (2 hunks)
Additional Context Used
Ruff (547)
source/braille.py (114)
8-8: `os` imported but unused
81-81: Undefined name `_`
84-84: Undefined name `_`
87-87: Undefined name `_`
90-90: Undefined name `_`
93-93: Undefined name `_`
96-96: Undefined name `_`
99-99: Undefined name `_`
102-102: Undefined name `_`
105-105: Undefined name `_`
108-108: Undefined name `_`
111-111: Undefined name `_`
114-114: Undefined name `_`
117-117: Undefined name `_`
120-120: Undefined name `_`
123-123: Undefined name `_`
126-126: Undefined name `_`
129-129: Undefined name `_`
132-132: Undefined name `_`
135-135: Undefined name `_`
138-138: Undefined name `_`
141-141: Undefined name `_`
144-144: Undefined name `_`
147-147: Undefined name `_`
150-150: Undefined name `_`
153-153: Undefined name `_`
159-159: Undefined name `_`
162-162: Undefined name `_`
165-165: Undefined name `_`
168-168: Undefined name `_`
171-171: Undefined name `_`
174-174: Undefined name `_`
177-177: Undefined name `_`
180-180: Undefined name `_`
183-183: Undefined name `_`
186-186: Undefined name `_`
189-189: Undefined name `_`
192-192: Undefined name `_`
195-195: Undefined name `_`
198-198: Undefined name `_`
201-201: Undefined name `_`
204-204: Undefined name `_`
207-207: Undefined name `_`
210-210: Undefined name `_`
212-212: Undefined name `_`
214-214: Undefined name `_`
216-216: Undefined name `_`
218-218: Undefined name `_`
220-220: Undefined name `_`
222-222: Undefined name `_`
224-224: Undefined name `_`
226-226: Undefined name `_`
228-228: Undefined name `_`
230-230: Undefined name `_`
232-232: Undefined name `_`
237-237: Undefined name `_`
247-247: Undefined name `_`
249-249: Undefined name `_`
251-251: Undefined name `_`
253-253: Undefined name `_`
255-255: Undefined name `_`
257-257: Undefined name `_`
259-259: Undefined name `_`
261-261: Undefined name `_`
263-263: Undefined name `_`
265-265: Undefined name `_`
267-267: Undefined name `_`
269-269: Undefined name `_`
271-271: Undefined name `_`
273-273: Undefined name `_`
275-275: Undefined name `_`
281-281: Undefined name `_`
292-292: Undefined name `pgettext`
294-294: Undefined name `pgettext`
296-296: Undefined name `pgettext`
298-298: Undefined name `pgettext`
300-300: Undefined name `pgettext`
302-302: Undefined name `pgettext`
304-304: Undefined name `pgettext`
310-310: Undefined name `_`
312-312: Undefined name `_`
314-314: Undefined name `_`
346-346: Undefined name `_`
350-350: Undefined name `_`
354-354: Undefined name `_`
364-364: Undefined name `_`
376-376: Undefined name `_`
380-380: Undefined name `_`
432-432: Do not use bare `except`
582-582: Undefined name `_`
592-592: Undefined name `_`
630-630: Undefined name `_`
636-636: Undefined name `_`
681-681: Undefined name `_`
685-685: Undefined name `_`
691-691: Undefined name `_`
695-695: Undefined name `_`
705-705: Undefined name `_`
709-709: Undefined name `_`
918-918: Undefined name `_`
1069-1069: Undefined name `_`
1082-1082: Undefined name `_`
1085-1085: Undefined name `_`
1088-1088: Undefined name `_`
1096-1096: Undefined name `_`
1133-1133: Do not use bare `except`
2433-2433: Do not use bare `except`
2844-2844: Do not use bare `except`
3194-3194: Undefined name `_`
3207-3207: Undefined name `_`
3217-3217: Undefined name `_`
3369-3369: Undefined name `_`
3407-3407: Undefined name `_`
3414-3414: Undefined name `_`
source/gui/settingsDialogs.py (322)
79-79: `keyLabels` imported but unused
108-108: Multiple statements on one line (colon)
425-425: Undefined name `_`
437-437: Undefined name `_`
494-494: Multiple statements on one line (colon)
545-545: Undefined name `_`
771-771: Undefined name `_`
775-775: Undefined name `_`
777-777: Undefined name `_`
779-779: Undefined name `_`
781-781: Undefined name `_`
783-783: Undefined name `_`
804-804: Undefined name `_`
810-810: Undefined name `_`
825-825: Undefined name `_`
833-833: Undefined name `_`
839-839: Undefined name `_`
847-847: Undefined name `_`
864-864: Undefined name `_`
876-876: Undefined name `_`
886-886: Undefined name `_`
902-902: Undefined name `_`
911-911: Undefined name `_`
920-920: Undefined name `_`
933-933: Undefined name `_`
942-942: Undefined name `_`
949-949: Undefined name `_`
952-952: Undefined name `_`
962-962: Undefined name `_`
964-964: Undefined name `_`
969-969: Do not use bare `except`
977-977: Undefined name `_`
977-977: Undefined name `_`
980-980: Undefined name `_`
980-980: Undefined name `_`
1002-1002: Undefined name `_`
1002-1002: Undefined name `_`
1024-1024: Undefined name `_`
1028-1028: Undefined name `_`
1032-1032: Undefined name `_`
1037-1037: Undefined name `_`
1054-1054: Undefined name `_`
1060-1060: Undefined name `_`
1082-1082: Undefined name `_`
1137-1137: Undefined name `_`
1145-1145: Undefined name `_`
1163-1163: Do not use bare `except`
1518-1518: Undefined name `_`
1544-1544: Undefined name `_`
1558-1558: Undefined name `_`
1569-1569: Undefined name `_`
1585-1585: Undefined name `_`
1593-1593: Undefined name `_`
1603-1603: Undefined name `_`
1630-1630: Undefined name `_`
1644-1644: Undefined name `_`
1655-1655: Undefined name `_`
1669-1669: Undefined name `_`
1683-1683: Undefined name `_`
1697-1697: Undefined name `_`
1740-1740: Undefined name `_`
1747-1747: Undefined name `_`
1762-1762: Undefined name `_`
1765-1765: Undefined name `_`
1774-1774: Undefined name `_`
1781-1781: Undefined name `_`
1790-1790: Do not use bare `except`
1795-1795: Undefined name `_`
1808-1808: Undefined name `_`
1818-1818: Undefined name `_`
1825-1825: Undefined name `_`
1832-1832: Undefined name `_`
1839-1839: Undefined name `_`
1847-1847: Undefined name `_`
1854-1854: Undefined name `_`
1861-1861: Undefined name `_`
1870-1870: Undefined name `_`
1881-1881: Undefined name `_`
1884-1884: Undefined name `_`
1907-1907: Undefined name `_`
1915-1915: Undefined name `_`
1922-1922: Undefined name `_`
1929-1929: Undefined name `_`
1937-1937: Do not use bare `except`
1943-1943: Undefined name `_`
1952-1952: Undefined name `_`
1959-1959: Undefined name `_`
1966-1966: Undefined name `_`
1985-1985: Undefined name `_`
1991-1991: Undefined name `_`
1997-1997: Undefined name `_`
2003-2003: Undefined name `_`
2009-2009: Undefined name `_`
2023-2023: Undefined name `_`
2029-2029: Undefined name `_`
2035-2035: Undefined name `_`
2041-2041: Undefined name `_`
2050-2050: Undefined name `_`
2059-2059: Undefined name `_`
2077-2077: Undefined name `_`
2085-2085: Undefined name `_`
2091-2091: Undefined name `_`
2095-2095: Undefined name `_`
2099-2099: Undefined name `_`
2103-2103: Undefined name `_`
2116-2116: Undefined name `_`
2123-2123: Undefined name `_`
2130-2130: Undefined name `_`
2137-2137: Undefined name `_`
2144-2144: Undefined name `_`
2151-2151: Undefined name `_`
2158-2158: Undefined name `_`
2171-2171: Undefined name `_`
2181-2181: Undefined name `_`
2191-2191: Undefined name `_`
2213-2213: Undefined name `_`
2220-2220: Undefined name `_`
2229-2229: Undefined name `_`
2238-2238: Undefined name `_`
2245-2245: Undefined name `_`
2252-2252: Undefined name `_`
2259-2259: Undefined name `_`
2266-2266: Undefined name `_`
2276-2276: Undefined name `_`
2286-2286: Undefined name `_`
2296-2296: Undefined name `_`
2306-2306: Undefined name `_`
2336-2336: Undefined name `_`
2340-2340: Undefined name `_`
2349-2349: Undefined name `_`
2357-2357: Undefined name `_`
2363-2363: Undefined name `_`
2369-2369: Undefined name `_`
2375-2375: Undefined name `_`
2391-2391: Undefined name `_`
2401-2401: Undefined name `_`
2407-2407: Undefined name `_`
2413-2413: Undefined name `_`
2421-2421: Undefined name `_`
2427-2427: Undefined name `_`
2433-2433: Undefined name `_`
2439-2439: Undefined name `_`
2445-2445: Undefined name `_`
2453-2453: Undefined name `_`
2459-2459: Undefined name `_`
2467-2467: Undefined name `_`
2481-2481: Undefined name `_`
2493-2493: Undefined name `_`
2500-2500: Undefined name `_`
2507-2507: Undefined name `_`
2513-2513: Undefined name `_`
2521-2521: Undefined name `_`
2528-2528: Undefined name `_`
2536-2536: Undefined name `_`
2544-2544: Undefined name `_`
2552-2552: Undefined name `_`
2560-2560: Undefined name `_`
2565-2565: Undefined name `_`
2570-2570: Undefined name `_`
2575-2575: Undefined name `_`
2580-2580: Undefined name `_`
2586-2586: Undefined name `_`
2592-2592: Undefined name `_`
2598-2598: Undefined name `_`
2603-2603: Undefined name `_`
2609-2609: Undefined name `_`
2614-2614: Undefined name `_`
2619-2619: Undefined name `_`
2673-2673: Undefined name `_`
2679-2679: Undefined name `_`
2696-2696: Undefined name `_`
2699-2699: Undefined name `_`
2706-2706: Undefined name `_`
2714-2714: Undefined name `_`
2719-2719: Undefined name `_`
2729-2729: Undefined name `_`
2743-2743: Undefined name `_`
2751-2751: Undefined name `_`
2762-2762: Undefined name `_`
2776-2776: Undefined name `_`
2797-2797: Undefined name `_`
2818-2818: Undefined name `_`
2826-2826: Undefined name `_`
2888-2888: Undefined name `_`
2891-2891: Undefined name `_`
2899-2899: Undefined name `_`
2912-2912: Undefined name `_`
2919-2919: Undefined name `_`
2925-2925: Undefined name `_`
2937-2937: Undefined name `_`
2955-2955: Undefined name `_`
2965-2965: Undefined name `_`
2996-2996: Undefined name `_`
3004-3004: Undefined name `_`
3017-3017: Undefined name `_`
3027-3027: Undefined name `_`
3036-3036: Undefined name `_`
3041-3041: Undefined name `_`
3045-3045: Undefined name `_`
3049-3049: Undefined name `_`
3075-3075: Undefined name `pgettext`
3083-3083: Undefined name `pgettext`
3085-3085: Undefined name `pgettext`
3087-3087: Undefined name `pgettext`
3089-3089: Undefined name `pgettext`
3098-3098: Undefined name `_`
3106-3106: Undefined name `_`
3111-3111: Undefined name `_`
3114-3114: Undefined name `_`
3118-3118: Undefined name `_`
3137-3137: Undefined name `pgettext`
3146-3146: Undefined name `pgettext`
3148-3148: Undefined name `pgettext`
3150-3150: Undefined name `pgettext`
3152-3152: Undefined name `pgettext`
3160-3160: Undefined name `_`
3171-3171: Undefined name `_`
3180-3180: Undefined name `_`
3187-3187: Undefined name `_`
3196-3196: Undefined name `_`
3203-3203: Undefined name `_`
3212-3212: Undefined name `_`
3219-3219: Undefined name `_`
3226-3226: Undefined name `_`
3237-3237: Undefined name `_`
3242-3242: Undefined name `_`
3246-3246: Undefined name `_`
3250-3250: Undefined name `_`
3270-3270: Undefined name `_`
3282-3282: Undefined name `_`
3290-3290: Undefined name `_`
3293-3293: Undefined name `_`
3296-3296: Undefined name `_`
3300-3300: Undefined name `_`
3316-3316: Undefined name `_`
3322-3322: Undefined name `_`
3333-3333: Undefined name `_`
3340-3340: Undefined name `_`
3353-3353: Undefined name `_`
3361-3361: Undefined name `_`
3374-3374: Undefined name `_`
3376-3376: Local variable `audioBox` is assigned to but never used
3381-3381: Undefined name `_`
3392-3392: Undefined name `_`
3416-3416: Undefined name `_`
3434-3434: Undefined name `_`
3437-3437: Undefined name `pgettext`
3439-3439: Undefined name `pgettext`
3448-3448: Undefined name `_`
3468-3468: Undefined name `_`
3583-3583: Undefined name `_`
3588-3588: Undefined name `_`
3590-3590: Undefined name `_`
3617-3617: Undefined name `_`
3630-3630: Undefined name `_`
3670-3670: Undefined name `_`
3677-3677: Undefined name `_`
3692-3692: Undefined name `_`
3746-3746: Undefined name `_`
3755-3755: Undefined name `_`
3761-3761: Undefined name `_`
3770-3770: Undefined name `_`
3804-3804: Do not use bare `except`
3875-3875: Undefined name `_`
3878-3878: Undefined name `_`
3912-3912: Undefined name `_`
3921-3921: Do not use bare `except`
3930-3930: Undefined name `_`
3938-3938: Do not use bare
except
3946-3946: Undefined name
_
3962-3962: Undefined name
_
3970-3970: Undefined name
_
3979-3979: Undefined name
_
3990-3990: Undefined name
_
4010-4010: Undefined name
_
4020-4020: Do not use bare
except
4026-4026: Undefined name
_
4036-4036: Do not use bare
except
4045-4045: Undefined name
_
4063-4063: Undefined name
_
4079-4079: Undefined name
_
4098-4098: Undefined name
_
4116-4116: Undefined name
_
4124-4124: Undefined name
_
4135-4135: Do not use bare
except
4140-4140: Undefined name
_
4155-4155: Undefined name
_
4161-4161: Undefined name
_
4172-4172: Undefined name
_
4246-4246: Undefined name
_
4253-4253: Undefined name
_
4259-4259: Undefined name
_
4276-4276: Undefined name
_
4281-4281: Undefined name
_
4290-4290: Undefined name
_
4380-4380: Undefined name
_
4384-4384: Undefined name
_
4557-4557: Undefined name
_
4564-4564: Undefined name
_
4610-4610: Undefined name
_
4613-4613: Undefined name
_
4640-4640: f-string without any placeholders
4653-4653: Undefined name
_
4690-4690: Undefined name
_
4726-4726: Undefined name
_
4731-4731: Undefined name
_
4759-4759: Undefined name
_
4773-4773: Undefined name
pgettext
4782-4782: Undefined name
_
4792-4792: Undefined name
_
4794-4794: Undefined name
_
4796-4796: Undefined name
_
4799-4799: Undefined name
_
4803-4803: Undefined name
_
4820-4820: Undefined name
_
4829-4829: Undefined name
_
4836-4836: Undefined name
_
4844-4844: Undefined name
_
4847-4847: Undefined name
_
4952-4952: Undefined name
_
4953-4953: Undefined name
_
source/speech/speech.py (111)
31: Redefinition of unused `speechCanceled` from line 30
95, 97, 99, 102: Undefined name `pgettext`
204, 345, 385, 923: Undefined name `_`
1013, 1018: Multiple statements on one line (colon)
1084, 1103: Undefined name `_`
1128: Undefined name `ngettext`
1202, 1209, 1212: Undefined name `_`
1672: Multiple statements on one line (colon)
1707, 1876, 1880, 1894, 1898, 1905, 1929, 1934, 1939, 1951, 1961, 1990: Undefined name `_`
1996, 1999, 2004, 2014: Undefined name `ngettext`
2050: Test for membership should be `not in`
2207: Undefined name `ngettext`
2349, 2415, 2422: Undefined name `_`
2436: Comparison to `None` should be `cond is None`
2441: Undefined name `_`
2446: Undefined name `ngettext`
2450, 2456, 2458, 2460, 2462, 2464, 2470, 2488, 2498, 2502, 2512, 2543, 2548, 2554, 2558, 2565, 2566, 2573, 2581, 2583, 2589, 2591, 2598, 2601, 2609, 2611, 2620, 2622, 2630, 2632, 2639, 2641, 2648, 2650, 2656, 2658, 2666, 2669, 2673, 2679, 2681, 2688, 2690, 2721, 2723, 2727, 2729, 2733, 2735, 2739, 2741, 2757, 2762 (reported three times), 2771, 2774, 2777, 2781, 2789, 2793, 2801, 2804, 2814, 2817, 2848, 2863, 2871, 2875: Undefined name `_`
Path-based Instructions (8)
Each of the following files was reviewed under the same instruction (pattern `**/*`): "Focus on code smells, logic errors, edge cases, and serious issues. Avoid minor issues such as linting, formatting and style issues."
- source/config/configSpec.py (1)
- tests/unit/test_textUtils.py (1)
- source/textUtils.py (1)
- source/speech/speech.py (1)
- source/braille.py (1)
- source/gui/settingsDialogs.py (1)
- user_docs/en/userGuide.md (1)
- user_docs/en/changes.md (1)
Additional comments not posted (21)
source/config/configSpec.py (2)
38-38: Ensure the `unicodeNormalization` feature flag is correctly integrated with the speech processing logic.
Verification successful
The `unicodeNormalization` feature flag is correctly integrated with the speech processing logic in `source/speech/speech.py`. The flag is used to conditionally apply the `unicodeNormalize` function to commands and text lists, ensuring that Unicode normalization is performed when the flag is enabled.
In `source/speech/speech.py`, around `if config.conf["speech"]["unicodeNormalization"]`:
- The flag is checked before applying `unicodeNormalize` to `command`.
- The flag is checked before applying `unicodeNormalize` to elements in `textList`.
This confirms that the feature flag is properly utilized in the speech processing logic.
Scripts Executed
The following scripts were executed for the analysis, each searching the speech processing code for uses of the flag:
- `rg --type python 'config\.conf\["speech"\]\["unicodeNormalization"\]'` (output length: 102)
- `rg --type py 'config\.conf\["speech"\]\["unicodeNormalization"\]'` (output length: 221)
- `rg --type py --context 5 'config\.conf\["speech"\]\["unicodeNormalization"\]' source/speech/speech.py` (displays the surrounding lines; output length: 846)
86-86: Ensure the `unicodeNormalization` feature flag is correctly integrated with the braille processing logic.
Verification successful
The `unicodeNormalization` feature flag is used in the braille processing code as expected.
- Location: `source/braille.py`: `if config.conf["braille"]["unicodeNormalization"] and not isUnicodeNormalized(self.rawText):`
Scripts Executed
The following scripts were executed for the analysis, each searching the braille processing code for uses of the flag:
- `rg --type python 'config\.conf\["braille"\]\["unicodeNormalization"\]'` (output length: 103)
- `rg --type-add 'python:*.py' 'config\.conf\["braille"\]\["unicodeNormalization"\]'` (output length: 196)
tests/unit/test_textUtils.py (4)
10-10: The import statement for `UnicodeNormalizationOffsetConverter` is correct.
235-237: The `TestUnicodeNormalizationOffsetConverter` class is well-defined and appropriately named.
238-255: The `test_normalizedOfssetsSentence` method is comprehensive and covers various normalization scenarios.
256-270: The `test_normalizedOfssetsMixed` and `test_normalizedOfssetsDifferentOrder` methods are well-implemented and cover edge cases effectively.
source/textUtils.py (9)
13-18: New imports for Unicode normalization functionality.
420-420: Setting the default Unicode normalization algorithm to "NFKC" is appropriate.
435-439: The `__init__` method correctly initializes the `UnicodeNormalizationOffsetConverter` class and calculates offsets.
517-520: The `encodedStringLength` property correctly returns the length of the normalized string.
522-539: The `strToEncodedOffsets` method correctly converts string offsets to normalized string offsets using precomputed values.
541-558: The `encodedToStrOffsets` method correctly converts normalized string offsets to original string offsets using precomputed values.
561-563: The `isUnicodeNormalized` function correctly wraps `unicodedata.is_normalized` with a default normalization form.
566-568: The `unicodeNormalize` function correctly wraps `unicodedata.normalize` with a default normalization form.
570-573: The `ENCODINGS_TO_CONVERTERS` dictionary and `getOffsetConverter` function are correctly implemented to handle different encodings.
source/speech/speech.py (1)
1572-1573: Ensure Unicode normalization is applied only when the configuration is enabled.
Please verify that `config.conf["speech"]["unicodeNormalization"]` is correctly set and used.
source/braille.py (4)
500-510: Ensure `converter` is correctly initialized and used for Unicode normalization.
The logic for initializing and using the `UnicodeNormalizationOffsetConverter` looks correct. The normalization is applied conditionally based on the configuration.
514-519: Ensure the `louisHelper.translate` parameters are correctly set.
The parameters for `louisHelper.translate` are correctly set, and the cursor position is handled appropriately.
521-527: Ensure braille and raw text positions are correctly processed using normalization offsets.
The logic for processing and mapping braille and raw text positions using normalization offsets is correctly implemented.
528-529: Ensure the `brailleToRawPos` and `rawToBraillePos` lists are correctly updated.
The lists `brailleToRawPos` and `rawToBraillePos` are correctly updated based on the normalization offsets.
user_docs/en/userGuide.md (1)
2080-2080: Ensure the link fragment is valid and points to the correct section.
Please verify that the link fragment `#SpeechUnicodeNormalization` points to the correct section in the document.
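The review describes `isUnicodeNormalized` and `unicodeNormalize` as thin wrappers around `unicodedata.is_normalized` and `unicodedata.normalize` with "NFKC" as the default form, and describes the speech code normalizing strings only when the `unicodeNormalization` flag is on. A minimal sketch under those assumptions follows; the constant name, the `normalizeSequence` helper and its plain `enabled` argument (standing in for the `config.conf` lookup) are hypothetical, not NVDA's actual code.

```python
import unicodedata

# Hypothetical name; the review only states that the default form is "NFKC".
DEFAULT_NORMALIZATION_FORM = "NFKC"


def isUnicodeNormalized(text: str, normalizationForm: str = DEFAULT_NORMALIZATION_FORM) -> bool:
	"""Return True if normalizing text would leave it unchanged."""
	return unicodedata.is_normalized(normalizationForm, text)


def unicodeNormalize(text: str, normalizationForm: str = DEFAULT_NORMALIZATION_FORM) -> str:
	"""Return text converted to the given normalization form."""
	return unicodedata.normalize(normalizationForm, text)


def normalizeSequence(sequence: list, enabled: bool) -> list:
	"""Hypothetical helper: normalize only the string items of a speech-like sequence.

	Non-string items stand in for speech commands and are passed through untouched.
	"""
	if not enabled:
		return list(sequence)
	return [
		# The cheap is_normalized check skips strings that would not change.
		unicodeNormalize(item) if isinstance(item, str) and not isUnicodeNormalized(item) else item
		for item in sequence
	]
```

For example, with the flag on, the ligature "ĳ" (U+0133) in "ĳs" becomes the two letters "ij", an already-normalized "café" is left alone, and a non-string item standing in for a command passes through unchanged.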
@@ -1806,6 +1806,30 @@ This option applies to all synthesizers, not just the currently active synthesiz
On by default, this option tells NVDA if the current voice's language can be trusted when processing symbols and characters.
If you find that NVDA is reading punctuation in the wrong language for a particular synthesizer or voice, you may wish to turn this off to force NVDA to use its global language setting instead.

##### Unicode normalization {#SpeechUnicodeNormalization}
Add a blank line before the heading for better readability.
Suggested change: insert an empty line directly above `##### Unicode normalization {#SpeechUnicodeNormalization}`.
source/textUtils.py (outdated)

```python
def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
	diff = list(ndiff(self.decoded, self.encoded))
	diff.append("!")  # Closing the diff
	iOrigin = iNormalized = 0
	originBuffer = ""
	normalizedBuffer = ""
	originToNormalizedDict = defaultdict(list)
	normalizedToOriginDict = defaultdict(list)
	originPending = normalizedPending = False
	for char in diff:
		if char[0] == "?":
			raise RuntimeError("Unexpected entry in diff")
		elif char[0] == "-":
			originBuffer += char[2:]
			originPending = True
		elif char[0] == "+":
			normalizedBuffer += char[2:]
			normalizedPending = True
		elif char[0] == " " and (
			(not originPending and normalizedPending) or (originPending and not normalizedPending)
		):
			originBuffer += char[2:]
			normalizedBuffer += char[2:]
		else:
			while originBuffer and normalizedBuffer:
				originPart = ""
				originPartLen = 0
				normalizedPart = ""
				normalizedPartLen = 0
				for i in range(len(originBuffer)):
					originPart = originBuffer[: (i + 1)]
					normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
					if (
						originPart == normalizedPart
						or not normalizedBuffer.startswith(normalizedPart)
					):
						continue
					originPartLen = len(originPart)
					originBuffer = originBuffer[originPartLen:]
					normalizedPartLen = len(normalizedPart)
					normalizedBuffer = normalizedBuffer[normalizedPartLen:]
					break
				originMultiplier = min(originPartLen / normalizedPartLen, 1)
				normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
				for i in range(max(originPartLen, normalizedPartLen)):
					tempOrigin = iOrigin + int(i * originMultiplier)
					tempNormalized = iNormalized + int(i * normalizedMultiplier)
					originC = originPart[i] if i < originPartLen else None
					if originC:
						normalizedIndex = normalizedPart.find(originC)
						if normalizedIndex != -1:
							tempNormalized = iNormalized + normalizedIndex
					normalizedC = normalizedPart[i] if i < normalizedPartLen else None
					if normalizedC:
						originIndex = originPart.find(normalizedC)
						if originIndex != -1:
							tempOrigin = iOrigin + originIndex
					originToNormalizedDict[tempOrigin].append(tempNormalized)
					normalizedToOriginDict[tempNormalized].append(tempOrigin)
				iOrigin += originPartLen
				iNormalized += normalizedPartLen
			originPending = normalizedPending = False
			if char[0] == " ":
				originToNormalizedDict[iOrigin].append(iNormalized)
				normalizedToOriginDict[iNormalized].append(iOrigin)
				iOrigin += 1
				iNormalized += 1
	originResult = tuple(map(min, originToNormalizedDict.values()))
	assert len(originResult) == len(self.decoded)
	normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
	assert len(normalizedResult) == len(self.encoded)
	return tuple((
		originResult,
		normalizedResult
	))
```
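To make the purpose of the two offset tuples concrete, here is a deliberately simplified, hypothetical alternative that normalizes one character at a time. It is not the PR's algorithm: it only works when no normalization spans more than one original character, which is presumably why `_calculateOffsets` diffs whole strings instead.

```python
import unicodedata


def naiveNormalizationOffsets(text: str, form: str = "NFKC") -> tuple[str, tuple[int, ...], tuple[int, ...]]:
	"""Hypothetical simplification: normalize character by character.

	Returns the normalized string, the normalized offset of each original
	character, and the original offset of each normalized character.
	Only valid when no normalization spans more than one original character.
	"""
	normalized = ""
	strToNormalized: list[int] = []
	normalizedToStr: list[int] = []
	for index, char in enumerate(text):
		part = unicodedata.normalize(form, char)
		# The original character maps to where its normalized form starts.
		strToNormalized.append(len(normalized))
		# Every character produced by the expansion points back to its source.
		normalizedToStr.extend([index] * len(part))
		normalized += part
	return normalized, tuple(strToNormalized), tuple(normalizedToStr)
```

For "ﬁx" (U+FB01 followed by "x") this yields "fix", and both normalized characters of the ligature point back to original offset 0. For "e" followed by U+0301 COMBINING ACUTE ACCENT, whole-string NFKC composes a single "é", which a per-character approach cannot produce.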
The `_calculateOffsets` method correctly calculates offsets between the original and normalized strings. Consider adding comments to explain the logic for better readability.
I agree that `_calculateOffsets` needs more commenting. @coderabbitai Could you please generate a suggestion with commenting?
@seanbudd, here's the `_calculateOffsets` method with added comments to enhance readability and understanding:
```python
def _calculateOffsets(self) -> tuple[tuple[int], tuple[int]]:
	# Initialize a diff list between the decoded original and the normalized strings
	diff = list(ndiff(self.decoded, self.encoded))
	diff.append("!")  # Append a closing marker to the diff list
	# Initialize indices and buffers for tracking positions and changes
	iOrigin = iNormalized = 0
	originBuffer = ""
	normalizedBuffer = ""
	originToNormalizedDict = defaultdict(list)
	normalizedToOriginDict = defaultdict(list)
	originPending = normalizedPending = False
	# Iterate over each character in the diff list
	for char in diff:
		if char[0] == "?":
			raise RuntimeError("Unexpected entry in diff")
		elif char[0] == "-":
			# Accumulate deleted characters in the origin buffer
			originBuffer += char[2:]
			originPending = True
		elif char[0] == "+":
			# Accumulate added characters in the normalized buffer
			normalizedBuffer += char[2:]
			normalizedPending = True
		elif char[0] == " " and (
			(not originPending and normalizedPending) or (originPending and not normalizedPending)
		):
			# Accumulate unchanged characters in both buffers
			originBuffer += char[2:]
			normalizedBuffer += char[2:]
		else:
			# Process accumulated characters in the buffers
			while originBuffer and normalizedBuffer:
				originPart = ""
				originPartLen = 0
				normalizedPart = ""
				normalizedPartLen = 0
				# Find the smallest part that can be normalized and still match the beginning of the normalized buffer
				for i in range(len(originBuffer)):
					originPart = originBuffer[: (i + 1)]
					normalizedPart = unicodedata.normalize(self.normalizationForm, originPart)
					if (
						originPart == normalizedPart
						or not normalizedBuffer.startswith(normalizedPart)
					):
						continue
					originPartLen = len(originPart)
					originBuffer = originBuffer[originPartLen:]
					normalizedPartLen = len(normalizedPart)
					normalizedBuffer = normalizedBuffer[normalizedPartLen:]
					break
				# Map the original indices to the normalized indices
				originMultiplier = min(originPartLen / normalizedPartLen, 1)
				normalizedMultiplier = min(normalizedPartLen / originPartLen, 1)
				for i in range(max(originPartLen, normalizedPartLen)):
					tempOrigin = iOrigin + int(i * originMultiplier)
					tempNormalized = iNormalized + int(i * normalizedMultiplier)
					originC = originPart[i] if i < originPartLen else None
					if originC:
						normalizedIndex = normalizedPart.find(originC)
						if normalizedIndex != -1:
							tempNormalized = iNormalized + normalizedIndex
					normalizedC = normalizedPart[i] if i < normalizedPartLen else None
					if normalizedC:
						originIndex = originPart.find(normalizedC)
						if originIndex != -1:
							tempOrigin = iOrigin + originIndex
					originToNormalizedDict[tempOrigin].append(tempNormalized)
					normalizedToOriginDict[tempNormalized].append(tempOrigin)
				iOrigin += originPartLen
				iNormalized += normalizedPartLen
			originPending = normalizedPending = False
			if char[0] == " ":
				# Map indices directly for unchanged characters
				originToNormalizedDict[iOrigin].append(iNormalized)
				normalizedToOriginDict[iNormalized].append(iOrigin)
				iOrigin += 1
				iNormalized += 1
	# Finalize the mapping by selecting the minimum index for each original position
	originResult = tuple(map(min, originToNormalizedDict.values()))
	assert len(originResult) == len(self.decoded)
	normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
	assert len(normalizedResult) == len(self.encoded)
	return tuple((
		originResult,
		normalizedResult
	))
```
This version includes detailed comments that explain each part of the method, making it easier to understand how the offsets are calculated between the original and normalized strings.
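The `-`, `+` and space prefixes that `_calculateOffsets` dispatches on come from `difflib.ndiff`, which, when given two strings, compares them element by element, that is, character by character. A small sketch (the helper name is hypothetical) that exposes those tags for a normalization diff:

```python
import unicodedata
from difflib import ndiff


def describeNormalizationDiff(original: str, form: str = "NFKC") -> list[tuple[str, str]]:
	"""Hypothetical helper: return (tag, character) pairs from a character-level ndiff."""
	normalized = unicodedata.normalize(form, original)
	pairs = []
	for entry in ndiff(original, normalized):
		# Each entry is "<tag> <element>": " " means present in both strings,
		# "-" only in the original, "+" only in the normalized string.
		pairs.append((entry[0], entry[2:]))
	return pairs
```

Joining the space and `-` entries rebuilds the original text, and joining the space and `+` entries rebuilds the normalized text; the method's buffers group runs of these entries so that each changed run can be matched to its normalized counterpart. With single-character elements, `ndiff` should not emit the `?` hint lines it uses for near-matching lines, which fits the method treating `?` as unexpected.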
@LeonarddeR, can you add comments? When that's done and #16521 (comment) is addressed, I think this is ready.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
@coderabbitai I have just reworded some of your comments in a new commit. Do you think it's appropriate now?
@LeonarddeR, I have reviewed the reworded comments in the `_calculateOffsets` method. The comments now provide clear explanations for each part of the method, enhancing readability and understanding. The logic and functionality of the method are well-documented, making it easier for future developers to follow the code.
The comments are appropriate and effectively describe the purpose and process of each section of the method. Great job on the rewording!
If there are any further adjustments or additional comments needed, feel free to let me know.
Analysis chain
Scripts executed
The following scripts were executed for the analysis:
- `cat source/textUtils.py | sed -n '/def _calculateOffsets/,/return tuple((/p'` (fetches the latest version of the `_calculateOffsets` method from `source/textUtils.py`; output length: 4735)
New Learnings Added
User: LeonarddeR
PR: nvaccess/nvda#16521
File: source/textUtils.py:0-0
Timestamp: 2024-05-18T11:21:34.925Z
Learning: LeonarddeR reworded the comments in the `_calculateOffsets` method to enhance readability and understanding. The comments now provide clear explanations for each part of the method.
Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.
It looks mostly like noise to me, honestly.
```python
	assert len(originResult) == len(self.decoded)
	normalizedResult = tuple(map(min, normalizedToOriginDict.values()))
	assert len(normalizedResult) == len(self.encoded)
	return tuple((
```
Isn't this a tuple as is?
Suggested change: replace `return tuple((` with `return (`.
I think you are right. Personally, I prefer the more explicit syntax, because bare parentheses around a comprehension-style expression produce a generator instead of a tuple. But as far as I'm concerned, the final judgment is up to you.
@coderabbitai what do you think about the suggestion made here?
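The point about parentheses can be checked directly. In this minimal sketch the two result tuples are made-up values: a comma inside parentheses already makes a tuple, so returning the bare pair and returning `tuple((...))` give equal values, while parentheses around a `for` clause create a generator.

```python
import types

# Made-up values standing in for originResult and normalizedResult.
originResult = (0, 2)
normalizedResult = (0, 0, 1)

plain = (originResult, normalizedResult)  # parentheses plus a comma: already a tuple
explicit = tuple((originResult, normalizedResult))  # builds an equal tuple from that tuple
lazy = (part for part in (originResult, normalizedResult))  # comprehension syntax: a generator
```

So `return (originResult, normalizedResult)` is equivalent here; the explicit `tuple(...)` call only guards against a future edit turning the contents into a generator expression.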
Looks good, thanks Leonard!
I am getting an empty braille line when reading a given message in Thunderbird. I encountered a similar problem earlier with SquirrelMail (web mail). This happens when braille is tethered to review and Unicode normalization is enabled for braille. Here are the log lines (running from source with the current main branch code): Input: kb(laptop):downArrow
Thanks for reporting. This is easily reproducible in the Python console:
Hi,
We only normalize objects and TextInfo, not messages. This could be changed if we agree that moving by character should report normalized characters instead, because then it would be much easier to apply normalization to every speech sequence, including character-by-character navigation. However, I'm still reluctant to do so.
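To make the trade-off concrete, applying normalization to a whole speech sequence could look like the sketch below. It is hypothetical: PitchCommand is a made-up stand-in for NVDA's speech command objects, and normalize_speech_sequence is not an NVDA function. It normalizes only the string items and passes commands through untouched.

```python
import unicodedata
from dataclasses import dataclass
from typing import Union


@dataclass
class PitchCommand:
    """Hypothetical stand-in for a non-text speech command."""
    offset: int


SpeechItem = Union[str, PitchCommand]


def normalize_speech_sequence(
    sequence: list[SpeechItem], form: str = "NFKC"
) -> list[SpeechItem]:
    """Normalize the text items of a speech sequence; leave commands unchanged."""
    return [
        unicodedata.normalize(form, item) if isinstance(item, str) else item
        for item in sequence
    ]


# U+FB01 is the "fi" ligature; U+1D41B is MATHEMATICAL BOLD SMALL B.
sequence = ["\ufb01le", PitchCommand(offset=20), "\U0001d41bold"]
print(normalize_speech_sequence(sequence))
```

This prints ['file', PitchCommand(offset=20), 'bold']. It also shows the cost of doing this for character navigation: the bold "b" becomes a plain "b", so the two can no longer be told apart.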
@LeonarddeR Can you edit the PR description to add that it closes #4631 and #11570, in case this gets reverted?
These are not yet closed. #16584 will probably close them.
Fixup of #16521
Fixes #11570
Partial fix for #4631
Summary of the issue:
It turns out that rawTextTypeforms on a region may be None; this was an oversight on my end. cursorPos may also be None. @burmancomp reported a zero division error when a string ended with a non-breaking space and a space.
Description of user facing changes
There are no longer errors in the log when getting flash messages in Thunderbird and/or reading messages in WhatsApp UWP.
Description of development approach
Explicitly check for None typeforms and cursorPos, which also improves readability. Improve the calculateOffsets method in textUtils so that it handles the case reported by @burmancomp.
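The explicit None checks described above can be illustrated with a small sketch. This is not NVDA's braille code: remap_typeforms and remap_cursor_pos are hypothetical names, and the offset tuples are assumed to be per-character origin-to-normalized and normalized-to-origin mappings like those produced by a diff-based converter.

```python
from typing import Optional, Sequence


def remap_typeforms(
    typeforms: Optional[Sequence[int]],
    normalized_to_origin: Sequence[int],
) -> Optional[list[int]]:
    """Spread per-character typeforms of the original text over the normalized text."""
    if typeforms is None:
        # The region carries no formatting information; nothing to remap.
        return None
    # Each normalized character inherits the typeform of the character it came from.
    return [typeforms[origin] for origin in normalized_to_origin]


def remap_cursor_pos(
    cursor_pos: Optional[int],
    origin_to_normalized: Sequence[int],
    normalized_length: int,
) -> Optional[int]:
    """Translate a cursor offset in the original text to the normalized text."""
    if cursor_pos is None:
        # No cursor in this region.
        return None
    if cursor_pos >= len(origin_to_normalized):
        # The cursor sits just past the end of the text.
        return normalized_length
    return origin_to_normalized[cursor_pos]
```

For the original text "x\u0133y" normalized to "xijy", the mappings would be (0, 1, 3) and (0, 1, 1, 2), so typeforms [1, 2, 4] become [1, 2, 2, 4] and a cursor at original offset 2 lands at normalized offset 3.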
…ch panel (#16810)
Fix-up of #16521.
Summary of the issue:
When opening the speech settings panel, the checkbox "Report 'Normalized' when navigating by character" is enabled regardless of the value of the "Unicode normalization" combo box.
Description of user facing changes
When opening the speech settings panel, the checkbox "Report 'Normalized' when navigating by character" is enabled or disabled (greyed out) depending on the value of the "Unicode normalization" combo box, as is already the case when the selection of this combo box is changed.
Description of development approach
As is done for the "Ignore blank lines for line indentation reporting" checkbox in Document formatting settings, explicitly call the .Enable method when the panel is initialized.
What form of Unicode normalization is used? Both NFKC and NFKD cause a big issue with vocalized (aka diacritized) Arabic text.
Best
@mush42 In NVDA, we're using the NFKC algorithm.
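Two NFKC behaviours that can affect diacritized Arabic are easy to observe with Python's unicodedata module. We have not confirmed which of them (if either) is the issue meant in the comment; this is only an illustration. First, canonical reordering sorts combining marks by their combining class, so a shadda typed before a fatha is moved after it. Second, compatibility decomposition splits presentation-form ligatures such as lam-alef into separate letters.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH
SHADDA = "\u0651"  # ARABIC SHADDA, canonical combining class 33
FATHA = "\u064e"   # ARABIC FATHA, canonical combining class 30

typed = BEH + SHADDA + FATHA
normalized = unicodedata.normalize("NFKC", typed)

# Canonical ordering puts the mark with the lower combining class first.
print(unicodedata.combining(SHADDA), unicodedata.combining(FATHA))
print([hex(ord(c)) for c in typed], "->", [hex(ord(c)) for c in normalized])

# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM -> lam + alef.
lam_alef = unicodedata.normalize("NFKC", "\ufefb")
print([hex(ord(c)) for c in lam_alef])
```

The text is not lost in either case, but a synthesizer or braille table that expects the original mark order or the presentation form may handle the result differently.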
Link to issue number:
Closes #16466
Summary of the issue:
Several speech synthesizers and braille tables are unable to speak or braille some characters, such as ligatures ("ĳ") or decomposed characters (Latin letters followed by a combining modifier that adds an acute accent, diaeresis, etc.). Also, italic or bold Unicode characters can't be spoken or brailled by default.
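The three kinds of characters listed above can be checked directly against NFKC, the form NVDA uses according to the discussion in this PR. This sketch only uses Python's standard unicodedata module; the sample labels are illustrative.

```python
import unicodedata

# Characters of the kinds listed in the issue summary.
SAMPLES = {
    "ligature ij": "\u0133",          # LATIN SMALL LIGATURE IJ
    "decomposed e-acute": "e\u0301",  # "e" + COMBINING ACUTE ACCENT
    "bold b": "\U0001d41b",           # MATHEMATICAL BOLD SMALL B
    "italic a": "\U0001d44e",         # MATHEMATICAL ITALIC SMALL A
}


def nfkc(text: str) -> str:
    """Apply NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        # ascii() shows escapes, so the output is readable on any console.
        print(f"{label}: {ascii(text)} -> {ascii(nfkc(text))}")
```

The ligature is split into "ij", the decomposed pair is composed into the single character "é", and the bold and italic letters become plain "b" and "a", all of which synthesizers and braille tables generally support.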
Description of user facing changes
None by default. If Unicode normalization is enabled for speech, speech output for objects and text navigation is normalized. For braille, normalization is applied to all braille output.
Speech applies normalization only to objects and text navigation on purpose: for individual character navigation or text selection, we really want to pass the original character to the synthesizer. If we didn't, Unicode bold and italic characters would be read as their normalized counterparts, making it impossible to distinguish them. This problem is less relevant when working with braille.
Description of development approach
Added UnicodeNormalizationOffsetConverter to textUtils, with tests. It stores the normalized version of a given string and, based on diffing, calculates offset mappings between the original and normalized strings.
Used UnicodeNormalizationOffsetConverter when translating braille with normalization on. The several mappings (braille to raw pos, raw to braille pos) are processed to account for normalization.
Speech normalization is applied in getTextInfoSpeech and getPropertiesSpeech.
Testing strategy:
Known issues with pull request:
None known.
Code Review Checklist:
Summary by CodeRabbit
New Features
Documentation