Use locale.getpreferredencoding rather than locale.getlocale as latter can fail if Python doesn't know the locale #11384

lukaszgo1 · 2020-07-15T20:03:34Z

Link to issue number:

Summary of the issue:

Whenever we needed to access the ANSI Windows code page we were using locale.getlocale. This was problematic for two reasons:

locale.getlocale returns the Python locale, which we change when setting the NVDA language, so the returned code page may not correspond to the user's code page, for example when someone's system is in Polish but NVDA is in English.
More importantly, Python, or rather the win32 function that Python invokes under the hood, cannot determine which code page should be used for some locales, e.g. Aragonese.

Description of how this pull request fixes the issue:

In places where locale.getlocale was formerly used, I've switched to locale.getpreferredencoding, as this function returns the ANSI code page of the current Windows user.

Testing performed:

Switched NVDA language to Aragonese and navigated in Notepad, the Run dialog and cmd. Previously navigation was not possible because AN was not recognized as a valid locale by Python.

Known issues with pull request:

I don't know why @LeonarddeR chose to use locale.getlocale in #9545 (textUtils module to deal with offset differences between Python 3 strings and Windows wide character strings with surrogate characters), so it is possible that this change breaks something I haven't considered.
Regarding #11006 (Accepting bug-fix / maintenance PR's only while addressing backlog): I believe that fixing the inability to navigate in edit fields for some locales, which is also a regression from the Py3 migration, can be classified as an important bug fix.

Change log entry:

Bug fixes

It is once again possible to navigate in various controls when NVDA is set to Aragonese.

michaelDCurran · 2020-07-15T21:59:25Z

I'm not sure that locale.getpreferredencoding actually honours the locale NVDA is set to; rather, it only honours the locale the Windows user account is configured for.
On my system, locale.getlocale returns:
('English_United Kingdom', '1252')
locale.getpreferredencoding returns:
'cp1252'
Which is fine. But:
Then if I call
locale.setlocale(locale.LC_ALL, 'zh')
I get:
locale.getlocale returns:
('zh_CN', 'eucCN')
locale.getpreferredencoding returns:
'cp1252'
This last one looks wrong to me.

Perhaps we can catch the exception from locale.getlocale and fall back to locale.getpreferredencoding?
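
The fallback proposed above could be sketched roughly as follows. This is only an illustration of the suggestion, not code from the pull request: the helper name get_ansi_encoding_with_fallback is hypothetical, and it assumes that an unknown locale surfaces as a ValueError, which is what the standard library's locale module raises when it cannot parse a locale name.

```python
import codecs
import locale


def get_ansi_encoding_with_fallback() -> str:
    """Hypothetical helper: prefer locale.getlocale, fall back if it fails."""
    try:
        # getlocale returns (language code, encoding); either may be None.
        _language, encoding = locale.getlocale()
    except ValueError:
        # The locale module could not make sense of the current locale name.
        encoding = None
    if encoding:
        try:
            # Only accept the value if Python has a codec registered for it.
            codecs.lookup(encoding)
            return encoding
        except LookupError:
            pass
    # Fall back to the encoding preferred by the user's environment.
    return locale.getpreferredencoding()


print(get_ansi_encoding_with_fallback())
```

Note that with this approach the result still depends on the locale NVDA has set whenever locale.getlocale succeeds, so it would not by itself give the user's code page.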

LeonarddeR · 2020-07-16T06:05:21Z

I actually think this pr is right, and @michaelDCurran's concern doesn't apply here.

First of all, the only reason I decided to use locale.getlocale here was that it was used all over the place in the code before #9545 (textUtils module to deal with offset differences between Python 3 strings and Windows wide character strings with surrogate characters), so it was merely copying existing behaviour. The reason this fails in Python 3 versions of NVDA is that this code is now used more widely and we're fetching the locale more often.
locale.getlocale returns locale and codepage of NVDA while locale.getpreferredencoding returns the code page of the system. I believe that we really should use the code page of the system in this code, which can be fetched with locale.getpreferredencoding.
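
The difference between the two calls can be checked with a small experiment along these lines. It is a sketch rather than anything from NVDA's code base: the helper preferred_encoding_under is hypothetical, and it switches only LC_CTYPE to the always-available "C" locale (instead of LC_ALL and 'zh' as in the earlier comment) so that it can run on systems where other locales are not installed.

```python
import locale


def preferred_encoding_under(locale_name: str) -> str:
    """Hypothetical helper: temporarily switch LC_CTYPE, then query the encoding."""
    saved = locale.setlocale(locale.LC_CTYPE)  # passing no value only queries
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.getpreferredencoding()
    finally:
        # Always restore the locale that was active before the experiment.
        locale.setlocale(locale.LC_CTYPE, saved)


baseline = locale.getpreferredencoding()
under_c = preferred_encoding_under("C")
# Both values are expected to match, because getpreferredencoding reports the
# user's preference rather than the locale currently set in the process.
print(baseline, under_c)
```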

LeonarddeR

Note that this PR doesn't fix the issue that locale.getlocale fails when NVDA is set to Aragonese. I think that should still be addressed in a follow-up. Maybe we should file this against Python or something.

LeonarddeR · 2020-07-16T06:09:08Z

source/textUtils.py

@@ -18,6 +17,8 @@
 from logHandler import log

 WCHAR_ENCODING = "utf_16_le"
+USERANSICODEPAGE = locale.getpreferredencoding()


Please follow the style used for constants here:

Suggested change

-USERANSICODEPAGE = locale.getpreferredencoding()
+USER_ANSI_CODE_PAGE = locale.getpreferredencoding()

Or rather

Suggested change

-USERANSICODEPAGE = locale.getpreferredencoding()
+SYSTEM_PREFERRED_ENCODING = locale.getpreferredencoding()

I've renamed it to USER_ANSI_CODE_PAGE because it can be set per Windows user account and is not global to the system.
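
As a rough illustration of how a module-level constant like this might be consumed, here is a minimal sketch. Only the constant name and its initialiser come from the diff; the helpers encode_for_ansi_api and decode_from_ansi_api are hypothetical and do not exist in NVDA.

```python
import locale

# Evaluated once at import time, as in the diff above.
USER_ANSI_CODE_PAGE = locale.getpreferredencoding()


def encode_for_ansi_api(text: str) -> bytes:
    """Hypothetical helper: encode text for an API that expects ANSI bytes."""
    # Characters the code page cannot represent are replaced, not raised on.
    return text.encode(USER_ANSI_CODE_PAGE, errors="replace")


def decode_from_ansi_api(data: bytes) -> str:
    """Hypothetical helper: decode bytes that came back from such an API."""
    return data.decode(USER_ANSI_CODE_PAGE, errors="replace")


round_tripped = decode_from_ansi_api(encode_for_ansi_api("NVDA"))
print(USER_ANSI_CODE_PAGE, round_tripped)
```

Because the constant is computed at import time, it does not change if the process locale is changed later.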

source/textInfos/offsets.py

josephsl · 2020-07-16T06:21:28Z

Note: potentially marking this for further investigation once we move to Python 3.8 and later. Thanks.

lukaszgo1 · 2020-07-16T09:00:26Z

@LeonarddeR wrote:

Note that this PR doesn't fix the issue that locale.getlocale fails when NVDA is set to Aragonese. I think that should still be addressed in a follow-up. Maybe we should file this against Python or something.

I don't think there is much we can do about that: "an" as a locale code is simply not known to Windows, which is why Python fails.

lukaszgo1 · 2020-07-16T09:38:47Z

@LeonarddeR All your comments are now addressed.

michaelDCurran · 2020-07-16T10:28:43Z

If the aim is to get the codepage of the system, then I am happy with this pr.

Use locale.getpreferredencoding rather than locale.getlocale as latte…

ab5f1d6

…r can fail if Python doesn't know the locale Fixes nvaccess#11155

LeonarddeR suggested changes Jul 16, 2020

Merge branch 'master' into I11155

5ccac96

Review actions

078ed89

LeonarddeR approved these changes Jul 16, 2020

michaelDCurran merged commit 41970ca into nvaccess:master Jul 20, 2020

nvaccessAuto added this to the 2020.3 milestone Jul 20, 2020

michaelDCurran added a commit that referenced this pull request Jul 20, 2020

Update what's new for pr #11384

74bd9b8

lukaszgo1 deleted the I11155 branch July 20, 2020 09:00

josephsl mentioned this pull request Sep 8, 2020

Add 2020.3 blurb #11578

Merged
