Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate _multibytecodec #103583

Closed
Tracked by #103092
erlend-aasland opened this issue Apr 16, 2023 · 5 comments · Fixed by #103869
Closed
Tracked by #103092

Isolate _multibytecodec #103583

erlend-aasland opened this issue Apr 16, 2023 · 5 comments · Fixed by #103869

Comments

@erlend-aasland erlend-aasland changed the title isolate _multibytecodec Isolate _multibytecodec Apr 16, 2023
@erlend-aasland
Copy link
Contributor Author

erlend-aasland commented Apr 16, 2023

In gh-103540 I've isolated _multibytecodec. However, more steps are needed to isolate the _codec_* (sub-ish) modules. I've identified some issues that IMO should be solved first, preferably in separate PRs:

  • There is no reference dependency between the _multibytecodec and the _codecs_* modules. This is a problem, because the MultibyteCodecObject type (of the _multibytecodec module) stores pointers to codec structs whose memory is owned by the _codecs_* modules. IMO, this needs to be done anyway, so we can start here.
  • There are two types of capsules used (one for maps, one for codecs), but both of them use the same name. IMO, this is both confusing and fragile. I suggest to use distinct names instead.
  • We need a way to store global pointers to codecs and maps; one possibility is to expand the MultibyteCodec struct and pass that to the various mb handlers. Another possibility is to store these in _codecs_* module state, store that state, for example in MultibyteCodecObject, and pass state to the mb handlers. Yet another possibility is to use the config member of MultibyteCodec for this, and change the semantics of that struct field.

Also, some low-hanging fruit that IMO can be done right now:

  • We've got both MultibyteCodec_State and _multibytecodec_state. IMO, we should rename one of those; the current naming is too similar and confusing.
  • MultibyteCodec pointers should be passed as const

@erlend-aasland
Copy link
Contributor Author

cc. @corona10

@erlend-aasland
Copy link
Contributor Author

In gh-103540 I've isolated _multibytecodec.

No, I haven't; I've only put the codec and map arrays into the _codecs* module state.

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Apr 17, 2023
carljm added a commit to carljm/cpython that referenced this issue Apr 17, 2023
* main:
  Remove `expert-*` from `project-updater` GH workflow (python#103579)
  pythongh-103583: Add codecs and maps to _codecs_* module state (python#103540)
  pythongh-48330: address review comments to PR-12271 (python#103209)
  pythongh-103527: Add multibytecodec.h as make dep for _codecs_* (python#103567)
  pythongh-103553: Improve `test_inspect`: add more assertions, remove unused (python#103554)
  pythonGH-103517: Improve tests for `pathlib.Path.walk()` (pythonGH-103518)
  pythongh-102114: Make dis print more concise tracebacks for syntax errors in str inputs (python#102115)
  pythonGH-78079: Fix UNC device path root normalization in pathlib (pythonGH-102003)
  pythongh-101517: Add regression test for a lineno bug in try/except* impacting pdb (python#103547)
  pythongh-103527: Add make deps for _codecs_* and _multibytecodec (python#103528)
  pythongh-103532: Fix reST syntax in NEWS entry (pythonGH-103544)
  pythongh-103532: Add NEWS entry (python#103542)
carljm added a commit to carljm/cpython that referenced this issue Apr 17, 2023
* superopt:
  update generated cases with new comment
  review comments
  Remove `expert-*` from `project-updater` GH workflow (python#103579)
  pythongh-103583: Add codecs and maps to _codecs_* module state (python#103540)
  pythongh-48330: address review comments to PR-12271 (python#103209)
  pythongh-103527: Add multibytecodec.h as make dep for _codecs_* (python#103567)
  pythongh-103553: Improve `test_inspect`: add more assertions, remove unused (python#103554)
  pythonGH-103517: Improve tests for `pathlib.Path.walk()` (pythonGH-103518)
  pythongh-102114: Make dis print more concise tracebacks for syntax errors in str inputs (python#102115)
  pythonGH-78079: Fix UNC device path root normalization in pathlib (pythonGH-102003)
  pythongh-101517: Add regression test for a lineno bug in try/except* impacting pdb (python#103547)
  pythongh-103527: Add make deps for _codecs_* and _multibytecodec (python#103528)
  pythongh-103532: Fix reST syntax in NEWS entry (pythonGH-103544)
  pythongh-103532: Add NEWS entry (python#103542)
carljm added a commit to carljm/cpython that referenced this issue Apr 20, 2023
* main: (24 commits)
  pythongh-98040: Move the Single-Phase Init Tests Out of test_imp (pythongh-102561)
  pythongh-83861: Fix datetime.astimezone() method (pythonGH-101545)
  pythongh-102856: Clean some of the PEP 701 tokenizer implementation (python#103634)
  pythongh-102856: Skip test_mismatched_parens in WASI builds (python#103633)
  pythongh-102856: Initial implementation of PEP 701 (python#102855)
  pythongh-103583: Add ref. dependency between multibytecodec modules (python#103589)
  pythongh-83004: Harden msvcrt further (python#103420)
  pythonGH-88342: clarify that `asyncio.as_completed` accepts generators yielding tasks (python#103626)
  pythongh-102778: IDLE - make sys.last_exc available in Shell after traceback (python#103314)
  pythongh-103582: Remove last references to `argparse.REMAINDER` from docs (python#103586)
  pythongh-103583: Always pass multibyte codec structs as const (python#103588)
  pythongh-103617: Fix compiler warning in _iomodule.c (python#103618)
  pythongh-103596: [Enum] do not shadow mixed-in methods/attributes (pythonGH-103600)
  pythonGH-100530: Change the error message for non-class class patterns (pythonGH-103576)
  pythongh-95299: Remove lingering setuptools reference in installer scripts (pythonGH-103613)
  [Doc] Fix a typo in optparse.rst (python#103504)
  pythongh-101100: Fix broken reference `__format__` in `string.rst` (python#103531)
  pythongh-95299: Stop installing setuptools as a part of ensurepip and venv (python#101039)
  pythonGH-103484: Docs: add linkcheck allowed redirects entries for most cases (python#103569)
  pythongh-67230: update whatsnew note for csv changes (python#103598)
  ...
@erlend-aasland
Copy link
Contributor Author

@corona10, I made three competing PRs for how to implement custom module state for the CJK modules. IMO, alternative 2 is the best approach. Alt 3 is way too complex (and currently, it does not even work). Alt 1 is too hacky.

cc. @kumaraditya303

@corona10
Copy link
Member

@erlend-aasland I will take a look by tomorrow. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants