Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deoptimise pathlib ABCs #113528

Closed
barneygale opened this issue Dec 28, 2023 · 0 comments · Fixed by #113559
Closed

Deoptimise pathlib ABCs #113528

barneygale opened this issue Dec 28, 2023 · 0 comments · Fixed by #113559
Labels
3.13 bugs and security fixes topic-pathlib type-feature A feature request or enhancement

Comments

@barneygale
Copy link
Contributor

barneygale commented Dec 28, 2023

Feature or enhancement

pathlib._abc.PurePathBase and PathBase have a few slots and methods that facilitate fast path object generation:

  • _drv, _root, _tail_cached: roughly, the result of os.path.splitroot() on the path
  • _str: the normalized path string
  • _make_child_relpath(): used when walking directories
  • _from_parsed_parts(): used in parent, parents, with_name(), relative_to()

These carefully-tuned features are useful in pathlib, where speed is important, but much less so in pathlib._abc, where the clarity of the interfaces and interactions should win out.

It should be possible to move them to PurePath, leaving PurePathBase with only _raw_paths and _resolving slots.

This should not affect the behaviour or performance of the public classes in pathlib proper.

Linked PRs

@barneygale barneygale added type-feature A feature request or enhancement topic-pathlib 3.13 bugs and security fixes labels Dec 28, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Dec 28, 2023
Replace use of `_from_parsed_parts()` with `with_segments()` in
`PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root`,
`_tail_cached` and `_str` slots into `PurePath.relative_to()`.

Also shuffle a deprecation warning into `PurePath.relative_to()` to
ensure it's raised at the right stack level, and do the same in
`is_relative_to()` for code consistency.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 28, 2023
Replace use of `_from_parsed_parts()` with `with_segments()`, and move
assignments to `_drv`, `_root`, _tail_cached` and `_str` slots into
`PurePath`.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 28, 2023
Replace usage of `_from_parsed_parts()` with `with_segments()` in
`with_name()`, and take a similar approach in `name` for consistency's
sake.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 28, 2023
…h()`

Call straight through to `joinpath()` in `PathBase._make_child_relpath()`.
Move optimised/caching code to `pathlib.Path._make_child_relpath()`
barneygale added a commit to barneygale/cpython that referenced this issue Dec 28, 2023
Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 28, 2023
Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
barneygale added a commit that referenced this issue Dec 28, 2023
Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 29, 2023
Move `_str` cache slot from `PurePathBase` to `PurePath`. As a result,
pathlib ABCs no longer cache their string representations.
barneygale added a commit to barneygale/cpython that referenced this issue Dec 29, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Jan 4, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Jan 5, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Jan 5, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Jan 5, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Jan 5, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Jan 6, 2024
Split test cases for invalid names into dedicated test methods. This will
make it easier to refactor tests for invalid name handling in ABCs later.

No change of coverage, just a change of test suite organisation.
barneygale added a commit to barneygale/cpython that referenced this issue Jan 6, 2024
The `DummyPurePath` and `DummyPath` test classes are simple subclasses of
`PurePathBase` and `Pathbase`. This commit adds `__repr__()` methods to the
dummy classes, which makes debugging test failures less painful.
kulikjak pushed a commit to kulikjak/cpython that referenced this issue Jan 22, 2024
… (again) (python#113882)

Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in 3375dfe and a15a773.

In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in *walk_up* mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects.

Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure.
kulikjak pushed a commit to kulikjak/cpython that referenced this issue Jan 22, 2024
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.

With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`

Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…on#113534)

Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…ython#113777)

The `DummyPurePath` and `DummyPath` test classes are simple subclasses of
`PurePathBase` and `PathBase`. This commit adds `__repr__()` methods to the
dummy classes, which makes debugging test failures less painful.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…n#113776)

Split test cases for invalid names into dedicated test methods. This will
make it easier to refactor tests for invalid name handling in ABCs later.

No change of coverage, just a change of test suite organisation.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…link loop handling (python#113763)

Slightly improve `pathlib.Path.glob()` tests for symlink loop handling

When filtering results, ignore paths with more than one `linkD/` segment,
rather than all paths below the first `linkD/` segment. This allows us
to test that other paths under `linkD/` are correctly returned.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…113531)

Replace usage of `_from_parsed_parts()` with `with_segments()` in
`with_name()`, and take a similar approach in `name` for consistency's
sake.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…n#113530)

Replace use of `_from_parsed_parts()` with `with_segments()`, and move
assignments to `_drv`, `_root`, _tail_cached` and `_str` slots into
`PurePath`.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…python#113529)

Replace use of `_from_parsed_parts()` with `with_segments()` in
`PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root`
and `_tail_cached` slots into `PurePath.relative_to()`.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
`PurePathBase` does not define `__eq__()`, and so we have no business checking path equality in `test_eq_common` and `test_equivalences`. The tests only pass at the moment because we define the test class's `__eq__()` for use elsewhere.

Also move `test_parse_path_common` into the main pathlib test suite. It exercises a private `_parse_path()` method that will be moved to `PurePath` soon.

Lastly move a couple more tests concerned with optimisations and path normalisation.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
- Add `__slots__` to dummy path classes.
- Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
- Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…h()` (python#113532)

Call straight through to `joinpath()` in `PathBase._make_child_relpath()`.
Move optimised/caching code to `pathlib.Path._make_child_relpath()`
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…#113782)

Replace use of `_from_parsed_parts()` with `with_segments()` in
`resolve()`.

No effect on `Path.resolve()`, which uses `os.path.realpath()`.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…#113883)

Implement `parts` using `_stack`, which itself calls `pathmod.split()`
repeatedly. This avoids use of `_tail`, which will be moved to `PurePath`
shortly.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
… (again) (python#113882)

Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in 3375dfe and a15a773.

In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in *walk_up* mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects.

Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.

With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`

Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…on#113534)

Run expensive tests for walking and globbing from `test_pathlib` but not
`test_pathlib_abc`. The ABCs are not as tightly optimised as the classes
in top-level `pathlib`, and so these tests are taking rather a long time on
some buildbots. Coverage of the main `pathlib` classes should suffice.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…ython#113777)

The `DummyPurePath` and `DummyPath` test classes are simple subclasses of
`PurePathBase` and `PathBase`. This commit adds `__repr__()` methods to the
dummy classes, which makes debugging test failures less painful.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…n#113776)

Split test cases for invalid names into dedicated test methods. This will
make it easier to refactor tests for invalid name handling in ABCs later.

No change of coverage, just a change of test suite organisation.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…link loop handling (python#113763)

Slightly improve `pathlib.Path.glob()` tests for symlink loop handling

When filtering results, ignore paths with more than one `linkD/` segment,
rather than all paths below the first `linkD/` segment. This allows us
to test that other paths under `linkD/` are correctly returned.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…113531)

Replace usage of `_from_parsed_parts()` with `with_segments()` in
`with_name()`, and take a similar approach in `name` for consistency's
sake.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…n#113530)

Replace use of `_from_parsed_parts()` with `with_segments()`, and move
assignments to `_drv`, `_root`, _tail_cached` and `_str` slots into
`PurePath`.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…python#113529)

Replace use of `_from_parsed_parts()` with `with_segments()` in
`PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root`
and `_tail_cached` slots into `PurePath.relative_to()`.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
`PurePathBase` does not define `__eq__()`, and so we have no business checking path equality in `test_eq_common` and `test_equivalences`. The tests only pass at the moment because we define the test class's `__eq__()` for use elsewhere.

Also move `test_parse_path_common` into the main pathlib test suite. It exercises a private `_parse_path()` method that will be moved to `PurePath` soon.

Lastly move a couple more tests concerned with optimisations and path normalisation.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
- Add `__slots__` to dummy path classes.
- Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
- Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…h()` (python#113532)

Call straight through to `joinpath()` in `PathBase._make_child_relpath()`.
Move optimised/caching code to `pathlib.Path._make_child_relpath()`
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…#113782)

Replace use of `_from_parsed_parts()` with `with_segments()` in
`resolve()`.

No effect on `Path.resolve()`, which uses `os.path.realpath()`.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…#113883)

Implement `parts` using `_stack`, which itself calls `pathmod.split()`
repeatedly. This avoids use of `_tail`, which will be moved to `PurePath`
shortly.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
… (again) (python#113882)

Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in 3375dfe and a15a773.

In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in *walk_up* mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects.

Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.

With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own path domain-specific normalization scheme by overriding `__str__()`

Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes topic-pathlib type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants
@barneygale and others