mimalloc: additional integration and changes for --disable-gil builds #112532
Comments
In `--disable-gil` builds, the default allocator is now "mimalloc" and the "malloc" and "pymalloc" allocators are disabled.
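To tell from Python whether an interpreter is one of these builds, a minimal sketch follows. It assumes Python 3.13 or later, where the `Py_GIL_DISABLED` build variable is exposed through `sysconfig`; the helper name `is_free_threaded_build` is made up for illustration. It only reports the build type and does not query the allocator itself.

```python
import sysconfig


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was built with --disable-gil.

    Assumption: the build exposes Py_GIL_DISABLED through sysconfig.
    If the variable is missing (older versions), this returns False.
    """
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


print("free-threaded build:", is_free_threaded_build())
```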
In `--disable-gil` builds, we now use four separate heaps in anticipation of using mimalloc to find GC objects when the GIL is disabled. To support this, we also make a few changes to mimalloc:

* Heap and mi_tld_t initialization is split from allocation. This allows us to have a per-PyThreadState mi_tld_t, which is important to keep interpreter isolation, since the same OS thread may run in multiple interpreters (using different PyThreadStates.)
* The pool of abandoned segments is refactored into its own struct. This allows us to use different pools for different interpreters so that we can preserve interpreter isolation.
* Heap abandoning (mi_heap_collect_ex) can now be called from a different thread than the one that created the heap. This is necessary because we may clear and delete the containing PyThreadStates from a different thread during finalization and after fork().
In `--disable-gil` builds, we now use four separate heaps in anticipation of using mimalloc to find GC objects when the GIL is disabled. To support this, we also make a few changes to mimalloc:

* Heap and mi_tld_t initialization is split from allocation. This allows us to have a per-PyThreadState mi_tld_t, which is important to keep interpreter isolation, since the same OS thread may run in multiple interpreters (using different PyThreadStates.)
* Heap abandoning (mi_heap_collect_ex) can now be called from a different thread than the one that created the heap. This is necessary because we may clear and delete the containing PyThreadStates from a different thread during finalization and after fork().
In `--disable-gil` builds, we now use four separate heaps in anticipation of using mimalloc to find GC objects when the GIL is disabled. To support this, we also make a few changes to mimalloc:

* `mi_heap_t` and `mi_tld_t` initialization is split from allocation. This allows us to have a `mi_tld_t` per-`PyThreadState`, which is important to keep interpreter isolation, since the same OS thread may run in multiple interpreters (using different PyThreadStates.)
* Heap abandoning (mi_heap_collect_ex) can now be called from a different thread than the one that created the heap. This is necessary because we may clear and delete the containing PyThreadStates from a different thread during finalization and after fork().
Mimalloc segments are data structures that contain memory allocations along with metadata. Each segment is "owned" by a thread. When a thread exits, it abandons its segments to a global pool to be later reclaimed by other threads. This changes the pool to be per-interpreter instead of process-wide. This will be important for when we use mimalloc to find GC objects in the `--disable-gil` builds. We want heaps to only store Python objects from a single interpreter. Absent this change, the abandoning and reclaiming process could break this isolation.
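The effect of a per-interpreter pool can be pictured with a toy model. This is only a sketch of the idea described above, not mimalloc's data structures: the `Segment` and `Interpreter` classes and the `abandon` and `reclaim` helpers are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    interp_id: int          # interpreter whose objects live in this segment
    owner: Optional[str]    # owning thread name, or None once abandoned


@dataclass
class Interpreter:
    interp_id: int
    # Each interpreter has its own pool instead of one process-wide pool.
    abandoned: List[Segment] = field(default_factory=list)


def abandon(interp: Interpreter, seg: Segment) -> None:
    """Called when the owning thread exits: park the segment in the pool."""
    seg.owner = None
    interp.abandoned.append(seg)


def reclaim(interp: Interpreter, thread_name: str) -> Optional[Segment]:
    """A thread can only reclaim from the pool of the interpreter it runs in."""
    if not interp.abandoned:
        return None
    seg = interp.abandoned.pop()
    seg.owner = thread_name
    return seg
```

With two interpreters, a segment abandoned in the first is invisible to threads running in the second, so its objects never end up in another interpreter's heap.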
* gh-112532: Use separate mimalloc heaps for GC objects

  In `--disable-gil` builds, we now use four separate heaps in anticipation of using mimalloc to find GC objects when the GIL is disabled. To support this, we also make a few changes to mimalloc:

  * `mi_heap_t` and `mi_tld_t` initialization is split from allocation. This allows us to have a `mi_tld_t` per-`PyThreadState`, which is important to keep interpreter isolation, since the same OS thread may run in multiple interpreters (using different PyThreadStates.)
  * Heap abandoning (mi_heap_collect_ex) can now be called from a different thread than the one that created the heap. This is necessary because we may clear and delete the containing PyThreadStates from a different thread during finalization and after fork().

* Use enum instead of defines and guard mimalloc includes.

  * The enum typedef will be convenient for future PRs that use the type.
  * Guarding the mimalloc includes allows us to unconditionally include pycore_mimalloc.h from other header files that rely on things like `struct _mimalloc_thread_state`.

* Only define _mimalloc_thread_state in Py_GIL_DISABLED builds
* gh-112532: Isolate abandoned segments by interpreter

  Mimalloc segments are data structures that contain memory allocations along with metadata. Each segment is "owned" by a thread. When a thread exits, it abandons its segments to a global pool to be later reclaimed by other threads. This changes the pool to be per-interpreter instead of process-wide.

  This will be important for when we use mimalloc to find GC objects in the `--disable-gil` builds. We want heaps to only store Python objects from a single interpreter. Absent this change, the abandoning and reclaiming process could break this isolation.

* Add missing '&_mi_abandoned_default' to 'tld_empty'
Mimalloc pages are data structures that contain contiguous allocations of the same block size. Note that they are distinct from operating system pages. Mimalloc pages are contained in segments. When a thread exits, it abandons any segments and contained pages that have live allocations. These segments and pages may be later reclaimed by another thread. To support GC and certain thread-safety guarantees in free-threaded builds, we want pages to only be reclaimed by the corresponding heap in the claimant thread. For example, we want pages containing GC objects to only be claimed by GC heaps. This allows heaps and pages to be tagged with an integer tag that is used to ensure that abandoned pages are only claimed by heaps with the same tag. Heaps can be initialized with a tag (0-15); any page allocated by that heap copies the corresponding tag.
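The tag check can be sketched as a small model. This is an illustration of the rule stated above (pages copy their heap's tag, and a heap only claims abandoned pages with a matching tag), not the mimalloc implementation; the class names, `abandon_all`, and `reclaim_matching` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

MAX_TAG = 15  # the text above gives the tag range as 0-15


@dataclass
class Page:
    block_size: int
    tag: int


@dataclass
class Heap:
    tag: int
    pages: List[Page] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.tag <= MAX_TAG:
            raise ValueError(f"tag must be in 0..{MAX_TAG}")

    def new_page(self, block_size: int) -> Page:
        # Any page allocated by this heap copies the heap's tag.
        page = Page(block_size, self.tag)
        self.pages.append(page)
        return page


def abandon_all(heap: Heap, pool: List[Page]) -> None:
    """Thread exit: move every page of the heap into the abandoned pool."""
    pool.extend(heap.pages)
    heap.pages.clear()


def reclaim_matching(heap: Heap, pool: List[Page]) -> int:
    """Claim only the abandoned pages whose tag equals the heap's tag."""
    mine = [p for p in pool if p.tag == heap.tag]
    pool[:] = [p for p in pool if p.tag != heap.tag]
    heap.pages.extend(mine)
    return len(mine)
```

In this model a heap created with a "GC" tag picks up only pages that an earlier GC heap abandoned; pages from differently tagged heaps stay in the pool until a matching heap comes along.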
* gh-112532: Tag mimalloc heaps and pages

  Mimalloc pages are data structures that contain contiguous allocations of the same block size. Note that they are distinct from operating system pages. Mimalloc pages are contained in segments.

  When a thread exits, it abandons any segments and contained pages that have live allocations. These segments and pages may be later reclaimed by another thread. To support GC and certain thread-safety guarantees in free-threaded builds, we want pages to only be reclaimed by the corresponding heap in the claimant thread. For example, we want pages containing GC objects to only be claimed by GC heaps.

  This allows heaps and pages to be tagged with an integer tag that is used to ensure that abandoned pages are only claimed by heaps with the same tag. Heaps can be initialized with a tag (0-15); any page allocated by that heap copies the corresponding tag.

* Fix conversion warning
The AMD64 Ubuntu NoGIL Refleaks buildbot has been failing for two weeks now, and it looks like it was caused by #113263 (separate heaps). I'll try to reproduce and investigate further.
I double-checked the leak on my local macOS machine by bisecting, and acf3bcc is the commit that triggers it.
I minimized a few of the failing test cases. I found that all pass random strings to either …

The 4 minimized cases:

```python
import pathlib
import os
import tempfile
import unittest

class ModifiedZipAppTest(unittest.TestCase):
    def test_create_archive(self):
        for i in range(6):  # need six or more to trigger the leak
            tmpdir_p = tempfile.TemporaryDirectory()
            pathlib.Path(tmpdir_p.name).joinpath('source').mkdir()
            tmpdir_p.cleanup()
```

```python
import secrets
import string
import unittest

class ModifiedDictTest(unittest.TestCase):
    def test_set_constructor(self):
        n = 6  # need 6 or more to trigger the leak
        # Generate a set of `n` unique strings
        items = {(''.join(secrets.token_hex(5))) for i in range(n)}
        dictliteral = str(items)
        # Eval'ing the set's repr leaks
        eval(dictliteral)
```

```python
import secrets
import unittest
from collections import namedtuple

class ModifiedTestNamedTuple(unittest.TestCase):
    def test_odd_sizes(self):
        n = 6  # need 6 or more to trigger leak
        names = ['s' + secrets.token_hex(5) for i in range(n)]
        namedtuple('Big', names)
```

```python
import tempfile
import os
import pathlib
import unittest

class ModifiedTempfileTest(unittest.TestCase):
    def test_choose_directory(self):
        for i in range(6):  # you guessed it, 6 again
            dir = tempfile.mkdtemp()
            tempfile._infer_return_type(pathlib.Path(dir))
            os.rmdir(dir)
```

What do …

Small Python reproducer:

```python
import secrets
import gc
import sys

before = 0
for i in range(30):
    sys.intern(secrets.token_hex(2))
    gc.collect()
    # sys.gettotalrefcount() is only available in debug builds
    now = sys.gettotalrefcount()
    print(f'{now}: {now-before:+}')
    before = now
```

Per PEP 703, interned strings are immortalized. That might ... not be a good idea? It'll definitely need a doc update. IMO, the PEP should be more clear that this is a change -- the other immortal objects it lists are already immortal in the GIL build.

edit: but there might be more to this...
I suspect this has something to do with …
Ok, so fixing … However, the failure was for reference counts, not leaked memory blocks, so it's strange that fixing the memory block count fixes the purported reference leak. I think there's also something buggy in …
@colesbury The leak in test_socket was already happening before #113263 was merged; I checked with commit 8f5b998.
@encukou cc @python/macos-team
I will create a new issue for this.
#114049 is created.
This fixes `_PyInterpreterState_GetAllocatedBlocks()` and `_Py_GetGlobalAllocatedBlocks()` in the free-threaded builds. The gh-113263 change that introduced multiple mimalloc heaps per-thread broke the logic for counting the number of allocated blocks. For subtle reasons, this led to reported reference count leaks in the refleaks buildbots.
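For context, leak checks usually sample a block counter before and after running a test. The sketch below uses `sys.getallocatedblocks()`, the public CPython counter; that it is backed by the functions named above is my assumption and worth verifying against the source. The helper name `allocated_blocks_delta` and the warm-up count are made up for illustration.

```python
import gc
import sys


def allocated_blocks_delta(func, warmup=3):
    """Run func once and return the change in allocated memory blocks.

    A few warm-up calls let caches and interned objects settle first,
    so that one-time allocations are not mistaken for a leak.
    """
    for _ in range(warmup):
        func()
    gc.collect()
    before = sys.getallocatedblocks()
    func()
    gc.collect()
    after = sys.getallocatedblocks()
    return after - before


print("delta:", allocated_blocks_delta(lambda: [object() for _ in range(100)]))
```

If the counter only covers some of a thread's heaps, the delta can drift even when nothing leaks, which is the kind of miscount the fix above addresses.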
This adds support for visiting abandoned pages in mimalloc and improves the performance of the page visiting code. Abandoned pages contain memory blocks from threads that have exited. At some point, they may be later reclaimed by other threads. We still need to visit those pages in the free-threaded GC because they contain live objects.

This also reduces the overhead of visiting mimalloc pages:

* Special cases for full, empty, and pages containing only a single block.
* Fix free_map to use one bit instead of one byte per block.
* Use fast integer division by a constant algorithm when computing block offset from block size and index.
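Two of these optimizations can be illustrated in a few lines. The sketch below is not the code from the change: `FreeMap` and `make_divider` are hypothetical names, and the division uses a generic multiply-and-shift scheme (a precomputed 64-bit magic number, exact for 32-bit operands), which may differ from the algorithm mimalloc actually uses.

```python
class FreeMap:
    """One bit per block; a set bit marks the block as free."""

    def __init__(self, nblocks: int) -> None:
        self.nblocks = nblocks
        # One bit per block costs ceil(nblocks / 8) bytes, not nblocks bytes.
        self.bits = bytearray((nblocks + 7) // 8)

    def mark_free(self, index: int) -> None:
        self.bits[index >> 3] |= 1 << (index & 7)

    def is_free(self, index: int) -> bool:
        return bool(self.bits[index >> 3] & (1 << (index & 7)))


def make_divider(d: int):
    """Return a function computing n // d for 0 <= n < 2**32.

    The divisor is fixed up front (for example a page's block size), so
    each later division becomes one multiplication and one shift.
    """
    if not 1 <= d < 2**32:
        raise ValueError("divisor must fit in 32 bits and be non-zero")
    magic = (2**64 - 1) // d + 1  # ceil(2**64 / d)

    def divide(n: int) -> int:
        return (magic * n) >> 64

    return divide


# Example: map a byte offset inside a page to a block index.
block_index = make_divider(48)
print(block_index(48 * 5 + 7))
```

In C the same trick needs a 64x64 to 128-bit multiply; Python's arbitrary-precision integers hide that detail here.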
* gh-111926: Set up basic sementics of weakref API for freethreading (gh-113621) --------- Co-authored-by: Sam Gross <colesbury@gmail.com> * gh-113603: Compiler no longer tries to maintain the no-empty-block invariant (#113636) * gh-113258: Write frozen modules to the build tree on Windows (GH-113303) This ensures the source directory is not modified at build time, and different builds (e.g. different versions or GIL vs no-GIL) do not have conflicts. * Document the `co_lines` method on code objects (#113682) Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com> * gh-52161: Enhance Cmd support for docstrings (#110987) In `cmd.Cmd.do_help` call `inspect.cleandoc()`, to clean indentation and remove leading/trailing empty lines from a dosctring before printing. * GH-113689: Fix broken handling of invalid executors (GH-113694) * gh-113696: Docs: Annotate PyObject_CallOneArg and PyObject_CallNoArgs as returning a strong reference (#113697) * gh-113569: Display calls in Mock.assert_has_calls failure when empty (GH-113573) * gh-113538: Don't error in stream reader protocol callback when task is cancelled (#113690) * GH-113225: Speed up `pathlib.Path.glob()` (#113226) Use `os.DirEntry.path` as the string representation of child paths, unless the parent path is empty, in which case we use the entry `name`. * gh-112532: Isolate abandoned segments by interpreter (#113717) * gh-112532: Isolate abandoned segments by interpreter Mimalloc segments are data structures that contain memory allocations along with metadata. Each segment is "owned" by a thread. When a thread exits, it abandons its segments to a global pool to be later reclaimed by other threads. This changes the pool to be per-interpreter instead of process-wide. This will be important for when we use mimalloc to find GC objects in the `--disable-gil` builds. We want heaps to only store Python objects from a single interpreter. 
Absent this change, the abandoning and reclaiming process could break this isolation. * Add missing '&_mi_abandoned_default' to 'tld_empty' * gh-113320: Reduce the number of dangerous `getattr()` calls when constructing protocol classes (#113401) - Only attempt to figure out whether protocol members are "method members" or not if the class is marked as a runtime protocol. This information is irrelevant for non-runtime protocols; we can safely skip the risky introspection for them. - Only do the risky getattr() calls in one place (the runtime_checkable class decorator), rather than in three places (_ProtocolMeta.__init__, _ProtocolMeta.__instancecheck__ and _ProtocolMeta.__subclasscheck__). This reduces the number of locations in typing.py where the risky introspection could go wrong. - For runtime protocols, if determining whether a protocol member is callable or not fails, give a better error message. I think it's reasonable for us to reject runtime protocols that have members which raise strange exceptions when you try to access them. PEP-544 clearly states that all protocol member must be callable for issubclass() calls against the protocol to be valid -- and if a member raises when we try to access it, there's no way for us to figure out whether it's a callable member or not! * GH-113486: Do not emit spurious PY_UNWIND events for optimized calls to classes. 
(GH-113680) * gh-113703: Correctly identify incomplete f-strings in the codeop module (#113709) * gh-101100: Fix Sphinx warnings for 2.6 deprecations and removals (#113725) Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com> * gh-80532: Do not set ipv6type when cross-compiling (#17956) Co-authored-by: Xavier de Gaye <xdegaye@gmail.com> * gh-101100: Fix Sphinx warnings in `library/pyclbr.rst` (#113739) Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> * gh-112532: Tag mimalloc heaps and pages (#113742) * gh-112532: Tag mimalloc heaps and pages Mimalloc pages are data structures that contain contiguous allocations of the same block size. Note that they are distinct from operating system pages. Mimalloc pages are contained in segments. When a thread exits, it abandons any segments and contained pages that have live allocations. These segments and pages may be later reclaimed by another thread. To support GC and certain thread-safety guarantees in free-threaded builds, we want pages to only be reclaimed by the corresponding heap in the claimant thread. For example, we want pages containing GC objects to only be claimed by GC heaps. This allows heaps and pages to be tagged with an integer tag that is used to ensure that abandoned pages are only claimed by heaps with the same tag. Heaps can be initialized with a tag (0-15); any page allocated by that heap copies the corresponding tag. * Fix conversion warning * gh-113688: Split up gcmodule.c (gh-113715) This splits part of Modules/gcmodule.c of into Python/gc.c, which now contains the core garbage collection implementation. The Python module remain in the Modules/gcmodule.c file. * GH-113568: Stop raising auditing events from pathlib ABCs (#113571) Raise auditing events in `pathlib.Path.glob()`, `rglob()` and `walk()`, but not in `pathlib._abc.PathBase` methods. 
Also move generation of a deprecation warning into `pathlib.Path` so it gets the right stack level. * gh-85567: Fix resouce warnings in pickle and pickletools CLIs (GH-113618) Explicitly open and close files instead of using FileType. * gh-113360: Fix the documentation of module's attribute __test__ (GH-113393) It can only be a dict since Python 2.4. * GH-113568: Stop raising deprecation warnings from pathlib ABCs (#113757) * gh-113750: Fix object resurrection in free-threaded builds (gh-113751) gh-113750: Fix object resurrection on free-threaded builds This avoids the undesired re-initializing of fields like `ob_gc_bits`, `ob_mutex`, and `ob_tid` when an object is resurrected due to its finalizer being called. This change has no effect on the default (with GIL) build. * gh-113729: Fix IDLE's Help -> "IDLE Help" menu bug in 3.12.1 and 3.11.7 (#113731) Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu> * gh-113537: support loads str in plistlib.loads (#113582) Add support for loading XML plists from a string value instead of a only bytes value. * gh-111488: Changed error message in case of no 'in' keyword after 'for' in cmp (#113656) * gh-107901: synthetic jumps which are not at end of loop no longer check the eval breaker (#113721) * GH-113528: pathlib ABC tests: add repr to dummy path classes. (#113777) The `DummyPurePath` and `DummyPath` test classes are simple subclasses of `PurePathBase` and `PathBase`. This commit adds `__repr__()` methods to the dummy classes, which makes debugging test failures less painful. * GH-113528: Split up pathlib tests for invalid basenames. (#113776) Split test cases for invalid names into dedicated test methods. This will make it easier to refactor tests for invalid name handling in ABCs later. No change of coverage, just a change of test suite organisation. 
* GH-113528: Slightly improve `pathlib.Path.glob()` tests for symlink loop handling (#113763) Slightly improve `pathlib.Path.glob()` tests for symlink loop handling When filtering results, ignore paths with more than one `linkD/` segment, rather than all paths below the first `linkD/` segment. This allows us to test that other paths under `linkD/` are correctly returned. * GH-113528: Deoptimise `pathlib._abc.PurePathBase.name` (#113531) Replace usage of `_from_parsed_parts()` with `with_segments()` in `with_name()`, and take a similar approach in `name` for consistency's sake. * GH-113528: Deoptimise `pathlib._abc.PurePathBase.parent` (#113530) Replace use of `_from_parsed_parts()` with `with_segments()`, and move assignments to `_drv`, `_root`, _tail_cached` and `_str` slots into `PurePath`. * GH-113528: Deoptimise `pathlib._abc.PurePathBase.relative_to()` (#113529) Replace use of `_from_parsed_parts()` with `with_segments()` in `PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root` and `_tail_cached` slots into `PurePath.relative_to()`. * gh-89532: Remove LibreSSL workarounds (#28728) Remove LibreSSL specific workaround ifdefs from `_ssl.c` and delete the non-version-specific `_ssl_data.h` file (relevant for OpenSSL < 1.1.1, which we no longer support per PEP 644). Co-authored-by: Christian Heimes <christian@python.org> Co-authored-by: Gregory P. Smith <greg@krypto.org> * gh-112795: Allow `/` folder in a zipfile (#112932) Allow extraction (no-op) of a "/" folder in a zipfile, they are commonly added by some archive creation tools. Co-authored-by: Erlend E. Aasland <erlend@python.org> Co-authored-by: Gregory P. 
Smith <greg@krypto.org> * gh-73965: New environment variable PYTHON_HISTORY (#13208) It can be used to set the location of a .python_history file --------- Co-authored-by: Levi Sabah <0xl3vi@gmail.com> Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com> * gh-73965: Move PYTHON_HISTORY into the correct usage section (#113798) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * gh-80109: Fix io.TextIOWrapper dropping the internal buffer during write() (GH-22535) io.TextIOWrapper was dropping the internal decoding buffer during read() and write() calls. * gh-74678: Increase base64 test coverage (GH-21913) Ensure the character y is disallowed within an Ascii85 5-tuple. Co-authored-by: Lee Cannon <leecannon@leecannon.xyz> * gh-110721: Remove unused code from suggestions.c after moving PyErr_Display to use the traceback module (#113712) * gh-113391: fix outdated PyObject_HasAttr docs (#113420) After #53875: PyObject_HasAttr is not an equivalent of hasattr. PyObject_HasAttrWithError is; it already has the note. * gh-113787: Fix refleaks in test_capi (gh-113816) Fix refleaks and a typo. * gh-113755: Fully adapt gcmodule.c to Argument Clinic (#113756) Adapt the following functions to Argument Clinic: - gc.set_threshold - gc.get_referrers - gc.get_referents * Minor algebraic simplification for the totient() recipe (gh-113822) * GH-113528: Move a few misplaced pathlib tests (#113527) `PurePathBase` does not define `__eq__()`, and so we have no business checking path equality in `test_eq_common` and `test_equivalences`. The tests only pass at the moment because we define the test class's `__eq__()` for use elsewhere. Also move `test_parse_path_common` into the main pathlib test suite. It exercises a private `_parse_path()` method that will be moved to `PurePath` soon. Lastly move a couple more tests concerned with optimisations and path normalisation. 
* gh-113688: fix dtrace build on Solaris (#113814) (the gcmodule -> gc refactoring broke it) * GH-113528: Speed up pathlib ABC tests. (#113788) - Add `__slots__` to dummy path classes. - Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`. - Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`. * gh-113791: Expose CLOCK_MONOTONIC_RAW_APPROX and CLOCK_UPTIME_RAW_APROX on macOS in the time module (#113792) * GH-111693: Propagate correct asyncio.CancelledError instance out of asyncio.Condition.wait() (#111694) Also fix a race condition in `asyncio.Semaphore.acquire()` when cancelled. * gh-113827: Move Windows frozen modules directory to allow PGO builds (GH-113828) * gh-113027: Fix test_variable_tzname in test_email (#113821) Determine the support of the Kyiv timezone by checking the result of astimezone() which uses the system tz database and not the one populated by zoneinfo. * readme: fix displaying issue of command (#113719) Avoid line break in command as this causes displaying issues on GH. * gh-112806: Remove unused function warnings during mimalloc build on Solaris (#112807) * gh-112808: Fix mimalloc build on Solaris (#112809) * gh-112087: Update list.{pop,clear,reverse,remove} to use CS (gh-113764) * Docs: Link tokens in the format string grammars (#108184) Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com> Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com> * gh-113692: skip a test if multiprocessing isn't available. (GH-113704) * gh-101100: Fix Sphinx warnings for 2.6 port-specific deprecations (#113752) * gh-113842: Add missing error check for PyIter_Next() in Python/symtable.c (GH-113843) * gh-87868: Sort and remove duplicates in getenvironment() (GH-102731) Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Erlend E. 
Aasland <erlend.aasland@protonmail.com> * gh-103092: Test _ctypes type hierarchy and features (#113727) Test the following features for _ctypes types: - disallow instantiation - inheritance (MRO) - immutability - type name The following _ctypes types are tested: - Array - CField - COMError - PyCArrayType - PyCFuncPtrType - PyCPointerType - PyCSimpleType - PyCStructType - Structure - Union - UnionType - _CFuncPtr - _Pointer - _SimpleCData Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> * gh-113650: Add workaround option for MSVC ARM64 bug affecting string encoding (GH-113836) * Fix opcode name printing in debug mode (#113870) Fix a few places where the lltrace debug output printed ``(null)`` instead of an opcode name, because it was calling ``_PyUOpName()`` on a Tier-1 opcode. * Simplify binomial approximation example with random.binomialvariate() (gh-113871) * GH-113528: Deoptimise `pathlib._abc.PathBase._make_child_relpath()` (#113532) Call straight through to `joinpath()` in `PathBase._make_child_relpath()`. Move optimised/caching code to `pathlib.Path._make_child_relpath()` * gh-113848: Use PyErr_GivenExceptionMatches() for check for CancelledError (GH-113849) * gh-113848: Handle CancelledError subclasses in asyncio TaskGroup() and timeout() (GH-113850) * gh-113781: Silence AttributeError in warning module during Python finalization (GH-113813) The tracemalloc module can already be cleared. * GH-113661: unittest runner: Don't exit 5 if tests were skipped (#113856) The intention of exiting 5 was to detect issues where the test suite wasn't discovered at all. If we skipped tests, it was correctly discovered. * GH-113528: Deoptimise `pathlib._abc.PathBase.resolve()` (#113782) Replace use of `_from_parsed_parts()` with `with_segments()` in `resolve()`. No effect on `Path.resolve()`, which uses `os.path.realpath()`. 
* gh-66060: Use actual class name in _io type's __repr__ (#30824) Use the object's actual class name in the following _io type's __repr__: - FileIO - TextIOWrapper - _WindowsConsoleIO * GH-113528: Deoptimise `pathlib._abc.PurePathBase.parts` (#113883) Implement `parts` using `_stack`, which itself calls `pathmod.split()` repeatedly. This avoids use of `_tail`, which will be moved to `PurePath` shortly. * GH-113528: Deoptimise `pathlib._abc.PurePathBase.relative_to()` (again) (#113882) Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in 3375dfe and a15a773. In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in *walk_up* mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects. Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure. * gh-111968: Introduce _PyFreeListState and _PyFreeListState_GET API (gh-113584) * GH-113528: Deoptimise `pathlib._abc.PurePathBase` (#113559) Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`. With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. 
Implementors can set their own path domain-specific normalization scheme by overriding `__str__()` Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code. * pathlib ABCs: Require one or more initialiser arguments (#113885) Refuse to guess what a user means when they initialise a pathlib ABC without any positional arguments. In mainline pathlib it's normalised to `.`, but in the ABCs this guess isn't appropriate; for example, the path type may not represent the current directory as `.`, or may have no concept of a "current directory" at all. * gh-112182: Replace StopIteration with RuntimeError for future (#113220) When an `StopIteration` raises into `asyncio.Future`, this will cause a thread to hang. This commit address this by not raising an exception and silently transforming the `StopIteration` with a `RuntimeError`, which the caller can reconstruct from `fut.exception().__cause__` * GH-113858: GitHub Actions config: Only save ccache on pushes (GH-113859) * gh-113877: Fix Tkinter method winfo_pathname() on 64-bit Windows (GH-113900) winfo_id() converts the result of "winfo id" command to integer, but "winfo pathname" command requires an argument to be a hexadecimal number on Win64. * gh-113879: Fix ResourceWarning in test_asyncio.test_server (GH-113881) * gh-96037: Always insert TimeoutError when exit an expired asyncio.timeout() block (GH-113819) If other exception was raised during exiting an expired asyncio.timeout() block, insert TimeoutError in the exception context just above the CancelledError. * gh-70835: Clarify error message for CSV file opened with wrong newline (GH-113786) Based on patch by SilentGhost. 
* gh-113594: Fix UnicodeEncodeError in TokenList.fold() (GH-113730) It occurred when try to re-encode an unknown-8bit part combined with non-unknown-8bit part. * gh-113664: Improve style of Big O notation (GH-113695) Use cursive to make it looking like mathematic formulas. * gh-58032: Do not use argparse.FileType in module CLIs and scripts (GH-113649) Open and close files manually. It prevents from leaking files, preliminary creation of output files, and accidental closing of stdin and stdout. * gh-89850: Add default C implementations of persistent_id() and persistent_load() (GH-113579) Previously the C implementation of pickle.Pickler and pickle.Unpickler classes did not have such methods and they could only be used if they were overloaded in subclasses or set as instance attributes. Fixed calling super().persistent_id() and super().persistent_load() in subclasses of the C implementation of pickle.Pickler and pickle.Unpickler classes. It no longer causes an infinite recursion. * gh-66515: Fix locking of an MH mailbox without ".mh_sequences" file (GH-113482) Guarantee that it either open an existing ".mh_sequences" file or create a new ".mh_sequences" file, but do not replace existing ".mh_sequences" file. * gh-111789: Use PyDict_GetItemRef() in Modules/_zoneinfo.c (GH-112078) * gh-109858: Protect zipfile from "quoted-overlap" zipbomb (GH-110016) Raise BadZipFile when try to read an entry that overlaps with other entry or central directory. * gh-111139: Optimize math.gcd(int, int) (#113887) Add a fast-path for the common case. Benchmark: python -m pyperf timeit \ -s 'import math; gcd=math.gcd; x=2*3; y=3*5' \ 'gcd(x,y)' Result: 1.07x faster (-3.4 ns) Mean +- std dev: 52.6 ns +- 4.0 ns -> 49.2 ns +- 0.4 ns: 1.07x faster * GH-113860: All executors are now defined in terms of micro ops. Convert counter executor to use uops. 
(GH-113864) * gh-111968: Use per-thread freelists for float in free-threading (gh-113886) * Add @requires_zlib() decorator for gh-109858 tests (GH-113918) * gh-113625: Align object addresses in the Descriptor HowTo Guide (#113894) * gh-113753: Clear finalized bit when putting PyAsyncGenASend back into free list (#113754) * gh-112302: Point core developers to SBOM devguide on errors (#113490) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> * gh-77046: os.pipe() sets _O_NOINHERIT flag on fds (#113817) On Windows, set _O_NOINHERIT flag on file descriptors created by os.pipe() and io.WindowsConsoleIO. Add test_pipe_spawnl() to test_os. Co-authored-by: Zackery Spytz <zspytz@gmail.com> * gh-87868: Skip `test_one_environment_variable` in `test_subprocess` when the platform or build cannot do that (#113867) * improve the assert for test_one_environment_variable * skip some test in test_subprocess when python is configured with shared * also skip the test if AddressSanitizer is enabled --------- Co-authored-by: Steve Dower <steve.dower@microsoft.com> * gh-113896: Fix test_builtin.BuiltinTest.test___ne__() (#113897) Fix DeprecationWarning in test___ne__(). Co-authored-by: Nikita Sobolev <mail@sobolevn.me> * gh-111968: Unify naming scheme for freelist (gh-113919) * gh-89811: Check for valid tp_version_tag in specializer (GH-113558) * gh-112640: Add `kwdefaults` parameter to `types.FunctionType.__new__` (#112641) * gh-112419: Document removal of sys.meta_path's 'find_module' fallback (#112421) Co-authored-by: Erlend E. Aasland <erlend@python.org> * gh-113932: assert ``SyntaxWarning`` in test_compile.TestSpecifics.test_… (#113933) * gh-91960: Remove Cirrus CI configuration (#113938) Remove .cirrus.yml which was already disabled by being renamed to .cirrus-DISABLED.yml. In total, Cirrus CI only run for less than one month. 
* gh-107901: jump leaving an exception handler doesn't need an eval break check (#113943) * GH-113853: Guarantee forward progress in executors (GH-113854) * gh-113845: Fix a compiler warning in Python/suggestions.c (GH-113949) * gh-111968: Use per-thread freelists for tuple in free-threading (gh-113921) * Update KDE recipe to match the standard use of the h parameter (gh-#113958) * gh-81489: Use Unicode APIs for mmap tagname on Windows (GH-14133) Co-authored-by: Erlend E. Aasland <erlend@python.org> * GH-107678: Improve Unicode handling clarity in ``library/re.rst`` (#107679) * Improve kde graph with better caption and number formatting (gh-113967) * gh-111968: Explicit handling for finalized freelist (gh-113929) * gh-113903: Fix an IDLE configdialog test (#113973) test_configdialog.HighPageTest.test_highlight_target_text_mouse fails if a line of the Highlight tab text sample is not visible. If so, bbox() in click_char() returns None and the unpacking iteration fails. This occurred on a Devuan Linux system. Fix by moving the 'see character' call inside click_char, just before the bbox call. Also, reduce the click_char calls to just one per tag name and replace the other nested function with a dict comprehension. * gh-113937 Fix failures in type cache tests due to re-running (GH-113953) * gh-113858: Cut down ccache size (GH-113945) Cut down ccache size - Only save the ccache in the main reusable builds, not on builds that don't use special build options: - Generated files check - OpenSSL tests - Hypothesis tests - Halve the max cache size, to 200M * gh-108364: In sqlite3, disable foreign keys before dumping SQL schema (#113957) sqlite3.Connection.iterdump now ensures that foreign key support is disabled before dumping the database schema, if there is any foreign key violation. Co-authored-by: Erlend E. 
Aasland <erlend@python.org> * gh-113027: Fix timezone check in test_variable_tzname in test_email (GH-113835) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * GH-113860: Get rid of `_PyUOpExecutorObject` (GH-113954) * Docs: Amend codeobject.co_lines docs; end number is exclusive (#113970) The end number should be exclusive, not inclusive. * gh-111877: Fixes stat() handling for inaccessible files on Windows (GH-113716) * gh-113980: Fix resource warnings in test_asyncgen (GH-113984) * gh-107901: duplicate blocks with no lineno that have an eval break and multiple predecessors (#113950) * gh-113868: Add a number of MAP_* flags from macOS to module mmap (#113869) The new flags were extracted from the macOS 14.2 SDK. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * gh-113710: Add types to the interpreter DSL (#113711) Co-authored-by: Jules <57632293+JuliaPoo@users.noreply.github.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> * gh-113971: Make `zipfile.ZipInfo._compresslevel` public as `.compress_level` (#113969) Make zipfile.ZipInfo.compress_level public. A property is used to retain the behavior of the ._compresslevel. People constructing zipfile.ZipInfo instances to pass into existing APIs to control per-file compression levels already treat this as public, there was never a reason for it not to be. I used the more modern name compress_level instead of compresslevel as the keyword argument on other ZipFile APIs is called to be consistent with compress_type and a general long term preference of not runningwordstogether without a separator in names. * GH-111802: set a low recursion limit for `test_bad_getattr()` in `test.pickletester` (GH-113996) * gh-95649: Document that asyncio contains uvloop code (#107536) Some of the asyncio SSL changes in GH-31275 [1] were taken from v0.16.0 of the uvloop project [2]. In order to comply with the MIT license, we need to just need to document the copyright information. 
[1]: https://github.com/python/cpython/pull/31275 [2]: https://github.com/MagicStack/uvloop/tree/v0.16.0 * gh-101100: Fix Sphinx Lint warnings in `Misc/` (#113946) Fix Sphinx Lint warnings in Misc/ * Fix a grammatical error in `pycore_pymem.h` (#112993) * Tutorial: Clarify 'nonzero exit status' in the appendix (#112039) * Link to the glossary for "magic methods" in ``MagicMock`` (#111292) The MagicMock documentation mentions magic methods several times without actually pointing to the term in the glossary. This can be helpful for people to fully understand what those magic methods are. * datamodel: Fix a typo in ``object.__init_subclass__`` (#111599) * GH-111801: set a lower recursion limit for `test_infintely_many_bases()` in `test_isinstance` (#113997) * gh-89159: Document missing TarInfo members (#91564) * GH-111798: skip `test_super_deep()` from `test_call` under pydebug builds on WASI (GH-114010) * GH-44626, GH-105476: Fix `ntpath.isabs()` handling of part-absolute paths (#113829) On Windows, `os.path.isabs()` now returns `False` when given a path that starts with exactly one (back)slash. This is more compatible with other functions in `os.path`, and with Microsoft's own documentation. Also adjust `pathlib.PureWindowsPath.is_absolute()` to call `ntpath.isabs()`, which corrects its handling of partial UNC/device paths like `//foo`. Co-authored-by: Jon Foster <jon@jon-foster.co.uk> * pathlib ABCs: add `_raw_path` property (#113976) It's wrong for the `PurePathBase` methods to rely so much on `__str__()`. Instead, they should treat the raw path(s) as opaque objects and leave the details to `pathmod`. This commit adds a `PurePathBase._raw_path` property and uses it through many of the other ABC methods. These methods are all redefined in `PurePath` and `Path`, so this has no effect on the public classes. * Add module docstring for `pathlib._abc`. 
(#113691) * gh-101225: Increase the socket backlog when creating a multiprocessing.connection.Listener (#113567) Increase the backlog for multiprocessing.connection.Listener` objects created by `multiprocessing.manager` and `multiprocessing.resource_sharer` to significantly reduce the risk of getting a connection refused error when creating a `multiprocessing.connection.Connection` to them. * gh-114014: Update `fractions.Fraction()`'s rational parsing regex (#114015) Fix a bug in the regex used for parsing a string input to the `fractions.Fraction` constructor. That bug led to an inconsistent exception message being given for some inputs. --------- Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Mark Dickinson <dickinsm@gmail.com> * gh-111803: Support loading more deeply nested lists in binary plist format (GH-114024) It no longer uses the C stack. The depth of nesting is only limited by Python recursion limit setting. * gh-113317: Move global utility functions into libclinic (#113986) Establish Tools/clinic/libclinic/utils.py and move the following functions over there: - compute_checksum() - create_regex() - write_file() * gh-101100: Fix Sphinx warnings in `howto/urllib2.rst` and `library/http.client.rst` (#114060) * Add `pathlib._abc.PathModuleBase` (#113893) Path modules provide a subset of the `os.path` API, specifically those functions needed to provide `PurePathBase` functionality. Each `PurePathBase` subclass references its path module via a `pathmod` class attribute. This commit adds a new `PathModuleBase` class, which provides abstract methods that unconditionally raise `UnsupportedOperation`. An instance of this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`. As a result, `PurePathBase` is no longer POSIX-y by default, and all its methods raise `UnsupportedOperation` courtesy of `pathmod`. 
Users who subclass `PurePathBase` or `PathBase` should choose the path syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their own subclass of `PathModuleBase`, as circumstances demand. * Replace `pathlib._abc.PathModuleBase.splitroot()` with `splitdrive()` (#114065) This allows users of the `pathlib-abc` PyPI package to use `posixpath` or `ntpath` as a path module in versions of Python lacking `os.path.splitroot()` (3.11 and before). * gh-113317: Move FormatCounterFormatter into libclinic (#114066) * gh-109862: Fix test_create_subprocess_with_pidfd when it was run separately (GH-113991) * gh-114075: Capture `test_compileall` stdout output (#114076) Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> * gh-113666: Adding missing UF_ and SF_ flags to module 'stat' (#113667) Add some constants to module 'stat' that are used on macOS. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * GH-112354: `_GUARD_IS_TRUE_POP` side-exits to target the next instruction, not themselves. (GH-114078) * gh-109598: make PyComplex_RealAsDouble/ImagAsDouble use __complex__ (GH-109647) `PyComplex_RealAsDouble()`/`PyComplex_ImagAsDouble` now try to convert an object to a `complex` instance using its `__complex__()` method before falling back to the ``__float__()`` method. PyComplex_ImagAsDouble() also will not silently return 0.0 for non-complex types anymore. Instead we try to call PyFloat_AsDouble() and return 0.0 only if this call is successful. * gh-112532: Fix memory block count for free-threaded build (gh-113995) This fixes `_PyInterpreterState_GetAllocatedBlocks()` and `_Py_GetGlobalAllocatedBlocks()` in the free-threaded builds. The gh-113263 change that introduced multiple mimalloc heaps per-thread broke the logic for counting the number of allocated blocks. For subtle reasons, this led to reported reference count leaks in the refleaks buildbots. 
* gh-111968: Use per-thread slice_cache in free-threading (gh-113972) * gh-99437: runpy: decode path-like objects before setting globals * gh-114070: correct the specification of ``digit`` in the float() docs (#114080) * gh-91539: Small performance improvement of urrlib.request.getproxies_environment() (#108771) Small performance improvement of getproxies_environment() when there are many environment variables. In a benchmark with 5k environment variables not related to proxies, and 5 specifying proxies, we get a 10% walltime improvement. * gh-112087: Update list impl to be thread-safe with manual CS (gh-113863) * gh-78502: Add a trackfd parameter to mmap.mmap() (GH-25425) If *trackfd* is False, the file descriptor specified by *fileno* will not be duplicated. Co-authored-by: Erlend E. Aasland <erlend@python.org> Co-authored-by: Petr Viktorin <encukou@gmail.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * gh-114101: Correct PyErr_Format arguments in _testcapi module (#114102) - use PyErr_SetString() iso. 
PyErr_Format() in parse_tuple_and_keywords() - fix misspelled format specifier in CHECK_SIGNNESS() macro * GH-113655: Lower the C recursion limit on various platforms (GH-113944) * gh-113358: Fix rendering tracebacks with exceptions with a broken __getattr__ (GH-113359) Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> * gh-113238: add Anchor to importlib.resources (#113801) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> * gh-114077: Fix OverflowError in socket.sendfile() when pass count >2GiB (GH-114079) * Docs: Align multiprocessing.shared_memory docs with Sphinx recommendations (#114103) - add :class: and :mod: markups where needed - fix incorrect escaping of a star in ShareableList arg spec - mark up parameters with stars: *val* - mark up list of built-in types using list markup - remove unneeded parentheses from :meth: markups * gh-113858: GH Actions: Limit max ccache size for the asan build (GH-114113) * gh-114107: Fix importlib.resources symlink test if symlinks aren't supported (#114108) gh-114107: Fix symlink test if symlinks aren't supported * gh-102468: Document `PyCFunction_New*` and `PyCMethod_New` (GH-112557) Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> * gh-113626: Add allow_code parameter in marshal functions (GH-113648) Passing allow_code=False prevents serialization and de-serialization of code objects which is incompatible between Python versions. * gh-111968: Use per-thread freelists for PyContext in free-threading (gh-114122) * gh-114107: test.pythoninfo logs Windows Developer Mode (#114121) Also, don't skip the whole collect_windows() if ctypes is missing. Log also ctypes.windll.shell32.IsUserAnAdmin(). * Fix an incorrect comment in iobase_is_closed (GH-102952) This comment appears to have been mistakenly copied from what is now called iobase_check_closed() in commit 4d9aec022063. Also unite the iobase_check_closed() code with the relevant comment. 
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> * gh-114069: Revise Tutorial Methods paragraph (#114127) Remove excess words in the first and third sentences. * gh-114096: Restore privileges in _winapi.CreateJunction after creating the junction (GH-114089) This avoids impact on later parts of the application which may be able to do things they otherwise shouldn't. * Docs: Improve multiprocessing.SharedMemory reference (#114093) Align the multiprocessing shared memory docs with Diatáxis's recommendations for references. - use a parameter list for the SharedMemory.__init__() argument spec - use the imperative mode - use versionadded, not versionchanged, for added parameters - reflow touched lines according to SemBr * Fix 'expresion' typo in IDLE doc (#114130) The substantive change is on line 577/593. Rest is header/footer stuff ignored when displaying. * gh-113659: Skip hidden .pth files (GH-113660) Skip .pth files with names starting with a dot or hidden file attribute. * Clean up backslash avoiding code in ast, fix typo (#113605) As of #108553, the `_avoid_backslashes` code path is dead `scape_newlines` was introduced in #110271. Happy to drop the typo fix if we don't want it * GH-114013: fix setting `HOSTRUNNER` for `Tools/wasm/wasi.py` (GH-114097) Also fix tests found failing under a pydebug build of WASI thanks to `make test` working due to this change. * Update copyright years to 2024. (GH-113608) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> * gh-112529: Track if debug allocator is used as underlying allocator (#113747) * gh-112529: Track if debug allocator is used as underlying allocator The GC implementation for free-threaded builds will need to accurately detect if the debug allocator is used because it affects the offset of the Python object from the beginning of the memory allocation. 
The current implementation of `_PyMem_DebugEnabled` only considers if the debug allocator is the outer-most allocator; it doesn't handle the case of "hooks" like tracemalloc being used on top of the debug allocator. This change enables more accurate detection of the debug allocator by tracking when debug hooks are enabled. * Simplify _PyMem_DebugEnabled * gh-113655: Increase default stack size for PGO builds to avoid C stack exhaustion (GH-114148) * GH-78988: Document `pathlib.Path.glob()` exception propagation. (#114036) We propagate the `OSError` from the `is_dir()` call on the top-level directory, and suppress all others. * gh-94220: Align fnmatch docs with the implementation and amend markup (#114152) - Align the argument spec for fnmatch functions with the actual implementation. - Update Sphinx markup to recent recommandations. - Add link to 'iterable' glossary entry. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> * Fix typo in c_annotations.py comment (#108773) "compatability" => "compatibility" * GH-110109: pathlib docs: bring `from_uri()` and `as_uri()` together. (#110312) This is a very soft deprecation of `PurePath.as_uri()`. We instead document it as a `Path` method, and add a couple of sentences mentioning that it's also available in `PurePath`. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> * gh-106293: Fix typos in Objects/object_layout.md (#106294) Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu> * gh-88531 Fix dataclass __post_init__/__init__ interplay documentation (gh-107404) * Simplify __post_init__ example usage. It applies to all base classes, not just dataclasses. * gh-112043: Align concurrent.futures.Executor.map docs with implementation (#114153) The first parameter is named 'fn', not 'func'. * gh-81479: For Help => IDLE Doc, stop double-spacing some lists. (#114168) This matches Firefox format. Edge double-spaces non-simple list but I think it looks worse. 
* gh-72284: Revise lists in IDLE doc (#114174) Tkinter is a fact, not necessarily a feature. Reorganize editor key bindings in a logical order and remove those that do not work, at least on Windows. Improve shell bindings list. * gh-86179: Skip test case that fails on POSIX with unversioned binary (GH-114136) * Python 3.13.0a3 * gh-104282: Fix null pointer dereference in `lzma._decode_filter_properties` (GH-104283) * gh-111301: Advertise importlib methods removal in What's new in Python 3.12 (GH-111630) * gh-112343: pdb: Use tokenize to replace convenience variables (#112380) * Post 3.13.0a3 * gh-114178: Fix generate_sbom.py for out-of-tree builds (#114179) * gh-114070: fix token reference warnings in expressions.rst (#114169) * gh-114149: [Enum] fix tuple subclass handling when using custom __new__ (GH-114160) * gh-105102: Fix nested unions in structures when the system byteorder is the opposite (GH-105106) * Fix typo in tkinter.ttk.rst (GH-106157) * gh-38807: Fix race condition in Lib/trace.py (GH-110143) Instead of checking if a directory does not exist and thereafter creating it, directly call os.makedirs() with the exist_ok=True. * gh-112984 Update Windows build and installer for free-threaded builds (GH-113129) * gh-112984: Fix test_ctypes.test_loading.test_load_dll_with_flags when directory name includes a dot (GH-114217) * gh-114149: [Enum] revert #114160 and add more tuple-subclass tests (GH-114215) This reverts commit 05e142b1543eb9662d6cc33722e7e16250c9219f. * gh-104522: Fix OSError raised when run a subprocess (#114195) Only set filename to cwd if it was caused by failed chdir(cwd). _fork_exec() now returns "noexec:chdir" for failed chdir(cwd). 
Co-authored-by: Robert O'Shea <PurityLake@users.noreply.github.com> * gh-113205: test_multiprocessing.test_terminate: Test the API on threadpools (#114186) gh-113205: test_multiprocessing.test_terminate: Test the API works on threadpools Threads can't be forced to terminate (without potentially corrupting too much state), so the expected behaviour of `ThreadPool.terminate` is to wait for the currently executing tasks to finish. The entire test was skipped in GH-110848 (0e9c364f4ac18a2237bdbac702b96bcf8ef9cb09). Instead of skipping it entirely, we should ensure the API eventually succeeds: use a shorter timeout. For the record: on my machine, when the test is un-skipped, the task manages to start in about 1.5% cases. * gh-114211: Update EmailMessage doc about ordered keys (#114224) Ordered keys are no longer unlike 'real dict's. * gh-96905: In IDLE code, stop redefining built-ins 'dict' and 'object' (#114227) Prefix 'dict' with 'o', 'g', or 'l' for 'object', 'global', or 'local'. Suffix 'object' with '_'. * gh-114231: Fix indentation in enum.rst (#114232) * gh-104522: Fix test_subprocess failure when build Python in the root home directory (GH-114236) * gh-104522: Fix test_subprocess failure when build Python in the root home directory EPERM is raised when setreuid() fails. EACCES is set in execve() when the test user has not access to sys.executable. * gh-114050: Fix crash when more than two arguments are passed to int() (GH-114067) Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru> * gh-103092: Convert some `_ctypes` metatypes to heap types (GH-113620) Co-authored-by: Erlend E. 
Aasland <erlend@python.org> * gh-110345: show Tcl/Tk patchlevel in `tkinter._test()` (GH-110350) * Delete unused macro (GH-114238) * gh-108303: Move all doctest related files and tests to `Lib/test/test_doctest/` (#112109) Co-authored-by: Brett Cannon <brett@python.org> * gh-114198: Rename dataclass __replace__ argument to 'self' (gh-114251) This change renames the dataclass __replace__ method's first argument name from 'obj' to 'self'. * gh-114087: Speed up dataclasses._asdict_inner (#114088) * gh-111968: Use per-thread freelists for generator in free-threading (gh-114189) * gh-112092: clarify unstable ABI recompilation requirements (#112093) Use different versions in the examples for when extensions do and do not need to be recompiled to make the examples easier to understand. * gh-114123: Migrate docstring from _csv to csv (#114124) Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Éric <merwok@netwok.org> * gh-112087: Remove duplicated critical_section (gh-114268) * gh-111968: Fix --without-freelists build (gh-114270) * gh-114286: Fix `maybe-uninitialized` warning in `Modules/_io/fileio.c` (GH-114287) * gh-113884: Refactor `queue.SimpleQueue` to use a ring buffer to store items (#114259) Use a ring buffer instead of a Python list in order to simplify the process of making queue.SimpleQueue thread-safe in free-threaded builds. The ring buffer implementation has no places where critical sections may be released. * gh-114275: Skip doctests that use `asyncio` in `test_pdb` for WASI builds (#114309) * gh-114265: move line number propagation before cfg optimization, remove guarantee_lineno_for_exits (#114267) * Retain shorter tables of contents for Sphinx 5.2.3+ (#114318) Disable toc_object_entries, new in Sphinx 5.2.3 * Add a `clean` subcommand to `Tools/wasm/wasi.py` (GH-114274) * GH-79634: Accept path-like objects as pathlib glob patterns. 
(#114017) Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`) While we're in the area: - Allow empty glob patterns in `PathBase` (but not `Path`) - Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory. - Simplify and speed up handling of rare patterns involving both `**` and `..` segments. * GH-113225: Speed up `pathlib.Path.walk(top_down=False)` (#113693) Use `_make_child_entry()` rather than `_make_child_relpath()` to retrieve path objects for directories to visit. This saves the allocation of one path object per directory in user subclasses of `PathBase`, and avoids a second loop. This trick does not apply when walking top-down, because users can affect the walk by modifying *dirnames* in-place. A side effect of this change is that, in bottom-up mode, subdirectories of each directory are visited in reverse order, and that this order doesn't match that of the names in *dirnames*. I suspect this is fine as the order is arbitrary anyway. * gh-114332: Fix the flags reference for ``re.compile()`` (#114334) The GH-93000 change set inadvertently caused a sentence in re.compile() documentation to refer to details that no longer followed. Correct this with a link to the Flags sub-subsection. Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com> * GH-99380: Update to Sphinx 7 (#99381) * Docs: structure the ftplib reference (#114317) Introduce the following headings and subheadings: - Reference * FTP objects * FTP_TLS objects * Module variables * gh-112529: Use GC heaps for GC allocations in free-threaded builds (gh-114157) * gh-112529: Use GC heaps for GC allocations in free-threaded builds The free-threaded build's garbage collector implementation will need to find GC objects by traversing mimalloc heaps. 
This hooks up the allocation calls with the correct heaps by using a thread-local "current_obj_heap" variable. * Refactor out setting heap based on type * gh-114281: Remove incorrect type hints from `asyncio.staggered` (#114282) Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> * Docs: Add missing line continuation to FTP_TLS class docs (#114352) Regression introduced by b1ad5a5d4. * Remove the non-test Lib/test/time_hashlib.py. (#114354) I believe I added this while chasing some performance of hash functions when I first created hashlib. It hasn't been used since, is frankly trivial, and not a test. * Remove deleted `time_hashlib.py` from `Lib/test/.ruff.toml` (#114355) * Fix the confusing "User-defined methods" reference in the datamodel (#114276) Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com> * Docs: mark up the FTP debug levels as a list (#114360) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> * gh-101100: Fix sphinx warnings in `Doc/c-api/memory.rst` (#114373) * gh-80931: Skip some socket tests while hunting for refleaks on macOS (#114057) Some socket tests related to sending file descriptors cause a file descriptor leak on macOS, all of them tests that send one or more descriptors than cannot be received on the read end. This appears to be a platform bug. This PR skips those tests when doing a refleak test run to avoid hiding other problems. 
* Docs: mark up FTP() constructor with param list (#114359) Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> * gh-114384: Align sys.set_asyncgen_hooks signature in docs to reflect implementation (#114385) * Docs: link to sys.stdout in ftplib docs (#114396) --------- Co-authored-by: Donghee Na <donghee.na@python.org> Co-authored-by: Sam Gross <colesbury@gmail.com> Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> Co-authored-by: Itamar Oren <itamarost@gmail.com> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com> Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com> Co-authored-by: Filip Łapkiewicz <80906036+fipachu@users.noreply.github.com> Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com> Co-authored-by: Jamie Phan <jamie@ordinarylab.dev> Co-authored-by: wookie184 <wookie1840@gmail.com> Co-authored-by: Guido van Rossum <guido@python.org> Co-authored-by: Barney Gale <barney.gale@gmail.com> Co-authored-by: Mark Shannon <mark@hotpy.org> Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Zackery Spytz <zspytz@gmail.com> Co-authored-by: Xavier de Gaye <xdegaye@gmail.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Ronald Oussoren <ronaldoussoren@mac.com> Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu> Co-authored-by: AN Long <aisk@users.noreply.github.com> Co-authored-by: Grigoriev Semyon <33061489+grigoriev-semyon@users.noreply.github.com> Co-authored-by: Rami <72725910+ramikg@users.noreply.github.com> Co-authored-by: Christian Heimes <christian@python.org> Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Erlend E. 
Aasland <erlend@python.org> Co-authored-by: Levi Sabah <0xl3vi@gmail.com> Co-authored-by: Lee Cannon <leecannon@leecannon.xyz> Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com> Co-authored-by: neonene <53406459+neonene@users.noreply.github.com> Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com> Co-authored-by: Jakub Kulík <Kulikjak@gmail.com> Co-authored-by: Kristján Valur Jónsson <sweskman@gmail.com> Co-authored-by: Steve Dower <steve.dower@python.org> Co-authored-by: mara004 <geisserml@gmail.com> Co-authored-by: William Andrea <william.j.andrea@gmail.com> Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com> Co-authored-by: Vinay Sajip <vinay_sajip@yahoo.co.uk> Co-authored-by: Yan Yanchii <yyanchiy@gmail.com> Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> Co-authored-by: Stefano Rivera <stefano@rivera.za.net> Co-authored-by: Petr Viktorin <encukou@gmail.com> Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Seth Michael Larson <sethmichaellarson@gmail.com> Co-authored-by: Steve Dower <steve.dower@microsoft.com> Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru> Co-authored-by: Nikita Sobolev <mail@sobolevn.me> Co-authored-by: Peter Lazorchak <lazorchakp@gmail.com> Co-authored-by: Mariusz Felisiak <felisiak.mariusz@gmail.com> Co-authored-by: Ned Batchelder <ned@nedbatchelder.com> Co-authored-by: Ken Jin <kenjin@python.org> Co-authored-by: Jules <57632293+JuliaPoo@users.noreply.github.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Brett Cannon <brett@python.org> Co-authored-by: Alois Klink <alois@aloisklink.com> Co-authored-by: Joseph Pearson <74079531+JoeyPearson822@users.noreply.github.com> Co-authored-by: Andrew Zipperer <47086307+zipperer@users.noreply.github.com> Co-authored-by: Pierre Equoy <pierre.equoy@canonical.com> Co-authored-by: InSync 
<122007197+InSyncWithFoo@users.noreply.github.com> Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com> Co-authored-by: Jon Foster <jon@jon-foster.co.uk> Co-authored-by: Crowthebird <78076854+thatbirdguythatuknownot@users.noreply.github.com> Co-authored-by: Mark Dickinson <dickinsm@gmail.com> Co-authored-by: Kamil Turek <kamil.turek@hotmail.com> Co-authored-by: Raphaël Marinier <raphael.marinier@gmail.com> Co-authored-by: Jérome Perrin <perrinjerome@gmail.com> Co-authored-by: Mike Zimin <122507876+mikeziminio@users.noreply.github.com> Co-authored-by: Jonathon Reinhart <JonathonReinhart@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: solya0x <41440448+0xSalikh@users.noreply.github.com> Co-authored-by: Kuan-Wei Chiu <visitorckw@gmail.com> Co-authored-by: Mano Sriram <mano.sriram0@gmail.com> Co-authored-by: Steffen Zeile <48187781+Kaniee@users.noreply.github.com> Co-authored-by: Thomas Wouters <thomas@python.org> Co-authored-by: Radislav Chugunov <52372310+chgnrdv@users.noreply.github.com> Co-authored-by: Karolina Surma <33810531+befeleme@users.noreply.github.com> Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com> Co-authored-by: Ethan Furman <ethan@stoneleaf.us> Co-authored-by: Sheidan <37596668+Sh3idan@users.noreply.github.com> Co-authored-by: Christophe Nanteuil <35002064+christopheNan@users.noreply.github.com> Co-authored-by: buermarc <44375277+buermarc@users.noreply.github.com> Co-authored-by: Robert O'Shea <PurityLake@users.noreply.github.com> Co-authored-by: Miyashita Yosuke <44266492+miyashiiii@users.noreply.github.com> Co-authored-by: kcatss <kcats9731@gmail.com> Co-authored-by: Christopher Chavez <chrischavez@gmx.us> Co-authored-by: Phillip Schanely <pschanely@gmail.com> Co-authored-by: keithasaurus <592217+keithasaurus@users.noreply.github.com> Co-authored-by: DerSchinken <53398996+DerSchinken@users.noreply.github.com> Co-authored-by: Skip Montanaro 
<skip.montanaro@gmail.com> Co-authored-by: Éric <merwok@netwok.org> Co-authored-by: mpage <mpage@meta.com> Co-authored-by: David H. Gutteridge <dhgutteridge@users.noreply.github.com> Co-authored-by: cdzhan <zhancdi@163.com>
…-113263)

* pythongh-112532: Use separate mimalloc heaps for GC objects

In `--disable-gil` builds, we now use four separate heaps in anticipation of using mimalloc to find GC objects when the GIL is disabled. To support this, we also make a few changes to mimalloc:

* `mi_heap_t` and `mi_tld_t` initialization is split from allocation. This allows us to have a `mi_tld_t` per-`PyThreadState`, which is important to keep interpreter isolation, since the same OS thread may run in multiple interpreters (using different PyThreadStates.)
* Heap abandoning (mi_heap_collect_ex) can now be called from a different thread than the one that created the heap. This is necessary because we may clear and delete the containing PyThreadStates from a different thread during finalization and after fork().

* Use enum instead of defines and guard mimalloc includes.

* The enum typedef will be convenient for future PRs that use the type.
* Guarding the mimalloc includes allows us to unconditionally include pycore_mimalloc.h from other header files that rely on things like `struct _mimalloc_thread_state`.

* Only define _mimalloc_thread_state in Py_GIL_DISABLED builds
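The per-`PyThreadState` layout described in this commit can be pictured with a small Python model. This is an illustrative sketch only, not CPython or mimalloc code: the names `ToyTLD`, `ToyHeap`, `ToyThreadState`, `toy_heap_init`, and the four heap labels are assumptions made for the example (the commit message states only that there are four heaps).

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical labels; the commit message only says there are four heaps.
HEAP_LABELS = ("mem", "object", "gc", "gc_pre")


@dataclass
class ToyTLD:
    """Stand-in for mi_tld_t: bookkeeping shared by one thread state's heaps."""
    owner: str
    initialized: bool = False


@dataclass
class ToyHeap:
    """Stand-in for mi_heap_t: its storage exists before it is initialized."""
    label: str
    tld: Optional[ToyTLD] = None
    blocks: list = field(default_factory=list)


def toy_heap_init(heap: ToyHeap, tld: ToyTLD) -> None:
    # Initialization is a separate step from allocating the heap's storage,
    # so the storage can be embedded in the thread state itself.
    heap.tld = tld


class ToyThreadState:
    """Stand-in for PyThreadState: owns its own tld and four heaps."""

    def __init__(self, interpreter: str, os_thread: int) -> None:
        self.interpreter = interpreter
        self.os_thread = os_thread
        # Storage is created first ...
        self.tld = ToyTLD(owner=f"{interpreter}/thread-{os_thread}")
        self.heaps = {label: ToyHeap(label) for label in HEAP_LABELS}
        # ... and initialized afterwards.
        self.tld.initialized = True
        for heap in self.heaps.values():
            toy_heap_init(heap, self.tld)


# The same OS thread (id 1) running in two interpreters has two thread
# states, and therefore two independent sets of heaps.
main_ts = ToyThreadState("main", os_thread=1)
sub_ts = ToyThreadState("sub", os_thread=1)
main_ts.heaps["gc"].blocks.append("object from main")
```

Because the heaps hang off the thread state rather than the OS thread, an allocation made while running in one interpreter never lands in a heap that belongs to another, even when both run on the same OS thread.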
…ngh-113492) pythongh-112532: Fix peg generator for mimalloc build
…3717)

* pythongh-112532: Isolate abandoned segments by interpreter

  Mimalloc segments are data structures that contain memory allocations along with metadata. Each segment is "owned" by a thread. When a thread exits, it abandons its segments to a global pool to be later reclaimed by other threads.

  This changes the pool to be per-interpreter instead of process-wide. This will be important for when we use mimalloc to find GC objects in the `--disable-gil` builds. We want heaps to only store Python objects from a single interpreter. Absent this change, the abandoning and reclaiming process could break this isolation.

* Add missing `&_mi_abandoned_default` to `tld_empty`
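The difference between a process-wide pool and per-interpreter pools can be sketched in a few lines of Python. This is a toy model under stated assumptions: `AbandonedPools`, its methods, and the use of integer interpreter ids are all hypothetical and do not mirror mimalloc's actual data structures.

```python
from collections import defaultdict


class AbandonedPools:
    """Toy model: one pool of abandoned segments per interpreter."""

    def __init__(self):
        # interpreter id -> list of abandoned segments
        self._pools = defaultdict(list)

    def abandon(self, interp_id, segment):
        # Models a thread exiting while `segment` still holds live allocations.
        self._pools[interp_id].append(segment)

    def reclaim(self, interp_id):
        # A thread only sees segments abandoned within its own interpreter.
        pool = self._pools[interp_id]
        return pool.pop() if pool else None


pools = AbandonedPools()
pools.abandon(1, "segment-A")
other_interp_result = pools.reclaim(2)   # nothing to reclaim here
same_interp_result = pools.reclaim(1)    # gets "segment-A" back
```

With a single shared pool, the thread in interpreter 2 could have picked up `segment-A` and mixed objects from two interpreters in one heap.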
* pythongh-112532: Tag mimalloc heaps and pages

  Mimalloc pages are data structures that contain contiguous allocations of the same block size. Note that they are distinct from operating system pages. Mimalloc pages are contained in segments.

  When a thread exits, it abandons any segments and contained pages that have live allocations. These segments and pages may be later reclaimed by another thread. To support GC and certain thread-safety guarantees in free-threaded builds, we want pages to only be reclaimed by the corresponding heap in the claimant thread. For example, we want pages containing GC objects to only be claimed by GC heaps.

  This allows heaps and pages to be tagged with an integer tag that is used to ensure that abandoned pages are only claimed by heaps with the same tag. Heaps can be initialized with a tag (0-15); any page allocated by that heap copies the corresponding tag.

* Fix conversion warning
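The tagging rule can be illustrated with a small Python model. It is a sketch only: `TaggedHeap`, `TaggedPage`, and `try_reclaim` are hypothetical names, and the specific tag values used below are arbitrary choices within the 0-15 range mentioned above.

```python
from dataclasses import dataclass, field

MAX_TAG = 15  # the commit message gives the tag range as 0-15


@dataclass
class TaggedPage:
    block_size: int
    tag: int


@dataclass
class TaggedHeap:
    tag: int
    pages: list = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.tag <= MAX_TAG:
            raise ValueError("tag must be in the range 0..15")

    def new_page(self, block_size):
        # A page copies the tag of the heap that allocates it.
        page = TaggedPage(block_size, self.tag)
        self.pages.append(page)
        return page

    def try_reclaim(self, page):
        # An abandoned page is only claimed by a heap with the same tag.
        if page.tag != self.tag:
            return False
        self.pages.append(page)
        return True


# A page abandoned by an exiting thread's GC heap (tag 2 is arbitrary).
abandoned = TaggedHeap(tag=2).new_page(block_size=64)

# Heaps in the claimant thread.
object_heap = TaggedHeap(tag=1)
gc_heap = TaggedHeap(tag=2)
```

Here `object_heap.try_reclaim(abandoned)` is refused while `gc_heap.try_reclaim(abandoned)` succeeds, so a page of GC objects never ends up in a non-GC heap.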
…ongh-113995) This fixes `_PyInterpreterState_GetAllocatedBlocks()` and `_Py_GetGlobalAllocatedBlocks()` in the free-threaded builds. The pythongh-113263 change that introduced multiple mimalloc heaps per-thread broke the logic for counting the number of allocated blocks. For subtle reasons, this led to reported reference count leaks in the refleaks buildbots.
This adds support for visiting abandoned pages in mimalloc and improves the performance of the page visiting code. Abandoned pages contain memory blocks from threads that have exited. At some point, they may be later reclaimed by other threads. We still need to visit those pages in the free-threaded GC because they contain live objects.

This also reduces the overhead of visiting mimalloc pages:

* Special cases for full, empty, and pages containing only a single block.
* Fix free_map to use one bit instead of one byte per block.
* Use fast integer division by a constant algorithm when computing block offset from block size and index.
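Two of the optimizations listed above can be sketched in Python. This is an illustration under assumptions, not the code from the change: the bitmap layout and all names (`make_divider`, `FreeMap`, `live_blocks`) are hypothetical, and the division trick shown is the generic multiply-and-shift with a precomputed reciprocal, which may differ from the exact algorithm used in mimalloc.

```python
def make_divider(block_size):
    """Precompute a reciprocal so division becomes multiply-and-shift.

    Exact for 0 <= n < 2**32 and 1 <= block_size < 2**32.
    """
    magic = (2**64 - 1) // block_size + 1   # equals ceil(2**64 / block_size)
    return lambda n: (n * magic) >> 64


class FreeMap:
    """Bitmap with one bit per block (bit set means the block is free)."""

    def __init__(self, n_blocks):
        self.bits = bytearray((n_blocks + 7) // 8)

    def mark_free(self, index):
        self.bits[index >> 3] |= 1 << (index & 7)

    def is_free(self, index):
        return bool(self.bits[index >> 3] & (1 << (index & 7)))


def live_blocks(n_blocks, free_offsets, block_size):
    """Return the indices of blocks that are not on the free list."""
    to_index = make_divider(block_size)
    free_map = FreeMap(n_blocks)
    for offset in free_offsets:
        # Convert a byte offset within the page into a block index.
        free_map.mark_free(to_index(offset))
    return [i for i in range(n_blocks) if not free_map.is_free(i)]


# Page of 8 blocks of 48 bytes; blocks 0, 2 and 7 are free.
visited = live_blocks(8, free_offsets=[0, 96, 336], block_size=48)
```

The bitmap for a page of 20 blocks takes 3 bytes instead of 20, and the divider is computed once per page so the per-block work avoids a hardware division.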
I think this is done now. There are still a few QSBR-related mimalloc changes, but those will be tracked in #115103.
Feature or enhancement

Mimalloc was added as an allocator in #90815. The `--disable-gil` builds need further integration with mimalloc, as well as some modifications to mimalloc to support thread-safe garbage collection in `--disable-gil` builds and the dictionary accesses that mostly avoid locking.

These changes can be split up across multiple PRs.

* … `PyMem_Malloc` calls, but we need separate heaps for `PyObject_Malloc` and `PyObject_GC_New`. We should associate some `mi_heap_t`s with each `PyThreadState`. Every `PyThreadState` needs four heaps: one for `PyMem_Malloc`, one for non-GC objects (via `PyObject_Malloc`), one for GC objects with managed dicts (extra pre-header) and one for GC objects without a managed dict. We need some way to know which heap to use in `_PyObject_MiMalloc`. There's not a great way to do this, but I suggest adding something like a "current pyobject heap" variable to `PyThreadState`. It should generally point to the `PyObject_Malloc` heap, but `PyObject_GC_New` should temporarily override it to point to the correct GC heap when called.
* `--disable-gil` should imply `--with-mimalloc` and require mimalloc (i.e., disallow changing the allocator with `PYTHONMALLOC`).

cc @DinoV
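The "current pyobject heap" suggestion can be modeled in Python as follows. This is a sketch of the idea only, not the eventual C implementation: `ToyThreadState`, `using_heap`, and the string heap names are hypothetical, and the methods merely stand in for `_PyObject_MiMalloc` and `PyObject_GC_New`.

```python
from contextlib import contextmanager


class ToyThreadState:
    """Toy stand-in for PyThreadState with a "current object heap" slot."""

    def __init__(self):
        self.heaps = {"mem": [], "object": [], "gc": [], "gc_pre": []}
        # Generally points at the heap used for PyObject_Malloc.
        self.current_obj_heap = "object"

    @contextmanager
    def using_heap(self, name):
        # Temporarily redirect object allocations, then restore the old value.
        previous = self.current_obj_heap
        self.current_obj_heap = name
        try:
            yield
        finally:
            self.current_obj_heap = previous

    def object_malloc(self, obj):
        # Models _PyObject_MiMalloc consulting the "current" heap.
        self.heaps[self.current_obj_heap].append(obj)

    def gc_new(self, obj, managed_dict=False):
        # Models PyObject_GC_New overriding the heap for the duration of the call.
        with self.using_heap("gc_pre" if managed_dict else "gc"):
            self.object_malloc(obj)


ts = ToyThreadState()
ts.object_malloc("int")
ts.gc_new("list")
ts.gc_new("instance", managed_dict=True)
```

After these calls each object sits in the heap matching its kind, and `ts.current_obj_heap` is back to `"object"`, so a later plain object allocation is unaffected by the earlier GC allocations.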
Linked PRs

* `--disable-gil` builds #112883