[3.11] gh-108822: Backport libregrtest changes from the main branch (#108820)

* Revert "[3.11] gh-101634: regrtest reports decoding error as failed test (#106169) (#106175)"

This reverts commit d5418e9.

* Revert "[3.11] bpo-46523: fix tests rerun when `setUp[Class|Module]` fails (GH-30895) (GH-103342)"

This reverts commit ecb09a8.

* Revert "gh-95027: Fix regrtest stdout encoding on Windows (GH-98492)"

This reverts commit b2aa28e.

* Revert "[3.11] gh-94026: Buffer regrtest worker stdout in temporary file (GH-94253) (GH-94408)"

This reverts commit 0122ab2.

* Revert "Run Tools/scripts/reindent.py (GH-94225)"

This reverts commit f0f3a42.

* Revert "gh-94052: Don't re-run failed tests with --python option (GH-94054)"

This reverts commit 1347607.

* Revert "[3.11] gh-84461: Fix Emscripten umask and permission issues (GH-94002) (GH-94006)"

This reverts commit 1073184.

* gh-93353: regrtest checks for leaked temporary files (#93776)

When running tests with -jN, create a temporary directory per process
and mark a test as "environment changed" if a test leaks a temporary
file or directory.

(cherry picked from commit e566ce5)

* gh-93353: Fix regrtest for -jN with N >= 2 (GH-93813)

(cherry picked from commit 36934a1)

* gh-93353: regrtest supports checking tmp files with -j2 (#93909)

regrtest now also implements checking for leaked temporary files and
directories when using -jN for N >= 2. Use tempfile.mkdtemp() to
create the temporary directory. Skip this check on WASI.

(cherry picked from commit 4f85cec)
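
As a rough illustration of the check described above (a hypothetical helper, not the actual libregrtest code): give each worker its own temporary directory and treat anything left behind after the test as an environment change.

    import os
    import tempfile

    def run_with_tmp_check(test_func):
        # Hypothetical sketch of the leak check: run the test inside its
        # own temporary directory and report anything it leaves behind.
        tmp_dir = tempfile.mkdtemp(prefix="test_python_")
        old_cwd = os.getcwd()
        os.chdir(tmp_dir)
        try:
            test_func()
        finally:
            os.chdir(old_cwd)
        leaked = os.listdir(tmp_dir)
        if leaked:
            print(f"Warning -- {test_func.__name__} leaked temporary "
                  f"files: {leaked}")
            return "ENV_CHANGED"
        return "PASSED"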

* gh-84461: Fix Emscripten umask and permission issues (GH-94002)

- Emscripten's default umask is too strict, see
  emscripten-core/emscripten#17269
- getuid/getgid and geteuid/getegid are stubs that always return 0
  (root). Disable effective uid/gid syscalls and fix tests that use
  chmod() as the current user.
- The X bit cannot be dropped from a directory.

(cherry picked from commit 2702e40)

* gh-94052: Don't re-run failed tests with --python option (#94054)

(cherry picked from commit 0ff7b99)

* Run Tools/scripts/reindent.py (#94225)

Reindent files which were not properly formatted (PEP 8: 4 spaces).

Also remove some trailing spaces.

(cherry picked from commit e87ada4)

* gh-94026: Buffer regrtest worker stdout in temporary file (GH-94253)

Co-authored-by: Victor Stinner <vstinner@python.org>
(cherry picked from commit 199ba23)

* gh-96465: Clear fractions hash lru_cache under refleak testing (GH-96689)

Automerge-Triggered-By: GH:zware
(cherry picked from commit 9c8f379)

* gh-95027: Fix regrtest stdout encoding on Windows (#98492)

On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than UTF-8, which can lead to decoding
errors.

(cherry picked from commit ec1f6f5)
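
In outline, the fix amounts to choosing the worker stdout file's encoding per platform instead of hard-coding UTF-8 (a sketch, not the actual code; locale.getencoding() returns the ANSI code page on Windows):

    import locale
    import sys
    import tempfile

    # Sketch: pick the encoding for the worker's stdout temporary file.
    if sys.platform == "win32":
        encoding = locale.getencoding()  # ANSI code page, e.g. "cp1252"
    else:
        encoding = sys.stdout.encoding or "utf-8"

    stdout_file = tempfile.TemporaryFile(
        "w+", encoding=encoding, errors="backslashreplace")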

* gh-98903: Test suite fails with exit code 4 if no tests ran (#98904)

The Python test suite now fails with exit code 4 if no tests ran. This
should help detect typos in test names and test methods.

* Add "EXITCODE_" constants to Lib/test/libregrtest/main.py.
* Fix a typo: "NO TEST RUN" becomes "NO TESTS RAN"

(cherry picked from commit c76db37)
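
For reference, the exit codes involved look like this (the literal values 2, 3 and 130 appear in the diff below; 4 is the new "no tests ran" code):

    # Named exit codes added to Lib/test/libregrtest/main.py.
    EXITCODE_BAD_TEST = 2
    EXITCODE_ENV_CHANGED = 3
    EXITCODE_NO_TESTS_RAN = 4
    EXITCODE_INTERRUPTED = 130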

* gh-100086: Add build info to test.libregrtest (#100093)

The Python test runner (libregrtest) now logs Python build information like
"debug" vs "release" build, or LTO and PGO optimizations.

(cherry picked from commit 3c89202)
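
A rough approximation of how such build information can be derived (illustrative only; the real helper is get_build_info() in Lib/test/libregrtest/utils.py, imported in the diff below):

    import sys
    import sysconfig

    def build_info_sketch():
        # Illustrative approximation, not the actual implementation.
        build = []
        if hasattr(sys, "gettotalrefcount"):  # only exists in debug builds
            build.append("debug")
        else:
            build.append("release")
        config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
        if "--with-lto" in config_args:
            build.append("LTO")
        if "--enable-optimizations" in config_args:
            build.append("PGO")
        return build

    print("== Python build:", " ".join(build_info_sketch()))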

* bpo-46523: fix tests rerun when `setUp[Class|Module]` fails (#30895)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 9953860)

* gh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (#103927)

This runs test_asyncio sub-tests in parallel using sharding from Cinder. This suite is typically the longest pole in runs because it is a test package with many sub-tests that are otherwise run serially. By breaking out the sub-tests as independent modules we can run a lot more in parallel.

After porting we can see the direct impact on a multicore system.

Without this change:
  Running make test takes 5 min 26 seconds
With this change:
  Running make test takes 3 min 39 seconds

That'll vary based on system and parallelism. On a `-j 4` run similar to what CI and buildbot systems often do, it reduced the overall test suite completion latency by 10%.

The drawbacks are that this implementation is hacky, that the sorting of the tests obscures when the asyncio tests run, and that it requires changing CPython test infrastructure; but the wall time saved makes it worth it, especially in low-core-count CI runs, where it pulls in a long tail. The win for productivity and reserved CI resource usage is significant.

Future tests that deserve to be refactored into split-up suites to benefit from this are test_concurrent_futures and the way the _test_multiprocessing suite gets run for all start methods, as exposed by passing the -o flag to python -m test to get a list of the 10 longest-running tests.

---------

Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Gregory P. Smith <greg@krypto.org> [Google, LLC]
(cherry picked from commit 9e011e7)
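
The core idea can be sketched with standard-library tools (a hypothetical helper; the real splitting logic lives in libregrtest): enumerate a test package's sub-modules so each one can be scheduled as its own parallel job.

    import importlib
    import pkgutil

    def split_test_package(package_name):
        # Hypothetical sketch: yield "package.sub_module" names so each
        # sub-test module can run as an independent parallel job.
        package = importlib.import_module(package_name)
        for module_info in pkgutil.iter_modules(package.__path__):
            if module_info.name.startswith("test_"):
                yield f"{package_name}.{module_info.name}"

    # Example: list(split_test_package("test.test_asyncio")) yields names
    # such as "test.test_asyncio.test_events" that regrtest can run in
    # separate worker processes.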

* Display the sanitizer config in the regrtest header. (#105301)

Display the sanitizers present in libregrtest.

Having this in the CI output, along with the relevant environment
variables, will make it easier to set up an equivalent local test
run.

(cherry picked from commit 852348a)
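
The detection relies on test.support.check_sanitizer(), which the diff below calls with the same keyword arguments; it can also be queried directly:

    from test import support

    # Report which sanitizers this interpreter was built with.
    print("ASan: ", support.check_sanitizer(address=True))
    print("MSan: ", support.check_sanitizer(memory=True))
    print("UBSan:", support.check_sanitizer(ub=True))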

* gh-101634: regrtest reports decoding error as failed test (#106169)

When running the Python test suite with the -jN option, if a worker's
stdout cannot be decoded from the locale encoding, report a failed test
so the exit code is non-zero.

(cherry picked from commit 2ac3eec)
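
In outline, the change boils down to catching the decode error when reading a worker's stdout and turning it into a failed test result instead of letting it propagate (a sketch with hypothetical names):

    def read_worker_stdout(stdout_file, test_name):
        # Sketch: a decode failure becomes a failed test, so the overall
        # exit code is non-zero instead of the output being lost silently.
        try:
            stdout_file.seek(0)
            return stdout_file.read(), None
        except UnicodeDecodeError as exc:
            err_msg = f"Cannot read process stdout of {test_name}: {exc}"
            return "", err_msg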

* gh-108223: test.pythoninfo and libregrtest log Py_NOGIL (#108238)

Enable with --disable-gil --without-pydebug:

    $ make pythoninfo|grep NOGIL
    sysconfig[Py_NOGIL]: 1

    $ ./python -m test
    ...
    == Python build: nogil debug
    ...

(cherry picked from commit 5afe0c1)
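
Presumably the logging reduces to querying the sysconfig variable shown in the console output above, along the lines of:

    import sysconfig

    # Py_NOGIL is only set for --disable-gil builds; it is None otherwise.
    if sysconfig.get_config_var("Py_NOGIL") == 1:
        print("sysconfig[Py_NOGIL]: 1")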

* gh-90791: test.pythoninfo logs ASAN_OPTIONS env var (#108289)

* Cleanup libregrtest code logging ASAN_OPTIONS.
* Fix a typo on "ASAN_OPTIONS" vs "MSAN_OPTIONS".

(cherry picked from commit 3a1ac87)

* gh-108388: regrtest splits test_asyncio package (#108393)

Currently, the test_asyncio package is only split into sub-tests when
using the command "./python -m test". With this change, it is also
split when passed on the command line:
"./python -m test test_asyncio".

Remove the concept of "STDTESTS". Python is now mature enough not to
have to bother with that anymore. Removing STDTESTS simplifies the
code.

(cherry picked from commit 174e9da)

* regrtest computes statistics (#108793)

test_netrc, test_pep646_syntax and test_xml_etree now return results
in the test_main() function.

Changes:

* Rewrite TestResult as a dataclass with a new State class.
* Add test.support.TestStats class and Regrtest.stats_dict attribute.
* libregrtest.runtest functions now modify a TestResult instance
  in-place.
* The libregrtest summary lists the number of tests run, tests
  skipped, and resources denied.
* Add TestResult.has_meaningful_duration() method.
* Compute TestResult duration in the upper function.
* Use time.perf_counter() instead of time.monotonic().
* Regrtest: rename 'resource_denieds' attribute to 'resource_denied'.
* Rename CHILD_ERROR to MULTIPROCESSING_ERROR.
* Use match/case syntax to run different code depending on the
  test state.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
(cherry picked from commit d4e534c)
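
A minimal sketch of the TestStats idea (the real class lives in test.support; the field names here match the display_summary() code in the diff below, which reads tests_run, failures and skipped and calls accumulate()):

    import dataclasses

    @dataclasses.dataclass
    class TestStats:
        # Field names match the summary code in the diff below.
        tests_run: int = 0
        failures: int = 0
        skipped: int = 0

        def accumulate(self, stats: "TestStats") -> None:
            self.tests_run += stats.tests_run
            self.failures += stats.failures
            self.skipped += stats.skipped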

* gh-108822: Add Changelog entry for regrtest statistics (#108821)

---------

Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Zachary Ware <zach@python.org>
Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
Co-authored-by: Joshua Herman <zitterbewegung@gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
6 people authored Sep 3, 2023
1 parent ba47d87 commit 79f7a4c
Showing 18 changed files with 812 additions and 375 deletions.
195 changes: 133 additions & 62 deletions Lib/test/libregrtest/main.py
@@ -11,14 +11,14 @@
import unittest
from test.libregrtest.cmdline import _parse_args
from test.libregrtest.runtest import (
findtests, runtest, get_abs_module, is_failed,
STDTESTS, NOTTESTS, PROGRESS_MIN_TIME,
Passed, Failed, EnvChanged, Skipped, ResourceDenied, Interrupted,
ChildError, DidNotRun)
findtests, split_test_packages, runtest, get_abs_module,
PROGRESS_MIN_TIME, State)
from test.libregrtest.setup import setup_tests
from test.libregrtest.pgo import setup_pgo_tests
from test.libregrtest.utils import removepy, count, format_duration, printlist
from test.libregrtest.utils import (removepy, count, format_duration,
printlist, get_build_info)
from test import support
from test.support import TestStats
from test.support import os_helper
from test.support import threading_helper

@@ -77,13 +77,14 @@ def __init__(self):
self.good = []
self.bad = []
self.skipped = []
self.resource_denieds = []
self.resource_denied = []
self.environment_changed = []
self.run_no_tests = []
self.need_rerun = []
self.rerun = []
self.first_result = None
self.interrupted = False
self.stats_dict: dict[str, TestStats] = {}

# used by --slow
self.test_times = []
@@ -92,7 +93,7 @@ def __init__(self):
self.tracer = None

# used to display the progress bar "[ 3/100]"
self.start_time = time.monotonic()
self.start_time = time.perf_counter()
self.test_count = ''
self.test_count_width = 1

@@ -110,36 +111,41 @@ def __init__(self):

def get_executed(self):
return (set(self.good) | set(self.bad) | set(self.skipped)
| set(self.resource_denieds) | set(self.environment_changed)
| set(self.resource_denied) | set(self.environment_changed)
| set(self.run_no_tests))

def accumulate_result(self, result, rerun=False):
test_name = result.name

if not isinstance(result, (ChildError, Interrupted)) and not rerun:
self.test_times.append((result.duration_sec, test_name))

if isinstance(result, Passed):
self.good.append(test_name)
elif isinstance(result, ResourceDenied):
self.skipped.append(test_name)
self.resource_denieds.append(test_name)
elif isinstance(result, Skipped):
self.skipped.append(test_name)
elif isinstance(result, EnvChanged):
self.environment_changed.append(test_name)
elif isinstance(result, Failed):
if not rerun:
self.bad.append(test_name)
self.need_rerun.append(result)
elif isinstance(result, DidNotRun):
self.run_no_tests.append(test_name)
elif isinstance(result, Interrupted):
self.interrupted = True
else:
raise ValueError("invalid test result: %r" % result)
test_name = result.test_name

if result.has_meaningful_duration() and not rerun:
self.test_times.append((result.duration, test_name))

if rerun and not isinstance(result, (Failed, Interrupted)):
match result.state:
case State.PASSED:
self.good.append(test_name)
case State.ENV_CHANGED:
self.environment_changed.append(test_name)
case State.SKIPPED:
self.skipped.append(test_name)
case State.RESOURCE_DENIED:
self.skipped.append(test_name)
self.resource_denied.append(test_name)
case State.INTERRUPTED:
self.interrupted = True
case State.DID_NOT_RUN:
self.run_no_tests.append(test_name)
case _:
if result.is_failed(self.ns.fail_env_changed):
if not rerun:
self.bad.append(test_name)
self.need_rerun.append(result)
else:
raise ValueError(f"invalid test state: {state!r}")

if result.stats is not None:
self.stats_dict[result.test_name] = result.stats

if rerun and not(result.is_failed(False) or result.state == State.INTERRUPTED):
self.bad.remove(test_name)

xml_data = result.xml_data
@@ -161,7 +167,7 @@ def log(self, line=''):
line = f"load avg: {load_avg:.2f} {line}"

# add the timestamp prefix: "0:01:05 "
test_time = time.monotonic() - self.start_time
test_time = time.perf_counter() - self.start_time

mins, secs = divmod(int(test_time), 60)
hours, mins = divmod(mins, 60)
@@ -245,26 +251,23 @@ def find_tests(self, tests):
# add default PGO tests if no tests are specified
setup_pgo_tests(self.ns)

stdtests = STDTESTS[:]
nottests = NOTTESTS.copy()
exclude = set()
if self.ns.exclude:
for arg in self.ns.args:
if arg in stdtests:
stdtests.remove(arg)
nottests.add(arg)
exclude.add(arg)
self.ns.args = []

# if testdir is set, then we are not running the python tests suite, so
# don't add default tests to be executed or skipped (pass empty values)
if self.ns.testdir:
alltests = findtests(self.ns.testdir, list(), set())
else:
alltests = findtests(self.ns.testdir, stdtests, nottests)
alltests = findtests(testdir=self.ns.testdir, exclude=exclude)

if not self.ns.fromfile:
self.selected = self.tests or self.ns.args or alltests
self.selected = self.tests or self.ns.args
if self.selected:
self.selected = split_test_packages(self.selected)
else:
self.selected = alltests
else:
self.selected = self.tests

if self.ns.single:
self.selected = self.selected[:1]
try:
@@ -339,7 +342,7 @@ def rerun_failed_tests(self):
rerun_list = list(self.need_rerun)
self.need_rerun.clear()
for result in rerun_list:
test_name = result.name
test_name = result.test_name
self.rerun.append(test_name)

errors = result.errors or []
@@ -366,7 +369,7 @@ def rerun_failed_tests(self):

self.accumulate_result(result, rerun=True)

if isinstance(result, Interrupted):
if result.state == State.INTERRUPTED:
break

if self.bad:
@@ -463,7 +466,7 @@ def run_tests_sequential(self):

previous_test = None
for test_index, test_name in enumerate(self.tests, 1):
start_time = time.monotonic()
start_time = time.perf_counter()

text = test_name
if previous_test:
@@ -482,14 +485,14 @@ def run_tests_sequential(self):
result = runtest(self.ns, test_name)
self.accumulate_result(result)

if isinstance(result, Interrupted):
if result.state == State.INTERRUPTED:
break

previous_test = str(result)
test_time = time.monotonic() - start_time
test_time = time.perf_counter() - start_time
if test_time >= PROGRESS_MIN_TIME:
previous_test = "%s in %s" % (previous_test, format_duration(test_time))
elif isinstance(result, Passed):
elif result.state == State.PASSED:
# be quiet: say nothing if the test passed shortly
previous_test = None

@@ -498,7 +501,7 @@ def run_tests_sequential(self):
if module not in save_modules and module.startswith("test."):
support.unload(module)

if self.ns.failfast and is_failed(result, self.ns):
if self.ns.failfast and result.is_failed(self.ns.fail_env_changed):
break

if previous_test:
@@ -518,22 +521,53 @@ def display_header(self):
print("==", platform.python_implementation(), *sys.version.split())
print("==", platform.platform(aliased=True),
"%s-endian" % sys.byteorder)
print("== Python build:", ' '.join(get_build_info()))
print("== cwd:", os.getcwd())
cpu_count = os.cpu_count()
if cpu_count:
print("== CPU count:", cpu_count)
print("== encodings: locale=%s, FS=%s"
% (locale.getencoding(), sys.getfilesystemencoding()))
self.display_sanitizers()

def display_sanitizers(self):
# This makes it easier to remember what to set in your local
# environment when trying to reproduce a sanitizer failure.
asan = support.check_sanitizer(address=True)
msan = support.check_sanitizer(memory=True)
ubsan = support.check_sanitizer(ub=True)
sanitizers = []
if asan:
sanitizers.append("address")
if msan:
sanitizers.append("memory")
if ubsan:
sanitizers.append("undefined behavior")
if not sanitizers:
return

print(f"== sanitizers: {', '.join(sanitizers)}")
for sanitizer, env_var in (
(asan, "ASAN_OPTIONS"),
(msan, "MSAN_OPTIONS"),
(ubsan, "UBSAN_OPTIONS"),
):
options= os.environ.get(env_var)
if sanitizer and options is not None:
print(f"== {env_var}={options!r}")

def no_tests_run(self):
return not any((self.good, self.bad, self.skipped, self.interrupted,
self.environment_changed))

def get_tests_result(self):
result = []
if self.bad:
result.append("FAILURE")
elif self.ns.fail_env_changed and self.environment_changed:
result.append("ENV CHANGED")
elif not any((self.good, self.bad, self.skipped, self.interrupted,
self.environment_changed)):
result.append("NO TEST RUN")
elif self.no_tests_run():
result.append("NO TESTS RAN")

if self.interrupted:
result.append("INTERRUPTED")
@@ -609,13 +643,48 @@ def finalize(self):
coverdir=self.ns.coverdir)

print()
duration = time.monotonic() - self.start_time
print("Total duration: %s" % format_duration(duration))
print("Tests result: %s" % self.get_tests_result())
self.display_summary()

if self.ns.runleaks:
os.system("leaks %d" % os.getpid())

def display_summary(self):
duration = time.perf_counter() - self.start_time

# Total duration
print("Total duration: %s" % format_duration(duration))

# Total tests
total = TestStats()
for stats in self.stats_dict.values():
total.accumulate(stats)
stats = [f'run={total.tests_run:,}']
if total.failures:
stats.append(f'failures={total.failures:,}')
if total.skipped:
stats.append(f'skipped={total.skipped:,}')
print(f"Total tests: {' '.join(stats)}")

# Total test files
report = [f'success={len(self.good)}']
if self.bad:
report.append(f'failed={len(self.bad)}')
if self.environment_changed:
report.append(f'env_changed={len(self.environment_changed)}')
if self.skipped:
report.append(f'skipped={len(self.skipped)}')
if self.resource_denied:
report.append(f'resource_denied={len(self.resource_denied)}')
if self.rerun:
report.append(f'rerun={len(self.rerun)}')
if self.run_no_tests:
report.append(f'run_no_tests={len(self.run_no_tests)}')
print(f"Total test files: {' '.join(report)}")

# Result
result = self.get_tests_result()
print(f"Result: {result}")

def save_xml_result(self):
if not self.ns.xmlpath and not self.testsuite_xml:
return
@@ -782,11 +851,13 @@ def _main(self, tests, kwargs):
self.save_xml_result()

if self.bad:
sys.exit(2)
sys.exit(EXITCODE_BAD_TEST)
if self.interrupted:
sys.exit(130)
sys.exit(EXITCODE_INTERRUPTED)
if self.ns.fail_env_changed and self.environment_changed:
sys.exit(3)
sys.exit(EXITCODE_ENV_CHANGED)
if self.no_tests_run():
sys.exit(EXITCODE_NO_TESTS_RAN)
sys.exit(0)


5 changes: 3 additions & 2 deletions Lib/test/libregrtest/refleak.py
@@ -83,11 +83,12 @@ def get_pooled_int(value):
print(("1234567890"*(repcount//10 + 1))[:repcount], file=sys.stderr,
flush=True)

results = None
dash_R_cleanup(fs, ps, pic, zdc, abcs)
support.gc_collect()

for i in rep_range:
test_func()
results = test_func()

dash_R_cleanup(fs, ps, pic, zdc, abcs)
support.gc_collect()
@@ -146,7 +147,7 @@ def check_fd_deltas(deltas):
print(msg, file=refrep)
refrep.flush()
failed = True
return failed
return (failed, results)


def dash_R_cleanup(fs, ps, pic, zdc, abcs):
