Faster startup -- Experiment B -- streamline and tweak #153
Replies: 62 comments
-
FWIW, in the 100 most popular PyPI packages, these were the 50 most common constants:
-
Crazy idea: combine these two instructions. It could do something like:

```c
switch (oparg) {
    case 0: value = Py_None; break;
    case 1: value = Py_False; break;
    // Several more cases
    default:
        value = PyLong_FromLong(oparg - 50);
        PUSH(value);
        DISPATCH();
}
Py_INCREF(value);
PUSH(value);
DISPATCH();
```

The idea would be that oparg values in the range 10 to 255 translate to -40 to 205 -- that should cover the most common integers. (Or we could use a different bias, e.g. only go down to -5, since cached small ints cover the range -5 to 256.) Then again, a dedicated MAKE_INT instruction would be faster. (We could still support negative values by using a bias value.)
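For concreteness, a minimal sketch of what a dedicated MAKE_INT handler could look like, assuming a bias of 5 so oparg 0..255 covers -5..250; the TARGET/PUSH/DISPATCH macros follow ceval.c conventions, and the bias constant is illustrative, not from the original:

```c
/* Hypothetical MAKE_INT: materialize an int directly from the oparg.
   MAKE_INT_BIAS is an assumed constant; with a bias of 5, oparg
   0..255 covers the integers -5..250, matching the small-int cache
   on the negative side. */
#define MAKE_INT_BIAS 5

case TARGET(MAKE_INT): {
    /* PyLong_FromLong returns a new reference; for values in the
       small-int range it returns a cached object, so this is cheap. */
    PyObject *value = PyLong_FromLong((long)oparg - MAKE_INT_BIAS);
    if (value == NULL) {
        goto error;
    }
    PUSH(value);
    DISPATCH();
}
```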
-
I missed empty bytes (`b''`).
-
I started working on MAKE_INT here: iritkatriel/cpython#26. I think this opcode might be more appropriately named LOAD_SMALL_INT. Initial perf numbers show it to be slightly slower than LOAD_CONST (I'll post something here soon). It might help if I create a function that takes an index into the small-int cache and does the lookup directly. Right now we translate the oparg (which is the index) to the value, and then the lookup function transforms the value back into the index.
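To illustrate the round trip (a sketch with assumed names, not the actual branch): the oparg is already an index into the small-int cache, but the code first turns it into a value, and the lookup then turns the value straight back into an index:

```c
/* Sketch only; small_ints and NSMALLNEGINTS are stand-ins for the
   small-int cache internals in longobject.c. */
#define NSMALLNEGINTS 5
extern PyObject *small_ints[];  /* assumed cache array, -5 .. 256 */

long value = (long)oparg - NSMALLNEGINTS;            /* index -> value */
PyObject *obj = small_ints[value + NSMALLNEGINTS];   /* value -> index again */
```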
-
New:
Old:
Sanity check: New:
Old:
same dis as above.
-
I'm not getting very stable results from pyperformance, but here's a table anyway.
-
So I did the optimisation I mentioned above, and now I get results comparable to LOAD_CONST. The change is a helper that takes the index into the small-int cache and looks the value up directly, plus a matching change in ceval.c (a sketch of the shape follows).
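A hedged sketch of what that could look like; the helper name _PyLong_FromSmallIntIndex, the cache array, and the opcode name LOAD_SMALL_INT are stand-ins, not the actual diff:

```c
/* longobject.c -- hypothetical helper: return the cached small int
   at a given cache index, skipping the index -> value -> index
   round trip. */
PyObject *
_PyLong_FromSmallIntIndex(int index)
{
    PyObject *v = small_ints[index];   /* assumed cache array */
    Py_INCREF(v);
    return v;
}

/* ceval.c -- the handler then uses the oparg as the cache index. */
case TARGET(LOAD_SMALL_INT): {
    PyObject *value = _PyLong_FromSmallIntIndex(oparg);
    PUSH(value);
    DISPATCH();
}
```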
-
Cool! So the compiler applies the bias? That makes the bytecode dependent on its value -- it would be easy to miss updating the magic number if the bias changes. I also wonder whether it would be faster or slower if instead we just hard-coded a few popular values (e.g. -1 .. 3) in LOAD_COMMON_CONSTANT?
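A sketch of that alternative; the value set (None/False/True plus the ints -1..3) and their oparg layout are assumptions, mirroring the suggestion above:

```c
/* Hypothetical LOAD_COMMON_CONSTANT: oparg selects from a fixed,
   hard-coded set of popular constants instead of applying a bias. */
case TARGET(LOAD_COMMON_CONSTANT): {
    PyObject *value;
    switch (oparg) {
        case 0: value = Py_None;  break;
        case 1: value = Py_False; break;
        case 2: value = Py_True;  break;
        default:
            /* opargs 3..7 map to the ints -1..3 (an assumed layout).
               PyLong_FromLong returns a new reference, so push and
               dispatch without the extra INCREF below. */
            value = PyLong_FromLong((long)oparg - 4);
            PUSH(value);
            DISPATCH();
    }
    Py_INCREF(value);  /* the singletons above are borrowed */
    PUSH(value);
    DISPATCH();
}
```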
-
I think that might be tidier. There is too much Mark-style breaking of the rules in what I did.
-
Here's a rewrite with one common-consts array for ints and other types (a sketch of the idea follows). For now I only populate the ints; next I will uncomment the other types (and fix the test).
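A hedged sketch of what such a unified table could look like; the array name, sizes, and initialization order are assumptions, not the actual rewrite:

```c
/* Hypothetical unified table: the singletons first, then the cached
   small ints. The oparg of LOAD_COMMON_CONSTANT is an index here. */
#define NCOMMON_SINGLETONS 3              /* None, False, True */
#define NCOMMON_INTS 262                  /* -5 .. 256 */
static PyObject *common_consts[NCOMMON_SINGLETONS + NCOMMON_INTS];

static void
init_common_consts(void)
{
    common_consts[0] = Py_None;
    common_consts[1] = Py_False;
    common_consts[2] = Py_True;
    for (int i = 0; i < NCOMMON_INTS; i++) {
        /* PyLong_FromLong returns the cached small int with a new
           reference, which the table keeps. */
        common_consts[NCOMMON_SINGLETONS + i] = PyLong_FromLong(i - 5);
    }
    /* Other types (empty str, empty bytes, empty tuple, ...) would
       be appended here once uncommented. */
}
```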
-
I think we need to check the common consts again.
-
Uh, I found it. It is used for function defaults.
-
I think we should only use [...]. How many of the constants in #65 (comment) are in run-once code that would be better implemented as bytecode? The reason I ask is that creating objects in the interpreter is likely to be faster than in the marshal module. Small integer constants and single-character constants might be better implemented with their own instructions (a sketch of the idea follows).
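For the single-character case, a sketch of what a dedicated instruction could do; the name MAKE_CHAR is hypothetical, and PyUnicode_FromOrdinal is used because CPython caches 1-character Latin-1 strings, so this normally just bumps a refcount on a shared object:

```c
/* Hypothetical MAKE_CHAR: build a 1-character str from the oparg
   (the character's ordinal, 0..255). */
case TARGET(MAKE_CHAR): {
    PyObject *value = PyUnicode_FromOrdinal(oparg);
    if (value == NULL) {
        goto error;
    }
    PUSH(value);
    DISPATCH();
}
```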
-
The docstring would also be on the function object, so you could access it from there. Arguably it's a property of the function and not of the function's code object. Or am I missing something?
-
I am thinking of a situation (say, scanning PYC files) where you don't want to execute any code. Without execution, you have no function objects. So the docstring as specified in the source has to be somewhere in some code object. Function objects exist primarily to store all the dynamic state wrapping code objects (e.g. __defaults__, __kwdefaults__, __closure__, __globals__). There are also a bunch of "metadata" attributes that are set on the function from the code object (e.g. __name__, __doc__).
-
I added total size of co_consts to the script: faster-cpython/tools#5. Here are some results. Total co_consts size goes from 735,736 to 718,616, so about 2.3% smaller. It will get even better if/when we move None docstrings out of co_consts. Old:
With LOAD_NONE:
-
Here are the top 30 opcode pairs with LOAD_NONE. Note: LOAD_NONE --> RETURN_VALUE 7,439 2.52%
-
While I do think we should do RETURN_NONE, I wouldn't expect the world of it -- dynamically, there can be only one RETURN_NONE instruction executed per call (by definition :-).
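For reference, a sketch of RETURN_NONE as a fusion of LOAD_NONE + RETURN_VALUE; the `exit_returning` label stands in for whatever the eval loop's actual return path is, which is an assumption here:

```c
/* Hypothetical RETURN_NONE: return None without a stack push/pop. */
case TARGET(RETURN_NONE): {
    retval = Py_None;
    Py_INCREF(retval);
    goto exit_returning;  /* assumed return-path label */
}
```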
-
With this PR we got: Total co_consts size goes from 48,822 to 46,682, about 4.3% less. If I add this change: iritkatriel/cpython#30 |
-
With only the docstring change (but no LOAD_NONE), it's 48,302.
-
Either there's a typo or that last conclusion is wrong (48,302 is hardly less than 48,822)?
-
Sorry, I misread the numbers, you're right. So it's the other way around -- adding the docstring change to LOAD_NONE helps a lot, but the docstring change alone not so much.
-
So there are 520 code objects that have a None at position 0 that's only used to indicate there's no docstring (these code objects don't load None on the stack at all). And there are 2,140 code objects that have a None at another position used only to load None (we can deduce that these code objects have a docstring at position 0). But 7,944 code objects have a None at position 0 that is used both to load None and to indicate there's no docstring. So we should do both things if we care about the size of co_consts.

Counter to this, we don't measure a real speedup, we save only 64 kB if all those packages you counted are imported in one process, and the marshal encoding of None is a single byte ('N').

I say we give up. :-( It's too bad, I had high hopes for each of these until we did the work.
-
Yes, it doesn't seem to be worth another opcode. Is there a chance that a MAKE_INT opcode will do anything?
-
I expect it's the same story. The common small ints are already cached, so LOAD_CONST is more efficient than anything else. We'd save much less because there are just fewer of these than None loads (6.27% of the top 25 constants are ints), although the marshal size is larger (5 bytes, since there's no "short int" type in marshal), and it probably wouldn't matter.
-
One more thing I haven't tried yet is fusing LOAD_CONST(None) + STORE_FAST (a sketch follows).
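A sketch of what that fused instruction could look like; the name STORE_NONE_FAST is hypothetical, while SETLOCAL follows ceval.c conventions:

```c
/* Hypothetical STORE_NONE_FAST: store None directly into local slot
   `oparg`, skipping the push/pop through the value stack. */
case TARGET(STORE_NONE_FAST): {
    PyObject *value = Py_None;
    Py_INCREF(value);
    SETLOCAL(oparg, value);   /* decrefs the old value in the slot */
    DISPATCH();
}
```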
-
But if I read that correctly, it's only ~0.2% of opcode pairs that get improved. Maybe we should create a framework for measuring dynamic opcode pairs, using e.g. a benchmark suite like PyPerformance as a basis.
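A minimal sketch of such instrumentation, counting consecutive (previous, current) opcode pairs in the dispatch loop; this mirrors the idea behind CPython's old DYNAMIC_EXECUTION_PROFILE/DXPAIRS build option, but the names here are illustrative:

```c
/* Sketch: count dynamic opcode pairs as the eval loop dispatches.
   256*256 counters cover all (previous, current) combinations. */
static unsigned long long pair_counts[256][256];
static int prev_opcode = 0;

/* In the dispatch path, before executing each instruction: */
pair_counts[prev_opcode][opcode]++;
prev_opcode = opcode;

/* A report would then sort the nonzero counters and print the top
   pairs, much like the static pair table quoted above. */
```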
-
There are still some ideas here that we might do, so I'm moving it to Discussions.
-
This is a placeholder for a description of Experiment B (see #32 (comment)):

- try to speed up unmarshalling of code objects by low-level optimizations
- streamline code objects (fewer redundant, computed fields; fewer small objects)
- opcodes for immediate values (MAKE_INT, LOAD_NONE, LOAD_COMMON_CONSTANT)

The benefit of MAKE_INT and LOAD_COMMON_CONSTANT is that they reduce the size of co_consts (and hence of PYC files) without adding to the size of the instruction array. The downside is that the speed and simplicity of LOAD_CONST is hard to beat, so it will be hard to measure the difference in benchmarks. But None, False, True and small ints make up a lot of the constants (we can easily do research on this).