Description
Feature or enhancement
Proposal:
I ran the pyperformance benchmarks and found that tuples with one or two elements account for about 80% of all tuples created during the runs.

I then searched the CPython source for call sites that create one- and two-element tuples and counted them per file:
Table with number of occurrences

| file | function | count |
|---|---|---|
| _asyncmodule.c | PyTuple_New(2) | 2 |
| _collectionsmodule.c | PyTuple_Pack(1) | 2 |
| _csv.c | PyTuple_Pack(1) | 1 |
| _datetimemodule.c | PyTuple_Pack(2) | 4 |
| _datetimemodule.c | PyTuple_Pack(1) | 3 |
| _elementtree.c | PyTuple_Pack(2) | 4 |
| _functoolsmodule.c | PyTuple_New(2) | 2 |
| _interpretersmodule.c | PyTuple_Pack(2) | 1 |
| _json.c | PyTuple_New(2) | 1 |
| _json.c | PyTuple_Pack(2) | 1 |
| _json.c | PyTuple_Pack(1) | 1 |
| _operator.c | PyTuple_Pack(2) | 1 |
| _pickle.c | PyTuple_Pack(2) | 3 |
| _pickle.c | PyTuple_New(2) | 2 |
| _pickle.c | PyTuple_New(1) | 2 |
| _ssl.c | PyTuple_New(2) | 6 |
| _ssl.c | PyTuple_Pack(2) | 1 |
| _threadmodule.c | PyTuple_New(2) | 1 |
| _tkinter.c | PyTuple_Pack(1) | |
| arraymodule.c | PyTuple_New(2) | 2 |
| itertoolsmodule.c | PyTuple_Pack(2) | 2 |
| itertoolsmodule.c | PyTuple_New(2) | 1 |
| main.c | PyTuple_Pack(2) | 1 |
| overlapped.c | PyTuple_New(2) | 2 |
| posixmodule.c | PyTuple_Pack(2) | 1 |
| pyexpat.c | PyTuple_New(1) | 1 |
| selectmodule.c | PyTuple_Pack(2) | 1 |
| selectmodule.c | PyTuple_New(2) | 1 |
| signal_module.c | PyTuple_New(2) | 1 |
| socket_module.c | PyTuple_Pack(2) | 3 |
| termios.c | PyTuple_New(2) | 2 |
| _ctypes.c | PyTuple_Pack(2) | 2 |
| stgdict.c | PyTuple_Pack(2) | 1 |
| decimal.c | PyTuple_Pack(2) | 7 |
| decimal.c | PyTuple_Pack(1) | 2 |
| microprotocol.c | PyTuple_Pack(2) | 2 |
| _sre.c | PyTuple_New(2) | 1 |
| datetime.c | PyTuple_Pack(1) | 2 |
| datetime.c | PyTuple_Pack(2) | 1 |
| getargs.c | PyTuple_Pack(1) | 1 |
| heaptype.c | PyTuple_Pack(2) | 1 |
| heaptype.c | PyTuple_Pack(1) | 2 |
| heaptype.c | PyTuple_New(2) | 1 |
| vectorcall_limited.c | PyTuple_New(1) | 2 |
| multibytecodec.c | PyTuple_New(2) | 1 |
| codeobject.c | PyTuple_Pack(2) | 7 |
| dictobject.c | PyTuple_Pack(2) | 2 |
| dictobject.c | PyTuple_New(2) | 4 |
| enumobject.c | PyTuple_Pack(2) | 1 |
| enumobject.c | PyTuple_New(2) | 2 |
| exceptions.c | PyTuple_Pack(2) | 7 |
| floatobject.c | PyTuple_Pack(2) | 1 |
| frameobject.c | PyTuple_Pack(2) | 2 |
| frameobject.c | PyTuple_Pack(1) | 1 |
| genericaliasobject.c | PyTuple_Pack(1) | 2 |
| listobject.c | PyTuple_Pack(2) | 1 |
| longobject.c | PyTuple_Pack(2) | 1 |
| longobject.c | PyTuple_New(2) | 2 |
| odictobject.c | PyTuple_Pack(2) | 2 |
| odictobject.c | PyTuple_New(2) | 1 |
| setobject.c | PyTuple_Pack(1) | 1 |
| typeobject.c | PyTuple_Pack(2) | 2 |
| typeobject.c | PyTuple_Pack(1) | 5 |
| typevarobject.c | PyTuple_Pack(2) | 1 |
| typevarobject.c | PyTuple_Pack(1) | 2 |
| unicode_format.h | PyTuple_Pack(2) | 2 |
| pegen_errors.c | PyTuple_Pack(2) | 2 |
| _warnings.c | PyTuple_Pack(2) | 1 |
| bltnmodule.c | PyTuple_Pack(2) | 1 |
| ceval.c | PyTuple_Pack(1) | 1 |
| _codegen.c | PyTuple_Pack(1) | 2 |
| compile.c | PyTuple_Pack(2) | 1 |
| crossinterp.c | PyTuple_Pack(1) | 1 |
| errors.c | PyTuple_Pack(1) | 1 |
| hamt.c | PyTuple_Pack(2) | 1 |
| marshal.c | PyTuple_Pack(2) | 1 |
| pylifecycle.c | PyTuple_Pack(2) | 1 |
| Python-tokenize.c | PyTuple_Pack(2) | 1 |
| sysmodule.c | PyTuple_Pack(1) | 1 |
| tracemalloc.c | PyTuple_New(2) | 1 |
I came up with the idea of adding `PyTuple_MakeSingle` and `PyTuple_MakePair` for such cases to improve performance.
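To illustrate why a fixed-arity constructor can be cheaper than the generic `PyTuple_Pack`, here is a minimal toy model in plain C. `Obj`, `Tuple`, `tuple_pack` and `tuple_make_pair` are hypothetical stand-ins, not CPython's structs or the proposed API: the generic path reads its arguments from a `va_list` in a loop whose length is only known at run time, while the pair version stores both items directly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Toy stand-ins for PyObject / PyTupleObject (hypothetical, not CPython's). */
typedef struct { long refcnt; } Obj;
typedef struct { size_t size; Obj *items[]; } Tuple;

static Tuple *tuple_new(size_t n) {
    Tuple *t = malloc(sizeof(Tuple) + n * sizeof(Obj *));
    if (t != NULL) {
        t->size = n;
    }
    return t;
}

/* Generic path, shaped like PyTuple_Pack(n, ...): the arity is a runtime
   value, so items are fetched from a va_list one by one in a loop. */
static Tuple *tuple_pack(size_t n, ...) {
    Tuple *t = tuple_new(n);
    if (t == NULL) {
        return NULL;
    }
    va_list ap;
    va_start(ap, n);
    for (size_t i = 0; i < n; i++) {
        Obj *o = va_arg(ap, Obj *);
        o->refcnt++;              /* new reference, like Py_INCREF */
        t->items[i] = o;
    }
    va_end(ap);
    return t;
}

/* Hypothetical fixed-arity path in the spirit of PyTuple_MakePair:
   no va_list and no loop; the size 2 is a compile-time constant. */
static Tuple *tuple_make_pair(Obj *a, Obj *b) {
    assert(a != NULL && b != NULL);
    Tuple *t = tuple_new(2);
    if (t == NULL) {
        return NULL;
    }
    a->refcnt++;                  /* new references, like Py_INCREF */
    b->refcnt++;
    t->items[0] = a;
    t->items[1] = b;
    return t;
}
```

Both calls build the same two-item tuple and take one new reference per item; the sketch only shows the difference in call shape, not CPython's real allocation path.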
Afterwards, @eendebakpt pointed me to a previous attempt at this, #118222 (many thanks!).
I implemented these changes and ran the benchmarks. Replacing `PyTuple_Pack(1, ...)` with `PyTuple_MakeSingle` and `PyTuple_Pack(2, ...)` with `PyTuple_MakePair` gives the following results (Ubuntu 24.04 x64, built with LTO):
Geometric mean: 1.00x faster
+--------------------------+----------+------------------------+
| Benchmark | main | opt |
+==========================+==========+========================+
| async_generators | 277 ms | 279 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| asyncio_websockets | 242 ms | 241 ms: 1.00x faster |
+--------------------------+----------+------------------------+
| chaos | 36.0 ms | 36.6 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| comprehensions | 10.3 us | 10.1 us: 1.02x faster |
+--------------------------+----------+------------------------+
| bench_mp_pool | 66.1 ms | 43.0 ms: 1.54x faster |
+--------------------------+----------+------------------------+
| coroutines | 15.0 ms | 14.5 ms: 1.03x faster |
+--------------------------+----------+------------------------+
| coverage | 54.9 ms | 56.1 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| crypto_pyaes | 45.3 ms | 46.1 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| deepcopy | 171 us | 168 us: 1.02x faster |
+--------------------------+----------+------------------------+
| deepcopy_reduce | 1.87 us | 1.84 us: 1.02x faster |
+--------------------------+----------+------------------------+
| deepcopy_memo | 16.5 us | 17.5 us: 1.06x slower |
+--------------------------+----------+------------------------+
| deltablue | 1.97 ms | 1.99 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| django_template | 23.1 ms | 23.4 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| docutils | 1.70 sec | 1.68 sec: 1.01x faster |
+--------------------------+----------+------------------------+
| fannkuch | 245 ms | 243 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| float | 42.4 ms | 43.4 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| gc_traversal | 2.93 ms | 2.82 ms: 1.04x faster |
+--------------------------+----------+------------------------+
| generators | 19.1 ms | 19.5 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| genshi_text | 14.2 ms | 14.4 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| genshi_xml | 32.8 ms | 33.1 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| go | 68.7 ms | 70.2 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| hexiom | 3.64 ms | 3.61 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| json_dumps | 6.34 ms | 6.26 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| json_loads | 15.7 us | 16.1 us: 1.02x slower |
+--------------------------+----------+------------------------+
| logging_silent | 63.6 ns | 59.4 ns: 1.07x faster |
+--------------------------+----------+------------------------+
| logging_simple | 3.49 us | 3.52 us: 1.01x slower |
+--------------------------+----------+------------------------+
| mako | 7.00 ms | 6.94 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| mdp | 788 ms | 780 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| meteor_contest | 68.0 ms | 68.2 ms: 1.00x slower |
+--------------------------+----------+------------------------+
| nbody | 55.4 ms | 55.0 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| pickle_dict | 18.5 us | 18.8 us: 1.02x slower |
+--------------------------+----------+------------------------+
| pickle_list | 2.93 us | 2.98 us: 1.02x slower |
+--------------------------+----------+------------------------+
| pickle_pure_python | 208 us | 205 us: 1.01x faster |
+--------------------------+----------+------------------------+
| pidigits | 143 ms | 143 ms: 1.00x slower |
+--------------------------+----------+------------------------+
| pprint_safe_repr | 489 ms | 496 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| pprint_pformat | 997 ms | 1.01 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| pyflate | 259 ms | 260 ms: 1.00x slower |
+--------------------------+----------+------------------------+
| regex_compile | 82.7 ms | 83.5 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| regex_dna | 115 ms | 113 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| regex_v8 | 14.7 ms | 14.3 ms: 1.03x faster |
+--------------------------+----------+------------------------+
| richards | 27.3 ms | 27.1 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| richards_super | 31.2 ms | 31.1 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| scimark_fft | 174 ms | 178 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| scimark_lu | 71.5 ms | 69.3 ms: 1.03x faster |
+--------------------------+----------+------------------------+
| scimark_monte_carlo | 41.9 ms | 41.2 ms: 1.02x faster |
+--------------------------+----------+------------------------+
| scimark_sor | 70.4 ms | 72.0 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| scimark_sparse_mat_mult | 2.67 ms | 2.71 ms: 1.02x slower |
+--------------------------+----------+------------------------+
| spectral_norm | 59.2 ms | 58.8 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| sqlglot_normalize | 176 ms | 179 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| sqlglot_optimize | 34.0 ms | 34.1 ms: 1.00x slower |
+--------------------------+----------+------------------------+
| sqlglot_parse | 801 us | 792 us: 1.01x faster |
+--------------------------+----------+------------------------+
| sqlglot_transpile | 1.01 ms | 994 us: 1.01x faster |
+--------------------------+----------+------------------------+
| sympy_expand | 311 ms | 306 ms: 1.02x faster |
+--------------------------+----------+------------------------+
| sympy_sum | 92.8 ms | 92.4 ms: 1.00x faster |
+--------------------------+----------+------------------------+
| sympy_str | 177 ms | 176 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| telco | 112 ms | 111 ms: 1.00x faster |
+--------------------------+----------+------------------------+
| tomli_loads | 1.21 sec | 1.23 sec: 1.02x slower |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 107 us | 109 us: 1.01x slower |
+--------------------------+----------+------------------------+
| unpack_sequence | 25.2 ns | 26.3 ns: 1.04x slower |
+--------------------------+----------+------------------------+
| unpickle_list | 2.93 us | 3.01 us: 1.03x slower |
+--------------------------+----------+------------------------+
| unpickle_pure_python | 137 us | 138 us: 1.01x slower |
+--------------------------+----------+------------------------+
| xml_etree_iterparse | 62.9 ms | 62.0 ms: 1.01x faster |
+--------------------------+----------+------------------------+
| xml_etree_generate | 54.7 ms | 55.4 ms: 1.01x slower |
+--------------------------+----------+------------------------+
| xml_etree_process | 39.0 ms | 40.3 ms: 1.03x slower |
+--------------------------+----------+------------------------+
| Geometric mean | (ref) | 1.00x faster |
+--------------------------+----------+------------------------+
Benchmark hidden because not significant (19): 2to3, asyncio_tcp, asyncio_tcp_ssl, bench_thread_pool, dulwich_log, create_gc_cycles, html5lib, logging_format, nqueens, pathlib, pickle, python_startup, python_startup_no_site, raytrace, regex_effbot, sqlite_synth, sympy_integrate, unpickle, xml_etree_parse
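The "Geometric mean" row condenses all per-benchmark ratios into one number. As a minimal sketch (not pyperf's code, and without assuming which benchmarks pyperf includes in the mean), the geometric mean of the opt/main time ratios is the n-th root of their product. `nth_root` and `geometric_mean_ratio` are hypothetical helpers; the root is taken with Newton's method only so that the example does not need to link libm.

```c
#include <assert.h>
#include <stddef.h>

/* Positive n-th root of p via Newton's method on f(x) = x^n - p.
   Used instead of pow()/exp(log()) so the sketch needs no libm. */
static double nth_root(double p, size_t n) {
    assert(p > 0.0 && n > 0);
    double x = 1.0;
    for (int iter = 0; iter < 100; iter++) {
        double x_pow = 1.0;                 /* x^(n-1) */
        for (size_t k = 1; k < n; k++) {
            x_pow *= x;
        }
        x = ((double)(n - 1) * x + p / x_pow) / (double)n;
    }
    return x;
}

/* Geometric mean of opt/main time ratios: a value below 1.0 means the
   optimized build is faster on average; 1.0 / result is the "Nx faster". */
static double geometric_mean_ratio(const double *main_t, const double *opt_t,
                                   size_t n) {
    double prod = 1.0;
    for (size_t i = 0; i < n; i++) {
        prod *= opt_t[i] / main_t[i];
    }
    return nth_root(prod, n);
}
```

Fed with only three rows from the results above (bench_mp_pool, deepcopy_memo, chaos), this gives about 0.89, because the single 1.54x result dominates such a small sample; over the full benchmark set the reported mean is 1.00x.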
I also plan to implement `PyTuple_MakeSingleSteal` and `PyTuple_MakePairSteal`, and to replace the `PyTuple_New(1)` and `PyTuple_New(2)` call sites as well.
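The planned `Steal` variants would follow CPython's usual "steals a reference" convention: the function takes ownership of the references passed in instead of incrementing them, which suits call sites that typically do `PyTuple_New(2)` followed by `PyTuple_SET_ITEM`. Below is a minimal toy sketch with hypothetical `Obj`/`Tuple` types and a hypothetical `tuple_make_pair_steal`; releasing the arguments on allocation failure is an assumed convention, not something this issue specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for PyObject / PyTupleObject (hypothetical, not CPython's). */
typedef struct { long refcnt; } Obj;
typedef struct { size_t size; Obj *items[]; } Tuple;

/* Toy decref: only decrements, never deallocates. */
static void obj_decref(Obj *o) {
    if (o != NULL) {
        o->refcnt--;
    }
}

/* Hypothetical steal variant: ownership of a and b moves into the tuple,
   so no increment happens. ASSUMPTION: on allocation failure the stolen
   references are released, so the caller never has to clean them up. */
static Tuple *tuple_make_pair_steal(Obj *a, Obj *b) {
    Tuple *t = malloc(sizeof(Tuple) + 2 * sizeof(Obj *));
    if (t == NULL) {
        obj_decref(a);
        obj_decref(b);
        return NULL;
    }
    t->size = 2;
    t->items[0] = a;
    t->items[1] = b;
    return t;
}

/* Destroying the tuple drops the references it owns. */
static void tuple_free(Tuple *t) {
    assert(t != NULL);
    for (size_t i = 0; i < t->size; i++) {
        obj_decref(t->items[i]);
    }
    free(t);
}
```

Compared with the non-stealing pair constructor, the caller hands over references it already owns, which avoids an increment now and a matching decrement later.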
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response