[C API] Replace PyTuple_Pack(1,2) with PyTuple_Make[Single,Pair] to optimize creation of tuples #140052

@sergey-miryanov

Feature or enhancement

Proposal:

I ran the pyperformance benchmarks and found that tuples with one or two elements account for about 80% of all tuples created.

I checked the code and counted the call sites that create one- and two-element tuples:

Table with number of occurrences
| file | function | count |
|---|---|---|
| _asyncmodule.c | PyTuple_New(2) | 2 |
| _collectionsmodule.c | PyTuple_Pack(1) | 2 |
| _csv.c | PyTuple_Pack(1) | 1 |
| _datetimemodule.c | PyTuple_Pack(2) | 4 |
| | PyTuple_Pack(1) | 3 |
| _elementtree.c | PyTuple_Pack(2) | 4 |
| _functoolsmodule.c | PyTuple_New(2) | 2 |
| _interpretersmodule.c | PyTuple_Pack(2) | 1 |
| _json.c | PyTuple_New(2) | 1 |
| | PyTuple_Pack(2) | 1 |
| | PyTuple_Pack(1) | 1 |
| _operator.c | PyTuple_Pack(2) | 1 |
| _pickle.c | PyTuple_Pack(2) | 3 |
| | PyTuple_New(2) | 2 |
| | PyTuple_New(1) | 2 |
| _ssl.c | PyTuple_New(2) | 6 |
| | PyTuple_Pack(2) | 1 |
| _threadmodule.c | PyTuple_New(2) | 1 |
| _tkinter.c | PyTuple_Pack(1) | |
| arraymodule.c | PyTuple_New(2) | 2 |
| itertoolsmodule.c | PyTuple_Pack(2) | 2 |
| | PyTuple_New(2) | 1 |
| main.c | PyTuple_Pack(2) | 1 |
| overlapped.c | PyTuple_New(2) | 2 |
| posixmodule.c | PyTuple_Pack(2) | 1 |
| pyexpat.c | PyTuple_New(1) | 1 |
| selectmodule.c | PyTuple_Pack(2) | 1 |
| | PyTuple_New(2) | 1 |
| signal_module.c | PyTuple_New(2) | 1 |
| socket_module.c | PyTuple_Pack(2) | 3 |
| termios.c | PyTuple_New(2) | 2 |
| _ctypes.c | PyTuple_Pack(2) | 2 |
| stgdict.c | PyTuple_Pack(2) | 1 |
| decimal.c | PyTuple_Pack(2) | 7 |
| | PyTuple_Pack(1) | 2 |
| microprotocol.c | PyTuple_Pack(2) | 2 |
| _sre.c | PyTuple_New(2) | 1 |
| datetime.c | PyTuple_Pack(1) | 2 |
| | PyTuple_Pack(2) | 1 |
| getargs.c | PyTuple_Pack(1) | 1 |
| heaptype.c | PyTuple_Pack(2) | 1 |
| | PyTuple_Pack(1) | 2 |
| | PyTuple_New(2) | 1 |
| vectorcall_limited.c | PyTuple_New(1) | 2 |
| multibytecodec.c | PyTuple_New(2) | 1 |
| codeobject.c | PyTuple_Pack(2) | 7 |
| dictobject.c | PyTuple_Pack(2) | 2 |
| | PyTuple_New(2) | 4 |
| enumobject.c | PyTuple_Pack(2) | 1 |
| | PyTuple_New(2) | 2 |
| exceptions.c | PyTuple_Pack(2) | 7 |
| floatobject.c | PyTuple_Pack(2) | 1 |
| frameobject.c | PyTuple_Pack(2) | 2 |
| | PyTuple_Pack(1) | 1 |
| genericaliasobject.c | PyTuple_Pack(1) | 2 |
| listobject.c | PyTuple_Pack(2) | 1 |
| longobject.c | PyTuple_Pack(2) | 1 |
| | PyTuple_New(2) | 2 |
| odictobject.c | PyTuple_Pack(2) | 2 |
| | PyTuple_New(2) | 1 |
| setobject.c | PyTuple_Pack(1) | 1 |
| typeobject.c | PyTuple_Pack(2) | 2 |
| | PyTuple_Pack(1) | 5 |
| typevarobject.c | PyTuple_Pack(2) | 1 |
| | PyTuple_Pack(1) | 2 |
| unicode_format.h | PyTuple_Pack(2) | 2 |
| pegen_errors.c | PyTuple_Pack(2) | 2 |
| _warnings.c | PyTuple_Pack(2) | 1 |
| bltnmodule.c | PyTuple_Pack(2) | 1 |
| ceval.c | PyTuple_Pack(1) | 1 |
| _codegen.c | PyTuple_Pack(1) | 2 |
| compile.c | PyTuple_Pack(2) | 1 |
| crossinterp.c | PyTuple_Pack(1) | 1 |
| errors.c | PyTuple_Pack(1) | 1 |
| hamt.c | PyTuple_Pack(2) | 1 |
| marshal.c | PyTuple_Pack(2) | 1 |
| pylifecycle.c | PyTuple_Pack(2) | 1 |
| Python-tokenize.c | PyTuple_Pack(2) | 1 |
| sysmodule.c | PyTuple_Pack(1) | 1 |
| tracemalloc.c | PyTuple_New(2) | 1 |

I came up with the idea of adding PyTuple_MakeSingle and PyTuple_MakePair for such cases to improve performance.

Afterwards, @eendebakpt pointed me to a previous attempt at this (many thanks!) - #118222.

Anyway, I implemented these changes and ran benchmarks.

If we replace PyTuple_Pack(1,...) with PyTuple_MakeSingle and PyTuple_Pack(2,...) with PyTuple_MakePair, we get the following results (run on Ubuntu 24.04 x64, compiled with LTO):

Geometric mean - 1.00x faster
+--------------------------+----------+------------------------+
| Benchmark                | main     | opt                    |
+==========================+==========+========================+
| async_generators         | 277 ms   | 279 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_websockets       | 242 ms   | 241 ms: 1.00x faster   |
+--------------------------+----------+------------------------+
| chaos                    | 36.0 ms  | 36.6 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| comprehensions           | 10.3 us  | 10.1 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| bench_mp_pool            | 66.1 ms  | 43.0 ms: 1.54x faster  |
+--------------------------+----------+------------------------+
| coroutines               | 15.0 ms  | 14.5 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| coverage                 | 54.9 ms  | 56.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| crypto_pyaes             | 45.3 ms  | 46.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| deepcopy                 | 171 us   | 168 us: 1.02x faster   |
+--------------------------+----------+------------------------+
| deepcopy_reduce          | 1.87 us  | 1.84 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| deepcopy_memo            | 16.5 us  | 17.5 us: 1.06x slower  |
+--------------------------+----------+------------------------+
| deltablue                | 1.97 ms  | 1.99 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| django_template          | 23.1 ms  | 23.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| docutils                 | 1.70 sec | 1.68 sec: 1.01x faster |
+--------------------------+----------+------------------------+
| fannkuch                 | 245 ms   | 243 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| float                    | 42.4 ms  | 43.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| gc_traversal             | 2.93 ms  | 2.82 ms: 1.04x faster  |
+--------------------------+----------+------------------------+
| generators               | 19.1 ms  | 19.5 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| genshi_text              | 14.2 ms  | 14.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| genshi_xml               | 32.8 ms  | 33.1 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| go                       | 68.7 ms  | 70.2 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| hexiom                   | 3.64 ms  | 3.61 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| json_dumps               | 6.34 ms  | 6.26 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| json_loads               | 15.7 us  | 16.1 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| logging_silent           | 63.6 ns  | 59.4 ns: 1.07x faster  |
+--------------------------+----------+------------------------+
| logging_simple           | 3.49 us  | 3.52 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| mako                     | 7.00 ms  | 6.94 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| mdp                      | 788 ms   | 780 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| meteor_contest           | 68.0 ms  | 68.2 ms: 1.00x slower  |
+--------------------------+----------+------------------------+
| nbody                    | 55.4 ms  | 55.0 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| pickle_dict              | 18.5 us  | 18.8 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| pickle_list              | 2.93 us  | 2.98 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| pickle_pure_python       | 208 us   | 205 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| pidigits                 | 143 ms   | 143 ms: 1.00x slower   |
+--------------------------+----------+------------------------+
| pprint_safe_repr         | 489 ms   | 496 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| pprint_pformat           | 997 ms   | 1.01 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| pyflate                  | 259 ms   | 260 ms: 1.00x slower   |
+--------------------------+----------+------------------------+
| regex_compile            | 82.7 ms  | 83.5 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| regex_dna                | 115 ms   | 113 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| regex_v8                 | 14.7 ms  | 14.3 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| richards                 | 27.3 ms  | 27.1 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| richards_super           | 31.2 ms  | 31.1 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| scimark_fft              | 174 ms   | 178 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| scimark_lu               | 71.5 ms  | 69.3 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| scimark_monte_carlo      | 41.9 ms  | 41.2 ms: 1.02x faster  |
+--------------------------+----------+------------------------+
| scimark_sor              | 70.4 ms  | 72.0 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| scimark_sparse_mat_mult  | 2.67 ms  | 2.71 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| spectral_norm            | 59.2 ms  | 58.8 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| sqlglot_normalize        | 176 ms   | 179 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sqlglot_optimize         | 34.0 ms  | 34.1 ms: 1.00x slower  |
+--------------------------+----------+------------------------+
| sqlglot_parse            | 801 us   | 792 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| sqlglot_transpile        | 1.01 ms  | 994 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| sympy_expand             | 311 ms   | 306 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| sympy_sum                | 92.8 ms  | 92.4 ms: 1.00x faster  |
+--------------------------+----------+------------------------+
| sympy_str                | 177 ms   | 176 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| telco                    | 112 ms   | 111 ms: 1.00x faster   |
+--------------------------+----------+------------------------+
| tomli_loads              | 1.21 sec | 1.23 sec: 1.02x slower |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 107 us   | 109 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| unpack_sequence          | 25.2 ns  | 26.3 ns: 1.04x slower  |
+--------------------------+----------+------------------------+
| unpickle_list            | 2.93 us  | 3.01 us: 1.03x slower  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 137 us   | 138 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 62.9 ms  | 62.0 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| xml_etree_generate       | 54.7 ms  | 55.4 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| xml_etree_process        | 39.0 ms  | 40.3 ms: 1.03x slower  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.00x faster           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (19): 2to3, asyncio_tcp, asyncio_tcp_ssl, bench_thread_pool, dulwich_log, create_gc_cycles, html5lib, logging_format, nqueens, pathlib, pickle, python_startup, python_startup_no_site, raytrace, regex_effbot, sqlite_synth, sympy_integrate, unpickle, xml_etree_parse

I also plan to implement PyTuple_Make[Single,Pair]Steal and to replace the PyTuple_New(1) and PyTuple_New(2) call sites as well.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
