perf: fix enum performance by bonjourmauko · Pull Request #1307 · openfisca/openfisca-core

bonjourmauko · 2024-11-21T02:53:54Z

Performance

Fix enum's module performance issues
- 43.0.0 fixed impending bugs in indexed_enums and improved EnumArray
  performance
- However, Enun.__eq__ and Enum.encode suffered from performance
  degradation on large datasets
- This changeset aims at correcting these while keeping the bugfixes provided
  by the aforesaid published version

Note to reviewers

Some of the spectacular performances of Enum.encode came from the fact that
it didn't actually work, leaving buggy behaviour unseen (see for example
openfisca/openfisca-france@84e41a5).

This PR introduces O(n) and O(1) use of fancy indexing, vector masking, and
numpy.searchsorted, that scales nicely with large datasets (10k+).

However, as we need to validate data at enum encoding time, the encoding of
int and str sequences can't be faster than the pre-43.0.0 just because
data has to be copied over.

If ever this becomes problematic for very large datasets (50M+), we can workout
a feature flag to disable fancy indexing and trusting data has been properly
validated priorly by the user disabling run-time data validation, and so to
gain from the performance of using a memory view instead of copying data over
(that is, not using neither fancy indexing nor binary search).

However, it seems the least surprising for every user that the data be
validated before encoding (out of bounds indices and wrong str values not
present in an Enum).

Benchmarks

Against `42.0.0`

------------------------------------------------------------------------------------ benchmark 'Enum.__eq__': 2 tests ------------------------------------------------------------------------------------
Name (time in us)                            Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_eq (0001_e8b0acf)     6.1164 (1.0)      6.2392 (1.0)      6.1712 (1.0)      0.0236 (1.0)      6.1702 (1.0)      0.0284 (1.13)         30;2      162.0429 (1.0)         100       10000
test_benchmark_enum_eq (NOW)              6.1806 (1.01)     6.3476 (1.02)     6.2902 (1.02)     0.0286 (1.21)     6.2958 (1.02)     0.0252 (1.0)          14;7      158.9779 (0.98)        100       10000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------- benchmark 'Enum.encode (Enum)': 2 tests -----------------------------------------------------------------------------------
Name (time in ms)                                     Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_enum (NOW)              1.8063 (1.0)      2.0201 (1.0)      1.8426 (1.0)      0.0409 (1.0)      1.8268 (1.0)      0.0383 (1.0)          13;7  542.7000 (1.0)         100          10
test_benchmark_enum_encode_enum (0001_e8b0acf)     1.9466 (1.08)     2.4946 (1.23)     2.0038 (1.09)     0.0812 (1.98)     1.9726 (1.08)     0.0592 (1.55)        15;11  499.0517 (0.92)        100          10
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------- benchmark 'Enum.encode (int)': 2 tests --------------------------------------------------------------------------------------------------
Name (time in ns)                                         Min                    Max                   Mean                StdDev                 Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_int (0001_e8b0acf)        600.0000 (1.0)       4,333.4001 (1.0)         660.3360 (1.0)        371.1875 (1.0)         620.9000 (1.0)       12.4500 (1.0)           1;3    1,514.3805 (1.0)         100          10
test_benchmark_enum_encode_int (NOW)              36,487.4999 (60.81)    51,941.5989 (11.99)    36,839.4600 (55.79)    1,554.0287 (4.19)     36,554.2001 (58.87)    229.0995 (18.40)        1;12       27.1448 (0.02)        100          10
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------- benchmark 'Enum.encode (str)': 2 tests -----------------------------------------------------------------------------------
Name (time in ms)                                    Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_str (0001_e8b0acf)     1.1514 (1.0)      5.4571 (2.05)     1.3338 (1.0)      0.5570 (4.93)     1.1804 (1.0)      0.0383 (1.15)         6;16  749.7253 (1.0)         100          10
test_benchmark_enum_encode_str (NOW)              1.7690 (1.54)     2.6589 (1.0)      1.8262 (1.37)     0.1130 (1.0)      1.7950 (1.52)     0.0332 (1.0)           7;9  547.5705 (0.73)        100          10
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------ benchmark 'EnumArray.__eq__': 2 tests -------------------------------------------------------------------------------------
Name (time in us)                                  Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_eq (NOW)              6.4008 (1.0)      7.0462 (1.0)      6.4869 (1.0)      0.0953 (1.0)      6.4546 (1.0)      0.0511 (1.05)        12;15      154.1559 (1.0)         100         100
test_benchmark_enum_array_eq (0001_e8b0acf)     8.8654 (1.39)     9.3842 (1.33)     8.9354 (1.38)     0.1009 (1.06)     8.8969 (1.38)     0.0487 (1.0)         10;15      111.9138 (0.73)        100         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------- benchmark 'EnumArray.decode': 2 tests --------------------------------------------------------------------------------------------
Name (time in us)                                        Min                   Max                Mean             StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode (NOW)              241.8246 (1.0)        254.4829 (1.0)      243.7532 (1.0)       2.0537 (1.0)      243.1275 (1.0)      1.4817 (1.0)           9;8        4.1025 (1.0)         100         100
test_benchmark_enum_array_decode (0001_e8b0acf)     752.3354 (3.11)     1,501.3621 (5.90)     765.7738 (3.14)     74.5790 (36.31)    757.6515 (3.12)     4.3040 (2.90)          1;3        1.3059 (0.32)        100         100
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'EnumArray.decode_to_str': 2 tests ------------------------------------------------------------------------------------------
Name (time in us)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode_to_str (NOW)              313.1738 (1.0)      321.3071 (1.0)      317.8362 (1.0)      0.9421 (1.0)      317.7827 (1.0)      0.9496 (1.0)          12;5        3.1463 (1.0)         100         100
test_benchmark_enum_array_decode_to_str (0001_e8b0acf)     861.6308 (2.75)     916.3904 (2.85)     871.6658 (2.74)     9.9436 (10.56)    868.3410 (2.73)     6.7525 (7.11)        12;11        1.1472 (0.36)        100         100
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Against `43.0.0`

------------------------------------------------------------------------------------ benchmark 'Enum.__eq__': 2 tests ------------------------------------------------------------------------------------
Name (time in us)                            Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_eq (NOW)              6.2050 (1.0)      7.1716 (1.02)     6.3081 (1.0)      0.1370 (2.70)     6.2640 (1.0)      0.0587 (1.0)         10;14      158.5255 (1.0)         100       10000
test_benchmark_enum_eq (0002_d95a1ce)     6.8115 (1.10)     7.0431 (1.0)      6.9327 (1.10)     0.0507 (1.0)      6.9421 (1.11)     0.0646 (1.10)         28;0      144.2448 (0.91)        100       10000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------- benchmark 'Enum.encode (Enum)': 2 tests -----------------------------------------------------------------------------------
Name (time in ms)                                     Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_enum (NOW)              1.7899 (1.0)      1.8529 (1.01)     1.8080 (1.0)      0.0109 (1.04)     1.8053 (1.0)      0.0165 (1.37)         25;1  553.0894 (1.0)         100          10
test_benchmark_enum_encode_enum (0002_d95a1ce)     1.7940 (1.00)     1.8348 (1.0)      1.8099 (1.00)     0.0105 (1.0)      1.8086 (1.00)     0.0120 (1.0)          40;1  552.5250 (1.00)        100          10
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------ benchmark 'Enum.encode (int)': 2 tests ------------------------------------------------------------------------------------------------
Name (time in us)                                         Min                    Max                   Mean              StdDev                 Median                IQR            Outliers          OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_int (NOW)                  36.4666 (1.0)          72.2875 (1.0)          37.3550 (1.0)        3.8081 (1.0)          36.5750 (1.0)       0.6438 (1.0)           2;5  26,770.1773 (1.0)         100          10
test_benchmark_enum_encode_int (0002_d95a1ce)     12,735.8166 (349.25)   13,852.8666 (191.64)   12,821.3985 (343.23)   180.2659 (47.34)    12,763.6396 (348.97)   37.4478 (58.17)        8;15      77.9946 (0.00)        100          10
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------- benchmark 'Enum.encode (str)': 2 tests -------------------------------------------------------------------------------------
Name (time in ms)                                     Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_str (NOW)               1.7953 (1.0)       2.0568 (1.0)       1.8196 (1.0)      0.0292 (1.0)       1.8132 (1.0)      0.0118 (1.0)           3;9  549.5770 (1.0)         100          10
test_benchmark_enum_encode_str (0002_d95a1ce)     21.2889 (11.86)    21.9644 (10.68)    21.4329 (11.78)    0.1193 (4.08)     21.4100 (11.81)    0.1202 (10.14)        18;6   46.6572 (0.08)        100          10
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------ benchmark 'EnumArray.__eq__': 2 tests -------------------------------------------------------------------------------------
Name (time in us)                                  Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_eq (0002_d95a1ce)     6.3300 (1.0)      6.9117 (1.0)      6.4014 (1.0)      0.0804 (1.0)      6.3788 (1.0)      0.0373 (1.0)         10;12      156.2161 (1.0)         100         100
test_benchmark_enum_array_eq (NOW)              6.3796 (1.01)     7.7629 (1.12)     6.4775 (1.01)     0.1607 (2.00)     6.4352 (1.01)     0.0594 (1.59)         6;14      154.3805 (0.99)        100         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------ benchmark 'EnumArray.decode': 2 tests -------------------------------------------------------------------------------------------
Name (time in us)                                        Min                 Max                Mean            StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode (NOW)              239.6783 (1.0)      252.8412 (1.0)      241.4663 (1.00)     1.4732 (1.0)      241.2848 (1.00)     0.7644 (1.0)          12;8        4.1414 (1.00)        100         100
test_benchmark_enum_array_decode (0002_d95a1ce)     240.0513 (1.00)     259.0704 (1.02)     241.1311 (1.0)      1.9358 (1.31)     240.7379 (1.0)      0.9602 (1.26)          1;4        4.1471 (1.0)         100         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'EnumArray.decode_to_str': 2 tests -------------------------------------------------------------------------------------------
Name (time in us)                                               Min                 Max                Mean             StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode_to_str (NOW)              316.5517 (1.0)      473.0125 (1.46)     322.0825 (1.01)     17.4004 (21.00)    317.8633 (1.0)      4.1794 (5.62)          4;6        3.1048 (0.99)        100         100
test_benchmark_enum_array_decode_to_str (0002_d95a1ce)     318.1996 (1.01)     323.5029 (1.0)      319.1073 (1.0)       0.8285 (1.0)      318.9919 (1.00)     0.7440 (1.0)          10;7        3.1337 (1.0)         100         100
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

bonjourmauko · 2024-11-21T06:49:51Z

BTW, when assuming data is correct, performance could be also improved with numba and the like.

clallemand

I've tested this fix on one of our (LexImpact) simulations and it seems to works !

bonjourmauko requested review from a team, benoit-cty and clallemand November 21, 2024 02:58

bonjourmauko added the kind:perf A code change that improves performance label Nov 21, 2024

clallemand approved these changes Nov 21, 2024

View reviewed changes

bonjourmauko added 2 commits November 26, 2024 10:42

perf: fix enum performance (#1306)

9488f30

chore: version bump (fixes #1306)

bc092d8

bonjourmauko force-pushed the perf/fix-enum-perf branch from f7298f3 to bc092d8 Compare November 26, 2024 09:42

bonjourmauko merged commit 435ee78 into master Nov 26, 2024

bonjourmauko deleted the perf/fix-enum-perf branch November 26, 2024 09:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: fix enum performance#1307

perf: fix enum performance#1307
bonjourmauko merged 2 commits intomasterfrom
perf/fix-enum-perf

bonjourmauko commented Nov 21, 2024 •

edited

Loading

Uh oh!

bonjourmauko commented Nov 21, 2024

Uh oh!

clallemand left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bonjourmauko commented Nov 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance

Note to reviewers

Benchmarks

Against 42.0.0

Against 43.0.0

Uh oh!

bonjourmauko commented Nov 21, 2024

Uh oh!

clallemand left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bonjourmauko commented Nov 21, 2024 •

edited

Loading

Against `42.0.0`

Against `43.0.0`