Commit 880c551
re2: hoist a few loads out of BitState ShouldVisit
Caching the fields from prog_ in the outer loop instead
of reloading them inside each call to ShouldVisit makes
the fast search path of BitState noticeably faster: up to
about 28% on s7 and 12% on mac for Search_Success1.
Thanks to @nafi3000 for the idea and the initial patch.
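
For readers unfamiliar with the pattern, here is a minimal sketch of hoisting loop-invariant loads out of a per-call helper. It is not the actual RE2 change: ToyProg, ToyState, TestAndSet, the two ShouldVisit variants and the Count functions are hypothetical names invented for illustration, and the visited-bitmap layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a compiled program; not RE2's real Prog class.
struct ToyProg {
  std::vector<int> list_heads;  // instruction id -> list index
};

// Hypothetical stand-in for the matcher state.
struct ToyState {
  const ToyProg* prog;
  std::size_t text_size;
  std::vector<std::uint64_t> visited;  // one bit per (list, position) pair
};

// Builds a state with a zeroed bitmap of num_lists * (text_size + 1) bits.
ToyState MakeState(const ToyProg* prog, std::size_t text_size,
                   std::size_t num_lists) {
  std::size_t words = (num_lists * (text_size + 1) + 63) / 64;
  return ToyState{prog, text_size, std::vector<std::uint64_t>(words, 0)};
}

// Sets bit n; returns true only the first time n is seen.
inline bool TestAndSet(std::vector<std::uint64_t>& bits, std::size_t n) {
  const std::uint64_t mask = std::uint64_t{1} << (n & 63);
  if (bits[n / 64] & mask) return false;
  bits[n / 64] |= mask;
  return true;
}

// Reloading variant: every call goes through s.prog and re-reads s.text_size.
inline bool ShouldVisitReload(ToyState& s, int id, std::size_t pos) {
  const std::size_t list = static_cast<std::size_t>(s.prog->list_heads[id]);
  return TestAndSet(s.visited, list * (s.text_size + 1) + pos);
}

// Hoisted variant: the caller passes in values it loaded once.
inline bool ShouldVisitHoisted(std::vector<std::uint64_t>& visited,
                               const int* list_heads, std::size_t stride,
                               int id, std::size_t pos) {
  return TestAndSet(visited,
                    static_cast<std::size_t>(list_heads[id]) * stride + pos);
}

// Counts first-time visits, reloading the invariant fields on each call.
std::size_t CountReload(ToyState& s, const std::vector<int>& ids) {
  std::size_t fresh = 0;
  for (std::size_t pos = 0; pos <= s.text_size; ++pos)
    for (int id : ids) fresh += ShouldVisitReload(s, id, pos);
  return fresh;
}

// Same traversal, with the invariant loads cached before the loops.
std::size_t CountHoisted(ToyState& s, const std::vector<int>& ids) {
  const int* list_heads = s.prog->list_heads.data();  // loaded once
  const std::size_t stride = s.text_size + 1;         // loaded once
  std::size_t fresh = 0;
  for (std::size_t pos = 0; pos <= s.text_size; ++pos)
    for (int id : ids)
      fresh += ShouldVisitHoisted(s.visited, list_heads, stride, id, pos);
  return fresh;
}
```

Both variants mark the same bits. The hoisted one only makes explicit that list_heads and the stride do not change during the search, which a compiler may not be able to prove on its own when the helper also writes to memory.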
benchmark \ host                                    s7       mac
                                               vs base   vs base
Search_Success1_BitState/size=8                      ~         ~
Search_Success1_BitState/size=64                -3.63%    -2.04%
Search_Success1_BitState/size=512              -15.83%    -6.83%
Search_Success1_BitState/size=4096             -25.59%   -10.74%
Search_Success1_BitState/size=32768            -27.85%   -11.85%
Search_Success1_BitState/size=262144           -28.03%   -11.94%
Search_Success1_BitState/size=2097152          -27.99%   -11.80%
Search_Success1_CachedBitState/size=8          -17.81%    -8.16%
Search_Success1_CachedBitState/size=64         -27.17%   -10.69%
Search_Success1_CachedBitState/size=512        -27.63%   -11.56%
Search_Success1_CachedBitState/size=4096       -27.88%   -11.61%
Search_Success1_CachedBitState/size=32768      -27.90%   -11.60%
Search_Success1_CachedBitState/size=262144     -27.73%   -11.55%
Search_Success1_CachedBitState/size=2097152    -27.88%   -11.54%
Search_AltMatch_BitState/size=8                 +0.83%         ~
Search_AltMatch_BitState/size=64                +0.88%         ~
Search_AltMatch_BitState/size=512               +0.88%         ~
Search_AltMatch_BitState/size=4096              +1.02%         ~
Search_AltMatch_BitState/size=32768             +1.02%         ~
Search_AltMatch_BitState/size=262144            +1.22%         ~
Search_AltMatch_BitState/size=2097152           +1.10%         ~
Search_AltMatch_BitState/size=16777216               ~         ~
Search_AltMatch_CachedBitState/size=8           -1.79%         ~
Search_AltMatch_CachedBitState/size=64          -3.23%         ~
Search_AltMatch_CachedBitState/size=512         -3.39%         ~
Search_AltMatch_CachedBitState/size=4096             ~         ~
Search_AltMatch_CachedBitState/size=32768            ~         ~
Search_AltMatch_CachedBitState/size=262144           ~         ~
Search_AltMatch_CachedBitState/size=2097152          ~         ~
Search_AltMatch_CachedBitState/size=16777216         ~         ~
host: s7
│ old │ new │
│ sec/op │ sec/op vs base │
Search_Success1_BitState/size=8 50.55µ ± 0% 50.58µ ± 0% ~ (p=0.806 n=25)
Search_Success1_BitState/size=64 58.28µ ± 0% 56.16µ ± 0% -3.63% (p=0.000 n=25)
Search_Success1_BitState/size=512 119.6µ ± 0% 100.7µ ± 0% -15.83% (p=0.000 n=25)
Search_Success1_BitState/size=4096 612.7µ ± 0% 455.9µ ± 0% -25.59% (p=0.000 n=25)
Search_Success1_BitState/size=32768 4.563m ± 0% 3.292m ± 0% -27.85% (p=0.000 n=25)
Search_Success1_BitState/size=262144 36.09m ± 0% 25.98m ± 0% -28.03% (p=0.000 n=25)
Search_Success1_BitState/size=2097152 288.6m ± 0% 207.8m ± 0% -27.99% (p=0.000 n=25)
Search_Success1_CachedBitState/size=8 1.971µ ± 1% 1.620µ ± 0% -17.81% (p=0.000 n=25)
Search_Success1_CachedBitState/size=64 9.829µ ± 1% 7.158µ ± 0% -27.17% (p=0.000 n=25)
Search_Success1_CachedBitState/size=512 71.19µ ± 0% 51.52µ ± 0% -27.63% (p=0.000 n=25)
Search_Success1_CachedBitState/size=4096 563.1µ ± 0% 406.1µ ± 0% -27.88% (p=0.000 n=25)
Search_Success1_CachedBitState/size=32768 4.492m ± 0% 3.239m ± 0% -27.90% (p=0.000 n=25)
Search_Success1_CachedBitState/size=262144 35.88m ± 0% 25.93m ± 0% -27.73% (p=0.000 n=25)
Search_Success1_CachedBitState/size=2097152 287.3m ± 0% 207.2m ± 0% -27.88% (p=0.000 n=25)
Search_AltMatch_BitState/size=8 16.54µ ± 0% 16.68µ ± 0% +0.83% (p=0.000 n=25)
Search_AltMatch_BitState/size=64 16.53µ ± 0% 16.68µ ± 0% +0.88% (p=0.000 n=25)
Search_AltMatch_BitState/size=512 16.53µ ± 0% 16.68µ ± 0% +0.88% (p=0.000 n=25)
Search_AltMatch_BitState/size=4096 16.62µ ± 0% 16.79µ ± 0% +1.02% (p=0.000 n=25)
Search_AltMatch_BitState/size=32768 16.63µ ± 0% 16.80µ ± 0% +1.02% (p=0.000 n=25)
Search_AltMatch_BitState/size=262144 17.05µ ± 0% 17.26µ ± 0% +1.22% (p=0.000 n=25)
Search_AltMatch_BitState/size=2097152 22.36µ ± 0% 22.61µ ± 0% +1.10% (p=0.000 n=25)
Search_AltMatch_BitState/size=16777216 508.3µ ± 1% 510.5µ ± 0% ~ (p=0.202 n=25)
Search_AltMatch_CachedBitState/size=8 616.0n ± 1% 605.0n ± 2% -1.79% (p=0.000 n=25)
Search_AltMatch_CachedBitState/size=64 620.0n ± 1% 600.0n ± 1% -3.23% (p=0.000 n=25)
Search_AltMatch_CachedBitState/size=512 620.0n ± 1% 599.0n ± 1% -3.39% (p=0.000 n=25)
Search_AltMatch_CachedBitState/size=4096 629.0n ± 1% 629.0n ± 0% ~ (p=0.747 n=25)
Search_AltMatch_CachedBitState/size=32768 690.0n ± 0% 688.0n ± 1% ~ (p=0.874 n=25)
Search_AltMatch_CachedBitState/size=262144 1.212µ ± 0% 1.210µ ± 0% ~ (p=0.415 n=25)
Search_AltMatch_CachedBitState/size=2097152 5.988µ ± 2% 5.931µ ± 4% ~ (p=0.066 n=25)
Search_AltMatch_CachedBitState/size=16777216 510.8µ ± 0% 512.0µ ± 0% ~ (p=0.164 n=25)
geomean 70.97µ 62.83µ -11.47%
host: mac
│ old │ new │
│ sec/op │ sec/op vs base │
Search_Success1_BitState/size=8 33.09µ ± 1% 33.02µ ± 1% ~ (p=0.003 n=25)
Search_Success1_BitState/size=64 37.52µ ± 1% 36.75µ ± 0% -2.04% (p=0.000 n=25)
Search_Success1_BitState/size=512 71.32µ ± 1% 66.45µ ± 0% -6.83% (p=0.000 n=25)
Search_Success1_BitState/size=4096 341.2µ ± 0% 304.6µ ± 0% -10.74% (p=0.000 n=25)
Search_Success1_BitState/size=32768 2.505m ± 0% 2.208m ± 0% -11.85% (p=0.000 n=25)
Search_Success1_BitState/size=262144 19.79m ± 0% 17.43m ± 0% -11.94% (p=0.000 n=25)
Search_Success1_BitState/size=2097152 158.1m ± 0% 139.5m ± 0% -11.80% (p=0.000 n=25)
Search_Success1_CachedBitState/size=8 1.201µ ± 0% 1.103µ ± 0% -8.16% (p=0.000 n=25)
Search_Success1_CachedBitState/size=64 5.442µ ± 0% 4.860µ ± 0% -10.69% (p=0.000 n=25)
Search_Success1_CachedBitState/size=512 39.18µ ± 0% 34.65µ ± 0% -11.56% (p=0.000 n=25)
Search_Success1_CachedBitState/size=4096 308.7µ ± 0% 272.9µ ± 0% -11.61% (p=0.000 n=25)
Search_Success1_CachedBitState/size=32768 2.466m ± 0% 2.180m ± 0% -11.60% (p=0.000 n=25)
Search_Success1_CachedBitState/size=262144 19.71m ± 0% 17.44m ± 0% -11.55% (p=0.000 n=25)
Search_Success1_CachedBitState/size=2097152 157.8m ± 0% 139.6m ± 0% -11.54% (p=0.000 n=25)
Search_AltMatch_BitState/size=8 11.48µ ± 0% 11.40µ ± 1% ~ (p=0.573 n=25)
Search_AltMatch_BitState/size=64 11.51µ ± 0% 11.39µ ± 2% ~ (p=0.200 n=25)
Search_AltMatch_BitState/size=512 11.41µ ± 1% 11.39µ ± 1% ~ (p=0.866 n=25)
Search_AltMatch_BitState/size=4096 11.51µ ± 1% 11.39µ ± 2% ~ (p=0.328 n=25)
Search_AltMatch_BitState/size=32768 11.60µ ± 0% 11.61µ ± 2% ~ (p=0.711 n=25)
Search_AltMatch_BitState/size=262144 12.33µ ± 1% 12.20µ ± 1% ~ (p=0.044 n=25)
Search_AltMatch_BitState/size=2097152 24.24µ ± 2% 25.27µ ± 7% ~ (p=0.095 n=25)
Search_AltMatch_BitState/size=16777216 310.3µ ± 7% 280.1µ ± 2% ~ (p=0.001 n=25)
Search_AltMatch_CachedBitState/size=8 375.0n ± 1% 375.0n ± 0% ~ (p=0.260 n=25)
Search_AltMatch_CachedBitState/size=64 414.0n ± 6% 380.0n ± 14% ~ (p=0.025 n=25)
Search_AltMatch_CachedBitState/size=512 377.0n ± 0% 378.0n ± 1% ~ (p=0.775 n=25)
Search_AltMatch_CachedBitState/size=4096 414.0n ± 1% 412.0n ± 1% ~ (p=0.008 n=25)
Search_AltMatch_CachedBitState/size=32768 524.0n ± 1% 526.0n ± 1% ~ (p=0.248 n=25)
Search_AltMatch_CachedBitState/size=262144 2.825µ ± 4% 2.904µ ± 6% ~ (p=0.342 n=25)
Search_AltMatch_CachedBitState/size=2097152 22.63µ ± 31% 22.63µ ± 9% ~ (p=0.399 n=25)
Search_AltMatch_CachedBitState/size=16777216 304.0µ ± 6% 281.2µ ± 4% ~ (p=0.052 n=25)
geomean 49.43µ 46.81µ -5.31%
1 parent dcd1f64 commit 880c551
1 file changed, +15 -10 lines changed