Commit 318bf55
Enable SIMD optimizations by default with auto-detection (#982)
* Enable SIMD optimizations by default with automatic CPU detection
This commit enables SIMD optimizations automatically, based on the CPU features
the compiler can target, so default builds get SIMD-accelerated JSON string
parsing without requiring manual configuration via the --with-sse42 flag.
Key changes:
1. Simplified extconf.rb for auto-detection:
- Automatically tries -msse4.2, falls back to -msse2
- No user configuration needed; works out of the box
- Removed unnecessary platform-specific logic
2. Enhanced simd.h with unified architecture detection:
- Defines HAVE_SIMD_SSE4_2, HAVE_SIMD_SSE2, HAVE_SIMD_NEON
- Provides SIMD_TYPE macro for debugging
- Uses compiler-predefined macros (e.g. __SSE4_2__) for cleaner conditional compilation
- Priority: SSE4.2 > NEON > SSE2 > scalar
3. Added SSE2 fallback implementation:
- Uses SSE2 instructions available on all x86_64 CPUs
- Provides SIMD benefits even on older processors
- Uses bit manipulation for efficient character matching
4. Updated parse.c to use new SIMD architecture:
- scan_string_SSE42() for SSE4.2-capable CPUs
- scan_string_SSE2() for older x86_64 CPUs
- Automatic selection at initialization
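Item 2 describes a compile-time cascade over architecture macros. A minimal sketch of what such a cascade could look like follows. The HAVE_SIMD_* names, SIMD_TYPE, and the priority order come from the commit message; the exact preprocessor conditions and the helpers simd_type() and simd_is_vectorized() are assumptions for illustration, not the actual contents of simd.h.

```c
/* Hypothetical simd.h-style cascade. HAVE_SIMD_* and SIMD_TYPE are the
 * names from the commit message; the conditions below are assumptions.
 * Order follows the stated priority: SSE4.2 > NEON > SSE2 > scalar. */
#if defined(__SSE4_2__)            /* set by -msse4.2 */
#  define HAVE_SIMD_SSE4_2 1
#  define SIMD_TYPE "SSE4.2"
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#  define HAVE_SIMD_NEON 1
#  define SIMD_TYPE "NEON"
#elif defined(__SSE2__)            /* baseline on x86_64 with GCC/Clang */
#  define HAVE_SIMD_SSE2 1
#  define SIMD_TYPE "SSE2"
#else
#  define SIMD_TYPE "scalar"
#endif

/* Debugging aid: reports the code path compiled into this build. */
static const char *simd_type(void) { return SIMD_TYPE; }

/* Nonzero when any vector path was selected at compile time. */
static int simd_is_vectorized(void) {
#if defined(HAVE_SIMD_SSE4_2) || defined(HAVE_SIMD_NEON) || defined(HAVE_SIMD_SSE2)
    return 1;
#else
    return 0;
#endif
}
```

Because the choice is made entirely by the preprocessor, the reported path reflects the flags the extension was compiled with (for example -msse4.2), not the CPU it later runs on.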
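Items 3 and 4 can be illustrated together: an SSE2 scanner plus a function pointer chosen once at startup. This is a hedged sketch, not the code in parse.c. The stop set (a quote, a backslash, or any byte below 0x20), the names scan_string_scalar, scan_string_sse2, scan_string and scan_init, and the movemask plus count-trailing-zeros technique are all assumptions standing in for the unspecified "bit manipulation".

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: first '"', '\\' or control byte (< 0x20) in [p, end), else end. */
static const char *scan_string_scalar(const char *p, const char *end) {
    assert(p <= end);
    for (; p < end; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '"' || c == '\\' || c < 0x20) return p;
    }
    return end;
}

#if defined(__SSE2__)
static const char *scan_string_sse2(const char *p, const char *end) {
    assert(p <= end);
    const __m128i quote = _mm_set1_epi8('"');
    const __m128i bslash = _mm_set1_epi8('\\');
    const __m128i ctrl_max = _mm_set1_epi8(0x1f);
    while (end - p >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)p);
        /* 0xFF in every lane equal to '"' or '\\'. */
        __m128i hits = _mm_or_si128(_mm_cmpeq_epi8(chunk, quote),
                                    _mm_cmpeq_epi8(chunk, bslash));
        /* SSE2 has no unsigned byte compare: c <= 0x1f  <=>  min(c, 0x1f) == c. */
        hits = _mm_or_si128(hits, _mm_cmpeq_epi8(_mm_min_epu8(chunk, ctrl_max), chunk));
        int mask = _mm_movemask_epi8(hits); /* one bit per lane, lane 0 = bit 0 */
        if (mask != 0) return p + __builtin_ctz((unsigned)mask); /* lowest bit = first match */
        p += 16;
    }
    return scan_string_scalar(p, end); /* tail shorter than 16 bytes */
}
#endif

/* Selected once in scan_init(), then called through the pointer. */
typedef const char *(*scan_fn)(const char *, const char *);
static scan_fn scan_string = scan_string_scalar;

static void scan_init(void) {
#if defined(__SSE2__)
    scan_string = scan_string_sse2;
#else
    scan_string = scan_string_scalar;
#endif
}
```

The min_epu8 comparison matters for UTF-8 input: a signed _mm_cmplt_epi8 would treat bytes of 0x80 and above as negative and wrongly flag them as control characters.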
Performance:
- Performance equivalent to the previous baseline built with --with-sse42
- All tests pass (445 runs, 986 assertions, 0 failures)
- SIMD now enabled by default without any flags
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* Optimize SIMD string scanning with prefetching and parallel processing
This commit improves SIMD performance by processing 64 bytes per iteration
with prefetching and branch hints for better CPU utilization.
Optimizations:
1. Process 64 bytes (4x16-byte chunks) per iteration instead of 16
2. Prefetch next cache line with __builtin_prefetch()
3. Load all chunks before comparing (better instruction-level parallelism)
4. Add __builtin_expect() branch hints (matches are unlikely in long strings)
5. Applied to both SSE4.2 and SSE2 implementations
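The four optimizations listed above can be combined in a single loop. The following SSE2 sketch assumes the scanner stops at a quote, a backslash, or a byte below 0x20; scan_string_unrolled, scan_tail, special_bytes and UNLIKELY are hypothetical names, and the layout is an illustration of the technique rather than the committed code.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0) /* tell the compiler x is rarely true */
#else
#define UNLIKELY(x) (x)
#endif

/* Byte-at-a-time scan for the final < 64 bytes. */
static const char *scan_tail(const char *p, const char *end) {
    for (; p < end; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '"' || c == '\\' || c < 0x20) return p;
    }
    return end;
}

#if defined(__SSE2__)
/* 0xFF in each lane holding '"', '\\' or an unsigned byte <= 0x1f. */
static inline __m128i special_bytes(__m128i v) {
    __m128i hits = _mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('"')),
                                _mm_cmpeq_epi8(v, _mm_set1_epi8('\\')));
    return _mm_or_si128(hits, _mm_cmpeq_epi8(_mm_min_epu8(v, _mm_set1_epi8(0x1f)), v));
}
#endif

static const char *scan_string_unrolled(const char *p, const char *end) {
    assert(p <= end);
#if defined(__SSE2__)
    while (end - p >= 64) {
        __builtin_prefetch(p + 64); /* hint: the next cache line is read soon */
        /* Issue all four 16-byte loads before any comparison. */
        __m128i c0 = _mm_loadu_si128((const __m128i *)(p));
        __m128i c1 = _mm_loadu_si128((const __m128i *)(p + 16));
        __m128i c2 = _mm_loadu_si128((const __m128i *)(p + 32));
        __m128i c3 = _mm_loadu_si128((const __m128i *)(p + 48));
        __m128i m0 = special_bytes(c0), m1 = special_bytes(c1);
        __m128i m2 = special_bytes(c2), m3 = special_bytes(c3);
        __m128i any = _mm_or_si128(_mm_or_si128(m0, m1), _mm_or_si128(m2, m3));
        if (UNLIKELY(_mm_movemask_epi8(any) != 0)) {
            /* Rare path: build a 64-bit bitmap and take its lowest set bit. */
            unsigned long long bits =
                (unsigned long long)(unsigned)_mm_movemask_epi8(m0) |
                ((unsigned long long)(unsigned)_mm_movemask_epi8(m1) << 16) |
                ((unsigned long long)(unsigned)_mm_movemask_epi8(m2) << 32) |
                ((unsigned long long)(unsigned)_mm_movemask_epi8(m3) << 48);
            return p + __builtin_ctzll(bits);
        }
        p += 64;
    }
#endif
    return scan_tail(p, end);
}
```

Prefetching p + 64 near the end of the buffer is harmless, since GCC documents that a data prefetch does not fault on an invalid address. The hot path pays for one OR tree and one movemask per 64 bytes; the per-chunk bitmaps are only assembled once a match is known to exist.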
Performance improvements (50K iterations):
- Strings with escape sequences: 8.3% faster (0.166s -> 0.152s)
- Long strings (~2KB): 3.8% faster (0.145s -> 0.140s)
- Short strings: 0.8% faster (1.945s -> 1.929s)
All tests pass: 445 runs, 986 assertions, 0 failures
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* Remove deprecated OJ_USE_SSE4_2 define
Use only the compiler-provided __SSE4_2__ macro for SIMD detection.
The old OJ_USE_SSE4_2 macro is no longer needed since we rely on the
compiler flag -msse4.2, which automatically defines __SSE4_2__.
This simplifies the code and removes legacy configuration.
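With OJ_USE_SSE4_2 gone, an SSE4.2 path is guarded only by the compiler's __SSE4_2__ macro. The sketch below uses _mm_cmpestri in range mode, assuming the scanner stops at a quote, a backslash, or a byte below 0x20; scan_string_portable is a hypothetical name, and its scalar loop is what runs when the file is compiled without -msse4.2.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE4_2__)
#include <nmmintrin.h>
#endif

static const char *scan_string_portable(const char *p, const char *end) {
    assert(p <= end);
#if defined(__SSE4_2__)
    /* Inclusive byte ranges as lo/hi pairs: [0x00,0x1f], ['"','"'], ['\\','\\']. */
    static const char ranges[16] = {0x00, 0x1f, '"', '"', '\\', '\\'};
    const __m128i needle = _mm_loadu_si128((const __m128i *)ranges);
    while (end - p >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)p);
        /* Explicit lengths (6 needle bytes, 16 haystack bytes), so the 0x00
         * bound does not terminate the needle as it would with _mm_cmpistri.
         * Returns the index of the first byte in any range, or 16 if none. */
        int idx = _mm_cmpestri(needle, 6, chunk, 16,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16) return p + idx;
        p += 16;
    }
#endif
    /* Scalar path: the tail, or the whole string without -msse4.2. */
    for (; p < end; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '"' || c == '\\' || c < 0x20) return p;
    }
    return end;
}
```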
---------
Co-authored-by: Claude <[email protected]>
1 parent 8929358
3 files changed, 156 insertions(+), 25 deletions(-)