Skip to content

refactor(yaml): make yaml string based #6590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

timreichen
Copy link
Contributor

This PR makes the yaml parser string based where possible. This is easier to read and allows for simplifications in future PRs. It seems to have only a tiny impact on performance:
Current:

benchmark                                                                         time/iter (avg)        iter/s      (min … max)           p75      p99     p995
--------------------------------------------------------------------------------- ----------------------------- --------------------- --------------------------
parse() handles single document yaml string                                                4.8 µs       207,800 (  4.5 µs … 174.0 µs)   4.7 µs   6.0 µs   8.0 µs
parseAll() handles yaml string with multiple documents                                     8.1 µs       122,700 (  7.6 µs … 170.5 µs)   8.0 µs  10.3 µs  19.6 µs
parse() throws with `!!js/*` yaml types with default schemas                              13.0 µs        77,080 ( 10.4 µs …   9.9 ms)  11.5 µs  43.7 µs  72.3 µs
parse() handles `!!js/*` yaml types with extended schema while parsing                     6.4 µs       155,100 (  6.3 µs …   7.0 µs)   6.4 µs   7.0 µs   7.0 µs
parse() throws with `!!js/function` yaml type with extended schema                        12.3 µs        81,570 ( 11.0 µs … 331.2 µs)  12.5 µs  19.1 µs  29.2 µs
parseAll() accepts parse options                                                           5.7 µs       174,100 (  5.7 µs …   6.1 µs)   5.8 µs   6.1 µs   6.1 µs
parse() handles __proto__                                                                 30.3 ms          33.0 ( 29.5 ms …  32.8 ms)  30.8 ms  32.8 ms  32.8 ms
parse() returns `null` when yaml is empty or only comments                               833.3 ns     1,200,000 (814.3 ns … 885.3 ns) 834.8 ns 885.3 ns 885.3 ns
parse() handles binary type                                                               38.0 µs        26,330 ( 34.1 µs … 369.6 µs)  36.5 µs  85.0 µs  96.5 µs
parse() handles boolean types                                                             16.7 µs        60,050 ( 15.8 µs … 171.3 µs)  16.4 µs  24.8 µs  44.4 µs
parse() handles float types                                                                8.4 µs       118,900 (  8.0 µs … 160.3 µs)   8.3 µs   9.0 µs  10.3 µs
parse() handles integer types                                                             74.0 µs        13,520 ( 71.2 µs … 310.5 µs)  73.6 µs 103.8 µs 109.6 µs
parse() handles timestamp types                                                           31.1 µs        32,180 ( 29.7 µs … 266.0 µs)  31.0 µs  36.6 µs  61.7 µs
parse() handles omap type                                                                 54.0 µs        18,500 ( 51.7 µs … 190.8 µs)  53.6 µs  76.8 µs  88.3 µs
parse() handles !!pairs type                                                              30.9 µs        32,410 ( 29.4 µs … 198.7 µs)  30.6 µs  40.0 µs  62.1 µs
parse() handles anchors and aliases                                                       43.8 µs        22,850 ( 41.5 µs … 186.3 µs)  43.2 µs  79.9 µs  94.1 µs
parse() handles escaped strings in double quotes                                          22.7 µs        44,130 ( 21.3 µs … 291.5 µs)  22.5 µs  27.1 µs  40.0 µs
parse() handles single quoted scalar                                                      20.2 µs        49,430 ( 19.4 µs … 169.8 µs)  20.1 µs  22.2 µs  29.2 µs
parse() handles %YAML directive                                                           61.6 µs        16,230 ( 55.8 µs …   1.4 ms)  61.1 µs  81.2 µs  90.0 µs
parse() handles %TAG directive                                                            37.9 µs        26,360 ( 36.2 µs …   1.2 ms)  37.6 µs  46.1 µs  55.2 µs
parse() throws with invalid strings                                                       32.2 µs        31,080 ( 30.7 µs … 202.4 µs)  32.2 µs  36.0 µs  44.4 µs
parse() handles merge (<<) types                                                          19.6 µs        50,990 ( 18.5 µs … 493.4 µs)  19.4 µs  22.8 µs  29.5 µs
parse() handles block scalars                                                             48.1 µs        20,810 ( 46.3 µs … 464.9 µs)  47.8 µs  54.9 µs  68.2 µs
parse() handles BOM at the beginning of the yaml                                           1.4 µs       715,100 (  1.4 µs …   1.5 µs)   1.4 µs   1.5 µs   1.5 µs
parse() throws if there are more than one document in the yaml                             7.6 µs       131,000 (  7.6 µs …   7.8 µs)   7.6 µs   7.8 µs   7.8 µs
parse() throws when the directive name is empty                                            8.3 µs       120,600 (  8.2 µs …   8.5 µs)   8.3 µs   8.5 µs   8.5 µs
parse() ignores unknown directive                                                          9.0 µs       110,700 (  8.9 µs …   9.3 µs)   9.0 µs   9.3 µs   9.3 µs
parse() handles document separator                                                         1.5 µs       656,300 (  1.5 µs …   1.6 µs)   1.5 µs   1.6 µs   1.6 µs
parse() show warning with non-ascii line breaks if YAML version is < 1.2                  13.4 µs        74,480 ( 11.8 µs …   6.8 ms)  12.5 µs  16.6 µs  18.5 µs
parse() throws with directive only document                                                9.3 µs       107,000 (  9.2 µs …   9.6 µs)   9.4 µs   9.6 µs   9.6 µs
parse() throws with  in the middle of the document                                       11.7 µs        85,230 ( 11.2 µs … 267.5 µs)  11.7 µs  12.6 µs  13.7 µs
parse() handles complex mapping key                                                       25.9 µs        38,670 ( 25.0 µs … 349.8 µs)  25.6 µs  29.3 µs  35.0 µs
parse() handles unordered set                                                              3.9 µs       259,600 (  3.8 µs …   4.1 µs)   3.9 µs   4.1 µs   4.1 µs
parse() throws with empty mapping key                                                     11.3 µs        88,120 ( 10.8 µs … 264.0 µs)  11.2 µs  12.5 µs  19.5 µs
parse() throws on duplicate keys                                                          13.4 µs        74,860 ( 12.5 µs … 325.0 µs)  13.3 µs  17.3 µs  23.7 µs
parse() allows duplicate keys when `allowDuplicateKeys` option is set to `true`            3.3 µs       301,900 (  3.3 µs …   3.4 µs)   3.3 µs   3.4 µs   3.4 µs
parse() throws at reseverd characters '`' and '@'                                         18.8 µs        53,200 ( 18.1 µs … 301.2 µs)  18.7 µs  22.2 µs  30.2 µs
parse() handles sequence                                                                  11.7 µs        85,530 ( 11.0 µs … 340.3 µs)  11.6 µs  12.5 µs  14.1 µs
parse() handles mapping                                                                   13.5 µs        74,290 ( 12.8 µs … 356.1 µs)  13.2 µs  17.2 µs  27.8 µs
parse() handles string                                                                     1.2 µs       823,800 (  1.2 µs …   1.2 µs)   1.2 µs   1.2 µs   1.2 µs
parse() handles regexp value with extended schema option                                  38.4 µs        26,040 ( 36.8 µs … 310.8 µs)  38.1 µs  46.8 µs  55.8 µs

PR:

benchmark                                                                         time/iter (avg)        iter/s      (min … max)           p75      p99     p995
--------------------------------------------------------------------------------- ----------------------------- --------------------- --------------------------
parse() handles single document yaml string                                                5.0 µs       198,400 (  4.7 µs … 985.0 µs)   5.0 µs   6.4 µs   7.6 µs
parseAll() handles yaml string with multiple documents                                     8.3 µs       119,900 (  7.9 µs …  99.8 µs)   8.3 µs   9.3 µs  12.5 µs
parse() throws with `!!js/*` yaml types with default schemas                              11.6 µs        86,020 ( 10.4 µs …   2.4 ms)  11.4 µs  16.1 µs  22.9 µs
parse() handles `!!js/*` yaml types with extended schema while parsing                     6.8 µs       147,200 (  6.4 µs … 217.6 µs)   6.8 µs   7.5 µs   8.0 µs
parse() throws with `!!js/function` yaml type with extended schema                        12.1 µs        82,380 ( 11.3 µs … 219.2 µs)  12.3 µs  13.8 µs  16.2 µs
parseAll() accepts parse options                                                           5.9 µs       170,800 (  5.8 µs …   6.2 µs)   5.8 µs   6.2 µs   6.2 µs
parse() handles __proto__                                                                 30.2 ms          33.1 ( 29.7 ms …  31.3 ms)  30.5 ms  31.3 ms  31.3 ms
parse() returns `null` when yaml is empty or only comments                               850.1 ns     1,176,000 (823.5 ns … 888.3 ns) 853.8 ns 888.3 ns 888.3 ns
parse() handles binary type                                                               38.3 µs        26,100 ( 34.9 µs … 197.0 µs)  37.2 µs  84.7 µs  89.5 µs
parse() handles boolean types                                                             17.6 µs        56,720 ( 16.6 µs … 221.8 µs)  17.3 µs  26.6 µs  38.3 µs
parse() handles float types                                                                8.9 µs       111,900 (  8.5 µs … 138.7 µs)   8.9 µs   9.6 µs  11.5 µs
parse() handles integer types                                                             71.7 µs        13,960 ( 68.5 µs … 299.6 µs)  71.2 µs  93.1 µs 116.2 µs
parse() handles timestamp types                                                           31.4 µs        31,820 ( 30.0 µs … 225.2 µs)  31.4 µs  39.4 µs  49.2 µs
parse() handles omap type                                                                 52.0 µs        19,240 ( 49.8 µs … 186.2 µs)  51.8 µs  66.9 µs  80.4 µs
parse() handles !!pairs type                                                              31.0 µs        32,240 ( 29.6 µs … 159.6 µs)  31.0 µs  35.1 µs  48.6 µs
parse() handles anchors and aliases                                                       44.6 µs        22,440 ( 42.2 µs … 209.1 µs)  44.1 µs  73.5 µs  88.5 µs
parse() handles escaped strings in double quotes                                          22.8 µs        43,880 ( 21.4 µs … 234.4 µs)  22.7 µs  26.1 µs  38.0 µs
parse() handles single quoted scalar                                                      20.4 µs        49,020 ( 19.5 µs … 134.5 µs)  20.3 µs  22.5 µs  31.2 µs
parse() handles %YAML directive                                                           62.7 µs        15,950 ( 56.6 µs …   1.7 ms)  62.3 µs  75.3 µs  87.3 µs
parse() handles %TAG directive                                                            37.9 µs        26,370 ( 35.0 µs … 287.7 µs)  38.0 µs  44.5 µs  54.8 µs
parse() throws with invalid strings                                                       32.2 µs        31,020 ( 31.0 µs … 292.3 µs)  32.3 µs  38.1 µs  45.5 µs
parse() handles merge (<<) types                                                          20.0 µs        49,990 ( 19.0 µs … 227.6 µs)  19.9 µs  26.1 µs  34.2 µs
parse() handles block scalars                                                             49.0 µs        20,390 ( 46.6 µs … 308.2 µs)  48.7 µs  63.4 µs  75.1 µs
parse() handles BOM at the beginning of the yaml                                           1.4 µs       706,100 (  1.4 µs …   1.5 µs)   1.4 µs   1.5 µs   1.5 µs
parse() throws if there are more than one document in the yaml                             7.8 µs       128,400 (  7.7 µs …   8.0 µs)   7.8 µs   8.0 µs   8.0 µs
parse() throws when the directive name is empty                                            8.3 µs       120,900 (  8.2 µs …   8.4 µs)   8.3 µs   8.4 µs   8.4 µs
parse() ignores unknown directive                                                          9.1 µs       110,000 (  9.0 µs …   9.2 µs)   9.1 µs   9.2 µs   9.2 µs
parse() handles document separator                                                         1.6 µs       628,000 (  1.6 µs …   1.7 µs)   1.6 µs   1.7 µs   1.7 µs
parse() show warning with non-ascii line breaks if YAML version is < 1.2                  13.8 µs        72,310 ( 12.2 µs …   8.3 ms)  13.0 µs  16.1 µs  21.5 µs
parse() throws with directive only document                                                9.4 µs       106,600 (  9.2 µs …   9.7 µs)   9.4 µs   9.7 µs   9.7 µs
parse() throws with  in the middle of the document                                       12.0 µs        83,040 ( 11.5 µs … 254.2 µs)  12.0 µs  13.7 µs  20.0 µs
parse() handles complex mapping key                                                       26.7 µs        37,420 ( 25.2 µs … 362.0 µs)  26.4 µs  34.4 µs  47.5 µs
parse() handles unordered set                                                              3.9 µs       254,100 (  3.9 µs …   4.1 µs)   3.9 µs   4.1 µs   4.1 µs
parse() throws with empty mapping key                                                     11.4 µs        87,720 ( 11.0 µs … 264.1 µs)  11.4 µs  12.1 µs  12.5 µs
parse() throws on duplicate keys                                                          13.7 µs        73,140 ( 12.8 µs … 272.1 µs)  13.7 µs  14.6 µs  17.4 µs
parse() allows duplicate keys when `allowDuplicateKeys` option is set to `true`            3.4 µs       291,900 (  3.4 µs …   3.5 µs)   3.4 µs   3.5 µs   3.5 µs
parse() throws at reseverd characters '`' and '@'                                         19.0 µs        52,570 ( 18.2 µs … 274.6 µs)  19.0 µs  20.4 µs  23.4 µs
parse() handles sequence                                                                  12.0 µs        83,330 ( 11.3 µs … 332.6 µs)  11.9 µs  12.8 µs  15.2 µs
parse() handles mapping                                                                   13.8 µs        72,590 ( 13.1 µs … 364.5 µs)  13.6 µs  14.6 µs  16.9 µs
parse() handles string                                                                     1.3 µs       796,700 (  1.2 µs …   1.3 µs)   1.3 µs   1.3 µs   1.3 µs
parse() handles regexp value with extended schema option                                  38.5 µs        25,940 ( 36.8 µs … 320.4 µs)  38.2 µs  56.0 µs  60.5 µs

@timreichen timreichen requested a review from kt3k as a code owner April 15, 2025 21:18
@github-actions github-actions bot added the yaml label Apr 15, 2025
Copy link

codecov bot commented Apr 15, 2025

Codecov Report

Attention: Patch coverage is 93.44262% with 8 lines in your changes missing coverage. Please review.

Project coverage is 94.58%. Comparing base (8d814b0) to head (3ae9b23).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
yaml/_loader_state.ts 93.15% 3 Missing and 2 partials ⚠️
yaml/_dumper_state.ts 78.57% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6590      +/-   ##
==========================================
- Coverage   94.58%   94.58%   -0.01%     
==========================================
  Files         579      579              
  Lines       43751    43762      +11     
  Branches     6494     6499       +5     
==========================================
+ Hits        41384    41392       +8     
- Misses       2327     2329       +2     
- Partials       40       41       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kt3k
Copy link
Member

kt3k commented Apr 17, 2025

I'm not in favor of this change. The change doesn't seem worth the performance degradation.

@timreichen
Copy link
Contributor Author

I'm not in favor of this change. The change doesn't seem worth the performance degradation.

I admit it doesn't look like that in the current state. However I played around with a string based implementation and lots of functionality can be simplified once string based. One can introduce a scanner like in toml with startsWith(), match() etc. Which makes the algorithms not only easier to read but potentially even allows for performance gains (see #6538 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants