Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix list tightness. #269

Merged
merged 1 commit into from
Aug 8, 2023
Merged

Conversation

taku0
Copy link
Contributor

@taku0 taku0 commented Feb 9, 2023

According to the specification, blank lines in a block quote doesn't separate list items:
https://spec.commonmark.org/0.30/#example-320

Therefore, the following example should be tight:

- > - a
  >
- b

The specification also say that link reference definitions can be children of list items when checking list tightness: https://spec.commonmark.org/0.30/#example-317

Therefore, the following example should be loose:

- [aaa]: /

  [bbb]: /
- b

This commit fixes those problems with the following strategy:

  • Using source end position and start position of adjoining elements to check tightness.
    This requires adjusting source end position of some block types to exclude trailing blank lines. This introduces an incompatible change to sourcepos property of Node. If this cannot be acceptable, I will add a field to Node to hold source position for list tightness.

  • Delaying removal of link reference definitions until the entire document is parsed.

Re: CONTRIBUTING.md

Ensure that all tests pass (make test).

$ make test
...
723 tests passed, 0 failed.

Use make lint to check for problems.

$ make lint
npm run lint

> [email protected] lint
> eslint .

Ensure that performance has not been affected (make bench before and after the change).

master:

commonmark.js x 3,754 ops/sec ±0.96% (92 runs sampled)
showdown.js x 407 ops/sec ±1.36% (85 runs sampled)
marked.js x 1,652 ops/sec ±1.28% (90 runs sampled)
markdown-it x 3,680 ops/sec ±1.02% (90 runs sampled)

This branch:

commonmark.js x 3,756 ops/sec ±1.14% (91 runs sampled)
showdown.js x 422 ops/sec ±0.81% (88 runs sampled)
marked.js x 1,690 ops/sec ±1.03% (92 runs sampled)
markdown-it x 3,724 ops/sec ±0.99% (91 runs sampled)

@taku0 taku0 force-pushed the tighten-list-blockquote-list branch from 613bae3 to 0f87d28 Compare February 9, 2023 15:43
@taku0
Copy link
Contributor Author

taku0 commented Feb 10, 2023

cmark works well for the former example but not the latter example.

@taku0
Copy link
Contributor Author

taku0 commented Feb 10, 2023

cmark fails to handle the following case while the master branch of commonmark.js can handle it:

- ***
  [aaa]: /

  bbb
- c

@taku0
Copy link
Contributor Author

taku0 commented Feb 10, 2023

Is the following example tight or loose? The master branch handles it as loose while this branch handles it tight.

- <pre>

- </pre>
- a

@jgm
Copy link
Member

jgm commented Feb 10, 2023

That last one should be loose I think.

@taku0 taku0 force-pushed the tighten-list-blockquote-list branch from 0f87d28 to 2f8593f Compare February 11, 2023 04:03
@taku0
Copy link
Contributor Author

taku0 commented Feb 11, 2023

Updated to exlucde trailing blank lines from HTML blocks when checking tightness. So the previous example is now loose.

This behaviour seems inconsistent with the fenced code block:

- ```
  the following blank line is a part of fenced code block. Therefore, not
  separates list item:

- aaa

* <pre>The current implementation trims trailing blank lines from HTML blocks.
  Therefore, the following line is **not** a part of HTML block, so it
  separates the list items:

* The HTML block ends together with its parent (list item).
* </pre>
  The preceding line starts a new HTML block, it also ends together with its
  parent.

-     Indented code blocks are explicitly stated to exclude trailling blank
      lines.  Therefore, the following line is not a part of indentated code
      block and separetes the list items:
      
- aaa

Maybe we should update the spec to specify whether each block types contain trailing blank lines or not.

@jgm
Copy link
Member

jgm commented Feb 11, 2023

Ah. I hadn't taken into account of the fact that the blank line belongs to the raw HTML block (which it does according to the spec).

The spec says: "A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them." So, this case isn't loose after all, and I guess this is a bug in the behavior of the reference implementation.

@taku0 taku0 force-pushed the tighten-list-blockquote-list branch from 2f8593f to 8e99b30 Compare February 12, 2023 02:49
@taku0
Copy link
Contributor Author

taku0 commented Feb 12, 2023

Fixed HTML blocks to include trailing blank lines. The previous example is now tight.

@taku0
Copy link
Contributor Author

taku0 commented Aug 6, 2023

@jgm can this be merged? Please let me know if there is anything I should do to merge this pull request.

@jgm
Copy link
Member

jgm commented Aug 6, 2023

Sorry for the delay on this. Can you briefly summarize the changes and their rationale? It's a fairly big PR.

According to the specification, blank lines in a block quote doesn't
separate list items:
https://spec.commonmark.org/0.30/#example-320

Therefore, the following example should be tight:

- > - a
  >
- b

The specification also say that link reference definitions can be
children of list items when checking list tightness:
https://spec.commonmark.org/0.30/#example-317

Therefore, the following example should be loose:

- [aaa]: /

  [bbb]: /
- b

This commit fixes those problems with the following strategy:

- Using source end position and start position of adjoining elements to
  check tightness.
  This requires adjusting source end position of some block types to
  exclude trailing blank lines.

- Delaying removal of link reference definitions until the entire document is
  parsed.
@taku0 taku0 force-pushed the tighten-list-blockquote-list branch from 8e99b30 to d1d3d17 Compare August 7, 2023 12:34
@taku0
Copy link
Contributor Author

taku0 commented Aug 7, 2023

TL;DR

The current implementation is based on tangled buggy imperative mutable state updates. My implementation simply checks the line numbers of adjacent items to check list tightness.

Longer explanation

I will explain as follows:

  • Requirement
  • Current Implementation
  • Problem
  • My Implementation

“Current Implementation” and “Problem” are long and complicated. You can skip to “My Implementation” section after reading “Requirement” and back to those sections later.

Requirement

A list is loose if and only if:

  • some adjacent items are separated by one or more blank lines, or
  • some items contain adjacent elements separated by one or more blank lines.

Current Implementation

Node have a flag _lastLineBlank, which indicates that it is followed by blank lines.

this._lastLineBlank = false;

It is updated imperatively each time a line is processed in incorporateLine by two cases:

  • When the current line is blank and the current container contains a child, the last child is marked as _lastLineBlank.

    commonmark.js/lib/blocks.js

    Lines 838 to 840 in 20b52e5

    if (this.blank && container.lastChild) {
    container.lastChild._lastLineBlank = true;
    }

    This covers when the last child is explicitly closed and followed by a blank:

    - <pre>
      aaa
      </pre>
    
    - The previous item contains a `pre` element closed by `</pre>` and followed by a blank line.
  • When the current line is blank and we cannot conclude at this time that the current container is closed, we temporarily mark the current container and its ancestors as _lastLineBlank. If we get a non-blank line later without closing the current container, the flag will be cleared.

    - aaa
      - When the following blank line is processed, we cannot tell whether
        the outer list is tight or loose, because if the following `bbb` is
        missing but the blank line is present, both the inner list and the outer
        list are loose, but if the `bbb` is present, only the inner list is loose.
        So we mark both lists as `_lastLineBlank` for now and clear it when `bbb`
        is read.
    
        bbb
    - ccc

    commonmark.js/lib/blocks.js

    Lines 848 to 863 in 20b52e5

    var lastLineBlank =
    this.blank &&
    !(
    t === "block_quote" ||
    (t === "code_block" && container._isFenced) ||
    (t === "item" &&
    !container._firstChild &&
    container.sourcepos[0][0] === this.lineNumber)
    );
    // propagate lastLineBlank up through parents:
    var cont = container;
    while (cont) {
    cont._lastLineBlank = lastLineBlank;
    cont = cont._parent;
    }

    However, there are many exceptions to this rule:

    • Lines just inside block quote nodes are not blank.

      To be precise, block quote itself cannot have blank lines but blocks inside block quote nodes can contain blank lines:

      - > The outer list is tight because the following line is not blank.
        >
        > - This inner list is loose because the following line is blank for the
        >   inner list but not for the outer list.
        >
        > - aaa
      - bbb
    • Lines in fenced code are never blank.

    • The container must not be an empty list item that was just created while the current line was being processed.

Problem

The current implementation have bugs:

  • The “block quote exception” aforementioned is checked only against the innermost container:

    - > - a
      >
    - When the preceding line is processed, the innermost container is the inner
      list item, so the “block quote exception” doesn't apply and the
      `_lastLineBlank` flag is set.
  • We must also exclude blank lines inside HTML blocks but the current implementation doesn't:

    - <pre> The following line is part of the HTML block and not separate items.
    
    - The block is implicitly closed at the end of the item, so this line is not
      part of the HTML block.
  • The flag is updated before closing list node, so that the _lastLineBlank flag of the inner list below is cleared but one of its item is not cleared as expected since the item is closed before clearing the flag.

    * foo
      * bar
    
      baz

    This is compensated by endsWithBlankLine by recursing into children of list node, but it complicates the whole logic and introduces another flag _lastLineChecked to Node for the sake of performance or other reasons (it passes all tests without the flag). The flag makes endsWithBlankLine impure and we have to take care of calling order and number of invocations.

  • The tightness of list is checked after link reference definitions are removed, so the following example is marked as tight but it should be loose:

    - [aaa]: /
    
      [bbb]: /
    - b
  • Last but not least, the whole logic is tangled and hard to follow (I'm not sure why _lastLineChecked is there) and prone to bugs.

My Implementation

We could fix these bugs one by one, but I propose a cleaner logic:

  • Adjacent blocks are separated by one or more blank lines if and only if the end line number of the preceding block is not equal to the start line number of the following block minus one:

    // Returns true if block ends with a blank line.
    var endsWithBlankLine = function(block) {
        return block.next &&
            block.sourcepos[1][0] !== block.next.sourcepos[0][0] - 1;
    };
  • List is loose if and only if:

    • some adjacent items are separated by one or more blank lines, or
    • some items contain adjacent elements separated by one or more blank lines.
    var item = block._firstChild;
    while (item) {
        // check for non-final list item ending with blank line:
        if (item._next && endsWithBlankLine(item)) {
            block._listData.tight = false;
            break;
        }
        // recurse into children of list item, to see if there are
        // spaces between any of them:
        var subitem = item._firstChild;
        while (subitem) {
            if (
                subitem._next &&
                endsWithBlankLine(subitem)
            ) {
                block._listData.tight = false;
                break;
            }
            subitem = subitem._next;
        }
        item = item._next;
    }
  • Removal of link reference definitions is deferred until the whole document is finalized (see removeLinkReferenceDefinitions).

We have to set the end position of each node precisely, but this is not a bad thing. I have tweaked finalize methods of some node types for this.

@jgm
Copy link
Member

jgm commented Aug 7, 2023

Thank you very much for that lucid and detailed explanation.

I believe that the reason I didn't use the "compare end line to start line of next block" method was that the closing line positions for indented code and lists included the blank lines following them. For example:

 % bin/commonmark --sourcepos
1. a
   - b
   - c


2. d
<ol data-sourcepos="1:1-6:4">
<li data-sourcepos="1:1-5:0">
<p data-sourcepos="1:4-1:4">a</p>
<ul data-sourcepos="2:4-5:0">
<li data-sourcepos="2:4-2:6">b</li>
<li data-sourcepos="3:4-5:0">c</li>
</ul>
</li>
<li data-sourcepos="6:1-6:4">
<p data-sourcepos="6:4-6:4">d</p>
</li>
</ol>

Note that the inner list goes from lines 2-5, and the next item starts on 6, so this woldu be tight as judged by the "next line" test.

You say that you've modified the finalizers to produce more accurate end positions, which would fix the problem. I saw this code for indented code blocks but didn't notice it for lists -- is it there?

Anyway, as long as this problem is dealt with, I agree that your approach is much cleaner!

Have you checked benchmarks and pathological tests to make sure that there are no bad effects? I imagine that handling reference links at the end by walking the whole document might be slightly less performant than handling them in the finalizers, but I'm guessing the difference is miniscule. (EDIT: I'd also expect some increase in performance from the more streamlined tight/loose checking.)

@taku0
Copy link
Contributor Author

taku0 commented Aug 8, 2023

@jgm

You say that you've modified the finalizers to produce more accurate end positions, which would fix the problem. I saw this code for indented code blocks but didn't notice it for lists -- is it there?

It is adjusted in finalize method of item and list type.

The end position of a item is the end position of the last child if any. If the item is empty, the end position is the position after the marker and padding.

commonmark.js/lib/blocks.js

Lines 356 to 367 in d1d3d17

finalize: function(parser, block) {
if (block._lastChild) {
block.sourcepos[1] = block._lastChild.sourcepos[1];
} else {
// Empty list item
block.sourcepos[1][0] = block.sourcepos[0][0];
block.sourcepos[1][1] =
block._listData.markerOffset + block._listData.padding;
}
return;
},

The end position of a list is the end position of the last item.

block.sourcepos[1] = block._lastChild.sourcepos[1];

Here is the output of commonmark --sourcepos:

$ bin/commonmark --sourcepos
1. a
   - b
   - c


2. d
<ol data-sourcepos="1:1-6:4">
<li data-sourcepos="1:1-3:6">
<p data-sourcepos="1:4-1:4">a</p>
<ul data-sourcepos="2:4-3:6">
<li data-sourcepos="2:4-2:6">b</li>
<li data-sourcepos="3:4-3:6">c</li>
</ul>
</li>
<li data-sourcepos="6:1-6:4">
<p data-sourcepos="6:4-6:4">d</p>
</li>
</ol>

Have you checked benchmarks and pathological tests to make sure that there are no bad effects?

make test 723 tests passed, 0 failed. (click for full output)
$ make test
npm test

> [email protected] pretest
> npm run build


> [email protected] build
> rollup -c


lib/index.js → dist/commonmark.js, dist/commonmark.min.js...
created dist/commonmark.js, dist/commonmark.min.js in 2s

> [email protected] test
> node ./test/test

Spec tests [test/spec.txt]:
Tabs  ✓✓✓✓✓✓✓✓✓✓✓
Backslash escapes  ✓✓✓✓✓✓✓✓✓✓✓✓✓
Entity and numeric character references  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Precedence  ✓
Thematic breaks  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
ATX headings  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Setext headings  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Indented code blocks  ✓✓✓✓✓✓✓✓✓✓✓✓
Fenced code blocks  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
HTML blocks  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Link reference definitions  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Paragraphs  ✓✓✓✓✓✓✓✓
Blank lines  ✓
Block quotes  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
List items  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Lists  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Inlines  ✓
Code spans  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Emphasis and strong emphasis  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Links  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Images  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Autolinks  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Raw HTML  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Hard line breaks  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Soft line breaks  ✓✓
Textual content  ✓✓✓
Elapsed time: 34.399ms

Spec tests [test/smart_punct.txt]:
Smart punctuation  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Elapsed time: 0.709ms

Spec tests [test/regression.txt]:
Regression tests  ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
Elapsed time: 1.142ms

Pathological cases:
U+0000 in input ✓
  elapsed time: 0.18ms
alternate line endings ✓
  elapsed time: 0.05ms
nested strong emph 1000 deep ✓
  elapsed time: 17.393ms
nested strong emph 10000 deep ✓
  elapsed time: 104.974ms
1000 emph closers with no openers ✓
  elapsed time: 0.465ms
10000 emph closers with no openers ✓
  elapsed time: 7.528ms
1000 emph openers with no closers ✓
  elapsed time: 0.434ms
10000 emph openers with no closers ✓
  elapsed time: 6.858ms
1000 openers and closers multiple of 3 ✓
  elapsed time: 0.515ms
10000 openers and closers multiple of 3 ✓
  elapsed time: 6.11ms
1000 #172 ✓
  elapsed time: 1.372ms
10000 #172 ✓
  elapsed time: 30.178ms
1000 link closers with no openers ✓
  elapsed time: 0.318ms
10000 link closers with no openers ✓
  elapsed time: 6.703ms
1000 link openers with no closers ✓
  elapsed time: 0.402ms
10000 link openers with no closers ✓
  elapsed time: 7.793ms
1000 link openers and emph closers ✓
  elapsed time: 0.678ms
10000 link openers and emph closers ✓
  elapsed time: 16.744ms
1000 mismatched openers and closers ✓
  elapsed time: 0.81ms
10000 mismatched openers and closers ✓
  elapsed time: 22.765ms
1000 pattern [ (]( ✓
  elapsed time: 2.047ms
10000 pattern [ (]( ✓
  elapsed time: 27.861ms
nested brackets 1000 deep ✓
  elapsed time: 0.451ms
nested brackets 10000 deep ✓
  elapsed time: 19.995ms
nested block quote 1000 deep ✓
  elapsed time: 1.295ms
nested block quote 10000 deep ✓
  elapsed time: 27.169ms
[\\... 1000 deep ✓
  elapsed time: 12.591ms
[\\... 10000 deep ✓
  elapsed time: 1.11ms
10 backslashes in unclosed link title ✓
  elapsed time: 0.137ms
100 backslashes in unclosed link title ✓
  elapsed time: 0.038ms
1000 backslashes in unclosed link title ✓
  elapsed time: 0.132ms

723 tests passed, 0 failed.

I imagine that handling reference links at the end by walking the whole document might be slightly less performant than handling them in the finalizers, but I'm guessing the difference is miniscule. (EDIT: I'd also expect some increase in performance from the more streamlined tight/loose checking.)

Performance impact isn't significant:

master branch:

$ make bench
node bench/bench.js bench/samples/README.md
commonmark.js x 3,931 ops/sec ±0.78% (92 runs sampled)
showdown.js x 442 ops/sec ±0.71% (91 runs sampled)
marked.js x 1,709 ops/sec ±0.82% (92 runs sampled)
markdown-it x 3,933 ops/sec ±0.69% (94 runs sampled)

This branch:

$ make bench
node bench/bench.js bench/samples/README.md
commonmark.js x 3,974 ops/sec ±1.25% (90 runs sampled)
showdown.js x 441 ops/sec ±0.54% (91 runs sampled)
marked.js x 1,715 ops/sec ±0.83% (93 runs sampled)
markdown-it x 3,919 ops/sec ±0.65% (94 runs sampled)

@vassudanagunta
Copy link
Contributor

I'm not sure if the point I'm about to make is moot because CommonMark is already too far gone down this path, in which case taking it to its logical conclusion, as this PR does, may be the smart thing to do.

I think it is a mistake to determine list "looseness" based on blank lines in the plain text, because such blank lines are often an artifact of syntax, not semantics.

For example, the following two lists are semantically identical, but because of an artifact of syntax, CommonMark considers the first loose and the second tight:

- item one
- item two
  
  Loose Lists Sink Ships
  ======================
  more text
- item three
- item one
- item two
  # How Not to Run a Tight List
  more text
- item three

What is a tight list intuitively?

A tight list is a list that can be readably rendered with no additional vertical whitespace beyond regular intra-paragraph line spacing.

What, then, is a tight list semantically?

A tight list is a list all of whose items have only inline text content. They do not contain paragraphs or any other block content, with one exception: a nested list which itself is tight.

Because of the list markers, such list items do not need the vertical margin that usually separates paragraphs. In addition, because the list item does not contain a sequence of two or more blocks beyond the one exception, it won't have internal vertical whitespace that would necessitate commensurate whitespace above and below it to effect a more readable and aesthetic visual grouping of elements.

Thus if we could go back in time, I'd push the following for the CommonMark spec:

  • A list items whose content is a single block of text is parsed as a list item with inline content.
  • A list items whose content is a single block of text followed by a single nested list is parsed as a list item with inline content followed by a single nested list.
  • In all other cases, the list item's content is parsed as a sequence of blocks.
  • The AST would reflect this. How it is rendered would be up to the renderer, but the natural rendering would be exactly as one would expect.

Gruber got this both right and wrong

So while Gruber's Markdown produces the same result as CommonMark for the first list above, it yields something entirely different for the second:

<ul>
  <li>item one</li>
  <li>item two
    # How Not to Run a Tight List
    more text</li>
  <li>item three</li>
</ul>

I haven't looked as his code, but based on empirical data, it looks like Gruber parses list item content this way:

  • If it has a blank line, it is parsed as nested blocks.
  • If it does not have a blank line, it is parsed as inline content.

The end result puts Gruber's interpretation somewhere between CommonMark's and mine. It's consistent with my definition of tightness as it ties tightness strictly to whether the list item contains inlines or blocks. But like CommonMark, its behavior is governed by the presence or absence of blank lines. The result is a bit backward, as seen above: the lack of blank line predetermines that the content is inline. Thus what would otherwise be seen as two paragraphs separated by a heading is not. It violates CommonMark's principle of uniformity.

Back to the PR

Let's pretend that I am right. Is the right thing to do cutting losses? Or doubling down?

Or is there a way to steer CommonMark's take closer to what I propose without breaking anything but crazy corner cases?

Or, maybe, again pretending that I'm right, correcting this should be left to new formats such as djot.

@vassudanagunta
Copy link
Contributor

Addendum

The following, per CommonMark, are a tight list and a loose list:

- > - a
- b
- > - a

- b

But because the first item in both cases contains a nested block, there is no visual difference in typical HTML renderings. You can see this when comparing the rendering of one against the other by any CommonMark compliant renders, as well as comparing to the rendering of those, like Pandoc, that appear to follow my definition.

The only difference is in the details of the underlying HTML, which will only matter if you target that difference with CSS or Javascript.

@jgm
Copy link
Member

jgm commented Aug 8, 2023

@vassudanagunta this larger discussion doesn't really belong in this PR.
This PR fixes a problem in the javascript parser, and does it well. Your point belongs in a discussion of the spec.

@jgm
Copy link
Member

jgm commented Aug 8, 2023

@taku0 this is great. You've not only dramatically simplified the tight/list detection, and fixed a bug, you've also improved the accuracy of source positions. Thank you very much for the patch and the clear explanations.

@jgm jgm merged commit df3ea1e into commonmark:master Aug 8, 2023
6 checks passed
@vassudanagunta
Copy link
Contributor

sorry @jgm. It wasn't obvious to me that this was a no-brainer change.

@taku0
Copy link
Contributor Author

taku0 commented Aug 10, 2023

Thank you for merging. I will port this to cmark soon.

taku0 added a commit to taku0/cmark that referenced this pull request Aug 17, 2023
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269 .

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)

- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items

- The end position of the last child:
  - Non-empty list items
  - Lists

- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handed by `finalize` and `closed_explicitly` flag.

Non empty list items and lists are handled by `switch` statements in `finalize`.

Indented code blocks are handled by setting the end position every time
non-blank line is added to the block.
taku0 added a commit to taku0/cmark that referenced this pull request Aug 17, 2023
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269 .

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)

- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items

- The end position of the last child:
  - Non-empty list items
  - Lists

- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handed by `finalize` and `closed_explicitly` flag.

Non empty list items and lists are handled by `switch` statements in `finalize`.

Indented code blocks are handled by setting the end position every time
non-blank line is added to the block.
taku0 added a commit to taku0/cmark that referenced this pull request Aug 17, 2023
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269 .

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)

- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items

- The end position of the last child:
  - Non-empty list items
  - Lists

- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handed by `finalize` and `closed_explicitly` flag.

Non empty list items and lists are handled by `switch` statements in `finalize`.

Indented code blocks are handled by setting the end position every time
non-blank line is added to the block.
taku0 added a commit to taku0/cmark that referenced this pull request Aug 17, 2023
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269 .

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)

- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items

- The end position of the last child:
  - Non-empty list items
  - Lists

- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handed by `finalize` and `closed_explicitly` flag.

Non empty list items and lists are handled by `switch` statements in `finalize`.

Indented code blocks are handled by setting the end position every time
non-blank line is added to the block.
taku0 added a commit to taku0/cmark that referenced this pull request Aug 17, 2023
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269 .

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)

- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items

- The end position of the last child:
  - Non-empty list items
  - Lists

- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handed by `finalize` and `closed_explicitly` flag.

Non empty list items and lists are handled in `switch` statements in `finalize`.

Indented code blocks are handled by setting the end position every time
non-blank line is added to the block.
taku0 added a commit to taku0/cmark that referenced this pull request Aug 19, 2023
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269 .

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)

- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items

- The end position of the last child:
  - Non-empty list items
  - Lists

- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handed by `finalize` and `closed_explicitly` flag.

Non empty list items and lists are handled in `switch` statements in `finalize`.

Indented code blocks are handled by setting the end position every time
non-blank line is added to the block.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants