Fix list tightness. #269
613bae3
to
0f87d28
Compare
|
- ***
[aaa]: /
bbb
- c |
Is the following example tight or loose? The - <pre>
- </pre>
- a |
That last one should be loose I think. |
0f87d28
to
2f8593f
Compare
Updated to exclude trailing blank lines from HTML blocks when checking tightness. So the previous example is now loose. This behaviour seems inconsistent with the fenced code block: - ```
The following blank line is part of the fenced code block. Therefore, it does not
separate the list items:
- aaa
* <pre>The current implementation trims trailing blank lines from HTML blocks.
Therefore, the following line is **not** a part of HTML block, so it
separates the list items:
* The HTML block ends together with its parent (list item).
* </pre>
The preceding line starts a new HTML block, it also ends together with its
parent.
- Indented code blocks are explicitly stated to exclude trailing blank
lines. Therefore, the following line is not a part of the indented code
block and separates the list items:
- aaa
Maybe we should update the spec to specify whether each block type contains trailing blank lines or not. |
Ah. I hadn't taken into account the fact that the blank line belongs to the raw HTML block (which it does according to the spec). The spec says: "A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them." So, this case isn't loose after all, and I guess this is a bug in the behavior of the reference implementation. |
2f8593f
to
8e99b30
Compare
Fixed HTML blocks to include trailing blank lines. The previous example is now tight. |
@jgm can this be merged? Please let me know if there is anything I should do to merge this pull request. |
Sorry for the delay on this. Can you briefly summarize the changes and their rationale? It's a fairly big PR. |
According to the specification, blank lines in a block quote don't separate list items: https://spec.commonmark.org/0.30/#example-320
Therefore, the following example should be tight: - > - a > - b
The specification also says that link reference definitions can be children of list items when checking list tightness: https://spec.commonmark.org/0.30/#example-317
Therefore, the following example should be loose: - [aaa]: / [bbb]: / - b
This commit fixes those problems with the following strategy:
- Using the source end position and start position of adjoining elements to check tightness. This requires adjusting the source end position of some block types to exclude trailing blank lines.
- Delaying removal of link reference definitions until the entire document is parsed.
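The first point of the strategy above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this pull request: the node shape (`type`, `startLine`, `endLine`, `children`) and the names `separatedByBlankLine`, `isTightList`, and `block` are hypothetical and are not commonmark.js's `Node` API. It also assumes every `endLine` already excludes trailing blank lines.

```javascript
// Hypothetical node shape: { type, startLine, endLine, children }.
// Assumption: endLine never includes trailing blank lines.

// True if at least one line lies strictly between two adjacent siblings.
function separatedByBlankLine(prev, next) {
    return next.startLine - prev.endLine > 1;
}

// A list is loose if two adjacent items, or two adjacent direct children
// of any item, are separated by a blank line. Otherwise it is tight.
function isTightList(list) {
    const items = list.children;
    for (let i = 0; i < items.length; i++) {
        if (i > 0 && separatedByBlankLine(items[i - 1], items[i])) {
            return false;
        }
        const blocks = items[i].children;
        for (let j = 1; j < blocks.length; j++) {
            if (separatedByBlankLine(blocks[j - 1], blocks[j])) {
                return false;
            }
        }
    }
    return true;
}

// Small helper for building example trees (hypothetical).
const block = (type, startLine, endLine, children = []) =>
    ({ type, startLine, endLine, children });

// Two items on consecutive lines.
const adjacentItems = block("list", 1, 2, [
    block("item", 1, 1, [block("paragraph", 1, 1)]),
    block("item", 2, 2, [block("paragraph", 2, 2)]),
]);

// Two items with a blank line (line 2) between them.
const gapBetweenItems = block("list", 1, 3, [
    block("item", 1, 1, [block("paragraph", 1, 1)]),
    block("item", 3, 3, [block("paragraph", 3, 3)]),
]);

// One item whose two direct children are separated by a blank line.
const gapInsideItem = block("list", 1, 3, [
    block("item", 1, 3, [block("paragraph", 1, 1), block("paragraph", 3, 3)]),
]);

// One item with a single child spanning lines 1-3; a blank line inside
// that child (for example, inside a block quote) is never inspected.
const gapInsideChild = block("list", 1, 3, [
    block("item", 1, 3, [block("block_quote", 1, 3)]),
]);

console.log(isTightList(adjacentItems), isTightList(gapBetweenItems),
    isTightList(gapInsideItem), isTightList(gapInsideChild));
```

Because only direct children of list items are compared, a blank line buried deeper in the tree cannot make the outer list loose, which matches the block quote case described above.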
8e99b30
to
d1d3d17
Compare
TL;DR
The current implementation is based on tangled, buggy, imperative updates of mutable state. My implementation simply compares the line numbers of adjacent items to check list tightness.
Longer explanation
I will explain as follows:
“Current Implementation” and “Problem” are long and complicated. You can skip to the “My Implementation” section after reading “Requirement” and come back to those sections later.
Requirement
A list is loose if and only if:
Current Implementation
Line 77 in 20b52e5
It is updated imperatively each time a line is processed in
Problem
The current implementation has bugs:
My Implementation
We could fix these bugs one by one, but I propose a cleaner logic:
We have to set the end position of each node precisely, but this is not a bad thing. I have tweaked |
Thank you very much for that lucid and detailed explanation. I believe that the reason I didn't use the "compare end line to start line of next block" method was that the closing line positions for indented code and lists included the blank lines following them. For example:
Note that the inner list goes from lines 2-5, and the next item starts on 6, so this would be tight as judged by the "next line" test. You say that you've modified the finalizers to produce more accurate end positions, which would fix the problem. I saw this code for indented code blocks but didn't notice it for lists -- is it there? Anyway, as long as this problem is dealt with, I agree that your approach is much cleaner! Have you checked benchmarks and pathological tests to make sure that there are no bad effects? I imagine that handling reference links at the end by walking the whole document might be slightly less performant than handling them in the finalizers, but I'm guessing the difference is minuscule. (EDIT: I'd also expect some increase in performance from the more streamlined tight/loose checking.) |
It is adjusted in The end position of a Lines 356 to 367 in d1d3d17
The end position of a Line 301 in d1d3d17
Here is the output of
|
I'm not sure if the point I'm about to make is moot because CommonMark is already too far gone down this path, in which case taking it to its logical conclusion, as this PR does, may be the smart thing to do. I think it is a mistake to determine list "looseness" based on blank lines in the plain text, because such blank lines are often an artifact of syntax, not semantics. For example, the following two lists are semantically identical, but because of an artifact of syntax, CommonMark considers the first loose and the second tight: - item one
- item two
Loose Lists Sink Ships
======================
more text
- item three

- item one
- item two
# How Not to Run a Tight List
more text
- item three
What is a tight list intuitively?
A tight list is a list that can be readably rendered with no additional vertical whitespace beyond regular intra-paragraph line spacing.
What, then, is a tight list semantically?
A tight list is a list all of whose items have only inline text content. They do not contain paragraphs or any other block content, with one exception: a nested list which itself is tight. Because of the list markers, such list items do not need the vertical margin that usually separates paragraphs. In addition, because the list item does not contain a sequence of two or more blocks beyond the one exception, it won't have internal vertical whitespace that would necessitate commensurate whitespace above and below it to effect a more readable and aesthetic visual grouping of elements.
Thus if we could go back in time, I'd push the following for the CommonMark spec:
Gruber got this both right and wrong
So while Gruber's Markdown produces the same result as CommonMark for the first list above, it yields something entirely different for the second:
<ul>
<li>item one</li>
<li>item two
# How Not to Run a Tight List
more text</li>
<li>item three</li>
</ul>
I haven't looked at his code, but based on empirical data, it looks like Gruber parses list item content this way:
The end result puts Gruber's interpretation somewhere between CommonMark's and mine. It's consistent with my definition of tightness as it ties tightness strictly to whether the list item contains inlines or blocks. But like CommonMark, its behavior is governed by the presence or absence of blank lines. The result is a bit backward, as seen above: the lack of blank line predetermines that the content is inline. Thus what would otherwise be seen as two paragraphs separated by a heading is not. It violates CommonMark's principle of uniformity.
Back to the PR
Let's pretend that I am right. Is the right thing to do cutting losses? Or doubling down? Or is there a way to steer CommonMark's take closer to what I propose without breaking anything but crazy corner cases? Or, maybe, again pretending that I'm right, correcting this should be left to new formats such as djot. |
Addendum
The following, per CommonMark, are a tight list and a loose list:
But because the first item in both cases contains a nested block, there is no visual difference in typical HTML renderings. You can see this by comparing the two renderings in any CommonMark-compliant renderer, and by comparing those with the output of renderers, like Pandoc, that appear to follow my definition. The only difference is in the details of the underlying HTML, which will only matter if you target that difference with CSS or JavaScript. |
@vassudanagunta this larger discussion doesn't really belong in this PR. |
@taku0 this is great. You've not only dramatically simplified the tight/loose detection, and fixed a bug, you've also improved the accuracy of source positions. Thank you very much for the patch and the clear explanations. |
sorry @jgm. It wasn't obvious to me that this was a no-brainer change. |
Thank you for merging. I will port this to |
- Set the end position precisely
- Check list tightness by comparing line numbers
- Remove `LAST_LINE_BLANK` flag

See also commonmark/commonmark.js#269.

Classification of end positions:

- The end of the current line:
  - Thematic breaks
  - ATX headings
  - Setext headings
  - Fenced code blocks closed explicitly
  - HTML blocks (`pre`, comments, and others)
- The end of the previous line:
  - Fenced code blocks closed by the end of the parent or EOF
  - HTML blocks (`div` and others)
  - HTML blocks closed by the end of the parent or EOF
  - Paragraphs
  - Block quotes
  - Empty list items
- The end position of the last child:
  - Non-empty list items
  - Lists
- The end position of the last non-blank line:
  - Indented code blocks

The first two cases are handled by `finalize` and the `closed_explicitly` flag. Non-empty list items and lists are handled by `switch` statements in `finalize`. Indented code blocks are handled by setting the end position every time a non-blank line is added to the block.
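The last two categories of the classification above might be sketched as follows. This is an illustration only, not the code from the commit: the node shape (`type`, `startLine`, `endLine`, `children`, `lines`) and both function names are hypothetical.

```javascript
// Hypothetical node shape: { type, startLine, endLine, children }.

// "The end position of the last child": a non-empty list or list item takes
// its end line from its last child, recursively. Every other node type, and
// an empty list item, keeps the end line it already has.
function endLineFromLastChild(node) {
    const isListLike = node.type === "list" || node.type === "item";
    if (isListLike && node.children.length > 0) {
        const last = node.children[node.children.length - 1];
        node.endLine = endLineFromLastChild(last);
    }
    return node.endLine;
}

// "The end position of the last non-blank line": an indented code block
// moves its end line forward only when a non-blank line is added.
// In this sketch the blank lines stay in block.lines untouched.
function addLineToIndentedCode(block, lineNumber, text) {
    block.lines.push(text);
    if (text.trim() !== "") {
        block.endLine = lineNumber;
    }
}

// A nested list whose provisional end line (5) includes two trailing blank
// lines, while its only paragraph ends on line 3.
const nested = {
    type: "list", startLine: 2, endLine: 5, children: [
        { type: "item", startLine: 2, endLine: 5, children: [
            { type: "paragraph", startLine: 2, endLine: 3, children: [] },
        ] },
    ],
};
endLineFromLastChild(nested);
console.log(nested.endLine);

// An indented code block fed five lines, the last two of them blank.
const code = { type: "code_block", startLine: 1, endLine: 1, lines: [] };
["    a", "", "    b", "", ""].forEach((text, i) =>
    addLineToIndentedCode(code, i + 1, text));
console.log(code.endLine);
```

With end lines tightened this way, a block that is followed by blank lines no longer appears to touch the next sibling, which is what the line-number comparison relies on.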
According to the specification, blank lines in a block quote don't separate list items:
https://spec.commonmark.org/0.30/#example-320
Therefore, the following example should be tight:
The specification also says that link reference definitions can be children of list items when checking list tightness: https://spec.commonmark.org/0.30/#example-317
Therefore, the following example should be loose:
This commit fixes those problems with the following strategy:
- Using the source end position and start position of adjoining elements to check tightness. This requires adjusting the source end position of some block types to exclude trailing blank lines. This introduces an incompatible change to the `sourcepos` property of `Node`. If this is not acceptable, I will add a field to `Node` to hold the source position for list tightness.
- Delaying removal of link reference definitions until the entire document is parsed.
Re: CONTRIBUTING.md
master:
This branch: