fix(align): blockwise align with wide and combining characters #1214

Diomendius · 2024-09-12T09:34:14Z

Blockwise align is currently broken when lines contain multi-cell characters such as:

Tab
CJK characters
Control codes (Vim represents these as ^*)
Some invisible Unicode code points (Vim represents these as <xxxx>)

Single display cells containing multiple Unicode code points also cause problems, such as when é is decomposed to e<U+0301 COMBINING ACUTE ACCENT>.

This is mainly caused by passing virtual column positions to strcharpart() and byteidx() as if they were character indices.

When the leftmost selection corner sits directly on a wide character, only the last virtual column the character occupies is included in the selection, as virtcol() is called without the list argument; without this argument, virtcol() only returns the position of the rightmost column containing the character.

Exclusive block selections also aren't handled correctly, as the right vcol is always shifted left. An exclusive block selection extends from and including the corner with the lowest byte offset, to the left edge of the corner with the highest. This means that the selected region is the same as for an inclusive selection unless the corner with the highest byte offset is at least one virtual column to the right of the other corner.

This PR fixes all these issues and adds regression tests. I've added T['Align']['all modes'] and T['Align']['blockwise mode'] sets, which many of the existing tests could be adapted to use. This should be applicable to any tests which operate over the entire text, as all modes should behave identically in this case.

One issue I encountered which this PR does not fix is that tabs not at the start of the line are aligned incorrectly. I don't think this is worth fixing as using tabs in the middle of lines is a terrible idea in general, and it's not obvious what the correct behaviour should even be.

I have read CONTRIBUTING.md
I have read CODE_OF_CONDUCT.md

This affects ['Align with preview']['works']. The operator blockwise and visual blockwise forms of this test should affect the text in the same way, but the reference screenshots didn't reflect this. The old screenshots showed the result of an erroneously exclusive gA<C-v>`a motion. If d<C-v>`a is used instead in the same situation, the characters after the first _ (in the same column as the 'a mark) are deleted. Forced-blockwise motions behave the same as visual-blockwise motions, except the cursor is placed at the start of the affected region. This means the motion is inclusive or exclusive depending on the 'selection' option.

echasnovski · 2024-09-12T10:03:46Z

Thanks for the PR!

Code change at glance looks a bit complicated, maybe even to the point of opting for not supporting these cases. And definitely uses functionality present only in Neovim>=0.10.

Currently there is a "code freeze" period before the next 0.14.0 release, so this has to wait a bit. After that I'll have a closer look.

Diomendius · 2024-09-12T10:57:55Z

It looks like the list arg to virtcol() requires Nvim 0.10. It should be possible to implement a workaround for older versions, probably by grabbing text from the buffer and using strdisplaywidth(). I'll implement a fix for older versions soon.

To my eye, the new implementation looks about as complex as the old implementation. The current handling of virtual columns causes text to be deleted or duplicated, and causes a 'start' is higher than 'end' error in nvim_buf_set_text() when the selected portion of a line starts with a multi-cell character and ends on or beyond the last character in the line, so keeping this behaviour is not desirable.

You can run the tests at c2078cb to see the problems with the current implementation.

Notable test failures @ c2078cb

These also fail in the same way for Normal-block mode.

FAIL in tests/test_align.lua | Align | all modes | works with wide characters + args { "Visual-block" }:
Failed expectation for equality.
Left:  { "ыыф  фццф", "ы фのфц фыфのфцф" }
Right: { "ыыф  фццф", "ы фのфц ф" }
Traceback:
    tests/test_align.lua:1163
    tests/test_align.lua:1314

FAIL in tests/test_align.lua | Align | blockwise mode | correctly handles virtual column bounds + args { "Visual-block" }: tests/test_align.lua:1162: E5108: Error executing lua: vim/_editor.lua:0: nvim_exec2(): Vim(lua):E5108: Error executing lua [string ":lua"]:1: 'start' is higher than 'end'
stack traceback:
    [C]: in function 'nvim_buf_set_text'
    [string ":lua"]:1: in main chunk
    [C]: in function 'nvim_exec2'
    vim/_editor.lua: in function 'cmd'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1902: in function 'set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1765: in function 'region_set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1676: in function 'process_current_region'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:743: in function 'align_user'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1360: in function <.../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1353>
stack traceback:
    [C]: in function 'nvim_exec2'
    vim/_editor.lua: in function 'cmd'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1902: in function 'set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1765: in function 'region_set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1676: in function 'process_current_region'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:743: in function 'align_user'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1360: in function <.../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1353>
Traceback:
    tests/test_align.lua:1162
    tests/test_align.lua:1345

When the blockwise selection is in to-EOL mode (when the last horizontal motion extends to EOL, like $) the selection on each line extends to EOL regardless of the position of the rightmost selection corner. With 'virtualedit', the selection extends to the column one character past the end of the longest line in the region. getcurpos() seems to be the only way to detect if the blockwise selection is in to-EOL mode, as the last field (curswant) is v:maxcol if the cursor is in to-EOL mode, which is normally true when the active blockwise selection is to-EOL. Swapping to one of the left corners with o or O deactivates to-EOL mode, so this isn't a problem. This is fine for visual mode, but requires an awkward workaround in operator mode: 1. Save cursor position 2. Restore visual selection 3. Call getcurpos() 4. Exit visual mode 5. Restore cursor position There is one exception where the cursor position doesn't match the to-EOL-ness of the selection, which is after the '> mark has been manually set, such as via m>. See vim/vim#15587

Subtracts 1 character column not virtual column when adjusting for exclusive selection. Passes list = true to virtcol() to get both the left and right virtcols for wide characters. Renames region_virtcols() to block_region_virtcols() as it currently isn't used for other modes and needs slightly different logic to handle exclusive non-blockwise regions.

Adds a fallback for v:maxcol for < 0.9.0. Adds workaround for getting both left and right virtcol bounds in pos_to_virtcol() for < 0.10.0 as the list flag for virtcol() requires 0.10.0. Also replaces next_char_boundary() with byte_range_from_virtcols(). This allows both call sites to be cleaner, and the new implementation is robust against virtcol2col() returning different byte positions for multibyte characters on different versions. On 0.10.0 and later virtcol2col() returns the index of the first byte of the character at the specified position, but it returns the index of the last byte on older versions. It would actually save a lot of hassle if it were possible to get both these values, as the weird workaround with byteidx() and charidx() wouldn't be necessary.

echasnovski · 2024-09-13T14:55:07Z

@Diomendius, thanks for the work! I'll take it from here after the next release of 'mini.nvim'.

Diomendius added 2 commits September 12, 2024 20:33

test(align): add wide character and virtcol tests

63cb11d

echasnovski added the mini.align label Sep 12, 2024

Diomendius force-pushed the align_fix branch from 8f576af to 735c0bb Compare September 13, 2024 06:39

Diomendius added 3 commits September 13, 2024 18:51

fix(align): fix blockwise handling of wide chars

c82f7ff

Diomendius force-pushed the align_fix branch from 735c0bb to 64561d4 Compare September 13, 2024 06:52

Diomendius force-pushed the align_fix branch from 64561d4 to ce114e8 Compare September 13, 2024 13:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(align): blockwise align with wide and combining characters #1214

fix(align): blockwise align with wide and combining characters #1214

Diomendius commented Sep 12, 2024 •

edited

Loading

echasnovski commented Sep 12, 2024

Diomendius commented Sep 12, 2024

echasnovski commented Sep 13, 2024

fix(align): blockwise align with wide and combining characters #1214

Are you sure you want to change the base?

fix(align): blockwise align with wide and combining characters #1214

Conversation

Diomendius commented Sep 12, 2024 • edited Loading

echasnovski commented Sep 12, 2024

Diomendius commented Sep 12, 2024

echasnovski commented Sep 13, 2024

Diomendius commented Sep 12, 2024 •

edited

Loading