Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(align): blockwise align with wide and combining characters #1214

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Diomendius
Copy link
Contributor

@Diomendius Diomendius commented Sep 12, 2024

Blockwise align is currently broken when lines contain multi-cell characters such as:

  • Tab
  • CJK characters
  • Control codes (Vim represents these as ^*)
  • Some invisible Unicode code points (Vim represents these as <xxxx>)

Single display cells containing multiple Unicode code points also cause problems, such as when é is decomposed to e<U+0301 COMBINING ACUTE ACCENT>.

This is mainly caused by passing virtual column positions to strcharpart() and byteidx() as if they were character indices.

When the leftmost selection corner sits directly on a wide character, only the last virtual column the character occupies is included in the selection, as virtcol() is called without the list argument; without this argument, virtcol() only returns the position of the rightmost column containing the character.

Exclusive block selections also aren't handled correctly, as the right vcol is always shifted left. An exclusive block selection extends from and including the corner with the lowest byte offset, to the left edge of the corner with the highest. This means that the selected region is the same as for an inclusive selection unless the corner with the highest byte offset is at least one virtual column to the right of the other corner.

This PR fixes all these issues and adds regression tests. I've added T['Align']['all modes'] and T['Align']['blockwise mode'] sets, which many of the existing tests could be adapted to use. This should be applicable to any tests which operate over the entire text, as all modes should behave identically in this case.

One issue I encountered which this PR does not fix is that tabs not at the start of the line are aligned incorrectly. I don't think this is worth fixing as using tabs in the middle of lines is a terrible idea in general, and it's not obvious what the correct behaviour should even be.

This affects ['Align with preview']['works']. The operator blockwise and
visual blockwise forms of this test should affect the text in the same
way, but the reference screenshots didn't reflect this.

The old screenshots showed the result of an erroneously exclusive
gA<C-v>`a motion. If d<C-v>`a is used instead in the same situation, the
characters after the first _ (in the same column as the 'a mark) are
deleted.

Forced-blockwise motions behave the same as visual-blockwise motions,
except the cursor is placed at the start of the affected region. This
means the motion is inclusive or exclusive depending on the 'selection'
option.
@echasnovski
Copy link
Owner

Thanks for the PR!

Code change at glance looks a bit complicated, maybe even to the point of opting for not supporting these cases. And definitely uses functionality present only in Neovim>=0.10.

Currently there is a "code freeze" period before the next 0.14.0 release, so this has to wait a bit. After that I'll have a closer look.

@Diomendius
Copy link
Contributor Author

It looks like the list arg to virtcol() requires Nvim 0.10. It should be possible to implement a workaround for older versions, probably by grabbing text from the buffer and using strdisplaywidth(). I'll implement a fix for older versions soon.

To my eye, the new implementation looks about as complex as the old implementation. The current handling of virtual columns causes text to be deleted or duplicated, and causes a 'start' is higher than 'end' error in nvim_buf_set_text() when the selected portion of a line starts with a multi-cell character and ends on or beyond the last character in the line, so keeping this behaviour is not desirable.

You can run the tests at c2078cb to see the problems with the current implementation.

Notable test failures @ c2078cb

These also fail in the same way for Normal-block mode.

FAIL in tests/test_align.lua | Align | all modes | works with wide characters + args { "Visual-block" }:
Failed expectation for equality.
Left:  { "ыыф  фццф", "ы фのфц фыфのфцф" }
Right: { "ыыф  фццф", "ы фのфц ф" }
Traceback:
    tests/test_align.lua:1163
    tests/test_align.lua:1314
FAIL in tests/test_align.lua | Align | blockwise mode | correctly handles virtual column bounds + args { "Visual-block" }: tests/test_align.lua:1162: E5108: Error executing lua: vim/_editor.lua:0: nvim_exec2(): Vim(lua):E5108: Error executing lua [string ":lua"]:1: 'start' is higher than 'end'
stack traceback:
    [C]: in function 'nvim_buf_set_text'
    [string ":lua"]:1: in main chunk
    [C]: in function 'nvim_exec2'
    vim/_editor.lua: in function 'cmd'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1902: in function 'set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1765: in function 'region_set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1676: in function 'process_current_region'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:743: in function 'align_user'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1360: in function <.../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1353>
stack traceback:
    [C]: in function 'nvim_exec2'
    vim/_editor.lua: in function 'cmd'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1902: in function 'set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1765: in function 'region_set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1676: in function 'process_current_region'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:743: in function 'align_user'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1360: in function <.../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1353>
Traceback:
    tests/test_align.lua:1162
    tests/test_align.lua:1345

When the blockwise selection is in to-EOL mode (when the last horizontal
motion extends to EOL, like $) the selection on each line extends to EOL
regardless of the position of the rightmost selection corner. With
'virtualedit', the selection extends to the column one character past
the end of the longest line in the region.

getcurpos() seems to be the only way to detect if the blockwise
selection is in to-EOL mode, as the last field (curswant) is v:maxcol if
the cursor is in to-EOL mode, which is normally true when the active
blockwise selection is to-EOL. Swapping to one of the left corners with
o or O deactivates to-EOL mode, so this isn't a problem. This is fine
for visual mode, but requires an awkward workaround in operator mode:
    1. Save cursor position
    2. Restore visual selection
    3. Call getcurpos()
    4. Exit visual mode
    5. Restore cursor position

There is one exception where the cursor position doesn't match the
to-EOL-ness of the selection, which is after the '> mark has been
manually set, such as via m>.

See vim/vim#15587
Subtracts 1 character column not virtual column when adjusting for
exclusive selection.

Passes list = true to virtcol() to get both the left and right virtcols
for wide characters.

Renames region_virtcols() to block_region_virtcols() as it currently
isn't used for other modes and needs slightly different logic to handle
exclusive non-blockwise regions.
Adds a fallback for v:maxcol for < 0.9.0. Adds workaround for getting
both left and right virtcol bounds in pos_to_virtcol() for < 0.10.0 as
the list flag for virtcol() requires 0.10.0.

Also replaces next_char_boundary() with byte_range_from_virtcols(). This
allows both call sites to be cleaner, and the new implementation is
robust against virtcol2col() returning different byte positions for
multibyte characters on different versions. On 0.10.0 and later
virtcol2col() returns the index of the first byte of the character at
the specified position, but it returns the index of the last byte on
older versions. It would actually save a lot of hassle if it were
possible to get both these values, as the weird workaround with
byteidx() and charidx() wouldn't be necessary.
@echasnovski
Copy link
Owner

@Diomendius, thanks for the work! I'll take it from here after the next release of 'mini.nvim'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants