Skip to content

Conversation

Diomendius
Copy link
Contributor

@Diomendius Diomendius commented Sep 12, 2024

Blockwise align is currently broken when lines contain multi-cell characters such as:

  • Tab
  • CJK characters
  • Control codes (Vim represents these as ^*)
  • Some invisible Unicode code points (Vim represents these as <xxxx>)

Single display cells containing multiple Unicode code points also cause problems, such as when é is decomposed to e<U+0301 COMBINING ACUTE ACCENT>.

This is mainly caused by passing virtual column positions to strcharpart() and byteidx() as if they were character indices.

When the leftmost selection corner sits directly on a wide character, only the last virtual column the character occupies is included in the selection, as virtcol() is called without the list argument; without this argument, virtcol() only returns the position of the rightmost column containing the character.

Exclusive block selections also aren't handled correctly, as the right vcol is always shifted left. An exclusive block selection extends from and including the corner with the lowest byte offset, to the left edge of the corner with the highest. This means that the selected region is the same as for an inclusive selection unless the corner with the highest byte offset is at least one virtual column to the right of the other corner.

This PR fixes all these issues and adds regression tests. I've added T['Align']['all modes'] and T['Align']['blockwise mode'] sets, which many of the existing tests could be adapted to use. This should be applicable to any tests which operate over the entire text, as all modes should behave identically in this case.

One issue I encountered which this PR does not fix is that tabs not at the start of the line are aligned incorrectly. I don't think this is worth fixing as using tabs in the middle of lines is a terrible idea in general, and it's not obvious what the correct behaviour should even be.

This affects ['Align with preview']['works']. The operator blockwise and
visual blockwise forms of this test should affect the text in the same
way, but the reference screenshots didn't reflect this.

The old screenshots showed the result of an erroneously exclusive
gA<C-v>`a motion. If d<C-v>`a is used instead in the same situation, the
characters after the first _ (in the same column as the 'a mark) are
deleted.

Forced-blockwise motions behave the same as visual-blockwise motions,
except the cursor is placed at the start of the affected region. This
means the motion is inclusive or exclusive depending on the 'selection'
option.
@echasnovski
Copy link
Member

Thanks for the PR!

Code change at glance looks a bit complicated, maybe even to the point of opting for not supporting these cases. And definitely uses functionality present only in Neovim>=0.10.

Currently there is a "code freeze" period before the next 0.14.0 release, so this has to wait a bit. After that I'll have a closer look.

@Diomendius
Copy link
Contributor Author

It looks like the list arg to virtcol() requires Nvim 0.10. It should be possible to implement a workaround for older versions, probably by grabbing text from the buffer and using strdisplaywidth(). I'll implement a fix for older versions soon.

To my eye, the new implementation looks about as complex as the old implementation. The current handling of virtual columns causes text to be deleted or duplicated, and causes a 'start' is higher than 'end' error in nvim_buf_set_text() when the selected portion of a line starts with a multi-cell character and ends on or beyond the last character in the line, so keeping this behaviour is not desirable.

You can run the tests at c2078cb to see the problems with the current implementation.

Notable test failures @ c2078cb

These also fail in the same way for Normal-block mode.

FAIL in tests/test_align.lua | Align | all modes | works with wide characters + args { "Visual-block" }:
Failed expectation for equality.
Left:  { "ыыф  фццф", "ы фのфц фыфのфцф" }
Right: { "ыыф  фццф", "ы фのфц ф" }
Traceback:
    tests/test_align.lua:1163
    tests/test_align.lua:1314
FAIL in tests/test_align.lua | Align | blockwise mode | correctly handles virtual column bounds + args { "Visual-block" }: tests/test_align.lua:1162: E5108: Error executing lua: vim/_editor.lua:0: nvim_exec2(): Vim(lua):E5108: Error executing lua [string ":lua"]:1: 'start' is higher than 'end'
stack traceback:
    [C]: in function 'nvim_buf_set_text'
    [string ":lua"]:1: in main chunk
    [C]: in function 'nvim_exec2'
    vim/_editor.lua: in function 'cmd'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1902: in function 'set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1765: in function 'region_set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1676: in function 'process_current_region'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:743: in function 'align_user'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1360: in function <.../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1353>
stack traceback:
    [C]: in function 'nvim_exec2'
    vim/_editor.lua: in function 'cmd'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1902: in function 'set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1765: in function 'region_set_text'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1676: in function 'process_current_region'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:743: in function 'align_user'
    .../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1360: in function <.../.local/share/nvim/lazy/mini.nvim/lua/mini/align.lua:1353>
Traceback:
    tests/test_align.lua:1162
    tests/test_align.lua:1345

When the blockwise selection is in to-EOL mode (when the last horizontal
motion extends to EOL, like $) the selection on each line extends to EOL
regardless of the position of the rightmost selection corner. With
'virtualedit', the selection extends to the column one character past
the end of the longest line in the region.

getcurpos() seems to be the only way to detect if the blockwise
selection is in to-EOL mode, as the last field (curswant) is v:maxcol if
the cursor is in to-EOL mode, which is normally true when the active
blockwise selection is to-EOL. Swapping to one of the left corners with
o or O deactivates to-EOL mode, so this isn't a problem. This is fine
for visual mode, but requires an awkward workaround in operator mode:
    1. Save cursor position
    2. Restore visual selection
    3. Call getcurpos()
    4. Exit visual mode
    5. Restore cursor position

There is one exception where the cursor position doesn't match the
to-EOL-ness of the selection, which is after the '> mark has been
manually set, such as via m>.

See vim/vim#15587
Subtracts 1 character column not virtual column when adjusting for
exclusive selection.

Passes list = true to virtcol() to get both the left and right virtcols
for wide characters.

Renames region_virtcols() to block_region_virtcols() as it currently
isn't used for other modes and needs slightly different logic to handle
exclusive non-blockwise regions.
Adds a fallback for v:maxcol for < 0.9.0. Adds workaround for getting
both left and right virtcol bounds in pos_to_virtcol() for < 0.10.0 as
the list flag for virtcol() requires 0.10.0.

Also replaces next_char_boundary() with byte_range_from_virtcols(). This
allows both call sites to be cleaner, and the new implementation is
robust against virtcol2col() returning different byte positions for
multibyte characters on different versions. On 0.10.0 and later
virtcol2col() returns the index of the first byte of the character at
the specified position, but it returns the index of the last byte on
older versions. It would actually save a lot of hassle if it were
possible to get both these values, as the weird workaround with
byteidx() and charidx() wouldn't be necessary.
@echasnovski
Copy link
Member

@Diomendius, thanks for the work! I'll take it from here after the next release of 'mini.nvim'.

echasnovski added a commit that referenced this pull request Dec 29, 2024
@echasnovski
Copy link
Member

@Diomendius, thanks again for this huge amount of meticulous work in the PR!

After reading into this more closely, here is the final result of what I think is reasonable to have:

  • The main usage of virtcol2col() with adjustment to make 'mini.align' work in presence of wide and composing characters is reasonable after some refactor. The main goal was to have it fixed with as simple code as possible. I refactored it to use the same code snippet for both region_get_text and region_set_text (in blockwise mode) plus small adjustments depending on Neovim version.

    Tests for these are a bit over the top in this PR, so I decided to keep only necessary minimum.

  • Supporting 'ambiwidth=double' will probably be out of scope for the whole 'mini.nvim'. Because having it enabled kind of breaks most of modules altogether (like usage of icons, listchars/fillchars, etc.). Plus it seems to be mostly about visual representation of the text, which I don't think is a good thing to depend on when aligning text.

  • Supporting "to EOL" blockwise selection although good, actually doing so seems too hacky and not clean. Plus this type of selection also is not supported in 'mini.surround' and 'mini.operators' (maybe others also) so is better for consistency.

    The suggested way to approach "to EOL" blockwise selection is to have 'virtualedit=block' and selecting manually the exact rectangle. Yes, this is not ideal, but will have to be enough. At least for while until there is no pressing issues with modules other than supporting that across the whole 'mini.nvim'.

  • Supporting the case of "all edges with selection=exclusive" also seems to not worth the extra code complexity. The suggested approach is to select across topleft-bottomright diagonal which should work.

In the future I highly suggest first opening an issue to discuss the problem before making the PR. This will most certainly save time for both parties.

@Diomendius
Copy link
Contributor Author

I skimmed your comment and commit when you made them, but didn't reply at the time. Thanks for what you've done here. I don't blame you for omitting support for ambiwidth or the edge cases of blockwise exclusive selection, and not just because I don't use them myself.

I happened to revisit your comment today and noticed your comments on workarounds for blockwise to-EOL selection. Using virtualedit=block to explicitly select a region past EOL seems to work fine with mini.align, but with mini.surround it actually doesn't work correctly, while blockwise to-EOL selection does, even if you didn't intend it to.

mini.surround seems to effectively ignore virtualedit by clamping both the start and end columns to the EOL column (one past the last character) of their respective lines. There are also some edge cases around wide characters, but nothing major and nothing as bad as the problems in this issue. I don't know if or when I'll make a detailed issue about this, but I might as well mention this in case someone finds it useful.

I think what this really highlights is the deficiencies in the Nvim API regarding selections. Having a single representation of a visual selection that unambiguously indicates which cells are included, regardless of selection order or selection=exclusive or anything else, would be a huge benefit to almost every plugin that needs to handle selections in any way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants