Strange character sequences (ANSI escape sequences?) leaking into shell prompt when starting/attaching Tmux #2275

Open
3 tasks done
primeos-work opened this issue Sep 25, 2024 · 15 comments

Comments

@primeos-work

primeos-work commented Sep 25, 2024

Prerequisites

  • Write a descriptive title.
  • Make sure you are able to repro it on the latest version
  • Search the existing issues.

Steps to reproduce

Connect to a Linux/Unix host via ssh and attach to a tmux session. Strangely the results seem to differ based on the target OS, e.g.:

  • On an EoL CentOS 7 and a Fedora Linux 40 system I reliably get 61;6;7;21;22;23;24;28;32;42c (basically every time)
  • On a Debian 12 host I get 0;10;1c but only on every third/fourth attempt or so

Expected behavior

I can attach to my Tmux session and it looks exactly as I left it.
This usually means that the last line of the active Tmux window/pane contains an empty shell prompt like this:

[michael@groot ~]$

Actual behavior

I get some additional random characters ("garbage") behind the prompt, e.g.:

[michael@groot ~]$ 61;6;7;21;22;23;24;28;32;42c

Error details

No response

Environment data

I am using Windows Terminal, and it doesn't matter whether I use "Windows PowerShell", "Command Prompt", or "Git Bash". When I use "Git Bash" via Windows Terminal, everything works as expected with /usr/bin/ssh (OpenSSH_9.7p1, OpenSSL 3.2.1 30 Jan 2024) but not with /c/Program\ Files/OpenSSH/ssh (OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2), which is why I concluded that this seems to be an OpenSSH for Windows bug. A second indication is that I don't get those unexpected character sequences with SSH_TERM_CONHOST_PARSER=0 (but that also makes the rendering much slower, and I get significant visual glitches such as a green background color, probably due to the status line background color from tmux).

Anyway, here is the desired output but I don't expect it to be relevant in this case:

Name                           Value
----                           -----
PSVersion                      5.1.19041.4894
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.19041.4894
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Version

OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2

Visuals

No response

@primeos-work
Author

primeos-work commented Sep 25, 2024

Please also note that there are already a lot of related issues/reports but not in this repository.
I tried the following workarounds/solutions but none of them worked for me:

  • set -sg escape-time 1 in ~/.tmux.conf
  • I don't use any Tmux plugins (so I'm, e.g., not affected by tmux-sensible overriding escape-time)

Some links to the reports that I found:

Other relevant links:

So my conclusions are the following:

  • This affects lots of people and the real source of the issue is still unknown
  • Only users on Windows seem to be affected by it
  • Weirdly the set -sg escape-time 1 workaround seems to resolve the Tmux issues for most users but not for me

@yoctozepto

yoctozepto commented Sep 26, 2024

On OpenSSH_for_Windows_9.8 Win32-OpenSSH-GitHubp1, LibreSSL 3.8.2 as built from PowerShell/openssh-portable@414d853 it happens almost every time tmux is run and no workaround helps. But it happens only in Windows Terminal. The legacy command host is not affected so it's most likely an issue of the Windows Terminal. However, it also does not happen with WSL2-based ssh even though it's still in Windows Terminal.

@primeos-work
Author

primeos-work commented Sep 27, 2024

Thanks for the additional data @yoctozepto! :)

I think it's the combination of the terminal and OpenSSH for Windows: the actual bug is in OpenSSH for Windows, and the terminal only matters because it determines the response to tmux's ESC [ c (\033[c) "Request terminal attributes" query (if https://serverfault.com/questions/1130064/why-is-the-windows-11-terminal-pushing-escape-sequences-to-tmux is correct).

I did run two additional tests yesterday:

To be sure, I did another test today via Ubuntu 24.04 LTS in WSL 2: I installed and started openssh-server in the Ubuntu VM/container, started a Tmux session, and then compared a "direct" terminal with an SSH connection:

  • When I open a "direct" terminal to the Ubuntu system by selecting the "Ubuntu 24.04 LTS" profile on Windows Terminal I can attach to the Tmux session without triggering the bug (I didn't manage to get any "strange characters" even once)
  • When I use the "Windows PowerShell" profile and OpenSSH for Windows (ssh $USER@localhost) I consistently get 61;6;7;14;21;22;23;24;28;32;42c0;10;1c every time.
    • With $env:SSH_TERM_CONHOST_PARSER = '0'; ssh localhost, the bug/problem goes away completely (although I get lots of visual glitches / rendering issues instead...).

PS: I can also trigger the bug with echo instead of tmux like this (or echo -e '\e[c', etc.):

mweiss@groot ~ $ echo -e '\033[c'

mweiss@groot ~ $ 1;2c

@yoctozepto

OK, so it truly is the Win32-OpenSSH - that is the common denominator here. It needs certain conditions to happen but it does not happen without it.

@mgkuhn

mgkuhn commented Nov 6, 2024

Sounds like a tmux bug to me. It is normal and expected behaviour that sending a device-attributes request of the form ESC [ c or ESC [ 0 c to a VT100-style terminal causes the terminal to respond with a sequence of the form ESC [ ? number ; number ; ... ; number c.

If an application like tmux sends such a request, it has to wait for and parse the entire response, and not cause it to echo back to the terminal for the user to see. This waiting and parsing by tmux doesn't seem to happen correctly here. The question is why tmux doesn't swallow the response.

On my old Linux xterm, I get back

ESC [ ?64;1;2;6;9;15;16;17;18;21;22;28c

On Windows Terminal on current Windows 11, I get instead

ESC [ ?61;6;7;14;21;22;23;24;28;32;42c

Different numbers, but still correct syntax that tmux should swallow.
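To make "correct syntax" concrete, here is a minimal Python sketch of a recognizer for that reply form. It is only an illustration and not how tmux parses the reply; the names DA1_REPLY and parse_da1 are made up for this example.

```python
import re

# Primary device-attributes reply: ESC [ ? number ; number ; ... c
# (pattern name and function name are hypothetical, chosen for this sketch)
DA1_REPLY = re.compile(rb"\x1b\[\?(\d+(?:;\d+)*)c")


def parse_da1(reply: bytes):
    """Return the list of numeric parameters, or None if the syntax is wrong."""
    match = DA1_REPLY.fullmatch(reply)
    if match is None:
        return None
    return [int(n) for n in match.group(1).decode("ascii").split(";")]
```

Both replies quoted above match this pattern, while the text that leaks into the prompt (the same reply without its leading ESC [ ?) does not.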

I don't think this is anything that OpenSSH gets involved in. Instead, I suspect that the VT100 terminal emulation built into Windows Terminal (or ConPTY?) is returning something that tmux didn't expect, even though it looks syntactically correct to me. Check in the source code of tmux the state machine that is meant to process the reply to that request.

@mgkuhn

mgkuhn commented Nov 6, 2024

> PS: I can also trigger the bug with echo instead of tmux like this (or echo -e '\e[c', etc.)

@primeos-work That is expected behaviour and not a bug. (Or you could say it's a bug in your use of the echo command ... ;-) Your echo command asked the terminal to reply to a device-attributes request and then finished without reading the reply. So the reply now goes to the shell, which didn't expect it and treats it like user keyboard entry.
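For contrast, a program that sends the request and also reads the reply leaves nothing behind for the shell. The following is a rough Python sketch under simple assumptions (Unix-like system, the reply ends with the letter c); the function name query_device_attributes is hypothetical.

```python
import os
import select
import termios
import tty


def query_device_attributes(fd_in, fd_out, timeout=1.0):
    """Send ESC [ c on fd_out and consume the terminal's reply from fd_in."""
    # Raw mode only makes sense on a real terminal; other descriptors are left as-is.
    saved = termios.tcgetattr(fd_in) if os.isatty(fd_in) else None
    try:
        if saved is not None:
            tty.setraw(fd_in)  # no echo, no line buffering while we wait
        os.write(fd_out, b"\x1b[c")
        reply = b""
        while not reply.endswith(b"c"):
            # Wait up to `timeout` seconds for each further piece of the reply.
            ready, _, _ = select.select([fd_in], [], [], timeout)
            if not ready:
                break  # nothing (more) arrived in time
            chunk = os.read(fd_in, 64)
            if not chunk:
                break  # end of input
            reply += chunk
        return reply
    finally:
        if saved is not None:
            termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
```

Called as query_device_attributes(0, 1) from an interactive terminal, this should return the reply bytes instead of letting them show up after the prompt.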

@yoctozepto

Since the issue seems random, could it be that the speed at which these characters are produced/transferred is causing it? As in, tmux does not wait long enough for the consumption.

@mgkuhn

mgkuhn commented Nov 7, 2024

> Since the issue seems random, could it be that the speed at which these characters are produced/transferred is causing it? As in, tmux does not wait long enough for the consumption.

Yes, that seems likely, and would explain reports that increasing the tmux configuration parameter escape-time alleviates the problem for many users.

Looking at tmux/tty-keys.c, the unit for escape-time is milliseconds and looking at tmux/options-table.c the default is currently 10 ms (in tmux-3.5, down from previously 500 ms in tmux-3.4 after this commit by @nicm). I would increase this to at least 100, or five times the top-quartile ping time to the server, whichever is higher.
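As a worked example of that rule of thumb, here is a small Python sketch. It assumes "top-quartile ping time" means the 75th percentile of a set of ping samples in milliseconds; the function name and its parameters are invented for this illustration.

```python
import math
import statistics


def suggested_escape_time_ms(ping_samples_ms, floor_ms=100, multiplier=5):
    """Return max(floor, multiplier * third quartile of the ping samples)."""
    # statistics.quantiles needs at least two samples; index 2 is the third quartile.
    q3 = statistics.quantiles(ping_samples_ms, n=4)[2]
    return max(floor_ms, math.ceil(multiplier * q3))
```

With pings of around 5 ms this yields the floor of 100, and with pings of around 50 ms it yields 250, which would then go into ~/.tmux.conf as set -sg escape-time 250.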

Looking at Nagle's algorithm, I suspect escape-time should be a small multiple of the round-trip time, as TCP may hold back the rest of a device-attribute response until it has received the TCP ACK for the start of the response, in case the response is split across two IP packets.
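For readers unfamiliar with it: an application can switch off Nagle's algorithm per socket with the TCP_NODELAY option. The Python sketch below only shows what that option looks like; whether the ssh client involved here sets it has not been established in this thread, and the helper name disable_nagle is made up.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY so small writes are sent without waiting for an ACK."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back to confirm that it took effect.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```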

There may be a conflict of interest with vi users who may have to wait for escape-time for their ESC key presses to be passed through to their favourite editor. One solution is perhaps to temporarily increase that wait time back to 500 ms after tmux has sent out its device-attributes request?

@mgkuhn

mgkuhn commented Nov 7, 2024

One thing Windows Terminal and ConPTY folks might want to check is if their device-attributes response is really being sent out via a single write() system call (as ESC sequences always should be). If not, splitting such a response across several system calls might risk substantially increasing the chances that the response will end up spread over several IP packets that do not arrive within the 10 ms patience time that tmux-3.5 allows before it concludes that this might have been a manual press of the ESC key and not a response sequence.
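The suspected failure mode can be sketched with a toy model in Python. This is a deliberately simplified simulation, not tmux's actual input state machine: it assumes a buffered partial sequence is given up as keystrokes once the gap to the next chunk exceeds escape-time. All names are hypothetical.

```python
import re

# Complete device-attributes reply: ESC [ ? number ; number ; ... c
DA_RESPONSE = re.compile(rb"\x1b\[\?[0-9;]*c")


def forwarded_as_keys(chunks, escape_time_ms):
    """chunks: list of (arrival_ms, data). Returns bytes treated as keystrokes."""
    forwarded = b""
    pending = b""
    last_arrival = None
    for arrival_ms, data in chunks:
        if pending and arrival_ms - last_arrival > escape_time_ms:
            # The timer expired before the rest arrived: give up on the sequence.
            forwarded += pending
            pending = b""
        pending += data
        last_arrival = arrival_ms
        # A complete reply is swallowed silently.
        pending = DA_RESPONSE.sub(b"", pending)
        # Data that does not start with ESC cannot begin a reply.
        if pending and not pending.startswith(b"\x1b"):
            forwarded += pending
            pending = b""
    # Whatever is still buffered at the end is eventually flushed as keys.
    return forwarded + pending
```

In this model, a reply split into ESC [ ? at 0 ms and the remainder at 25 ms is forwarded in full with an escape-time of 10 and swallowed completely with an escape-time of 500.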

@mgkuhn

mgkuhn commented Nov 7, 2024

Looking in Windows Terminal at terminal/src/terminal/adaptDispatch.cpp:AdaptDispatch::DeviceAttributes() and _ReturnCsiResponse we can see where the device-attributes response of Windows Terminal comes from. It is one wide string L"?61;4;6;7;14;21;22;23;24;28;32;42c", apart from the initial ESC [. All of this may have to go through a UTF-8 encoder and thus might end up in separate write() system calls and separate TCP packets?
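The safe pattern is to assemble prefix and parameters into one buffer, encode it once, and hand it to a single write call. The Python sketch below only illustrates that idea; it is not Windows Terminal's code, and send_csi_response is a hypothetical name.

```python
import os


def send_csi_response(fd, params):
    """Encode ESC [ plus the parameter string once and write it in one call."""
    payload = ("\x1b[" + params).encode("utf-8")
    written = os.write(fd, payload)  # one system call for the whole sequence
    return payload, written
```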

@nicm

nicm commented Nov 7, 2024

Maybe try something like this, although last time I was asked about anything like this it turned out the terminal was sending the response twice, so you would want to be sure that is not happening (tmux logs or strace tmux is the easiest way):

--- tty-keys.c  4 Oct 2024 14:55:17 -0000       1.182
+++ tty-keys.c  7 Nov 2024 14:44:35 -0000
@@ -917,9 +917,13 @@ partial_key:
        }

        /* Get the time period. */
-       delay = options_get_number(global_options, "escape-time");
-       if (delay == 0)
-               delay = 1;
+       if ((tty->flags & TTY_ALL_REQUEST_FLAGS) != TTY_ALL_REQUEST_FLAGS)
+               delay = 500;
+       else {
+               delay = options_get_number(global_options, "escape-time");
+               if (delay == 0)
+                       delay = 1;
+       }
        tv.tv_sec = delay / 1000;
        tv.tv_usec = (delay % 1000) * 1000L;

@mgkuhn

mgkuhn commented Nov 7, 2024

@nicm Thanks for the suggestion. I would probably modify it to something more like

delay = options_get_number(global_options, "escape-time");
if (delay == 0)
        delay = 1;
if (delay < 500 && (tty->flags & TTY_ALL_REQUEST_FLAGS) != TTY_ALL_REQUEST_FLAGS)
        delay = 500;

such that you don't override an even higher user preference for that delay.
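The intended behaviour of this variant can be checked with a small Python transliteration. It is a sketch only: the boolean all_request_flags_set stands in for the C expression (tty->flags & TTY_ALL_REQUEST_FLAGS) == TTY_ALL_REQUEST_FLAGS, and the function name is made up.

```python
def effective_delay_ms(escape_time_ms, all_request_flags_set):
    """Mirror the C snippet: never below 1 ms, at least 500 ms while flags are unset."""
    delay = escape_time_ms
    if delay == 0:
        delay = 1
    if delay < 500 and not all_request_flags_set:
        delay = 500
    return delay
```

So an escape-time of 10 becomes 500 while the request flags are not all set and drops back to 10 afterwards, whereas a user setting of 1000 stays at 1000 in both cases.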

@yoctozepto

I do still wonder, however, why this does not trigger when WSL2 SSH is run in Windows Terminal. Maybe only then do the bytes get sent in different packets or buffered too long? Could it have to do with Windows networking? Especially how it applies (or doesn't apply) Nagle's algorithm?

@mgkuhn

mgkuhn commented Nov 7, 2024

@yoctozepto If you run ssh in WSL2, you use the Linux-kernel implementation of TCP, which does all kinds of very clever things, like the tcp_autocorking option that “tries to coalesce small writes (from consecutive write(2) and sendmsg(2) calls) as much as possible, in order to decrease the total number of sent packets” (see man tcp). Maybe that helps the entire Windows Terminal device-attribute response to stay in one packet?

I can't find anything similar mentioned among the Windows TCP features.

(If you are really curious, try disabling TCP autocorking in WSL2 with echo 0 >/proc/sys/net/ipv4/tcp_autocorking and see if this makes the problem more likely.)

@yoctozepto

> @yoctozepto If you run ssh in WSL2, you use the Linux-kernel implementation of TCP, which does all kinds of very clever things,

Yup, this is why I am suggesting it might be the Windows vs Linux networking subsystem that causes the discrepancy.

> like the tcp_autocorking option that “tries to coalesce small writes (from consecutive write(2) and sendmsg(2) calls) as much as possible, in order to decrease the total number of sent packets” (see man tcp). Maybe that helps the entire Windows Terminal device-attribute response to stay in one packet?
>
> I can't find anything similar mentioned among the Windows TCP features.
>
> (If you are really curious, try disabling TCP autocorking in WSL2 with echo 0 >/proc/sys/net/ipv4/tcp_autocorking and see if this makes the problem more likely.)

Checked - but it has no effect on the outcome.
