What is the current status of locale support (particularly `en_US.UTF-8` and `C`)? #23010

akinomyoga · 2025-01-23T10:36:27Z

Although the default locale in Termux seems to be en_US.UTF-8, the locale support of Termux appears to be incomplete. Here, I'd like to know the latest information about how it is incomplete (what's available and what's not) and how we could work around issues related to incomplete locale support better. There doesn't seem to be any official documentation about the locale support.

Original issue

Suppose one wants to manipulate the bytes of binary data (containing no NUL) stored in a shell variable within a Bash script. Usually, one can achieve this by setting LC_CTYPE=C and then counting the bytes with ${#data} or accessing a single byte with ${data:index:1}. However, this doesn't seem to work in Termux, as the following example shows:

$ (LC_CTYPE=C; a=$'\xE3\x81\x82'; echo "${#a}")
1

Although we expect 3 as the result, Bash returns 1 in Termux. In all the other environments, 3 is obtained as expected.
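For reference, the two candidate answers are two ways of measuring the same three bytes. The C sketch below is only an illustration: `utf8_code_points` is a name made up here, it assumes valid UTF-8 input, and it does not consult the locale at all. `strlen` gives the byte count that `LC_CTYPE=C` is expected to produce, and the helper gives the code-point count that a UTF-8 locale produces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 code points by counting every byte
   that is not a continuation byte (bit pattern 10xxxxxx). It assumes
   the input is valid UTF-8 and ignores the current locale entirely. */
static size_t utf8_code_points(const char *s) {
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Byte count, i.e. what ${#a} should report under LC_CTYPE=C. */
static size_t byte_count(const char *s) {
    assert(s != NULL);
    return strlen(s);
}
```

For the string `"\xE3\x81\x82"`, `byte_count` returns 3 and `utf8_code_points` returns 1, which are exactly the expected and the observed values above.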

Past discussions

There is an old discussion from 2020:

Does Termux has the 'C' locale? #5845

The issue asked whether the locale C is available. The answer was that Termux doesn't support locales. However, it is unclear what happens when no locales are supported. If locales were literally not supported at all, many of the basic C APIs would be unavailable (e.g., printf, isalpha, tolower, and strftime all depend on the current locale). It is therefore reasonable to think that some locale is assumed for the operations that rely on one. What is that?

A Unix & Linux Stack Exchange question from 2021

How to detect that POSIX locale is not provided on POSIX shellscript and POSIX utilities, portablily? - Unix & Linux Stack Exchange

states that

Termux does never serve locale command nor POSIX locale; only en_US.UTF-8 is available

which implies that en_US.UTF-8 was introduced between 2020 and 2021.

There is also a comment in a discussion from 2022:

Postgres - add collation #5996 (comment)

The comment says

Bionic libc does not support locales other than UTF-8 or C (POSIX).

which contradicts the first statement from 2020. Does this mean that Termux/Bionic introduced some support for the locales en_US.UTF-8 and C between 2020 and 2022?

However, as of 2025, the locale C is incomplete as illustrated in the first example. Even for en_US.UTF-8, another issue from 2023 reports that en_US.UTF-8 is unsupported (or not complete enough to pass the tests):

Configure the system locale to please `nvim +checkhealth` within `tmux`: Locale does not support UTF-8 termux-app#3187

Those four statements from past discussions are not consistent with each other, so I think some of them (or all) are untrustworthy. If all of them are somewhat correct, I guess it would mean that Termux supports neither en_US.UTF-8 nor C, but an unspecified amalgam of the two. Or it might be switching back and forth between en_US.UTF-8 and C every single year.

Bionic libc

The third statement mentions Bionic libc, so can I assume that Termux packages use Bionic as the C standard library? I also tried to look up information in Bionic. However, Bionic doesn't seem to have a place to report an issue or ask questions. Instead, I found the following comment in /libc/bionic/locale.cpp of the Bionic codebase:

// We only support two locales, the "C" locale (also known as "POSIX"),
// and the "C.UTF-8" locale (also known as "en_US.UTF-8").

This seems to imply that Bionic supports both C and en_US.UTF-8 (a synonym of C.UTF-8) separately. This comment has existed at least since 2016, which is inconsistent with the observation above.

I also found a mention of locales in the documentation (boldfaced by me):

Locales. Although bionic contains the various _l() functions, the only locale supported is a UTF-8 C/POSIX locale. Most of the POSIX APIs are insufficient to support the wide range of languages used by Android users, and apps should use icu4c (or do their i18n work in Java) instead.

This part of the documentation seems to have been introduced by commit aosp-mirror/platform_bionic@046fe15, whose commit message says

Explicitly mention bionic's single C.UTF-8 locale.

So it seems to imply that Bionic actually supports only en_US.UTF-8 (a synonym of C.UTF-8). If this is true, it seems to me that the code comment in Bionic's locale.cpp would be wrong. Or the support for the C locale might have been dropped at some point between 2015 and 2022.

I'm not sure which information I should believe. In either case, the behavior is not consistent with the past reports for Termux. One possibility would be that the upstream Bionic and the Bionic used by Termux are actually different versions. Another would be that Termux uses Bionic only partially, and the locale part might have extensions or modifications.
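One way to check this empirically, rather than from documentation, is to ask the C library directly. The sketch below is only an illustration: `struct locale_probe` and `probe_locale` are names invented here, and it relies solely on the standard `setlocale`, `MB_CUR_MAX`, and `nl_langinfo(CODESET)` interfaces. Values that come from macros or inline functions in patched build headers would reflect those headers rather than the library, so the output needs to be read with that in mind.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of what the C library reports for one locale name. */
struct locale_probe {
    int accepted;       /* 1 if setlocale(LC_ALL, name) returned non-NULL */
    size_t mb_cur_max;  /* MB_CUR_MAX while that locale is active */
    char codeset[32];   /* nl_langinfo(CODESET) while that locale is active */
};

/* Try to activate `name` for all categories and record the result.
   The process locale stays changed when the name is accepted. */
static struct locale_probe probe_locale(const char *name) {
    struct locale_probe p = {0, 0, ""};

    assert(name != NULL);           /* NULL would only query, not set */
    if (setlocale(LC_ALL, name) == NULL)
        return p;                   /* the C library rejected this name */
    p.accepted = 1;
    p.mb_cur_max = MB_CUR_MAX;
    /* p.codeset is zero-initialized, so copying at most size-1 bytes
       keeps it NUL-terminated. */
    strncpy(p.codeset, nl_langinfo(CODESET), sizeof p.codeset - 1);
    return p;
}
```

Calling `probe_locale` with `"C"`, `"C.UTF-8"`, and `"en_US.UTF-8"` and printing the three fields would show which names are accepted and whether `C` really behaves as a single-byte locale separate from the UTF-8 one.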

Timeline

To summarize the timeline, we could make the following table for the locale support:

| Year | Termux | Bionic |
|------|--------|--------|
| 2015 | | `C` and `en_US.UTF-8` |
| 2020 | None | |
| 2021 | `en_US.UTF-8` | |
| 2022 | `C` or `en_US.UTF-8` | `en_US.UTF-8` |
| 2023 | ~~`en_US.UTF-8`~~ (broken) | |
| 2025 | ~~`C`~~ (broken) | |

Every piece of the information is inconsistent, so I'm confused about which information would be really trustworthy, and what would be the relationship between the C library used in Termux packages and the upstream Bionic.

Questions

What is the true situation of the locale support? In particular, I'd like trustworthy and certain answers rather than guesses like the contradictory statements above.
What is the relation between the C library used in Termux and Bionic?
Can I use the C locale (which is separate from C.UTF-8 / en_US.UTF-8)? If not, would it be supported in the future?

algorythmic · 2025-03-13T12:19:21Z

To address your background question: Bionic libc is the C library (on device) that the packages in this repo link with. The runtime is whatever version is shipped on the user's device, but the system headers used for building the Termux environment (or building code within it) are a modified version of the Android NDK.

Bionic libc does provide reasonably compliant C (alias POSIX) and C.UTF-8 (alias en_US.UTF-8) locales--and only those. However, the setlocale implementation historically had (and still has) significant deficiencies, which Termux tries to deal with, with patches both to the NDK header files and to package code.

While the default locale is the C.UTF-8 one, in Android 7.0 setlocale(..., "") would set it back to C.

Termux somewhat papered over this by:

- changing the stdlib.h header to hardcode MB_CUR_MAX to 4 (see a328a50)
- ad-hoc patching of setlocale and MB_CUR_MAX usage in packages, e.g. https://github.com/termux/termux-packages/blob/master/packages/sed/fix-locale.patch
- probably other stuff I'm not aware of

This issue was later somewhat fixed for Android 8.0 (API 26). Now, setlocale(..., "") sets the locale to C.UTF-8. However, it still does not check the LC_* environment variables. (Sidenote: Bionic doesn't really have separate locale categories--setting any valid category is the same as setting them all--and I suspect this will never be implemented)

Anyway, I have no insight into why Termux took the path above for dealing with this. What I do currently for myself is:

- revert the MB_CUR_MAX patch for stdlib.h
- add an inline wrapper to locale.h:

/* Implement the algorithm from
   https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html */
static inline const char* __locale_from_env(int cat) {
    static const char * const cat_names[] = {
        "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES",
        "LC_ALL", "LC_PAPER", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE", "LC_MEASUREMENT",
        "LC_IDENTIFICATION" };
    const char *name;

    if (cat < 0 || (size_t)cat >= sizeof cat_names / sizeof cat_names[0])
        return "";
    if ((name = getenv("LC_ALL")) && *name)
        return name;
    if (cat != LC_ALL && (name = getenv(cat_names[cat])) && *name)
        return name;
    if ((name = getenv("LANG")) && *name)
        return name;
    return ""; /* Or "C.UTF-8" for Android <= 7 support */
}

static char* _Nullable __termux_repl_setlocale(int __category, const char* _Nullable __locale_name) {
    if (__locale_name && *__locale_name == '\0')
        __locale_name = __locale_from_env(__category);

    return setlocale(__category, __locale_name);
}
#define setlocale __termux_repl_setlocale
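The precedence that the wrapper implements (LC_ALL first, then the category's own variable, then LANG) can be exercised without touching the real environment. The following is a sketch for illustration and not part of the proposed header change: `resolve_locale_name` is a hypothetical pure function whose three arguments stand in for the results of the three `getenv` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pure version of the lookup. The arguments stand in for
   getenv("LC_ALL"), getenv("LC_<category>") and getenv("LANG");
   NULL or "" means "unset". Returns "" when nothing is set. */
static const char *resolve_locale_name(const char *lc_all,
                                       const char *lc_category,
                                       const char *lang) {
    if (lc_all && *lc_all)
        return lc_all;          /* LC_ALL overrides everything */
    if (lc_category && *lc_category)
        return lc_category;     /* then the category-specific variable */
    if (lang && *lang)
        return lang;            /* then LANG */
    return "";                  /* leave the default to the C library */
}

/* Small self-check of the three precedence rules. */
static void resolve_locale_name_selfcheck(void) {
    /* LC_CTYPE=C wins over LANG=en_US.UTF-8 when LC_ALL is unset. */
    assert(strcmp(resolve_locale_name(NULL, "C", "en_US.UTF-8"), "C") == 0);
    /* An empty LC_ALL counts as unset. */
    assert(strcmp(resolve_locale_name("", NULL, "C.UTF-8"), "C.UTF-8") == 0);
    /* LC_ALL overrides both other variables. */
    assert(strcmp(resolve_locale_name("C", "C.UTF-8", "C.UTF-8"), "C") == 0);
}
```

Keeping the lookup free of `getenv` makes it easy to test the case from the original report, where only `LC_CTYPE=C` is set on top of a UTF-8 default.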

I'm glad you started this topic because I'd be very curious to know if there's any obvious issue with my approach. I'm also considering making a PR to integrate something like it and drop the various workarounds in the packages.

3 replies

robertkirkman Jun 30, 2025
Maintainer

@algorythmic hello,

I wanted to try your solution, so I tried applying it onto Termux and recompiling bash, then installing bash and restarting Termux, but unfortunately, a problem happened where bash has frozen and inputs are not working. I can see input echoed in the terminal, but nothing happens.

Welcome to Termux!

Docs:       https://termux.dev/docs
Donate:     https://termux.dev/donate
Community:  https://termux.dev/community

Working with packages:

 - Search:  pkg search <query>
 - Install: pkg install <package>
 - Upgrade: pkg upgrade

Report issues at https://termux.dev/issues

dwaa
aswdf
sefwrge

it's ok if you don't really have time to, but would you like to try to troubleshoot how to change Termux to have your way of patching setlocale() and fixing the problem with the command (LC_CTYPE=C; a=$'\xE3\x81\x82'; echo "${#a}")? If so, would you like me to explain the full steps of how I attempted to apply your solution to Termux bash so you can also try it?

algorythmic Jun 30, 2025

After applying the 3 changes here (I neglected to mention one necessary one earlier), try building bash 5.3-rc2 (rather than the master branch, which is currently bash 5.2) like:

git clone https://git.savannah.gnu.org/git/bash.git
cd bash
git switch bash-5.3-testing
./configure CFLAGS='-target aarch64-linux-android34' # adjust API level to match your device
make

robertkirkman Jul 1, 2025
Maintainer

Thank you for the advice,

I wanted to also mention that I did not ignore your advice to try the bash-5.3-testing branch. After your langinfo.h change, the freezing problem did stop, but for both 5.2 and 5.3 versions of bash, the freezing problem was replaced with the crashing problem described in the message below.

Here is the full patch which I applied in addition to the changes described in the message below (after first testing those with bash 5.2), in order to get test builds of readline 8.3 and bash 5.3 for Termux, combined with your changes.

patch to cross-compile the bash bash-5.3-testing branch for Termux

diff --git a/packages/bash/build.sh b/packages/bash/build.sh
index e19d6a2b9f..6d0180f893 100644
--- a/packages/bash/build.sh
+++ b/packages/bash/build.sh
@@ -2,12 +2,9 @@ TERMUX_PKG_HOMEPAGE=https://www.gnu.org/software/bash/
 TERMUX_PKG_DESCRIPTION="A sh-compatible shell that incorporates useful features from the Korn shell (ksh) and C shell (csh)"
 TERMUX_PKG_LICENSE="GPL-3.0"
 TERMUX_PKG_MAINTAINER="Joshua Kahn @TomJo2000"
-_MAIN_VERSION=5.2
-_PATCH_VERSION=37
-TERMUX_PKG_VERSION=${_MAIN_VERSION}.${_PATCH_VERSION}
-TERMUX_PKG_REVISION=2
-TERMUX_PKG_SRCURL=https://mirrors.kernel.org/gnu/bash/bash-${_MAIN_VERSION}.tar.gz
-TERMUX_PKG_SHA256=a139c166df7ff4471c5e0733051642ee5556c1cc8a4a78f145583c5c81ab32fb
+TERMUX_PKG_VERSION="5.3-testing"
+TERMUX_PKG_SRCURL=git+https://git.savannah.gnu.org/git/bash
+TERMUX_PKG_GIT_BRANCH="bash-$TERMUX_PKG_VERSION"
 TERMUX_PKG_AUTO_UPDATE=false
 TERMUX_PKG_DEPENDS="libandroid-support, libiconv, readline (>= 8.0), termux-tools"
 TERMUX_PKG_RECOMMENDS="command-not-found, bash-completion"
@@ -36,58 +33,6 @@ TERMUX_PKG_CONFFILES="etc/bash.bashrc etc/profile"
 
 TERMUX_PKG_RM_AFTER_INSTALL="share/man/man1/bashbug.1 bin/bashbug"
 
-termux_step_pre_configure() {
-	declare -A PATCH_CHECKSUMS
-
-	PATCH_CHECKSUMS[001]=f42f2fee923bc2209f406a1892772121c467f44533bedfe00a176139da5d310a
-	PATCH_CHECKSUMS[002]=45cc5e1b876550eee96f95bffb36c41b6cb7c07d33f671db5634405cd00fd7b8
-	PATCH_CHECKSUMS[003]=6a090cdbd334306fceacd0e4a1b9e0b0678efdbbdedbd1f5842035990c8abaff
-	PATCH_CHECKSUMS[004]=38827724bba908cf5721bd8d4e595d80f02c05c35f3dd7dbc4cd3c5678a42512
-	PATCH_CHECKSUMS[005]=ece0eb544368b3b4359fb8464caa9d89c7a6743c8ed070be1c7d599c3675d357
-	PATCH_CHECKSUMS[006]=d1e0566a257d149a0d99d450ce2885123f9995e9c01d0a5ef6df7044a72a468c
-	PATCH_CHECKSUMS[007]=2500a3fc21cb08133f06648a017cebfa27f30ea19c8cbe8dfefdf16227cfd490
-	PATCH_CHECKSUMS[008]=6b4bd92fd0099d1bab436b941875e99e0cb3c320997587182d6267af1844b1e8
-	PATCH_CHECKSUMS[009]=f95a817882eaeb0cb78bce82859a86bbb297a308ced730ebe449cd504211d3cd
-	PATCH_CHECKSUMS[010]=c7705e029f752507310ecd7270aef437e8043a9959e4d0c6065a82517996c1cd
-	PATCH_CHECKSUMS[011]=831b5f25bf3e88625f3ab315043be7498907c551f86041fa3b914123d79eb6f4
-	PATCH_CHECKSUMS[012]=2fb107ce1fb8e93f36997c8b0b2743fc1ca98a454c7cc5a3fcabec533f67d42c
-	PATCH_CHECKSUMS[013]=094b4fd81bc488a26febba5d799689b64d52a5505b63e8ee854f48d356bc7ce6
-	PATCH_CHECKSUMS[014]=3ef9246f2906ef1e487a0a3f4c647ae1c289cbd8459caa7db5ce118ef136e624
-	PATCH_CHECKSUMS[015]=ef73905169db67399a728e238a9413e0d689462cb9b72ab17a05dba51221358a
-	PATCH_CHECKSUMS[016]=155853bc5bd10e40a9bea369fb6f50a203a7d0358e9e32321be0d9fa21585915
-	PATCH_CHECKSUMS[017]=1c48cecbc9b7b4217990580203b7e1de19c4979d0bd2c0e310167df748df2c89
-	PATCH_CHECKSUMS[018]=4641dd49dd923b454dd0a346277907090410f5d60a29a2de3b82c98e49aaaa80
-	PATCH_CHECKSUMS[019]=325c26860ad4bba8558356c4ab914ac57e7b415dac6f5aae86b9b05ccb7ed282
-	PATCH_CHECKSUMS[020]=b6fc252aeb95ce67c9b017d29d81e8a5e285db4bf20d4ec8cdca35892be5c01d
-	PATCH_CHECKSUMS[021]=8334b88117ad047598f23581aeb0c66c0248cdd77abc3b4e259133aa307650cd
-	PATCH_CHECKSUMS[022]=78b5230a49594ec30811e72dcd0f56d1089710ec7828621022d08507aa57e470
-	PATCH_CHECKSUMS[023]=af905502e2106c8510ba2085aa2b56e64830fc0fdf6ee67ebb459ac11696dcd3
-	PATCH_CHECKSUMS[024]=971534490117eb05d97d7fd81f5f9d8daf927b4d581231844ffae485651b02c3
-	PATCH_CHECKSUMS[025]=5138f487e7cf71a6323dc81d22419906f1535b89835cc2ff68847e1a35613075
-	PATCH_CHECKSUMS[026]=96ee1f549aa0b530521e36bdc0ba7661602cfaee409f7023cac744dd42852eac
-	PATCH_CHECKSUMS[027]=e12a890a2e4f0d9c6ec1ce65b73da4fe116c8e4209bac8ac9dc4cd96f486ab39
-	PATCH_CHECKSUMS[028]=6042780ba2893daca4a3f0f9b65728592cd7bb6d4cebe073855a6aad4d63aac1
-	PATCH_CHECKSUMS[029]=125cacb37e625471924b3ee06c54cb1bf21b3b7fe0e569d24a681b0ec4a29987
-	PATCH_CHECKSUMS[030]=c3ff73230e123acdb5ac216921a386df8f74340459533d776d02811a1f76698f
-	PATCH_CHECKSUMS[031]=c2d1b7be2df771126105020af7fafa00fffd4deff4a4e45d60fc6a235bcba795
-	PATCH_CHECKSUMS[032]=7b9c77daeca93ff711781d7537234166e83ed9835ce1ee7dcd5742319c372a16
-	PATCH_CHECKSUMS[033]=013ec6cc10ad98060a7c34ed5c11187bcc5bf4510f32de0d545db89a9a52a2e2
-	PATCH_CHECKSUMS[034]=899fbb3b338048fe52a9c8252bf65ef1194cdff4f7a3fb3316f5f2396143232e
-	PATCH_CHECKSUMS[035]=821a0a47fa692bb0a39482728b1b396bf951e2912768fea6f3026c813c1913e5
-	PATCH_CHECKSUMS[036]=15c93f4936a5e5b88301f3ede767a23d3dd19635af2f3a91fb4cc0e560ca9057
-	PATCH_CHECKSUMS[037]=8a2c1c3b5125d9ae5b47882f7d2ddf9648805f8c67c13aa5ea7efeac475cda94
-
-	for PATCH_NUM in $(seq -f '%03g' ${_PATCH_VERSION}); do
-		PATCHFILE=$TERMUX_PKG_CACHEDIR/bash_patch_${PATCH_NUM}.patch
-		termux_download \
-			"https://mirrors.kernel.org/gnu/bash/bash-${_MAIN_VERSION}-patches/bash${_MAIN_VERSION/./}-$PATCH_NUM" \
-			$PATCHFILE \
-			${PATCH_CHECKSUMS[$PATCH_NUM]}
-		patch -p0 -i $PATCHFILE
-	done
-	unset PATCH_CHECKSUMS PATCHFILE PATCH_NUM
-}
-
 termux_step_post_make_install() {
 	sed -e "s|@TERMUX_PREFIX@|$TERMUX_PREFIX|g" \
 		-e "s|@TERMUX_HOME@|$TERMUX_ANDROID_HOME|g" \
diff --git a/packages/bash/config-top.h.patch b/packages/bash/config-top.h.patch
index 3f9ebaaca7..b401a9072c 100644
--- a/packages/bash/config-top.h.patch
+++ b/packages/bash/config-top.h.patch
@@ -10,14 +10,21 @@ diff -uNr bash-5.0/config-top.h bash-5.0.mod/config-top.h
  #endif
  
  /* If you want to unconditionally set a value for PATH in every restricted
-@@ -74,7 +74,7 @@
+@@ -69,13 +69,13 @@
     the Posix.2 confstr () function, or CS_PATH define are not present. */
  #ifndef STANDARD_UTILS_PATH
  #define STANDARD_UTILS_PATH \
--  "/bin:/usr/bin:/sbin:/usr/sbin:/etc:/usr/etc"
+-  "/bin:/usr/bin:/sbin:/usr/sbin"
 +  "@TERMUX_PREFIX@/bin"
  #endif
  
+ /* The default path for enable -f */
+ #ifndef DEFAULT_LOADABLE_BUILTINS_PATH
+ #define DEFAULT_LOADABLE_BUILTINS_PATH \
+-  "/usr/local/lib/bash:/usr/lib/bash:/opt/local/lib/bash:/usr/pkg/lib/bash:/opt/pkg/lib/bash:."
++  "@TERMUX_PREFIX@/lib/bash:."
+ #endif
+ 
  /* Default primary and secondary prompt strings. */
 @@ -91,7 +91,7 @@
  #define DEFAULT_BASHRC "~/.bashrc"
diff --git a/packages/bash/lib-readline-complete.c.patch b/packages/bash/lib-readline-complete.c.patch
deleted file mode 100644
index c1164dd6b9..0000000000
--- a/packages/bash/lib-readline-complete.c.patch
+++ /dev/null
@@ -1,12 +0,0 @@
-diff -uNr bash-5.0/lib/readline/complete.c bash-5.0.mod/lib/readline/complete.c
---- bash-5.0/lib/readline/complete.c	2017-07-05 02:43:20.000000000 +0300
-+++ bash-5.0.mod/lib/readline/complete.c	2019-02-20 14:15:49.683440481 +0200
-@@ -2231,7 +2231,7 @@
- char *
- rl_username_completion_function (const char *text, int state)
- {
--#if defined (__WIN32__) || defined (__OPENNT)
-+#if defined (__WIN32__) || defined (__OPENNT) || defined (__ANDROID__)
-   return (char *)NULL;
- #else /* !__WIN32__ && !__OPENNT) */
-   static char *username = (char *)NULL;
diff --git a/packages/readline/build.sh b/packages/readline/build.sh
index a79c45053f..ed5213f89c 100644
--- a/packages/readline/build.sh
+++ b/packages/readline/build.sh
@@ -5,44 +5,13 @@ TERMUX_PKG_MAINTAINER="@termux"
 TERMUX_PKG_DEPENDS="libandroid-support, ncurses"
 TERMUX_PKG_BREAKS="bash (<< 5.0), readline-dev"
 TERMUX_PKG_REPLACES="readline-dev"
-_MAIN_VERSION=8.2
-_PATCH_VERSION=13
-TERMUX_PKG_VERSION=$_MAIN_VERSION.$_PATCH_VERSION
-TERMUX_PKG_SRCURL=https://mirrors.kernel.org/gnu/readline/readline-${_MAIN_VERSION}.tar.gz
-TERMUX_PKG_SHA256=3feb7171f16a84ee82ca18a36d7b9be109a52c04f492a053331d7d1095007c35
+TERMUX_PKG_VERSION="8.3-rc2"
+TERMUX_PKG_SRCURL=https://ftp.gnu.org/gnu/readline/readline-${TERMUX_PKG_VERSION}.tar.gz
+TERMUX_PKG_SHA256=f7a444d1ac5c3c21a8fa8130d0d616f9e5218b215a25c69ba8de053d695add44
 TERMUX_PKG_EXTRA_CONFIGURE_ARGS="--with-curses --enable-multibyte bash_cv_wcwidth_broken=no"
 TERMUX_PKG_EXTRA_MAKE_ARGS="SHLIB_LIBS=-lncursesw"
 TERMUX_PKG_CONFFILES="etc/inputrc"
 
-termux_step_pre_configure() {
-	declare -A PATCH_CHECKSUMS
-
-	PATCH_CHECKSUMS[001]=bbf97f1ec40a929edab5aa81998c1e2ef435436c597754916e6a5868f273aff7
-	PATCH_CHECKSUMS[002]=e06503822c62f7bc0d9f387d4c78c09e0ce56e53872011363c74786c7cd4c053
-	PATCH_CHECKSUMS[003]=24f587ba46b46ed2b1868ccaf9947504feba154bb8faabd4adaea63ef7e6acb0
-	PATCH_CHECKSUMS[004]=79572eeaeb82afdc6869d7ad4cba9d4f519b1218070e17fa90bbecd49bd525ac
-	PATCH_CHECKSUMS[005]=622ba387dae5c185afb4b9b20634804e5f6c1c6e5e87ebee7c35a8f065114c99
-	PATCH_CHECKSUMS[006]=c7b45ff8c0d24d81482e6e0677e81563d13c74241f7b86c4de00d239bc81f5a1
-	PATCH_CHECKSUMS[007]=5911a5b980d7900aabdbee483f86dab7056851e6400efb002776a0a4a1bab6f6
-	PATCH_CHECKSUMS[008]=a177edc9d8c9f82e8c19d0630ab351f3fd1b201d655a1ddb5d51c4cee197b26a
-	PATCH_CHECKSUMS[009]=3d9885e692e1998523fd5c61f558cecd2aafd67a07bd3bfe1d7ad5a31777a116
-	PATCH_CHECKSUMS[010]=758e2ec65a0c214cfe6161f5cde3c5af4377c67d820ea01d13de3ca165f67b4c
-	PATCH_CHECKSUMS[011]=e0013d907f3a9e6482cc0934de1bd82ee3c3c4fd07a9646aa9899af237544dd7
-	PATCH_CHECKSUMS[012]=6c8adf8ed4a2ca629f7fd11301ed6293a6248c9da0c674f86217df715efccbd3
-	PATCH_CHECKSUMS[013]=1ea434957d6ec3a7b61763f1f3552dad0ebdd6754d65888b5cd6d80db3a788a8
-
-	for PATCH_NUM in $(seq -f '%03g' ${_PATCH_VERSION}); do
-		PATCHFILE=$TERMUX_PKG_CACHEDIR/readline_patch_${PATCH_NUM}.patch
-		termux_download \
-			"http://mirrors.kernel.org/gnu/readline/readline-$_MAIN_VERSION-patches/readline${_MAIN_VERSION/./}-$PATCH_NUM" \
-			$PATCHFILE \
-			${PATCH_CHECKSUMS[$PATCH_NUM]}
-		patch -p0 -i $PATCHFILE
-	done
-
-	CFLAGS+=" -fexceptions"
-}
-
 termux_step_post_make_install() {
 	mkdir -p $TERMUX_PREFIX/lib/pkgconfig
 	cp readline.pc $TERMUX_PREFIX/lib/pkgconfig/

akinomyoga · 2025-06-26T11:55:58Z

Author

$ (LC_CTYPE=C; a=$'\xE3\x81\x82'; echo "${#a}")
1

In addition to the above issue, the behavior of gawk is also broken in Termux with LC_ALL=C:

$ echo | LC_ALL=C gawk $'/(\xE3\x81\x82)/'   # note: this is equivalent to '/(あ)/'
gawk: cmd. line:1: fatal: unbalanced (

The gawk version in Termux is 5.3.1. This doesn't happen in gawk in other systems (I checked the behavior in 5.3.0 and 5.3.1 in Fedora 41, and 5.3.2 in Arch).
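To narrow down whether the C library itself rejects such a pattern, the same bytes can be fed to the POSIX `regcomp` interface under the `C` locale. This is only a rough cross-check under stated assumptions: gawk may use its own bundled matcher rather than the libc one, so a success here says nothing definite about gawk, and `c_locale_regex_matches` is a name invented for this sketch.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Compile `pattern` as a POSIX extended regex under the "C" locale and
   try it against `subject`.
   Returns 1 on match, 0 on no match, and -1 if the "C" locale could not
   be selected or regcomp rejected the pattern. */
static int c_locale_regex_matches(const char *pattern, const char *subject) {
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (setlocale(LC_ALL, "C") == NULL)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;  /* e.g. the parser mis-decoding bytes >= 0x80 */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

In a single-byte locale, the pattern `"(\xE3\x81\x82)"` is just a parenthesized sequence of three bytes, so it should compile and match the same three bytes in the subject.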

0 replies

algorythmic · 2025-06-30T04:12:19Z

(Continuing discussion from #25149 (comment), cc: @TomJo2000)

One thing I neglected to mention is that for some programs (including bash) to work correctly we need a functional nl_langinfo(CODESET), as this is currently hardcoded to return "UTF-8".

So there are three total header changes needed:

stdlib.h

213c213,214
< #define MB_CUR_MAX 4
---
> size_t __ctype_get_mb_cur_max(void);
> #define MB_CUR_MAX __ctype_get_mb_cur_max()

langinfo.h

146c146
< 	if (item == CODESET) return "UTF-8";
---
> 	if (item == CODESET) return (MB_CUR_MAX == 1) ? "ASCII" : "UTF-8";

locale.h
Add setlocale wrapper to the end as above
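The idea behind the langinfo.h change is that, with only two locales available, the maximum character width is enough to tell them apart. A standalone sketch of that mapping follows; `codeset_for_width` and `current_codeset` are hypothetical names used here for illustration, and the real change lives inside the patched `nl_langinfo_l`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical standalone form of the patched CODESET branch: a maximum
   character width of one byte is taken to mean the "C" locale, anything
   larger is taken to mean the UTF-8 locale. */
static const char *codeset_for_width(size_t mb_cur_max) {
    assert(mb_cur_max >= 1);
    return (mb_cur_max == 1) ? "ASCII" : "UTF-8";
}

/* Codeset implied by whichever locale is active right now. This only
   works if MB_CUR_MAX is not hardcoded in stdlib.h. */
static const char *current_codeset(void) {
    return codeset_for_width(MB_CUR_MAX);
}
```

This also shows why the stdlib.h change has to come first: with `MB_CUR_MAX` fixed at 4, `current_codeset` could never report anything but UTF-8.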

After making these changes, a fresh build of bash behaves correctly:

$ a=$'\xE3\x81\x82'
$ (LC_CTYPE=C; echo "${#a}")
3
$ (LC_CTYPE=C.UTF-8; echo "${#a}")
1

I'd be happy to work on a PR for this in the near future.

3 replies

robertkirkman Jun 30, 2025
Maintainer

Thank you for explaining more,

Unfortunately, when I tried to apply your new changes onto Termux and recompile bash, the resulting build of bash now has a problem where it is crashing immediately and even a build with debug symbols crashes without any backtrace:

Reading symbols from bash...
(gdb) run
Starting program: /data/data/com.termux/files/usr/bin/bash 
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
(gdb)

Here are the exact commands I am using to try to apply your changes:

commands I tried

docker container kill termux-package-builder
docker container rm termux-package-builder
git clone https://github.com/termux/termux-packages.git
cd termux-packages/
git apply -v << 'EOF'
--- /dev/null
+++ b/ndk-patches/23c/locale.h.patch
@@ -0,0 +1,48 @@
+diff --git a/usr/include/locale.h b/usr/include/locale.h
+index 4924962..99dadf2 100644
+--- a/usr/include/locale.h
++++ b/usr/include/locale.h
+@@ -33,7 +33,7 @@
+ #include <xlocale.h>
+ 
+ #define __need_NULL
+-#include <stddef.h>
++#include <stdlib.h>
+ 
+ __BEGIN_DECLS
+ 
+@@ -105,7 +105,33 @@ void freelocale(locale_t __l) __INTRODUCED_IN(21);
+ locale_t newlocale(int __category_mask, const char* __locale_name, locale_t __base) __INTRODUCED_IN(21);
+ #endif /* __ANDROID_API__ >= 21 */
+ 
+-char* setlocale(int __category, const char* __locale_name);
++/* Implement the algorithm from
++   https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html */
++static inline const char* __locale_from_env(int cat) {
++    static const char * const cat_names[] = {
++        "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES",
++        "LC_ALL", "LC_PAPER", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE", "LC_MEASUREMENT",
++        "LC_IDENTIFICATION" };
++    const char *name;
++
++    if (cat < 0 || cat >= sizeof cat_names)
++        return "";
++    if ((name = getenv("LC_ALL")) && *name)
++        return name;
++    if (cat != LC_ALL && (name = getenv(cat_names[cat])) && *name)
++        return name;
++    if ((name = getenv("LANG")) && *name)
++        return name;
++    return "C.UTF-8"; /* Or "" for Android >= 8 support */
++}
++
++static char* _Nullable __termux_repl_setlocale(int __category, const char* _Nullable __locale_name) {
++    if (__locale_name && *__locale_name == '\0')
++        __locale_name = __locale_from_env(__category);
++
++    return setlocale(__category, __locale_name);
++}
++#define setlocale __termux_repl_setlocale
+ 
+ #if __ANDROID_API__ >= 21
+ locale_t uselocale(locale_t __l) __INTRODUCED_IN(21);
--- a/ndk-patches/23c/stdlib.h.patch
+++ b/ndk-patches/23c/stdlib.h.patch
@@ -8,13 +8,3 @@
  #include <sys/cdefs.h>
  #include <xlocale.h>
  
-@@ -224,8 +225,7 @@
- size_t wcstombs(char* __dst, const wchar_t* __src, size_t __n);
- 
- #if __ANDROID_API__ >= 21
--size_t __ctype_get_mb_cur_max(void) __INTRODUCED_IN(21);
--#define MB_CUR_MAX __ctype_get_mb_cur_max()
-+#define MB_CUR_MAX 4
- #else
- /*
-  * Pre-L we didn't have any locale support and so we were always the POSIX
--- /dev/null
+++ b/ndk-patches/27c/locale.h.patch
@@ -0,0 +1,46 @@
+--- a/usr/include/locale.h
++++ b/usr/include/locale.h
+@@ -33,7 +33,7 @@
+ #include <xlocale.h>
+ 
+ #define __need_NULL
+-#include <stddef.h>
++#include <stdlib.h>
+ 
+ __BEGIN_DECLS
+ 
+@@ -101,7 +101,33 @@ struct lconv* _Nonnull localeconv(void);
+ locale_t _Nullable duplocale(locale_t _Nonnull __l);
+ void freelocale(locale_t _Nonnull __l);
+ locale_t _Nullable newlocale(int __category_mask, const char* _Nonnull __locale_name, locale_t _Nullable __base);
+-char* _Nullable setlocale(int __category, const char* _Nullable __locale_name);
++/* Implement the algorithm from
++   https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html */
++static inline const char* __locale_from_env(int cat) {
++    static const char * const cat_names[] = {
++        "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES",
++        "LC_ALL", "LC_PAPER", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE", "LC_MEASUREMENT",
++        "LC_IDENTIFICATION" };
++    const char *name;
++
++    if (cat < 0 || cat >= sizeof cat_names)
++        return "";
++    if ((name = getenv("LC_ALL")) && *name)
++        return name;
++    if (cat != LC_ALL && (name = getenv(cat_names[cat])) && *name)
++        return name;
++    if ((name = getenv("LANG")) && *name)
++        return name;
++    return "C.UTF-8"; /* Or "" for Android >= 8 support */
++}
++
++static char* _Nullable __termux_repl_setlocale(int __category, const char* _Nullable __locale_name) {
++    if (__locale_name && *__locale_name == '\0')
++        __locale_name = __locale_from_env(__category);
++
++    return setlocale(__category, __locale_name);
++}
++#define setlocale __termux_repl_setlocale
+ locale_t _Nullable uselocale(locale_t _Nullable __l);
+ 
+ #define LC_GLOBAL_LOCALE __BIONIC_CAST(reinterpret_cast, locale_t, -1L)
--- a/ndk-patches/27c/stdlib.h.patch
+++ b/ndk-patches/27c/stdlib.h.patch
@@ -10,13 +10,3 @@
  #include <sys/cdefs.h>
  #include <xlocale.h>
  
-@@ -207,8 +207,7 @@
- 
- size_t wcstombs(char* _Nullable __dst, const wchar_t* _Nullable __src, size_t __n);
- 
--size_t __ctype_get_mb_cur_max(void);
--#define MB_CUR_MAX __ctype_get_mb_cur_max()
-+#define MB_CUR_MAX 4
- 
- #if defined(__BIONIC_INCLUDE_FORTIFY_HEADERS)
- #include <bits/fortify/stdlib.h>
--- a/ndk-patches/langinfo.h
+++ b/ndk-patches/langinfo.h
@@ -143,7 +143,7 @@ static char *nl_langinfo_l(nl_item item, locale_t loc)
 	int idx = item & 65535;
 	const char *str;
 
-	if (item == CODESET) return "UTF-8";
+	if (item == CODESET) return (MB_CUR_MAX == 1) ? "ASCII" : "UTF-8";
 
 	switch (cat) {
 	case 0:
EOF
scripts/run-docker.sh ./build-package.sh -I -f -d bash
scp -P 8022 output/bash-dbg_5.2.37-2_aarch64.deb  192.168.12.185:~
ssh -p 8022 192.168.12.185
pkg upgrade
pkg install gdb
pkg reinstall ./bash-dbg_5.2.37-2_aarch64.deb
gdb bash
run
bt
quit
apt reinstall bash

Do you happen to know what might be going wrong?

robertkirkman Jul 1, 2025
Maintainer

I forgot to say that:

the commands are run on Linux PC until the IP address appears,
Docker and Git are installed and set up on the Linux PC,
192.168.12.185 is the WiFi LAN IP address of my Termux phone, which I used:
- pkg install openssh
- passwd
- and sshd on, before running these commands,
and after the point of the command ssh -p 8022 192.168.12.185,
- all the commands are run on the Termux phone
- using the PC keyboard, in the SSH window that appears on the PC
and also that I think SSH is very convenient in this situation, and,
- I would definitely recommend using SSH if troubleshooting this way of installing the changes,
- because while the broken bash test package is installed, new Termux sessions stop working,
- In those situations, it's convenient to have multiple Termux sessions launched through SSH or the Termux App directly or otherwise,
- in order to have sessions to recover from the broken bash test package using the command apt reinstall bash, without necessarily needing the Termux failsafe session button.

dfaure Jul 6, 2025

One could also switch to another default shell (like zsh) while testing changes to bash (by simply running bash from zsh).

WormasterGpt · 2025-09-23T16:53:21Z

I probably don't understand any of this, but it seems nice. Good job.

0 replies
