In Chinese mode, punctuation after a digit should not be automatically converted to half-width #972

Closed
oTnTh opened this issue Jan 12, 2025 · 33 comments · Fixed by #980

Comments

@oTnTh

oTnTh commented Jan 12, 2025

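Currently, in Rime's Chinese mode, a full stop (“。”) typed after a digit is automatically converted to a half-width decimal point, and this is a default behavior that cannot be turned off.

The relevant code is at line 75: https://github.com/rime/librime/blob/master/src/rime/gear/punctuator.cc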

  if (ch == '.' || ch == ':') {  // 3.14, 12:30
    const CommitHistory& history(ctx->commit_history());
    if (!history.empty()) {
      const CommitRecord& cr(history.back());
      if (cr.type == "thru" && cr.text.length() == 1 && isdigit(cr.text[0])) {
        return kRejected;
      }
    }
  }

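This design relies on one assumption: in Chinese text, a digit never appears at the end of a sentence.

That assumption does not hold. Here are some sentences I dug out of my personal notes:

  • 2022-06-18,到手价102.95,斤价429。 (2022-06-18, final price 102.95, price per jin 429.)
  • 画面还可以,但是卡顿很多,帧率会从60直接掉到0。 (The picture is fine, but it stutters a lot; the frame rate drops straight from 60 to 0.)
  • 家里虽然有延长线,应该都不支持USB 3.0。 (There are extension cables at home, but probably none of them support USB 3.0.)
  • XX市XX路,姓名,189XXXX。 (XX Road, XX City, name, 189XXXX.)
  • 先升级到越狱比较简单点的版本,5.10.3。 (First upgrade to a version that is easier to jailbreak, 5.10.3.)
  • 最后还是买了个K60。 (In the end I bought a K60 after all.)
  • 第一个4G的fat32,剩下的则格式化为ext3。 (The first 4G is fat32; the rest is formatted as ext3.)

Perhaps giving users an option would be a better approach, something like punctuator/half_shape_after_number?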
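
To make the trade-off concrete, here is a minimal standalone sketch of such an opt-out. It is not librime code: `CommitRecord` is a simplified stand-in for the real type, `LastCommitIsDigit` and `TypeKeysLegacy` are hypothetical helpers, and `half_shape_after_number` is only the option name proposed above. The model covers digits and the period key, nothing else.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

const std::string kFullStop = "\xE3\x80\x82";  // "。" (U+3002) encoded as UTF-8

// Simplified stand-in for one commit history entry (not librime's real type).
struct CommitRecord {
  std::string type;  // "thru" marks a key passed straight to the application
  std::string text;  // the committed text
};

// True if the previous commit was a single digit that passed through,
// mirroring the condition in the snippet quoted above.
bool LastCommitIsDigit(const std::vector<CommitRecord>& history) {
  if (history.empty()) return false;
  const CommitRecord& cr = history.back();
  return cr.type == "thru" && cr.text.length() == 1 &&
         std::isdigit(static_cast<unsigned char>(cr.text[0])) != 0;
}

// Simulates typing digits and '.' in Chinese mode and returns what ends up
// in the application. When the hypothetical flag is false, '.' always
// becomes the full-width full stop.
std::string TypeKeysLegacy(const std::string& keys,
                           bool half_shape_after_number) {
  std::vector<CommitRecord> history;
  std::string output;
  for (char ch : keys) {
    // This toy model only understands digits and the period key.
    assert(ch == '.' || std::isdigit(static_cast<unsigned char>(ch)));
    const bool keep_ascii =
        ch != '.' || (half_shape_after_number && LastCommitIsDigit(history));
    if (keep_ascii) {
      history.push_back({"thru", std::string(1, ch)});
      output += ch;
    } else {
      history.push_back({"punct", kFullStop});
      output += kFullStop;
    }
  }
  return output;
}
```

With the flag on, the keys `60.` come out as `60.`; with it off they come out as `60。`, but `3.14` then comes out as `3。14`, which is the cost anyone who disables the conversion would be accepting.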

@oTnTh oTnTh added the bug label Jan 12, 2025
@LEOYoon-Tsaw
Member

Digits at the end of a sentence are still a small share; undeniably, in most cases it is a decimal point.

@oTnTh
Author

oTnTh commented Jan 12, 2025

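> Digits at the end of a sentence are still a small share; undeniably, in most cases it is a decimal point.

“Rare” does not mean “never”. Offering an option and letting users decide for themselves is the more reasonable approach.

In fact, a search of this project's issues shows that this problem has been raised many times.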

@LEOYoon-Tsaw
Member

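Even if we add an option, what then? Would it suddenly recognize the cases automatically? Once users turn it off, they will complain about getting things like 3。14 or 2。3章 instead.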

@lotem
Member

lotem commented Jan 13, 2025

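One approach is not to commit digits directly: type the whole number, then press Space or Enter to commit it. That makes it possible to tell a decimal point inside a number from a full stop after a number. The downside is that for short numbers it is less convenient than committing directly.

Your discussion just gave me an idea: automatically detect contexts where the key might be a decimal point and offer two symbol candidates. If you want the full stop, you press . once more to select the second candidate, then press Space to commit…

A slight variation on this: when you type . it shows only the Chinese full stop as a candidate, and you press Space manually to commit it. If you go on typing digits, it automatically becomes a decimal point.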
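
The variation above can be sketched as a small state machine. This is an illustration under assumptions, not the implementation that was later merged: `TypeKeysPending` is a hypothetical name, only digits, `.` and Space are modeled, and since the comment does not say what any other key should do while the candidate is showing, the sketch rejects that case with an assertion.

```cpp
#include <cassert>
#include <cctype>
#include <string>

const std::string kFullStop = "\xE3\x80\x82";  // "。" (U+3002) encoded as UTF-8

// Returns the text committed so far. A '.' that is still waiting as a
// candidate is not part of the result.
std::string TypeKeysPending(const std::string& keys) {
  std::string committed;
  bool after_digit = false;  // last committed character was a digit
  bool pending = false;      // a '.' typed after a digit awaits a decision
  for (char ch : keys) {
    const bool is_digit = std::isdigit(static_cast<unsigned char>(ch)) != 0;
    if (pending) {
      // Only the two outcomes described in the comment are modeled.
      assert(is_digit || ch == ' ');
      pending = false;
      if (is_digit) {
        committed += '.';  // more digits follow: it was a decimal point
        committed += ch;
      } else {
        committed += kFullStop;  // Space confirms the full stop candidate
      }
      after_digit = is_digit;
    } else if (ch == '.' && after_digit) {
      pending = true;  // show the candidate, commit nothing yet
    } else if (ch == '.') {
      committed += kFullStop;  // no digit before it: ordinary full stop
      after_digit = false;
    } else {
      committed += ch;  // every other key goes straight through
      after_digit = is_digit;
    }
  }
  return committed;
}
```

In this model `3.14` needs no extra key, while a sentence ending in `60` costs one Space after the period. That is the price paid for not having to delete a wrongly converted decimal point.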

@lotem
Member

lotem commented Jan 13, 2025

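There is one more option…
namely the current behavior.
From the standpoint of simple interaction, deleting the . and retyping the full stop is the quickest and the easiest to understand.
The big companies' input methods seem to handle it this way too.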

@ksqsf
Member

ksqsf commented Jan 13, 2025

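We could keep immediate commit plus decimal-point detection, but have a second press of the period key, right after the decimal point was produced, replace the decimal point with a normal full stop. That saves one Backspace press. I am not sure, though, whether this can be implemented on every front end.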
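
A rough sketch of that double-press idea, under stated assumptions: `TypeKeysDoublePress` is a hypothetical name, only digits and `.` are treated specially, and replacing already committed text is modeled by `pop_back` on the output string. Whether a front end actually offers such an operation is exactly the open question raised above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

const std::string kFullStop = "\xE3\x80\x82";  // "。" (U+3002) encoded as UTF-8

// Simulates immediate commit with decimal-point detection, where a second
// '.' right after a converted decimal point swaps it for a full stop.
std::string TypeKeysDoublePress(const std::string& keys) {
  std::string output;
  bool after_digit = false;         // last committed character was a digit
  bool made_decimal_point = false;  // last key produced a converted '.'
  for (char ch : keys) {
    const bool is_digit = std::isdigit(static_cast<unsigned char>(ch)) != 0;
    if (ch == '.' && made_decimal_point) {
      // Second press: take the decimal point back and commit a full stop.
      assert(!output.empty() && output.back() == '.');
      output.pop_back();  // assumes the front end can edit committed text
      output += kFullStop;
      made_decimal_point = false;
    } else if (ch == '.' && after_digit) {
      output += '.';  // first press after a digit: decimal point
      made_decimal_point = true;
      after_digit = false;
    } else if (ch == '.') {
      output += kFullStop;  // no digit before it: ordinary full stop
    } else {
      output += ch;
      after_digit = is_digit;
      made_decimal_point = false;
    }
  }
  return output;
}
```

Here `3.14` is typed as before, and `60..` ends up as `60。` without touching Backspace.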

@oTnTh
Author

oTnTh commented Jan 13, 2025

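The current version of Microsoft Pinyin has no feature that automatically converts a full stop after a digit to half-width.

I have not used Sogou Pinyin for a long time. It used to have this feature, enabled by default, but users could turn it off themselves.

I found a screenshot online: “智能调整数字后标点” (smart adjustment of punctuation after digits). I do not know how exactly it is “smart”, but it can be turned off too.

[image: 001]

Adding an option such as punctuator/half_shape_after_number should be the implementation with the smallest code change and the least risk of introducing new bugs. For compatibility, the option could default to true, and anyone who dislikes it can change their own configuration file.

As for pressing a key twice in a row to commit a different symbol, Sogou seems to have that as well:

[image: 002]

It would of course be great if someone were willing to add this feature to Rime, but even if it is implemented, I think users should still get an option to turn it off.

An input method is a highly personal thing; users' environments, devices, and habits can differ greatly.

Take the full stop versus decimal point problem: for a user with a full 104-key keyboard who does not mind moving their wrist, typing the full stop on the main key block and the decimal point on the numeric keypad is a clear and unambiguous way to type. With the current Rime, that user has to “press the period key twice, Left arrow, Backspace, Right arrow”, which undoubtedly adds extra mental burden.

Developers are limited by their own habits and experience and can hardly anticipate every possible usage scenario, so I think Rime should stay true to its flexible and powerful character and leave configurability to users rather than choosing for them.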

@LEOYoon-Tsaw
Member

Another possibility: users with this need can assign different keys to 「。」 and 「.」, just as the iOS Chinese keyboard does.

[image]

@oTnTh
Author

oTnTh commented Jan 13, 2025

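If you mean keeping Rime's existing design unchanged and having users set up a separate shortcut for the full stop, that design makes the usage logic quite complicated:

1. In Chinese input mode, period means a full stop in most cases;
2. When it follows a digit, period means a decimal point;
3. In case 2, if a full stop is needed, you must press a different shortcut, such as win+period.

The key point is that some people simply do not want the “certain characters after a digit automatically become half-width” feature. Keep it enabled by default and give those who dislike it an option to turn it off; wouldn't that make everyone happy?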

lotem added a commit to lotem/librime that referenced this issue Jan 16, 2025
fixes rime#972

use ascii punctuation ,.:' after numbers.
they are auto-committed if followed by digit.
or commit manually with space key.
double strike the key to access the original binding.
@lotem lotem removed the bug label Jan 16, 2025
lotem added a commit to lotem/librime that referenced this issue Jan 17, 2025
fixes rime#972

use ascii punctuation ,.:' after numbers.
they are auto-committed if followed by digit.
or commit manually with space key.
double strike the key to access the original binding.
lotem added a commit to lotem/librime that referenced this issue Jan 17, 2025
fixes rime#972

use ascii punctuation ,.:' after numbers.
they are auto-committed if followed by a digit.
or commit manually with space key.
double strike the key to access the original binding.

support half-shape and full-shape forms.
opt-out with `punctuator/convert_punct_in_number: false`.
lotem added a commit to lotem/librime that referenced this issue Jan 19, 2025
fixes rime#972

use ascii punctuation ,.:' after numbers.
they are auto-committed if followed by a digit.
or commit manually with space key.
double strike the key to access the original binding.

support half-shape and full-shape forms.

customize string `punctuator/digit_separators` to specify which ascii
characters are digit separators.
@lotem lotem closed this as completed in 28a234f Jan 19, 2025
@oTnTh
Author

oTnTh commented Jan 19, 2025

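I downloaded artifact-Windows-msvc-x64.zip from here to test: https://github.com/rime/librime/actions/runs/12854606965

The punctuator/digit_separators setting does not seem to take effect. Whether I set it to an empty string or to anything else, the period, colon, and so on all behave the same: pressing twice gives the full-width form.

Update:

Found the problem: this setting has to go in the schema's configuration file, not in default.custom.yaml. I had simply assumed otherwise.

Many thanks to all of you for your hard work.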

@oTnTh
Author

oTnTh commented Jan 19, 2025

I searched a bit; the following issues are all related to this one and can probably all be marked as completed:

#376
#486
#712

The following also seem related, but I am not sure whether they have been resolved:

#351
#670

@jsh9

jsh9 commented Jan 30, 2025

Hi @lotem ,

I installed Squirrel 1.0.3, which I believe includes this update, but I couldn't seem to get this change to work.

Here's what I want: I do not want Rime/Squirrel to detect any digit separators. If I want to enter "123,456,789", I'd prefer to switch to macOS's builtin English keyboard.

How to reproduce

I located default.yaml, and made 2 changes in this file:

First:

menu:
  page_size: 2  # number of candidates

(This is just a sanity check to ensure that Rime/Squirrel actually picks up my changes in default.yaml.)

Second:

punctuator:
  digit_separators: ":"

(I uncommented the line digit_separators: and changed ",.:" into ":".)

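(This line was originally: # digit_separators: ",.:" # Characters listed here, when typed after a digit, are committed directly together with the next digit if another digit follows; a double press restores the original mapping. # librime >= 28a234f)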

The behavior I observed

I saw 2 candidate phrases in the menu, but when I typed 123,456, I still saw 123,456 (half-width comma) instead of 123,456 (full-width comma).

The expected behavior

I should have seen 123,456.


Or if @oTnTh could share with me which file you tweaked and how you tweaked it, that would be great!

@LEOYoon-Tsaw
Member

First, don't modify default.yaml directly; use default.custom.yaml. You can look at build/default.yaml to check whether the change was picked up.

@jsh9

jsh9 commented Jan 30, 2025

> You can check build/default.yaml to check if it's picked up

Interesting that this isn't explicitly mentioned in the wiki (or at least it's quite difficult to find). It seems like institutional knowledge ☹️


I modified default.yaml (I know it's deemed "unsafe" but I'll get into that later) and then deployed, and the change is indeed reflected in build/default.yaml.

But I still could only see page_size taking effect but not digit_separators.


Now if I modify default.custom.yaml and then deploy, it is also reflected in build/default.yaml (overriding what I wrote in default.yaml), but the behaviors (including page_size) are still those in default.yaml, rather than in default.custom.yaml.

Maybe this is a symptom of a deeper bug?

@oTnTh
Author

oTnTh commented Jan 30, 2025

@jsh9

For example, if you use luna_pinyin_simp, then edit luna_pinyin_simp.custom.yaml:

patch:
  punctuator/digit_separators: ":"

@jsh9

jsh9 commented Jan 30, 2025

OK, so I finally figured it out. It's a very convoluted process.

I'm not using Luna pinyin, but rather I'm using iDvel/rime-ice. I installed it via bash rime-install iDvel/rime-ice:others/recipes/full (from plum's root dir).

And then I created default.custom.yaml to add the following:

patch:
  punctuator/digit_separators: ""

And then I had to locate rime_ice.schema.yaml and find this part: https://github.com/iDvel/rime-ice/blob/2a2bb24367ba9948c840fec599710006dcb1e9ca/rime_ice.schema.yaml#L356-L362

I then needed to append this:

  digit_separators:
    __include: default:/punctuator/digit_separators

Thank you both!

@oTnTh
Author

oTnTh commented Jan 31, 2025

@jsh9

Rime is a platform, rime-ice and luna_pinyin_simp are schemas working on this platform.

The default.yaml is the configuration file of Rime, while xxxx.schema.yaml is the configuration file of the schema.

The settings in xxxx.yaml will be overridden by xxxx.custom.yaml.

default.custom.yaml => default.yaml
rime_ice.custom.yaml => rime_ice.schema.yaml
luna_pinyin_simp.custom.yaml => luna_pinyin_simp.schema.yaml

When upgrading Rime or schemas, the configuration files may be modified, but the custom files will not (at least they shouldn't), so the modifications you made can be retained.

punctuator/digit_separators is a configuration item for the schema, not for Rime itself. If you only use one schema (rime-ice), then you should modify rime_ice.custom.yaml directly.

patch:
  punctuator/digit_separators: ":"

If you use multiple schemas and want all of them to use the same settings, you can put the setting in default.custom.yaml and add an __include to the custom file of every schema.

It's quite complicated indeed. I hope this can help.

@jsh9

jsh9 commented Feb 1, 2025

Thanks @oTnTh !

So there are at least two pieces of institutional knowledge that can't easily be found in the wiki:

1. I'd need to create rime_ice.custom.yaml instead of rime_ice.schema.custom.yaml
2. In default.custom.yaml, I'd need to write Version A instead of Version B

Version A:

patch:
  punctuator/digit_separators: ""

Version B:

patch:
  punctuator:
    digit_separators: ""

Actually, rime_ice.custom.yaml doesn't work. Even though I could see the expected results in build/default.yaml after deployment, the end result was that I couldn't even type full-width punctuation (such as ","). No error logs (or at least I don't know where to find them).

I had to revert to the old method (modifying rime_ice.schema.yaml directly) for the correct behavior to work.

@oTnTh
Author

oTnTh commented Feb 1, 2025

@jsh9

Click the Squirrel icon at the top right of the screen, or right-click if you use Windows.

日志 means “log”; you can find the log files there.

[image]

When digit_separators is the only sub-item under punctuator, Version A and Version B are the same.

However, when there is more than one sub-item, they differ.

# rime_ice.schema.yaml
testa:
  a: 1
  b: 2
testb:
  a: 1
  b: 2
# rime_ice.custom.yaml
patch:
  testa:
    b: 3
  testb/b: 3

After build, you'll get:

testa:
  b: 3
testb:
  a: 1
  b: 3

So Version A is the better way to write a custom file.
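
The difference can be reproduced with a toy model. This is only a sketch of the observed result, not Rime's actual patch code: the configuration is flattened into a map from slash-separated paths to integers, and `ReplaceNode`, `SetLeaf` and `PatchedExample` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Flattened configuration: "testa/a" -> 1, "testa/b" -> 2, ...
using Config = std::map<std::string, int>;

// Version B style: the patch key names a whole node, so everything under
// `node` is dropped and replaced by `replacement` (keys relative to `node`).
void ReplaceNode(Config& config, const std::string& node,
                 const Config& replacement) {
  assert(!node.empty());
  const std::string prefix = node + "/";
  // Keys sharing a prefix are contiguous in a sorted map.
  auto it = config.lower_bound(prefix);
  while (it != config.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0) {
    it = config.erase(it);
  }
  for (const auto& entry : replacement) {
    config[prefix + entry.first] = entry.second;
  }
}

// Version A style: the patch key is the full path of a single leaf, so
// sibling entries are left alone.
void SetLeaf(Config& config, const std::string& path, int value) {
  config[path] = value;
}

// Applies the two patches from the example above to the example schema.
Config PatchedExample() {
  Config config;
  config["testa/a"] = 1;
  config["testa/b"] = 2;
  config["testb/a"] = 1;
  config["testb/b"] = 2;
  Config replacement;
  replacement["b"] = 3;
  ReplaceNode(config, "testa", replacement);  // patch: testa: {b: 3}
  SetLeaf(config, "testb/b", 3);              // patch: testb/b: 3
  return config;
}
```

After `PatchedExample()`, `testa/a` is gone while `testb/a` survives, matching the build output shown above.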

Like XML and HTML, YAML is an open format, and its technical details are not defined by Rime.

For more detail about YAML: https://yaml.org

Although it is quite complicated as well.

@chenzhiwei

chenzhiwei commented Feb 14, 2025

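I am the opposite of you: I prefer punctuation after digits or letters to be committed directly in half-width; at worst I delete it and retype. After all, the vast majority of the time no full-width symbol is needed after digits or letters.

This change alters the default behavior. Rime has quite a large user base, and the change feels unfriendly: something that used to take one keystroke now takes two.

Markdown ordered lists take the form 1. and typing that now needs an extra Space press to commit the .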

@oTnTh
Author

oTnTh commented Feb 14, 2025

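I am not sure whether I have misunderstood you: after this change, typing “half-width digit, half-width period, half-width space” in sequence in Chinese mode takes one more Space press than before. If so, the new behavior does indeed differ from the old one.

I am not very familiar with the code and do not know whether this can be solved easily while keeping the current design. If it bothers you a lot, though, you could roll back to an older Rime version for now and wait to see whether a better solution turns up.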

@chenzhiwei

Yes, it now takes one extra Space press.

@ksqsf
Member

ksqsf commented Feb 14, 2025

https://xkcd.com/1172/ 😄

@chenzhiwei

Could it be made configurable like the following? The current behavior is smart.

  • half
  • full
  • smart

@LEOYoon-Tsaw
Member

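Best to have no conversion at all. A program's intelligence will never match the user's, and the two end up in conflict. If users need both 。 and ., just bind them to separate keys; simple and clear.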

@oTnTh
Author

oTnTh commented Feb 14, 2025

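Chinese input mode can already be configured to use half-width punctuation exclusively, so half and full are not the issue; only smart is.

Full-width in Chinese mode and half-width in English mode: that is a kind of “determinism”.

But an input method is a highly personal thing, and everyone's habits differ.

“Certain characters change when they follow certain other characters” is a kind of “special case”, and if such a feature cannot be turned off, it is hard to keep everyone satisfied.

Take me, for example: I write Markdown too, but I never use ordered lists. And most of the time I use a 104-key keyboard, whose numeric keypad always has a half-width period.

I suppose the maintainers did not think of this case when changing the code because they had never run into it in daily use.

Personally, if no better approach can be found, I am also in favor of making it a simple on/off option.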

@lotem
Member

lotem commented Feb 14, 2025

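No. The original implementation could not be turned off, and besides it produced bugs in some situations, so it had to change.
The current implementation is a compromise: typing 1. takes one more Space press to commit than in the previous version; typing 1。 takes one fewer Backspace press than in the previous version. Each side gives a little.
Ideally the implementation would keep the behavior of 1. unchanged, but because the input method has to be cross-platform, there is no interface available for modifying text after it has been committed, hence the extra step.

See how #980 works out.

Another option is to reimplement direct commit of the decimal point on top of the current code, allowing it to be configured to match the original behavior, but without reproducing the original bug.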

@oTnTh
Author

oTnTh commented Feb 14, 2025

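My first thought was to check the string's length or content before leaving inline mode and committing the characters.

If the length is 1, i.e. there is only a single half-width punctuation mark, append a space when committing.

If the string is longer than 1, commit it unchanged.

But since joining this discussion I have seen all kinds of different typing habits, and I have begun to doubt whether an approach whose logic keeps getting more complicated might introduce yet other problems.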

@jsh9

jsh9 commented Feb 14, 2025

There will always be edge cases if you write heuristics, and sooner or later the heuristics will be very difficult to maintain and scale.

I think the inherent shortcoming of heuristics is that they can usually only "look backwards", i.e., make decisions based on existing characters, rather than "look forward".

An LLM (or a bespoke machine learning model) can solve this issue (it needs to "look forward" too) with high precision and high recall. But an LLM is not free and adds big latency, and a bespoke machine learning model is time consuming to train.

@chenzhiwei

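> I am the opposite of you: I prefer punctuation after digits or letters to be committed directly in half-width; at worst I delete it and retype. After all, the vast majority of the time no full-width symbol is needed after digits or letters.
>
> This change alters the default behavior. Rime has quite a large user base, and the change feels unfriendly: something that used to take one keystroke now takes two.
>
> Markdown ordered lists take the form 1. and typing that now needs an extra Space press to commit the .

I posted this because I discovered, without any warning, that the input method's behavior had changed, and in that moment I was a bit thrown.

Thinking it over now, it is quite understandable: software iteration inevitably changes interfaces and interaction logic, and since this is a base library there is no way to notify users in advance.

@lotem I now think the current implementation really is a good solution. The change has probably not yet been rolled out widely to users, so we can look at other users' feedback later before deciding whether to change it further.

@ksqsf The change in #980 is not fully consistent with the previous behavior, so I think it can be put on hold for now. Once the update has reached users widely, we can look at their feedback and either let users adapt to the new behavior or provide a switch for compatibility with the old one. If supporting the old behavior would make the code hard to maintain, the current state is fine; after all, what matters most is that the software can keep iterating over the long term.

Thanks, everyone!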

@ksqsf
Member

ksqsf commented Feb 14, 2025

> The change is not fully consistent with the previous behavior

Where is it inconsistent?

@lotem
Member

lotem commented Feb 15, 2025

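> Where is it inconsistent?

I have not looked closely yet; the differences may be the following:

  1. Typing just 1. with no space after it takes one more Enter press than before, and committing with Enter is not something users are familiar with.
  2. With style/inline_preedit: false, on typing 1. the . is shown in the candidate window rather than after the 1.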

@lotem
Member

lotem commented Feb 15, 2025

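Also, if the user does not know to press . twice to get 。, then trying to delete the decimal point first and type a full stop again, 1.{BackSpace}. , enters a decimal point once more.

If the decimal point is committed directly, the commit history gets updated, so the annoyance of it being converted to a decimal point again after deletion does not arise.

This could be tried as an improvement on the existing code: mark the commit history when the decimal point is deleted, or use internal state in punctuator to toggle between outputting . and 。.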

lotem added a commit that referenced this issue Feb 15, 2025
follow-up #972
instead of converting it to a `punct_number` segment in the input
buffer, this option reproduces the behaviour before commit e02d6b3.

opt in with `punctuator/digit_separator_action: commit`