6 changes: 3 additions & 3 deletions api/core/tools/utils/text_processing_utils.py
@@ -11,7 +11,7 @@ def remove_leading_symbols(text: str) -> str:
Returns:
str: The text with leading punctuation or symbols removed.
"""
- # Match Unicode ranges for punctuation and symbols
- # FIXME this pattern is confused quick fix for #11868 maybe refactor it later
- pattern = r"^[\u2000-\u206F\u2E00-\u2E7F\u3000-\u303F!\"#$%&'()*+,./:;<=>?@^_`~]+"
+ # 移除了感叹号 '!' 原逻辑会错误地将 Markdown 图像语法 ![]() 中的前导 ![' 符号移除,导致输出的 Markdown 格式不完整,图片无法正常显示
+ # Removed the exclamation mark '!' The original logic would mistakenly remove the leading '!' symbol in the Markdown image syntax ![](), resulting in incomplete Markdown formatting and images that cannot be displayed properly.
+ pattern = r"^[\u2000-\u206F\u2E00-\u2E7F\u3000-\u303F\"#$%&'()*+,./:;<=>?@^_`~]+"
Comment on lines +14 to +16
critical

This change introduces a regression. Removing `!` from the character class entirely means that leading exclamation marks are never stripped. The existing unit test with input "!@#Test symbols" will then fail, because it expects all leading symbols to be removed.

A better approach is a negative lookahead that skips `!` only when it is immediately followed by `[`, which marks the start of Markdown image syntax. This fixes the bug without breaking existing behavior.
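To make the difference concrete, here is a minimal sketch comparing the three variants discussed in this thread. The constant names (`SYMBOLS`, `ORIGINAL`, `PR_VERSION`, `LOOKAHEAD`) are illustrative only and do not exist in the repository; the character class is copied from the diff.

```python
import re

# Character class from the diff, without '!' (each variant handles '!' itself).
# These names are hypothetical helpers for this comparison only.
SYMBOLS = r"\u2000-\u206F\u2E00-\u2E7F\u3000-\u303F\"#$%&'()*+,./:;<=>?@^_`~"

ORIGINAL = re.compile(rf"^[{SYMBOLS}!]+")           # before the PR: '!' always stripped
PR_VERSION = re.compile(rf"^[{SYMBOLS}]+")          # this PR: '!' never stripped
LOOKAHEAD = re.compile(rf"^([{SYMBOLS}]|!(?!\[))+")  # '!' stripped unless followed by '['

for name, pattern in [("original", ORIGINAL), ("pr", PR_VERSION), ("lookahead", LOOKAHEAD)]:
    image = pattern.sub("", "![image](url)")
    symbols = pattern.sub("", "!@#Test symbols")
    print(f"{name:10} image={image!r} symbols={symbols!r}")
```

Only the lookahead variant leaves the image link intact while still reducing "!@#Test symbols" to "Test symbols".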

I've provided a code suggestion with an improved regex and a comment explaining the logic.

Also, please remember to include the new test cases for this fix in your commit, as mentioned in the PR description. A test case like ("![image](url)", "![image](url)") would be a good addition to prevent future regressions.
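As a rough sketch of what such test cases could look like, the snippet below checks a local stand-in for `remove_leading_symbols` that uses the lookahead pattern. The stand-in body and the `CASES` table are assumptions for illustration; the real tests should import the function from `text_processing_utils` and follow the project's existing test layout, which is not shown here.

```python
import re


def remove_leading_symbols(text: str) -> str:
    """Local stand-in (assumption): mirrors the function under review,
    using the negative-lookahead pattern proposed in this comment."""
    pattern = r"^([\u2000-\u206F\u2E00-\u2E7F\u3000-\u303F\"#$%&'()*+,./:;<=>?@^_`~]|!(?!\[))+"
    return re.sub(pattern, "", text)


# (input, expected) pairs; the first two come from this review,
# the rest are hypothetical edge cases worth considering.
CASES = [
    ("![image](url)", "![image](url)"),    # Markdown image must survive
    ("!@#Test symbols", "Test symbols"),   # existing behavior must not regress
    ("#![image](url)", "![image](url)"),   # symbols before an image are stripped
    ("!![image](url)", "![image](url)"),   # only the '!' attached to '[' is kept
    ("Plain text", "Plain text"),          # nothing to strip
]

for raw, expected in CASES:
    assert remove_leading_symbols(raw) == expected, (raw, expected)
```

Keeping the cases in a single table makes it straightforward to convert them into a parametrized test later.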

Suggested change
- # 移除了感叹号 '!' 原逻辑会错误地将 Markdown 图像语法 ![]() 中的前导 ![' 符号移除,导致输出的 Markdown 格式不完整,图片无法正常显示
- # Removed the exclamation mark '!' The original logic would mistakenly remove the leading '!' symbol in the Markdown image syntax ![](), resulting in incomplete Markdown formatting and images that cannot be displayed properly.
- pattern = r"^[\u2000-\u206F\u2E00-\u2E7F\u3000-\u303F\"#$%&'()*+,./:;<=>?@^_`~]+"
+ # This pattern uses a negative lookahead `!(?!\[)` to match an exclamation mark
+ # only if it's not followed by an opening square bracket. This prevents
+ # stripping the '!' from Markdown image links like '![...](...)' while
+ # still removing other leading symbols.
+ pattern = r"^([\u2000-\u206F\u2E00-\u2E7F\u3000-\u303F\"#$%&'()*+,./:;<=>?@^_`~]|!(?!\[))+"

return re.sub(pattern, "", text)