
Label or its font does not display some Unicode characters correctly #7142

Open
Semnodime opened this issue Nov 3, 2024 · 6 comments
Labels
Status: Triage Information is being gathered

Comments

@Semnodime

Describe the bug

The Unicode string -❤️🛢️⚽📕🖍️🩸🧯🔴🟧🟨🟩🟦🟪🟫⬛⬜- is displayed as:
[screenshot]

To Reproduce
Steps to reproduce the behavior:

  1. Go to Listing view
  2. Edit a label
  3. Enter the string
  4. See error

Expected behavior
A font that displays each Unicode symbol with its respective glyph, or at least a generic box showing the code point.

  • Ghidra Version: 21.0.4
  • Ghidra Origin: official GitHub distro
Semnodime changed the title from "Label or its font does not support some Unicode characters" to "Label or its font does not display some Unicode characters correctly" on Nov 3, 2024
@hippietrail
Contributor

This looks like a surrogate pair / astral plane problem. Unicode has two kinds of emoji symbols. The older ones are "dingbats" in the Basic Multilingual Plane (BMP) and need only one 16-bit UCS-2 / UTF-16 code unit, whereas the newer ones are true emoji, which lie outside the BMP and need two 16-bit UTF-16 code units, known as a surrogate pair, to encode a single code point. When Java came out there were no surrogate pairs, and every character / code point could be represented by a single UCS-2 code unit.

  • ❤ is HEAVY BLACK HEART from the Dingbats block: U+2764 (≤ 0xFFFF)
  • ⚽ is SOCCER BALL: U+26BD (≤ 0xFFFF)
  • 🛢 is OIL DRUM: U+1F6E2 (> 0xFFFF)
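To see the difference from the Java side, here is a minimal sketch (the class name `SurrogateDemo` is hypothetical) that prints each of the three code points above, how many UTF-16 `char`s it needs, and, for the non-BMP one, the surrogate pair computed with the standard UTF-16 formula and cross-checked against `Character.highSurrogate`/`Character.lowSurrogate`. It uses escapes for the bare code points (no U+FE0F variation selector) so the source file encoding does not matter.

```java
public class SurrogateDemo {

    // Encode a supplementary code point (> 0xFFFF) as a UTF-16 surrogate pair
    // using the formula from the Unicode standard.
    static char[] toSurrogatePair(int codePoint) {
        int offset = codePoint - 0x10000;                 // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    static String describe(int codePoint) {
        StringBuilder sb = new StringBuilder(String.format("U+%04X", codePoint));
        int units = Character.charCount(codePoint);       // 1 for BMP, 2 otherwise
        sb.append(" chars=").append(units);
        if (!Character.isBmpCodePoint(codePoint)) {
            char[] pair = toSurrogatePair(codePoint);
            // Sanity check against the JDK's own encoder.
            if (pair[0] != Character.highSurrogate(codePoint)
                    || pair[1] != Character.lowSurrogate(codePoint)) {
                throw new IllegalStateException("surrogate mismatch");
            }
            sb.append(String.format(" pair=%04X %04X", (int) pair[0], (int) pair[1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = { 0x2764, 0x26BD, 0x1F6E2 };      // heart, soccer ball, oil drum
        for (int cp : samples) {
            String s = new String(Character.toChars(cp));
            System.out.println(describe(cp) + " length()=" + s.length());
        }
    }
}
```

Only U+1F6E2 needs two `char`s (the pair D83D DEE2), so `String.length()` reports 2 for it and 1 for the other two.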

Modern non-BMP true emoji are generally rendered in colour. The older dingbat-style BMP symbols are often rendered monochrome in some places and in colour in others.

These bugs were very common, but nobody cared before modern emoji. Once everybody decided they cared a lot about emoji, many of these bugs were fixed.

ryanmkurtz added the "Status: Triage" label (Information is being gathered) on Nov 4, 2024
@dev747368
Collaborator

You didn't mention which platform you are running Ghidra on, but the behavior you show is quite different from what I'm seeing on Windows and Linux.

As hippietrail said, Java does lack strong support for some of the newer Unicode additions, but in general it should not get worse than displaying a generic replacement box in place of the code point. Glyphs that need two 16-bit values typically take two arrow-key presses to move past in an input field.
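The "two arrow presses" behavior follows from text positions being counted in UTF-16 `char`s. A minimal sketch (hypothetical `CaretDemo` class, plain JDK, not Swing's or Ghidra's actual caret code) comparing a naive one-`char` step with a code-point-aware step via `String.offsetByCodePoints`:

```java
public class CaretDemo {

    // Naive caret step: advance by one UTF-16 code unit.
    static int nextCharIndex(String text, int index) {
        return Math.min(index + 1, text.length());
    }

    // Code-point-aware caret step: skip a whole surrogate pair at once.
    static int nextCodePointIndex(String text, int index) {
        return index >= text.length() ? index : text.offsetByCodePoints(index, 1);
    }

    public static void main(String[] args) {
        // SOCCER BALL (BMP) followed by OIL DRUM (surrogate pair): 3 chars, 2 code points.
        String text = "\u26BD" + new String(Character.toChars(0x1F6E2));
        int naive = 0, aware = 0, naiveSteps = 0, awareSteps = 0;
        while (naive < text.length()) { naive = nextCharIndex(text, naive); naiveSteps++; }
        while (aware < text.length()) { aware = nextCodePointIndex(text, aware); awareSteps++; }
        // A naive step from index 1 lands on index 2, between the two halves of U+1F6E2.
        System.out.println("naive steps: " + naiveSteps + ", code-point steps: " + awareSteps);
        System.out.println("index 2 is a low surrogate: " + Character.isLowSurrogate(text.charAt(2)));
    }
}
```

This prints 3 naive steps versus 2 code-point steps. Sequences such as ❤️ (U+2764 followed by U+FE0F) are still two code points, so stepping over whole user-perceived characters would need something like `java.text.BreakIterator.getCharacterInstance()` as a starting point.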

Here is a jshell snippet that will let you test what plain Java does on your system:

```java
javax.swing.JOptionPane.showMessageDialog(null, new javax.swing.JTextField("put the test string here"))
```

jshell is a Java REPL that ships with the JDK (since Java 9) and should be in the same directory as the rest of your JDK binaries.
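When comparing platforms it may also help to report exactly which code points the test string contains, since several of the emoji in it appear to be followed by U+FE0F (VARIATION SELECTOR-16). A minimal sketch using `String.codePoints()` (Java 8+); `CodePointDump` is a hypothetical name, and the fallback sample is an assumption chosen only for illustration:

```java
import java.util.stream.Collectors;

public class CodePointDump {

    // Render every code point as U+XXXX so the exact input can be compared across systems.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Pass the test string as the first argument. The fallback is
        // HEAVY BLACK HEART + VS16, then OIL DRUM + VS16.
        String input = args.length > 0
                ? args[0]
                : "\u2764\uFE0F" + new String(Character.toChars(0x1F6E2)) + "\uFE0F";
        System.out.println(dump(input));
        System.out.println("chars=" + input.length()
                + " codePoints=" + input.codePointCount(0, input.length()));
    }
}
```

With no argument it prints `U+2764 U+FE0F U+1F6E2 U+FE0F` and `chars=5 codePoints=4`. How command-line arguments are decoded depends on the platform locale, so pasting the string into jshell instead may be more reliable.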

dev747368 added the "Status: Waiting on customer" label (Waiting for customer feedback) on Nov 4, 2024
@Semnodime
Author

[screenshot]
Linux Mint 21.3 Cinnamon with X11

@hippietrail
Contributor

hippietrail commented Nov 5, 2024

Just confirming that everywhere I try this string on macOS, it renders correctly.

(It doesn't "just work" in other ways, though: the Search Memory dialog appears to recognize only the first byte of each of these characters.)
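For reference on the Search Memory point: none of these characters is a single byte in UTF-8 or UTF-16, so a pattern built from the first byte alone cannot identify the character. A small sketch (hypothetical `BytePatterns` class, plain JDK only, not Ghidra's search API) that prints the full byte sequence a search would need to match:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytePatterns {

    // Hex-encode the bytes of s in the given charset, e.g. "E2 9A BD".
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // SOCCER BALL (BMP) and OIL DRUM (supplementary).
        String[] samples = { "\u26BD", new String(Character.toChars(0x1F6E2)) };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %s  UTF-16LE: %s%n",
                    s.codePointAt(0),
                    hex(s, StandardCharsets.UTF_8),
                    hex(s, StandardCharsets.UTF_16LE));
        }
    }
}
```

For U+26BD this gives `E2 9A BD` (UTF-8) and `BD 26` (UTF-16LE); for U+1F6E2 it gives `F0 9F 9B A2` and `3D D8 E2 DE`.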

ryanmkurtz removed the "Status: Waiting on customer" label (Waiting for customer feedback) on Nov 5, 2024
@Semnodime
Author

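| Font | Image of Decompile View |
|---|---|
| Dialog | [screenshot] |
| DialogInput | [screenshot] |
| Ubuntu | [screenshot] |
| Hack | [screenshot] |
| Free Mono | [screenshot] |
| Noto Color Emoji | [screenshot] |
| Noto Mono | [screenshot] |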
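To check glyph coverage for these fonts outside Ghidra, Java can be asked directly through `java.awt.Font.canDisplay(int)`. This is a hedged sketch: `FontCoverage` is hypothetical, the family names are assumed to match what is installed (GNU FreeMono, for example, usually registers as "FreeMono"), and `canDisplay` only reports whether a glyph exists, not whether it will be drawn in colour.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class FontCoverage {

    // Return the code points of s that the predicate reports as not displayable.
    static List<String> missing(String s, IntPredicate canDisplay) {
        List<String> out = new ArrayList<>();
        s.codePoints()
         .filter(cp -> !canDisplay.test(cp))
         .forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // Pass the test string as the first argument; the fallback is heart, soccer ball, oil drum.
        String sample = args.length > 0 ? args[0]
                : "\u2764\u26BD" + new String(Character.toChars(0x1F6E2));
        // Assumed family names; a name that is not installed is typically
        // substituted by the "Dialog" logical font, which getFontName() reveals.
        String[] families = { "Dialog", "DialogInput", "Ubuntu", "Hack",
                              "FreeMono", "Noto Color Emoji", "Noto Mono" };
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, 14);
            System.out.println(family + " -> " + font.getFontName()
                    + " missing: " + missing(sample, font::canDisplay));
        }
    }
}
```

Running it with the same JDK that Ghidra uses would show whether the problem is missing glyphs or how the glyphs are rendered.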

@dev747368
Collaborator

Could you comment on your observation in #7143 about the cursor skipping "all" of the characters in the string in the plain Java JTextField?


4 participants