Linkify incorrectly includes trailing U+2066 and presumably other non-printing unicode in URLs #30351

@ara4n

Description

Steps to reproduce

  1. Copy-paste a URL and accidentally append some non-printing Unicode (e.g. a U+2066 LEFT-TO-RIGHT ISOLATE codepoint), then send it as plain text from a Matrix client like Element X (EX).
  2. Element Web (EW) spots the URL and linkifies it.
  3. But linkify includes the non-printing sequence as part of the URL.

For instance:

https://example.com/⁦ (which includes an invisible U+2066 on the end)

gets linkified to be

https://example.com/%E2%81%A6

causing mass confusion
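
A small sketch of where the `%E2%81%A6` comes from, assuming the link target ends up percent-encoded in the usual way. The issue does not say which encoding step Linkify or the browser applies; the standard `encodeURI` is used here purely for illustration. U+2066 is encoded as its three UTF-8 bytes (E2 81 A6):

```typescript
// The URL from the report, with the invisible U+2066 written as an escape
// so that it is visible in source.
const pastedUrl: string = "https://example.com/\u2066";

// Percent-encoding turns the invisible codepoint into its UTF-8 bytes.
const encodedUrl: string = encodeURI(pastedUrl);

console.log(encodedUrl); // prints "https://example.com/%E2%81%A6"
```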

Outcome

What did you expect?

Linkify's regexp should not pick up non-printing Unicode (such as U+2066) on the end of URLs.
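
One possible shape for the expected behaviour, as a rough sketch only: trim invisible characters off the end of a matched URL before turning it into a link. `stripTrailingInvisible` and `TRAILING_INVISIBLE` are hypothetical names, not part of Linkify's API, and the character ranges are an assumption about what counts as "non-printing" (C0/C1 controls, soft hyphen, zero-width and bidi formatting characters, and the BOM).

```typescript
// Hypothetical helper, not part of Linkify's API.
// Matches a run of invisible characters at the very end of the string:
// C0/C1 controls, soft hyphen, zero-width characters, bidi embedding and
// isolate controls (including U+2066), and the byte order mark.
const TRAILING_INVISIBLE =
    /[\u0000-\u001F\u007F-\u009F\u00AD\u200B-\u200F\u202A-\u202E\u2060-\u206F\uFEFF]+$/;

function stripTrailingInvisible(url: string): string {
    // Only the end of the string is trimmed; anything inside the URL is untouched.
    return url.replace(TRAILING_INVISIBLE, "");
}

const cleaned: string = stripTrailingInvisible("https://example.com/\u2066");
console.log(encodeURI(cleaned)); // prints "https://example.com/"
```

Only trailing characters are removed here because the report is about junk appended to the end of a URL; whether such characters should also be rejected in the middle of a URL is a separate question that this sketch does not address.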

What happened instead?

chaos

Operating system

macOS

Browser information

No response

URL for webapp

No response

Application version

Element Nightly version: 2025072101
Crypto version: Rust SDK 0.12.0 (b30f1f3), Vodozemac 0.9.0

Homeserver

matrix.org

Will you send logs?

No

Metadata

Assignees

No one assigned

    Labels

    A-URL-Previews
    O-Uncommon: Most users are unlikely to come across this or unexpected workflow
    S-Minor: Impairs non-critical functionality or suitable workarounds exist
    T-Defect
