Skip to content

Conversation

@MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Nov 15, 2025

Fixes #1487

Uri parsing happens in stages:

  • PrivateParseMinimal - The ctor will only do minimal validation to see whether it should throw
  • CreateUriInfo - Accessing properties will allocate UriInfo and populate some offsets of various Uri components (scheme, userinfo, host, port). This is enough to access the Host.
  • ParseRemaining - Accessing properties like PathAndQuery will populate the rest of the offsets and check whether the input is in the canonical form (whether we'll have to do some text changes for the properties).

If the input contains non-ASCII, we'll also rebuild the string representation and do all parts as part of the ctor. While recreating the string, we're dealing with two separate offsets (into the original and into the (partial) new string).
The issue with #1487 was that we were indexing into the new string instead of the original inside CreateUriInfo, which broke implicit files where the host got reduced in length (by removing bidi chars).

This PR fixes that, and also removes the use of unsafe code in that method.

No real perf impact - MihuBot/runtime-utils#1635, MihuBot/runtime-utils#1636

As alaways, easier to review without space changes.

@MihaZupan MihaZupan added this to the 11.0.0 milestone Nov 15, 2025
@MihaZupan MihaZupan self-assigned this Nov 15, 2025
@MihaZupan MihaZupan marked this pull request as ready for review November 15, 2025 05:20
@MihaZupan MihaZupan requested review from a team and Copilot November 15, 2025 05:20
Copilot finished reviewing on behalf of MihaZupan November 15, 2025 05:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issue #1487 by correcting how URI parsing handles strings with non-ASCII characters, specifically addressing an IndexOutOfRangeException that occurred when parsing implicit file URIs containing bidirectional control characters. The fix ensures that when the URI string is rebuilt to handle unicode characters, the correct string offsets are used throughout the parsing process.

Key changes:

  • Fixed indexing bug in CreateUriInfo where _string (potentially modified) was incorrectly used instead of OriginalString when tracking offsets, and vice versa when needed
  • Removed unsafe code from CreateUriInfo method by eliminating the fixed statement and pointer arithmetic
  • Added guard conditions in UriExt.cs to prevent parsing when exceptions have already occurred

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
UncTest.cs Adds test coverage for UNC paths containing bidi control characters to prevent regression of #1487
UriExt.cs Adds null checks before hasUnicode parsing to avoid redundant work when errors already exist
Uri.cs Refactors CreateUriInfo to remove unsafe code, fix string offset tracking between OriginalString and _string, and simplify control flow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Uri] Uri(String, UriKind) throws IndexOutOfRangeException for certain inputs

1 participant