LINQ to XML: Inconsistent namespace prefix handling #48400

philippdolder · 2021-02-17T11:25:59Z

Description

Note: this post has been edited to summarize the whole discussion.

I'm experiencing inconsistent behavior regarding XML namespace prefixing. When I create an XElement via new XElement() the code behaves differently from when I use XElement.Parse() to parse the equivalent XML structure.
See code sample below. The first assert succeeds. The second assert fails.
The root element knows about the namespace prefixing. After adding both elements to it, only the one created with the constructor gets prefixed properly. See string representations of each object as they are at the end of the Fact method.

original: <mn:p xmlns:mn="http://my-namespace.com/">text</mn:p>
deserialized: <p xmlns="http://my-namespace.com/">text</p>
root:

<html xmlns:mn="http://my-namespace.com/">
  <mn:p>text</mn:p>
  <p xmlns="http://my-namespace.com/">text</p>
</html>

namespace MyNamespace
{
    using System.Xml.Linq;
    using Xunit;

    public class XmlParsing
    {
        private const string Prefix = "mn";
        private static readonly XNamespace Namespace = "http://my-namespace.com/";

        [Fact]
        public void Fact()
        {
            var root = new XElement("html", new XAttribute(XNamespace.Xmlns + Prefix, Namespace));
            var original = new XElement(Namespace + "p", "text");

            var deserialized = XElement.Parse(original.ToString());

            // original: [<p xmlns="http://my-namespace.com/">text</p>]
            // deserialized: [<p xmlns="http://my-namespace.com/">text</p>]
            Assert.Equal(original.ToString(), deserialized.ToString()); // succeeds

            root.Add(original);
            root.Add(deserialized);

            // original: [<mn:p xmlns:mn="http://my-namespace.com/">text</mn:p>]
            // deserialized: [<p xmlns="http://my-namespace.com/">text</p>]
            Assert.Equal(original.ToString(), deserialized.ToString()); // fails
        }
    }
}

I would expect the namespace prefixing to work identical no matter how the XElement is created.

Using XDocument instead, leads to the same error.

        public class XmlParsing
        {
            private const string Prefix = "mn";
            private static readonly XNamespace Namespace = "http://my-namespace.com/";

            [Fact]
            public void Fact()
            {
                var root = new XElement("html", new XAttribute(XNamespace.Xmlns + Prefix, Namespace));
                var document = new XDocument();
                document.Add(root);
                var original = new XElement(Namespace + "p", "text");

                var deserialized = XDocument.Parse(original.ToString()).Root;

                // original: [<p xmlns="http://my-namespace.com/">text</p>]
                // deserialized: [<p xmlns="http://my-namespace.com/">text</p>]
                Assert.Equal(original.ToString(), deserialized.ToString()); // succeeds

                root.Add(original);
                root.Add(deserialized);

                // original: [<mn:p xmlns:mn="http://my-namespace.com/">text</mn:p>]
                // deserialized: [<p xmlns="http://my-namespace.com/">text</p>]
                Assert.Equal(original.ToString(), deserialized.ToString()); // fails
            }
        }

Would be nice to add some justification why do you think this behavior is better and comments with what are the values on each assert
My expectation would be that the XElement implementation behaves exactly the same. Independent of how the XElement is created. Be it using the constructor or the XElement.Parse() method.

I also created a gist containing above code to demonstrate the issue.

Configuration

Above code is running in a netcoreappe3.1 with C# 8.0 nullable enabled on a Windows 10 Pro x64

Regression?

Other information

Do you know of any workarounds?
The only work around I found is to explicitly add the new XAttribute(XNamespace.Xmlns + Prefix, Namespace) attribute to the original XElement. Since the xml I'm trying to parse is not 100% under our control, this will not work in the real world scenario.
Doesn't really work, as the resulting xml will contain unnecessary namespace prefix declarations

The text was updated successfully, but these errors were encountered:

ghost · 2021-02-17T11:26:05Z

Tagging subscribers to this area: @buyaa-n, @krwq
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I'm experiencing inconsistent behavior regarding XML namespace prefixing. When I create an XElement via new XElement() the code behaves differently from when I use XElement.Parse() to parse the equivalent XML structure.
See code sample below. The first assert succeeds.
The root element knows about the namespace prefixing. After adding both elements to it, only the one create with the constructor gets prefixed properly. See string representations of each object at the end of the method.

original: <mn:p xmlns:mn="http://my-namespace.com/">text</mn:p>
deserialized: <p xmlns="http://my-namespace.com/">text</p>
root:

<html xmlns:mn="http://my-namespace.com/">
  <mn:p>text</mn:p>
  <p xmlns="http://my-namespace.com/">text</p>
</html>

namespace MyNamespace
{
    using System.Xml.Linq;
    using Xunit;

    public class XmlParsing
    {
        private const string Prefix = "mn";
        private static readonly XNamespace Namespace = "http://my-namespace.com/";

        [Fact]
        public void Fact()
        {
            var root = new XElement("html", new XAttribute(XNamespace.Xmlns + Prefix, Namespace));
            var original = new XElement(Namespace + "p", "text");

            var deserialized = XElement.Parse(original.ToString());

            Assert.Equal(original.ToString(), deserialized.ToString());

            root.Add(original);
            root.Add(deserialized);

            Assert.Equal(original.ToString(), deserialized.ToString());
        }
    }
}

I would expect the namespace prefixing to work identical no matter how the XElement is created.

I also created a gist containing above code to demonstrate the issue.

Configuration

Above code is running in a netcoreappe3.1 with C# 8.0 nullable enabled on a Windows 10 Pro x64

Regression?

Other information

Do you know of any workarounds?
The only work around I found is to explicitly add the new XAttribute(XNamespace.Xmlns + Prefix, Namespace) attribute to the original XElement. Since the xml I'm trying to parse is not 100% under our control, this will not work in the real world scenario.

Author:	philippdolder
Assignees:	-
Labels:	`area-System.Xml`, `untriaged`
Milestone:	-

watfordgnf · 2021-02-17T14:01:30Z

The order of operation appears to matters here. If original is detached from root when you create deserialized then you receive the behavior you see.

If you change your source like so you receive the intended behavior:

> var root = new XElement("html", new XAttribute(XNamespace.Xmlns + Prefix, Namespace));
> var original = new XElement(Namespace + "p", "text");
> root.Add(original);
> var deserialized = XElement.Parse(original.ToString());
> original
[<mn:p xmlns:mn="http://my-namespace.com/">text</mn:p>]
> deserialized
[<mn:p xmlns:mn="http://my-namespace.com/">text</mn:p>]

Not sure if this helps in the interim.

philippdolder · 2021-02-17T14:09:02Z

@watfordgnf thanks for the suggestion.

Unfortunately in the real use case, I don't have access to the root element at this level. Only later will it be composed from all the different parts making up the complete XML.

watfordgnf · 2021-02-17T15:01:29Z

In both cases original.Name == deserialized.Name so functionally they're identical and this matches the XML spec. From "Namespaces in XML 1.0":

Applications SHOULD use the namespace name, not the prefix, in constructing names whose scope extends beyond the containing document.

It appears XElement.ToString is the behavioral difference based on the presence or absence of a parent element containing the namespace binding to a prefix. Delving into XElement.ToString, you end up in XNode.ToString which sets up an XmlWriter that gets passed to XElement.WriteTo, eventually making it to ElementWriter.WriteElement:

runtime/src/libraries/System.Private.Xml.Linq/src/System/Xml/Linq/XLinq.cs

Lines 206 to 209 in 247212f

    
           public void WriteElement(XElement e) 
        
           { 
        
               PushAncestors(e); 
        
               XElement root = e;

That call to PushAncestors grabs any namespace declarations (and bindings) from its ancestor elements and puts them in the resolver to use when writing the qualified name of an element or attribute (line 320):

runtime/src/libraries/System.Private.Xml.Linq/src/System/Xml/Linq/XLinq.cs

Lines 306 to 325 in 247212f

    
           private void PushAncestors(XElement? e) 
        
           { 
        
               while (true) 
        
               { 
        
                   e = e!.parent as XElement; 
        
                   if (e == null) break; 
        
                   XAttribute? a = e.lastAttr; 
        
                   if (a != null) 
        
                   { 
        
                       do 
        
                       { 
        
                           a = a.next!; 
        
                           if (a.IsNamespaceDeclaration) 
        
                           { 
        
                               _resolver.AddFirst(a.Name.NamespaceName.Length == 0 ? string.Empty : a.Name.LocalName, XNamespace.Get(a.Value)); 
        
                           } 
        
                       } while (a != e.lastAttr); 
        
                   } 
        
               } 
        
           }

Without any ancestors ElementWriter.WriteStartElement will use NameSpaceResolver.GetPrefixOfNamespace passing allowDefaultNamespace = true. This selects the elements' namespace as the default rather than using any binding, because there are none setup at this point.

That was also a long way of saying that the specific binding is being lost when reading it in (lines 907 and 912):

runtime/src/libraries/System.Private.Xml.Linq/src/System/Xml/Linq/XContainer.cs

Lines 902 to 916 in 9195df6

    
           public bool ReadContentFrom(XContainer rootContainer, XmlReader r) 
        
           { 
        
               switch (r.NodeType) 
        
               { 
        
                   case XmlNodeType.Element: 
        
                       XElement e = new XElement(_eCache.Get(r.NamespaceURI).GetName(r.LocalName)); 
        
                       if (r.MoveToFirstAttribute()) 
        
                       { 
        
                           do 
        
                           { 
        
                               e.AppendAttributeSkipNotify(new XAttribute(_aCache.Get(r.Prefix.Length == 0 ? string.Empty : r.NamespaceURI).GetName(r.LocalName), r.Value)); 
        
                           } while (r.MoveToNextAttribute()); 
        
                           r.MoveToElement(); 
        
                       } 
        
                       _currentContainer.AddNodeSkipNotify(e);

I'm not quite sure how you could change ContentReader to do that without making things weird when you attached an XML fragment to a document or other element.

philippdolder · 2021-02-18T10:25:02Z

I was hoping that I could work around this by using XElement.Load with custom XmlReaderSettings but from what I understand it's how XElement and friends are implemented

mayorovp · 2021-02-18T15:47:28Z

@philippdolder you can workaround this by cleaning all namespace declarations after deserialization:

deserialized.DescendantsAndSelf().Attributes().Where(a => a.IsNamespaceDeclaration).Remove();

or by adding all required prefixes before serialization:

original.SetAttributeValue(XNamespace.Xmlns + Prefix, Namespace.NamespaceName);

but from what I understand it's how XElement and friends are implemented

No, it's how XML designed. The isolated string "<mn:p>text</mn:p>" is an invalid XML.

philippdolder · 2021-02-18T17:22:30Z

@mayorovp Thanks for the hints

No, it's how XML designed. The isolated string "<mn:p>text</mn:p>" is an invalid XML.

I'm not claiming this string in isolation is valid xml though. I'm totally aware that in isolation it needs to have the namespace prefix declaration inside the tag itself.

krwq · 2021-07-12T17:26:46Z

Can you repro this with XDocument rather than XElement? If not I'm leaning to close this as "by design"

philippdolder · 2021-07-13T14:00:15Z

Can you repro this with XDocument rather than XElement? If not I'm leaning to close this as "by design"

This test still fails with the same error.

        public class XmlParsing
        {
            private const string Prefix = "mn";
            private static readonly XNamespace Namespace = "http://my-namespace.com/";

            [Fact]
            public void Fact()
            {
                var root = new XElement("html", new XAttribute(XNamespace.Xmlns + Prefix, Namespace));
                var document = new XDocument();
                document.Add(root);
                var original = new XElement(Namespace + "p", "text");

                var deserialized = XDocument.Parse(original.ToString()).Root;

                Assert.Equal(original.ToString(), deserialized.ToString());

                root.Add(original);
                root.Add(deserialized);

                Assert.Equal(original.ToString(), deserialized.ToString());
            }
        }

krwq · 2021-07-20T16:29:05Z

@philippdolder can you edit the first post here to summarize this thread with latest info to make it easier for anyone to pick this up? Would be nice to add some justification why do you think this behavior is better and comments with what are the values on each assert

philippdolder · 2021-08-02T05:48:10Z

@philippdolder can you edit the first post here to summarize this thread with latest info to make it easier for anyone to pick this up?

@krwq done. and it seems the bot changed labels and moved the issue around again

krwq · 2021-08-06T13:26:45Z

@philippdolder thanks! No worries about the bot. As for this issue to set the expectations: I'm not sure this will get fixed - currently even though it doesn't roundtrip with serialization both XMLs are equivalent - all elements have the same local name and the namespace and the content and the difference is just the prefix which doesn't have any meaning and just improves readability. Perhaps some knobs in the serialization which would allow roundtripping might be a better solution than a re-design...

udlose · 2025-01-25T23:28:00Z

Unfortunately, I think this is a design flaw in LINQ to XML. The .NET traditional XML DOM implementation maintains the original element and attribute prefixes: XmlElement.Prefix and XmlAttribute.Prefix. While you can call XElement.GetPrefixOfNamespace(), it doesn't maintain the correct mappings.

I'm writing an application to implement semantic equality using LINQ to XML and this shortcoming has caused some pain points. For example, if I have superfluous namespace declarations (which is perfectly valid xml), LINQ to XML only picks up the first one doesn't maintain records of all of the superfluous/duplicates.

For example (from 3.5 Superfluous Namespace declarations, the following is possible and legal/valid xml but it is impossible to canonicalize according to the spec using LINQ to XML because of this "namespace aggregation". Note that all namespace declarations point to the same namespace but just use different prefixes:

<foo xmlns:a="http://z0" xmlns:b="http://z0" a:att1="val1" b:att2="val2" xmlns="http://z0"> 
 <c:bar xmlns:a="http://z0" xmlns:c="http://z0" c:att3="val3"/>
 <d:bar xmlns:d="http://z0"/>
</foo>

I'd love to see a fix here!

dotnet-issue-labeler bot added area-System.Xml untriaged New issue has not been triaged by the area owner labels Feb 17, 2021

krwq added this to the Future milestone Jul 12, 2021

krwq added the help wanted [up-for-grabs] Good issue for external contributors label Jul 12, 2021

krwq removed the untriaged New issue has not been triaged by the area owner label Jul 12, 2021

krwq removed the help wanted [up-for-grabs] Good issue for external contributors label Jul 12, 2021

ghost added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jul 12, 2021

krwq removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jul 20, 2021

ghost added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Aug 2, 2021

krwq removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Aug 6, 2021

joperezr added this to Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr removed this from Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr moved this to Future in Triage POD for Reflection, META, etc. Nov 2, 2021

joperezr added this to Triage POD for Reflection, META, etc. Nov 2, 2021

krwq removed this from the Future milestone Jan 26, 2025

krwq added the untriaged New issue has not been triaged by the area owner label Jan 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LINQ to XML: Inconsistent namespace prefix handling #48400

LINQ to XML: Inconsistent namespace prefix handling #48400

philippdolder commented Feb 17, 2021 •

edited

Loading

ghost commented Feb 17, 2021

Description

Configuration

Regression?

Other information

watfordgnf commented Feb 17, 2021

philippdolder commented Feb 17, 2021

watfordgnf commented Feb 17, 2021

philippdolder commented Feb 18, 2021

mayorovp commented Feb 18, 2021 •

edited

Loading

philippdolder commented Feb 18, 2021 •

edited

Loading

krwq commented Jul 12, 2021

philippdolder commented Jul 13, 2021

krwq commented Jul 20, 2021 •

edited

Loading

philippdolder commented Aug 2, 2021

krwq commented Aug 6, 2021

udlose commented Jan 25, 2025 •

edited

Loading

LINQ to XML: Inconsistent namespace prefix handling #48400

LINQ to XML: Inconsistent namespace prefix handling #48400

Comments

philippdolder commented Feb 17, 2021 • edited Loading

Description

Configuration

Regression?

Other information

ghost commented Feb 17, 2021

Description

Configuration

Regression?

Other information

watfordgnf commented Feb 17, 2021

philippdolder commented Feb 17, 2021

watfordgnf commented Feb 17, 2021

philippdolder commented Feb 18, 2021

mayorovp commented Feb 18, 2021 • edited Loading

philippdolder commented Feb 18, 2021 • edited Loading

krwq commented Jul 12, 2021

philippdolder commented Jul 13, 2021

krwq commented Jul 20, 2021 • edited Loading

philippdolder commented Aug 2, 2021

krwq commented Aug 6, 2021

udlose commented Jan 25, 2025 • edited Loading

philippdolder commented Feb 17, 2021 •

edited

Loading

mayorovp commented Feb 18, 2021 •

edited

Loading

philippdolder commented Feb 18, 2021 •

edited

Loading

krwq commented Jul 20, 2021 •

edited

Loading

udlose commented Jan 25, 2025 •

edited

Loading