Tag Processor: Document and test XSS prevention in set_attribute #44447

adamziel · 2022-09-26T04:44:52Z

What?

The WP_HTML_Tag_Processor allows setting new HTML attributes with a given name and value. Passing malicious values such as " onclick="alert(1) should not create new attributes. WP_HTML_Tag_Processor prevents that by applying the correct escaping.

In this patch we're adding a series of tests and comments to document this behavior and prevent any future regressions.

Why?

We don't want to allow setting attributes whose names could break the HTML syntax or open new vulnerabilities in code that parses post content.

Testing Instructions

Validate the given escaping and tests are sufficient to prevent XSS via injecting a malicious attribute value.

cc @azaozz @dmsnell @gziolo @aristath

gziolo

As far as I know, it correctly documents the current behavior and it’s what been discussed as the way moving forward. I would appreciate confirmation from @dmsnell.

lib/experimental/html/class-wp-html-tag-processor.php

dmsnell · 2022-09-26T17:46:59Z

lib/experimental/html/class-wp-html-tag-processor.php

+			 * This is sufficient because:
+			 * * The attributes touched by set_attribute are always rewritten using double quotes.
+			 * * The only way to terminate a double quoted attribute value is via a double quote (")
+			 *   character. There is no need to encode other characters such as <, >, /, or '.


Given the existing body of code attempting to parse HTML in WordPress and its plugins, I think @azaozz has a good point that we might want to encode <, >, and & as well. This isn't an HTML thing, but if we allow alt="the <img /> tag we might open the door for defects in other code to trip up here when we had the power to prevent it.

the one caveat I see is with class because we don't support decoding character references at this moment, but even then I think it's fine to encode them on save.

Good point, let’s escape these as well. I agree it’s fine to encode the class names on save. One caveat is stringifying one processor and initialising another with the result - we might need to add a ”slow mode” that decodes attributes values by default to support that.

Actually, let's stick to encoding just ", <, and >. Encoding & breaks the following use-case:

$p->set_attribute( 'data-price', '10 €' ); // <div data-price="10 &euro;"></div>

I just committed that in 55a0a83;

I believe that _wp_specialchars() exists to handle encoding without double-encoding (unless you explicitly tell it to). it looks up any character references before replacing & willy-nilly.

"10 € & "USD"" === _wp_specialchars( '10 € & "USD"' );

_wp_specialchars seems to be deprecated in favor of esc_html, which does the same thing as esc_attr – let's try esc_attr then?

Proposed in 6c309a8

dmsnell

sorry to drag this on, but I feel like we need to.

with the proposed use of esc_attr() we're preventing the use of this processor for URL fields, which you might think should be escaped with esc_url() but actually needs to be handled more carefully.

this brings me back to why we left this idea of general escaping unaddressed in this system because escaping is dependent on which attribute is being set.

we could start to build in that handling and distinguish which attribute is being set, or we could bail and leave it up to developers using this API to do their own escaping.

potentially we could introduce something in Core like esc_attr( $text, $attribute_name = null ) to handle the distinctions.

Type	Value
URL	`https://wordpress.com/?fields=id,user&pretty=true&q="is anything > 0"`
Wanted escaping	`https://wordpress.com/?fields=id,user&pretty=true&q="is%20anything%20>%200"`
`esc_attr()`	`https://wordpress.com/?fields=id,user&pretty=true&q="is anything > 0"`
`esc_url()`	`https://wordpress.com/?fields=id,user&pretty=true&q=is%20anything%20%200`
`esc_url($context = 'attribute')`	`https://wordpress.com/?fields=id,user&pretty=true&q=is%20anything%20%200`

(the $_context parameter to esc_url() is marked private and distinguishes between sanitizing the URL for database storage and for display)

these are not complicated URLs. for example, if I use an image from my blog on WordPress.com both esc_attr() and esc_url() break the link, ironically eliminating the ssl=1 query arg. these URLs are generated from WordPress itself and are exactly the kinds of URLs I expect code to inject when using this processor.

Type	Value
URL	`https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1`
`esc_attr()`	`https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1`
`esc_url()`	`https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1`

one option we have is to bail sanitization if setting a src, href, srcset, or other URL fields. I'm leery of that because of how different the protections are for those fields vs. for others. I'd rather we try to do everything or nothing so that the expectations are clear.

dmsnell · 2022-09-28T19:10:56Z

:sigh: I'm getting confused myself again with HTML escaping vs. parsed values.

looks like esc_attr() is correct here because even though it escapes more than we want, the browser still parses the attribute value because HTML5 specifies that all attribute values may contain character references

phpunit/html/wp-html-tag-processor-test.php

dmsnell · 2022-09-28T20:13:09Z

do you think this should be paired with get_attribute() returning the decoded values?

dmsnell · 2022-09-28T20:25:15Z

do you think this should be paired with get_attribute() returning the decoded values?

I've added this in 907d37d

phpunit/html/wp-html-tag-processor-test.php

… libraries

adamziel · 2022-09-29T01:30:44Z

I've added this in 907d37d

Yeah it only makes sense to do it. It shouldn't even add much of a speed penalty since it's not a part of the parser but of the getter. You only bear the cost when you specifically want to read that value. If that ever becomes a problem we can add some configuration to disable the decoding when needed, but I don't think we'll ever get there.

adamziel · 2022-09-29T02:42:43Z

@dmsnell I went ahead and merged this since the URL encoding is a non-issue and my only change on top of yours was another test file. I think we're good here – let's iterate in another PR if needed.

Document and test encoding double quotes

8bd5823

adamziel requested a review from spacedmonkey as a code owner September 26, 2022 04:44

adamziel self-assigned this Sep 26, 2022

adamziel added [Type] Enhancement A suggestion for improvement. [Feature] Block API API that allows to express the block paradigm. Developer Experience Ideas about improving block and theme developer experience labels Sep 26, 2022

adamziel mentioned this pull request Sep 26, 2022

WP_HTML_Tag_Processor overview issue #44410

Closed

26 tasks

adamziel changed the title ~~Document and test encoding double quotes~~ Tag Processor: Document and test encoding double quotes Sep 26, 2022

adamziel changed the title ~~Tag Processor: Document and test encoding double quotes~~ Tag Processor: Document and test XSS prevention in set_attribute Sep 26, 2022

Reword the docstring

4c522dd

gziolo approved these changes Sep 26, 2022

View reviewed changes

dmsnell reviewed Sep 26, 2022

View reviewed changes

lib/experimental/html/class-wp-html-tag-processor.php Outdated Show resolved Hide resolved

dmsnell reviewed Sep 26, 2022

View reviewed changes

Encode < and > characters in set_attribute

55a0a83

adamziel mentioned this pull request Sep 27, 2022

Block Library: Refactor core blocks to use HTML Tag Processor #43178

Merged

adamziel added 2 commits September 28, 2022 15:22

Use esc_attr to escape the attributes values

1a84536

Escape attribute values using esc_attr

6c309a8

adamziel force-pushed the update/tag-processor-escape-attribute-values branch from 9598b78 to 6c309a8 Compare September 28, 2022 05:38

dmsnell requested changes Sep 28, 2022

View reviewed changes

dmsnell self-requested a review September 28, 2022 19:01

decouple tests from behavior of esc_attr()

eb8bda4

dmsnell reviewed Sep 28, 2022

View reviewed changes

phpunit/html/wp-html-tag-processor-test.php Outdated Show resolved Hide resolved

Decode character entities in get_attribute()

907d37d

dmsnell reviewed Sep 28, 2022

View reviewed changes

phpunit/html/wp-html-tag-processor-test.php Outdated Show resolved Hide resolved

adamziel added 3 commits September 29, 2022 11:20

Add WP_HTML_Tag_Processor_Test_WP file with tests requiring WordPress…

8cf66b0

… libraries

Lint

a1153e4

Merge branch 'trunk' into update/tag-processor-escape-attribute-values

50bf5f0

adamziel merged commit b58ff34 into trunk Sep 29, 2022

adamziel deleted the update/tag-processor-escape-attribute-values branch September 29, 2022 02:42

github-actions bot added this to the Gutenberg 14.3 milestone Sep 29, 2022

adamziel mentioned this pull request Sep 30, 2022

[Tag Processor] Merge the test files into a single file #44593

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tag Processor: Document and test XSS prevention in set_attribute #44447

Tag Processor: Document and test XSS prevention in set_attribute #44447

adamziel commented Sep 26, 2022

gziolo left a comment •

edited

Loading

dmsnell Sep 26, 2022

adamziel Sep 27, 2022 •

edited

Loading

adamziel Sep 27, 2022 •

edited

Loading

dmsnell Sep 27, 2022

adamziel Sep 28, 2022

adamziel Sep 28, 2022

dmsnell left a comment

dmsnell commented Sep 28, 2022

dmsnell commented Sep 28, 2022

dmsnell commented Sep 28, 2022

adamziel commented Sep 29, 2022 •

edited

Loading

adamziel commented Sep 29, 2022

Tag Processor: Document and test XSS prevention in set_attribute #44447

Tag Processor: Document and test XSS prevention in set_attribute #44447

Conversation

adamziel commented Sep 26, 2022

What?

Why?

Testing Instructions

gziolo left a comment • edited Loading

Choose a reason for hiding this comment

dmsnell Sep 26, 2022

Choose a reason for hiding this comment

adamziel Sep 27, 2022 • edited Loading

Choose a reason for hiding this comment

adamziel Sep 27, 2022 • edited Loading

Choose a reason for hiding this comment

dmsnell Sep 27, 2022

Choose a reason for hiding this comment

adamziel Sep 28, 2022

Choose a reason for hiding this comment

adamziel Sep 28, 2022

Choose a reason for hiding this comment

dmsnell left a comment

Choose a reason for hiding this comment

dmsnell commented Sep 28, 2022

dmsnell commented Sep 28, 2022

dmsnell commented Sep 28, 2022

adamziel commented Sep 29, 2022 • edited Loading

adamziel commented Sep 29, 2022

gziolo left a comment •

edited

Loading

adamziel Sep 27, 2022 •

edited

Loading

adamziel Sep 27, 2022 •

edited

Loading

adamziel commented Sep 29, 2022 •

edited

Loading