-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tag Processor: Document and test XSS prevention in set_attribute #44447
Tag Processor: Document and test XSS prevention in set_attribute #44447
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, it correctly documents the current behavior and it’s what been discussed as the way moving forward. I would appreciate confirmation from @dmsnell.
* This is sufficient because: | ||
* * The attributes touched by set_attribute are always rewritten using double quotes. | ||
* * The only way to terminate a double quoted attribute value is via a double quote (") | ||
* character. There is no need to encode other characters such as <, >, /, or '. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the existing body of code attempting to parse HTML in WordPress and its plugins, I think @azaozz has a good point that we might want to encode <
, >
, and &
as well. This isn't an HTML thing, but if we allow alt="the <img /> tag
we might open the door for defects in other code to trip up here when we had the power to prevent it.
the one caveat I see is with class
because we don't support decoding character references at this moment, but even then I think it's fine to encode them on save.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, let’s escape these as well. I agree it’s fine to encode the class names on save. One caveat is stringifying one processor and initialising another with the result - we might need to add a ”slow mode” that decodes attributes values by default to support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, let's stick to encoding just "
, <
, and >
. Encoding &
breaks the following use-case:
$p->set_attribute( 'data-price', '10 €' );
// <div data-price="10 &euro;"></div>
I just committed that in 55a0a83;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that _wp_specialchars()
exists to handle encoding without double-encoding (unless you explicitly tell it to). it looks up any character references before replacing &
willy-nilly.
"10 € & "USD"" === _wp_specialchars( '10 € & "USD"' );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_wp_specialchars
seems to be deprecated in favor of esc_html
, which does the same thing as esc_attr
– let's try esc_attr
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed in 6c309a8
9598b78
to
6c309a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to drag this on, but I feel like we need to.
with the proposed use of esc_attr()
we're preventing the use of this processor for URL fields, which you might think should be escaped with esc_url()
but actually needs to be handled more carefully.
this brings me back to why we left this idea of general escaping unaddressed in this system because escaping is dependent on which attribute is being set.
we could start to build in that handling and distinguish which attribute is being set, or we could bail and leave it up to developers using this API to do their own escaping.
potentially we could introduce something in Core like esc_attr( $text, $attribute_name = null )
to handle the distinctions.
Type | Value |
---|---|
URL | https://wordpress.com/?fields=id,user&pretty=true&q="is anything > 0" |
Wanted escaping | https://wordpress.com/?fields=id,user&pretty=true&q="is%20anything%20>%200" |
esc_attr() |
https://wordpress.com/?fields=id,user&pretty=true&q="is anything > 0" |
esc_url() |
https://wordpress.com/?fields=id,user&pretty=true&q=is%20anything%20%200 |
esc_url($context = 'attribute') |
https://wordpress.com/?fields=id,user&pretty=true&q=is%20anything%20%200 |
(the $_context
parameter to esc_url()
is marked private and distinguishes between sanitizing the URL for database storage and for display)
these are not complicated URLs. for example, if I use an image from my blog on WordPress.com both esc_attr()
and esc_url()
break the link, ironically eliminating the ssl=1
query arg. these URLs are generated from WordPress itself and are exactly the kinds of URLs I expect code to inject when using this processor.
Type | Value |
---|---|
URL | https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1 |
esc_attr() |
https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1 |
esc_url() |
https://i0.wp.com/fluffyandflakey.blog/wp-content/uploads/2019/09/Untitled_Artwork.png?fit=1304%2C497&ssl=1 |
one option we have is to bail sanitization if setting a src
, href
, srcset
, or other URL fields. I'm leery of that because of how different the protections are for those fields vs. for others. I'd rather we try to do everything or nothing so that the expectations are clear.
do you think this should be paired with |
I've added this in 907d37d |
Yeah it only makes sense to do it. It shouldn't even add much of a speed penalty since it's not a part of the parser but of the getter. You only bear the cost when you specifically want to read that value. If that ever becomes a problem we can add some configuration to disable the decoding when needed, but I don't think we'll ever get there. |
@dmsnell I went ahead and merged this since the URL encoding is a non-issue and my only change on top of yours was another test file. I think we're good here – let's iterate in another PR if needed. |
Part of #44410
What?
The
WP_HTML_Tag_Processor
allows setting new HTML attributes with a given name and value. Passing malicious values such as" onclick="alert(1)
should not create new attributes.WP_HTML_Tag_Processor
prevents that by applying the correct escaping.In this patch we're adding a series of tests and comments to document this behavior and prevent any future regressions.
Why?
We don't want to allow setting attributes whose names could break the HTML syntax or open new vulnerabilities in code that parses post content.
Testing Instructions
Validate the given escaping and tests are sufficient to prevent XSS via injecting a malicious attribute value.
cc @azaozz @dmsnell @gziolo @aristath