So, this is a bit of a weird one. When the feed validator checks the HTML inside a <content> block, it seems to get tripped up by attributes that have no value. For example, this minimal reproduction:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Crash the validator</title>
<link href="http://example.com/feed.xml" rel="self" />
<link href="http://example.com/" />
<id>tag:example.com,2019-07-16:blog</id>
<updated>2019-06-04T00:40:00-07:00</updated>
<entry>
<title>This feed crashes feedvalidator</title>
<link href="http://example.com/crash.html" rel="alternate" type="text/html" />
<published>2019-06-04T00:40:00-07:00</published>
<updated>2019-06-04T00:40:00-07:00</updated>
<id>urn:uuid:9a51cf78-c042-5254-b564-eec1fe3bb181</id>
<author><name>fluffy</name></author>
<content type="html"><![CDATA[
<p>This generates a weird error.</p>
<div class="images" style><img src="http://placekitten.com/200/300" alt="meow"></div>
<p>Isn't it strange?</p>
]]></content>
</entry>
</feed>
causes an error:
An error occurred while trying to validate this feed.
Possible causes:
• The address may be incorrect. Make sure the address is spelled correctly. Try loading the feed directly in your browser to make sure a feed exists at that address.
• The feed may be temporarily unavailable. The server may be down, or too slow. Try again later.
• The validator may be busted. If the feed exists, the server is fine, and the problem is reproducible, let us know on the feedvalidator-users mailing list.
The element causing the problem is <div class="images" style>; removing it makes the validator work perfectly.
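To see what that valueless attribute turns into, here's a small sketch using Python 3's standard-library html.parser. This is an assumption on my part: feedvalidator ships its own vendored HTMLParser.py, and I'm assuming it reports valueless attributes the same way. AttrRecorder is a hypothetical helper that just records each start tag and its attributes.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute
        # such as `style` with no "=..." part gets a value of None.
        self.seen.append((tag, attrs))


parser = AttrRecorder()
parser.feed('<div class="images" style>'
            '<img src="http://placekitten.com/200/300" alt="meow"></div>')
parser.close()

for tag, attrs in parser.seen:
    print(tag, attrs)
```

With the markup from the reproduction, the div's style attribute comes back as ('style', None) rather than an empty string, which is exactly the kind of value a string-only check can't handle.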
Using the Python REPL, I was able to pin down exactly where the code blows up; here's the stack trace:
>>> feedvalidator.validateStream(open('/Users/fluffy/Desktop/feed.xml'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "feedvalidator/__init__.py", line 164, in validateStream
    validator = _validate(rawdata, firstOccurrenceOnly, loggedEvents, base, encoding, mediaType=mediaType)
  File "feedvalidator/__init__.py", line 115, in _validate
    parser.parse(source)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 110, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 213, in feed
    self._parser.Parse(data, isFinal)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 365, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "feedvalidator/base.py", line 266, in endElementNS
    handler.endElementNS(name, qname)
  File "feedvalidator/base.py", line 516, in endElementNS
    self.validate()
  File "feedvalidator/content.py", line 82, in validate
    self.validateSafe(self.value)
  File "feedvalidator/validators.py", line 736, in validateSafe
    HTMLValidator(value, self)
  File "feedvalidator/validators.py", line 257, in __init__
    self.feed(value)
  File "feedvalidator/vendor/HTMLParser.py", line 169, in feed
    self.goahead(0)
  File "feedvalidator/vendor/HTMLParser.py", line 209, in goahead
    k = self.parse_starttag(i)
  File "feedvalidator/vendor/HTMLParser.py", line 332, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "feedvalidator/validators.py", line 279, in handle_starttag
    for evil in checkStyle(value):
  File "feedvalidator/validators.py", line 304, in checkStyle
    if not re.match("""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""", style):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or buffer
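So it seems the special-case handling for the style attribute dies when it gets a None value: the HTML parser reports a valueless attribute like the bare style as None, and re.match() raises a TypeError when it's handed None instead of a string.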
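One possible fix is to treat a valueless style attribute as an empty string before running the regex. The sketch below is a hypothetical, standalone version of that idea, not feedvalidator's actual checkStyle: STYLE_RE and style_is_safe are names I made up, the regex is copied from the checkStyle frame in the traceback, and treating an empty style as safe is my assumption about the desired behavior.

```python
import re

# Regex copied from checkStyle in the traceback; written as a raw string
# here so Python 3 doesn't warn about escapes like \s.
STYLE_RE = re.compile(
    r"""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$"""
)


def style_is_safe(style):
    """Hypothetical None-safe version of the style check."""
    # A bare `style` attribute arrives as None; assume it is
    # equivalent to style="" (which the regex accepts).
    if style is None:
        style = ""
    return STYLE_RE.match(style) is not None


# Without the guard, the same regex fails just like in the traceback.
try:
    STYLE_RE.match(None)
except TypeError as exc:
    print("unguarded check fails:", exc)

print(style_is_safe(None), style_is_safe("color: red"))
```

Normalizing None to "" keeps the rest of the check unchanged; the alternative would be to skip style checking entirely when there's no value, which amounts to the same thing since an empty style can't carry anything dangerous.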
Oddly enough, I couldn't reproduce this issue with a minimal content block like:
<div style></div>
Anyway, finding this error finally gave me a reason to care about fixing PlaidWeb/Publ#226 sooner rather than later. :)