Open
Selector initialization ignores the user's choice of type if the text/body is JSON-serializable. This seems to have been introduced in 1.8.0+.
This is also a documentation problem, since from 1.8.0+ it seems that Parsel tries to guess the type of the data instead of defaulting to "html".
The problem is that this makes parsing unknown text unreliable: the text might be interpreted as something other than expected (as in the examples below), and things like .xpath(...) may break.
For instance in 1.7.0:
>>> Selector("2000") # loaded as html
<Selector xpath=None data='<html><body><p>2000</p></body></html>'>
>>> Selector("foo") # loaded as html
<Selector xpath=None data='<html><body><p>foo</p></body></html>'>
In 1.8.1:
>>> Selector("foo") # loaded as html
<Selector query=None data='<html><body><p>foo</p></body></html>'>
>>> Selector("2000") # loaded as json
<parsel.selector.Selector object at 0x1247f8e20>
>>> Selector("200", type="html") # loaded as json, even if html is requested
<parsel.selector.Selector object at 0x104e72a40>
The root cause is that, when identifying the data type, the logic does not check what the user passed (lines 332 to 333 in 4966533).
There are two possible solutions:
- Check if the user passed type == "json" (defaults to "html")
- Check if the user passed type in (None, "json") (auto-detect type)
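To make the difference between the two options concrete, here is a rough sketch of each as a standalone function. This is not Parsel's actual code: the helper names (`_looks_like_json`, `resolve_type_default_html`, `resolve_type_autodetect`) are hypothetical, and treating "JSON-serializable" as "parses with `json.loads`" is an assumption made for illustration.

```python
import json
from typing import Optional


def _looks_like_json(text: str) -> bool:
    # Hypothetical helper: assume "JSON-like" means json.loads() accepts the text.
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def resolve_type_default_html(text: str, requested: Optional[str] = None) -> str:
    # Option 1: only treat the input as JSON when the user explicitly asks for it.
    # With no explicit type, fall back to "html" (the pre-1.8.0 behaviour).
    if requested == "json":
        return "json"
    return requested or "html"


def resolve_type_autodetect(text: str, requested: Optional[str] = None) -> str:
    # Option 2: guess only when the user gave no type; an explicit type always wins.
    if requested is None:
        return "json" if _looks_like_json(text) else "html"
    return requested


if __name__ == "__main__":
    for text, requested in [("2000", None), ("foo", None), ("200", "html")]:
        print(
            repr(text),
            requested,
            resolve_type_default_html(text, requested),
            resolve_type_autodetect(text, requested),
        )
```

With either option, an explicit type="html" is respected; the two differ only in what happens when no type is given and the text happens to parse as JSON.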
I could open a PR with either approach.