Some HTML is parsed incorrectly #53
Comments
I should also note: I have not yet dug in far enough to know whether it might be an upstream issue (with
I would be happy to review a fix. The only thing to be careful of is that it complies with the HTML5 spec. Here are some relevant pieces:
Are you sure that it is the newline inside the My first guess as to what is happening here is that the unclosed If that's correct, then this result is according to the spec (i.e. Chrome and other browsers would produce the same interpretation). Have you tried your inputs in a browser, inspecting the resulting DOMs? You can also try |
Good call! It does appear to be the newline at the end that triggers it:
Thanks very much for the pointers and detailed response. I am not an HTML expert (obviously :)). If I understand correctly, this may actually be the "correct" behavior, at least as per the spec? If so, I'm happy with just closing the issue. It may at least help instruct future users who hit this behavior!
Yes, this is the correct behavior according to the spec :) The spec asks a compliant parser to sometimes produce outputs that may be counterintuitive. That's the result of trade-offs that were made in order to intuitively parse certain other malformed inputs that the authors of HTML5 found to be more common or otherwise important. This issue is, unfortunately, one of the cases that got "penalized" by these trade-offs.
Thanks very much for helping me walk through this :)
I hit this while working on transforming Omd to Tyxml (noted here ocaml-community/omd#211 (comment)). It looks like Lambdasoup is giving invalid results for some HTML. E.g.
If the newline in the href attribute is removed, the duplication doesn't occur.
I'm happy to put in some work towards a fix, if that would be useful.