Crashes on all Pinterest and many other websites, minimal reproduction #836
Comments
@koresar Thanks for the report. What environment do you run your minimal testcase in? In … For the pinterest URLs, I see that readability doesn't output anything (when running …) What version of readability are you using?
Apologies @gijsk. That's Node.js. All Readability versions are affected (I tried them all).
npm i @mozilla/readability linkedom
test.mjs:
import { Readability } from "@mozilla/readability";
import { DOMParser } from "linkedom";
const url = "https://www.pinterest.ca/variamsingh87/";
const html = `
<div><div><div>
More than 25 characters!!
</div></div></div>
<div><meta name="twitter:title" content="1"></div>`;
const doc = new DOMParser().parseFromString(html, "text/html");
const base = doc.createElement("base");
base.setAttribute("href", url);
doc.head.appendChild(base);
const reader = new Readability(doc, {
  keepClasses: true,
});
const result = reader.parse() ?? {};
console.log(result.textContent ? result.content : null);
Run: node test.mjs
Output:
Reading your comment, I am starting to think that it's the …
No worries. It's possible it's a …
Looks like it doesn't make much sense to fix linkedom issues in readability. Closing now.
After some research: it's caused by the input HTML, see WebReflection/linkedom#147. `
<div><head><base href="https://www.pinterest.ca/variamsingh87/"></head><body></body><div><p>
More than 25 characters!!
</p></div></div>
<P><meta name="twitter:title" content="1"></P>`
Readability won't accept that, since the body isn't an actual body.
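A possible workaround, as a minimal sketch: wrap fragment input in a full document skeleton before handing it to the parser, so that `<html>`, `<head>` and `<body>` are explicit. This assumes linkedom builds a proper tree when those elements are already present in the input, which has not been verified here against the real Pinterest pages. `wrapAsDocument` is a hypothetical helper, not part of linkedom or Readability; it only manipulates strings.

```javascript
// Hypothetical helper: wraps an HTML fragment in a full document skeleton
// so the parser sees an explicit <html>, <head> and <body>.
// Assumption: input that already contains an <html> tag is returned unchanged.
function wrapAsDocument(html, baseUrl) {
  if (/<html[\s>]/i.test(html)) {
    return html;
  }
  // Escape characters that would break out of the href attribute.
  const href = String(baseUrl).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return (
    "<!DOCTYPE html><html><head>" +
    `<base href="${href}">` +
    "</head><body>" +
    html +
    "</body></html>"
  );
}

const fragment = "<div><div><div>More than 25 characters!!</div></div></div>";
const wrapped = wrapAsDocument(fragment, "https://www.pinterest.ca/variamsingh87/");
console.log(wrapped);
```

The wrapped string could then be passed to `new DOMParser().parseFromString(wrapped, "text/html")` as in the test case above, without creating the `<base>` element by hand.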
Stack trace:
Pages to test on:
I couldn't fix it myself. Sorry.