Closed
Sometimes the <ol> element of an ordered list is dropped from the content output, leaving its <li> children inside a plain <div>.
Here is a repro:
import { JSDOM } from 'jsdom'
import { Readability } from '@mozilla/readability'

test('JSDOM and Readability', () => {
  // Minimal document: a single ordered list with one item.
  const html = `
<html>
<body>
<ol>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</ol>
</body>
</html>`
  const dom: JSDOM = new JSDOM(html)
  const extracted = new Readability(dom.window.document, {
    debug: true
  }).parse()
  // The <ol> wrapper should survive extraction.
  expect(extracted?.content).toContain("<ol>")
})
Here is the failure:
● JSDOM and Readability
expect(received).toContain(expected) // indexOf
Expected substring: "<ol>"
Received string: "<div id=\"readability-page-1\" class=\"page\"><div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div></div>"
82 | }).parse()
83 |
> 84 | expect(extracted?.content).toContain("<ol>")
| ^
85 | })
at Object.<anonymous> (__tests__/email/emailParser.test.ts:84:32)
This is using:
"dependencies": {
  "jsdom": "^24.0.0",
  "@mozilla/readability": "^0.5.0"
}
Here is the debug output:
console.log
Reader: (Readability) **** grabArticle ****
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Starting grabArticle loop
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <li > with score 1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <body > with score 0.6666666666666666
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Looking at sibling node: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Sibling has score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Appending node: <ol >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Altering sibling: <ol > to div.
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) _setNodeTag <ol > DIV
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content pre-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Cleaning Conditionally <div >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content post-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content after paging: <div id="readability-page-1" class="page"><div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div></div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Starting grabArticle loop
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <li > with score 1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Looking at sibling node: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Sibling has score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Appending node: <ol >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Altering sibling: <ol > to div.
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) _setNodeTag <ol > DIV
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content pre-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Cleaning Conditionally <div >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content post-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content after paging: <div id="readability-page-1" class="page"><div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div></div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Starting grabArticle loop
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <li > with score 1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Looking at sibling node: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Sibling has score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Appending node: <ol >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Altering sibling: <ol > to div.
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) _setNodeTag <ol > DIV
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content pre-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Cleaning Conditionally <div >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content post-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content after paging: <div id="readability-page-1" class="page"><div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div></div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Starting grabArticle loop
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <li > with score 1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Candidate: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Looking at sibling node: <ol > with score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Sibling has score -1
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Appending node: <ol >
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Altering sibling: <ol > to div.
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) _setNodeTag <ol > DIV
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content pre-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content post-prep: <div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Article content after paging: <div id="readability-page-1" class="page"><div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div></div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
console.log
Reader: (Readability) Grabbed: <div id="readability-page-1" class="page"><div>
<li><p>AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, L</p></li>
</div></div>
at Readability.log (node_modules/@mozilla/readability/Readability.js:84:21)
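The "Altering sibling: <ol > to div." and "_setNodeTag <ol > DIV" lines suggest that an appended sibling is retagged to DIV unless its tag is on some list of exceptions. Here is a minimal, dependency-free sketch of that decision. The function name tagAfterAppend and the contents of the exception list are assumptions for illustration, not code copied from Readability.js.

```typescript
// Simplified model of the retagging step seen in the debug log.
// ASSUMPTION: this exception list is illustrative; the library's real list may differ.
const ALTER_TO_DIV_EXCEPTIONS: readonly string[] = ["DIV", "ARTICLE", "SECTION", "P"];

// Hypothetical helper: returns the tag an appended sibling would end up with.
function tagAfterAppend(
  tagName: string,
  exceptions: readonly string[] = ALTER_TO_DIV_EXCEPTIONS
): string {
  const upper = tagName.toUpperCase();
  // Tags on the exception list are kept; everything else becomes DIV.
  return exceptions.indexOf(upper) !== -1 ? upper : "DIV";
}

// With the assumed list, an <ol> is rewritten and its <li> children are orphaned.
console.log(tagAfterAppend("ol")); // DIV
// Adding OL to the exceptions would keep the list wrapper intact.
console.log(tagAfterAppend("ol", [...ALTER_TO_DIV_EXCEPTIONS, "OL"])); // OL
```

If the library works roughly like this, treating OL (and probably UL) as exceptions would keep the list wrapper in the output for this repro.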