Adjacent tags result in merged text. #939

Closed
timlib opened this issue Dec 30, 2024 · 2 comments

timlib commented Dec 30, 2024

If the page source looks like this:

<h3>Foo</h3><p>Bar</p>

The browser renders it as two separate words, as the markup intends. I have pasted it below as non-code so you can see how it looks:

Foo

Bar

However, the "textContent" field returned by Readability merges the words into "FooBar", which is incorrect.
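To illustrate why the words merge, here is a minimal sketch. It assumes Readability's "textContent" follows the DOM's `textContent` semantics, which concatenate all descendant text nodes in document order and add nothing at element boundaries. The tree shape and the `textContentOf` helper are hypothetical stand-ins for a real DOM, not Readability's code.

```javascript
// Hypothetical stand-in for a DOM tree: an element is { tag, children },
// and a text node is a plain string.
function textContentOf(node) {
  if (typeof node === "string") {
    return node;
  }
  // Join descendant text in document order with no separator,
  // which is how DOM textContent behaves.
  return node.children.map(textContentOf).join("");
}

// Equivalent of <h3>Foo</h3><p>Bar</p> inside a wrapper element.
const fragment = {
  tag: "div",
  children: [
    { tag: "h3", children: ["Foo"] },
    { tag: "p", children: ["Bar"] },
  ],
};

console.log(textContentOf(fragment)); // prints "FooBar"
```

The line break a browser shows between the heading and the paragraph comes from block layout, not from any character in the text nodes, so it never appears in the concatenated string.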

My long-standing workaround has been to parse the "content" HTML returned by Readability, which works but adds another layer of non-trivial code to maintain.
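A rough sketch of that kind of workaround, for illustration only: it takes an HTML string such as the "content" field and inserts a newline at block-level tag boundaries before stripping the remaining tags. The function name `htmlToSeparatedText` and the `BLOCK_TAGS` list are assumptions made for this example. It uses regular expressions rather than a real HTML parser, so it does not decode entities and will mishandle comments, scripts, or attributes containing `>`.

```javascript
// Tags treated as block-level in this sketch (an assumed, non-exhaustive list).
const BLOCK_TAGS = [
  "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "ul", "ol",
  "blockquote", "pre", "table", "tr", "section", "article", "br",
];

function htmlToSeparatedText(html) {
  // Matches opening or closing block-level tags, with or without attributes.
  const blockPattern = new RegExp(
    `</?(?:${BLOCK_TAGS.join("|")})\\b[^>]*>`,
    "gi"
  );
  return html
    .replace(blockPattern, "\n")  // block boundary becomes a newline
    .replace(/<[^>]+>/g, "")      // drop remaining (inline) tags
    .replace(/[ \t]+/g, " ")      // collapse runs of spaces and tabs
    .replace(/ *\n[ \n]*/g, "\n") // collapse blank lines
    .trim();
}

console.log(JSON.stringify(htmlToSeparatedText("<h3>Foo</h3><p>Bar</p>")));
// prints "Foo\nBar"
```

Inline tags such as `<b>` are removed without adding a separator, so `Hello <b>big</b> world` stays on one line. A production version would more likely walk a parsed DOM than rely on regular expressions.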

Is there a solution to this problem? This has been a bug since the Arc days, and I've always wondered whether anybody else was dealing with it.

gijsk (Contributor) commented Dec 31, 2024

This is a duplicate of #779, which has more context on this problem.

gijsk closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 31, 2024

timlib (Author) commented Jan 16, 2025

I'm not sure why this is a WONTFIX bug; it undermines the core utility of textContent. If textContent is not intended to produce the actual text content, it should be removed, because the current behavior introduces a very subtle bug that most people won't notice immediately. It is much worse for it to exist and be subtly broken than for it not to exist at all.

I'm going to open a new issue to remove textContent from Readability.js as that is the best solution.


2 participants