I just found that older versions of readability don't have this issue. A quick binary search over the commits showed that the issue was introduced by commit 522eb4b, which changed a single line to exclude elements with a visibility: hidden style.
I then found that no element has a visibility: hidden style when the page is rendered in Firefox, but in the monolith-generated HTML file all of the article text is wrapped in a container element with visibility: hidden. The visibility style must be changed by some JavaScript code at runtime.
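As a possible workaround, the inline visibility: hidden could be cleared before the document is handed to readability. The sketch below is only an illustration under assumptions: `revealInlineHidden` is a hypothetical helper (not part of readability or monolith), and it assumes the hidden state comes from an inline `style` attribute and that the pipeline already has a DOM `document` object (for example from jsdom).

```javascript
// Hypothetical helper: clears inline `visibility: hidden` so that a
// visibility-based filter no longer discards the article container.
// Assumption: the hidden state is set through an inline style attribute.
function revealInlineHidden(document) {
  let revealed = 0;
  // Only elements that carry an inline style attribute are candidates.
  for (const el of document.querySelectorAll("[style]")) {
    if (el.style && el.style.visibility === "hidden") {
      // Assigning an empty string removes the declaration from the inline style.
      el.style.visibility = "";
      revealed += 1;
    }
  }
  // Number of elements that were changed, useful for logging.
  return revealed;
}
```

If the DOM is built with jsdom, this would be called on `new JSDOM(html).window.document` just before `new Readability(document).parse()`. It also reveals elements that the page intended to keep hidden, so it may let unwanted content through on other sites.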
I use monolith to get a single-file HTML version of a web page, then run it through readability (run with Node.js) to keep only the article. This mostly works well.
However, it seems broken with articles on WeChat. For example, https://mp.weixin.qq.com/s/koaLJvsFLkfi_j3HKIi6Dw.
The screenshot of rendered HTML generated by monolith is like this:
And the rendered HTML generated by "monolith -> readability" is like this:
Almost all of the meaningful article text is lost.
However, if I use Firefox's Reader view on the monolith-generated HTML, everything looks great:
I'm confused. What's the gap?
My environment
This is how I use readability to generate the polished HTML: https://github.com/kfstorm/carnivore/blob/bbfd67930223787e58338a16d2d2dffd5d074998/carnivore/app/readability/index.mjs