Closed
Description
The article:author meta tag is "meant" to contain a URL (see https://developers.facebook.com/blog/post/2013/06/19/platform-updates--new-open-graph-tags-for-media-publishers-and-more/).
On many sites it does seem to contain a URL, but on a number of sites I've tested it contains the author's name.
One example is https://www.atlasobscura.com/articles/the-deck-of-cards-that-made-tarot-a-global-phenomenon
On that site, we have:
<meta property="article:author" content="Laura June Topolsky">
On that site, there is no better source for the author's name, so Readability consults the DOM and arrives at an unfortunate author string of "Laura June Topolsky July 10, 2015".
My suggestion:
- Consult that meta field when working out the byline (https://github.com/mozilla/readability/blob/main/Readability.js#L1783-L1789)
- ... but ignore it if it contains a URL (assume that a non-empty string that isn't a valid URL is probably the author's name)
- Prefer that to the messy DOM check, which often results in bylines containing extraneous data (often the publish date)
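To make the suggestion concrete, here is a minimal sketch of the check, assuming the `content` value of the `article:author` meta tag has already been read. The names `isUrl` and `bylineFromArticleAuthor` are hypothetical and are not taken from Readability.js. The sketch treats any string that the standard `URL` constructor accepts as a URL, and anything else that is non-empty as a probable author name.

```javascript
// Hypothetical helper (not part of Readability.js): returns true when the
// string parses as an absolute URL with the standard URL constructor.
function isUrl(str) {
  try {
    new URL(str); // throws a TypeError if str is not a valid absolute URL
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: given the content of <meta property="article:author">,
// return a byline candidate, or null if the value should be ignored.
function bylineFromArticleAuthor(content) {
  if (typeof content !== "string") {
    return null; // meta tag missing or has no content attribute
  }
  const value = content.trim();
  if (value === "" || isUrl(value)) {
    return null; // empty, or a profile URL as the Open Graph docs intend
  }
  return value; // non-empty and not a URL: probably the author's name
}

// Usage with the value from the Atlas Obscura example above:
console.log(bylineFromArticleAuthor("Laura June Topolsky"));
```

A caller would fall back to the existing DOM-based byline detection only when this returns `null`. One caveat of relying on the `URL` constructor is that an unusual name containing a colon early on (for example `John: Doe`) may parse as a URL with a custom scheme and be ignored, so a stricter check on the scheme might be preferable.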