Suggestion: consider article:author meta tag as a source of author name metadata #938

@danielnixon

The article:author meta tag is "meant" to contain a URL (see https://developers.facebook.com/blog/post/2013/06/19/platform-updates--new-open-graph-tags-for-media-publishers-and-more/).

On many sites it does seem to contain a URL, but on a number of sites I've tested it contains the author's name.

One example is https://www.atlasobscura.com/articles/the-deck-of-cards-that-made-tarot-a-global-phenomenon

On that site, we have:

<meta property="article:author" content="Laura June Topolsky">

That page has no better source for the author name, so Readability falls back to the DOM and arrives at an unfortunate author string of "Laura June Topolsky July 10, 2015".

My suggestion:

  1. Consult that meta field when working out the byline (https://github.com/mozilla/readability/blob/main/Readability.js#L1783-L1789)
  2. ... but ignore it if it contains a URL (assume that if it contains a non-empty string that isn't a valid URL, it's probably the author's name)
  3. Prefer that to the DOM check, which often results in bylines containing extraneous data (often the publish date)
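
A rough sketch of what I have in mind, in plain JavaScript. The names `looksLikeUrl`, `authorNameFromMeta` and `chooseByline` are hypothetical and do not exist in Readability.js. Treating only `http:`/`https:` values as URLs is my own assumption, made so that a name containing a colon isn't mistaken for a URL scheme.

```javascript
// Hypothetical helpers illustrating the suggestion; not Readability.js code.

// Returns true if the string parses as an absolute http(s) URL.
// Assumption: only http/https values count as "a URL" for this purpose.
function looksLikeUrl(value) {
  try {
    var url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (e) {
    // new URL() throws on strings that are not absolute URLs.
    return false;
  }
}

// Returns the article:author content if it is plausibly a name, else null.
function authorNameFromMeta(content) {
  if (typeof content !== "string") {
    return null;
  }
  var value = content.trim();
  if (!value || looksLikeUrl(value)) {
    return null;
  }
  return value;
}

// Order of preference: existing metadata byline, then article:author
// (when it is not a URL), then whatever the DOM check found.
function chooseByline(existingMetaByline, articleAuthorContent, domByline) {
  return (
    existingMetaByline ||
    authorNameFromMeta(articleAuthorContent) ||
    domByline ||
    null
  );
}
```

With the Atlas Obscura example above, `chooseByline(null, "Laura June Topolsky", "Laura June Topolsky July 10, 2015")` would pick the meta value, while a site that puts a profile URL in `article:author` would still fall through to the DOM byline.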
