Handle article:author meta tag. Fixes #938 #942

Merged: 2 commits, Jan 2, 2025

@danielnixon commented on Jan 1, 2025

Fixes #938

@@ -0,0 +1,10 @@
+{
+  "title": "If You Can Picture A Tarot Card, It's Because of These 3 People",
+  "byline": "Laura June Topolsky",
@danielnixon (author) commented:

This is the key line. Without this change, this byline would contain something like "Laura June Topolsky July 10, 2015".

Readability.js Outdated
@@ -1726,7 +1740,7 @@ Readability.prototype = {

 // property is a space-separated list of values
 var propertyPattern =
-  /\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|creator|description|published_time|title|site_name)\s*/gi;
+  /\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|article:author|creator|description|published_time|title|site_name)\s*/gi;
@gijsk commented:

I'm sure it's me but why is this not already being matched? The first capturing subgroup has article, then there's optional whitespace followed by a colon followed by optional whitespace, and then this bit has author. After this patch this will also work for e.g. dc:article:author (and article:article:author etc. etc.) but that doesn't seem to be the intent and isn't what's in the testcase... so I'm a bit lost! Sorry for being obtuse.

@gijsk commented:

Trying locally, the new test passes without this part of the PR. It does fail the BBC test for me locally, where it now finds a byline that it did not find before ("BBC News"). Fixing that, this seems to pass tests without this change, so I'll just merge without this change? If I've missed something, do let me know.
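
The regex question in this thread can be checked directly. The sketch below copies the two patterns from the diff above and runs them against sample `property` strings. The `matchedNames` helper and the sample strings are hypothetical and exist only for illustration; they are not part of Readability.js, and the lowercasing and whitespace stripping is just an assumed way to normalize the matches for comparison.

```javascript
// Old and new patterns, copied verbatim from the diff above.
const originalPattern =
  /\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|creator|description|published_time|title|site_name)\s*/gi;
const patchedPattern =
  /\s*(article|dc|dcterm|og|twitter)\s*:\s*(author|article:author|creator|description|published_time|title|site_name)\s*/gi;

// Hypothetical helper (not from Readability.js): collect every match of
// `pattern` in `property`, lowercased and with whitespace removed.
function matchedNames(pattern, property) {
  const matches = property.match(pattern) || [];
  return matches.map(m => m.toLowerCase().replace(/\s/g, ""));
}

const results = {
  // The original pattern already matches a plain article:author property.
  originalOnArticleAuthor: matchedNames(originalPattern, "article:author"),
  // On a doubled prefix, the original pattern finds only the inner article:author.
  originalOnDoubled: matchedNames(originalPattern, "dc:article:author"),
  // The patched pattern consumes the whole doubled form as a single match.
  patchedOnDoubled: matchedNames(patchedPattern, "dc:article:author"),
};

console.log(results);
```

Under these assumptions the original pattern yields `article:author` for both inputs, while the patched pattern yields `dc:article:author` for the doubled form. That is consistent with the observation that the regex change is not needed for the new test case.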

@gijsk left a comment:

Woop!

@gijsk merged commit b6ff1b6 into mozilla:main on Jan 2, 2025
1 check passed
Successfully merging this pull request may close these issues.

Suggestion: consider article:author meta tag as a source of author name metadata
2 participants