Here's the plugin/script I'm using. I tested with the Feeder Android app, using rss, to confirm that if I keep the same

#!/usr/bin/env python3
""" Use ``feed_guid`` article metadata, where present, to preserve feed item
identifiers imported from another site. Avoids feed readers showing old
items as unread when switching to the Pelican-generated feed.

Also lets the ``feed_link`` article metadata override the item ``<link>`` in
the feed, in case some readers use ``<link>`` as an article key instead of
``<guid>`` or ``<id>``, but it's unclear whether that's useful. See comments
in ``write_guids_from_wpexport``.

License: Do anything you want, don't expect this to do what you want.
"""
from collections import defaultdict
import re
import sys
import xml.etree.ElementTree as ET

from pelican import signals


def feed_generated(context, feed):
    # This mapping of feed items back to their original Articles is FRAGILE
    # as it relies on internals of how feeds get generated. This would be
    # better done as part of Pelican core than as a plugin.
    # https://github.com/getpelican/pelican/discussions/3485
    seen = defaultdict(set)

    def replace(feed_key, article_key, article):
        m = getattr(article, article_key, None)
        if m:
            assert m not in seen[article_key], '%s=%r is not unique' % (article_key, m)
            seen[article_key].add(m)
            fs = [f for f in feed.items
                  # FEED_APPEND_REF setting might have added a query string
                  if re.sub(r'\?.*', '', f['link']).endswith(article.url)]
            assert len(fs) == 1, 'item for %s=%r not found in feed' % (article_key, m)
            fs[0][feed_key] = m

    for a in context['articles'][:context['FEED_MAX_ITEMS']]:
        replace('unique_id', 'feed_guid', a)
        replace('link', 'feed_link', a)


def write_guids_from_wpexport(path):
    """ Extremely hackish and best not used.

    This would ideally be implemented more robustly by ``pelican-import``
    https://github.com/getpelican/pelican/discussions/3485.
    """
    rss = ET.parse(path)
    meta = re.compile(
        r'^(:date:.*)\r?\n'  # there's always a date
        r'(:feed_guid:.*\r?\n)?'
        r'(:feed_link:.*\r?\n)?'
        , flags=re.MULTILINE)
    ns = {'wp': 'http://wordpress.org/export/1.2/'}
    for item in rss.findall(".//wp:post_type[.='post']/..", ns):
        if item.find('./wp:status', ns).text == 'draft':
            continue
        guid = re.sub(
            # Wordpress export xml uses ``httpS`` but the rss and atom don't
            r'^https://', 'http://', item.find('./guid').text)
        link = item.find('./link').text
        slug = item.find('./wp:post_name', ns).text
        assert re.search(r'^[a-zA-Z0-9\-]+$', slug)
        rst_path = f'content/{slug}.rst'  # assume no --dir-cat to pelican-import
        with open(rst_path, encoding='utf-8') as f:
            rst = f.read()
        with open(rst_path, 'w', encoding='utf-8') as f:
            f.write(
                meta.sub(
                    fr'\1\n:feed_guid: {guid}\n'
                    # Use the replacement below instead of the above to preserve
                    # links as well as guids, in case some feed readers use the
                    # links as item identifiers. This REQUIRES separate
                    # redirects to ensure these old links don't break.
                    #
                    # Alternatively, the ARTICLE_SAVE_AS setting can mimic
                    # Wordpress-style links to articles without doing this,
                    # but a redirect would still be necessary for links without
                    # the trailing slash.
                    # fr'\1\n:feed_guid: {guid}\n:feed_link: {link}\n'
                    , rst ))


def register():
    signals.feed_generated.connect(feed_generated)


if __name__ == '__main__':
    write_guids_from_wpexport(sys.argv[1])
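
To make the metadata rewrite easier to follow, here is a minimal sketch of just the regex step from `write_guids_from_wpexport`, run on an in-memory string instead of files. The helper name `add_feed_guid`, the sample article header, and the example guid are all made up for illustration. Unlike the script, it uses a callable replacement and `count=1`.

```python
import re

# Same pattern as in write_guids_from_wpexport.
meta = re.compile(
    r'^(:date:.*)\r?\n'
    r'(:feed_guid:.*\r?\n)?'
    r'(:feed_link:.*\r?\n)?',
    flags=re.MULTILINE)


def add_feed_guid(rst, guid):
    """Insert or replace ``:feed_guid:`` right after the ``:date:`` line."""
    # Hypothetical helper. A callable replacement means backslashes in
    # ``guid`` are never interpreted as regex template escapes.
    return meta.sub(
        lambda m: f'{m.group(1)}\n:feed_guid: {guid}\n', rst, count=1)


# Made-up article header and guid, for illustration only.
sample = (
    'Hello world\n'
    '###########\n'
    ':date: 2020-01-02 03:04\n'
    ':slug: hello-world\n'
    '\n'
    'Body text.\n'
)
once = add_feed_guid(sample, 'http://example.wordpress.com/?p=1')
twice = add_feed_guid(once, 'http://example.wordpress.com/?p=1')
```

Because the pattern also consumes any existing `:feed_guid:` and `:feed_link:` lines directly after `:date:`, running the rewrite a second time replaces the earlier value rather than duplicating it.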
Moving from WordPress(.com) to Pelican version 4.11.0, I don't want feed readers to display old articles as unread when I switch hosts. This requires at least that the `<guid>` (for rss) and `<id>` (for atom) tags be kept.

Any interest in adding a feature to preserve imported feed item identifiers? Or am I missing an existing way to do it?

I wrote a plugin + script to override the feed item's `unique_id` from `:feed_guid:` article metadata, but I think it would be better as part of Pelican core because… `feed_generated` handler, I didn't see a good way to link feed items back to their original articles. `pelican-import` should get a corresponding change to write the relevant metadata.
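
The difficulty of linking feed items back to articles can be illustrated without Pelican. This is a rough sketch of the workaround the plugin relies on: match each item by its link, ignoring any query string. Plain dicts stand in for real feed items, and `find_item`, the URLs, and the ids are hypothetical.

```python
import re


def find_item(items, article_url):
    """Return the single feed item whose link ends with ``article_url``."""
    matches = [
        item for item in items
        # Drop any query string (e.g. one added by FEED_APPEND_REF).
        if re.sub(r'\?.*', '', item['link']).endswith(article_url)
    ]
    if len(matches) != 1:
        raise LookupError(
            f'expected 1 item for {article_url!r}, found {len(matches)}')
    return matches[0]


# Stand-in data: plain dicts instead of real feed items.
items = [
    {'link': 'https://example.com/first-post.html?ref=feed',
     'unique_id': 'generated-id-1'},
    {'link': 'https://example.com/second-post.html?ref=feed',
     'unique_id': 'generated-id-2'},
]

# Override the generated id with the one imported from the old site.
item = find_item(items, 'first-post.html')
item['unique_id'] = 'http://example.wordpress.com/?p=1'
```

Suffix matching is the fragile part: a URL such as `my-first-post.html` would also match `first-post.html`, which is why the lookup insists on exactly one match instead of taking the first.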