XHTML content is parsed out of order? #275

@talonx

hetzner.txt

After parsing, when I access the content using

feedItem.contentSnippet

the content comes out mixed up, like this:

'Start: 2024-08-06T08:45:00+00:00 Estimated end: 2024-08-08T13:00:00+00:00 We\n' +
    'will be performing routine maintenance work on cloud load balancers in Helsinki.\n' +
    'During this maintenance work, there may be a short connection loss from the\n' +
    'active connections (TCP and HTTP) to the load balancers, or from the load\n' +
    'balancers to their targets. Unfortunately, the maintenance work is taking longer\n' +
    'than we planned. Thank you for your understanding. We have now started the\n' +
    'maintenance work.\n' +
    'In_progressIn_progress2024-08-07T15:33:29+00:002024-08-06T08:45:27+00:00',

Notice that the last two "In_progress" labels are squashed together, and so are the two timestamps.

The actual content looks like this (see the attached file):

    <strong>In_progress</strong>
    <small>2024-08-07T15:33:29+00:00</small>
    <p>Unfortunately, the maintenance work is taking longer than we planned. Thank you for your understanding.</p>
    <strong>In_progress</strong>
    <small>2024-08-06T08:45:27+00:00</small>
    <p>We have now started the maintenance work. </p>

My parser is defined with a custom field as follows:

type HetznerItem = { category: { term: string } };
const hetznerParser: RSSParser = new RSSParser(new Parser<{}, HetznerItem>(
    {
        customFields: {
            item: ['category'],
        }
    }
));
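
One possible explanation, offered as an assumption rather than a confirmed diagnosis: rss-parser depends on xml2js, and xml2js by default collects child elements into arrays keyed by tag name, which discards the relative order of differently named siblings. If the XHTML content is rebuilt from such a structure, all the <strong> elements would end up adjacent, as would all the <small> elements. The sketch below is a simplified model of that grouping, not rss-parser's actual code; the names `Child`, `groupByTag`, and `flatten` are hypothetical.

```typescript
// Simplified model (an assumption, not rss-parser's real implementation) of a
// converter that stores child elements in an object keyed by tag name.
type Child = { tag: string; text: string };

// Children in the order they appear in the feed entry quoted above.
const inDocumentOrder: Child[] = [
  { tag: "strong", text: "In_progress" },
  { tag: "small", text: "2024-08-07T15:33:29+00:00" },
  { tag: "p", text: "Unfortunately, the maintenance work is taking longer than we planned." },
  { tag: "strong", text: "In_progress" },
  { tag: "small", text: "2024-08-06T08:45:27+00:00" },
  { tag: "p", text: "We have now started the maintenance work." },
];

// Group children by tag name. Order within one tag is kept, but the
// interleaving of different tags is lost.
function groupByTag(children: Child[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const child of children) {
    if (!groups[child.tag]) {
      groups[child.tag] = [];
    }
    groups[child.tag].push(child.text);
  }
  return groups;
}

// Serialize the grouped structure back to plain text, one tag group at a time.
function flatten(groups: Record<string, string[]>): string {
  return Object.keys(groups)
    .map((tag) => groups[tag].join(""))
    .join("");
}

const squashed: string = flatten(groupByTag(inDocumentOrder));
console.log(squashed);
```

In this model the two "In_progress" labels and the two timestamps come out adjacent to each other, which resembles the squashed tail of the contentSnippet shown above. Whether this is what actually happens inside the library would need to be checked against its source.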
