[Change]: Change "Fail crawl if not logged in" to "Pause crawl if not logged in" when a page behavior detects that the browser is not logged in on supported pages (e.g. Facebook). #3199

@Klindten

Description

Browsertrix Host

Self-Hosted

What change would you like to see?

I would like the system to pause crawls instead of failing them when a page behavior detects that the browser is not logged in on supported pages (e.g. Facebook).

This way you don't lose all the content crawled up to the moment you were logged out: you can re-enter the login info in the browser profile and continue crawling afterwards.

Pros: you can add many seeds for a given social media site without needing to create many separate jobs with "Fail crawl if not logged in" enabled, which also creates more administrative work. There might be a small gap (some missed seeds), but this is acceptable.

If this is implemented, it's a lot easier to start a crawl without having to monitor it closely to see whether, e.g., a profile logged in to Facebook has been logged out. Currently the crawl just continues without capturing the real content. Automatically pausing it would help capture better, more relevant data and less irrelevant data.

It would also be great to have an option to receive an email when the profile is logged out, similar to "Pause crawls instead of stopping when quotas are reached or archiving is disabled" (#2997).

Additional details

No response

Metadata

Assignees

No one assigned

Labels

enhancement (Requests a change to a feature), idea (Idea for a feature in consideration)

Projects

Status

Triage

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests