Follow redirects #538

Open
nacnudus opened this issue Oct 9, 2023 · 2 comments

Comments

@nacnudus
Contributor

nacnudus commented Oct 9, 2023

Trello

Suppose /old-page has been unpublished and redirected to /new-page. You want to find pages that link to /new-page, and you would like pages that still link to /old-page to appear in the search results.

This could be done for GOV.UK redirects in a similar way to how we follow taxons up the hierarchy, with a WITH RECURSIVE SQL statement.
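A minimal sketch of the WITH RECURSIVE idea, using an in-memory SQLite database only so that it runs anywhere. The `redirects` and `links` tables, their columns, and the `pages_linking_to` helper are hypothetical names for illustration, not the real schema. The query walks the redirects backwards from the target to collect every URL that eventually lands on it, then finds pages linking to any of those URLs.

```python
import sqlite3

# Hypothetical schema and data: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE redirects (from_url TEXT PRIMARY KEY, to_url TEXT NOT NULL);
    CREATE TABLE links (source_url TEXT NOT NULL, target_url TEXT NOT NULL);

    INSERT INTO redirects VALUES
        ('/older-page', '/old-page'),
        ('/old-page', '/new-page');
    INSERT INTO links VALUES
        ('/page-a', '/new-page'),
        ('/page-b', '/old-page'),
        ('/page-c', '/older-page'),
        ('/page-d', '/unrelated');
""")

QUERY = """
WITH RECURSIVE aliases(url) AS (
    -- Start from the page being searched for.
    SELECT :target
    UNION
    -- Add every URL that redirects to a URL already collected.
    -- UNION (not UNION ALL) drops duplicates, so redirect loops terminate.
    SELECT r.from_url
    FROM redirects AS r
    JOIN aliases AS a ON r.to_url = a.url
)
SELECT l.source_url, l.target_url
FROM links AS l
JOIN aliases AS a ON l.target_url = a.url
ORDER BY l.source_url
"""


def pages_linking_to(conn, target):
    """Return (source, linked URL) pairs for the target and all URLs redirecting to it."""
    return conn.execute(QUERY, {"target": target}).fetchall()


for row in pages_linking_to(conn, "/new-page"):
    print(row)
```

Here /page-b and /page-c are returned alongside /page-a, even though they link to URLs that only reach /new-page through one or two redirects.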

For links to external sites, we'd have to visit the links to find out where they redirect to.
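One way to visit an external link and record where it ends up is sketched below, using only the Python standard library. The function names `head_fetch` and `follow_redirects`, the `max_hops` limit, and the choice of HEAD requests are assumptions for illustration; some sites answer HEAD differently from GET, so a real checker might need to fall back to GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_fetch(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def follow_redirects(url, fetch=head_fetch, max_hops=10):
    """Return the (url, status) pairs visited, in order.

    Stops at the first non-redirect response, at a URL already seen
    (a redirect loop), or after max_hops requests.
    """
    chain = []
    seen = set()
    while len(chain) < max_hops and url not in seen:
        seen.add(url)
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain
```

The `fetch` parameter is injectable so the chain-walking logic can be exercised without network access, for example by passing a function backed by a dictionary of canned responses.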

@hwrightson
Contributor

Very poorly structured work on how to do this can be found in my repo here: https://github.com/alphagov/data-insights-sandbox/tree/main/hyperlink_tester

At the moment this pulls the links from the gov.uk-knowledge-graph content embedded_links table and, for each link, returns:

  1. The link
  2. The link status code
  3. If it exists, a list of historic status codes, else null
  4. If it exists, a list of historic links, else null

I will slightly refine this to extract only the final item from the historic status codes and links so that it answers the question raised in the original issue.
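The refinement described above might look something like the following sketch. The record layout and field names (`link`, `status_code`, `historic_status_codes`, `historic_links`) are hypothetical stand-ins for the four values listed, and it assumes the last item of each historic list is the end of the redirect chain. When a link has no history, the link and its own status code are used as the final values.

```python
def final_hop(record):
    """Reduce one result record to the original link and its final destination.

    Field names are hypothetical; the historic lists may be None or missing.
    """
    links = record.get("historic_links") or []
    codes = record.get("historic_status_codes") or []
    return {
        "link": record["link"],
        "final_link": links[-1] if links else record["link"],
        "final_status_code": codes[-1] if codes else record["status_code"],
    }


# Made-up example records, one with a redirect history and one without.
redirected = {
    "link": "https://example.com/old",
    "status_code": 301,
    "historic_status_codes": [301, 200],
    "historic_links": ["https://example.com/old", "https://example.com/new"],
}
direct = {
    "link": "https://example.com/page",
    "status_code": 200,
    "historic_status_codes": None,
    "historic_links": None,
}

print(final_hop(redirected))
print(final_hop(direct))
```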

@nacnudus
Contributor Author

A user stumbled on this problem.

I need to find all the mainstream pages that link to this page:
https://www.gov.uk/guidance/visa-processing-times-applications-outside-the-uk
I know there are at least 2, because I stumbled across them. When I use the links tab in govspeak to search for pages that link there, only 5 whitehall pages come up. No mainstream pages, even though I know these ones do link there:
https://www.gov.uk/tier-1-investor/extend-your-visa
https://www.gov.uk/global-talent

Those two mainstream pages link to https://www.gov.uk/guidance/visa-decision-waiting-times-applications-outside-the-uk, which redirects to https://www.gov.uk/guidance/visa-processing-times-applications-outside-the-uk. Hence the user's expectation that GovSearch would include them in a search for pages that link to https://www.gov.uk/guidance/visa-processing-times-applications-outside-the-uk.
