
route() interception causes 'pragma' and 'cache-control' headers to be wrongly added on requests #72

Closed
@corford


I'm not sure whether this is in camoufox-js or the underlying Camoufox, but there seems to be a bug: if you set up a route (e.g. to block certain domains or assets), the Camoufox browser adds pragma and cache-control headers (both with the value no-cache) to all requests. This causes the spider to get blocked by DataDome.

If we swap to the stock Google Chrome binary instead of Camoufox, everything works fine with route interception active (neither of these two headers is present on the requests). If we disable route interception, Camoufox also works fine (again, neither header is present).
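To make the comparison above easy to repeat, here is a minimal sketch of how the two headers could be detected on outgoing requests. `findNoCacheHeaders` is a hypothetical helper written for this issue (it is not part of camoufox-js or Playwright); the commented usage assumes a Playwright-style `page` object, whose `request.headers()` returns lower-cased header names.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// returns the names of cache-related headers whose value is "no-cache".
function findNoCacheHeaders(headers) {
  const suspects = ['pragma', 'cache-control'];
  return Object.entries(headers)
    // Normalise name and value so the check is case-insensitive.
    .map(([name, value]) => [name.toLowerCase(), String(value).trim().toLowerCase()])
    .filter(([name, value]) => suspects.includes(name) && value === 'no-cache')
    .map(([name]) => name)
    .sort();
}

// Assumed usage with a Playwright-style page object:
// page.on('request', request => {
//   const found = findNoCacheHeaders(request.headers());
//   if (found.length) console.log(request.url(), found);
// });
```

Running this once with route interception enabled and once without should show whether the headers only appear when a route is registered. `request.headers()` may omit some headers; the async `request.allHeaders()` is the more complete option if the result looks incomplete.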

In case it's interesting, this is our route logic:

export const addRequestFiltering = async ({
  page,
  filters,
}) => { 
  if (!filters.length) return;

  // Convert non-regex filters to regular expressions, emulating Puppeteer's
  // Network.setBlockedURLs() pattern behaviour
  const regexRequestFilters = filters.map(
    filter => (filter instanceof RegExp
      ? filter
      : new RegExp(`.*${filter.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}.*`)
    ),
  );

  const routeRegexStr = regexRequestFilters.length > 1
    ? regexRequestFilters.map(regex => `(${regex.source})`).join('|')
    : regexRequestFilters[0].source;

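  // Abort every request whose URL matches any of the filters. Returning the
  // promise lets the caller of the handler observe failures from abort().
  await page.route(new RegExp(routeRegexStr), route => route.abort('blockedbyclient'));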
};

await addRequestFiltering({
  page,
  filters: [
    'cookielaw',
    'googlesyndication.com',
    'trustpilot.com',
  ],
});
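For anyone trying to reproduce this without our full setup, the filter-to-regex conversion can be checked on its own. The sketch below restates that step as standalone functions; `escapeRegExp` and `buildRouteRegex` are names chosen here for illustration and are not in the snippet above, and the sample URLs are made up.

```javascript
// Escape regex metacharacters so a plain string filter matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Combine string and RegExp filters into one alternation.
// Note: using .source discards any flags on RegExp filters.
function buildRouteRegex(filters) {
  const sources = filters.map(filter =>
    (filter instanceof RegExp ? filter.source : `.*${escapeRegExp(filter)}.*`));
  return new RegExp(sources.map(source => `(${source})`).join('|'));
}

const routeRegex = buildRouteRegex([
  'cookielaw',
  'googlesyndication.com',
  'trustpilot.com',
]);

console.log(routeRegex.test('https://cdn.cookielaw.org/scripttemplates/otSDKStub.js')); // true
console.log(routeRegex.test('https://example.com/index.html')); // false
```

The resulting regex only decides which requests are aborted, so it should have no bearing on the headers of requests that are allowed through.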

Labels

t-tooling: Issues with this label are in the ownership of the tooling team.
