Closed
Description
Not sure if this is camoufox-js or the underlying Camoufox, but there seems to be a bug whereby, if you set up a route (e.g. to block certain domains or assets), the Camoufox browser adds `pragma` and `cache-control` headers (both with the value `no-cache`) to all requests. This causes the spider to get blocked by Datadome.
If we swap to the stock Google Chrome binary rather than Camoufox, everything works fine with route interception active (neither of these two headers is present on the requests). If we disable route interception, Camoufox also works fine (the two headers are not present).
In case it's interesting, this is our route logic:
```js
export const addRequestFiltering = async ({
  page,
  filters,
}) => {
  if (!filters.length) return;

  // Convert non-regex filters to regular expressions, emulating Puppeteer's
  // Network.setBlockedURLs() pattern behaviour
  const regexRequestFilters = filters.map(
    filter => (filter instanceof RegExp
      ? filter
      : new RegExp(`.*${filter.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}.*`)),
  );

  // OR the individual patterns together into a single route regex
  const routeRegexStr = regexRequestFilters.length > 1
    ? regexRequestFilters.map(regex => `(${regex.source})`).join('|')
    : regexRequestFilters[0].source;

  await page.route(new RegExp(routeRegexStr), route => {
    route.abort('blockedbyclient');
  });
};

await addRequestFiltering({
  page,
  filters: [
    'cookielaw',
    'googlesyndication.com',
    'trustpilot.com',
  ],
});
```
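For what it's worth, the filter-to-regex conversion can be sanity-checked on its own, with no browser involved. A standalone sketch of the same logic (`toRouteRegex` is just an extracted name for illustration):

```javascript
// Same conversion as in addRequestFiltering, extracted as a pure function
// so the combined route pattern can be tested against URLs directly.
const toRouteRegex = (filters) => {
  const regexFilters = filters.map(
    filter => (filter instanceof RegExp
      ? filter
      // Escape regex metacharacters, then match the filter anywhere in the URL
      : new RegExp(`.*${filter.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}.*`)),
  );
  const source = regexFilters.length > 1
    ? regexFilters.map(regex => `(${regex.source})`).join('|')
    : regexFilters[0].source;
  return new RegExp(source);
};

const blocklist = toRouteRegex(['cookielaw', 'googlesyndication.com', 'trustpilot.com']);

console.log(blocklist.test('https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js')); // true
console.log(blocklist.test('https://example.com/app.js')); // false
```

This confirms the blocking pattern itself behaves as intended; the stray `pragma`/`cache-control` headers appear purely as a side effect of having the route registered.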