feat: Camoufox-based crawler template #2842

Merged
B4nan merged 8 commits into master from feat/camoufox-crawler
Feb 24, 2025

barjin (Member) commented Feb 12, 2025

Adds a Camoufox-based crawler template (camoufox-ts).

Compared to the basic playwright-ts template, camoufox-ts uses the camoufox-js package, which finds the latest matching Camoufox binary among the GitHub Releases assets, downloads it, and passes it the correct launch options.

The main.ts script is modified to run the downloaded binary with the correct launchOptions.
Related to #2836

barjin self-assigned this Feb 12, 2025
github-actions bot added this to the 108th sprint - Tooling team milestone Feb 12, 2025
github-actions bot added the t-tooling label (issues with this label are in the ownership of the tooling team) Feb 12, 2025
barjin (Member, Author) commented Feb 12, 2025

Todo:

  • automate running npm run download-camoufox (maybe as a postinstall script?)
  • pass custom fingerprint-modifying options to Camoufox
  • maybe store binaries in a system- (or user-)wide location (~/.crawlee/binaries?)
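
A minimal sketch of the third item, assuming the `~/.crawlee/binaries` location floated above. The `resolveBinaryDir` helper is hypothetical and is not part of crawlee or camoufox-js; it only shows how a user-wide directory could be resolved portably with Node's standard library.

```typescript
import { homedir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper, not an existing crawlee or camoufox-js API.
// Returns the user-wide directory where a shared Camoufox binary could be cached.
function resolveBinaryDir(home: string = homedir()): string {
    // `.crawlee/binaries` mirrors the location suggested in the todo list.
    return join(home, '.crawlee', 'binaries');
}

console.log(resolveBinaryDir());
```

Taking the home directory as a parameter keeps the helper easy to test and leaves room for a system-wide location later.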

barjin added the adhoc label (ad-hoc unplanned task added during the sprint) Feb 12, 2025
barjin (Member, Author) commented Feb 14, 2025

Example code:

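import { launchOptions } from 'camoufox-js';
import { PlaywrightCrawler } from 'crawlee';
import { firefox } from 'playwright';

const startUrls = ['https://crawlee.dev'];

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, enqueueLinks }) => {
        // Click two headings so the humanized cursor movement is visible.
        await page.click('h2');
        await page.click('h3');

        // Queue the links found on the current page.
        await enqueueLinks();
    },
    // Keep a single page open at a time so the run is easy to follow.
    maxConcurrency: 1,
    launchContext: {
        // Camoufox is Firefox-based, so it is driven through Playwright's firefox launcher.
        launcher: firefox,
        // camoufox-js turns the Camoufox options into Playwright launch options.
        launchOptions: await launchOptions({
            headless: false,
            block_images: true, // load no images
            fonts: ['Times New Roman'],
            custom_fonts_only: true, // use only the fonts listed above
            humanize: true, // move the cursor with the humanizing script
        }),
    },
});

await crawler.run(startUrls);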

Execution:

Peek.2025-02-14.16-35.mp4

With these settings, the browser loads no images, uses a single system-installed font (apart from fonts the page loads itself), and moves the cursor with the humanizing script.
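
If the same settings need to vary between runs (for example headless in CI), one option is to keep the defaults in a small helper. This is only a sketch: the option names are the ones used in the example above, while the `CamoufoxExampleOptions` type and the `buildOptions` helper are hypothetical and do not come from camoufox-js.

```typescript
// Hypothetical type covering only the options used in the example above.
interface CamoufoxExampleOptions {
    headless: boolean;
    block_images: boolean;
    fonts: string[];
    custom_fonts_only: boolean;
    humanize: boolean;
}

// Hypothetical helper: returns the example's settings, with optional overrides.
function buildOptions(overrides: Partial<CamoufoxExampleOptions> = {}): CamoufoxExampleOptions {
    return {
        headless: false,
        block_images: true,
        fonts: ['Times New Roman'],
        custom_fonts_only: true,
        humanize: true,
        ...overrides, // later keys win, so overrides replace the defaults
    };
}

// Same settings as the example, but without a visible browser window.
const ciOptions = buildOptions({ headless: true });
console.log(ciOptions);
```

The resulting object would then be passed to `launchOptions(...)` from camoufox-js, as in the example.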

barjin requested a review from B4nan February 14, 2025 16:14
B4nan merged commit 7f08de4 into master Feb 24, 2025
9 checks passed
B4nan deleted the feat/camoufox-crawler branch February 24, 2025 08:21
B4nan pushed a commit to apify/actor-templates that referenced this pull request Feb 27, 2025
Following apify/apify-sdk-js#364 and
apify/crawlee#2842, this PR adds
Camoufox-enabled templates to Apify Actor templates. The implementation
is heavily based on the existing Playwright + Chrome templates.

The only issue I'm aware of is the immense size of the resulting
images, since they already contain Chrome and we add the Camoufox
binaries on top. Installing Camoufox directly into a `node-debian` image
results in missing system dependencies. It might be possible to install
those manually in the Dockerfile, but that could make the Dockerfile too
complex for a regular user.


![image](https://github.com/user-attachments/assets/fb0050fd-fadc-4bbc-80f3-0681dcfa2b92)
barjin mentioned this pull request Apr 3, 2025