Commit f6f7f1b

Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)
* Fix: Use correct URL variable for raw HTML extraction (#1116)
  - Prevents full HTML content from being passed as URL to extraction strategies
  - Added unit tests to verify raw HTML and regular URL processing
  - Fix: Wrong URL variable used for extraction of raw html
* Fix #1181: Preserve whitespace in code blocks during HTML scraping
  The remove_empty_elements_fast() method was removing whitespace-only span elements inside <pre> and <code> tags, causing import statements like "import torch" to become "importtorch". Now skips elements inside code blocks where whitespace is significant.
* Refactor Pydantic model configuration to use ConfigDict for arbitrary types
* Fix EmbeddingStrategy: Uncomment response handling for the variations and clean up mock data. ref #1621
* Fix: permission issues with .cache/url_seeder and other runtime cache dirs. ref #1638
* fix: ensure BrowserConfig.to_dict serializes proxy_config
* feat: make LLM backoff configurable end-to-end
  - Extend LLMConfig with backoff delay/attempt/factor fields and thread them through LLMExtractionStrategy, LLMContentFilter, table extraction, and Docker API handlers
  - Expose the backoff parameter knobs on perform_completion_with_backoff/aperform_completion_with_backoff and document them in the md_v2 guides
* reproduced AttributeError from #1642
* pass timeout parameter to docker client request
* added missing deep crawling objects to init
* generalized query in ContentRelevanceFilter to be a str or list
* import modules from enhanceable deserialization
* parameterized tests
* Fix: capture current page URL to reflect JavaScript navigation and add test for delayed redirects. ref #1268
* refactor: replace PyPDF2 with pypdf across the codebase. ref #1412
* Add browser_context_id and target_id parameters to BrowserConfig
  Enable Crawl4AI to connect to pre-created CDP browser contexts, which is essential for cloud browser services that pre-create isolated contexts.
  Changes:
  - Add browser_context_id and target_id parameters to BrowserConfig
  - Update from_kwargs() and to_dict() methods
  - Modify BrowserManager.start() to use an existing context when provided
  - Add _get_page_by_target_id() helper method
  - Update get_page() to handle pre-existing targets
  - Add test for browser_context_id functionality
  This enables cloud services to:
  1. Create isolated CDP contexts before Crawl4AI connects
  2. Pass context/target IDs to BrowserConfig
  3. Have Crawl4AI reuse existing contexts instead of creating new ones
* Add cdp_cleanup_on_close flag to prevent memory leaks in cloud/server scenarios
* Fix: add cdp_cleanup_on_close to from_kwargs
* Fix: find context by target_id for concurrent CDP connections
* Fix: use target_id to find correct page in get_page
* Fix: use CDP to find context by browserContextId for concurrent sessions
* Revert context matching attempts - Playwright cannot see CDP-created contexts
* Add create_isolated_context flag for concurrent CDP crawls
  When True, forces creation of a new browser context instead of reusing the default context. Essential for concurrent crawls on the same browser to prevent navigation conflicts.
* Add context caching to create_isolated_context branch
  Uses the contexts_by_config cache (same as non-CDP mode) to reuse contexts for multiple URLs with the same config. Still creates a new page per crawl for navigation isolation. Benefits batch/deep crawls.
* Add init_scripts support to BrowserConfig for pre-page-load JS injection
  This adds the ability to inject JavaScript that runs before any page loads, useful for stealth evasions (canvas/audio fingerprinting, userAgentData).
  - Add init_scripts parameter to BrowserConfig (list of JS strings)
  - Apply init_scripts in setup_context() via context.add_init_script()
  - Update from_kwargs() and to_dict() for serialization
* Fix CDP connection handling: support WS URLs and proper cleanup
  Changes to browser_manager.py:
  1. _verify_cdp_ready(): Support multiple URL formats
     - WebSocket URLs (ws://, wss://): Skip HTTP verification, Playwright handles them directly
     - HTTP URLs with query params: Properly parse with urlparse to preserve the query string
     - Fixes issue where a naive f"{cdp_url}/json/version" broke WS URLs and query params
  2. close(): Proper cleanup when cdp_cleanup_on_close=True
     - Close all sessions (pages)
     - Close all contexts
     - Call browser.close() to disconnect (doesn't terminate the browser, just releases the connection)
     - Wait 1 second for the CDP connection to fully release
     - Stop the Playwright instance to prevent memory leaks
  This enables:
  - Connecting to specific browsers via WS URL
  - Reusing the same browser with multiple sequential connections
  - No user wait needed between connections (the internal 1s delay handles it)
  Added tests/browser/test_cdp_cleanup_reuse.py with comprehensive tests.
* Update gitignore
* Some debugging for caching
* Add _generate_screenshot_from_html for raw: and file:// URLs
  Implements the missing method that was being called but never defined. Now raw: and file:// URLs can generate screenshots by:
  1. Loading HTML into a browser page via page.set_content()
  2. Taking a screenshot using the existing take_screenshot() method
  3. Cleaning up the page afterward
  This enables cached HTML to be rendered with screenshots in crawl4ai-cloud.
* Add PDF and MHTML support for raw: and file:// URLs
  - Replace _generate_screenshot_from_html with _generate_media_from_html
  - New method handles screenshot, PDF, and MHTML in one browser session
  - Update raw: and file:// URL handlers to use the new method
  - Enables cached HTML to generate all media types
* Add crash recovery for deep crawl strategies
  Add optional resume_state and on_state_change parameters to all deep crawl strategies (BFS, DFS, Best-First) for cloud deployment crash recovery.
  Features:
  - resume_state: Pass saved state to resume from a checkpoint
  - on_state_change: Async callback fired after each URL for real-time state persistence to external storage (Redis, DB, etc.)
  - export_state(): Get the last captured state manually
  - Zero overhead when the features are disabled (None defaults)
  State includes visited URLs, pending queue/stack, depths, and pages_crawled count. All state is JSON-serializable.
* Fix: HTTP strategy raw: URL parsing truncates at # character
  The AsyncHTTPCrawlerStrategy.crawl() method used urlparse() to extract content from raw: URLs. This caused HTML with CSS color codes like #eee to be truncated because # is treated as a URL fragment delimiter.
  Before: raw:body{background:#eee} -> parsed.path = 'body{background:'
  After: raw:body{background:#eee} -> raw_content = 'body{background:#eee'
  Fix: Strip the raw: or raw:// prefix directly instead of using urlparse, matching how the browser strategy handles it.
* Add base_url parameter to CrawlerRunConfig for raw HTML processing
  When processing raw: HTML (e.g., from cache), the URL parameter is meaningless for markdown link resolution. This adds a base_url parameter that can be set explicitly to provide proper URL resolution context.
  Changes:
  - Add base_url parameter to CrawlerRunConfig.__init__
  - Add base_url to CrawlerRunConfig.from_kwargs
  - Update aprocess_html to use base_url for markdown generation
  Usage:
  config = CrawlerRunConfig(base_url='https://example.com')
  result = await crawler.arun(url='raw:{html}', config=config)
* Add prefetch mode for two-phase deep crawling
  - Add `prefetch` parameter to CrawlerRunConfig
  - Add `quick_extract_links()` function for fast link extraction
  - Add short-circuit in aprocess_html() for prefetch mode
  - Add 42 tests (unit, integration, regression)
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Updates on proxy rotation and proxy configuration
* Add proxy support to HTTP crawler strategy
* Add browser pipeline support for raw:/file:// URLs
  - Add process_in_browser parameter to CrawlerRunConfig
  - Route raw:/file:// URLs through _crawl_web() when browser operations are needed
  - Use page.set_content() instead of goto() for local content
  - Fix cookie handling for non-HTTP URLs in browser_manager
  - Auto-detect browser requirements: js_code, wait_for, screenshot, etc.
  - Maintain the fast path for raw:/file:// without browser params
  Fixes #310
* Add smart TTL cache for sitemap URL seeder
  - Add cache_ttl_hours and validate_sitemap_lastmod params to SeedingConfig
  - New JSON cache format with metadata (version, created_at, lastmod, url_count)
  - Cache validation by TTL expiry and sitemap lastmod comparison
  - Auto-migration from old .jsonl to new .json format
  - Fixes bug where an incomplete cache was used indefinitely
* Update URL seeder docs with smart TTL cache parameters
  - Add cache_ttl_hours and validate_sitemap_lastmod to the parameter table
  - Document smart TTL cache validation with examples
  - Add cache-related troubleshooting entries
  - Update key features summary
* Add MEMORY.md to gitignore
* Docs: Add multi-sample schema generation section
  Add documentation explaining how to pass multiple HTML samples to generate_schema() for stable selectors that work across pages with varying DOM structures. Includes:
  - Problem explanation (fragile nth-child selectors)
  - Solution with code example
  - Key points for multi-sample queries
  - Comparison table of fragile vs stable selectors
* Fix critical RCE and LFI vulnerabilities in Docker API deployment
  Security fixes for vulnerabilities reported by ProjectDiscovery:
  1. Remote Code Execution via Hooks (CVE pending)
     - Remove __import__ from allowed_builtins in hook_manager.py
     - Prevents arbitrary module imports (os, subprocess, etc.)
     - Hooks now disabled by default via CRAWL4AI_HOOKS_ENABLED env var
  2. Local File Inclusion via file:// URLs (CVE pending)
     - Add URL scheme validation to /execute_js, /screenshot, /pdf, /html
     - Block file://, javascript:, data: and other dangerous schemes
     - Only allow http://, https://, and raw: (where appropriate)
  3. Security hardening
     - Add CRAWL4AI_HOOKS_ENABLED=false as default (opt-in for hooks)
     - Add security warning comments in config.yml
     - Add validate_url_scheme() helper for consistent validation
  Testing:
  - Add unit tests (test_security_fixes.py) - 16 tests
  - Add integration tests (run_security_tests.py) for the live server
  Affected endpoints:
  - POST /crawl (hooks disabled by default)
  - POST /crawl/stream (hooks disabled by default)
  - POST /execute_js (URL validation added)
  - POST /screenshot (URL validation added)
  - POST /pdf (URL validation added)
  - POST /html (URL validation added)
  Breaking changes:
  - Hooks require CRAWL4AI_HOOKS_ENABLED=true to function
  - file:// URLs no longer work on API endpoints (use the library directly)
* Enhance authentication flow by implementing JWT token retrieval and adding authorization headers to API requests
* Add release notes for v0.7.9, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates
* Add release notes for v0.8.0, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates
  Documentation for the v0.8.0 release:
  - SECURITY.md: Security policy and vulnerability reporting guidelines
  - RELEASE_NOTES_v0.8.0.md: Comprehensive release notes
  - migration/v0.8.0-upgrade-guide.md: Step-by-step migration guide
  - security/GHSA-DRAFT-RCE-LFI.md: GitHub security advisory drafts
  - CHANGELOG.md: Updated with v0.8.0 changes
  Breaking changes documented:
  - Docker API hooks disabled by default (CRAWL4AI_HOOKS_ENABLED)
  - file:// URLs blocked on Docker API endpoints
  Security fixes credited to Neo by ProjectDiscovery
* Add examples for deep crawl crash recovery and prefetch mode in documentation
* Release v0.8.0: The v0.8.0 Update
  - Updated version to 0.8.0
  - Added comprehensive demo and release notes
  - Updated all documentation
* Update security researcher acknowledgment with a hyperlink for Neo by ProjectDiscovery
* Add async agenerate_schema method for schema generation
  - Extract prompt building to a shared _build_schema_prompt() method
  - Add agenerate_schema() async version using aperform_completion_with_backoff
  - Refactor generate_schema() to use the shared prompt builder
  - Fixes Gemini/Vertex AI compatibility in async contexts (FastAPI)
* Fix: Enable litellm.drop_params for O-series/GPT-5 model compatibility
  O-series (o1, o3) and GPT-5 models only support temperature=1. Setting litellm.drop_params=True auto-drops unsupported parameters instead of throwing UnsupportedParamsError. Fixes temperature=0.01 error for these models in LLM extraction.

---------

Co-authored-by: rbushria <rbushri@gmail.com>
Co-authored-by: AHMET YILMAZ <tawfik@kidocode.com>
Co-authored-by: Soham Kukreti <kukretisoham@gmail.com>
Co-authored-by: Chris Murphy <chris.murphy@klaviyo.com>
Co-authored-by: unclecode <unclecode@kidocode.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
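As a quick illustration of the crash-recovery items in the commit message above, here is a minimal sketch of wiring `resume_state` and `on_state_change` to a local JSON checkpoint. It assumes the callback receives the JSON-serializable state dict as its only argument and that the strategy is attached via `CrawlerRunConfig(deep_crawl_strategy=...)` as in the existing deep-crawl docs; the file name and helper functions are illustrative, not part of the API.

```python
import asyncio
import json
import os

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

STATE_FILE = "deep_crawl_state.json"  # illustrative checkpoint path


def load_state():
    # Resume from a previous checkpoint if one exists; None starts a fresh crawl.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return None


async def save_state(state):
    # Fired after each crawled URL; the state (visited URLs, pending
    # queue/stack, depths, pages_crawled) is JSON-serializable by design.
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)


async def main():
    strategy = BFSDeepCrawlStrategy(
        max_depth=3,
        resume_state=load_state(),   # continue from checkpoint when present
        on_state_change=save_state,  # persist state for crash recovery
    )
    config = CrawlerRunConfig(deep_crawl_strategy=strategy)
    async with AsyncWebCrawler() as crawler:
        await crawler.arun("https://example.com", config=config)


if __name__ == "__main__":
    asyncio.run(main())
```

In a cloud deployment the same callback would write to Redis or a database instead of a local file, as the commit message suggests.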
1 parent c85f56b commit f6f7f1b

58 files changed: +11966 additions, -2435 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -267,6 +267,7 @@ continue_config.json
 .private/
 
 .claude/
+.context/
 
 CLAUDE_MONITOR.md
 CLAUDE.md
@@ -295,3 +296,4 @@ scripts/
 *.db
 *.rdb
 *.ldb
+MEMORY.md

CHANGELOG.md

Lines changed: 40 additions & 0 deletions
@@ -5,6 +5,46 @@ All notable changes to Crawl4AI will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.8.0] - 2026-01-12
+
+### Security
+- **🔒 CRITICAL: Remote Code Execution Fix**: Removed `__import__` from hook allowed builtins
+  - Prevents arbitrary module imports in user-provided hook code
+  - Hooks now disabled by default via `CRAWL4AI_HOOKS_ENABLED` environment variable
+  - Credit: Neo by ProjectDiscovery
+- **🔒 HIGH: Local File Inclusion Fix**: Added URL scheme validation to Docker API endpoints
+  - Blocks `file://`, `javascript:`, `data:` URLs on `/execute_js`, `/screenshot`, `/pdf`, `/html`
+  - Only allows `http://`, `https://`, and `raw:` URLs
+  - Credit: Neo by ProjectDiscovery
+
+### Breaking Changes
+- **Docker API: Hooks disabled by default**: Set `CRAWL4AI_HOOKS_ENABLED=true` to enable
+- **Docker API: file:// URLs blocked**: Use Python library directly for local file processing
+
+### Added
+- **🚀 init_scripts for BrowserConfig**: Pre-page-load JavaScript injection for stealth evasions
+- **🔄 CDP Connection Improvements**: WebSocket URL support, proper cleanup, browser reuse
+- **💾 Crash Recovery for Deep Crawl**: `resume_state` and `on_state_change` for BFS/DFS/Best-First strategies
+- **📄 PDF/MHTML for raw:/file:// URLs**: Generate PDFs and MHTML from cached HTML content
+- **📸 Screenshots for raw:/file:// URLs**: Render cached HTML and capture screenshots
+- **🔗 base_url Parameter**: Proper URL resolution for raw: HTML processing
+- **⚡ Prefetch Mode**: Two-phase deep crawling with fast link extraction
+- **🔀 Enhanced Proxy Support**: Improved proxy rotation and sticky sessions
+- **🌐 HTTP Strategy Proxy Support**: Non-browser crawler now supports proxies
+- **🖥️ Browser Pipeline for raw:/file://**: New `process_in_browser` parameter
+- **📋 Smart TTL Cache for Sitemap Seeder**: `cache_ttl_hours` and `validate_sitemap_lastmod` parameters
+- **📚 Security Documentation**: Added SECURITY.md with vulnerability reporting guidelines
+
+### Fixed
+- **raw: URL Parsing**: Fixed truncation at `#` character (CSS color codes like `#eee`)
+- **Caching System**: Various improvements to cache validation and persistence
+
+### Documentation
+- Multi-sample schema generation section
+- URL seeder smart TTL cache parameters
+- v0.8.0 migration guide
+- Security policy and disclosure process
+
 ## [Unreleased]
 
 ### Added
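To make the `init_scripts for BrowserConfig` entry above concrete, a small sketch of injecting a pre-page-load script, assuming the standard `BrowserConfig`/`AsyncWebCrawler` usage; the specific evasion snippet is only an example of the kind of script you might pass.

```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

# Example-only snippet: hide the webdriver flag before any page script runs.
HIDE_WEBDRIVER = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""


async def main():
    browser_config = BrowserConfig(
        headless=True,
        init_scripts=[HIDE_WEBDRIVER],  # applied via context.add_init_script()
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com", config=CrawlerRunConfig())
        print(result.success)


if __name__ == "__main__":
    asyncio.run(main())
```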

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 FROM python:3.12-slim-bookworm AS build
 
 # C4ai version
-ARG C4AI_VER=0.7.8
+ARG C4AI_VER=0.8.0
 ENV C4AI_VERSION=$C4AI_VER
 LABEL c4ai.version=$C4AI_VER
 
README.md

Lines changed: 43 additions & 4 deletions
@@ -37,13 +37,13 @@ Limited slots._
 
 Crawl4AI turns the web into clean, LLM ready Markdown for RAG, agents, and data pipelines. Fast, controllable, battle tested by a 50k+ star community.
 
-[✨ Check out latest update v0.7.8](#-recent-updates)
+[✨ Check out latest update v0.8.0](#-recent-updates)
 
-**New in v0.7.8**: Stability & Bug Fix Release! 11 bug fixes addressing Docker API issues (ContentRelevanceFilter, ProxyConfig, cache permissions), LLM extraction improvements (configurable backoff, HTML input format), URL handling fixes, and dependency updates (pypdf, Pydantic v2). [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.8.md)
+**New in v0.8.0**: Crash Recovery & Prefetch Mode! Deep crawl crash recovery with `resume_state` and `on_state_change` callbacks for long-running crawls. New `prefetch=True` mode for 5-10x faster URL discovery. Critical security fixes for Docker API (hooks disabled by default, file:// URLs blocked). [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.8.0.md)
 
-✨ Recent v0.7.7: Complete Self-Hosting Platform with Real-time Monitoring! Enterprise-grade monitoring dashboard, comprehensive REST API, WebSocket streaming, smart browser pool management, and production-ready observability. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.7.md)
+✨ Recent v0.7.8: Stability & Bug Fix Release! 11 bug fixes addressing Docker API issues, LLM extraction improvements, URL handling fixes, and dependency updates. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.8.md)
 
-✨ Previous v0.7.6: Complete Webhook Infrastructure for Docker Job Queue API! Real-time notifications for both `/crawl/job` and `/llm/job` endpoints with exponential backoff retry, custom headers, and flexible delivery modes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.6.md)
+✨ Previous v0.7.7: Complete Self-Hosting Platform with Real-time Monitoring! Enterprise-grade monitoring dashboard, comprehensive REST API, WebSocket streaming, and smart browser pool management. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.7.md)
 
 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
@@ -562,6 +562,45 @@ async def test_news_crawl():
 
 ## ✨ Recent Updates
 
+<details open>
+<summary><strong>Version 0.8.0 Release Highlights - Crash Recovery & Prefetch Mode</strong></summary>
+
+This release introduces crash recovery for deep crawls, a new prefetch mode for fast URL discovery, and critical security fixes for Docker deployments.
+
+- **🔄 Deep Crawl Crash Recovery**:
+  - `on_state_change` callback fires after each URL for real-time state persistence
+  - `resume_state` parameter to continue from a saved checkpoint
+  - JSON-serializable state for Redis/database storage
+  - Works with BFS, DFS, and Best-First strategies
+  ```python
+  from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
+
+  strategy = BFSDeepCrawlStrategy(
+      max_depth=3,
+      resume_state=saved_state,       # Continue from checkpoint
+      on_state_change=save_to_redis,  # Called after each URL
+  )
+  ```
+
+- **⚡ Prefetch Mode for Fast URL Discovery**:
+  - `prefetch=True` skips markdown, extraction, and media processing
+  - 5-10x faster than full processing
+  - Perfect for two-phase crawling: discover first, process selectively
+  ```python
+  config = CrawlerRunConfig(prefetch=True)
+  result = await crawler.arun("https://example.com", config=config)
+  # Returns HTML and links only - no markdown generation
+  ```
+
+- **🔒 Security Fixes (Docker API)**:
+  - Hooks disabled by default (`CRAWL4AI_HOOKS_ENABLED=false`)
+  - `file://` URLs blocked on API endpoints to prevent LFI
+  - `__import__` removed from hook execution sandbox
+
+[Full v0.8.0 Release Notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.8.0.md)
+
+</details>
+
 <details>
 <summary><strong>Version 0.7.8 Release Highlights - Stability & Bug Fix Release</strong></summary>
 
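Below is a hedged sketch of the two-phase pattern the prefetch highlight describes: discover links quickly with `prefetch=True`, then re-crawl a selected subset with full processing. The `links["internal"]` access assumes the usual CrawlResult link structure (a dict of link records with `href` keys); treat the selection logic as illustrative.

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main():
    async with AsyncWebCrawler() as crawler:
        # Phase 1: fast discovery - prefetch skips markdown, extraction, and media.
        seed = await crawler.arun(
            "https://example.com", config=CrawlerRunConfig(prefetch=True)
        )

        # Pick a few internal links to process fully (illustrative selection).
        internal = (seed.links or {}).get("internal", [])
        targets = [link["href"] for link in internal[:5]]

        # Phase 2: full processing only for the selected URLs.
        for url in targets:
            result = await crawler.arun(url, config=CrawlerRunConfig())
            print(url, result.success)


if __name__ == "__main__":
    asyncio.run(main())
```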
SECURITY.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# Security Policy
+
+## Supported Versions
+
+| Version | Supported |
+| ------- | ------------------ |
+| 0.8.x | :white_check_mark: |
+| 0.7.x | :x: (upgrade recommended) |
+| < 0.7 | :x: |
+
+## Reporting a Vulnerability
+
+We take security vulnerabilities seriously. If you discover a security issue, please report it responsibly.
+
+### How to Report
+
+**DO NOT** open a public GitHub issue for security vulnerabilities.
+
+Instead, please report via one of these methods:
+
+1. **GitHub Security Advisories (Preferred)**
+   - Go to [Security Advisories](https://github.com/unclecode/crawl4ai/security/advisories)
+   - Click "New draft security advisory"
+   - Fill in the details
+
+2. **Email**
+   - Send details to: security@crawl4ai.com
+   - Use subject: `[SECURITY] Brief description`
+   - Include:
+     - Description of the vulnerability
+     - Steps to reproduce
+     - Potential impact
+     - Any suggested fixes
+
+### What to Expect
+
+- **Acknowledgment**: Within 48 hours
+- **Initial Assessment**: Within 7 days
+- **Resolution Timeline**: Depends on severity
+  - Critical: 24-72 hours
+  - High: 7 days
+  - Medium: 30 days
+  - Low: 90 days
+
+### Disclosure Policy
+
+- We follow responsible disclosure practices
+- We will coordinate with you on disclosure timing
+- Credit will be given to reporters (unless anonymity is requested)
+- We may request CVE assignment for significant vulnerabilities
+
+## Security Best Practices for Users
+
+### Docker API Deployment
+
+If you're running the Crawl4AI Docker API in production:
+
+1. **Enable Authentication**
+   ```yaml
+   # config.yml
+   security:
+     enabled: true
+     jwt_enabled: true
+   ```
+   ```bash
+   # Set a strong secret key
+   export SECRET_KEY="your-secure-random-key-here"
+   ```
+
+2. **Hooks are Disabled by Default** (v0.8.0+)
+   - Only enable if you trust all API users
+   - Set `CRAWL4AI_HOOKS_ENABLED=true` only when necessary
+
+3. **Network Security**
+   - Run behind a reverse proxy (nginx, traefik)
+   - Use HTTPS in production
+   - Restrict access to trusted IPs if possible
+
+4. **Container Security**
+   - Run as non-root user (default in our container)
+   - Use read-only filesystem where possible
+   - Limit container resources
+
+### Library Usage
+
+When using Crawl4AI as a Python library:
+
+1. **Validate URLs** before crawling untrusted input
+2. **Sanitize extracted content** before using in other systems
+3. **Be cautious with hooks** - they execute arbitrary code
+
+## Known Security Issues
+
+### Fixed in v0.8.0
+
+| ID | Severity | Description | Fix |
+|----|----------|-------------|-----|
+| CVE-pending-1 | CRITICAL | RCE via hooks `__import__` | Removed from allowed builtins |
+| CVE-pending-2 | HIGH | LFI via `file://` URLs | URL scheme validation added |
+
+See [Security Advisory](https://github.com/unclecode/crawl4ai/security/advisories) for details.
+
+## Security Features
+
+### v0.8.0+
+
+- **URL Scheme Validation**: Blocks `file://`, `javascript:`, `data:` URLs on API
+- **Hooks Disabled by Default**: Opt-in via `CRAWL4AI_HOOKS_ENABLED=true`
+- **Restricted Hook Builtins**: No `__import__`, `eval`, `exec`, `open`
+- **JWT Authentication**: Optional but recommended for production
+- **Rate Limiting**: Configurable request limits
+- **Security Headers**: X-Frame-Options, CSP, HSTS when enabled
+
+## Acknowledgments
+
+We thank the following security researchers for responsibly disclosing vulnerabilities:
+
+- **[Neo by ProjectDiscovery](https://projectdiscovery.io/blog/introducing-neo)** - RCE and LFI vulnerabilities (December 2025)
+
+---
+
+*Last updated: January 2026*
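To make the URL-scheme policy above concrete, here is a minimal standalone sketch of the kind of check it describes (allow `http`/`https`, optionally `raw:`, and reject `file://`, `javascript:`, `data:`, and anything else). This illustrates the policy only; it is not the project's actual `validate_url_scheme()` helper.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def is_url_scheme_allowed(url: str, allow_raw: bool = False) -> bool:
    """Return True only for http(s) URLs, and raw: content when explicitly allowed."""
    if allow_raw and (url.startswith("raw:") or url.startswith("raw://")):
        return True
    scheme = urlparse(url).scheme.lower()
    # Rejects file://, javascript:, data:, and any other unexpected scheme.
    return scheme in ALLOWED_SCHEMES


# Examples of the policy in action:
assert is_url_scheme_allowed("https://example.com")
assert not is_url_scheme_allowed("file:///etc/passwd")
assert not is_url_scheme_allowed("javascript:alert(1)")
assert is_url_scheme_allowed("raw:<html></html>", allow_raw=True)
```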

crawl4ai/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # crawl4ai/__version__.py
 
 # This is the version that will be used for stable releases
-__version__ = "0.7.8"
+__version__ = "0.8.0"
 
 # For nightly builds, this gets set during build process
 __nightly_version__ = None
