feat(core): release version 0.5.0 with deep crawling and CLI
This major release adds deep crawling capabilities, a memory-adaptive dispatcher,
multiple crawling strategies, Docker deployment, and a new CLI. It also includes
significant improvements to proxy handling, PDF processing, and LLM integration.
BREAKING CHANGES:
- Add memory-adaptive dispatcher as default for arun_many()
- Move max_depth to CrawlerRunConfig
- Replace ScrapingMode enum with strategy pattern
- Update BrowserContext API
- Make model fields optional with defaults
- Remove content_filter parameter from CrawlerRunConfig
- Remove synchronous WebCrawler and old CLI
- Update Docker deployment configuration
- Replace FastFilterChain with FilterChain
- Change license to Apache 2.0 with attribution clause
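
To make the first breaking change more concrete, here is a minimal sketch of what a memory-adaptive dispatcher does in principle: it caps concurrency and holds new work back while memory use is above a threshold. This is not Crawl4AI's implementation. `dispatch`, `memory_percent`, `threshold`, `make_probe`, and `fake_worker` are hypothetical names, and the memory probe is injected so the sketch runs with the standard library only.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def dispatch(
    urls: Iterable[str],
    worker: Callable[[str], Awaitable[str]],
    memory_percent: Callable[[], float],
    threshold: float = 70.0,
    max_concurrent: int = 4,
    poll_interval: float = 0.01,
) -> List[str]:
    """Run worker(url) for every URL, pausing new work under memory pressure.

    All names are hypothetical; this only illustrates the idea.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(url: str) -> str:
        async with semaphore:  # cap the number of tasks in flight
            # Wait until the injected memory probe reports headroom.
            while memory_percent() >= threshold:
                await asyncio.sleep(poll_interval)
            return await worker(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(run_one(u) for u in urls))


async def fake_worker(url: str) -> str:
    await asyncio.sleep(0)  # stand-in for a real fetch
    return url.upper()


def make_probe(readings: List[float]) -> Callable[[], float]:
    """Return a probe that replays readings, then repeats the last one."""
    remaining = list(readings)

    def probe() -> float:
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]

    return probe


if __name__ == "__main__":
    probe = make_probe([95.0, 90.0, 40.0])
    print(asyncio.run(dispatch(["a", "b"], fake_worker, probe)))
```

Injecting the probe keeps the throttling logic testable; a real dispatcher would presumably read system memory instead of replaying fixed readings.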
+- *(deep-crawling)* Add DFS strategy and update exports; refactor CLI entry point
+- *(cli)* Add command line interface with comprehensive features
+- *(config)* Enhance serialization and add deep crawling exports
+- *(crawler)* Add HTTP crawler strategy for lightweight web scraping
+- *(docker)* [**breaking**] Implement supervisor and secure API endpoints
+- *(docker)* [**breaking**] Add JWT authentication and improve server architecture

 ### Changed
-Okay, here's a detailed changelog in Markdown format, generated from the provided git diff and commit history. I've focused on user-facing changes, fixes, and features, and grouped them as requested:
+
+- *(browser)* Update browser channel default to 'chromium' in BrowserConfig.from_args method
+- *(crawler)* Optimize response handling and default settings
+- *(crawler)* - Update hello_world example with proper content filtering
+- - Update hello_world.py example
+- *(docs)* [**breaking**] Reorganize documentation structure and update styles
+- *(dispatcher)* [**breaking**] Migrate to modular dispatcher system with enhanced monitoring
+- *(scraping)* [**breaking**] Replace ScrapingMode enum with strategy pattern
+- *(browser)* Improve browser path management
+- *(models)* Rename final_url to redirected_url for consistency
+- *(core)* [**breaking**] Improve type hints and remove unused file
+- *(docs)* Improve code formatting in features demo
+- *(user-agent)* Improve user agent generation system
+- *(core)* [**breaking**] Reorganize project structure and remove legacy code
+- *(docker)* Clean up import statements in server.py
+- *(docker)* Remove unused models and utilities for cleaner codebase
+- *(docker)* [**breaking**] Improve server architecture and configuration
+- *(deep-crawl)* [**breaking**] Reorganize deep crawling functionality into dedicated module
+- *(deep-crawling)* [**breaking**] Reorganize deep crawling strategies and add new implementations
+- *(crawling)* [**breaking**] Improve type hints and code cleanup
+- *(crawler)* [**breaking**] Improve HTML handling and cleanup codebase
+- *(config)* [**breaking**] Enhance serialization and config handling
+
+### Docs
+
+- Add Code of Conduct for the project (#410)
+
+### Documentation
+
+- *(extraction)* Add clarifying comments for CSS selector behavior
+- *(readme)* Update personal story and project vision
+- *(urls)* [**breaking**] Update documentation URLs to new domain
+- *(api)* Add streaming mode documentation and examples
+- *(readme)* Update version and feature announcements for v0.4.3b1
+- *(examples)* Update demo scripts and fix output formats
+- *(examples)* Update v0.4.3 features demo to v0.4.3b2
+- *(readme)* Update version references and fix links
+- *(multi-url)* [**breaking**] Improve documentation clarity and update examples
+- *(examples)* Update proxy rotation demo and disable other demos
+- *(api)* Improve formatting and readability of API documentation
+- *(examples)* Add SERP API project example
+- *(urls)* Update documentation URLs to new domain
+- *(readme)* Resolve merge conflict and update version info
+
+### Fixed
+
+- *(browser)* Update default browser channel to chromium and simplify channel selection logic
+- *(browser)* [**breaking**] Default to Chromium channel for new headless mode (#387)
+- *(browser)* Resolve merge conflicts in browser channel configuration
+- Prevent memory leaks by ensuring proper closure of Playwright pages
+- Not working long page screenshot (#403)
+- *(extraction)* JsonCss selector and crawler improvements
+- *(models)* [**breaking**] Make model fields optional with default values
+- *(dispatcher)* Adjust memory threshold and fix dispatcher initialization
+- *(install)* Ensure proper exit after running doctor command
+
+### Miscellaneous Tasks
+
+- *(cleanup)* Remove unused files and improve type hints
+- Add .gitattributes file
+
+## License Update
+
+Crawl4AI v0.5.0 updates the license to Apache 2.0 *with a required attribution clause*. This means you are free to use, modify, and distribute Crawl4AI (even commercially), but you *must* clearly attribute the project in any public use or distribution. See the updated `LICENSE` file for the full legal text and specific requirements.
+
+---

 ## Version 0.4.3b2 (2025-01-21)

@@ -286,12 +385,6 @@ This release introduces several powerful new features, including robots.txt comp
 - Fixed potential viewport mismatches by ensuring consistent use of `self.viewport_width` and `self.viewport_height` throughout the code.
 - Improved robustness of dynamic content loading to avoid timeouts and failed evaluations.
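
The changelog above lists a DFS deep-crawling strategy, and the commit message mentions a `max_depth` setting. As a rough illustration only (not Crawl4AI's code), a depth-limited depth-first traversal over an in-memory link graph might look like the sketch below. `LINKS`, `dfs_crawl`, and the example URLs are made up for the purpose.

```python
from typing import Dict, List, Set

# Hypothetical link graph standing in for pages fetched from the web.
LINKS: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1", "https://example.com/"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
    "https://example.com/b": [],
}


def dfs_crawl(start: str, max_depth: int, links: Dict[str, List[str]]) -> List[str]:
    """Return URLs in depth-first visit order, at most max_depth links from start."""
    visited: Set[str] = set()
    order: List[str] = []
    stack = [(start, 0)]
    while stack:
        url, depth = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        # Push children in reverse so the first link is explored first.
        for child in reversed(links.get(url, [])):
            if child not in visited:
                stack.append((child, depth + 1))
    return order


if __name__ == "__main__":
    for url in dfs_crawl("https://example.com/", max_depth=2, links=LINKS):
        print(url)
```

The explicit stack avoids recursion limits on deep sites, and the `visited` set keeps back-links (such as the link from `/a` to `/`) from causing loops.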
LICENSE
+19 -1 (19 additions & 1 deletion)
@@ -48,4 +48,22 @@ You may add Your own copyright statement to Your modifications and may provide a

 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

-END OF TERMS AND CONDITIONS
+END OF TERMS AND CONDITIONS
+
+---
+Attribution Requirement
+
+All distributions, publications, or public uses of this software, or derivative works based on this software, must include the following attribution:
+
+"This product includes software developed by UncleCode (https://x.com/unclecode) as part of the Crawl4AI project (https://github.com/unclecode/crawl4ai)."
+
+This attribution must be displayed in a prominent and easily accessible location, such as:
+
+- For software distributions: In a NOTICE file, README file, or equivalent documentation.
+- For publications (research papers, articles, blog posts): In the acknowledgments section or a footnote.
+- For websites/web applications: In an "About" or "Credits" section.
+- For command-line tools: In the help/usage output.
+
+This requirement ensures proper credit is given for the use of Crawl4AI and helps promote the project.
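
For the command-line case in the list above, one plausible approach is to place the attribution string in the tool's help output. The sketch below does this with Python's standard `argparse`. The tool name `mytool` and its `url` argument are hypothetical, and whether this placement satisfies the clause is determined by the LICENSE text, not by this example.

```python
import argparse

# Attribution text quoted from the LICENSE addition.
ATTRIBUTION = (
    "This product includes software developed by UncleCode "
    "(https://x.com/unclecode) as part of the Crawl4AI project "
    "(https://github.com/unclecode/crawl4ai)."
)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical tool whose --help shows the attribution."""
    parser = argparse.ArgumentParser(
        prog="mytool",
        description="Example tool built on Crawl4AI (hypothetical).",
        epilog=ATTRIBUTION,
        # Keep the epilog exactly as written instead of re-wrapping it.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("url", help="page to process")
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Using the epilog means the attribution appears at the end of `--help` without interfering with argument parsing.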
README.md
+76 -2 (76 additions & 2 deletions)
@@ -574,9 +574,83 @@ To check our development plans and upcoming features, visit our [Roadmap](https:

 We welcome contributions from the open-source community. Check out our [contribution guidelines](https://github.com/unclecode/crawl4ai/blob/main/CONTRIBUTORS.md) for more information.

-## 📄 License
+I'll help modify the license section with badges. For the halftone effect, here's a version with it:

-Crawl4AI is released under the [Apache 2.0 License](https://github.com/unclecode/crawl4ai/blob/main/LICENSE).
+Here's the updated license section:
+
+## 📄 License & Attribution
+
+This project is licensed under the Apache License 2.0 with a required attribution clause. See the [Apache 2.0 License](https://github.com/unclecode/crawl4ai/blob/main/LICENSE) file for details.
+
+### Attribution Requirements
+When using Crawl4AI, you must include one of the following attribution methods:
+
+#### 1. Badge Attribution (Recommended)
+Add one of these badges to your README, documentation, or website:
+
+| Theme | Badge |
+|-------|-------|
+| **Disco Theme (Animated)** | <a href="https://github.com/unclecode/crawl4ai"><img src="./docs/assets/powered-by-disco.svg" alt="Powered by Crawl4AI" width="200"/></a> |
+| **Night Theme (Dark with Neon)** | <a href="https://github.com/unclecode/crawl4ai"><img src="./docs/assets/powered-by-night.svg" alt="Powered by Crawl4AI" width="200"/></a> |
+| **Dark Theme (Classic)** | <a href="https://github.com/unclecode/crawl4ai"><img src="./docs/assets/powered-by-dark.svg" alt="Powered by Crawl4AI" width="200"/></a> |
+| **Light Theme (Classic)** | <a href="https://github.com/unclecode/crawl4ai"><img src="./docs/assets/powered-by-light.svg" alt="Powered by Crawl4AI" width="200"/></a> |
+
+
+HTML code for adding the badges:
+```html
+<!-- Disco Theme (Animated) -->
+<a href="https://github.com/unclecode/crawl4ai">
+<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-disco.svg" alt="Powered by Crawl4AI" width="200"/>
+</a>
+
+<!-- Night Theme (Dark with Neon) -->
+<a href="https://github.com/unclecode/crawl4ai">
+<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-night.svg" alt="Powered by Crawl4AI" width="200"/>
+</a>
+
+<!-- Dark Theme (Classic) -->
+<a href="https://github.com/unclecode/crawl4ai">
+<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-dark.svg" alt="Powered by Crawl4AI" width="200"/>
+</a>
+
+<!-- Light Theme (Classic) -->
+<a href="https://github.com/unclecode/crawl4ai">
+<img src="https://raw.githubusercontent.com/unclecode/crawl4ai/main/docs/assets/powered-by-light.svg" alt="Powered by Crawl4AI" width="200"/>
+</a>
+
+<!-- Simple Shield Badge -->
+<a href="https://github.com/unclecode/crawl4ai">
+<img src="https://img.shields.io/badge/Powered%20by-Crawl4AI-blue?style=flat-square" alt="Powered by Crawl4AI"/>
+</a>
+```
+
+#### 2. Text Attribution
+Add this line to your documentation:
+```
+This project uses Crawl4AI (https://github.com/unclecode/crawl4ai) for web data extraction.
+```
+
+## 📚 Citation
+
+If you use Crawl4AI in your research or project, please cite:
+
```bibtex
@software{crawl4ai2024,
author = {UncleCode},
title = {Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper},