Different results between Crawl4AI 0.5.0 and 0.6.0 #1018
SECVBulRep started this conversation in General
Hi, team!
Why does the same code give me completely different results with Crawl4AI 0.5.0 and 0.6.0?
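```python
import asyncio
from pathlib import Path
from urllib.parse import urlparse

from crawl4ai import LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

lc = LLMConfig(provider="")  # provider value left empty in the post

prompt = """
You are given the text content of a website.
bla bla bla
return []."""

extraction_strategy = LLMExtractionStrategy(
    llm_config=lc,
    extraction_type="schema",
    instruction=prompt,
    chunk_token_threshold=1200,
    overlap_rate=0.1,
    apply_chunking=True,
    extra_args={"temperature": 0.1},
    verbose=True,
)


async def crawl_single_url(url: str, base_output_dir: Path):
    # One output folder per domain, e.g. crawl_results/depic.me/raw.json
    parsed = urlparse(url)
    domain_name = parsed.netloc.replace("www.", "")
    output_dir = base_output_dir / domain_name
    output_dir.mkdir(parents=True, exist_ok=True)
    output_file = output_dir / "raw.json"
    # ... crawl and save steps not included in the post ...


async def crawl_all(urls: list[str]):
    base_output_dir = Path("crawl_results")
    base_output_dir.mkdir(exist_ok=True)
    # ... calls to crawl_single_url not included in the post ...


if __name__ == "__main__":
    urls = [
        "https://depic.me",
    ]
    asyncio.run(crawl_all(urls))
```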
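`crawl_single_url` computes `output_file`, but the crawl and save steps are not shown. A minimal sketch of the save step, assuming the page is fetched with `AsyncWebCrawler.arun(url=url, config=CrawlerRunConfig(extraction_strategy=extraction_strategy))` and that `result.extracted_content` holds the LLM output as a JSON string. `save_extracted` is a hypothetical helper, not a Crawl4AI API; it returns the number of extracted items so the two versions can be compared by count rather than by eye.

```python
import json
from pathlib import Path
from typing import Optional


def save_extracted(extracted_content: Optional[str], output_file: Path) -> int:
    """Write the LLM extraction output to output_file and return the item count.

    Hypothetical helper. Assumes extracted_content is the JSON string taken from
    result.extracted_content after an AsyncWebCrawler.arun(...) call.
    """
    # An empty or missing extraction is stored as an empty list.
    items = json.loads(extracted_content) if extracted_content else []
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(
        json.dumps(items, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    # A list of blocks is counted item by item; any other JSON value counts as one.
    return len(items) if isinstance(items, list) else 1
```

Inside `crawl_single_url`, after the crawl, a call such as `count = save_extracted(result.extracted_content, output_file)` would record how many items each version produced for the same URL.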
Version 0.6.0 returns:
```
[INIT].... → Crawl4AI 0.6.0
[FETCH]... ↓ https://depic.me | ✓ | ⏱: 4.49s
[SCRAPE].. ◆ https://depic.me | ✓ | ⏱: 0.01s
[LOG] Call LLM for https://depic.me - block index: 0
[LOG] Extracted 0 blocks from URL: https://depic.me block index: 0
[EXTRACT]. ■ Completed for https://depic.me... | Time: 13.930713609996019s
[COMPLETE] ● https://depic.me | ✓ | ⏱: 18.43s
Finished: https://depic.me
⏱ Total crawl time: 19.52 seconds
Success: https://depic.me
```
Version 0.5.0 returns much more.
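One way to make "much more" concrete is to run the script once per version, keep each output under its own name, and compare the two files. A minimal sketch, assuming each run's extraction was saved as JSON to hypothetical files such as `raw_0.5.0.json` and `raw_0.6.0.json` (the script above writes both versions to the same `raw.json`, so one copy would need renaming first). `load_items` and `compare_runs` are hypothetical helpers, not part of Crawl4AI.

```python
import json
from pathlib import Path


def load_items(path: Path) -> list:
    """Load a saved extraction; a non-list JSON value is wrapped in a list."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data if isinstance(data, list) else [data]


def compare_runs(old_file: Path, new_file: Path) -> dict:
    """Summarise how two saved extractions of the same URL differ."""
    old_items, new_items = load_items(old_file), load_items(new_file)
    # Serialise with sorted keys so dicts with equal content compare equal;
    # the set difference counts distinct items only.
    old_keys = {json.dumps(item, sort_keys=True) for item in old_items}
    new_keys = {json.dumps(item, sort_keys=True) for item in new_items}
    return {
        "old_count": len(old_items),
        "new_count": len(new_items),
        "only_in_old": len(old_keys - new_keys),
        "only_in_new": len(new_keys - old_keys),
    }
```

For example, `compare_runs(Path("crawl_results/depic.me/raw_0.5.0.json"), Path("crawl_results/depic.me/raw_0.6.0.json"))` would report the item count per version and how many distinct items appear in only one of them.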