Different results between Crawl4AI 0.5.0 and 0.6.0 #1018
SECVBulRep
started this conversation in
General
Replies: 0 comments
Hi, team!
Why does the same code give me absolutely different results with Crawl4AI 0.5.0 and 0.6.0?
```python
import asyncio
from pathlib import Path
from urllib.parse import urlparse

from crawl4ai import LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Provider left blank here.
lc = LLMConfig(provider="")

prompt = """
You are given the text content of a website.
bla bla bla
return []."""

extraction_strategy = LLMExtractionStrategy(
    llm_config=lc,
    extraction_type="schema",
    instruction=prompt,
    chunk_token_threshold=1200,
    overlap_rate=0.1,
    apply_chunking=True,
    extra_args={"temperature": 0.1},
    verbose=True,
)


async def crawl_single_url(url: str, base_output_dir: Path):
    # One output directory per domain, e.g. crawl_results/depic.me/raw.json
    parsed = urlparse(url)
    domain_name = parsed.netloc.replace("www.", "")
    output_dir = base_output_dir / domain_name
    output_dir.mkdir(parents=True, exist_ok=True)
    output_file = output_dir / "raw.json"
    # (crawl and save steps omitted here)


async def crawl_all(urls: list[str]):
    base_output_dir = Path("crawl_results")
    base_output_dir.mkdir(exist_ok=True)
    # (per-URL crawl calls omitted here)


if __name__ == "__main__":
    urls = [
        "https://depic.me",
    ]
    asyncio.run(crawl_all(urls))
```
Version 0.6.0 returns:
[INIT].... → Crawl4AI 0.6.0
[FETCH]... ↓ https://depic.me | ✓ | ⏱: 4.49s
[SCRAPE].. ◆ https://depic.me | ✓ | ⏱: 0.01s
[LOG] Call LLM for https://depic.me - block index: 0
[LOG] Extracted 0 blocks from URL: https://depic.me block index: 0
[EXTRACT]. ■ Completed for https://depic.me... | Time: 13.930713609996019s
[COMPLETE] ● https://depic.me | ✓ | ⏱: 18.43s
Finished: https://depic.me
⏱ Total crawl time: 19.52 seconds
Success: https://depic.me
Version 0.5.0 returns much more.
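To put a number on "much more", the saved output of both runs could be compared. This is only a minimal sketch: it assumes each version's result was written as a JSON list to its own directory (the names `crawl_results_0.5.0` and `crawl_results_0.6.0` are hypothetical, not from the code above), using the same `<domain>/raw.json` layout. The helpers `count_blocks` and `compare_runs` are illustrative and not part of Crawl4AI.

```python
import json
from pathlib import Path


def count_blocks(raw_json_path: Path) -> int:
    """Return the number of extracted blocks stored in a raw.json file.

    Assumes the file holds a JSON list (one item per extracted block);
    a missing file counts as zero blocks, any other JSON value as one.
    """
    if not raw_json_path.exists():
        return 0
    data = json.loads(raw_json_path.read_text(encoding="utf-8"))
    return len(data) if isinstance(data, list) else 1


def compare_runs(dir_a: Path, dir_b: Path, domain: str) -> dict[str, int]:
    """Count blocks for the same domain in two (hypothetical) output directories."""
    return {
        dir_a.name: count_blocks(dir_a / domain / "raw.json"),
        dir_b.name: count_blocks(dir_b / domain / "raw.json"),
    }


if __name__ == "__main__":
    # Hypothetical directory names: one crawl_results copy per installed version.
    print(compare_runs(Path("crawl_results_0.5.0"), Path("crawl_results_0.6.0"), "depic.me"))
```

Running it once per site gives a per-version block count that can be posted alongside the logs.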