Releases: tijme/not-your-average-web-crawler
1.5.0
New feature(s):
- Scope option tld_must_match
Improvement(s):
- Renamed domain_must_match scope option to hostname_must_match
Fixed bug(s):
- Invalid subdomain check in complies_with_scope method
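As an illustration of what hostname and TLD scope checks of this kind might do, here is a minimal standalone sketch. It is not the library's complies_with_scope implementation: the helper names split_host and in_scope are hypothetical, only the option names hostname_must_match and tld_must_match come from the notes above, and the "last two labels" split is a simplifying assumption that does not handle multi-part suffixes such as co.uk.

```python
from urllib.parse import urlparse


def split_host(url):
    """Split a URL's host into (subdomain, hostname, tld).

    Assumption: the TLD is the last dot-separated label and the hostname
    is the label before it. Real-world suffixes such as "co.uk" would
    need a public suffix list.
    """
    labels = (urlparse(url).hostname or "").lower().split(".")
    if len(labels) < 2:
        # Single-label hosts such as "localhost" have no TLD.
        return "", labels[0], ""
    return ".".join(labels[:-2]), labels[-2], labels[-1]


def in_scope(start_url, found_url, hostname_must_match=True, tld_must_match=True):
    """Return True if found_url may be crawled given the start URL.

    Hypothetical helper: each enabled flag requires the corresponding
    part of the host to be identical to that of the start URL.
    """
    _, start_host, start_tld = split_host(start_url)
    _, found_host, found_tld = split_host(found_url)

    if hostname_must_match and start_host != found_host:
        return False
    if tld_must_match and start_tld != found_tld:
        return False
    return True
```

Under these assumptions, in_scope("https://example.com/", "https://example.org/") is rejected by default and accepted when tld_must_match=False is passed.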
1.4.7
Improvement(s):
- PyPI reStructuredText parsing
- Removed reStructuredText lint dependency
1.4.6
1.4.5
Improvement(s):
- HTML scraping performance
Fixed bug(s):
- HTML scraper did not use <base> element
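To show why the <base> element matters when scraping links, the sketch below resolves relative hrefs against the document's base URL using only the Python standard library. It is an illustrative example, not the crawler's own scraper: the LinkCollector class and extract_links function are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> values and honour the first <base href> found."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url  # falls back to the page URL itself
        self.base_seen = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and not self.base_seen and attrs.get("href"):
            # Only the first <base href> in a document is used; it may
            # itself be relative to the page URL.
            self.base_url = urljoin(self.base_url, attrs["href"])
            self.base_seen = True
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def links(self):
        # Resolve after parsing so every link uses the final base URL.
        return [urljoin(self.base_url, href) for href in self.hrefs]


def extract_links(page_url, html):
    """Return absolute URLs for all anchors in the given HTML string."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links()
```

For a page at https://example.com/docs/index.html containing <base href="/static/">, a link to a.html resolves to https://example.com/static/a.html rather than https://example.com/docs/a.html, which is the kind of difference a scraper that ignores <base> would get wrong.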
1.4.4
Fixed bug(s):
- Missing argument in null route callback
1.4.3
New feature(s):
- JSON scraper
Fixed bug(s):
- Invalid reference to queue on HTTP error
1.4.2
Fixed bug(s):
- Folder traversal not working correctly
- Multiple forms on the same page couldn't be scraped
1.4.1
Fixed bug(s):
- Undefined queue items on crawler SIGINT
1.4.0
Improvement(s):
- Refactored queue to reduce time complexity
Fixed bug(s):
- Suddenly disappearing cookies
- Slow backtrack in regular expressions
- Infinite crawling recursion due to random input data
1.3.0
New feature(s):
- Autofill forms with random data
- Added form autofill callbacks
- Added custom headers option