v1.0.9 #66
Replies: 3 comments
-
I'm eagerly looking forward to the release of the AppImage for the Linux x64 (Linux Mint user) v1.0.9 GUI version! Thank you for continuing to improve and provide this awesome website crawler / exporter.
-
Newbie question - what's the best / recommended way to update 1.0.8 to 1.0.9 under Linux Mint without trashing existing configs? Thanks.
-
Hey 👋🏼 @janreges. I spent some time today with this crawler command line and GUI and I just wanted to say - really nice work. Thank you!
-
This release introduces a powerful new Website to Markdown converter, allowing you to export entire websites into clean, single or multiple Markdown files, which is ideal for AI context or documentation purposes. We've also added the ability to start crawling directly from a `sitemap.xml` file and significantly enhanced the Offline Website Exporter with more granular control and better handling of international characters. Numerous new command-line options have been added for greater flexibility in crawling, filtering, and reporting, alongside many other improvements and bug fixes.

New Features
- Website to Markdown converter (`html2markdown`).
- `--markdown-export-single-file` to combine all website content into a single, organized Markdown file, with smart removal of duplicate headers/footers.
- A `sitemap.xml` or sitemap index file can be passed directly to the `--url` parameter to crawl all listed URLs.
- `--resolve` option (like `curl`) to provide custom IP addresses for specific domains and ports.
- `--extra-columns` option.
- `--max-depth` parameter for limiting how deep the crawler goes (for pages, not assets).
- `--html-report-options` to select which sections to include in the final HTML report.

Improvements
- `--offline-export-remove-unwanted-code` option to automatically strip analytics, cookie consents, and other non-essential scripts.
- `--offline-export-no-auto-redirect-html` flag to prevent the creation of meta-refresh redirect files.
- `--transform-url` to internally change request URLs, useful for crawling sites that serve content from a different domain (e.g., a local instance).
- `--max-non200-responses-per-basename` option to prevent getting stuck in loops with dynamically generated error pages.
- `--timezone` for all dates and times displayed in reports and used in exported filenames.

This discussion was created from the release v1.0.9.