Add extract command to extract HAR response content to filesystem #37

chrishamant · 2025-06-05T23:21:43Z

I know this is an unprompted feature and total AI slop, but this project was well written enough that it was super easy to whip out what I needed. My goal was to cheaply scrape some content from a site that doesn't offer an API, doesn't take kindly to bots/scraping, and requires logging in (not trying to violate the TOS). I figured I could manually click through the site, let devtools collect all the requests, save the HAR, and process it later. I guess this is a slightly grey area? My intentions are pure... I figured there had to be a HAR library and found this one, but it didn't quite do what I wanted (hargo dump serves a different purpose upon closer inspection). So I cloned it, asked my buddy Claude to help, and was able to one-shot this addition, which worked fine for what I needed... Sorry for the unprompted PR. I mean no offense if you hate the feature, the means by which it was authored, or the code itself. Thought I'd submit this in case you or someone else finds it useful.

Thank you for your work!

- Add new 'extract' command with alias 'e' to extract response content from HAR files
- Support two organization modes: by domain (default) and by content type (--sort flag)
- Content type organization groups files into directories: images/, json/, html/, css/, javascript/, fonts/, etc.
- Smart filename generation with proper extensions based on MIME types
- Handle filename collisions with incremental naming (image_001.jpg, posts_002.json)
- Special handling for API responses (posts.json, api_response.json)
- Generate CSV manifest file mapping original URLs to extracted file paths
- Base64 decode response content when needed
- Add comprehensive VS Code debug configurations for all commands
- Update README with extract command documentation
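
For anyone skimming the list above, here is a minimal sketch of the "base64 decode when needed" and "extension from MIME type" steps. It is not the code in this PR. `Content`, `decodeBody`, `extensionFor`, and the `extByType` table are hypothetical names made up for illustration; only the `mimeType`, `text`, and `encoding` fields come from the HAR 1.2 format.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"mime"
	"strings"
)

// Content mirrors the "content" object of a HAR 1.2 response.
// Hypothetical stand-in for illustration, not hargo's own type.
type Content struct {
	MimeType string `json:"mimeType"`
	Text     string `json:"text"`
	Encoding string `json:"encoding"`
}

// extByType is a small illustrative table (an assumption); a real tool
// would likely cover many more MIME types.
var extByType = map[string]string{
	"application/json": ".json",
	"text/html":        ".html",
	"text/css":         ".css",
	"image/jpeg":       ".jpg",
	"image/png":        ".png",
}

// decodeBody returns the raw response bytes, base64-decoding only when
// the entry declares encoding "base64"; otherwise the text is used as-is.
func decodeBody(c Content) ([]byte, error) {
	if strings.EqualFold(c.Encoding, "base64") {
		return base64.StdEncoding.DecodeString(c.Text)
	}
	return []byte(c.Text), nil
}

// extensionFor strips parameters such as "; charset=utf-8" and looks the
// media type up in extByType, falling back to ".bin" when unknown.
func extensionFor(mimeType string) string {
	mediaType, _, err := mime.ParseMediaType(mimeType)
	if err != nil {
		return ".bin"
	}
	if ext, ok := extByType[mediaType]; ok {
		return ext
	}
	return ".bin"
}

func main() {
	c := Content{MimeType: "application/json; charset=utf-8", Text: "aGVsbG8=", Encoding: "base64"}
	body, err := decodeBody(c)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%d bytes, extension %s\n", len(body), extensionFor(c.MimeType))
}
```

The fixed lookup table is used here instead of the system MIME database so the result does not vary between machines.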

mrichman

Add a suite of unit tests and useful comments and I'll consider merging your AI slop.

- Add new 'extract' command with alias 'e' to extract response content from HAR files
- Support two organization modes: by domain (default) and by content type (--sort flag)
- Content type organization groups files into directories: images/, json/, html/, css/, javascript/, fonts/, etc.
- Smart filename generation with proper extensions based on MIME types
- Handle filename collisions with incremental naming (image_001.jpg, posts_002.json)
- Special handling for API responses (posts.json, api_response.json)
- Generate CSV manifest file mapping original URLs to extracted file paths
- Base64 decode response content when needed
- Add comprehensive VS Code debug configurations for all commands
- Update README with extract command documentation
- Add initial test harness/structure with simple stab at CI

chrishamant · 2025-06-06T03:55:44Z

I spent a few $$ and added some more comments (their usefulness is for sure a matter of taste) and fixed the Makefile to add coverage and run the newly added tests for the extract functionality... I also sent a YOLO-mode addition that uses a GitHub Action to run tests in CI. I'm more familiar with GitLab, so I can't judge the quality of this approach. https://github.com/chrishamant/hargo/actions/runs/15482304554 (I'd have used your Dockerfile and run the integration stuff you have set up if I had my druthers, but I'm somewhat limited on time atm).

Since I opened the request from my forked master branch, I didn't see a way to update the PR to source from the work branch I made for the follow-on request, so I instead amended my previous commit and force-pushed, FYI. (Thinking about it now, you could have squashed when you merged instead of accepting two commits, but 🤷‍♀️)

mrichman requested changes Jun 6, 2025

chrishamant force-pushed the master branch from 46d558d to d75429d Compare June 6, 2025 03:41

chrishamant requested a review from mrichman June 6, 2025 03:55
