Can it crawl urls within js? #1
Comments
It doesn't look for URLs in JavaScript files; it only does so in […]. The original idea is to save the HTML generated by JavaScript so that it can be indexed by search engines. This will break some applications, like the ones you have mentioned. I have just added a […]
In this mode, it will not save the rendered/generated HTML; it will save the original HTML as served by the web server. Web apps also have the issue that some JavaScript modules are loaded only when you click around buttons or links. I just had the brilliant idea of fetching from the origin any requests not found in the zip and updating the zip with the fetched file. So backing up a simple application will mostly require crawling it and then using it a bit, to fetch the missing modules that are only requested while the app is in use. The spa mode seems to work well with hexed.it, but it doesn't seem to work with photopea. I haven't really looked into why photopea doesn't work. Let me know how that goes.
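The fetch-on-miss idea described above can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code: the function name `read_or_fetch` and the `fetch` callback (standing in for a request to the origin server) are assumptions made for the example.

```python
import os
import zipfile
from typing import Callable


def read_or_fetch(archive: str, name: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return `name` from the zip; on a miss, fetch it and add it to the zip."""
    # First try the archive, if it already exists.
    if os.path.exists(archive):
        with zipfile.ZipFile(archive) as zf:
            if name in zf.namelist():
                return zf.read(name)

    # Miss: ask the origin (hypothetical callback), then append to the archive
    # so the next request for the same path is served locally.
    data = fetch(name)
    with zipfile.ZipFile(archive, "a") as zf:
        zf.writestr(name, data)
    return data
```

With something like this in the request handler, using the app for a while after the crawl gradually fills in the lazily loaded modules.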
Ah cool, okay, I will try it, but I currently do not have much time, so please do not explicitly wait for my response. "Web apps also have the issue that some JavaScript modules are loaded only when you click around buttons or links": maybe JS files could be crawled as well, following any references in them to JS, CSS, fonts, etc., but I think this would not work every time either. Maybe it would be an idea to look into PWAs. Photopea can be installed as a PWA; could it be possible to grab the installed files? It is just an idea.
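As a rough illustration of the "crawl the JS files" suggestion, the sketch below scans JavaScript source for quoted strings that end in a known asset extension and resolves them against the script's URL. It is written in Python for brevity and is not part of the tool; `find_asset_urls`, the extension list, and the example URLs are all assumptions.

```python
import re
from urllib.parse import urljoin

# Quoted string literals that end in a typical asset extension.
ASSET_RE = re.compile(
    r"""["'`]([^"'`\s]+\.(?:js|mjs|css|wasm|woff2?|ttf|png|svg))["'`]"""
)


def find_asset_urls(js_source: str, base_url: str) -> list[str]:
    """Return absolute URLs for asset-like string literals found in JS source."""
    found: list[str] = []
    for match in ASSET_RE.finditer(js_source):
        url = urljoin(base_url, match.group(1))
        if url not in found:  # keep first-seen order, drop duplicates
            found.append(url)
    return found
```

As the comment anticipates, this cannot work every time: a path that is assembled at runtime, such as `"lib" + n + ".js"`, never appears as a single literal and is missed.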
I've been encountering a few problems while using files from the archive on a different web server (Caddy):

1. Some scripts, for example /wavacity.com/js/amplitude-8.1.0-min.js, are incorrectly served with the content-type image/png. Even "hexed.it" has this problem.
2. Some websites use the integrity flag and do not work properly. For example, you might see an error like this: […] It would be helpful if these could be filtered out, possibly even filtering out Google Analytics. Or maybe there could be a way to filter all external domains, with exceptions like CDNs.
3. The mpa command should support an option like mpa 0.0.0.0 to allow any PC to connect, not just localhost. Because of that, I cannot use this tool in my case. I use mpa in an LXC container and do not want to use something to proxy it. (Maybe Docker would be an idea. I might make a Dockerfile and send it here.)
4. Wavacity does not run because it cannot detect WebAssembly. But maybe this is a problem with the web server? I don't know.
5. When trying to download wavacity, this error message is presented: `✔ https://wavacity.com/css/wavacity_0.1.35.css TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined Node.js v22.3.0`

I know your tool was made to run the archives with MPA, but maybe you could document what needs to be done when using another web server, such as which headers are needed. I want to archive some websites/web tools within my local network so that if the original is down or my internet is down, I can still use them. And it would be cool to use the web server I already run in my local network. But this is only what I have observed and how I would use it. If it does not match your vision, please do not change anything :) This is purely what I would like.
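On the content-type point, browsers generally refuse to execute a script that arrives with an image MIME type, and streaming WebAssembly compilation expects `application/wasm`, so a server in front of the extracted files needs to send sensible types. The sketch below shows one way to pick a type from the file extension. It is an assumption-laden illustration in Python, not a description of how mpa or Caddy decide the header; `content_type_for` and the `OVERRIDES` table are made up for the example.

```python
import mimetypes
import posixpath

# Explicit entries for the types that matter most to web apps, so the
# result does not depend on the host system's MIME database.
OVERRIDES = {
    ".html": "text/html",
    ".js": "text/javascript",
    ".mjs": "text/javascript",
    ".css": "text/css",
    ".wasm": "application/wasm",
}


def content_type_for(path: str) -> str:
    """Choose a Content-Type header value from the file extension."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in OVERRIDES:
        return OVERRIDES[ext]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"
```

For the path from point 1, `content_type_for("/wavacity.com/js/amplitude-8.1.0-min.js")` yields `text/javascript` rather than `image/png`.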
when crawled in |
3, 4 and 5 are solved. It now listens to […]. 2, the integrity check, may be worth an investigation.
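One possible direction for point 2 is to drop `integrity` attributes from saved HTML, so the browser no longer rejects a resource whose bytes differ from the published hash. The Python sketch below is only an assumption about how that could look, not something the tool is known to do; `strip_integrity` is a hypothetical helper, and a regex like this is cruder than a real HTML parser (it would also touch matching text outside of tags).

```python
import re

# An integrity attribute with a double-quoted, single-quoted, or bare value.
INTEGRITY_RE = re.compile(
    r"""\s+integrity\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_integrity(html: str) -> str:
    """Remove integrity="..." attributes so hash checks are skipped."""
    return INTEGRITY_RE.sub("", html)
```

Removing the attribute trades away the tamper check that Subresource Integrity provides, which seems acceptable for a local archive but is worth keeping in mind.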
So, there are websites that load stuff with JS.
Is it possible to save web apps with it, like photopea or hexed.it?