Hi,
I've noticed a bug on my end when updating the engine past version 7.2.4.
Version 7.2.4 works perfectly fine across our VLOP tracking declarations: https://code.europa.eu/dsa/terms-and-conditions-database/vlops-and-vloses/vlop-vlose-declarations.
When updating to version 8.0.0, it crashes/hangs, usually on the "Apple App Store" policies (sometimes to the point that even a Ctrl+C would not kill the engine, and one has to force kill it). Last log lines would be:
[...]
2025-11-12T12:04:12+00:00 info Amazon Store — Data Access for Vetted Researchers No changes after filtering, did not record version
2025-11-12T12:04:13+00:00 info Amazon Store — Marketplace Sellers Conditions Recorded version with id 13c5e016809d3c7a0c889ee02807e122918d8db0
2025-11-12T12:04:13+00:00 info Apple App Store — Apple Developer Agreement No changes after filtering, did not record version
2025-11-12T12:04:14+00:00 info Apple App Store — Claims of Infringement Recorded version with id c597b2c42b13315c91e187b0d6b93747b42da08f
2025-11-12T12:04:14+00:00 info Apple App Store — Developer Program License Agreement Recorded version with id 93bda281b8223a097d12c324f6342a51f70ebd6f
2025-11-12T12:04:15+00:00 info Apple App Store — Media Services Terms and Conditions No changes after filtering, did not record version
2025-11-12T12:04:15+00:00 info Apple App Store — Redress Rights No changes after filtering, did not record version
(The exact stopping point varies: sometimes it happens slightly before, sometimes slightly after, but always around the Apple App Store policies.)
I am using Node v20.19.5, which should be fully supported.
Tests done:
- Running on single service (including "Apple App Store") works fine.
- Updating to the latest version (9.2.0) still exhibits the same behavior.
- The Internet uplink should not be an issue here, and in any case a network problem should fail with a proper timeout/error/socket hang up rather than block the scraping process.
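To illustrate the behavior I would expect on the last point, here is a minimal sketch of a timeout guard. It is not taken from the engine: `withTimeout` is a hypothetical helper name, and the 100 ms delay is an arbitrary value for the demo. It only shows that a promise that never settles can be turned into an explicit error instead of a silent hang.

```javascript
// Hypothetical helper, NOT part of the engine's code or API.
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });

  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive once the work is done.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Demo: a promise that never settles stands in for a stuck fetch.
const neverSettles = new Promise(() => {});

withTimeout(neverSettles, 100, 'fetch')
  .catch(error => console.error(error.message));
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying work, so a real fix would also need to abort the stuck request or browser page.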
Steps to reproduce:
$ git clone https://github.com/OpenTermsArchive/engine && cd engine && git checkout v8.0.0
$ nvm use lts/iron
$ node -v
v20.19.5
$ npm -v
10.8.2
$ npm ci
npm warn deprecated [email protected]: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm warn deprecated [email protected]: This package is deprecated. Use the optional chaining (?.) operator instead.
npm warn deprecated [email protected]: This package is deprecated. Use require('node:util').isDeepStrictEqual instead.
npm warn deprecated @humanwhocodes/[email protected]: Use @eslint/config-array instead
npm warn deprecated [email protected]: Rimraf versions prior to v4 are no longer supported
npm warn deprecated @humanwhocodes/[email protected]: Use @eslint/object-schema instead
npm warn deprecated [email protected]: Glob versions prior to v9 are no longer supported
npm warn deprecated [email protected]: Glob versions prior to v9 are no longer supported
npm warn deprecated [email protected]: Glob versions prior to v9 are no longer supported
npm warn deprecated [email protected]: Please upgrade to latest, formidable@v2 or formidable@v3! Check these notes: https://bit.ly/2ZEqIau
npm warn deprecated [email protected]: The querystring API is considered Legacy. new code should use the URLSearchParams API instead.
npm warn deprecated [email protected]: Use your platform's native DOMException instead
npm warn deprecated [email protected]: no longer maintained
npm warn deprecated [email protected]: Please upgrade to superagent v10.2.2+, see release notes at https://github.com/forwardemail/superagent/releases/tag/v10.2.2 - maintenance is supported by Forward Email @ https://forwardemail.net
npm warn deprecated @accordproject/[email protected]: This version of the package is deprecated
npm warn deprecated @accordproject/[email protected]: Not maintained
npm warn deprecated [email protected]: This version is no longer supported. Please see https://eslint.org/version-support for other options.
npm warn deprecated [email protected]: Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.
added 946 packages in 10s
200 packages are looking for funding
run `npm fund` for details
$ npx ota track
[... - crash as above]
The last command is run with our own set of declarations (note that you need to run the ./build.sh bash script to generate the JSON declarations) and empty data/{snapshots,versions} folders.
I don't see anything specific about the terms where it is crashing. It could be PDF handling, but the terms served as PDF are not the ones that block and are always processed. It could be fullDom/htmlOnly, but both are already used in terms processed earlier. It could be a specific website or URL, but the failure point is semi-random and it only happens above a specific version of the engine.
Thanks,
Best