Open
Labels: bug (Something isn't working)
Description
Current Behavior
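Some dataset JSONs, including the ncov ones, are not minified. Compare `ncov_open_global_2m.json` (pretty-printed) with `zika.json` (minified):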
```
$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | head -n10 | cut -c 1-120
{
  "version": "v2",
  "meta": {
    "title": "Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 2 months",
    "updated": "2024-02-15",
    "build_url": "https://github.com/nextstrain/ncov",
    "data_provenance": [
      {
        "name": "GenBank",
        "url": "https://www.ncbi.nlm.nih.gov/genbank/"
$ curl -s --compressed https://data.nextstrain.org/zika.json | head -n10 | cut -c 1-120
{"version":"v2","meta":{"title":"Real-time tracking of Zika virus evolution","updated":"2024-02-05","build_url":"https:/
```
Minification would make a big difference in size:
```
$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | wc --bytes
33630950
$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | jq -c | wc --bytes
2841344
```
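The savings come from whitespace only: indentation plus the spaces after `:` and `,`. A minimal Python sketch of the same comparison, using a small made-up nested structure (`make_tree` is hypothetical and the real dataset is far larger, so the ratio will differ):

```python
import json


def make_tree(depth: int, breadth: int) -> dict:
    """Build a made-up nested node, loosely shaped like an Auspice tree node."""
    node = {
        "name": f"node_{depth}",
        "node_attrs": {"div": depth, "country": {"value": "X"}},
    }
    if depth > 0:
        node["children"] = [make_tree(depth - 1, breadth) for _ in range(breadth)]
    return node


data = {"version": "v2", "tree": make_tree(depth=6, breadth=2)}

pretty = json.dumps(data, indent=2)                # like the post-processing scripts
compact = json.dumps(data, separators=(",", ":"))  # roughly what `jq -c` emits

print(f"pretty: {len(pretty.encode())} bytes, compact: {len(compact.encode())} bytes")

# Both encodings decode to the same data; only insignificant whitespace differs.
assert json.loads(pretty) == json.loads(compact) == data
```

Indentation bytes grow with nesting depth, which is presumably why a deep tree like the ncov one is hit especially hard.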
We apparently never enabled the optional minification in `augur export v2` for production builds (an unfortunate oversight!). But even the automatic minification done by recent Augur versions is subverted by custom post-processing scripts that explicitly write unminified (pretty-printed) JSON. Oops.
```
$ g -F json.dump
scripts/add_labels.py
65:        json.dump(input_json, f, indent=2)
scripts/add_priorities_to_meta.py
44:        json.dump(input_json, fh, indent=2)
scripts/construct-recency-from-submission-date.py
44:        json.dump(node_data, fh)
scripts/developer_scripts/parse_mutational_fitness_tsv_into_distance_map.py
68:        json.dump(json_output, f, indent=2)
scripts/explicit_translation.py
75:        json.dump({"nodes":node_data, "annotations":annotations, "reference":root_sequence_translations}, fh)
scripts/fix-colorings.py
89:        json.dump(input_json, f, indent=2)
scripts/include_prefix.py
32:        json.dump(auspice_json, f, indent=2)
52:        json.dump(modified_tip_frequencies_json, f, indent=2)
workflow/snakemake_rules/export_for_nextstrain.smk
323:            json.dump(data, fh, indent=2)
487:        response = requests.post("https://slack.com/api/chat.postMessage", headers=headers, data=json.dumps(data))
```
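The text search above also matches calls that already write compact JSON. A structural alternative is to parse each script and flag only `json.dump`/`json.dumps` calls that pass a non-`None` `indent`. This is a hypothetical helper (`find_pretty_dumps` is not part of the repo); it only recognizes the `json.dump(...)` spelling and only scans `*.py` files, since Snakemake files like `export_for_nextstrain.smk` are not plain Python.

```python
import ast
import sys
from pathlib import Path


def find_pretty_dumps(source: str, filename: str = "<string>") -> list[tuple[str, int]]:
    """Return (filename, line) for json.dump/json.dumps calls with a non-None indent."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only matches `json.dump(...)` / `json.dumps(...)`, not `from json import dump`.
        if (
            isinstance(func, ast.Attribute)
            and func.attr in ("dump", "dumps")
            and isinstance(func.value, ast.Name)
            and func.value.id == "json"
        ):
            for kw in node.keywords:
                is_none = isinstance(kw.value, ast.Constant) and kw.value.value is None
                if kw.arg == "indent" and not is_none:
                    hits.append((filename, node.lineno))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical): python find_pretty_dumps.py scripts
    for root in sys.argv[1:]:
        for path in sorted(Path(root).rglob("*.py")):
            for fname, line in find_pretty_dumps(path.read_text(encoding="utf-8"), str(path)):
                print(f"{fname}:{line}")
```

Run against `scripts/`, this would skip the `construct-recency-from-submission-date.py` and `explicit_translation.py` callsites, which do not pass `indent` and therefore do not pretty-print.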
Expected behavior
All JSONs are minified.
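One way to keep this true is a check that fails on any JSON with whitespace between tokens. A minimal sketch, assuming the inputs are valid JSON; `is_minified` and `non_minified` are hypothetical helpers, and which files to pass (for example the `auspice/*.json` outputs) is an assumption about the workflow layout:

```python
from pathlib import Path


def is_minified(text: str) -> bool:
    """Return True if JSON text has no whitespace outside string literals.

    Assumes valid JSON input; trailing newlines at end of file are tolerated.
    """
    in_string = False
    escaped = False
    for ch in text.rstrip("\r\n"):
        if in_string:
            if escaped:
                escaped = False  # previous char was a backslash; this char is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in " \t\n\r":
            return False  # whitespace between tokens means the file is not minified
    return True


def non_minified(paths: list[str]) -> list[str]:
    """Return the subset of paths whose contents are not minified JSON."""
    return [p for p in paths if not is_minified(Path(p).read_text(encoding="utf-8"))]
```

Scanning characters rather than re-serializing avoids false failures from number formatting or `\uXXXX` escaping differences between writers.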
Possible solution
- Adjust `json.dump()` and `json.dumps()` callsites to respect `AUGUR_MINIFY_JSON` (or, alternatively, to always minify).
- Replace `json.dump()` and `json.dumps()` callsites with `augur.utils.write_json()`, which respects `AUGUR_MINIFY_JSON` and also minifies automatically based on output size… but we maybe probably kinda sorta should promote that to Augur's public API first.
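The first option could look like the sketch below: a small shared helper that the scripts call instead of `json.dump(..., indent=2)`. `dump_json` is hypothetical, and treating any non-empty `AUGUR_MINIFY_JSON` value as "on" is an assumption; Augur's own handling of the variable may differ.

```python
import json
import os
from typing import IO, Any


def dump_json(data: Any, fh: IO[str]) -> None:
    """Hypothetical replacement for `json.dump(data, fh, indent=2)` in post-processing scripts.

    Assumption: any non-empty AUGUR_MINIFY_JSON value requests minified output.
    """
    if os.environ.get("AUGUR_MINIFY_JSON"):
        json.dump(data, fh, separators=(",", ":"))  # no whitespace between tokens
    else:
        json.dump(data, fh, indent=2)  # current pretty-printed behavior
```

A callsite such as `json.dump(input_json, f, indent=2)` in `scripts/add_labels.py` would become `dump_json(input_json, f)`. For the "always minify" alternative, the `else` branch would simply be dropped.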
Additional context
@miparedes was having a heck of a time getting his custom builds (based on an older version of this repo) to minify.
victorlin