Dataset JSONs are not minified #1098

@tsibley

Current Behavior
Dataset JSONs are not minified.

$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | head -n10 | cut -c 1-120
{
  "version": "v2",
  "meta": {
    "title": "Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 2 months",
    "updated": "2024-02-15",
    "build_url": "https://github.com/nextstrain/ncov",
    "data_provenance": [
      {
        "name": "GenBank",
        "url": "https://www.ncbi.nlm.nih.gov/genbank/"

$ curl -s --compressed https://data.nextstrain.org/zika.json | head -n10 | cut -c 1-120
{"version":"v2","meta":{"title":"Real-time tracking of Zika virus evolution","updated":"2024-02-05","build_url":"https:/

Minification would make a big difference in size:

$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | wc --bytes
33630950

$ curl -s --compressed https://data.nextstrain.org/ncov_open_global_2m.json | jq -c | wc --bytes
2841344

We apparently never enabled the optional augur export v2 minification for production builds (an unfortunate oversight!). But even the automatic minification done by recent Augur versions is subverted by custom post-processing that explicitly outputs unminified (pretty-printed) JSON. Oops.
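To illustrate how a post-processing step can undo minification, here is a minimal sketch. The `dataset` dict is a made-up stand-in, not real Nextstrain data, and the byte counts it prints say nothing about the production files; it only shows that reloading a minified JSON and re-dumping it with `indent=2` brings the whitespace back.

```python
import json

# A small, made-up stand-in for a dataset JSON (not real Nextstrain data).
dataset = {
    "version": "v2",
    "meta": {"title": "example", "updated": "2024-02-15"},
    "tree": {
        "name": "root",
        "children": [
            {"name": f"tip_{i}", "node_attrs": {"div": i}} for i in range(100)
        ],
    },
}

# Minified output, as an upstream export step might produce it.
minified = json.dumps(dataset, separators=(",", ":"))

# A post-processing script that loads the JSON and writes it back with
# indent=2 reintroduces the newlines and indentation.
reprinted = json.dumps(json.loads(minified), indent=2)

print(len(minified), len(reprinted))
```

The content is identical after the round trip; only the whitespace (and therefore the size) differs.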

$ g -F json.dump
scripts/add_labels.py
65:        json.dump(input_json, f, indent=2)

scripts/add_priorities_to_meta.py
44:        json.dump(input_json, fh, indent=2)

scripts/construct-recency-from-submission-date.py
44:        json.dump(node_data, fh)

scripts/developer_scripts/parse_mutational_fitness_tsv_into_distance_map.py
68:        json.dump(json_output, f, indent=2)

scripts/explicit_translation.py
75:        json.dump({"nodes":node_data, "annotations":annotations, "reference":root_sequence_translations}, fh)

scripts/fix-colorings.py
89:        json.dump(input_json, f, indent=2)

scripts/include_prefix.py
32:        json.dump(auspice_json, f, indent=2)
52:        json.dump(modified_tip_frequencies_json, f, indent=2)

workflow/snakemake_rules/export_for_nextstrain.smk
323:            json.dump(data, fh, indent=2)
487:    response = requests.post("https://slack.com/api/chat.postMessage", headers=headers, data=json.dumps(data))

Expected behavior
All JSONs are minified.

Possible solution

  1. Adjust json.dump() and json.dumps() callsites to respect AUGUR_MINIFY_JSON (or alternatively to always minify)
  2. Replace json.dump() and json.dumps() callsites with augur.utils.write_json(), which not only respects AUGUR_MINIFY_JSON but also minifies automatically by size… but we maybe probably should promote that to Augur's public API first.
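Option 1 could look roughly like the sketch below. `dump_json` is a hypothetical helper name, not something that exists in this repo or in Augur, and treating any non-empty `AUGUR_MINIFY_JSON` value as "minify" is an assumption here; Augur's own interpretation of the variable should be checked before copying this. Note that dropping `indent=2` alone is not enough, since `json.dump()` defaults to `", "` and `": "` separators; compact separators have to be passed explicitly.

```python
import io
import json
import os


def dump_json(data, fh, minify=None):
    """Hypothetical helper: write JSON minified or pretty-printed.

    If `minify` is None, fall back to the AUGUR_MINIFY_JSON environment
    variable. ASSUMPTION: any non-empty value enables minification.
    """
    if minify is None:
        minify = bool(os.environ.get("AUGUR_MINIFY_JSON"))
    if minify:
        # Compact separators remove the spaces json.dump adds by default.
        json.dump(data, fh, indent=None, separators=(",", ":"))
    else:
        json.dump(data, fh, indent=2)


example = {"version": "v2", "meta": {"title": "example"}}

buf = io.StringIO()
dump_json(example, buf, minify=True)
minified = buf.getvalue()

buf = io.StringIO()
dump_json(example, buf, minify=False)
pretty = buf.getvalue()
```

Each existing `json.dump(input_json, f, indent=2)` callsite would then become `dump_json(input_json, f)`, leaving the choice to the environment.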

Additional context
@miparedes was having a heck of a time getting his custom builds (based on an older version of this repo) to minify.

Labels: bug (Something isn't working)
