Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,11 @@
### Bug fixes

* filter: Previously, `--query`, `--exclude-where`, and `--include-where` did not work for the id column (`strain`, `name`, or other from `--metadata-id-columns`). This has been fixed. [#1915][] (@corneliusroemer, @victorlin)
* export v2: Support export of URLs for non-string values. [#1926][] (@joverlee521)

[#1915]: https://github.com/nextstrain/augur/issues/1915
[#1917]: https://github.com/nextstrain/augur/pull/1917
[#1926]: https://github.com/nextstrain/augur/pull/1926

## 32.0.0 (21 October 2025)

Expand Down
13 changes: 6 additions & 7 deletions augur/export_v2.py
Original file line number Diff line number Diff line change
Expand Up @@ -772,7 +772,7 @@ def _transfer_additional_metadata_columns(node, raw_data):
for col in additional_metadata_columns:
if is_valid(value:=raw_data.get(col, None)):
node["node_attrs"][col] = {"value": value}
if type(value) is str and valid_url(url:=raw_data.get(url_name(col), None)):
if valid_url(url:=raw_data.get(url_name(col), None)):
node["node_attrs"][col]['url'] = url

def _transfer_vaccine_info(node, raw_data):
Expand Down Expand Up @@ -820,12 +820,11 @@ def _transfer_colorings_filters(node, raw_data):
for key in trait_keys:
value = raw_data.get(key, None)
if is_valid(value):
if is_numeric(value):
node["node_attrs"][key] = {"value": format_number(value)}
else:
node["node_attrs"][key] = {"value": value}
if valid_url(url:=raw_data.get(url_name(key), None)):
node["node_attrs"][key]['url'] = url
node["node_attrs"][key] = {"value": format_number(value) if is_numeric(value) else value}

if valid_url(url:=raw_data.get(url_name(key), None)):
node["node_attrs"][key]['url'] = url

node["node_attrs"][key].update(attr_confidence(node["name"], raw_data, key))

def _transfer_author_data(node):
Expand Down
60 changes: 21 additions & 39 deletions tests/functional/export_v2/cram/metadata-urls.t
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,13 @@ Setup
Create files for testing.

$ cat >metadata.tsv <<~~
> strain field_A field_A__url
> tipA nextstrain https://nextstrain.org
> tipB BB: not-a-url
> tipC github https://github.com
> tipD DD <not-a-url>
> tipE EE invalid-url
> tipF FF
> strain field_A field_A__url field_B field_B__url
> tipA nextstrain https://nextstrain.org 1 https://nextstrain.org
> tipB BB: not-a-url 2
> tipC github https://github.com 3 https://github.com
> tipD DD <not-a-url> 4
> tipE EE invalid-url 5 invalid-url
> tipF FF 6
> ~~

$ cat >tree.nwk <<~~
Expand All @@ -23,14 +23,9 @@ Check that URLs were extracted from metadata values when added as an "extra meta
$ ${AUGUR} export v2 \
> --tree tree.nwk \
> --metadata metadata.tsv \
> --metadata-columns "field_A" \
> --metadata-columns "field_A" "field_B" \
> --maintainers "Nextstrain Team" \
> --output dataset.json
Validating schema of 'dataset.json'...
Validation of 'dataset.json' succeeded.
Validating produced JSON
Validating that the JSON is internally consistent...

> --output dataset.json &> /dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests were failing because the order of the guessed type outputs were not guaranteed, causing the test to fail.¹

This stood out to me -- where does the stochasticity come from? (I've seen related issues where STDOUT/STDERR ordering was platform dependent, but this comment implies the ordering of (e.g.) field_A and field_B differs stochastically.)

This isn't blocking, but it seems weird...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was just tracking this down. This is due to node_data_names, which is a set created in parse_node_data_metadata.

We can guarantee order here if we sort the node_data_names:

diff --git a/augur/export_v2.py b/augur/export_v2.py
index 06cbf038fd..3fe320c455 100644
--- a/augur/export_v2.py
+++ b/augur/export_v2.py
@@ -1213,7 +1213,7 @@
             config=get_config_colorings_as_dict(config),
             command_line_colorings=args.color_by_metadata,
             metadata_names=metadata_names,
-            node_data_colorings=node_data_names,
+            node_data_colorings=sorted(node_data_names),
             provided_colors=read_colors(args.colors),
             node_attrs=node_attrs,
             branch_attrs=branch_attrs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting. That implies set ordering is different across different python versions / interpreters. (I realise order's not guaranteed, but still interesting to see it in practice.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, set order could be different per invocation, you can also see this with:

$ python -c "print(set('abcde'))"
{'a', 'b', 'e', 'c', 'd'}
$ python -c "print(set('abcde'))"
{'b', 'e', 'd', 'a', 'c'}


$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/dataset-with-parsed-urls.json" dataset.json \
> --exclude-paths "root['meta']['updated']"
Expand All @@ -42,15 +37,9 @@ Check that URLs were extracted from metadata values when used as a coloring
$ ${AUGUR} export v2 \
> --tree tree.nwk \
> --metadata metadata.tsv \
> --color-by-metadata "field_A" \
> --color-by-metadata "field_A" "field_B" \
> --maintainers "Nextstrain Team" \
> --output dataset2.json
Validating schema of 'dataset2.json'...
Validation of 'dataset2.json' succeeded.
Trait 'field_A' was guessed as being type 'categorical'. Use a 'config' file if you'd like to set this yourself.
Validating produced JSON
Validating that the JSON is internally consistent...

> --output dataset2.json &> /dev/null

$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/dataset-with-parsed-urls.json" dataset2.json \
> --exclude-paths "root['meta']['updated']" "root['meta']['colorings']" "root['meta']['filters']"
Expand All @@ -62,12 +51,12 @@ The data is essentially the same, but tipB & tipD have empty-string URLs and tip
$ cat >node-data.json <<~~
> {"nodes":
> {
> "tipA": {"field_A": "nextstrain", "field_A__url": "https://nextstrain.org"},
> "tipB": {"field_A": "BB: not-a-url", "field_A__url": ""},
> "tipC": {"field_A": "github", "field_A__url": "https://github.com"},
> "tipD": {"field_A": "DD <not-a-url>", "field_A__url": ""},
> "tipE": {"field_A": "EE", "field_A__url": "invalid-url"},
> "tipF": {"field_A": "FF"}
> "tipA": {"field_A": "nextstrain", "field_A__url": "https://nextstrain.org", "field_B": 1, "field_B__url": "https://nextstrain.org"},
> "tipB": {"field_A": "BB: not-a-url", "field_A__url": "", "field_B": 2, "field_B__url": ""},
> "tipC": {"field_A": "github", "field_A__url": "https://github.com", "field_B": 3, "field_B__url": "https://github.com"},
> "tipD": {"field_A": "DD <not-a-url>", "field_A__url": "", "field_B": 4, "field_B__url": ""},
> "tipE": {"field_A": "EE", "field_A__url": "invalid-url", "field_B": 5, "field_B__url": "invalid-url"},
> "tipF": {"field_A": "FF", "field_B": 6}
> }
> }
> ~~
Expand All @@ -76,15 +65,8 @@ The data is essentially the same, but tipB & tipD have empty-string URLs and tip
> --tree tree.nwk \
> --node-data node-data.json \
> --maintainers "Nextstrain Team" \
> --output dataset3.json
Validating schema of 'dataset3.json'...
Validation of 'dataset3.json' succeeded.
Trait 'field_A' was guessed as being type 'categorical'. Use a 'config' file if you'd like to set this yourself.
Validating produced JSON
Validating that the JSON is internally consistent...

> --output dataset3.json &> /dev/null


$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/dataset-with-parsed-urls.json" dataset.json \
> --exclude-paths "root['meta']['updated']"
{}
$ python3 "$TESTDIR/../../../../scripts/diff_jsons.py" "$TESTDIR/../data/dataset-with-parsed-urls.json" dataset3.json \
> --exclude-paths "root['meta']['updated']" "root['meta']['colorings']" "root['meta']['filters']"
{}
22 changes: 21 additions & 1 deletion tests/functional/export_v2/data/dataset-with-parsed-urls.json
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,10 @@
"field_A": {
"value": "nextstrain",
"url": "https://nextstrain.org"
},
"field_B": {
"value": 1,
"url": "https://nextstrain.org"
}
},
"branch_attrs": {}
Expand All @@ -44,6 +48,9 @@
"div": 3.0,
"field_A": {
"value": "BB: not-a-url"
},
"field_B": {
"value": 2
}
},
"branch_attrs": {}
Expand All @@ -55,6 +62,10 @@
"field_A": {
"value": "github",
"url": "https://github.com"
},
"field_B": {
"value": 3,
"url": "https://github.com"
}
},
"branch_attrs": {}
Expand All @@ -74,6 +85,9 @@
"div": 8.0,
"field_A": {
"value": "DD <not-a-url>"
},
"field_B": {
"value": 4
}
},
"branch_attrs": {}
Expand All @@ -84,6 +98,9 @@
"div": 9.0,
"field_A": {
"value": "EE"
},
"field_B": {
"value": 5
}
},
"branch_attrs": {}
Expand All @@ -94,6 +111,9 @@
"div": 6.0,
"field_A": {
"value": "FF"
},
"field_B": {
"value": 6
}
},
"branch_attrs": {}
Expand All @@ -102,4 +122,4 @@
}
]
}
}
}
Loading