Skip to content

Conversation

@victorlin
Copy link
Member

@victorlin victorlin commented Oct 24, 2025

Description of proposed changes

Previously, the only search path was implicitly the current working directory.

This PR adds another default – the config file's parent directory – and prefers it over cwd. Custom search paths are supported via command line option and environment variable.

Related issue(s)

Checklist

  • See measles PR for example usage and testing notes
  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

@codecov
Copy link

codecov bot commented Oct 25, 2025

Codecov Report

❌ Patch coverage is 86.30137% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.24%. Comparing base (e452789) to head (25c4448).

Files with missing lines Patch % Lines
augur/subsample.py 86.30% 7 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1918      +/-   ##
==========================================
+ Coverage   74.15%   74.24%   +0.08%     
==========================================
  Files          82       82              
  Lines        8986     9054      +68     
  Branches     1828     1848      +20     
==========================================
+ Hits         6664     6722      +58     
- Misses       2018     2025       +7     
- Partials      304      307       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@tsibley tsibley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty close to what I meant when I was describing the approach!

A couple small niggles noted.

}
],
"description": "File(s) with list of strains to exclude. Paths must be relative to the\nworking directory."
"description": "File(s) with list of strains to exclude. Paths must be relative to the\nconfig file location."
Copy link
Member

@tsibley tsibley Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc change (and the others below) seems incorrect and is also reverted a couple of commits later. Methinks devel/regenerate-subsample-schema needs to be re-run on this commit (and others)?

git rebase -i --exec './devel/regenerate-subsample-schema && git add augur/data/schema-subsample-config.json && git commit --fixup @ --allow-empty' master
git rebase -i --autosquash master

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good catch. "relative to the config file location" was from an old draft where I had swapped the only search path from cwd to config file parent. With the ability to configure this in augur subsample and the unchanged search path for augur filter as cwd, I figured it'd be best to remove the note entirely from the shared description as part of aa68ab4. I'll note this in the commit message.

Thanks for the handy command!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in force-push, and noted in commit message of 616bc41.


def _resolve_filepath_str(path: str, search_paths: List[Path]) -> str:
"""Resolve a string-based filepath by wrapping _resolve_filepath."""
return _resolve_filepath(Path(path), search_paths).as_posix()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return _resolve_filepath(Path(path), search_paths).as_posix()
return str(_resolve_filepath(Path(path), search_paths))

so that we don't assume POSIX paths? (Not that I think we support Augur on Windows, but…)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I didn't know that str() takes OS into account (doc). That's a good idea.

The string representation of a path is the raw filesystem path itself (in native form, e.g. with backslashes under Windows)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 One thing I meant to add but forget is: at the point of doing str(fn(…)), it seems not worth wrapping into a separate function at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the wrapper and switched to str() in various places:

augur/augur/subsample.py

Lines 381 to 386 in 25c4448

if isinstance(value, list):
config[key] = [str(_resolve_filepath(Path(v), search_paths)) for v in value]
filepaths.extend(config[key])
elif isinstance(value, str):
config[key] = str(_resolve_filepath(Path(value), search_paths))
filepaths.append(config[key])

augur/augur/subsample.py

Lines 473 to 474 in 25c4448

if not path.exists():
raise AugurError(f"File {str(path)!r} does not exist.")

augur/augur/subsample.py

Lines 484 to 487 in 25c4448

raise AugurError(dedent(f"""\
File {str(path)!r} not resolvable from any of the following paths:
{indented_list([str(p) for p in search_paths], ' ' + ' ')}"""))

Comment on lines 246 to 301
print_err(dedent(f"""\
WARNING: Both the environment variable AUGUR_SEARCH_PATHS
and the command line argument --search-paths are set.
Only the environment variable will be used."""))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think it's more conventional (though not universal) for command line options to take precedence over environment variables, i.e. swap the order here.

Conventional precedence from least to most goes: config file → env var → command line option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll swap the order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done:

augur/augur/subsample.py

Lines 327 to 336 in 25c4448

if from_cli:
if from_env:
print_err(dedent(f"""\
WARNING: Both the command line argument --search-paths
and the environment variable AUGUR_SEARCH_PATHS are set.
Only the command line argument will be used."""))
return [
*(Path(p) for p in from_cli),
*default,
]

Comment on lines +8 to +14
.. envvar:: AUGUR_SEARCH_PATHS

Colon-separated directory paths.
If set, ``augur subsample`` will search these directories for relative filepaths specified in the config file.
Multiple directories can be set by separating them with colons (e.g., ``/path/one:/path/two``).

This takes precedence over the ``--search-paths`` command line argument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

Were you thinking that we'd expand support for this env var (and option) to other commands too as needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I briefly noted this in "Additional notes / 2. Expanding to additional augur commands" in #1897.

It might be weird to customize the search path for files given directly as command line arguments, but it would be good to avoid resolving filepaths in both the workflow and Augur (a pattern which this PR enables).

Comment on lines +235 to +241
def get_referenced_files(
config_file: str,
config_section: Optional[List[str]] = None,
search_paths: Optional[List[str]] = None,
) -> Set[str]:
"""Get the files referenced in a subsample config file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

This seems likely to fall out of date with the behaviour of augur subsample since it's not used directly. It would be better if augur subsample's own code and this function shared more of a code path. Perhaps file paths should be resolved as an early transformation of config, rather than transformed one by one as Samples are constructed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll look into doing that in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing this with a function _resolve_filepaths(). It recursively walks through the config/schema and might be slightly hard to follow, but I think it's a decent way to share the same code path between augur subsample and get_referenced_files().

@victorlin victorlin force-pushed the victorlin/subsample-filepath-configs branch from d58a672 to 0e4f947 Compare October 30, 2025 23:11
victorlin and others added 8 commits October 30, 2025 20:59
This makes it easier to add/remove steps.
This will be used in a future commit.
Preparing to use the schema in another function.
Previously, the only search path was implicitly the current working
directory. Make this explicit and add an option to search other paths
before it.

This means the notes in option descriptions mentioning cwd as search
path are no longer accurate for augur subsample. Since the descriptions
are shared with augur filter which does not allow custom search paths, I
figured it'd be best to remove the notes entirely.
This allows more intuitive relative filepaths from the perspective of
the config file.
Environment variables can be easier to use than CLI options in some
situations.
This can be used by Snakemake workflows to declare input files for rules
that call augur subsample.
@victorlin victorlin force-pushed the victorlin/subsample-filepath-configs branch from 0e4f947 to 25c4448 Compare October 31, 2025 04:01
@victorlin victorlin requested a review from tsibley October 31, 2025 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

subsample: Improve relative filepath search

3 participants