Per-pathogen docs prototyping #90
base: james/storage
Code entirely by cursor.ai + some manual and prompt-based debugging. Main prompts:
---
I'm in @tutorial.rst and I want to be able to write a custom interpreted text role to do something like :configvalue:`phylogenetic/defaults/config.yaml:strain_id_field` and have it replaced by the corresponding value in `phylogenetic/defaults/config.yaml` of `strain_id_field` which is "accession". The exact syntax used can vary. This will require writing some custom python code to provide this functionality in sphinx.
---
That's working great. In @tutorial.rst I actually want to reference the yaml field `inputs[0].metadata` however this renders as `<config value 'inputs[0].metadata' not found in phylogenetic/defaults/config.yaml>`.
---
This is working great for adding the value inline. For some values, e.g. :configvalue:`ingest/defaults/config.yaml:ncbi_datasets_fields` the output is a list and would be better rendered as a YAML codeblock by itself rather than inline. Is this possible?
---
I hoped there was a better way than symlinking, but the robot told me there wasn't so here we are
This printed the reason for each executed rule but was already deprecated and always true in v7 [1] and has been removed entirely in later snakemake versions. The upgrade path is clear [2]: "Deprecated: Drop it and don't worry about anything"
[1] <https://snakemake.readthedocs.io/en/v7.22.0/executing/cli.html#output>
[2] <https://snakemake.readthedocs.io/en/stable/getting_started/migration.html>
Code entirely written by cursor.ai. Here's the initial prompt:
```
I'd like to add a custom sphinx extension in `docs/src/extensions` which should take a directive similar to:
:snakemake-dag:`ingest`
1. Run a command which will generate a visualisation in the dot language. In this case it'd be run from the top level `ingest` directory, and the command would be `snakemake --cores 1 -npf --forceall --dag | grep -v 'Building'`. The STDOUT is the graph viz in dot format. Remember you must use the `augur-dev-snakemake-v9` conda env to run any commands.
2. Take this dot code and use the native sphinx graphviz extension <@https://www.sphinx-doc.org/en/master/usage/extensions/graphviz.html > to render it in the docs page.
```
and lots of debugging prompts followed.
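Step 1 of that prompt (run snakemake, drop the log line, keep the dot source) could look something like the sketch below. It is an illustration, not the extension added in this PR: the names `SNAKEMAKE_DAG_COMMAND`, `strip_log_lines` and `snakemake_dag_dot` are hypothetical, and it assumes `snakemake` is already on `PATH` (for example because the conda env named in the prompt is active).

```python
import subprocess

# Command quoted from the prompt above, without the shell pipe to grep.
SNAKEMAKE_DAG_COMMAND = ["snakemake", "--cores", "1", "-npf", "--forceall", "--dag"]


def strip_log_lines(stdout):
    """Drop lines containing 'Building', mirroring `grep -v 'Building'`."""
    kept = [line for line in stdout.splitlines() if "Building" not in line]
    return "\n".join(kept)


def snakemake_dag_dot(workflow_dir):
    """Run snakemake inside `workflow_dir` and return the DAG as dot source."""
    result = subprocess.run(
        SNAKEMAKE_DAG_COMMAND,
        cwd=workflow_dir,      # e.g. the top-level `ingest` directory
        capture_output=True,   # the dot source arrives on stdout
        text=True,
        check=True,            # raise if snakemake exits non-zero
    )
    return strip_log_lines(result.stdout)
```

The returned string is what step 2 would hand to the `sphinx.ext.graphviz` extension for rendering. Filtering in Python rather than piping through `grep` avoids needing a shell, and `check=True` turns a failed snakemake run into a docs build error instead of an empty graph.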
Oops, I deleted
I looked over this at a high-level yesterday (i.e. without diving into the code much at all) and agree with the direction it goes in general. I think there's a few details I'd tweak and some things to still figure out, but overall: yeah, this is the direction I'd been imagining for a while.
Problem description
Ongoing, long-term work has made inroads into the ability to run (and install) pathogen workflows decoupled from the pathogen repo itself, associated with names such as "workflows-as-programs", "nextstrain run", "external analysis directories" etc. Simultaneously we're extending the ability for workflow invocations to customise their behaviour via overwriting default config files, config overlays, multiple inputs etc etc.
This presents us with a situation where we want people to run a workflow (e.g. zika), starting with an ~empty analysis directory, and without access to the pathogen repo source code (this repo). So, how are they to know what commands to run, and what configuration knobs are available?
There's been some back and forth about this concept. The most salient example I could find is in this avian-flu prototyping PR.
Goal: per-pathogen docs
This prototype explores encapsulating a set of HTML-based docs within the pathogen repo itself and surfacing them to the user via `nextstrain docs <pathogen>` and/or online via docs.nextstrain.org. These docs would briefly introduce the concept of running analyses from a working directory, include brief tutorials for adding in your own data etc, and fully document all the available configuration options alongside their defaults (essentially the API of the program).
If we are to think of workflows as programs, then this documentation can be thought of as man pages.
Keeping docs and code in sync
Keeping code and docs in the same repo helps keep things in sync and goes hand-in-hand with our plans to version pathogen repos. Our only attempt at this has been ncov. Since pathogen repos vendor shared code via nextstrain/shared, we can co-locate relevant documentation in that repo and workflows (e.g. zika) can bring it in alongside the code. For instance, we vendor snakemake code for path resolution and this PR adds a `.rst` docs file alongside that shared code which the zika docs use.
We can leverage sphinx extensions to help keep program state and docs in sync. This PR implements a couple of ideas (using cursor.ai for the python coding) and I think there's a lot more we can do.
- `configvalue` directive which allows us to reference config values in `.rst` docs and have them filled in at build time.
- `snakemake-dag` directive which will use snakemake to construct the DAG and use graphviz to render it in the docs. This all happens at docs build time.
How to test the docs in this PR
We don't have the nice `nextstrain docs` UI but this PR emulates it using commands like `nextstrain run zika docs .`
See the added docs/README.md for instructions on how to test it out. When we eventually bundle pathogens up into their own images then the built docs can be part of this, and before that time there are a number of other ways we can build the docs upon each workflow release. The content in the docs added here expands on many of the ideas introduced in this PR description.
Future directions
This is a draft PR as I don't expect it to be merged in its current state. I would, however, like these ideas to make it to production, and would encourage anyone to try them out.