Marine metagenomics platform notebooks (NBs) to get the science started. This work is part of the FAIR-EASE project, specifically Pilot 5 for metagenomics, which aims to provide as many tools as possible for EMO-BON data.
Please consider opening issues and PRs with your dream workflow suggestions. To a certain extent, I can be your worker until 31/8.
- Minimize dependencies to facilitate wide adoption and further development of the codebase.
- Simplicity over speed, although performance is still considered.
- Easy data import/export after UDAL queries (backend data queries developed by VLIZ).
- Combining the strengths of packages written in Python, R and Julia.
- API calls to other services, such as Galaxy.
Notebooks always generate a panel app for user-friendly interaction. However, working with the code directly, using the same methods as the app, should also be straightforward (this still needs to be confirmed by testers).
General statistics of EMO-BON sequencing efforts. The total number of sampling events recently exceeded 1000. Unfortunately, `leafmap` widgets have problems with `ngrok` tunnels, so only Binder integration is possible.
A barebones version is in `quality_control.ipynb`. The metaGOflow pipeline produces almost 60 output files. This dashboard provides an interface to the most relevant intermediate ones, i.e. all except the taxonomy and functional analyses.
This NB visualizes alpha and beta diversities from the metaGOflow analyses. It is located in `diversities_panel.ipynb`. Unfortunately, I have not yet resolved hosting the dashboard properly on Colab.
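The two kinds of diversity can be illustrated with two common indices: the Shannon index for alpha diversity, H = -Σ pᵢ ln pᵢ, and the Bray-Curtis dissimilarity for beta diversity. The plain-Python sketch below is not the notebook's implementation; which indices the dashboard actually shows, and the example counts, are assumptions for illustration. In practice, `scikit-bio` provides these metrics.

```python
import math
from collections.abc import Mapping


def shannon(counts: Mapping[str, int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i) over the union of taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical phylum-level counts for two samples
sample_1 = {"Proteobacteria": 50, "Bacteroidota": 30, "Cyanobacteria": 20}
sample_2 = {"Proteobacteria": 10, "Bacteroidota": 30, "Actinobacteriota": 60}
print(f"{shannon(sample_1):.3f}")  # 1.030
print(f"{bray_curtis(sample_1, sample_2):.3f}")  # 0.600
```

Logarithm bases differ between tools (natural log here), so Shannon values should only be compared when the base matches.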
- Request access to the hosted version in the Blue-Cloud 2026 (BC) Virtual Research Environment (VRE) here.
You will need an account on the Galaxy earth-system instance for these NBs to work. Your Galaxy access data should be stored as environment variables in the `.env` file at the root of the repository:
```
GALAXY_URL="https://earth-system.usegalaxy.eu/"
GALAXY_KEY="..."
```
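One way to turn those two variables into a Galaxy connection is sketched below. It is an illustration, not the notebooks' code: `load_env_file` and `galaxy_from_env` are hypothetical helpers, and the tiny `.env` parser is a stdlib stand-in for a package such as `python-dotenv`, kept minimal in the spirit of few dependencies. The connection itself uses bioblend's `GalaxyInstance(url=..., key=...)`.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY="value" lines from a .env file and export them to os.environ.

    Minimal stand-in for python-dotenv: skips blank lines and '#' comments,
    strips surrounding quotes, and does not override variables that are already set.
    Returns the parsed key/value pairs (empty if the file does not exist).
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def galaxy_from_env():
    """Create a bioblend GalaxyInstance from GALAXY_URL and GALAXY_KEY."""
    # Imported lazily so the parser above works even without bioblend installed.
    from bioblend.galaxy import GalaxyInstance

    return GalaxyInstance(url=os.environ["GALAXY_URL"], key=os.environ["GALAXY_KEY"])
```

In a notebook, calling `load_env_file()` followed by `gi = galaxy_from_env()` would give a client whose `gi.histories`, `gi.tools` and `gi.jobs` attributes expose the Galaxy API.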
BUG: For an unknown reason, the Binder dashboard does not work.
A dashboard illustrating the submission of jobs to Galaxy (GECCO tool) is in `bgc_run_gecco.ipynb`.
- Upload and run the workflow.
- Alternatively, start the workflow with existing data in an existing Galaxy history.
- Monitor the job.
- Upload local data or query GECCO results from Galaxy.
- Identifying Biosynthetic Gene Clusters (BGCs).
- Visualize BGCs.
- Compare two samples with respect to each other.
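Monitoring a Galaxy job essentially means polling its state until it reaches a terminal value. The sketch below keeps the polling loop independent of Galaxy so it can be tested offline; `wait_for_state` and `galaxy_job_state` are hypothetical helper names, and the set of terminal states and the timing defaults are assumptions. bioblend also offers `gi.jobs.wait_for_job`, which may be preferable in practice.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],
    terminal: Iterable[str] = ("ok", "error", "failed", "deleted"),  # assumed terminal states
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns a terminal state; raise TimeoutError after `timeout` seconds."""
    terminal_states = set(terminal)
    waited = 0.0
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still '{state}' after {waited:.0f} s")
        sleep(interval)
        waited += interval


def galaxy_job_state(gi, job_id: str) -> Callable[[], str]:
    """Return a getter that reads a job's state via bioblend's gi.jobs.show_job."""
    return lambda: gi.jobs.show_job(job_id)["state"]
```

With bioblend, the job ID could come from `gi.tools.run_tool(history_id, tool_id, tool_inputs)["jobs"][0]["id"]`; the GECCO tool ID and its input names must be looked up on the Galaxy instance and are not shown here.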
Dependencies are not yet fixed.
The examples are heavily inspired by, and partly taken from, the MGnify project itself.
- How to query data from the MGnify database and make basic plots, such as Sankey diagrams: `wf5_MGnify/query_data.ipynb`
- Protein families comparison?
- Demonstrate the usage of some relevant R and Julia packages and workflows. Q: Can it be done in a single NB? It should be!
- By then, BC 2026 might have GPU support.
- Regardless, perhaps try AI4EOSC? Q: I have not seen much, if any, metagenomics there, though.
- Correlate with Essential Ocean Variables (EOVs)
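For the MGnify Sankey plots mentioned above, the core data step is turning taxonomic lineages into (source, target, value) links between consecutive ranks. The sketch below is illustrative only: `lineage_links` is a hypothetical helper, and the semicolon-separated `sk__...;k__...;p__...` lineage format and the counts are assumptions modeled loosely on MGnify-style taxonomy tables. Keeping the rank prefix in node names keeps nodes at different ranks distinct, which Sankey layouts need.

```python
from collections import Counter
from typing import Optional


def lineage_links(
    lineages: dict[str, int], max_depth: Optional[int] = None
) -> list[tuple[str, str, int]]:
    """Aggregate (source, target, count) links between consecutive named ranks.

    Empty ranks such as 'k__' are skipped, so flows connect named taxa only.
    `max_depth` limits how many named ranks of each lineage are used.
    """
    links = Counter()
    for lineage, count in lineages.items():
        ranks = [r.strip() for r in lineage.split(";")]
        # keep only ranks that carry a name after the 'x__' prefix
        named = [r for r in ranks if r and not r.endswith("__")]
        if max_depth is not None:
            named = named[:max_depth]
        for source, target in zip(named, named[1:]):
            links[(source, target)] += count
    return [(s, t, v) for (s, t), v in sorted(links.items())]


# Hypothetical read counts per lineage
counts = {
    "sk__Bacteria;k__;p__Proteobacteria;c__Gammaproteobacteria": 40,
    "sk__Bacteria;k__;p__Proteobacteria;c__Alphaproteobacteria": 25,
    "sk__Bacteria;k__;p__Bacteroidota": 15,
}
for link in lineage_links(counts):
    print(link)
```

A list of such tuples can be handed to a Sankey element, for example HoloViews' `hv.Sankey`, which fits the planned move to `holoviz`.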
(This is probably WF0.) Provides a summary of up-to-date statistics on the amounts of sequenced and processed data.
- Currently `venv` is enough; there is no need to set up `conda`, meaning that the dependencies are pure Python.
- Utility functionality is developed in parallel in the marine-omics-methods repo. It is currently not distributed via PyPI; install it with `pip install git+https://github.com/emo-bon/marine-omics-methods.git`.
- Dashboards are developed in `panel`.
- If you put the NB code in a script, you can serve the dashboard in the browser using `panel serve app.py --dev`.
- You can, however, serve the NB as well: `panel serve app.ipynb --dev`.
- Note: if you want to run on Google Colab, you will need `pyngrok` and an `ngrok` token from here.
- Binder integration is better for running dashboards, but loading the repo might take a long time or crash, so Google Colab is the better option in that case.
- For statistics, we use `pingouin` and `scikit-bio`.
- The data part is handled by `pandas`, `numpy`, etc. This might be upgraded to `polars`/`fireducks`.
- Galaxy support is built upon `bioblend`.
- Visualizations are currently not interactive and are developed in `seaborn` or `matplotlib`. This will likely change to `holoviz`.
- Interactive parts use `jupyterlab`.
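To illustrate the script-based workflow, here is a minimal `app.py` sketch that separates a pure data function from the Panel wiring, so the same method can be called from a notebook or served with `panel serve app.py --dev`. The records, `count_events`, `summary_markdown` and `build_app` are hypothetical; only standard Panel calls (`pn.extension`, `pn.widgets.Select`, `pn.bind`, `pn.Column`, `.servable()`) are used. The `__name__.startswith("bokeh")` guard relies on the Bokeh server convention of running served scripts under a module name beginning with `bokeh`.

```python
from collections import Counter

# Hypothetical sampling-event records; a real app would load these from a UDAL query.
RECORDS = [
    {"observatory": "OBS_A", "type": "water"},
    {"observatory": "OBS_A", "type": "sediment"},
    {"observatory": "OBS_B", "type": "water"},
]


def count_events(records, sample_type: str) -> dict[str, int]:
    """Count sampling events per observatory for one sample type (usable directly in a notebook)."""
    return dict(Counter(r["observatory"] for r in records if r["type"] == sample_type))


def summary_markdown(sample_type: str) -> str:
    """Render the counts as a small Markdown list."""
    counts = count_events(RECORDS, sample_type)
    lines = [f"- {obs}: {n}" for obs, n in sorted(counts.items())]
    return f"### {sample_type} events\n" + ("\n".join(lines) or "No events")


def build_app():
    # Imported lazily so the data functions above work without Panel installed.
    import panel as pn

    pn.extension()
    select = pn.widgets.Select(name="Sample type", options=["water", "sediment"])
    # pn.bind re-runs summary_markdown whenever the selection changes.
    return pn.Column(select, pn.bind(summary_markdown, select))


# `panel serve app.py` executes this file under a module name starting with "bokeh".
if __name__.startswith("bokeh"):
    build_app().servable()
```

Because the dashboard only calls `count_events`, testers can check the app's logic from a notebook without starting a server.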