Replies: 18 comments 1 reply
-
Hi @wikiselev, good point. The issue is a mix of a couple of factors.
I'd suggest that instead of relying upon the [...]
We'll definitely look further into this; for the time being, I'm outlining the steps you can follow to integrate your fileshare-based data.
Steps to set up Azure File Share via Nextflow CLI
I'm documenting here the steps which you can rely upon.
Create a new Azure Batch pool
ASSUMPTIONS: I'm assuming that the Azure Fileshare (e.g. [...]
I'm also assuming that the environment variables used in the config below are prepopulated in the environment. For example
profiles {
    azb {
        // Publish results to a blob container
        params.outdir = "az://$AZURE_RNASEQNF_OUTPUT_STORAGE_CONTAINER_NAME/rnaseq-nf-publishdir"
        process {
            container = 'quay.io/nextflow/rnaseq-nf:v1.1'
        }
        process.executor = 'azurebatch'
        workDir = "az://$AZURE_RNASEQNF_WORK_STORAGE_CONTAINER_NAME/rnaseq-nf-workdir"
        azure {
            batch {
                location = "$AZURE_BATCH_LOCATION"
                accountName = "$AZURE_BATCH_ACCOUNT_NAME"
                accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
                // Keep the pool after the run so it can be connected to Tower
                deletePoolsOnCompletion = false
                deleteJobsOnCompletion = true
                autoPoolMode = true
                pools {
                    auto {
                        autoScale = true
                    }
                }
            }
            storage {
                accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
                accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
                fileShares {
                    // Name of the Azure File Share to mount on the pool nodes
                    'nextflow-fs' {
                        mountPath = '/mnt/'
                    }
                }
            }
        }
    }
}
This step creates the pool with the relevant settings for mounting the fileshare. You can confirm this by logging into the (assumed Ubuntu) node.
Connect this pool to Tower
Launch a pipeline to consume data from Azure Fileshare
transcriptome: "/mnt/batch/tasks/fsmounts/nextflow-fs/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
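To avoid repeating the mount location for every input, the path above can be split into a mount root and a file name. This is only a minimal sketch, assuming the share is visible at the same `/mnt/batch/tasks/fsmounts/nextflow-fs` directory used above; the `fs_root` parameter name is hypothetical and is not something the pipeline itself defines.

```groovy
// nextflow.config fragment (sketch, not the pipeline's own config)
params {
    // Hypothetical helper parameter: directory where the 'nextflow-fs'
    // share appears on the Azure Batch node (taken from the path above)
    fs_root = '/mnt/batch/tasks/fsmounts/nextflow-fs'

    // Input file resolved relative to the mounted share
    transcriptome = "${params.fs_root}/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
}
```

If the share ends up mounted somewhere else on your nodes, only `fs_root` would need to change.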
-
This looks cool, @abhi18av ! Many thanks for your detailed answer! I have a question - when you say
At the moment when I do everything except uploading files to Azure Fileshare I get the following error:
And this is the
-
@wikiselev , I meant that the Azure Fileshare resource should be present within your Storage Account as outlined here https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-portal Once the resource is there, you can accommodate that within your configs/params upon launching from Tower.
You don't necessarily need to, but I think that's the usual use case. In the above comment, as a quick example, I have uploaded the transcriptome file (manually via Azure portal) and then sourced it during the execution of a pipeline.
Mmm, this is interesting - I have to say it's a head-scratcher for me 🤔 When comparing to logs from my own execution, you should see something like
If the execution reached this far, I think the credentials are correctly configured. I'd suggest that you double-check the name of the blob container (inside your storage account) which you intend to use, i.e. do you have a blob container called
-
Thanks, @abhi18av ! That's useful. I've checked all of the credentials and I do have all of the storage blobs/file shares created: I am gonna keep trying, but if you have any other ideas, please let me know. Would you be able to provide a json/parameters that Nextflow is using to create a pool with a mounted storage? I have a python script that I can use to create pools (I've assembled it from Azure documentation and from looking at JSON of pools created by Tower), i.e.:
I've created a pool using this script and then manually added it in Tower and tested the pipeline. However, when I checked the nodes in that pool there was an error during mounting. Maybe you know what's wrong with
-
I should mention that I've used the same script to mount a blob container (substituted
-
Ok, as usual (!), I've updated Nextflow and it all worked:
However, it got stuck because the default machine type is [...]
I've checked the created pool configuration on Azure and saw this:
I'll try to use my python script to create a pool with mounted file share using this configuration now. Looks like the
-
Hi @wikiselev , Happy to see this coming along!
For this, you can rely upon the named pool specific settings here https://www.nextflow.io/docs/latest/azure.html#named-pools
Yup, if you rely upon Nextflow to create the pool with fileshare mounts, these settings are applied already (optionally customizable)
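As a rough illustration of the named-pool approach linked above, a config along these lines lets you pick the machine type yourself instead of taking the auto-pool default. It is a sketch under assumptions: the pool name `fs-pool`, the VM size and the VM counts are placeholders chosen for the example, and it assumes Nextflow creates the pool (and therefore applies the file share mount) when `allowPoolCreation` is enabled.

```groovy
// nextflow.config fragment (sketch; pool name and sizes are hypothetical)
process {
    executor = 'azurebatch'
    // Send tasks to the named pool defined below
    queue = 'fs-pool'
}

azure {
    batch {
        location = "$AZURE_BATCH_LOCATION"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
        // Let Nextflow create the pool if it does not exist yet
        allowPoolCreation = true
        pools {
            'fs-pool' {
                vmType = 'Standard_D4_v3'   // example size, pick one available in your region
                vmCount = 1
                autoScale = true
                maxVmCount = 4
            }
        }
    }
    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
        accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
        fileShares {
            // Same share and mount path as in the config earlier in the thread
            'nextflow-fs' {
                mountPath = '/mnt/'
            }
        }
    }
}
```

The pool created this way can then be added to Tower manually, as in the steps described earlier in the thread.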
-
Amazing, thanks so much, @abhi18av ! And I am glad @pditommaso is so happy )) So, I think mounting works 100% and the pipeline starts well from Tower. However, it now fails at
I think
-
But... Maybe I am wrong and there should be symlinks to the index files in the current directory... I just noticed that I used
-
Well, that didn't work... So, it looks like when one creates a pool from running Nextflow in a command line it uses
-
Another piece of info - when I mounted a blob container instead of a file share before (see python script above), the pipeline failed on the same process, however, the error message was different:
Looks like in this case
-
I think that at this point, the issue might have to do with the sample quality itself, since the logs contain [...]
Perhaps it's best to ask in the nf-core Slack.
@pditommaso, I'm thinking that this could also be a discussion rather than an issue?
-
I agree, it could be. But I've also checked the mounting by running a different pipeline -
In my experience and according to Harshil this is usually related to incomplete download of the genome file, which is strange as I synchronised my file share with the blob container (containing genome references) using
-
@abhi18av would you be able to test the toy
-
Another question - do you know anyone who is using file share mounting on Azure successfully? Maybe they can provide some advice.
-
Actually, I've used it previously, but I'm not sure about other folks who are relying upon AzureFS at a larger scale. I'm also iterating on various Azure-based configs to set up automated nf-core megatests for all pipelines on Azure. This thread does highlight that we need to improve the docs about the FileShare a bit more - I'll be happy to chat on the Nextflow Slack (username [...]
Yup, I can confirm it works when you launch it from tower.nf (after setting up a nextflow-cli generated pool, which I referred to here #3480 (comment) ). I know this is a bit roundabout, but until Azure FS support is rolled out in Tower Forge, we can only rely upon a manual compute-env setup and then launch the pipeline from Tower itself.
The underlying detail is that when you launch a job using Tower, the head job is launched within the pool and therefore already has access to the [...]
However, when you launch the job using the baseline Nextflow CLI (assuming, from your own workstation) then the [...]
NOTE: You could also mount the Azure FS on your own workstation, but I think the networking within the Azure world would just be faster, so an Azure VM based launch might be better.
-
Many thanks again for your help, @abhi18av ! So, after appropriate testing I can confirm/conclude the following:
Note that mounted files are all accessible within the compute node. Note that exactly the same pipeline finished OK if no file share is mounted and references are downloaded from a blob container within each Nextflow process. Note that my file share is synchronised with the blob container using
-
Bug report
I am using Tower+Azure and trying to mount a file share with igenomes to compute nodes. I’ve followed these instructions and added the following extra config in the Tower form before launching my pipeline:
My igenomes file share is under the same storage account as my blob storage and I was able to mount it manually to another Azure instance.
Expected behavior and actual behavior
Expected: the pipeline is expected to start and run normally.
Actual: When I run the pipeline it fails in the very beginning with the following message:
Steps to reproduce the problem
Program output
Environment
(Run on Tower.nf)
$SHELL --version
)