Add working with files side quest #601
base: intermediate_training
Adds a side quest for working with files, vaguely based on the Metadata propagation section of the advanced training.

Story:

- Create `file` object from string
- Look at `file` object attributes
- Extract sample metadata from filename
- Use `Channel.fromPath` to create a channel of files
- Extract sample metadata from filename within map operator
- Use `Channel.fromFilePairs` to create a channel of file pairs
- Use `publishDir` to save results

Should help introduce the concept of handling files better.

Problems:

- Doesn't ram home the "always use files as inputs!!!" message. We could do that in the final section on processes?
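For readers skimming the thread, the first half of that story can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the relative `data/` path is an assumption, and the underscore-separated naming scheme is inferred from the file names quoted later in the review.

```groovy
workflow {
    // file() turns a path string into a Path object.
    // The relative path below is an assumption for illustration.
    def myFile = file('data/sampleA_rep1_normal_R1_001.fastq.gz')

    // Inspect a few attributes of the file object
    println "name:       ${myFile.name}"
    println "simpleName: ${myFile.simpleName}"
    println "extension:  ${myFile.extension}"
    println "parent:     ${myFile.parent}"

    // Split the file name on underscores to recover sample metadata
    // (extra tokens beyond the four declared variables are ignored)
    def (id, replicate, type, readNum) = myFile.simpleName.tokenize('_')
    println "id=${id} replicate=${replicate} type=${type} readNum=${readNum}"

    // The same extraction applied to every file matched by a glob,
    // emitting [ list of tokens, file ] for each item
    Channel.fromPath('data/*.fastq.gz')
        .map { f -> [ f.simpleName.tokenize('_'), f ] }
        .view()
}
```

The `map` step keeps the file as its own element next to the extracted tokens, which is the shape discussed further down in the review.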
- needs to be added to the "Side Quests" Side bar & "Menu of side quests"
- IIRC in the other trainings we started to indent the code blocks with the amount that is needed in the code so we don't have to do this when copying the code blocks.
reviewed until approximately the end of section 3.
I am mostly just thinking out loud here:
I am wondering how this will mix with any metadata module. I think at one point we discussed pulling the section out of the nf-core module and having it separate. It feels too small to be its own side quest and I think it could fit in here quite well. But I also wouldn't want to overcomplicate this module with extra things about hashmaps and keys.
If the flattening of the metadata is not hugely important, we could replace that with a map and collapse the content?
Wait, we have a problem. We have 2 replicates for sampleA, but only 1 output file! We are overwriting the output file each time.

### 5.4. Make the published files unique
I suppose this is to use something from the meta info as an example of how it can be used in the workflow. Should we do something similar to the nf-core module, or change the nf-core module (we used `branch` to change the execution path)? Just to revisit the same concepts again.
That's not a bad idea. We could use `filter` (from splitting and grouping) instead of `branch` and use a closure with a filename to unique-ify the published file in config.
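A rough sketch of that suggestion, under some assumptions: the process name `COUNT_LINES`, the `results` directory and the `data/` glob are hypothetical, and the closure is shown as a process directive rather than in config to keep the example in one file. The `saveAs` closure receives the published file name and can use the task's `meta` input to build a unique name.

```groovy
// Hypothetical process: every task writes the same file name, so publishing
// without saveAs would overwrite earlier results.
process COUNT_LINES {
    publishDir 'results',
        mode: 'copy',
        // Prefix the published name with metadata so each task's file is unique
        saveAs: { filename -> "${meta.id}_${meta.replicate}_${meta.readNum}_${filename}" }

    input:
    tuple val(meta), path(fastq)

    output:
    tuple val(meta), path('counts.txt')

    script:
    """
    gzip -dc ${fastq} | wc -l > counts.txt
    """
}

workflow {
    ch = Channel.fromPath('data/*.fastq.gz')
        .map { f ->
            // Build a metadata map from the underscore-separated file name
            def (id, replicate, type, readNum) = f.simpleName.tokenize('_')
            [ [id: id, replicate: replicate.replace('rep', ''), type: type, readNum: readNum], f ]
        }
        // filter keeps only the items whose metadata matches,
        // where branch would instead split the channel into several outputs
        .filter { meta, f -> meta.type == 'normal' }

    COUNT_LINES(ch)
}
```

Including `readNum` in the published name matters here because R1 and R2 of the same replicate would otherwise still collide.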
---

## Summary
The indents of the learnings render weirdly.
Copying discussion from Slack here, so it won't get lost: How about we rename the "Working with Files" chapter to "Working with Input Data" and cover both files and metadata explicitly? To bring things close to nf-core & bridge the gap to grouping/splitting, we could add a section in 3.5 on how to use a map instead of a list.
Co-authored-by: Friederike Hanssen
I've added a metamap here: 6998ddf but I'm not sure about it, we might be introducing too much too early?
@FriederikeHanssen I've addressed most of your comments now - take a second look.
Co-authored-by: Maxime U Garcia
Love it!
- Schedule wise, we should probably run this before nf4-science, since we use the file() objects there already.
Launching `main.nf` [infallible_swartz] DSL2 - revision: 7f4e68c0cb

[[id:sampleA, replicate:1, type:normal, readNum:R2], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R2_001.fastq.gz]
[[id:sampleA, replicate:1, type:normal, readNum:R1], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R1_001.fastq.gz]
I like this addition. The only thing that could potentially be confusing, I think, is that in 3.2 (Extracting Metadata from Filenames) we already had a map and a file, but then flattened it:
Therefore, it's easier if the input channel is flat instead of the nested structure we have here.
Maybe we need some justification for why the file is not part of the map.
In 3.2 it's a list (array) and a file, do you think that's different enough to be obvious?
[[sampleA, rep1, normal, R1, 001], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R1_001.fastq.gz]
[[sampleA, rep1, normal, R2, 001], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R2_001.fastq.gz]
As for why the file isn't in the map - good point. I don't have a great answer for that other than 'because Nextflow says so'...hmmm not sure what to do.
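One possible justification, offered as a sketch rather than as the training's official answer: a process can declare its input as `tuple val(meta), path(reads)`, so the file is staged into the task directory via `path` while the map travels alongside as a plain value. A path tucked inside the `val` map would not get that staging. The process name `SHOW_INPUT` and the `data/` glob below are hypothetical.

```groovy
// Hypothetical process illustrating the [meta, file] convention
process SHOW_INPUT {
    input:
    // meta is passed as a plain value; reads is staged into the task directory
    tuple val(meta), path(reads)

    output:
    stdout

    script:
    """
    echo "sample ${meta.id}, replicate ${meta.replicate}"
    ls -l ${reads}
    """
}

workflow {
    ch = Channel.fromPath('data/*.fastq.gz')
        .map { f ->
            // Keep the file as its own tuple element next to the metadata map
            def (id, replicate, type, readNum) = f.simpleName.tokenize('_')
            [ [id: id, replicate: replicate.replace('rep', ''), type: type, readNum: readNum], f ]
        }

    SHOW_INPUT(ch).view()
}
```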
Launching `main.nf` [prickly_stonebraker] DSL2 - revision: f62ab10a3f

[[id:sampleA, replicate:1, type:normal, readNum:R], [/workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R1_001.fastq.gz, /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R2_001.fastq.gz]]
I get the same results, but do we want `readNum:R`? I would have expected 1 or 2.
Ahhhhhhhhhhh `fromFilePairs` strips the contents of the `{}`... hmm, we might need to rethink this.
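One way around this, sketched under assumptions (the glob pattern and `data/` path are illustrative, and the exact grouping key depends on the pattern used): because each emitted item already holds both reads, the read number can simply be left out of the metadata map, and only the sample-level fields are taken from the grouping key.

```groovy
workflow {
    // fromFilePairs emits [ grouping key, [ file1, file2 ] ] for each pair.
    // The pattern below is an assumption based on the file names in this thread.
    Channel.fromFilePairs('data/*_R{1,2}_001.fastq.gz')
        .map { key, reads ->
            // Take only the sample-level fields from the key; any trailing
            // tokens left over from the read-number part are ignored
            def (id, replicate, type) = key.tokenize('_')
            [ [id: id, replicate: replicate.replace('rep', ''), type: type], reads ]
        }
        .view()
}
```

If a per-read label is still wanted downstream, it could be recovered from each file's own name rather than from the shared key.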
!!! note

    We are calling our map '`meta`'. This is the first introduction of a concept called `metamap` which we will cover later!
I am not sure what we want to cover beyond this tbh. The next step would maybe be adding fields via nf-schema, but I think that would be a training focused on input validation
Notes
FASTQ files generated with:
Where samples.txt: