Add working with files side quest #601

adamrtalbot · 2025-04-11T15:26:28Z

Adds a side quest for working with files, loosely based on the Metadata propagation section of the advanced training.

Story:

- Create `file` object from string
- Look at `file` object attributes
- Extract sample metadata from filename
- Use `Channel.fromPath` to create a channel of files
- Extract sample metadata from filename within map operator
- Use `Channel.fromFilePairs` to create a channel of file pairs
- Use `publishDir` to save results

This should help introduce the concept of handling files better.

Problems:

- Doesn't ram home the "always use files as inputs!!!" message. We could do that in the final section on processes?

Notes

FASTQ files generated with:

while IFS= read -r name; do
    # Pick a random record count between 1 and 10
    random_num=$((1 + RANDOM % 10))
    # Write an R1/R2 FASTQ pair with that many reads of length 10
    fq generate -n "${random_num}" --read-length 10 "data/${name}_R1_001.fastq.gz" "data/${name}_R2_001.fastq.gz"
done < scripts/samples.txt

Where samples.txt contains:

sampleA_rep1_normal
sampleA_rep1_tumor
sampleA_rep2_normal
sampleA_rep2_tumor
sampleB_rep1_normal
sampleB_rep1_tumor
sampleC_rep1_normal
sampleC_rep1_tumor
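
The "extract sample metadata from filename" step in the story relies on these names following an `id_replicate_type` pattern, with `_R1_001.fastq.gz` or `_R2_001.fastq.gz` appended by the generation loop. As a rough illustration outside Nextflow, the same split can be sketched in Bash. `parse_fastq_name` is a hypothetical helper written for this note, not something from the training material.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a FASTQ filename of the form
# <id>_<replicate>_<type>_<read>_<suffix>.fastq.gz into its fields.
parse_fastq_name() {
    local filename base id replicate type read_num rest
    filename=$(basename "$1")
    base=${filename%.fastq.gz}      # drop the extension
    IFS=_ read -r id replicate type read_num rest <<< "$base"
    replicate=${replicate#rep}      # "rep1" -> "1"
    printf 'id=%s replicate=%s type=%s readNum=%s\n' \
        "$id" "$replicate" "$type" "$read_num"
}

parse_fastq_name data/sampleA_rep1_normal_R1_001.fastq.gz
# prints: id=sampleA replicate=1 type=normal readNum=R1
```

The field names mirror the `id`, `replicate`, `type` and `readNum` keys that appear in the channel output quoted later in this thread.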

netlify · 2025-04-11T15:26:45Z

✅ Deploy Preview for nextflow-training ready!

Name	Link
🔨 Latest commit	`de22980`
🔍 Latest deploy log	https://app.netlify.com/sites/nextflow-training/deploys/6819fd3073f79600084979da
😎 Deploy Preview	https://deploy-preview-601--nextflow-training.netlify.app

docs/side_quests/working_with_files.md

FriederikeHanssen

This needs to be added to the "Side Quests" sidebar and the "Menu of side quests".
IIRC, in the other trainings we started indenting the code blocks by the amount needed in the code, so nobody has to re-indent after copying them.

Reviewed until approximately the end of section 3.

docs/side_quests/working_with_files.md

FriederikeHanssen

I am mostly just thinking out loud here:
I am wondering how this will mix with any metadata module. I think at one point we discussed pulling the metadata section out of the nf-core module and having it separate. It feels too small to be its own side quest, and I think it could fit in here quite well. But I also wouldn't want to clutter this module with extra things about hashmaps and keys.

If the flattening of the metadata is not hugely important, we could replace that with a map and collapse the content?

docs/side_quests/working_with_files.md

FriederikeHanssen · 2025-04-14T14:12:24Z

docs/side_quests/working_with_files.md

+
+Wait, we have a problem. We have 2 replicates for sampleA, but only 1 output file! We are overwriting the output file each time.
+
+### 5.4. Make the published files unique


I suppose this is to use something from the meta info as an example of how it can be used in the workflow. Should we do something similar to the nf-core module, or change the nf-core module (we used branch to change the execution path)? Just to revisit the same concepts again.

That's not a bad idea. We could use filter (from splitting and grouping) instead of branch, and use a closure with a filename in the config to make the published file unique.
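
To make the overwriting problem concrete, here is a minimal Bash sketch that only simulates publishing into a temporary directory. It does not use Nextflow or `publishDir`, and the output filenames are made up for illustration. It shows that naming results by sample id alone collapses two replicates into one file, while adding the replicate to the name keeps both.

```shell
#!/usr/bin/env bash
# Illustrative only: simulate publishing one result per replicate.
publish_dir=$(mktemp -d)

# Naming by sample id alone: both replicates write to the same file.
for rep in 1 2; do
    echo "result for sampleA rep${rep}" > "${publish_dir}/sampleA.txt"
done
files=("${publish_dir}"/*)
count_by_id=${#files[@]}

# Naming by id plus replicate: each replicate gets its own file.
rm -f "${publish_dir}"/*
for rep in 1 2; do
    echo "result for sampleA rep${rep}" > "${publish_dir}/sampleA_rep${rep}.txt"
done
files=("${publish_dir}"/*)
count_unique=${#files[@]}

echo "by id: ${count_by_id} file(s); by id+replicate: ${count_unique} file(s)"
# prints: by id: 1 file(s); by id+replicate: 2 file(s)

rm -rf "${publish_dir}"
```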

FriederikeHanssen · 2025-04-14T14:16:46Z

docs/side_quests/working_with_files.md

+
+---
+
+## Summary


The indents of the learnings render weirdly.

docs/side_quests/splitting-and-grouping.md

FriederikeHanssen · 2025-04-22T15:03:45Z

Copying discussion from slack here, so it won't get lost:

How about we rename the "Working with Files" chapter to "Working with Input Data" and cover both files and metadata explicitly? To bring things close to nf-core & bridge the gap to grouping/splitting, we could add a section in 3.5 on how to use a map instead of a list.
That would also remove a little complexity in the grouping chapters and focus more on just the data shuffling.

adamrtalbot · 2025-04-23T11:13:08Z

> I am mostly just thinking out loud here: I am wondering how this will mix with any metadata module. I think at one point we discussed pulling the metadata section out of the nf-core module and having it separate. It feels too small to be its own side quest, and I think it could fit in here quite well. But I also wouldn't want to clutter this module with extra things about hashmaps and keys.
>
> If the flattening of the metadata is not hugely important, we could replace that with a map and collapse the content?

I've added a metamap here: 6998ddf

but I'm not sure about it, we might be introducing too much too early?

adamrtalbot · 2025-04-23T14:32:29Z

@FriederikeHanssen I've addressed most of your comments now - take a second look.

docs/side_quests/working_with_files.md

FriederikeHanssen

Love it!

Schedule-wise, we should probably run this before nf4-science, since we already use file() objects there.

FriederikeHanssen · 2025-04-30T11:10:58Z

docs/side_quests/working_with_files.md

+Launching `main.nf` [infallible_swartz] DSL2 - revision: 7f4e68c0cb
+
+[[id:sampleA, replicate:1, type:normal, readNum:R2], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R2_001.fastq.gz]
+[[id:sampleA, replicate:1, type:normal, readNum:R1], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R1_001.fastq.gz]


I like this addition. The only thing that could be confusing, I think, is that in 3.2 (Extracting Metadata from Filenames) we already had a map and a file, but then flattened it:

Therefore, it's easier if the input channel is flat instead of the nested structure we have here.

Maybe we need some justification for why the file is not part of the map.

In 3.2 it's a list (array) and a file, do you think that's different enough to be obvious?

[[sampleA, rep1, normal, R1, 001], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R1_001.fastq.gz]
[[sampleA, rep1, normal, R2, 001], /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R2_001.fastq.gz]

As for why the file isn't in the map - good point. I don't have a great answer for that other than 'because Nextflow says so'...hmmm not sure what to do.

FriederikeHanssen · 2025-04-30T11:13:34Z

docs/side_quests/working_with_files.md

+
+Launching `main.nf` [prickly_stonebraker] DSL2 - revision: f62ab10a3f
+
+[[id:sampleA, replicate:1, type:normal, readNum:R], [/workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R1_001.fastq.gz, /workspaces/training/side-quests/working_with_files/data/sampleA_rep1_normal_R2_001.fastq.gz]]


I get the same results, but do we want readNum:R? I would have expected 1 or 2.

Ahhhhhhhhhhh, fromFilePairs strips the contents of the {}... hmm, we might need to rethink this.
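
One way to see where `readNum:R` could come from is sketched below in Bash rather than Nextflow. It assumes the pair's grouping key is the filename cut off where the `{1,2}` alternation begins. The glob used in the tutorial isn't shown in this thread, so the key value is a guess chosen to be consistent with the output above. Splitting that key on `_` leaves a bare `R` in the fourth position, whereas splitting the full filename stem gives `R1`.

```shell
#!/usr/bin/env bash
# Assumption: the grouping key ends where the {1,2} alternation starts.
key="sampleA_rep1_normal_R"
IFS=_ read -r id replicate type read_num <<< "$key"
echo "from grouping key: readNum=${read_num}"
# prints: from grouping key: readNum=R

# For comparison, the full filename stem keeps the read number.
stem="sampleA_rep1_normal_R1_001"
IFS=_ read -r f_id f_rep f_type full_read_num f_suffix <<< "$stem"
echo "from full filename: readNum=${full_read_num}"
# prints: from full filename: readNum=R1
```

If this is what's happening, a read-number field does not really belong in the metadata of a paired entry anyway, since the entry holds both R1 and R2.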

docs/side_quests/working_with_files.md

FriederikeHanssen · 2025-04-30T11:19:41Z

docs/side_quests/working_with_files.md

+
+!!! note
+
+    We are calling our map '`meta`'. This is the first introduction of a concept called `metamap` which we will cover later!


I am not sure what we want to cover beyond this tbh. The next step would maybe be adding fields via nf-schema, but I think that would be a training focused on input validation

adamrtalbot requested a review from vdauwera April 11, 2025 15:28

adamrtalbot added 3 commits April 11, 2025 15:51

fixups and make paths relative to codespaces

e4733e7

Add real FASTQ data

9593653

Reduce code blocks for clarity

0cd65a6

FriederikeHanssen self-requested a review April 14, 2025 11:48

FriederikeHanssen reviewed Apr 14, 2025

docs/side_quests/working_with_files.md (outdated, resolved)

FriederikeHanssen reviewed Apr 14, 2025

docs/side_quests/working_with_files.md (outdated, resolved)

FriederikeHanssen reviewed Apr 14, 2025

docs/side_quests/working_with_files.md

vdauwera added the side quests label Apr 14, 2025

robsyme reviewed Apr 20, 2025

docs/side_quests/splitting-and-grouping.md (outdated, resolved)

adamrtalbot and others added 11 commits April 22, 2025 17:43

Typo in splitting and grouping path

2af659c

Remove splitting and grouping (wrong branch)

49703ef

Clarify the difference between strings and files in Nextflow

cbbff2c

Clarify files in Nextflow

bc15704

Correct summary sentence to reflect code, it was copy+pasted wrong

d583ea1

Update docs/side_quests/working_with_files.md

f3c9ecc

Co-authored-by: Friederike Hanssen <[email protected]>

Clarify why we want to flatten tuples

7461343

Simplify the map operation part

d25b50d

Remove reference to Groovy methods

2ec1b41

Fix indentation

637855c

Conver data to metamap to introduce metamaps early

6998ddf

adamrtalbot added 2 commits April 23, 2025 12:14

Merge branch 'master' into side_quest_working_with_files

947ea15

Refine summary points

d37a127

Use markdown numbering

bbb263c

maxulysse reviewed Apr 25, 2025

docs/side_quests/working_with_files.md (outdated, resolved)

maxulysse reviewed Apr 25, 2025

docs/side_quests/working_with_files.md (outdated, resolved)

maxulysse reviewed Apr 25, 2025

docs/side_quests/working_with_files.md (outdated, resolved)

maxulysse reviewed Apr 25, 2025

docs/side_quests/working_with_files.md (resolved)

maxulysse reviewed Apr 25, 2025

docs/side_quests/working_with_files.md (outdated, resolved)

adamrtalbot and others added 10 commits April 25, 2025 14:10

Add link to file properties documentation

3ff7a4e

Explain multiple assignment better

07eb63d

Update docs/side_quests/working_with_files.md

9b4081f

Co-authored-by: Maxime U Garcia <[email protected]>

Add Channel.fromFilePairs docs as link to fromFilePairs section

a387086

Added glob explanation as a note

cdebb47

Added glob explanation as a note

c9078a1

Use bullet points instead of numbers for key concepts of working with files

b512d8f

working with files use before-after syntax correctly

afb1445

working with files add highlighting

113a3bf

working with files highlighting fixup

aaeab0a

FriederikeHanssen reviewed Apr 30, 2025

adamrtalbot added 3 commits May 2, 2025 13:44

Merge branch 'master' into side_quest_working_with_files

8a37968

Fixups: Respond to review comments

ce66665

Merge branch 'master' into side_quest_working_with_files

de22980

FriederikeHanssen changed the base branch from master to intermediate_training June 5, 2025 11:06

