Splitting and grouping side quest #595

Open: wants to merge 38 commits into base: master

adamrtalbot
Collaborator

@adamrtalbot adamrtalbot commented Apr 8, 2025

First draft of a splitting and grouping side quest. This will likely be part 2 of a 4-part series:

  1. file and string handling
  2. splitting and grouping
  3. (meta)data handling
  4. efficient grouping with groupKey

This includes separating samples using `filter`, spreading them over intervals, and then grouping them by sample ID and interval (i.e. grouping replicates).
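
The flow described above can be sketched in Nextflow roughly as follows. This is a minimal illustration rather than the code in this PR: the sample records, field names (`id`, `type`, `replicate`) and interval values are hypothetical stand-ins, since the actual `samplesheet.csv` and `intervals.txt` contents are not shown in this thread.

```groovy
workflow {
    // Hypothetical sample records (assumed fields: id, type, replicate)
    ch_samples = Channel.of(
        [id: 'sampleA', type: 'normal', replicate: 1],
        [id: 'sampleA', type: 'normal', replicate: 2],
        [id: 'sampleB', type: 'tumor',  replicate: 1]
    )

    // Hypothetical intervals to spread each sample over
    ch_intervals = Channel.of('chr1', 'chr2')

    ch_samples
        // Separate: keep only the normal samples
        .filter { sample -> sample.type == 'normal' }
        // Spread: emit one [sample, interval] item per sample/interval pair
        .combine(ch_intervals)
        // Build a grouping key from sample ID and interval
        .map { sample, interval -> [[sample.id, interval], sample.replicate] }
        // Group: collect replicates that share the same key
        .groupTuple()
        .view()
}
```

With no `size` given, `groupTuple` only emits once the upstream channel has completed, and the order of the emitted groups is not guaranteed.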

Bad bits so far:

  • Lots of focus on data structure, which could be simplified. We could use this to introduce some Groovy stuff (e.g. normal_samples*?.fastq1)
  • No use of processes. This should be included somewhere, perhaps we could use a fake "align" process like @robsyme does in the advanced training?
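
As a rough illustration of the kind of Groovy shorthand the first bullet hints at, the sketch below uses the spread-dot (`*.`) and safe-navigation (`?.`) operators separately. The records and the `fastq1` field values are hypothetical, and this is not taken from the side quest itself.

```groovy
// Hypothetical sample records; names and file paths are made up for illustration
def normal_samples = [
    [id: 'sampleA', fastq1: 'sampleA_R1.fastq.gz'],
    null,
    [id: 'sampleB', fastq1: 'sampleB_R1.fastq.gz']
]

// Spread-dot: read the same property from every element of the list.
// Null elements yield null instead of throwing an exception.
def fastqs = normal_samples*.fastq1
assert fastqs == ['sampleA_R1.fastq.gz', null, 'sampleB_R1.fastq.gz']

// Safe navigation: returns null when the receiver itself is null
def missing_sample = null
assert missing_sample?.fastq1 == null
```

The spread-dot form is equivalent to `normal_samples.collect { it?.fastq1 }`, which may be the more readable option for newcomers.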

Good bits:

  • Hammers home the message of channels, channels, channels.
  • Reiterates the usefulness of map
  • Avoids metamaps explicitly while carefully introducing the concepts.

First draft of a splitting and grouping side quest.
This includes separating samples using filter, then grouping and spreading by intervals.

Early version but introduces key operator concepts to participants.
This commit reverses the order to spread over intervals prior to grouping. This achieves two things:

1. It explains everything once and only once to make the tutorial simpler
2. It provides a real world reason for using groupTuple

This makes the flow of the tutorial easier to understand, at the cost of very verbose outputs.
@adamrtalbot adamrtalbot requested a review from Copilot April 8, 2025 14:53

@Copilot Copilot AI left a comment

Copilot reviewed 1 out of 4 changed files in this pull request and generated no comments.

Files not reviewed (3)
  • side-quests/splitting_and_grouping/data/intervals.txt: Language not supported
  • side-quests/splitting_and_grouping/data/samplesheet.csv: Language not supported
  • side-quests/splitting_and_grouping/main.nf: Language not supported

netlify bot commented Apr 8, 2025

Deploy Preview for nextflow-training ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | edaa09a |
| 🔍 Latest deploy log | https://app.netlify.com/sites/nextflow-training/deploys/680fac528be964000822364e |
| 😎 Deploy Preview | https://deploy-preview-595--nextflow-training.netlify.app |

Collaborator

@pinin4fjords pinin4fjords left a comment

Nice! I might simplify the join one by reducing the ways of manipulating the grouping key.

@adamrtalbot adamrtalbot marked this pull request as ready for review April 10, 2025 10:20
@adamrtalbot adamrtalbot requested a review from vdauwera April 10, 2025 10:42
@adamrtalbot adamrtalbot requested a review from Copilot April 10, 2025 11:18

@Copilot Copilot AI left a comment

Copilot reviewed 2 out of 4 changed files in this pull request and generated no comments.

Files not reviewed (2)
  • side-quests/splitting_and_grouping/data/samplesheet.csv: Language not supported
  • side-quests/splitting_and_grouping/main.nf: Language not supported

@adamrtalbot adamrtalbot requested a review from robsyme April 10, 2025 14:02
Collaborator

@FriederikeHanssen FriederikeHanssen left a comment

I quite like it.

For the introduction, I would vote for having a small picture illustrating the concept using the mail sorting. Maybe that's just me and the way I tend to memorise things.

Maybe it's because I just looked at the working with file PR, but it feels like we could finish that module with `splitCsv` and then reuse it here. That way we could keep files and what to do with them a bit more separate.

Let's move into the project directory.

```bash
cd side-quests/splitting-and-grouping
```

Collaborator

Suggested change:

    - cd side-quests/splitting-and-grouping
    + cd side-quests/splitting_and_grouping

```groovy title="main.nf" linenums="2" hl_lines="5 8 9"
ch_samplesheet = Channel.fromPath("./data/samplesheet.csv")
    .splitCsv(header: true)
```

Collaborator

Is it worth mentioning that we could have done the mapping here with the same result, and that we only do it this way for teaching purposes?


Let's now group the samples by this new grouping element, using the [`groupTuple` operator](https://www.nextflow.io/docs/latest/operator.html#grouptuple).

=== "After"
Collaborator

I am not sure why, but this block does not render correctly (only "After"; "Before" is fine).
[Screenshot 2025-04-30 at 12 06 13]

Collaborator

@FriederikeHanssen FriederikeHanssen left a comment

I think this is good now. There is just the tiny rendering thing. I still think in a second iteration we should add some illustrations.

4 participants