
Commit 38e3550

pinin4fjords and claude committed
Restructure validation intro and reduce overconfident claims
Addresses PR feedback about section structure and overconfident language.

Changes:
- Moved intro material (sections 1.1-1.3) to top-level overview before numbered sections
- Moved configuration step (old 1.4) into section 1 as first practical step before examining schema
- Renumbered all sections (old section 2 → 1, old section 3 → 2)
- Updated takeaways to be less presumptuous: "You've learned" instead of "You now know", "seen in action" instead of "know how"
- Removed abrupt section ending - section 1 now flows naturally from config to schema examination to adding parameters

The structure now feels more complete with each section having clear practical outcomes rather than ending on pure exposition.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent f8b57c9 commit 38e3550


1 file changed, +59 -75 lines changed


docs/hello_nf-core/05_input_validation.md

@@ -32,49 +32,10 @@ Pipeline failed before execution - please fix the errors above
 
 The pipeline fails immediately with clear, actionable error messages. This saves time, compute resources, and frustration.
 
-## Two types of validation
+## The nf-schema plugin
 
-nf-core pipelines validate two different kinds of input:
-
-### Parameter validation
-
-This validates command-line parameters (flags like `--outdir`, `--batch`, `--input`):
-
-- Checks parameter types, ranges, and formats
-- Ensures required parameters are provided
-- Validates file paths exist
-- Defined in `nextflow_schema.json`
-
-### Input data validation
-
-This validates the contents of input files (like sample sheets or CSV files)
-
-- Checks column structure and data types
-- Validates file references within the input file
-- Ensures required fields are present
-- Defined in `assets/schema_input.json`
-
-!!! note
-
-This section assumes you have completed [Part 4: Make an nf-core module](./04_make_module.md) and have a working `core-hello` pipeline with nf-core-style modules.
-
-If you didn't complete Part 4 or want to start fresh for this section, you can use the `core-hello-part4` solution as your starting point:
-
-```bash
-cp -r hello-nf-core/solutions/core-hello-part4 core-hello
-cd core-hello
-```
-
-This gives you a fully functional nf-core pipeline with modules ready for adding input validation.
-
----
-
-## 1. The nf-schema plugin
-
-The [nf-schema plugin](https://nextflow-io.github.io/nf-schema/latest/) is a Nextflow plugin that provides comprehensive validation capabilities for any Nextflow pipeline.
-While nf-schema is a standalone tool that can be used in any Nextflow workflow, it's heavily integrated into the nf-core ecosystem and is the standard validation solution for all nf-core pipelines.
-
-### 1.1. Core functionality
+The [nf-schema plugin](https://nextflow-io.github.io/nf-schema/latest/) is a Nextflow plugin that provides comprehensive validation capabilities for Nextflow pipelines.
+While nf-schema works with any Nextflow workflow, it's the standard validation solution for all nf-core pipelines.
 
 nf-schema provides several key functions:
 
@@ -86,7 +47,7 @@ nf-schema provides several key functions:
 
 nf-schema is the successor to the deprecated nf-validation plugin and uses standard [JSON Schema Draft 2020-12](https://json-schema.org/) for validation.
 
-### 1.2. The two schema files
+## Two schema files
 
 An nf-core pipeline uses two schema files for validation:
 
@@ -97,7 +58,25 @@ An nf-core pipeline uses two schema files for validation:
 
 Both schemas use JSON Schema format, a widely-adopted standard for describing and validating data structures.
 
-### 1.3. When validation occurs
+### Two types of validation
+
+nf-core pipelines validate two different kinds of input:
+
+**Parameter validation** validates command-line parameters (flags like `--outdir`, `--batch`, `--input`):
+
+- Checks parameter types, ranges, and formats
+- Ensures required parameters are provided
+- Validates file paths exist
+- Defined in `nextflow_schema.json`
+
+**Input data validation** validates the contents of input files (like sample sheets or CSV files):
+
+- Checks column structure and data types
+- Validates file references within the input file
+- Ensures required fields are present
+- Defined in `assets/schema_input.json`
+
+### When validation occurs
 
 ```mermaid
 graph LR
@@ -110,7 +89,26 @@ graph LR
 
 Validation happens **before** any pipeline processes run, providing fast feedback and preventing wasted compute time.
 
-### 1.4. Configure validation to skip input file validation
+!!! note
+
+This section assumes you have completed [Part 4: Make an nf-core module](./04_make_module.md) and have a working `core-hello` pipeline with nf-core-style modules.
+
+If you didn't complete Part 4 or want to start fresh for this section, you can use the `core-hello-part4` solution as your starting point:
+
+```bash
+cp -r hello-nf-core/solutions/core-hello-part4 core-hello
+cd core-hello
+```
+
+This gives you a fully functional nf-core pipeline with modules ready for adding input validation.
+
+---
+
+## 1. Parameter validation (nextflow_schema.json)
+
+Let's start by adding parameter validation to our pipeline. This validates command-line flags like `--input`, `--outdir`, and `--batch`.
+
+### 1.1. Configure validation to skip input file validation
 
 The nf-core pipeline template comes with nf-schema already installed and configured:
 
@@ -120,7 +118,7 @@ The nf-core pipeline template comes with nf-schema already installed and configu
 
 The validation behavior is controlled through the `validation{}` scope in `nextflow.config`.
 
-Since we'll be working on parameter validation first (section 2) and won't configure the input data schema until section 3, we need to temporarily tell nf-schema to skip validating the `input` parameter's file contents.
+Since we'll be working on parameter validation first (this section) and won't configure the input data schema until section 2, we need to temporarily tell nf-schema to skip validating the `input` parameter's file contents.
 
 Open `nextflow.config` and find the `validation` block (around line 246). Add `ignoreParams` to skip input file validation:
 
@@ -146,30 +144,16 @@ Open `nextflow.config` and find the `validation` block (around line 246). Add `i
 This configuration tells nf-schema to:
 
 - **`defaultIgnoreParams`**: Skip validation of complex parameters like `genomes` (set by template developers)
-- **`ignoreParams`**: Skip validation of the `input` parameter's file contents (temporary - we'll remove this in section 3)
+- **`ignoreParams`**: Skip validation of the `input` parameter's file contents (temporary - we'll remove this in section 2)
 - **`monochromeLogs`**: Control colored output in validation messages
 
 !!! note "Why ignore the input parameter?"
 
 The `input` parameter in `nextflow_schema.json` has `"schema": "assets/schema_input.json"` which tells nf-schema to validate the *contents* of the input CSV file against that schema.
 Since we haven't configured that schema yet, we temporarily ignore this validation.
-We'll remove this setting in section 3 after configuring the input data schema.
-
-### Takeaway
-
-You now understand what nf-schema does, the two types of validation it provides, when validation occurs, and how to configure validation behavior. You've also temporarily disabled input file validation so we can focus on parameter validation first.
-
-### What's next?
-
-Start by implementing parameter validation for command-line flags.
-
----
-
-## 2. Parameter validation (nextflow_schema.json)
-
-Let's start by adding parameter validation to our pipeline. This validates command-line flags like `--input`, `--outdir`, and `--batch`.
+We'll remove this setting in section 2 after configuring the input data schema.
 
-### 2.1. Examine the parameter schema
+### 1.2. Examine the parameter schema
 
 Let's look at a section of the `nextflow_schema.json` file that came with our pipeline template:
 
@@ -225,7 +209,7 @@ Key validation features:
 
 Notice the `batch` parameter we've been using isn't defined yet in the schema!
 
-### 2.2. Add the batch parameter
+### 1.3. Add the batch parameter
 
 While the schema is a JSON file that can be edited manually, **manual editing is error-prone and not recommended**.
 Instead, nf-core provides an interactive GUI tool that handles the JSON Schema syntax for you and validates your changes:
@@ -317,7 +301,7 @@ grep -A 25 '"input_output_options"' nextflow_schema.json
 
 You should see that the `batch` parameter has been added to the schema with the "required" field now showing `["input", "outdir", "batch"]`.
 
-### 2.3. Test parameter validation
+### 1.4. Test parameter validation
 
 Now let's test that parameter validation works correctly.
 
@@ -348,7 +332,7 @@ The pipeline should run successfully, and the `batch` parameter is now validated
 
 ### Takeaway
 
-You now know how to use the interactive `nf-core pipelines schema build` tool to add parameters to `nextflow_schema.json` and test parameter validation.
+You've learned how to use the interactive `nf-core pipelines schema build` tool to add parameters to `nextflow_schema.json` and seen parameter validation in action.
 The web interface handles all the JSON Schema syntax for you, making it easy to manage complex parameter schemas without error-prone manual JSON editing.
 
 ### What's next?
@@ -357,11 +341,11 @@ Now that parameter validation is working, let's add validation for the input dat
 
 ---
 
-## 3. Input data validation (schema_input.json)
+## 2. Input data validation (schema_input.json)
 
 Now let's add validation for the contents of our input CSV file. While parameter validation checks command-line flags, input data validation ensures the data inside the CSV file is structured correctly.
 
-### 3.1. Understand the greetings.csv format
+### 2.1. Understand the greetings.csv format
 
 Let's remind ourselves what our input looks like:
 
@@ -381,7 +365,7 @@ This is a simple CSV with:
 - One greeting per line
 - Text strings with no special format requirements
 
-### 3.2. Design the schema structure
+### 2.2. Design the schema structure
 
 For our use case, we want to:
 
@@ -392,7 +376,7 @@ For our use case, we want to:
 
 We'll structure this as an array of objects, where each object has a `greeting` field.
 
-### 3.3. Update the schema file
+### 2.3. Update the schema file
 
 The nf-core pipeline template includes a default `assets/schema_input.json` designed for paired-end sequencing data.
 We need to replace it with a simpler schema for our greetings use case.
@@ -469,7 +453,7 @@ The key changes:
 - **`errorMessage`**: Custom error message shown if validation fails
 - **`required`**: Changed from `["sample", "fastq_1"]` to `["greeting"]`
 
-### 3.4. Add a header to the greetings.csv file
+### 2.4. Add a header to the greetings.csv file
 
 When nf-schema reads a CSV file, it expects the first row to contain column headers that match the field names in the schema.
 
@@ -504,7 +488,7 @@ You've created a JSON schema for the greetings input file and added the required
 
 Implement the validation in the pipeline code using `samplesheetToList`.
 
-### 3.5. Implement samplesheetToList in the pipeline
+### 2.5. Implement samplesheetToList in the pipeline
 
 Now we need to replace our simple CSV parsing with nf-schema's `samplesheetToList` function, which validates and converts the sample sheet.
 
@@ -609,7 +593,7 @@ You've successfully implemented input data validation using `samplesheetToList`
 
 Re-enable input validation in the config and test both parameter and input data validation to see them in action.
 
-### 3.6. Re-enable input validation
+### 2.6. Re-enable input validation
 
 Now that we've configured the input data schema, we can remove the temporary ignore setting we added in section 1.4.
 
@@ -636,7 +620,7 @@ Open `nextflow.config` and remove the `ignoreParams` line from the `validation`
 
 Now nf-schema will validate both parameter types AND the input file contents.
 
-### 3.7. Test input validation
+### 2.7. Test input validation
 
 Let's verify that our validation works by testing both valid and invalid inputs.
 
@@ -713,7 +697,7 @@ The schema validation ensures that input files have the correct structure before
 
 ### Takeaway
 
-You now know how to implement and test both parameter validation and input data validation. Your pipeline validates inputs before execution, providing fast feedback and clear error messages.
+You've implemented and tested both parameter validation and input data validation. Your pipeline now validates inputs before execution, providing fast feedback and clear error messages.
 
 !!! tip "Further reading"
 
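The input-data rules this diff relocates are concrete enough to sketch outside Nextflow: nf-schema expects the CSV's first row to hold column headers matching the schema's field names, and the updated schema makes `greeting` the only required field. The following is a minimal Bash sketch under those assumptions. `check_greetings_csv` is a hypothetical helper: it hard-codes the two rules instead of reading `assets/schema_input.json`, assumes a single-column greetings file, and is not how nf-schema or `samplesheetToList` validate a sample sheet.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, NOT part of nf-schema or the nf-core template.
# It hard-codes two rules rather than reading assets/schema_input.json:
#   1. the first row is a header naming the schema field ("greeting");
#   2. "greeting" is required, so every data row must be non-empty.
# Assumes a single-column CSV such as the greetings file used in this tutorial.
check_greetings_csv() {
    local csv="$1"
    local header

    if [ ! -r "$csv" ]; then
        echo "ERROR: cannot read '$csv'" >&2
        return 1
    fi

    # Rule 1: the header row must match the field name (strip CR from CRLF files).
    header=$(head -n 1 "$csv" | tr -d '\r')
    if [ "$header" != "greeting" ]; then
        echo "ERROR: expected header 'greeting' but found '$header'" >&2
        return 1
    fi

    # Rule 2: every data row must supply a greeting; awk exits 1 on any empty row.
    awk '
        NR > 1 {
            sub(/\r$/, "")   # tolerate Windows line endings
            if ($0 == "") {
                printf("ERROR: data row %d has an empty greeting\n", NR - 1) > "/dev/stderr"
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$csv"
}
```

For example, running `check_greetings_csv greetings.csv` on a file that still lacks the `greeting` header row returns a non-zero status and prints an error. The sketch only mirrors two of the documented rules; the pipeline itself should keep relying on nf-schema for validation.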