Description
Bug report
Expected behavior and actual behavior
In my minimal example I am trying to get the size of a collection of input files that is passed into a process. If there is 1 file in the input, the GET_FILE_STATS process fails with "No such file or directory", but if there is more than 1 file, the GET_FILE_STATS process succeeds.
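A possible workaround, sketched here and not tested against 25.04.4: it assumes the failure comes from a `path` input being bound as a single path object (rather than a list) when exactly one file is received. The Nextflow documentation describes an `arity` option on `path` inputs that should force a list whenever the arity is not '1'. Whether that avoids this particular error is an assumption.

```groovy
process GET_FILE_STATS {
    input:
    // Assumption: with arity '1..*' Nextflow binds input_files as a list even
    // when only one file is staged, so size() counts files instead of bytes.
    path input_files, arity: '1..*'

    output:
    path('data.txt')

    script:
    """
    echo "There are ${input_files.size()} files" > data.txt
    echo "The total size is: ${input_files*.size().sum()} bytes" >> data.txt
    """
}
```

If this variant succeeds with a single file, that would point at the single-path versus list binding as the cause rather than at file staging.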
Steps to reproduce the problem
main.nf
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

process GENERATE_FILE {
    input:
    val file_name

    output:
    path("$file_name")

    script:
    """
    echo "Hello there from ${file_name}" > '${file_name}'
    """
}

process GET_FILE_CONTENT {
    input:
    path input_files

    output:
    path('data.txt')

    script:
    """
    files=( '${input_files.join("' '")}' )
    echo "Hello from main" > data.txt
    for f in \${files[@]} ; do
        cat \$f >> data.txt
    done
    """
}

process GET_FILE_STATS {
    input:
    path input_files

    output:
    path('data.txt')

    script:
    """
    echo "There are ${input_files.size()} files" > data.txt
    echo "The total size is: ${input_files*.size().sum()} bytes" >> data.txt
    """
}

workflow {
    files_to_make = Channel.fromList(['file_1.txt'])
    // files_to_make = Channel.fromList(['file_1.txt', 'file_2.txt'])
    GENERATE_FILE(files_to_make)
    files = GENERATE_FILE.out

    GET_FILE_CONTENT(files.collect())
    file_data = GET_FILE_CONTENT.out.splitText()
    file_data.collect().view()

    GET_FILE_STATS(files.collect())
    file_data = GET_FILE_STATS.out.splitText()
    file_data.collect().view()
}
Program output
Failed execution when there is 1 file
In this example files_to_make = Channel.fromList(['file_1.txt'])
$ nextflow run main.nf
N E X T F L O W ~ version 25.04.4
Launching `main.nf` [adoring_cuvier] DSL2 - revision: 68eebc0ab7
executor > local (2)
[79/5fc932] GENERATE_FILE (1) [100%] 1 of 1 ✔
[ef/4dc64b] GET_FILE_CONTENT [100%] 1 of 1 ✔
[- ] GET_FILE_STATS -
['Hello from main\n', 'Hello there from file_1.txt\n']
ERROR ~ Error executing process > 'GET_FILE_STATS'
Caused by:
No such file or directory: file_1.txt
Source block:
"""
echo "There are ${input_files.size()} files" > data.txt
echo "The total size is: ${input_files*.size().sum()} bytes" >> data.txt
"""
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Successful execution when there is more than 1 file
In this example files_to_make = Channel.fromList(['file_1.txt', 'file_2.txt'])
$ nextflow run main.nf
N E X T F L O W ~ version 25.04.4
Launching `main.nf` [shrivelled_linnaeus] DSL2 - revision: d96ebfaf04
executor > local (4)
[39/f89949] GENERATE_FILE (1) [100%] 2 of 2 ✔
[c3/169d5d] GET_FILE_CONTENT [100%] 1 of 1 ✔
[5e/d26150] GET_FILE_STATS [100%] 1 of 1 ✔
['There are 2 files\n', 'The total size is: 56 bytes\n']
['Hello from main\n', 'Hello there from file_2.txt\n', 'Hello there from file_1.txt\n']
Environment
- Nextflow version: 25.04.4
- Java version: openjdk 21.0.7 2025-04-15
- Operating system: Ubuntu 24.04.2 LTS
- Bash version: GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu)
Additional context
I originally noticed this problem when I was trying to dynamically request memory in a process based on the size of the input files. If I add memory { Math.ceil(input_files*.size().sum() / (1024 ** 3)).GB } to the top of the GET_FILE_CONTENT process in the above example, it fails if there is only 1 file in input_files but not if there is more than 1. If I set the memory to a constant integer, it runs successfully. The files are obviously getting transferred to the process work directory, but there is something strange going on with file.size(). I have also tried using input_files.collect{ it.size() }.sum(), but I get the same file not found error.
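For the dynamic memory case, one thing that might be worth trying is wrapping a lone path in a list before applying list operations. This is an untested sketch. It assumes that a single input file arrives as one path object, so that `*.size()` and `collect` iterate over something other than the staged file. The names `file_list`, `total_bytes` and `gigabytes`, and the 1 GB floor, are illustrative and not part of the original pipeline.

```groovy
process GET_FILE_CONTENT {
    memory {
        // Assumption: a lone input file arrives as a single path, not a list,
        // so wrap it before using list operations such as the spread operator.
        def file_list = input_files instanceof java.nio.file.Path ? [input_files] : input_files
        def total_bytes = file_list*.size().sum()
        // Hypothetical sizing rule: one GB per started GB of input, minimum 1 GB.
        def gigabytes = Math.ceil(total_bytes / (1024 ** 3)) as int
        1.GB * Math.max(1, gigabytes)
    }

    input:
    path input_files

    output:
    path('data.txt')

    script:
    """
    files=( '${input_files.join("' '")}' )
    echo "Hello from main" > data.txt
    for f in \${files[@]} ; do
        cat \$f >> data.txt
    done
    """
}
```

The explicit `instanceof` check keeps the closure independent of how many files the channel delivers, so the same directive would apply to both the 1-file and the 2-file runs shown above.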