Process fails to get file size when there is only 1 file in the input, but not when there is more than 1. #6223

Open
@ajmaurais

Bug report

Expected behavior and actual behavior

In my minimal example, I am trying to get the total size of a collection of input files passed into a process. When the input contains exactly 1 file, the GET_FILE_STATS process fails with "No such file or directory"; when it contains more than 1 file, the same process succeeds. I would expect it to succeed in both cases.

Steps to reproduce the problem

main.nf

#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

process GENERATE_FILE {
    input:
        val file_name

    output:
        path("$file_name")

    script:
    """
    echo "Hello there from ${file_name}" > '${file_name}'
    """
}

process GET_FILE_CONTENT {
    input:
        path input_files

    output:
        path('data.txt')

    script:
    """
    files=( '${input_files.join("' '")}' )

    echo "Hello from main" > data.txt

    for f in \${files[@]} ; do
        cat \$f >> data.txt
    done
    """
}

process GET_FILE_STATS {
    input:
        path input_files

    output:
        path('data.txt')

    script:
    """
    echo "There are ${input_files.size()} files" > data.txt
    echo "The total size is: ${input_files*.size().sum()} bytes" >> data.txt
    """
}

workflow {
    files_to_make = Channel.fromList(['file_1.txt'])
    // files_to_make = Channel.fromList(['file_1.txt', 'file_2.txt'])

    GENERATE_FILE(files_to_make)
    files = GENERATE_FILE.out

    GET_FILE_CONTENT(files.collect())
    file_data = GET_FILE_CONTENT.out.splitText()
    file_data.collect().view()

    GET_FILE_STATS(files.collect())
    file_data = GET_FILE_STATS.out.splitText()
    file_data.collect().view()
}

Program output

Failed execution when there is 1 file

In this example files_to_make = Channel.fromList(['file_1.txt'])

$ nextflow run main.nf

 N E X T F L O W   ~  version 25.04.4

Launching `main.nf` [adoring_cuvier] DSL2 - revision: 68eebc0ab7

executor >  local (2)
[79/5fc932] GENERATE_FILE (1) [100%] 1 of 1 ✔
[ef/4dc64b] GET_FILE_CONTENT  [100%] 1 of 1 ✔
[-        ] GET_FILE_STATS    -
['Hello from main\n', 'Hello there from file_1.txt\n']
ERROR ~ Error executing process > 'GET_FILE_STATS'

Caused by:
  No such file or directory: file_1.txt


Source block:
  """
  echo "There are ${input_files.size()} files" > data.txt
  echo "The total size is: ${input_files*.size().sum()} bytes" >> data.txt
  """

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Successful execution when there is more than 1 file

In this example files_to_make = Channel.fromList(['file_1.txt', 'file_2.txt'])

$ nextflow run main.nf

 N E X T F L O W   ~  version 25.04.4

Launching `main.nf` [shrivelled_linnaeus] DSL2 - revision: d96ebfaf04

executor >  local (4)
[39/f89949] GENERATE_FILE (1) [100%] 2 of 2 ✔
[c3/169d5d] GET_FILE_CONTENT  [100%] 1 of 1 ✔
[5e/d26150] GET_FILE_STATS    [100%] 1 of 1 ✔
['There are 2 files\n', 'The total size is: 56 bytes\n']
['Hello from main\n', 'Hello there from file_2.txt\n', 'Hello there from file_1.txt\n']

Environment

  • Nextflow version: 25.04.4
  • Java version: openjdk 21.0.7 2025-04-15
  • Operating system: Ubuntu 24.04.2 LTS
  • Bash version: GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu)

Additional context

I originally noticed this problem while trying to request memory dynamically in a process, based on the size of the input files. If I add memory { Math.ceil(input_files*.size().sum() / (1024 ** 3)).GB } to the top of the GET_FILE_CONTENT process in the example above, it fails when input_files contains only 1 file, but not when it contains more than 1. If I set the memory to a constant integer, it runs successfully. The files are obviously being transferred to the process work directory, but something strange is going on with file.size(). I have also tried input_files.collect{ it.size() }.sum(), but I get the same file-not-found error.
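
A possible workaround, sketched under assumptions and not verified against this reproduction: the failure may come from a single file being bound to input_files as a bare path rather than a one-element list, so that input_files*.size() no longer iterates over a list of staged files. Recent Nextflow versions document an arity option on path inputs, and declaring '1..*' should make the variable a list regardless of how many files arrive. Only the input declaration differs from the GET_FILE_STATS process above.

```groovy
process GET_FILE_STATS {
    input:
        // Assumption: arity '1..*' makes Nextflow bind input_files as a list
        // even when only one file is received, so size() counts files and
        // the spread operator visits each staged file.
        path input_files, arity: '1..*'

    output:
        path('data.txt')

    script:
    """
    echo "There are ${input_files.size()} files" > data.txt
    echo "The total size is: ${input_files*.size().sum()} bytes" >> data.txt
    """
}
```

If this holds, the same declaration could be applied to GET_FILE_CONTENT before adding the dynamic memory directive.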

failing_nextflow.log
