Error running Epinano_DiffErr.R on 5mer output from Epinano_sumErr.py #125

@GeoffLyle

I have been running into an issue trying to run Epinano_DiffErr.R on the output from Epinano_sumErr.py.

Running Epinano_sumErr.py appears to work:
Epinano_sumErr.py --quality --file NHA_hTERT_DRNA_20220609_self_transcript_aligned.sorted.plus_strand.per.site.5mer.csv --out NHA-hTERT_5mer.sum_err.csv --kmer 5

python3 Epinano_sumErr.py --quality --file full_fq_to_sample_transcripts_output.sorted.plus_strand.per.site.5mer.csv --out DIPG-IV_5mer.sum_err.csv --kmer 5

However, when I try to run this output through Epinano_DiffErr.R I run into the following error:
Rscript Epinano_DiffErr.R -k NHA-hTERT_5mer.sum_err.csv -w DIPG-IV_5mer.sum_err.csv -c 30 -d 0.1 -t 3 -o DIPG-IV_NHA-hTERT_5mer_sumErr --feature sum_err3

Error:

Error in merge.data.frame(dat1, dat2, by = "chr_pos") :
negative length vectors are not allowed

This appears to be due to a memory limit issue.

Note:
I also tried changing line 126 in Epinano_DiffErr.R:
combine <- merge(dat1, dat2, by="chr_pos")
to:
combine <- dplyr::full_join(dat1, dat2, by="chr_pos")

I thought that the dplyr package could fix the memory limit issue, but I'm getting this error now:

Error: cannot allocate vector of size 127613.3 Gb
Execution halted

This is the size of the dataframes I want to merge:
[1] "Number of rows in dat1: 3571272"
54.5 Mb
[1] "Number of rows in dat2: 9592079"
146.4 Mb
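
A rough size check on the numbers above, offered as a sketch rather than a diagnosis. The helpers `join_row_count` and `gib` are hypothetical names, not part of EpiNano, and the 4-byte element size is an assumption (the size of an R integer). A join produces one row per matching pair of keys, so if many rows share the same chr_pos value the result can approach nrow(dat1) * nrow(dat2). That product is far above the 32-bit integer limit, which would be consistent with the "negative length vectors" error, and at 4 bytes per element it comes out at the same size as the failed allocation. If that is what is happening, counting distinct and missing chr_pos values in dat1 and dat2 before the merge should show it.

```python
from collections import Counter


def join_row_count(left_keys, right_keys):
    """Number of rows an inner join on a single key column would produce."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each key contributes (occurrences on the left) * (occurrences on the right).
    return sum(n * right[k] for k, n in left.items() if k in right)


def gib(n_elements, bytes_per_element=4):
    """Size in GiB of a vector; 4 bytes per element is an assumption."""
    return n_elements * bytes_per_element / 2**30


# Row counts reported in this issue.
n_dat1 = 3571272
n_dat2 = 9592079

# Worst case: every row of dat1 matches every row of dat2.
worst_case_rows = n_dat1 * n_dat2
print(worst_case_rows > 2**31 - 1)          # True: exceeds a 32-bit integer
print(f"{gib(worst_case_rows):.1f} GiB")    # 127613.3 GiB

# Toy illustration: 3 and 4 rows sharing a single key give 12 joined rows.
print(join_row_count(["x"] * 3, ["x"] * 4))  # 12
```

Unique keys on both sides would instead give at most min(nrow(dat1), nrow(dat2)) rows from an inner join, which is why the key column is worth inspecting before looking for more memory.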

Have you run into this error when running Epinano_DiffErr.R, and if so what was your solution?

PS: It would also be great to be able to pass all 5 sum_err columns at the same time, as was suggested in #122.
