psiPerIsoform resulting in empty .psi file #161

TinyTasy · 2023-04-12T10:04:28Z

Dear SUPPA team,

Thank you so much for your amazing tool. It is really helpful for differential isoform analysis.

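I am trying to use SUPPA on PacBio single-cell Iso-Seq data. I aligned my data with pbmm2 and used pigeon (SQANTI-based) to obtain a .gff file, which I then converted into a .gtf file with gffread. My GTF file looks like this:

[image: Screenshot from 2023-04-12 11-54-45] <https://user-images.githubusercontent.com/118251413/231423185-3e586c04-d231-45c4-9a25-3c1c59ba445d.png>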

So the PacBio gene and transcript IDs are stored in the 9th (attribute) column of the GTF file.

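My expression file is a tab-separated (.tsv) file with 268 samples (pseudobulks) and looks like this:

[image: Screenshot from 2023-04-12 11-56-32] <https://user-images.githubusercontent.com/118251413/231423691-97b5e60f-9440-410f-9747-1df66fa303b6.png>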

If I now execute this command:

python3.4 /vol/projects/agrinko/TREM2_7_03_2022/SUPPA-2.3/suppa.py psiPerIsoform \
-g /vol/projects/agrinko/TREM2_7_03_2022/data/Trem2_Longread/pacbio_TREM2.gtf \
-e /vol/projects/agrinko/TREM2_7_03_2022/data/Trem2_Longread/pseudobulk_without_rownames.tsv \
-o /vol/projects/agrinko/TREM2_7_03_2022/data/Trem2_Longread/psiPerIsoform_output

I get this warning for each transcript:

INFO:psiPerGene:Reading GTF data.
INFO:psiPerGene:Reading Expression data.
INFO:psiPerGene:Calculating inclusion and generating output.
INFO:lib.tools:Expression for transcript "PB.104659.2" not found. Ignoring it in calculation.
INFO:lib.tools:Expression for transcript "PB.104659.16" not found. Ignoring it in calculation.
INFO:lib.tools:Expression for transcript "PB.98879.2" not found. Ignoring it in calculation.
INFO:lib.tools:Expression for transcript "PB.98879.3" not found. Ignoring it in calculation.
.
.
.

The resulting .psi output file is empty: only the header with the sample names remains.
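An empty output is consistent with how a per-isoform PSI is defined: an isoform's abundance divided by the summed abundance of all isoforms of its gene. If no transcript ID from the GTF is found in the expression table, every transcript is skipped and there is nothing left to write. The toy Python sketch below models that idea only; it is not SUPPA's implementation, `isoform_psi` is a hypothetical name, and the IDs (including the gene ID `PB.98879`) are illustrative.

```python
from collections import defaultdict


def isoform_psi(transcript_to_gene, expression):
    """Toy per-isoform PSI: a transcript's TPM divided by the summed TPM of its gene.

    transcript_to_gene: dict mapping transcript ID -> gene ID (as read from the GTF)
    expression: dict mapping transcript ID -> list of per-sample TPM values
    """
    # Group transcripts by gene, keeping only those whose ID is found verbatim
    # in the expression table; the rest are ignored, like in the log messages.
    by_gene = defaultdict(list)
    for tid, gene in transcript_to_gene.items():
        if tid in expression:
            by_gene[gene].append(tid)

    psi = {}
    for tids in by_gene.values():
        n_samples = len(expression[tids[0]])
        # Total expression of the gene in each sample.
        totals = [sum(expression[t][s] for t in tids) for s in range(n_samples)]
        for t in tids:
            # NaN when the gene is unexpressed in a sample (a toy convention only).
            psi[t] = [expression[t][s] / totals[s] if totals[s] > 0 else float("nan")
                      for s in range(n_samples)]
    return psi
```

With expression keys written as `'"PB.98879.2"'` (quotes included) instead of `'PB.98879.2'`, `isoform_psi` returns an empty dict, which mirrors a header-only .psi file.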

I have already tried several things, such as tab-separated .txt files and .tsv files. I also tried using the transcript IDs as row names.

Do you have any idea what could be the issue? Any help is greatly appreciated.

Sincerely,
Tasy

EduEyras · 2023-04-12T11:04:19Z

Hi Tasy,

Thanks for your email. Perhaps the transcript IDs in your GTF and in your expression file are different? They look different in your screen captures, and SUPPA would not be able to match them. The expression file should have the transcript IDs without the double quotes (" "), whereas the GTF format does use double quotes around transcript and gene IDs: see e.g. https://asia.ensembl.org/info/website/upload/gff.html

Please let me know if that fixes it.

Best,
Eduardo
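One way to check this is to compare the IDs directly. The minimal Python sketch below is a diagnostic, not part of SUPPA: it assumes the expression table has the sample names on its first line and the transcript ID in the first column, and that the GTF stores IDs as `transcript_id "..."`. The helper names are hypothetical.

```python
import re

# Captures the value of transcript_id "..." in the GTF attribute column (9th field).
TRANSCRIPT_ID_RE = re.compile(r'transcript_id\s+"([^"]*)"')


def gtf_transcript_ids(gtf_lines):
    """Return the set of unquoted transcript IDs found in GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            match = TRANSCRIPT_ID_RE.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def expression_ids(expr_lines):
    """Return the first-column IDs of a tab-separated table, exactly as written.

    The first line (sample names) is skipped. Quotes are NOT stripped, so
    "PB.98879.2" (with quotes) is a different string from PB.98879.2.
    """
    ids = []
    for i, line in enumerate(expr_lines):
        if i == 0 or not line.strip():
            continue
        ids.append(line.rstrip("\n").split("\t")[0])
    return ids


def unmatched_ids(gtf_lines, expr_lines):
    """Expression-table IDs with no exact match among the GTF transcript IDs."""
    known = gtf_transcript_ids(gtf_lines)
    return [tid for tid in expression_ids(expr_lines) if tid not in known]
```

Called with open file handles, e.g. `unmatched_ids(open(gtf_path), open(tsv_path))` where the paths are placeholders, it should return an empty list once both files use the same IDs; any entries it does return are IDs that an exact string comparison would not match.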


TinyTasy · 2023-04-12T11:24:51Z

Hello Eduardo!

Thank you for your quick reply.
It's almost embarrassing, but yes, the error lay in the expression file: I only had to remove the double quotes (" ").
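For anyone hitting the same problem, here is a minimal Python sketch of that fix. It assumes the quotes only wrap the transcript IDs and sample names and never appear inside an expression value; `strip_quotes` and `clean_expression_file` are hypothetical helpers, not part of SUPPA. If the table was exported from R with `write.table`, which quotes strings and row/column names by default, passing `quote = FALSE` there avoids the quotes in the first place.

```python
def strip_quotes(lines):
    """Yield tab-separated lines with surrounding double quotes removed from each field.

    Assumes quotes only wrap IDs and sample names, never appear inside a value.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(field.strip('"') for field in fields) + "\n"


def clean_expression_file(src_path, dst_path):
    """Write a copy of the expression table at src_path without the quotes."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_quotes(src))
```

The cleaned file can then be passed to `psiPerIsoform` via `-e` in place of the original.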

I am grateful for your help, thank you!

Sincerely,
Tasy
