Different data processing on metabolomics, I get different R2. #60

Chenjiani1112 · 2020-10-24T10:34:11Z

Hi.
I have three omics datasets: RNA-seq (vst normalization), DNA methylation (beta values) and plasma metabolomics.
I normalized my metabolite data by the total sum of all detected ions and removed unstable metabolites using QC. I then removed outliers from the retained metabolites using the IQR, normalized samples by the median, and applied Pareto scaling to the plasma metabolites.
Finally, I used the RNA-seq, DNA methylation and plasma metabolite data as input to run MOFA.
However, the results showed that all latent factors explained about 0% of the variance in the plasma metabolomics.
Then I log-transformed my plasma metabolite data and applied Pareto scaling. This MOFA result (plasma metabolites with log transform) was dramatically different from the previous one (plasma metabolites without log transform): all latent factors explained about 10% of the variance in the plasma metabolomics.

I am confused about how the metabolomics data should be prepared as input.
Thanks.

rargelaguet · 2020-10-29T11:15:31Z

Hi @Chenjiani1112,
you have to use the log-transformed values for the plasma metabolites. MOFA needs the data to be normal-ish distributed.

P.S. This MOFA version is deprecated. Please move to MOFA v2 (https://biofam.github.io/MOFA2/)
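
A minimal sketch of why the log transform matters here, using only the Python standard library. The intensities are simulated (a hypothetical log-normal metabolite), and `skewness` and `pareto_scale` are illustrative helpers, not MOFA functions. Pareto scaling only centers and rescales the data, so it leaves skewness unchanged; only the log step brings the distribution closer to normal.

```python
import math
import random
import statistics


def skewness(values):
    """Third standardized moment (population form); 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def pareto_scale(values):
    """Mean-center, then divide by the square root of the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]


# Hypothetical intensities for one metabolite across 200 samples,
# drawn from a log-normal distribution to mimic right-skewed data.
rng = random.Random(0)
raw = [rng.lognormvariate(10.0, 1.0) for _ in range(200)]

# Log-transform first, then scale.
logged = [math.log2(v) for v in raw]

skew_raw = skewness(pareto_scale(raw))
skew_log = skewness(pareto_scale(logged))
print(f"skewness, Pareto only:       {skew_raw:.2f}")
print(f"skewness, log2 then Pareto:  {skew_log:.2f}")
```

On simulated data like this, the log-transformed values should show a skewness much closer to zero than the untransformed ones.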

Chenjiani1112 · 2020-10-29T11:26:03Z

Thanks for your help!

Chenjiani1112 · 2020-11-14T05:05:28Z

Hi. Thanks for resolving my doubts. Now I have another problem. When I log-transformed my metabolomics data, many of the resulting values were below 0. I think this could strongly influence my MOFA result.

Thanks

nvall · 2020-11-14T13:59:03Z

Hi @Chenjiani1112,
This may be caused by values between 0 and 1. If that is the case, you may want to normalize with another transformation, or adjust the values between 0 and 1 depending on the original distribution of your data (e.g. defining the minimum as 1).
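
A small sketch of the point above, with made-up intensities. Option A (a pseudocount of 1) and option B (rescaling so the smallest value becomes 1) are two possible readings of "another transformation" and "defining the minimum as 1"; neither is confirmed as the intended method, and option B assumes all values are strictly positive.

```python
import math

# Hypothetical normalized intensities; some fall between 0 and 1.
values = [0.02, 0.4, 0.9, 1.0, 3.5, 120.0]

# Plain log2: every value below 1 maps to a negative number.
plain = [math.log2(v) for v in values]

# Option A: add a pseudocount of 1 before the log, so the result is never negative.
shifted = [math.log2(v + 1.0) for v in values]

# Option B: rescale so the smallest value becomes 1, then take the log.
smallest = min(values)
rescaled = [math.log2(v / smallest) for v in values]

print("negative values after plain log2:", sum(v < 0 for v in plain))
print("minimum after pseudocount:", min(shifted))
print("minimum after rescaling:", min(rescaled))
```

Note that mean-centering, which is part of Pareto scaling, produces negative values in any case, so the sign of the logged values matters less than the shape of their distribution.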

Chenjiani1112 · 2020-11-14T14:03:29Z

Hi

Thanks!

Chenjiani1112 · 2020-11-30T04:38:07Z

Hi @rargelaguet,
Thanks for helping me resolve my earlier confusion. I appreciate your published article on MOFA and the MOFA documentation and tutorials. However, I now have another question about running MOFA. As I mentioned earlier, I have three omics datasets: RNA-seq, DNA methylation and plasma metabolomics. I know you used vst data for RNA-seq and M-values for DNA methylation. Because of my research design, I want to use log2FPKM for RNA-seq, beta values for DNA methylation, and quantile-normalized, log2-transformed, Pareto-scaled data for plasma metabolomics. Can I use log2FPKM RNA-seq data as input to run MOFA? I ask because your MOFA tutorials recommend log-normalised RNA-seq data or M-values for bulk methylation data.

Looking forward to your reply.
Thanks!

Best,
Chen.

rargelaguet · 2020-11-30T07:30:11Z

Hi Chen,
the important requirement for MOFA is that the data needs to be continuous. Also, the closer it looks to a Gaussian distribution the better, but this is not necessary. Can you attach here a histogram of your matrices before and after normalisation? Then it will be easier to provide guidance.
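
One way to produce the requested before/after comparison without any plotting library is a text histogram. This is only a sketch: the matrix is simulated, and `histogram_counts` and `print_histogram` are hypothetical helpers written for this example.

```python
import math
import random


def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def print_histogram(title, values, bins=10, max_bar=40):
    """Print one row of '#' characters per bin, scaled to the fullest bin."""
    counts = histogram_counts(values, bins)
    peak = max(counts)
    print(title)
    for c in counts:
        print("#" * round(max_bar * c / peak))


# Hypothetical 50 x 10 matrix (samples x metabolites) of right-skewed values.
rng = random.Random(1)
matrix = [[rng.lognormvariate(0.0, 1.0) for _ in range(10)] for _ in range(50)]

# Flatten the matrix so all entries go into a single histogram.
before = [x for row in matrix for x in row]
after = [math.log2(x) for x in before]

print_histogram("before log2", before)
print_histogram("after log2", after)
```

For right-skewed data, the "before" histogram piles up in the first bins with a long tail, while the "after" histogram should look roughly bell-shaped.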

Chenjiani1112 mentioned this issue Nov 30, 2020

Thanks. another problem on running MOFA #61

Closed
