Different data processing on metabolomics, I get different R2. #60

Open
Chenjiani1112 opened this issue Oct 24, 2020 · 7 comments

Comments

@Chenjiani1112

Chenjiani1112 commented Oct 24, 2020

Hi.
I have three omics datasets: RNA-seq (VST-normalized), DNA methylation (beta values) and plasma metabolomics.
For the metabolomics data, I normalized by the total sum of all detected ions, removed unstable metabolites based on the QC samples, and removed outliers among the retained metabolites using the IQR. I then normalized samples by the median and applied Pareto scaling to the plasma metabolites.
Finally, I ran MOFA with the RNA-seq, DNA methylation and plasma metabolomics data as input.
However, the results showed that all latent factors explain about 0% of the variance in plasma metabolomics.
I then log-transformed the plasma metabolite data and applied Pareto scaling. This MOFA result (plasma metabolites with log transform) differed dramatically from the previous one (plasma metabolites without log transform): all latent factors now explain about 10% of the variance in plasma metabolomics.

I am confused about how the metabolomics data should be prepared as input.
Thanks.
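
For readers following along, here is a minimal sketch of the two variants described above (Pareto scaling with and without a prior log transform). It assumes the common definition of Pareto scaling, mean-centering followed by division by the square root of the standard deviation. The intensities and the function names `log2_transform` and `pareto_scale` are hypothetical and not taken from the poster's pipeline.

```python
import math
import statistics


def log2_transform(values):
    """Log2-transform strictly positive intensities."""
    return [math.log2(v) for v in values]


def pareto_scale(values):
    """Mean-center, then divide by the square root of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]


# Hypothetical intensities for one metabolite across six samples;
# one sample carries a much larger value than the rest.
raw = [120.0, 150.0, 135.0, 980.0, 110.0, 160.0]

scaled_raw = pareto_scale(raw)                  # Pareto scaling only
scaled_log = pareto_scale(log2_transform(raw))  # log2 first, then Pareto scaling

print("max without log:", round(max(scaled_raw), 2))
print("max with log:   ", round(max(scaled_log), 2))
```

In this toy example the single large intensity dominates the scaled values unless the log transform is applied first, which is one plausible reason the two runs behave so differently.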

@rargelaguet
Contributor

Hi @Chenjiani1112,
you have to use the log-transformed values for the plasma metabolites. MOFA needs the data to be approximately normally distributed.

P.S. This MOFA version is deprecated. Please move to MOFA v2 (https://biofam.github.io/MOFA2/)
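
One rough way to check whether a log transform moves a feature closer to a symmetric, normal-like shape is to compare its skewness before and after. The sketch below uses hypothetical right-skewed intensities and a hand-written `skewness` helper (population skewness, the mean cubed z-score). It is an illustration only and not part of MOFA.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for symmetric data)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed intensities for one metabolite.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 16.0, 64.0, 256.0]
logged = [math.log2(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print("skewness before log2:", round(skew_raw, 2))
print("skewness after log2: ", round(skew_log, 2))
```

A skewness closer to 0 after the transform suggests the feature is better suited to a Gaussian model, although it does not replace looking at the full distribution.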

@Chenjiani1112
Author

Thanks for your help!

@Chenjiani1112
Author

Hi. Thanks for resolving my doubts. Now I have another problem: after log-transforming my metabolomics data, a number of values became negative (< 0). I think this could strongly affect my MOFA result.

Thanks

@nvall

nvall commented Nov 14, 2020

Hi @Chenjiani1112,
This may be caused by values between 0 and 1, whose logarithm is negative. If that is the case, you may want to normalize with another transformation, or adjust the values between 0 and 1 depending on the original distribution of your data (e.g. by defining the minimum as 1).
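
A small sketch of the point above, using hypothetical intensities. It shows that a plain log2 is negative for values between 0 and 1, and illustrates two possible readings of the suggestion: adding a pseudocount of 1 before the log, or flooring values at 1. Both are assumptions about what "defining the minimum as 1" could mean, not a recommendation from the MOFA authors.

```python
import math

# Hypothetical normalized intensities, some of them between 0 and 1.
values = [0.0, 0.25, 0.5, 1.0, 4.0, 15.0]

# Plain log2 is negative for 0 < x < 1 and undefined at 0, so zeros are skipped.
plain = [math.log2(v) for v in values if v > 0]

# Reading A (assumption): add a pseudocount of 1, so the smallest result is 0.
shifted = [math.log2(v + 1.0) for v in values]

# Reading B (assumption): floor every value at 1 before taking the log.
floored = [math.log2(max(v, 1.0)) for v in values]

print("plain:  ", [round(x, 2) for x in plain])
print("shifted:", [round(x, 2) for x in shifted])
print("floored:", [round(x, 2) for x in floored])
```

Note that flooring discards all variation below 1, while the pseudocount compresses it. Also, mean-centering steps such as Pareto scaling produce negative values in any case, so negative values by themselves are not necessarily a sign that something went wrong.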

@Chenjiani1112
Author

Hi

Thanks!

@Chenjiani1112
Author

Hi @rargelaguet,
Thanks for helping me resolve my earlier confusion. I appreciate your published article on MOFA and the MOFA documentation and tutorials. However, I now have another question about running MOFA. As I mentioned earlier, I have three omics datasets: RNA-seq, DNA methylation and plasma metabolomics. I know you used VST data for RNA-seq and M-values for DNA methylation. Because of my research design, I want to use log2FPKM for RNA-seq, beta values for DNA methylation, and quantile-normalized, log2-transformed and Pareto-scaled data for plasma metabolomics. Can I use log2FPKM RNA-seq data as input to MOFA? I ask because the MOFA tutorials recommend log-normalised RNA-seq data or M-values for bulk methylation data.

Looking forward to your reply.
Thanks!

Best,
Chen.
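
For context on the beta value versus M-value question, the two are related by the standard logit transform, M = log2(beta / (1 - beta)). The sketch below converts hypothetical beta values to M-values. The function name `beta_to_m` and the clipping constant `eps` are assumptions added here to avoid taking the log of 0, and the snippet does not settle which scale is preferable for a given study.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value in [0, 1] to an M-value (base-2 logit)."""
    # eps is an assumed guard so that beta = 0 or beta = 1 does not hit log(0).
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


# Hypothetical beta values for four CpG sites.
betas = [0.05, 0.5, 0.8, 0.95]
m_values = [beta_to_m(b) for b in betas]

for b, m in zip(betas, m_values):
    print(f"beta={b:.2f} -> M={m:.2f}")
```

Beta values are bounded between 0 and 1, whereas M-values are unbounded and symmetric around 0, which is the usual argument for M-values when a model assumes Gaussian noise.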

@rargelaguet
Contributor

Hi Chen,
the important requirement for MOFA is that the data are continuous. The closer the data look to a Gaussian distribution the better, but this is not strictly necessary. Can you attach a histogram of your matrices before and after normalisation? That will make it easier to provide guidance.
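
A quick way to produce the requested before/after comparison without any plotting library is a text histogram. The sketch below is one possible approach using hypothetical intensities. The `histogram` helper and the choice of five equal-width bins are assumptions for illustration, and in practice a plotting package would give a clearer picture.

```python
import math


def histogram(values, bins=5):
    """Count values in equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts


# Hypothetical intensities for one metabolite, before and after log2.
raw = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 8.0, 16.0, 64.0, 256.0]
logged = [math.log2(v) for v in raw]

for label, data in (("raw", raw), ("log2", logged)):
    print(label)
    for count in histogram(data):
        print("  " + "#" * count)
```

In this toy data the raw values pile up in the first bin with a long right tail, while the log2 values are spread more evenly across bins.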
