Supporting data for "A probabilistic multi-omics data matching method for detecting sample errors in integrative analysis"
Application to simulated datasets suggests that proMODMatcher achieved robust statistical power even when the number of cis-associations was small and/or the number of samples was large. Application of proMODMatcher to multi-omics datasets in The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) identified sample errors in multiple cancer datasets. The procedure was able not only to identify sample labeling errors but also to unambiguously identify the source of the errors.
Our results indicate that sample labeling errors were common in large multi-omics datasets. These errors should be corrected before integrative analysis.
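
The general idea of matching samples across data types through cis-associated feature pairs can be illustrated with a minimal sketch. This is not the proMODMatcher implementation: it replaces the probabilistic model with a plain similarity score, and every name in it (`similarity_matrix`, `flag_mismatches`, `demo`) as well as the synthetic data are assumptions made for illustration. It assumes two matrices with samples in rows and cis-associated feature pairs in matching columns, and flags a sample when its best match in the other data type is a different sample that also matches it back.

```python
import numpy as np


def zscore_columns(x):
    """Standardize each feature (column) across samples."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def similarity_matrix(a, b):
    """S[i, j] = mean product of z-scores of sample i in `a` and sample j in `b`.

    Columns of `a` and `b` are assumed to be cis-associated feature pairs
    (hypothetical example: methylation and expression of the same gene).
    """
    za, zb = zscore_columns(a), zscore_columns(b)
    return za @ zb.T / a.shape[1]


def flag_mismatches(a, b):
    """Return {sample index in a: best-matching index in b} for suspected errors.

    A sample is flagged only if its best match is not itself and the match
    is reciprocal (the flagged row of `b` also prefers that sample of `a`).
    """
    s = similarity_matrix(a, b)
    best = s.argmax(axis=1)        # best row of b for each sample of a
    reciprocal = s.argmax(axis=0)  # best sample of a for each row of b
    return {
        i: int(best[i])
        for i in range(s.shape[0])
        if best[i] != i and reciprocal[best[i]] == i
    }


def demo():
    """Synthetic data: 30 samples, 200 feature pairs, one deliberate label swap."""
    rng = np.random.default_rng(0)
    n_samples, n_pairs = 30, 200
    a = rng.normal(size=(n_samples, n_pairs))
    b = a + 0.5 * rng.normal(size=(n_samples, n_pairs))  # correlated profile
    b[[3, 7]] = b[[7, 3]]  # swap the labels of samples 3 and 7 in `b`
    return flag_mismatches(a, b)


if __name__ == "__main__":
    print(demo())
```

In this synthetic setup the only flagged samples should be the two whose labels were swapped. A reciprocal best-match rule is used here because it is simple to verify; it gives no calibrated probability of a match, which is the part the probabilistic method is meant to supply.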
