Supporting data for "KOREF_S1: the phased, parental Trio-binned Korean reference genome using long-reads and Hi-C sequencing methods"
We produced 705 Gb of ONT reads and 114 Gb of PacBio HiFi reads, and corrected the ONT reads using the PacBio HiFi reads. The corrected ultra-long reads reached a base-error rate of 1.4%, a higher accuracy than the previous version, KOREF_S1v1.0, which was built mainly from short reads. Because parental genome data are available for KOREF, we phased the genome using trio binning and obtained a near-complete haploid assembly. The final assembly has a total length of 2.9 Gb and an N50 of 150 Mb, and its longest scaffold covers 97.3% of GRCh38 chromosome 2. The final assembly also shows high base accuracy, with a base-error rate below 0.01%.
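Two computational ideas behind these numbers, trio binning and the N50 contiguity statistic, can be illustrated with a short sketch. This is not the pipeline used to build KOREF_S1v2.1. It is a simplified illustration that assumes toy in-memory reads and hypothetical helper names (`kmers`, `parent_specific_kmers`, `bin_read`, `n50`). Real trio-binning tools count k-mers from parental short reads with frequency thresholds and scale scores by haplotype-specific k-mer set sizes, which this sketch omits.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(paternal_reads: list[str], maternal_reads: list[str], k: int):
    """Hypothetical helper: k-mers seen in one parent's reads but not the other's."""
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    return pat - mat, mat - pat


def bin_read(read: str, pat_only: set[str], mat_only: set[str], k: int) -> str:
    """Assign a child read to the parent whose specific k-mers it shares more often."""
    read_kmers = kmers(read, k)
    p = len(read_kmers & pat_only)
    m = len(read_kmers & mat_only)
    if p > m:
        return "paternal"
    if m > p:
        return "maternal"
    return "unassigned"  # ties (including no informative k-mers) stay unbinned


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


pat_only, mat_only = parent_specific_kmers(["AAAACCCC"], ["GGGGTTTT"], k=4)
print(bin_read("AAAACCCG", pat_only, mat_only, k=4))  # paternal
print(n50([100, 80, 50, 30, 20]))  # 80
```

In the N50 example the total is 280; the two longest sequences (100 + 80 = 180) are the first to reach half of it, so N50 is 80.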
KOREF_S1v2.1 is the first chromosome-scale haploid assembly of the Korean reference genome with high contiguity and accuracy. Our study provides a useful Korean reference genome resource and demonstrates a new hybrid assembly strategy that combines ONT PromethION reads with PacBio HiFi (circular consensus sequencing) reads.
