Exploring the Human Pan-Genome using Long-Read Sequencing

The human reference genome is a blueprint of the genetic information that is essential for the development and functioning of the human body. The first draft of the human genome was published in 2001, and genome sequencing technology has advanced tremendously since then. With the advent of long-read (third-generation) sequencing technology, scientists are now exploring the human pan-genome, which aims to capture the genetic information of the entire human population.

Development of Human Pan-genome

The currently available reference sequence of the human genome is becoming obsolete. Since the beginning of the genome era, scientists have used a single "reference" genome as the foundation for a variety of human genetic analyses. The human pan-genome, by contrast, refers to the entire collection of genes, non-coding regions, and structural variants found across the human species and each of its subpopulations. Using a pan-genome may reduce the bias in genetic research that comes from relying on a single reference genome, because it captures genetic variation across many individuals rather than within one [1]. With a complete human pan-genome sequence that surpasses the biased standard reference (GRCh38.p13), researchers can better understand disease and develop more effective treatments.

Long-read sequencing technologies, such as Pacific Biosciences’ Single Molecule Real-Time (SMRT) sequencing and Oxford Nanopore Technologies’ nanopore sequencing, have sparked a genomics revolution by allowing the sequencing of longer DNA segments than ever before. These third-generation sequencing technologies play a pivotal role in constructing the human pan-genome through several key contributions:

  1. Bridge the gaps in repetitive and complex regions of the genome with the benefit of longer read length (15–20 kb), resulting in improved continuity and accuracy.
  2. Facilitate the detection and analysis of structural variations (SVs) and gene rearrangements with high read accuracy (90% of bases at ≥Q30), shedding light on genomic diversity.
  3. Aid in the assembly of the genome by connecting fragmented regions, yielding a more comprehensive pan-genome.
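For readers unfamiliar with the "≥Q30" notation in the list above, the sketch below shows the standard Phred relationship between a quality score Q and the probability that a base call is wrong, P = 10^(-Q/10). The function names are illustrative choices made for this example and are not taken from any sequencing vendor's software.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability back into a Phred quality score."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        p = phred_to_error_probability(q)
        # Per-base accuracy is simply 1 minus the error probability.
        print(f"Q{q}: error probability {p:.4f}, accuracy {(1 - p) * 100:.2f}%")
    # Q10: error probability 0.1000, accuracy 90.00%
    # Q20: error probability 0.0100, accuracy 99.00%
    # Q30: error probability 0.0010, accuracy 99.90%
```

Under this definition, a base at Q30 has a 1-in-1,000 chance of being wrong, which is why Q30 is commonly used as a threshold for high-accuracy reads.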

The current reference genome cannot represent the diversity of human populations. It is essentially a consensus sequence produced from a small number of individuals, and it does not capture the full extent of genetic variation that exists within and between populations. In 2021, researchers built a pan-genome to represent the genetic diversity within Han Chinese genomes.[2] The Chinese pan-genome project, powered by Novogene, used genomic reads from 486 Han Chinese individuals to identify 276 Mbp of DNA sequence, validating the Chinese pan-genome as a representative source of population-specific DNA missing from GRCh38. These newly identified sequences are potential supplements to the current human reference. Owing to advances in long-read sequencing technologies, it has become feasible to construct the human pan-genome, a development that holds great promise for diverse applications in medical research.[3]
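The idea of finding population-specific sequence that is missing from a reference can be illustrated with a deliberately simplified toy model. The sketch below treats each genome as a set of labelled sequence segments. The segment labels, sample names, and function names are all hypothetical, and real pipelines work from read alignment and assembly rather than set operations.

```python
# Hypothetical toy data: each genome is modelled as a set of segment labels.
reference = {"segA", "segB", "segC"}
individuals = {
    "sample1": {"segA", "segB", "segC", "segD"},
    "sample2": {"segA", "segB", "segD", "segE"},
    "sample3": {"segA", "segB", "segC"},
}


def build_pan_genome(reference: set, individuals: dict) -> set:
    """Union of the reference and every segment seen in any individual."""
    pan = set(reference)
    for segments in individuals.values():
        pan |= segments
    return pan


def non_reference_segments(reference: set, individuals: dict) -> set:
    """Segments present in at least one individual but absent from the reference."""
    return build_pan_genome(reference, individuals) - reference


def core_segments(individuals: dict) -> set:
    """Segments shared by every individual in the cohort."""
    return set.intersection(*individuals.values())


if __name__ == "__main__":
    print(sorted(non_reference_segments(reference, individuals)))  # ['segD', 'segE']
    print(sorted(core_segments(individuals)))                      # ['segA', 'segB']
```

In this toy cohort, `segD` and `segE` play the role of sequence missing from the reference, while `segC` shows that a reference segment need not be present in every individual.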

The Potential Application of Human Pan-genome Research

Pan-genomes, which are thorough representations of the genetic diversity within a species, can now be constructed using long-read sequencing, which has become a potent tool in this process. Traditional reference genomes, which are often derived from a single individual and do not fully represent the genetic variety within a population, are less precise and comprehensive than pan-genomes. Pan-genome construction based on long-read sequencing has a number of potential uses in the study of human populations and genomes. [4]

  • Comprehensive genome coverage: Traditional single reference genomes can only represent the genome of one or a few individuals, whereas a pan-genome considers a broader population diversity. It includes genome variations and polymorphisms from multiple individuals, providing a more comprehensive and accurate genome reference.
  • Discovery of human genome diversity: Utilizing the human pan-genome reveals the extensive diversity within the human genome. This helps us better understand the patterns of genetic variations within human populations, population migration history, and the complexity of genome structures. Through the study of the pan-genome, we can gain a more comprehensive understanding of human genome evolution and diversity.[5]
  • Support for rare variants and population-specific research: The pan-genome aids in the identification of rare genetic variants and population-specific genetic variations. For research and diagnosis of rare diseases, relying solely on a single reference genome may not provide sufficient information. The pan-genome allows for better capture of variations between populations, helping to explain the genetic mechanisms underlying rare diseases.

In summary, the use of a pan-genome provides a more comprehensive and accurate genome reference, revealing human genome diversity, supporting personalized medicine, and enabling the study of structural variations. It enhances our understanding of the complexity of the human genome and contributes to advancements in healthcare and disease research.

References

1. Duan, Z., et al., HUPAN: a pan-genome analysis pipeline for human genomes. Genome Biology, 2019. 20(1): p. 1-11.

2. Li, Q., et al., Building a Chinese pan-genome of 486 individuals. Communications Biology, 2021. 4(1): p. 1016.

3. Khamsi, R., A more-inclusive genome project aims to capture all of human diversity. Nature, 2022. 603(7901): p. 378-381.

4. Logsdon, G.A., M.R. Vollger, and E.E. Eichler, Long-read human genome sequencing and its applications. Nature Reviews Genetics, 2020. 21(10): p. 597-614.

5. Greely, H.T., Human genome diversity: what about the other human genome project? Nature Reviews Genetics, 2001. 2(3): p. 222-227.
