Human Genome Project (HGP)
- This mega-project aimed to decode the genetic information contained within human DNA, which serves as the blueprint for our genetic makeup.
- The human genome is estimated to consist of approximately 3 billion base pairs (bp) of DNA.
- In the early stages of the project, the cost of sequencing was around US $3 per bp, leading to a staggering total estimated cost of about 9 billion US dollars.
- To put this into perspective, if the sequence were typed into books with 1,000 letters per page and 1,000 pages per book (one million letters per book), it would take roughly 3,000 to 3,300 such books, depending on the genome-size estimate used, to store the DNA sequence information from a single human cell.
- To tackle the enormous volume of data generated by the project, advanced computational techniques and high-speed computers became essential for data storage, retrieval, and analysis.
- The emergence of a new field in biology known as Bioinformatics played a crucial role in handling and interpreting this vast amount of genetic information.
- The HGP aimed to unravel the genetic code of the human species, providing insights into the fundamental building blocks of life.
- It would shed light on the structure and function of human genes, enabling us to understand the genetic basis of health and disease.
- The project's findings promised advancements in medicine, diagnostics, and personalized healthcare.
The Human Genome Project exemplifies the power of scientific collaboration, technological innovation, and the pursuit of knowledge to unravel the intricacies of our genetic makeup, ultimately benefiting humanity in profound ways.
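The cost and storage figures quoted above are simple arithmetic, and a short sketch makes them easy to check. The function and constant names below are illustrative only; the input values are the ones given in the text. With a round 3 billion bp the book count comes to 3,000, while the 3,300 figure corresponds to a genome of about 3.3 billion letters.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# All names here are illustrative; the values come from the notes above.
GENOME_SIZE_BP = 3_000_000_000   # approximate size of the human genome
COST_PER_BP_USD = 3              # early sequencing cost per base pair
LETTERS_PER_PAGE = 1_000
PAGES_PER_BOOK = 1_000


def total_cost_usd(genome_bp: int, cost_per_bp: int) -> int:
    """Total sequencing cost if every base pair costs the same amount."""
    return genome_bp * cost_per_bp


def books_needed(genome_bp: int, letters_per_page: int, pages_per_book: int) -> int:
    """Number of books needed to print one letter per base (rounded up)."""
    letters_per_book = letters_per_page * pages_per_book
    return -(-genome_bp // letters_per_book)  # ceiling division


if __name__ == "__main__":
    cost = total_cost_usd(GENOME_SIZE_BP, COST_PER_BP_USD)
    print(f"Estimated cost: US ${cost:,}")
    print("Books for 3.0 billion bp:",
          books_needed(GENOME_SIZE_BP, LETTERS_PER_PAGE, PAGES_PER_BOOK))
    print("Books for 3.3 billion bp:",
          books_needed(3_300_000_000, LETTERS_PER_PAGE, PAGES_PER_BOOK))
```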
Goals of HGP
- Identify and catalog all the genes in human DNA, estimated at around 20,000-25,000.
- Determine the precise sequences of the 3 billion chemical base pairs that make up human DNA.
- Store the resulting genetic information in comprehensive databases for access and research.
- Improve the tools and techniques used to analyze the vast amount of genetic data.
- Transfer knowledge and technologies derived from the project to other sectors, including industry, to stimulate innovation and applications.
- Address the ethical, legal, and social issues arising from the project's outcomes, such as privacy and genetic discrimination.
- The HGP was a 13-year collaborative effort coordinated by the U.S. Department of Energy and the National Institutes of Health.
- It gained significant support from international partners, including the Wellcome Trust (U.K.), Japan, France, Germany, China, and others.
- The project was completed in 2003.
- Understanding the effects of DNA variations among individuals holds the promise of innovative approaches to diagnosing, treating, and potentially preventing numerous human disorders.
- Beyond human biology, studying the DNA sequences of non-human organisms offers insights into their natural capabilities, which can be harnessed to address challenges in diverse fields such as healthcare, agriculture, energy production, and environmental remediation.
- Several non-human model organisms, including bacteria, yeast, nematodes, fruit flies, and plants, have also been sequenced, expanding our knowledge of genetics and biology.
Methodologies
1. Expressed Sequence Tags (ESTs):
- One approach focused on identifying all the genes that are expressed as RNA; it is referred to as Expressed Sequence Tags (ESTs).
- This method focused on capturing the RNA molecules produced from actively expressed genes.
2. Whole Genome Sequencing:
- The second approach involved the comprehensive sequencing of the entire genome, encompassing both coding and non-coding regions.
- Later, different regions within the sequenced genome were assigned specific functions through a process known as Sequence Annotation.
- DNA was extracted from a cell and fragmented into smaller, more manageable pieces. DNA is a long polymer, and sequencing very long pieces posed technical challenges.
- These fragments were cloned into suitable hosts, primarily bacteria and yeast, using specialized vectors such as bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).
- Cloning amplified each DNA fragment, facilitating subsequent sequencing.
- Automated DNA sequencers, based on Frederick Sanger's sequencing method, were employed to determine the sequences of these fragments.
- Overlapping regions within the sequences were crucial for assembly, as these overlaps allowed fragments to be aligned properly.
- Due to the vast quantity of data, computer-based programs and algorithms were developed to aid in sequence alignment and assembly.
- After sequencing, the obtained DNA sequences were annotated and assigned to specific locations on each chromosome.
- The sequencing of Chromosome 1, the last of the 24 human chromosomes (22 autosomes and X and Y), was completed in May 2006.
- Building genetic and physical maps of the genome was another challenge; it was accomplished using information on polymorphisms in restriction endonuclease recognition sites and on repetitive DNA sequences such as microsatellites.
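The overlap-based assembly step described above can be illustrated with a toy example. The following is a minimal sketch of a greedy overlap merge, not the assembly software actually used by the HGP: the function names, the `min_len` threshold, and the sample fragments are all assumptions made for illustration, and real assemblers must also handle sequencing errors, reverse complements, and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining contigs (ideally a single reconstructed sequence).
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        # Join the two fragments, keeping the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Hypothetical overlapping fragments of the sequence ATGCGTACGTTAGC.
    reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(reads))
```

The quadratic pairwise comparison is acceptable for a handful of fragments but would be far too slow at genome scale, which is one reason dedicated alignment and assembly programs had to be developed.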
1. Human Genome Project (HGP):
- The Human Genome Project commenced in 1990 with the goal of sequencing the entire human genome.
- Two primary approaches were used to sequence the genome:
  - Expressed Sequence Tags (ESTs): identifying genes expressed as RNA.
  - Whole Genome Sequencing: sequencing the entire genome, including coding and non-coding regions, and assigning functions to regions later.
- Total DNA from a cell was isolated.
- The isolated DNA was broken into random fragments of relatively small size.
- Fragments were cloned into host organisms like bacteria or yeast using specialized vectors.
- Cloning amplified each DNA fragment for easier sequencing.
- DNA fragments were sequenced using automated DNA sequencers based on Frederick Sanger's sequencing method.
- These sequences were arranged based on overlapping regions present in them.
- Specialized computer-based programs were developed for sequence alignment, as manual alignment was impractical.
- These sequences were annotated, and functions were assigned to them.
- Genetic and physical maps of the genome were created using information about polymorphisms of restriction endonuclease recognition sites and repetitive DNA sequences, such as microsatellites.
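The two kinds of landmarks mentioned in the last step, restriction endonuclease recognition sites and microsatellites, can both be located by simple pattern matching. The sketch below is only an illustration under stated assumptions: it uses the EcoRI recognition sequence GAATTC as an example enzyme site, treats a microsatellite as a 2-6 bp unit repeated at least four times in tandem, and runs on a made-up sequence. The function names and thresholds are hypothetical, not taken from the HGP.

```python
import re

ECORI_SITE = "GAATTC"  # recognition sequence of the EcoRI restriction enzyme


def find_sites(sequence: str, site: str = ECORI_SITE) -> list[int]:
    """Return the 0-based start positions of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def find_microsatellites(sequence: str, min_unit: int = 2, max_unit: int = 6,
                         min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Find tandem repeats of short units.

    Returns (start position, repeat unit, number of copies) for each run
    in which a unit of `min_unit`..`max_unit` bases occurs at least
    `min_repeats` times in a row.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(sequence)
    ]


if __name__ == "__main__":
    # Made-up sequence: two EcoRI sites flanking a (CA)n microsatellite.
    seq = "TTGAATTCAGCACACACACAGGAATTCT"
    print(find_sites(seq))            # [2, 21]
    print(find_microsatellites(seq))  # [(10, 'CA', 5)]
```

Because individuals differ in where such sites occur and in how many copies a microsatellite contains, these positions can serve as markers when building genetic and physical maps.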
Salient Features of the Human Genome
1. Genome Size:
- The human genome comprises 3,164.7 million base pairs (bp).
- Genes vary significantly in size. On average, a gene consists of about 3,000 base pairs.
- The largest known human gene is dystrophin, spanning an impressive 2.4 million bases.
- Contrary to earlier estimates of 80,000 to 140,000 genes, the HGP's analysis put the total at approximately 30,000 genes; later refinements lowered this to the 20,000-25,000 range cited in the project goals.
- Remarkably, nearly 99.9 percent of nucleotide bases are identical in all individuals.
- The functions of more than 50 percent of discovered genes remain unknown.
- Less than 2 percent of the human genome encodes proteins.
- A substantial portion of the human genome is composed of repeated sequences.
- These repetitive sequences are stretches of DNA that are repeated many times, sometimes hundreds to thousands of times, and are not believed to have direct coding functions.
- They provide insights into chromosome structure, dynamics, and evolution.
- Chromosome 1 hosts the most genes, with a total of 2,968 genes.
- In contrast, the Y chromosome has the fewest genes, with only 231.
- Scientists have identified approximately 1.4 million locations in the human genome where single-base DNA differences occur; these are known as single nucleotide polymorphisms (SNPs, pronounced "snips").
- This information holds the promise of revolutionizing disease-associated sequence discovery and tracing human evolutionary history.
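To get a feel for the scale of the percentages listed above, they can be converted into absolute numbers of bases. The sketch below uses only figures from the list; the function names are illustrative, and the results are rough estimates rather than measured values.

```python
# Figures taken from the salient features listed above.
GENOME_BP = 3_164_700_000      # 3,164.7 million base pairs
IDENTICAL_FRACTION = 0.999     # bases identical in all individuals
CODING_FRACTION_MAX = 0.02     # upper bound on protein-coding share
KNOWN_SNPS = 1_400_000         # identified SNP locations


def differing_bases(genome_bp: int, identical_fraction: float) -> int:
    """Approximate number of bases that differ between two individuals."""
    return round(genome_bp * (1 - identical_fraction))


def coding_bases_upper_bound(genome_bp: int, coding_fraction: float) -> int:
    """Upper bound on the number of protein-coding bases."""
    return round(genome_bp * coding_fraction)


def average_snp_spacing(genome_bp: int, snp_count: int) -> float:
    """Average distance in bp between SNP locations if spread evenly."""
    return genome_bp / snp_count


if __name__ == "__main__":
    print("Differing bases:", differing_bases(GENOME_BP, IDENTICAL_FRACTION))
    print("Coding bases (at most):",
          coding_bases_upper_bound(GENOME_BP, CODING_FRACTION_MAX))
    print("Average SNP spacing (bp):",
          round(average_snp_spacing(GENOME_BP, KNOWN_SNPS)))
```

So even a 0.1 percent difference amounts to roughly 3 million bases, and the 1.4 million known SNPs correspond to about one every 2,000 to 2,500 bp on average, assuming an even spread.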
The Human Genome Project not only unveiled the vast complexity of our genetic blueprint but also provided critical insights into gene structure, function, and variation. These findings have far-reaching implications for genetics, medicine, and our understanding of human biology.