What species have had their genomes sequenced/are being sequenced?

The Human Genome Project released its first complete genome nearly ten years ago. Since then, many other species have also been sequenced.

I am trying to find a list of completed (and possibly ongoing or initiated) projects sequencing other species, along with some very basic summary data such as the number of genes (divided among the sex chromosomes and autosomes), genome length, number of chromosomes, etc.

This is for a presentation I am giving at a conference, and such a list would make a nice addition to my talk.

The GOLD database (Genomes Online DB) contains data on the sequencing status, and also some stats (number of chromosomes, genome size) -- but this extra data is not available for all species.

There are several lists on Wikipedia, for example for plants, bacteria and eukaryotes.

The Genome 10K project, in its own words, "aims to assemble a genomic zoo: a collection of DNA sequences representing the genomes of 10,000 vertebrate species, approximately one for every vertebrate genus." Here is their species list.

At NCBI, you can find a table with genome information per organism.

For each organism, you can find the kingdom, group and subgroup it belongs to, the genome size in megabases, the number of chromosomes, organelles and plasmids (if present), and the number of assemblies.
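If you download such a table, a few lines of Python can turn it into the kind of per-group summary the question asks for. This is a minimal sketch. It assumes a tab-separated export with hypothetical column names (`organism`, `kingdom`, `group`, `size_mb`, `chromosomes`), and the real export's headers may differ. The three rows are placeholders, not real NCBI records.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows with hypothetical column names; NOT real NCBI data.
SAMPLE_TSV = """organism\tkingdom\tgroup\tsize_mb\tchromosomes
Species A\tEukaryota\tAnimals\t3100.0\t23
Species B\tEukaryota\tPlants\t135.0\t5
Species C\tBacteria\tProteobacteria\t4.6\t1
"""


def summarize_by_kingdom(tsv_text):
    """Return {kingdom: (number of organisms, mean genome size in Mb)}."""
    sizes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        sizes[row["kingdom"]].append(float(row["size_mb"]))
    return {kingdom: (len(v), sum(v) / len(v)) for kingdom, v in sizes.items()}
```

With the placeholder rows, `summarize_by_kingdom(SAMPLE_TSV)` gives `(2, 1617.5)` for Eukaryota and `(1, 4.6)` for Bacteria. The same approach extends to counting chromosomes or assemblies per group.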

Whole-Genome Data Point to Four Species of Giraffe

Ruth Williams
May 6, 2021

ABOVE: Giraffa reticulata in Samburu National Park, Kenya

After performing the most detailed genomic sequence analysis to date of the world’s tallest land animal, researchers argue for the existence of four distinct giraffe species. But their report, published yesterday (May 5) in Current Biology, appears not to have settled the long-standing debate among giraffe experts on precise species numbers, with some still arguing there are likely more species and others fewer.

“This is really state-of-the-art genetic data [and] a tremendous contribution to science,” says evolutionary geneticist Rasmus Heller of the University of Copenhagen who was not involved in the research. “It’s really nice that we finally have whole genome data on this scale for giraffes,” he adds, noting that having numerous genomes representing so many giraffe populations is “not easy to get.” As to whether he thinks the data confirm the existence of four and only four species, he says, “it’s been kind of a contentious issue for a number of years and . . . I prefer to stay a little bit agnostic. . . . I really don’t know, to be honest.”

Since humans started classifying species, the iconic giraffe, which roams the savannahs of Africa munching on trees and towering above all other animals, had been considered a single species. With the advent of genetic sequencing, researchers have proposed six, eight, four and three species of giraffe, with varying numbers of subspecies.

And the debate has been “surprisingly heated,” says Heller. “It’s a problem of life being messy and difficult to pigeonhole, and humans having brains that compulsively put things into pigeonholes,” adds wildlife biologist Derek Lee of Penn State University who did not participate in the study.

“[We wanted] to tackle this problem once and for all,” says evolutionary geneticist Axel Janke of the Senckenberg Biodiversity and Climate Research Centre in Germany.

Janke’s team created a reference genome by de novo sequencing a newly acquired giraffe DNA sample, and used this to align 50 more whole genomes obtained from high-quality re-sequencing of existing giraffe samples and two publicly available giraffe sequences. Forty-three of the samples came from wild populations in 17 locations across the African continent, collected by members of the Giraffe Conservation Foundation (GCF). The remaining eight individuals came from three European zoos. The samples represent members of all suspected species or subspecies of the mammal.

Comparative sequence analyses, examining nearly 200,000 single nucleotide polymorphisms in the animals’ genomes, confirmed Janke’s previous finding that the sequences cluster into four distinct groups, or species. The previous study had been based on only seven genomic loci together with mitochondrial sequences. The team’s whole-genome analysis further suggested that the four lineages had been evolving separately with no significant evidence of hybridization. According to the team, the species are: Giraffa camelopardalis (including the subspecies G. c. antiquorum, G. c. camelopardalis, and G. c. peralta); G. tippelskirchi (including the subspecies G. t. tippelskirchi and G. t. thornicrofti); G. giraffa (including the subspecies G. g. angolensis and G. g. giraffa); and G. reticulata.
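The idea of genomes "clustering into groups" on the basis of SNPs can be illustrated with a toy distance-based grouping. This is not the study's pipeline, which the article does not describe in detail. It is a simple stand-in that assumes each genotype is coded as 0, 1 or 2 copies of the alternate allele at each SNP. The sample names, genotypes and threshold are invented.

```python
from itertools import combinations


def genotype_distance(a, b):
    """Allele-sharing distance between two genotype vectors coded 0/1/2.

    0.0 means identical genotypes at every SNP; 1.0 means opposite
    homozygotes at every SNP.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def single_linkage_groups(samples, threshold):
    """Join any two samples closer than `threshold`; return the groups."""
    parent = {name: name for name in samples}

    def find(name):
        # Follow parent links to the group's representative (union-find).
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(samples, 2):
        if genotype_distance(samples[a], samples[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Invented genotypes for four individuals at six SNPs.
toy_samples = {
    "ind_1": [0, 0, 2, 2, 1, 0],
    "ind_2": [0, 0, 2, 2, 0, 0],
    "ind_3": [2, 2, 0, 0, 1, 2],
    "ind_4": [2, 1, 0, 0, 1, 2],
}
```

With a threshold of 0.3, `single_linkage_groups(toy_samples, 0.3)` separates the toy individuals into two groups, `ind_1`/`ind_2` and `ind_3`/`ind_4`. Real analyses use far more SNPs and model-based methods rather than a single fixed cutoff.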

Hybridization between these species can occur in captivity, which underpinned the earlier argument for a single species of giraffe. But “we don’t see signs of hybridization in the genomes, and by inference we say it does not occur in the wild,” says Janke, and that, he says, supports the biological definition of separate species.


Evolutionary biologist Alexandre Hassanin of Sorbonne University, whose previous analysis had argued for the existence of only three species, disagrees. He writes in an email to The Scientist that hybridization actually does occur between G. reticulata and G. camelopardalis, making them members of the same species. The current paper, he says, “is limited in scope because the authors choose to include only a single wild population of the subspecies reticulata in their genomic analyses.” If they had included “populations of reticulata found in the western and northern parts of its distribution, I am pretty sure that the conclusions would be different,” he adds.

In contrast, molecular biologist Douglas Cavener of Penn State University argues there may be more than just four species. He says that because the team does not disclose the precise locations of the animals sampled, it is possible that some may be members of the same family and therefore genetically very similar to each other. If that is the case, he says, the natural populations may actually be more diverse, with more species and subspecies than this study indicates.

Janke writes in a follow-up email to The Scientist, “The samples are from dart biopsies taken by GCF over the course of a few days, I think. Of course, one can never exclude involving related individuals, but I am very confident by knowing how professionally the GCF team works firsthand that they did their best to avoid sampling from related individuals.”


So, why does knowing the precise number of giraffe species even matter?

“Whether we like it or not, the species is still the basic currency . . . to measure biodiversity,” says Heller, and “the way in which conservation attention and resources are allocated is based on species delimitation. . . . It is the species unit that we care about protecting.”

If there are four species of giraffe instead of one, Heller explains, then “they will each have their own status,” which may strengthen conservation efforts.

Ultimately, says Cavener, it’s a pity there has been so much controversy over this question of species because, when it comes down to it, “everybody, I think, has the same interest in mind, and that is giraffe conservation.”

Scientists sequence genomes of 240 animals to understand evolution at DNA level

A multidisciplinary team of scientists led by Elinor Karlsson, PhD, associate professor of molecular medicine in the Program in Bioinformatics and Computational Biology, has captured biodiversity at a genetic level. By sequencing the genomes of 240 mammalian species, 122 of which had never been sequenced, researchers identified a correlation between reduced genetic diversity and a higher risk of extinction. Further use of these comparative genomes will allow scientists to identify stretches of DNA that have remained unchanged (or conserved) in mammals for millions of years, leading to new insights into human health, disease and biodiversity.

“What we’ve been able to do by sequencing these genomes is capture biodiversity at a genetic level,” said Dr. Karlsson. “Taking this data, we can analyze mammalian genomes across species to see what’s changing or not changing over millions of years in interesting ways across all these genomes. This includes areas of the genome where changes are most likely to lead to disease or illness.”

The data, published in Nature, has already been used to further the understanding of disease and illness. Earlier this year, Karlsson co-authored a Proceedings of the National Academy of Sciences study that used the work to identify species that may be especially vulnerable to human-to-animal transmission of SARS-CoV-2.

To capture a diverse and broad array of species to generate a genomic data set that was useful, Karlsson included at least one species from each eutherian family. Among the species selected are nine that are the sole members of their family and seven that are critically endangered, including the Mexican howler monkey, hirola, Russian saiga, social tuco-tuco, indri, northern white rhinoceros and black rhinoceros. In total, 80 percent of mammalian families are represented in Karlsson’s comparative analysis.

“A lot of these animals can’t be found in zoos,” explained Karlsson. “We could only get DNA samples by going out into the field and finding these species in their native habitat. For species that live in remote places, like the rain forest or deep ocean, getting a DNA sample back to the lab that was of a quality that could be sequenced is a huge challenge.”

Once Karlsson and her team had the sequences, they had to analyze the data. To do this, the various genomes had to be lined up correctly so that corresponding genetic regions could be compared accurately. Comparing 240 genomes, including the human genome, base by base and aligning them all took nine months of cloud computing to reach single-base resolution. “Computationally, this is a huge lift,” said Karlsson.

Once all the data was processed, scientists were able to isolate 3.1 percent of the mammalian genome that was nearly identical between all 240 species.
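To show what "nearly identical between all 240 species" means at the level of an alignment, the sketch below counts alignment columns in which every species carries the same base. It is a deliberate simplification with made-up species names and sequences. Real constraint analyses score each position against a model of neutral evolution rather than requiring exact identity.

```python
def fraction_fully_conserved(alignment):
    """Fraction of alignment columns in which every sequence has the same base.

    alignment: dict mapping a species name to its aligned sequence
    (uppercase, '-' for gaps). Columns containing a gap count as not conserved.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_columns = lengths.pop()
    conserved = sum(
        1
        for column in zip(*alignment.values())
        if "-" not in column and len(set(column)) == 1
    )
    return conserved / n_columns


# Made-up toy alignment of three sequences, six columns each.
toy_alignment = {
    "species_1": "ACGTAC",
    "species_2": "ACGTTC",
    "species_3": "ACG-AC",
}
```

Here four of the six columns are identical in all three sequences, so the function returns 4/6. A gapped column and a column with a substitution both count as not conserved.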

“What this means,” said Karlsson, “is that these DNA sequences were unchanged since the time all these species shared a common ancestor—going back millions and millions of years. This is more than we would expect from random mutations. This would suggest that these areas of DNA are critical to life, and that animals with mutations in these areas tended not to survive long enough to reproduce.”

One of the initial questions Karlsson was able to investigate was how much diversity exists in the genome of a given species.

“If we are looking for early signals that a population might be threatened, and could benefit from intervention from conservation groups, we can find that in the genetic data,” said Karlsson. “Species with less [genetic diversity] are likely to have fewer genetic differences between the DNA inherited from mom and the DNA inherited from dad. These species could be identified using genetic data before population numbers drop precipitously, and prioritized for in-depth study.”
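The differences "between the DNA inherited from mom and the DNA inherited from dad" are commonly summarized as heterozygosity: the share of sites where an individual's two copies differ. This is a minimal sketch with a hypothetical function name and invented calls. It assumes diploid genotypes given as allele pairs, with `None` for sites that could not be called.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called sites where the two inherited alleles differ.

    genotypes: iterable of (allele_from_one_parent, allele_from_other_parent)
    pairs, one per site; None marks a site with no reliable call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called sites")
    heterozygous = sum(1 for a, b in called if a != b)
    return heterozygous / len(called)


# Invented calls at five sites for one individual.
toy_calls = [("A", "A"), ("A", "G"), None, ("C", "C"), ("T", "C")]
```

The toy input gives 0.5 (two heterozygous sites out of four called). Genome-wide values are far smaller, because most sites in a real genome are homozygous. Comparing that fraction across species is what allows low-diversity species to be flagged.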

While looking for areas of similarities between species can lead to insights into human health and disease, Karlsson is also intrigued by genetic differences between species. “If you think of all the things other species can do that humans can’t, like hibernation,” said Karlsson. “Every year, animals that hibernate stock up on calories, they become insulin resistant, and they hibernate. Then they just bounce back. Humans cannot do that. It would be disastrous. What are the genes that control that? What does that mean and how does it relate back to how the human genome works? That’s the ultimate question.”

Scientists Sequence Genomes of 131 Placental Mammal Species

In a study with implications for medicine and biodiversity conservation, a large international consortium of researchers involved in the Zoonomia Project has sequenced and analyzed the genomes of 131 species of placental mammals, bringing the total to 240.

The Zoonomia Project brings the fraction of eutherian mammal families that are represented by at least one assembly to 83%. This image shows the brown-throated sloth (Bradypus variegatus) in the Cahuita National Park, Costa Rica. Image credit: Christian Mehlführer / CC BY 2.5.

The genomics revolution is enabling advances not only in medical research, but also in basic biology and in the conservation of biodiversity, where genomic tools have helped to apprehend poachers and to protect endangered populations.

However, we have only a limited ability to predict which genomic variants lead to changes in organism-level phenotypes, such as increased disease risk — a task that, in humans, is complicated by the sheer size of the genome.

Comparative genomics can address this challenge by identifying nucleotide positions that have remained unchanged across millions of years of evolution, focusing the search for disease-causing variants.

In 2011, the 29 Mammals Project identified genetic regions of evolutionary constraint that in total comprise 4.2% of the genome, by measuring sequence conservation in humans plus 28 other mammals.

These regions proved to be more enriched for the heritability of complex diseases than any other functional mark, including coding status.

By expanding the number of species and making an alignment that is independent of any single reference genome, the Zoonomia Project — formerly called the 200 Mammals Project — was designed to detect evolutionary constraint in the eutherian lineage at increased resolution, and to provide genomic resources for over 130 previously uncharacterized species.

“The comparison of the genomes from the 240 mammals will help geneticists to identify the mutations that lead to human diseases,” said Professor Kerstin Lindblad-Toh, a researcher at Uppsala University, SciLifeLab and the Broad Institute of MIT and Harvard.

The Zoonomia scientists identified genetic innovations that seem to protect certain animals from diseases like cancer and diabetes.

They also pinpointed genomic elements that have remained unchanged across millions of years of evolution, which predict where mutations are likely to be associated with risk of disease, and could reveal new avenues of therapeutic development.

“Before the microscope, we couldn’t see what was going on inside of a cell,” said Dr. Oliver Ryder, Kleberg endowed director of conservation genetics at the San Diego Zoo Institute for Conservation Research.

“Now, we’re viewing life from an entirely new perspective. DNA carries instructions, and now we’re able to read those.”

Phylogenetic tree of the mammalian families in the Zoonomia Project alignment, including both new assemblies and all other high-quality mammalian genomes publicly available in GenBank when the team started the alignment. Existing taxonomic classifications recognize a total of 127 extant families of eutherian mammal, including 43 families that were not previously represented in GenBank (red boxes) and 41 families with additional representative genome assemblies (pink boxes). Of the remaining families, 21 had GenBank genome assemblies but no Zoonomia Project assembly (gray boxes) and 22 had no representative genome assembly (white boxes). Parenthetical numbers indicate the number of species with genome assemblies in a given family. Image credit: Genereux et al., doi: 10.1038/s41586-020-2876-6.
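The family counts in the caption can be cross-checked against the 83% figure quoted earlier for families represented by at least one assembly. This is a short arithmetic check, with grouping labels taken from the caption's box colors.

```python
# Counts of eutherian families taken from the figure caption.
TOTAL_FAMILIES = 127
FAMILY_COUNTS = {
    "first assembly from Zoonomia (red)": 43,
    "additional Zoonomia assembly (pink)": 41,
    "GenBank assembly only (gray)": 21,
    "no assembly (white)": 22,
}

# The four categories should partition all 127 families.
assert sum(FAMILY_COUNTS.values()) == TOTAL_FAMILIES

represented = TOTAL_FAMILIES - FAMILY_COUNTS["no assembly (white)"]
share = represented / TOTAL_FAMILIES
print(f"{represented} of {TOTAL_FAMILIES} families have an assembly ({share:.0%})")
# prints: 105 of 127 families have an assembly (83%)
```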

Beyond aiding the study of the human genome, the new placental mammal genomes can together be used to study how specific species adapt to different environments.

For example, some otters have a thick, water-resistant coat, and some mice, but not all, have adapted to hibernation. These animal traits can help us understand human traits such as metabolic diseases.

With climate change and more animal habitats being affected by human activities, it is becoming more and more important to defend endangered species.

In the new study, animals listed as threatened on the IUCN Red List had less variation in their genomes, which is consistent with their endangered status.

“We hope that our extensive data set, which is available to all scientists in the world, will be used for understanding disease genetics and the protection of biodiversity,” Professor Lindblad-Toh said.

“Genome sequences for endangered species can help identify a species’ extinction risks and steer conservation efforts,” said Dr. Megan Owen, corporate director of wildlife conservation science at San Diego Zoo Global.

“They also give wildlife officials tools to apprehend poachers and wildlife traffickers.”

“One of the most exciting things about the Zoonomia Project is that many of our core questions are accessible to people both within and outside of science,” said Dr. Diane Genereux, a research scientist in the Vertebrate Genomics Group at the Broad Institute of MIT and Harvard.

“By designing scientific projects that are accessible to all, we can ensure benefits for public, human, and environmental health.”

The team’s results were published in the November 12, 2020 issue of the journal Nature.

How Scientists Decide Which Animal Genomes to Sequence

What do African clawed frogs, orangutans and goats all have in common? Geneticists have looked deep, deep inside their genes: These species have had their whole genomes sequenced.


You may have heard about the possibility of getting your own whole genome sequenced. A few years ago, the price of sequencing a human genome dropped to $1,000. It's not pocket change, but nor is it the $2.7 billion it cost to sequence the first human genome. With animals, though, it's more complicated: when no other member of a species has ever been sequenced, there is no reference genome to guide the work, which makes it more difficult to put the genome together.

The roundworm C. elegans became the first animal to have its genome sequenced, in 1998. Since then, better technology for genome sequencing has allowed scientists to move on to significantly more complicated organisms and do the sequencing much more quickly and effectively.

But it's still unlikely that scientists will ever sequence every animal's genome. They have to pick and choose. So where to start?

There’s no single criterion on which this decision is made. Sometimes it’s to raise awareness about the species and its potential benefit for humanity: that was the reason researchers from the National University of Singapore gave when applying for funding to sequence the temple pit viper’s genome earlier this year, writes Samantha Boh for the Singapore Times. The viper is “the only snake species known to produce a toxin called waglerin,” she writes, “a neuromuscular inhibitor which scientists believe could be developed into a muscle relaxant drug.”

Beyond its potential medical benefits, genome sequencing is important to basic scientific and historical understanding of the world. “Nestled in the genomes of living species are the historic footprints of the adaptive events that led them to where they are today,” said Stephen O’Brien, chief of the Laboratory of Genomic Diversity, at a conference.

Studying the present genomes of animals can tell scientists about their past as a species, as well as the history of the environments where they’ve lived and the other species that have lived with them. For example, the genomes of domesticated animals can help explain humanity’s past. Both humans and animals like cows and pigs were changed (and continue to be changed) when part of humanity settled down and started farming. Studying how these animals evolved as they became domesticated helps geneticists understand the factors in ancient human evolution, and it can help explain when exactly the animals were domesticated.

These domestic animals' genomes have much to offer humanity as well. “Accurate reference genomes are important for understanding an organism’s biology, for learning about the genetic causes of health and disease and, in animals, for making breeding decisions,” according to a National Human Genome Research Institute press release.

Sometimes sequencing an animal’s genome helps scientists stay sharp. Canadian researchers who normally work on the human genome sequenced the beaver’s genome earlier this year in celebration of Canada’s 150th birthday. “Most of our efforts are on human genomes,” scientist Stephen Scherer told me. “But it actually stimulates us intellectually to look beyond what we’re doing.” It didn't hurt that the beaver is the national symbol of Canada. After all, sometimes good public relations is as good a reason as any.

Papadum, the San Clemente goat whose genome was reconstructed using a new technique earlier this year. (Brian L. Sayre)

About Kat Eschner

Kat Eschner is a freelance science and culture journalist based in Toronto.


The DNA sequencing methods used in the 1970s and 1980s were manual, for example Maxam-Gilbert sequencing and Sanger sequencing. Several whole bacteriophage and animal viral genomes were sequenced by these techniques, but the shift to more rapid, automated sequencing methods in the 1990s facilitated the sequencing of the larger bacterial and eukaryotic genomes. [10]

The first free-living organism to have its entire genome sequenced was Haemophilus influenzae in 1995. [11] Genomes of other bacteria and some archaea followed, largely because of their small size. H. influenzae has a genome of 1,830,140 base pairs of DNA. [11] In contrast, eukaryotes, whether unicellular like Amoeba dubia or multicellular like humans (Homo sapiens), have much larger genomes (see C-value paradox). [12] Amoeba dubia has a genome of 700 billion nucleotide pairs spread across thousands of chromosomes. [13] Humans have fewer nucleotide pairs than A. dubia (about 3.2 billion in each germ cell; note that the exact size of the human genome is still being revised), but their genome is still far larger than that of any individual bacterium. [14]
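Taking the sizes in the paragraph at face value, a quick calculation shows the scale of the differences. The dictionary and helper names are illustrative.

```python
# Genome sizes in base pairs, as given in the paragraph.
GENOME_SIZES_BP = {
    "Haemophilus influenzae": 1_830_140,
    "Homo sapiens (haploid)": 3_200_000_000,
    "Amoeba dubia": 700_000_000_000,
}


def fold_difference(larger, smaller):
    """How many times larger one genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


human_vs_bacterium = fold_difference("Homo sapiens (haploid)", "Haemophilus influenzae")
amoeba_vs_human = fold_difference("Amoeba dubia", "Homo sapiens (haploid)")
```

The haploid human genome comes out roughly 1,750 times larger than that of H. influenzae. The quoted A. dubia figure is about 219 times larger than the human genome.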

The first bacterial and archaeal genomes, including that of H. influenzae, were sequenced by shotgun sequencing. [11] In 1996, the first eukaryotic genome (Saccharomyces cerevisiae) was sequenced. S. cerevisiae, a model organism in biology, has a genome of only around 12 million nucleotide pairs, [15] and was the first unicellular eukaryote to have its whole genome sequenced. The first multicellular eukaryote, and first animal, to have its whole genome sequenced was the nematode worm Caenorhabditis elegans, in 1998. [16] Eukaryotic genomes are sequenced by several methods, including shotgun sequencing of short DNA fragments and sequencing of larger DNA clones from DNA libraries such as bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs). [17]

In 1999, the entire DNA sequence of human chromosome 22, one of the shortest human autosomes, was published. [18] By 2000, the genome of a second animal (also the second invertebrate, and the first insect) had been sequenced: that of the fruit fly Drosophila melanogaster, a popular model organism in experimental research. [19] The first plant genome, that of the model organism Arabidopsis thaliana, was also fully sequenced by 2000. [20] By 2001, a draft of the entire human genome sequence was published. [21] The genome of the laboratory mouse Mus musculus was completed in 2002. [22]

In 2004, the Human Genome Project published an incomplete version of the human genome. [23] In 2008, a group from Leiden, The Netherlands, reported the sequencing of the first female human genome (Marjolein Kriek).

Cells used for sequencing

Almost any biological sample containing a full copy of the DNA—even a very small amount of DNA or ancient DNA—can provide the genetic material necessary for full genome sequencing. Such samples may include saliva, epithelial cells, bone marrow, hair (as long as the hair contains a hair follicle), seeds, plant leaves, or anything else that has DNA-containing cells.

The genome sequence of a single cell selected from a mixed population of cells can be determined using techniques of single cell genome sequencing. This has important advantages in environmental microbiology in cases where a single cell of a particular microorganism species can be isolated from a mixed population by microscopy on the basis of its morphological or other distinguishing characteristics. In such cases the normally necessary steps of isolation and growth of the organism in culture may be omitted, thus allowing the sequencing of a much greater spectrum of organism genomes. [24]

Single cell genome sequencing is being tested as a method of preimplantation genetic diagnosis, wherein a cell from the embryo created by in vitro fertilization is taken and analyzed before embryo transfer into the uterus. [25] After implantation, cell-free fetal DNA can be taken by simple venipuncture from the mother and used for whole genome sequencing of the fetus. [26]

Early techniques

Sequencing of nearly an entire human genome was first accomplished in 2000 partly through the use of shotgun sequencing technology. While full genome shotgun sequencing for small (4000–7000 base pair) genomes was already in use in 1979, [27] broader application benefited from pairwise end sequencing, known colloquially as double-barrel shotgun sequencing. As sequencing projects began to take on longer and more complicated genomes, multiple groups began to realize that useful information could be obtained by sequencing both ends of a fragment of DNA. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences were oriented in opposite directions and were about the length of a fragment apart from each other was valuable in reconstructing the sequence of the original target fragment.
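The double-barrel idea can be made concrete with a tiny simulator that cuts random fragments from a known sequence and reads both ends. This is a sketch under simplifying assumptions: every fragment has the same length, reads are error-free, and the function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_read_pairs(genome, n_pairs, fragment_len, read_len, seed=0):
    """Yield (start, read1, read2) for random fixed-length fragments.

    read1 is the first read_len bases of the fragment on the forward strand;
    read2 is read inward from the other end on the opposite strand, i.e. the
    reverse complement of the fragment's last read_len bases.
    """
    rng = random.Random(seed)
    for _ in range(n_pairs):
        start = rng.randrange(0, len(genome) - fragment_len + 1)
        fragment = genome[start:start + fragment_len]
        yield start, fragment[:read_len], reverse_complement(fragment[-read_len:])
```

Each simulated pair carries exactly the two constraints described above. The reads lie on opposite strands, and read 2 starts `fragment_len - read_len` bases after read 1. An assembler can use these constraints to order and orient pieces of sequence that single reads cannot connect.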

The first published description of the use of paired ends was in 1990 as part of the sequencing of the human HPRT locus, [28] although the use of paired ends was limited to closing gaps after the application of a traditional shotgun sequencing approach. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was in 1991. [29] In 1995 the innovation of using fragments of varying sizes was introduced, [30] and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets. The strategy was subsequently adopted by The Institute for Genomic Research (TIGR) to sequence the entire genome of the bacterium Haemophilus influenzae in 1995, [31] and then by Celera Genomics to sequence the entire fruit fly genome in 2000, [32] and subsequently the entire human genome. Applied Biosystems, now called Life Technologies, manufactured the automated capillary sequencers utilized by both Celera Genomics and The Human Genome Project.

Current techniques

While capillary sequencing was the first approach to successfully sequence a nearly full human genome, it is still too expensive and takes too long for commercial purposes. Since 2005 capillary sequencing has been progressively displaced by high-throughput (formerly "next-generation") sequencing technologies such as Illumina dye sequencing, pyrosequencing, and SMRT sequencing. [33] All of these technologies continue to employ the basic shotgun strategy, namely, parallelization and template generation via genome fragmentation.
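Because the shotgun strategy samples random fragments, sequencing depth governs how much of the genome ends up covered. The sketch below uses the classic Lander-Waterman (Poisson) approximation, which assumes reads land uniformly at random. The 150-base read length and 3.2 Gb genome size in the usage lines are assumptions for illustration, and real coverage is less even than the model predicts.

```python
import math


def mean_coverage(n_reads, read_len, genome_len):
    """Average sequencing depth C = N * L / G."""
    return n_reads * read_len / genome_len


def reads_needed(target_coverage, read_len, genome_len):
    """Smallest number of reads giving at least the target average depth."""
    return math.ceil(target_coverage * genome_len / read_len)


def expected_uncovered_fraction(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)


# Assumed example: 30x depth, 150-base reads, 3.2 Gb haploid genome.
n_reads = reads_needed(30, 150, 3_200_000_000)
gap_fraction = expected_uncovered_fraction(mean_coverage(n_reads, 150, 3_200_000_000))
```

Under these assumptions, 30× depth needs 640 million reads, and the model predicts a vanishingly small uncovered fraction (about e^-30). In practice, repeats and hard-to-sequence regions leave gaps that the idealized model does not capture.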

Other technologies are emerging, including nanopore technology. Though nanopore sequencing technology is still being refined, its portability and potential capability of generating long reads are of relevance to whole-genome sequencing applications. [34]

Analysis

In principle, full genome sequencing can provide the raw nucleotide sequence of an individual organism's DNA. However, further analysis must be performed to provide the biological or medical meaning of this sequence, such as how this knowledge can be used to help prevent disease. Methods for analysing sequencing data are being developed and refined.

Because sequencing generates a lot of data (for example, there are approximately six billion base pairs in each human diploid genome), its output is stored electronically and requires a large amount of computing power and storage capacity.
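The storage figures follow from simple arithmetic. This is a rough sketch. The 2-bit packing and the raw-read assumptions (30× depth, one byte per base call plus one per quality score, no compression) are illustrative choices, not properties of any particular file format.

```python
def bytes_needed(n_bases, bits_per_base):
    """Storage for n_bases at a fixed number of bits per base (no compression)."""
    return n_bases * bits_per_base / 8


DIPLOID_BASES = 6_000_000_000  # approximate figure quoted in the text

packed = bytes_needed(DIPLOID_BASES, 2)   # A/C/G/T packed into 2 bits each
as_text = bytes_needed(DIPLOID_BASES, 8)  # one ASCII character per base

# Assumed raw-read scenario: 30x depth over a 3.2 Gb haploid genome,
# one byte per base call plus one byte per quality score (16 bits per base).
raw_reads = bytes_needed(30 * 3_200_000_000, 16)
```

This gives 1.5 GB for 2-bit-packed bases, 6 GB as plain text, and about 192 GB of uncompressed raw read data. The raw-read figure in particular shows why storage and computing capacity become a concern.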

While analysis of WGS data can be slow, it is possible to speed up this step by using dedicated hardware. [35]

A number of public and private companies are competing to develop a full genome sequencing platform that is commercially robust for both research and clinical use, [36] including Illumina, [37] Knome, [38] Sequenom, [39] 454 Life Sciences, [40] Pacific Biosciences, [41] Complete Genomics, [42] Helicos Biosciences, [43] GE Global Research (General Electric), Affymetrix, IBM, Intelligent Bio-Systems, [44] Life Technologies, Oxford Nanopore Technologies, [45] and the Beijing Genomics Institute. [46] [47] [48] These companies are heavily financed and backed by venture capitalists, hedge funds, and investment banks. [49] [50]

A commonly referenced commercial target for sequencing cost until the late 2010s was $1,000; however, private companies are working to reach a new target of only $100. [51]

Incentive

In October 2006, the X Prize Foundation, working in collaboration with the J. Craig Venter Science Foundation, established the Archon X Prize for Genomics, [52] intending to award $10 million to "the first team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 1,000,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $1,000 per genome". [53] The Archon X Prize for Genomics was cancelled in 2013, before its official start date. [54] [55]
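The prize criteria quoted above are concrete enough to express as a check. This is a minimal sketch built around a hypothetical `SequencingRun` record. It encodes only the numeric thresholds in the quotation, not whatever validation procedure judges would actually have applied.

```python
from dataclasses import dataclass


@dataclass
class SequencingRun:
    """Hypothetical summary of a competition run."""
    genomes: int
    days: float
    errors: int                    # base errors found against validation data
    bases_checked: int             # bases compared when counting those errors
    genome_fraction_covered: float
    cost_per_genome_usd: float


def meets_archon_thresholds(run: SequencingRun) -> bool:
    """Check only the numeric thresholds quoted in the prize description."""
    return (
        run.genomes >= 100
        and run.days <= 10
        and run.errors * 1_000_000 <= run.bases_checked  # <= 1 error per 1e6 bases
        and run.genome_fraction_covered >= 0.98
        and run.cost_per_genome_usd <= 1_000
    )
```

Multiplying the error count by 1,000,000 and comparing it with the number of bases checked tests the one-in-a-million threshold without floating-point division.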

History

In 2007, Applied Biosystems started selling a new type of sequencer called SOLiD System. [56] The technology allowed users to sequence 60 gigabases per run. [57]

In June 2009, Illumina announced that it was launching its own Personal Full Genome Sequencing Service at a depth of 30× for $48,000 per genome. [58] [59] In August, the founder of Helicos Biosciences, Stephen Quake, stated that using the company's Single Molecule Sequencer he sequenced his own full genome for less than $50,000. [60] In November, Complete Genomics published a peer-reviewed paper in Science demonstrating its ability to sequence a complete human genome for $1,700. [61] [62]

In May 2011, Illumina lowered the price of its Full Genome Sequencing service to $5,000 per human genome, or $4,000 if ordering 50 or more. [63] Helicos Biosciences, Pacific Biosciences, Complete Genomics, Illumina, Sequenom, Ion Torrent Systems, Halcyon Molecular, NABsys, IBM, and GE Global all appear to be competing head to head in the race to commercialize full genome sequencing. [33] [64]

With sequencing costs declining, a number of companies began claiming that their equipment would soon achieve the $1,000 genome: these companies included Life Technologies in January 2012, [65] Oxford Nanopore Technologies in February 2012, [66] and Illumina in February 2014. [67] [68] In 2015, the NHGRI estimated the cost of obtaining a whole-genome sequence at around $1,500. [69] In 2016, Veritas Genetics began selling whole genome sequencing, including a report on some of the information in the sequence, for $999. [70] In 2017, BGI began offering WGS for $600. [72] In summer 2019, Veritas Genetics cut the cost of WGS to $599. [71]

However, some noted in 2015 that effective use of whole genome sequencing can cost considerably more than $1,000. [73] In addition, parts of the human genome had reportedly still not been fully sequenced as of 2017. [74] [75] [76]

DNA microarrays

Full genome sequencing provides orders of magnitude more information about a genome than DNA arrays, the previous leader in genotyping technology.

For humans, DNA arrays currently provide genotypic information on up to one million genetic variants, [77] [78] [79] while full genome sequencing provides information on all six billion bases of the diploid human genome, roughly 6,000 times more data. Full genome sequencing is therefore considered a disruptive innovation for the DNA array market: the accuracy of both technologies ranges from 99.98% to 99.999% (in non-repetitive DNA regions), and the consumables cost of sequencing, $5,000 per 6 billion base pairs, is competitive (for some applications) with that of DNA arrays ($500 per 1 million base pairs). [40]
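The cost comparison above is easier to read per unit of data. The short Python sketch below simply divides the consumables costs quoted in the text by the number of bases or variants each technology reports; it ignores instrument, labour and analysis costs, which the text does not give, so it is an illustration rather than a cost model.

```python
# Consumables costs quoted in the text (instrument, labour and analysis excluded).
WGS_COST_USD = 5_000          # per genome
WGS_BASES = 6_000_000_000     # ~6 billion bases read
ARRAY_COST_USD = 500          # per array
ARRAY_VARIANTS = 1_000_000    # ~1 million variants genotyped

def cost_per_million(cost_usd: int, units: int) -> float:
    """Cost in US dollars per one million bases (or variants)."""
    return cost_usd * 1_000_000 / units

wgs = cost_per_million(WGS_COST_USD, WGS_BASES)
array = cost_per_million(ARRAY_COST_USD, ARRAY_VARIANTS)
print(f"Sequencing: ${wgs:.2f} per million bases")
print(f"Arrays:     ${array:.2f} per million variants")
print(f"Arrays cost ~{array / wgs:.0f}x more per data point")
```

The comparison is per data point only; arrays can remain competitive for some applications because their absolute cost per sample is lower.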

Mutation frequencies

Whole genome sequencing has been used to establish the mutation frequency across whole human genomes. Between human parent and child, the whole-genome mutation frequency is about 70 new mutations per generation. [80] [81] An even lower level of variation was found when whole genome sequences from the blood cells of a pair of 100-year-old monozygotic (identical) twins were compared. [82] Only 8 somatic differences were found, though somatic variation occurring in less than 20% of blood cells would have gone undetected.

In the protein-coding regions of the human genome specifically, an estimated 0.35 mutations that would change the protein sequence arise between parent and child (less than one mutated protein per generation). [83]

In cancer, mutation frequencies are much higher because of genome instability. This frequency can further depend on patient age, exposure to DNA-damaging agents (such as UV irradiation or components of tobacco smoke) and the activity or inactivity of DNA repair mechanisms. [citation needed] Mutation frequency also varies between cancer types: the germline mutation rate is approximately 0.023 mutations per megabase, but the rate is much higher in breast cancer (1.18-1.66 somatic mutations per Mb), lung cancer (17.7) and melanoma (≈33). [84] Since the haploid human genome consists of approximately 3,200 megabases, [85] this translates into about 74 mutations (mostly in noncoding regions) in germline DNA per generation, but 3,776-5,312 somatic mutations per haploid genome in breast cancer, 56,640 in lung cancer and 105,600 in melanoma.
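The per-genome figures follow from multiplying each per-megabase rate by the approximate haploid genome size of 3,200 Mb. A minimal Python sketch of that arithmetic, using only the rates quoted above (the dictionary layout and names are just an illustrative choice):

```python
HAPLOID_GENOME_MB = 3_200  # approximate haploid human genome size in megabases

# (low, high) mutation rates per megabase quoted in the text above.
RATES_PER_MB = {
    "germline (per generation)": (0.023, 0.023),
    "breast cancer": (1.18, 1.66),
    "lung cancer": (17.7, 17.7),
    "melanoma": (33.0, 33.0),
}

def mutations_per_genome(rate_per_mb: float, genome_mb: int = HAPLOID_GENOME_MB) -> float:
    """Expected mutation count = rate per Mb x genome size in Mb."""
    return rate_per_mb * genome_mb

# Round to whole mutations for each (low, high) pair.
estimates = {
    label: (round(mutations_per_genome(lo)), round(mutations_per_genome(hi)))
    for label, (lo, hi) in RATES_PER_MB.items()
}
for label, (lo, hi) in estimates.items():
    count = f"{lo:,}" if lo == hi else f"{lo:,}-{hi:,}"
    print(f"{label}: ~{count} mutations per haploid genome")
```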

The distribution of somatic mutations across the human genome is very uneven, [86] such that the gene-rich, early-replicating regions receive fewer mutations than gene-poor, late-replicating heterochromatin, likely due to differential DNA repair activity. [87] In particular, the histone modification H3K9me3 is associated with high, [88] and H3K36me3 with low mutation frequencies. [89]

Genome-wide association studies

In research, whole-genome sequencing can be used in a genome-wide association study (GWAS), a project that aims to identify the genetic variant or variants associated with a disease or some other phenotype. [90]
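In its simplest case-control form, a GWAS tests each variant separately for a difference in allele frequency between cases and controls. The Python sketch below shows one common per-variant test, a Pearson chi-square test on a 2x2 table of allele counts; the function name and the example counts are hypothetical, and real studies typically use regression models with covariates and must correct for testing millions of variants.

```python
import math

def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference alleles.
    Assumes every row and column total is non-zero. Returns (chi2, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 1,000 alleles in cases and 1,000 in controls.
chi2, p = allelic_chi_square(300, 700, 200, 800)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

With a 30% versus 20% alternate-allele frequency, this example gives a chi-square of about 26.7 and a p-value below 1e-6.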

Diagnostic use

In 2009, Illumina released its first whole genome sequencers approved for clinical, as opposed to research-only, use, and doctors at academic medical centers began quietly using them to try to diagnose patients whom standard approaches had failed to help. [91] In 2009, a team from Stanford led by Euan Ashley performed clinical interpretation of a full human genome, that of bioengineer Stephen Quake. [92] In 2010, Ashley's team reported a whole genome molecular autopsy, [93] and in 2011 it extended the interpretation framework to a fully sequenced family, the West family, who were the first family to be sequenced on the Illumina platform. [94] The price to sequence a genome at that time was US$19,500; this was billed to the patient but usually paid for out of a research grant, and one person at that time had applied for reimbursement from their insurance company. [91] For example, one child had needed around 100 surgeries by the time he was three years old, and his doctor turned to whole genome sequencing to determine the problem. It took a team of around 30 people, including 12 bioinformatics experts, three sequencing technicians, five physicians, two genetic counsellors and two ethicists, to identify a rare mutation in the XIAP gene that was causing widespread problems. [91] [95] [96]

Because of the cost reductions described above, whole genome sequencing has become a realistic application in DNA diagnostics. In 2013, the 3Gb-TEST consortium obtained funding from the European Union to prepare the health care system for these innovations in DNA diagnostics. [97] [98] Quality assessment schemes, health technology assessment and guidelines have to be in place. The 3Gb-TEST consortium identified the analysis and interpretation of sequence data as the most complicated step in the diagnostic process. [99] At the consortium meeting in Athens in September 2014, the consortium coined the word genotranslation for this crucial step, which produces a so-called genoreport. Guidelines are needed to determine the required content of these reports.

Genomes2People (G2P), an initiative of Brigham and Women's Hospital and Harvard Medical School, was created in 2011 to examine the integration of genomic sequencing into the clinical care of adults and children. [100] G2P's director, Robert C. Green, had previously led the REVEAL (Risk EValuation and Education for Alzheimer's Disease) study, a series of clinical trials exploring patient reactions to learning their genetic risk for Alzheimer's disease. [101] [102]

In 2018, researchers at Rady Children's Institute for Genomic Medicine in San Diego, CA, determined that rapid whole-genome sequencing (rWGS) can diagnose genetic disorders in time to change acute medical or surgical management (clinical utility) and improve outcomes in acutely ill infants. The researchers reported a retrospective cohort study of acutely ill inpatient infants in a regional children's hospital from July 2016 to March 2017. Forty-two families received rWGS for etiologic diagnosis of genetic disorders. The diagnostic sensitivity was 43% (18 of 42 infants) for rWGS and 10% (4 of 42 infants) for standard genetic tests (P = .0005). The rate of clinical utility of rWGS (31%, 13 of 42 infants) was significantly greater than that of standard genetic tests (2%, 1 of 42; P = .0015). Eleven (26%) infants with diagnostic rWGS avoided morbidity, one had a 43% reduction in likelihood of mortality, and one started palliative care. In six of the eleven infants, the changes in management reduced inpatient cost by $800,000-$2,000,000. These findings replicate a prior study of the clinical utility of rWGS in acutely ill inpatient infants and demonstrate improved outcomes and net healthcare savings. rWGS merits consideration as a first-tier test in this setting. [103]
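The percentages in this study are simple proportions of the 42 infants. A short Python check of that arithmetic, using only the counts quoted above (the reported P values come from the study's own statistical tests and are not recomputed here; the names are illustrative):

```python
N_INFANTS = 42

# Counts of infants (out of 42) quoted in the study summary above.
counts = {
    "diagnosed by rWGS": 18,
    "diagnosed by standard tests": 4,
    "clinical utility of rWGS": 13,
    "clinical utility of standard tests": 1,
    "avoided morbidity": 11,
}

def percent(k: int, n: int = N_INFANTS) -> float:
    """Proportion of the cohort, as a percentage."""
    return 100 * k / n

for label, k in counts.items():
    print(f"{label}: {k}/{N_INFANTS} = {percent(k):.0f}%")
```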

Rare variant association study

Whole genome sequencing studies enable the assessment of associations between complex traits and both coding and noncoding rare variants, i.e. variants with a minor allele frequency (MAF) below 1%, across the genome. Single-variant analyses typically have low power to identify associations with rare variants, and variant set tests have been proposed to jointly test the effects of given sets of multiple rare variants. [104] SNP annotations help to prioritize rare functional variants, and incorporating these annotations can effectively boost the power of rare-variant association analysis in whole genome sequencing studies. [105]
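One of the simplest variant set approaches is a burden (collapsing) test: rare variants in a region are collapsed into a single score per individual, which is then tested for association with the trait. The Python sketch below computes only the burden scores, assuming genotypes coded as 0, 1 or 2 copies of the alternate allele and the 1% MAF cutoff mentioned above. The function names and data are hypothetical, and this is not the specific method of the cited studies, which may, for example, weight variants by annotation.

```python
from typing import List

def minor_allele_frequency(genotypes: List[int]) -> float:
    """MAF of one biallelic variant; genotypes are 0/1/2 alternate-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)

def burden_scores(genotype_matrix: List[List[int]], maf_threshold: float = 0.01) -> List[int]:
    """Collapse rare variants into one minor-allele count per individual.

    genotype_matrix[v][i] is the alternate-allele count of variant v in individual i.
    Only variants with MAF below maf_threshold contribute.
    """
    n_individuals = len(genotype_matrix[0])
    scores = [0] * n_individuals
    for variant in genotype_matrix:
        if minor_allele_frequency(variant) >= maf_threshold:
            continue  # common variant: excluded from the rare-variant burden
        alt_is_minor = sum(variant) <= n_individuals  # alternate frequency <= 0.5
        for i, g in enumerate(variant):
            # Count copies of the minor allele, whichever allele that is.
            scores[i] += g if alt_is_minor else 2 - g
    return scores

# Hypothetical data: 100 individuals, three variants.
n = 100
singleton = [1] + [0] * (n - 1)                 # MAF 0.005: rare
common = [1] * 40 + [0] * (n - 40)              # MAF 0.2: excluded
ref_is_minor = [2] * 5 + [1] + [2] * (n - 6)    # alternate allele common; MAF 0.005
scores = burden_scores([singleton, common, ref_is_minor])
print([i for i, s in enumerate(scores) if s > 0])
```

The resulting scores would then go into an association test against the trait, for example a regression of phenotype on burden score.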

Ethical issues

The introduction of whole genome sequencing may have ethical implications. [106] On one hand, genetic testing can potentially diagnose preventable diseases, both in the individual undergoing genetic testing and in their relatives. [106] On the other hand, genetic testing has potential downsides such as genetic discrimination, loss of anonymity, and psychological impacts such as discovery of non-paternity. [107]

Some ethicists insist that the privacy of individuals undergoing genetic testing must be protected. [106] Indeed, privacy issues can be of particular concern when minors undergo genetic testing. [108] Illumina's CEO, Jay Flatley, claimed in February 2009 that "by 2019 it will have become routine to map infants' genes when they are born". [109] This potential use of genome sequencing is highly controversial, as it runs counter to ethical norms for predictive genetic testing of asymptomatic minors that are well established in the fields of medical genetics and genetic counseling. [110] [111] [112] [113] The traditional guidelines for genetic testing were developed over several decades, beginning when it first became possible to test for genetic markers associated with disease and before the advent of cost-effective, comprehensive genetic screening.

When an individual undergoes whole genome sequencing, they reveal information not only about their own DNA sequences but also about the probable DNA sequences of their close genetic relatives. [106] This information can further reveal useful predictive information about relatives' present and future health risks. [114] Hence, there are important questions about what obligations, if any, are owed to the family members of individuals who undergo genetic testing. In Western/European society, tested individuals are usually encouraged to share important information on any genetic diagnoses with their close relatives, since the importance of a genetic diagnosis for offspring and other close relatives is usually one of the reasons for seeking genetic testing in the first place. [106] Nevertheless, a major ethical dilemma can develop when patients refuse to share information on a diagnosis of a serious genetic disorder that is highly preventable and for which relatives are at high risk of carrying the same disease mutation. Under such circumstances, the clinician may suspect that the relatives would rather know of the diagnosis, and the clinician can therefore face a conflict of interest with respect to patient-doctor confidentiality. [106]

Privacy concerns can also arise when whole genome sequencing is used in scientific research studies. Researchers often need to put information on patients' genotypes and phenotypes into public scientific databases, such as locus-specific databases. [106] Although only anonymous patient data are submitted to locus-specific databases, patients might still be identifiable by their relatives in the case of a rare disease or a rare missense mutation. [106] Public discussion around the introduction of advanced forensic techniques (such as advanced familial searching using public DNA ancestry websites and DNA phenotyping approaches) has been limited, disjointed, and unfocused. As forensic genetics and medical genetics converge toward genome sequencing, issues surrounding genetic data become increasingly connected, and additional legal protections may need to be established. [115]

The first nearly complete human genomes to be sequenced were those of two Americans of predominantly Northwestern European ancestry in 2007 (J. Craig Venter at 7.5-fold coverage, [116] [117] [118] and James Watson at 7.4-fold). [119] [120] [121] These were followed in 2008 by the genomes of an anonymous Han Chinese man (at 36-fold), [122] a Yoruba man from Nigeria (at 30-fold), [123] a female clinical geneticist (Marjolein Kriek) from the Netherlands (at 7- to 8-fold), and a female Caucasian leukemia patient (at 33-fold and 14-fold coverage for tumor and normal tissue, respectively). [124] Steve Jobs was among the first 20 people to have their whole genome sequenced, reportedly at a cost of $100,000. [125] As of June 2012, there were 69 nearly complete human genomes publicly available. [126] In November 2013, a Spanish family made their personal genomics data publicly available under a Creative Commons public domain license. The work was led by Manuel Corpas, and the data were obtained by direct-to-consumer genetic testing with 23andMe and the Beijing Genomics Institute. This is believed to be the first such public genomics dataset for a whole family. [127]
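The fold-coverage figures above describe average sequencing depth. As an illustration, assuming reads land uniformly at random (the idealized Poisson, or Lander-Waterman, model), depth is total sequenced bases divided by genome size, and the expected fraction of bases covered by no read is e^(-coverage). Real data deviate from this because of repeats and sequencing biases, so the Python sketch below is only a back-of-the-envelope guide; the function names and the example run are hypothetical.

```python
import math

def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

def fraction_uncovered(coverage: float) -> float:
    """Poisson estimate of the fraction of bases with zero reads: exp(-coverage)."""
    return math.exp(-coverage)

# Hypothetical run: 960 million 100-bp reads on a 3.2-Gb genome.
print(fold_coverage(960_000_000, 100, 3_200_000_000))

# Coverage levels mentioned in the text above.
for c in (7.4, 7.5, 14, 30, 33, 36):
    print(f"{c:>4}-fold: ~{fraction_uncovered(c):.1e} of bases expected to have no reads")
```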


Marsupials versus placental mammals

Marsupials, our mammalian brethren, are found mostly in Australia and New Guinea. They have many weird features that separate them from other mammals, including a very short pregnancy, after which they shelter their very immature offspring in a pouch.

Sequences of the kangaroo and other marsupials have shed light on how these features developed after placental mammals and marsupials split about 150 million years ago. Sequencing of the genomes of an opossum and of a small kangaroo species called the tammar wallaby shows that the group may have evolved in South America, not Australia.

Analysis of the tammar wallaby genome indicates that large areas of the marsupial genome are similar to the genomes of placental mammals.

Human Genome Sequencing: Approaches and Applications

A list of methods used for mapping the human genome is given below. These techniques are also useful for detecting normal and disease genes in humans.

1. DNA sequencing: Gives the physical map of DNA at the highest resolution.

2. Use of probes: To identify RFLPs, STSs and SNPs.

3. Radiation hybrid mapping: Fragments the genome into large pieces to locate markers and genes. Requires somatic cell hybrids.

4. Fluorescence in situ hybridization (FISH): To localize a gene on a chromosome.

5. Sequence tagged site (STS) mapping: Applicable to any part of a DNA sequence if some sequence information is available.

6. Expressed sequence tag (EST) mapping: A variant of STS mapping in which expressed genes are mapped and located.

7. Pulsed-field gel electrophoresis (PFGE): For the separation and isolation of large DNA fragments.

8. Cloning in vectors (plasmids, phages, cosmids, YACs, BACs): To isolate DNA fragments of variable length.

9. Polymerase chain reaction (PCR): To amplify gene fragments.

10. Chromosome walking: Useful for cloning overlapping DNA fragments (restricted to about 200 kb).

11. Chromosome jumping: Large DNA fragments are circularized so that their distant ends are brought together, allowing a chromosome walk to jump over long stretches of DNA.

12. Detection of cytogenetic abnormalities: Certain genetic diseases, e.g. Duchenne muscular dystrophy, can be identified by cloning the affected genes.

13. Databases: Existing databases facilitate gene identification by comparison of DNA and protein sequences.
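Item 9 is what makes PCR so useful in mapping: under ideal conditions each cycle doubles the target. A minimal Python sketch of that idealized exponential model; the efficiency parameter (the fraction of templates copied per cycle) and the function name are assumptions added for illustration, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles

print(f"{pcr_copies(1, 30):,.0f}")       # one template, 30 perfect cycles: 2**30 = 1,073,741,824
print(f"{pcr_copies(1, 30, 0.9):,.0f}")  # same template at 90% efficiency per cycle
```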

To elucidate the human genome, the two groups sequencing it used different approaches. The IHGSC (International Human Genome Sequencing Consortium) predominantly employed a map-first, sequence-later approach. Its principal method was hierarchical shotgun sequencing, which involves breaking the genome into fragments of 100-200 kb, inserting them into vectors (mostly bacterial artificial chromosomes, BACs), cloning them, and then sequencing the cloned fragments.

Celera Genomics used the whole genome shotgun approach, which bypasses the mapping step and saves time. Further, the Celera group had high-throughput automated sequencers and powerful computer programs that helped in the early completion of the human genome sequence.
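The difference between the two strategies can be illustrated with a toy Python sketch. It is a simplification under stated assumptions, not the software either group used: reads are error-free and tiled at a fixed step, assembly is a simple greedy overlap merge, and the 32-base "genome" is made up so that no 4-base word occurs twice. In the hierarchical version the clone order is known in advance (the "map first" step), so each clone is assembled on its own; in the whole genome shotgun version all reads are pooled and assembled at once.

```python
# Toy comparison of hierarchical vs whole genome shotgun sequencing.
# Assumptions (not from the text): error-free reads tiled at a fixed step,
# a greedy overlap-merge assembler, and a hypothetical genome in which every
# 4-base word is unique, so every detected overlap is a true one.

def tile_reads(seq: str, read_len: int, step: int) -> list[str]:
    """Cut `seq` into overlapping reads of length `read_len`."""
    if len(seq) <= read_len:
        return [seq]
    starts = list(range(0, len(seq) - read_len + 1, step))
    if starts[-1] + read_len < len(seq):
        starts.append(len(seq) - read_len)  # make sure the end is covered
    return [seq[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: remaining contigs cannot be joined
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs.append(merged)
    return contigs


def hierarchical_shotgun(genome: str, clone_len: int, read_len: int,
                         step: int, min_overlap: int) -> str:
    """Map first: clones are taken in known order and assembled one by one.

    Assumes each clone assembles into a single contig.
    """
    clones = [genome[s:s + clone_len] for s in range(0, len(genome), clone_len)]
    return "".join(greedy_assemble(tile_reads(c, read_len, step), min_overlap)[0]
                   for c in clones)


def whole_genome_shotgun(genome: str, read_len: int, step: int,
                         min_overlap: int) -> list[str]:
    """No map: all reads from the whole genome are assembled together."""
    return greedy_assemble(tile_reads(genome, read_len, step), min_overlap)


if __name__ == "__main__":
    genome = "ATGCATTGACCGTAAGGCTACAGTCGATCCAA"  # hypothetical 32-bp sequence
    print(hierarchical_shotgun(genome, clone_len=16, read_len=8, step=4, min_overlap=4))
    print(whole_genome_shotgun(genome, read_len=8, step=4, min_overlap=4))
```

Both calls reconstruct the toy sequence. With a real genome, repeats longer than the reads make a pooled assembly ambiguous, which is the problem the clone map of the hierarchical approach helps to avoid.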

Whose Genome was Sequenced?

One of the intriguing questions about the human genome project is whose genome was sequenced, and how it relates to the world's population of 6 billion or so, with all its variation. There is no simple answer to this question.

However, looking on the positive side, it does not matter whose genome is sequenced, since the phenotypic differences between individuals are due to variations in just 0.1% of the total genome sequence. Therefore, DNA from many different individuals can be used as source material for sequencing.

Much of the human genome work was performed on material supplied by the Centre for Human Polymorphism in Paris, France. This institute had collected cell lines from sixty different French families, each spanning three generations.

Human Genome Sequence - Results Summarised:

The information from the human genome project is vast, and only some highlights are briefly described below.

Major Highlights of the Human Genome:

1. The draft represents about 90% of the entire human genome. It is believed that most of the important parts have been identified.

2. The remaining 10% of the genome sequence lies at the very ends of the chromosomes (i.e. telomeres) and around the centromeres.

3. The human genome is composed of 3200 Mb (3.2 Gb), i.e. 3.2 billion base pairs (3,200,000,000).

4. Approximately 1.1 to 1.5% of the genome codes for proteins.

5. Approximately 24% of the total genome consists of introns, which interrupt the coding regions (exons) and often contain repeated sequences with no known specific function.

6. The number of protein-coding genes was estimated to be in the range of 30,000-40,000.

7. An average gene consists of 3000 bases, but sizes vary greatly. The dystrophin gene is the largest known human gene, with 2.4 million bases.

8. Chromosome 1 (the largest human chromosome) contains the highest number of genes (2968), while the Y chromosome has the lowest. Chromosomes also differ in their GC content and number of transposable elements.

9. Genes and DNA sequences associated with many diseases such as breast cancer, muscle diseases, deafness and blindness have been identified.

10. About 100 coding regions appear to have been copied and moved by RNA-based transposition (retrotransposons).

11. Repeated sequences constitute about 50% of the human genome.

12. The vast majority of the genome (about 97%) has no known function.

13. Between individual humans, the DNA differs by only 0.2%, or one in 500 bases.

14. More than 3 million single nucleotide polymorphisms (SNPs) have been identified.

15. Human DNA is about 98% identical to that of chimpanzees.

16. About 200 genes are closely similar to genes found in bacteria.
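Several of the highlights are percentages of the 3.2 Gb total, so it can help to convert them to absolute numbers. The Python sketch below simply multiplies the figures quoted in items 3, 4, 11 and 13 and adds no new data; the helper name `bases` is illustrative only. It gives roughly 35-48 Mb of protein-coding DNA, about 1.6 Gb of repeated sequence and about 6.4 million differing bases between two people.

```python
# Back-of-the-envelope figures derived from the highlights above.
# All inputs are the values quoted in the list; the results are
# simple products, not independent measurements.

GENOME_BP = 3_200_000_000  # item 3: 3.2 billion base pairs


def bases(fraction: float) -> int:
    """Number of base pairs corresponding to a fraction of the genome."""
    return round(GENOME_BP * fraction)


coding_low = bases(0.011)    # item 4: 1.1% codes for proteins
coding_high = bases(0.015)   # item 4: 1.5% codes for proteins
repeats = bases(0.50)        # item 11: repeats make up about 50%
differences = bases(0.002)   # item 13: 0.2% difference between two people

print(f"protein-coding DNA: {coding_low / 1e6:.1f}-{coding_high / 1e6:.1f} Mb")
print(f"repeated sequences: {repeats / 1e9:.1f} Gb")
print(f"differing bases between two people: {differences:,}")
```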

Most of the Genome Sequence is Identified:

About 90% of the human genome has been sequenced. It is composed of 3.2 billion base pairs (3200 Mb or 3.2 Gb). If written out in the format of a telephone book, the base sequence of the human genome would fill about 200 telephone books of 1000 pages each. Some other interesting analogies and sidelights of the genome are given in Table 12.3.
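The telephone-book comparison can be checked with a single division. This Python sketch takes the paragraph's figures at face value and assumes one printed letter per base; on that assumption each page would have to carry 16,000 letters.

```python
# Quick check of the telephone-book analogy, assuming one letter per base.
GENOME_BP = 3_200_000_000  # 3.2 billion base pairs
BOOKS = 200
PAGES_PER_BOOK = 1000

bases_per_page = GENOME_BP // (BOOKS * PAGES_PER_BOOK)
print(bases_per_page)  # 16000
```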

Individual differences in genomes:

It has to be remembered that every individual, except identical twins, has their own version of the genome sequence. The differences between individuals are largely due to single nucleotide polymorphisms (SNPs). SNPs are positions in the genome where some individuals have one nucleotide (e.g. an A) and others have a different nucleotide (e.g. a G). SNPs are estimated to occur about once every 1000 base pairs. About 3 million SNPs are believed to be present, and at least half of them have been identified.
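SNPs are found by comparing the same stretch of DNA from different people. The Python sketch below is a minimal illustration, assuming the two sequences are already aligned base for base (no insertions or deletions); the sequences and the function name `snp_positions` are hypothetical. It also multiplies out the quoted density of one SNP per 1000 base pairs, which gives about 3.2 million, in line with the estimate of about 3 million SNPs.

```python
# Minimal sketch: list SNP positions between two aligned sequences and
# estimate the genome-wide SNP count from the quoted density.
# Assumes equal-length, already aligned sequences (no indels).

GENOME_BP = 3_200_000_000  # 3.2 billion base pairs
SNP_SPACING = 1000         # about one SNP per 1000 base pairs


def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_a, base_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


if __name__ == "__main__":
    person_1 = "ACGTTAGCAATG"  # hypothetical sequence
    person_2 = "ACGTTGGCAATG"  # same region, one base different
    print(snp_positions(person_1, person_2))  # [(5, 'A', 'G')]
    print(GENOME_BP // SNP_SPACING)            # 3200000
```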

Benefits/Applications of Human Genome Sequencing:

It is expected that the sequencing of the human genome and the genomes of other organisms will dramatically change our understanding and perceptions of biology and medicine. Some of the benefits of the human genome project are given below.

Identification of human genes and their functions:

Analysis of genomes has helped to identify genes and the functions of some of them. The functions of other genes, and the interactions between gene products, need to be further elucidated.

Understanding of polygenic disorders:

The biochemistry and genetics of many single-gene disorders have been elucidated, e.g. sickle-cell anemia, cystic fibrosis and retinoblastoma. A majority of the common diseases in humans, however, are polygenic in nature, e.g. cancer, hypertension and diabetes. At present, we have very little knowledge about the causes of these diseases. Information on the genome sequence will certainly help to unravel the mysteries surrounding polygenic diseases.

Improvements in gene therapy:

At present, human gene therapy is in its infancy for various reasons. Knowledge of the genome sequence will help make the treatment of genetic diseases by gene therapy more effective.

Improved diagnosis of diseases:

In the near future, probes for many genetic diseases will be available for specific identification and appropriate treatment.

Development of pharmacogenomics:

Drugs may be tailored to treat individual patients. This will become possible by taking into account individual variation in the enzymes and other proteins involved in drug action and metabolism.

Genetic basis of psychiatric disorders:

By studying the genes involved in behavioural patterns, the causes of psychiatric diseases can be understood. This will help in the better treatment of these disorders.

Understanding of complex social traits:

With the genome sequence now in hand, complex social traits can be better understood. For instance, genes controlling speech have recently been identified.

Knowledge on mutations:

Knowledge of the genome can uncover many of the events that lead to mutations.

Better understanding of developmental biology:

By determining the biology of the human genome and its regulatory control, it will be possible to understand how humans develop from a fertilized egg to an adult.

Comparative genomics:

Genomes from many organisms have been sequenced, and the number will increase in the coming years. The information on the genomes of different species will throw light on the major stages in evolution.

Development of biotechnology:

The data on the human genome sequence will spur the development of biotechnology in various spheres.
