Recent Scenario of Marine Genomics

Journal Name : SunText Review of BioTechnology

DOI : 10.51737/2766-5097.2020.009

Article Type : Review Article

Authors : Leena Grace B

Keywords : Marine; Environment; Model species; Genomics; Sequencing; Annotation

Abstract

The marine environment is the cradle of life, containing 95% of the world’s biomass and 38 (19 endemic) of the 39 known animal phyla. Fundamental to understanding the biotechnological potential of marine organisms is the assessment of their genetic capabilities, i.e. sequencing of their genomes and annotation of their genes, which is the focus of genomics. Currently, about 1000 prokaryotic genomes have been sequenced and annotated. More than half of these genomes are of medical or industrial relevance, and no phylogenetically systematic genome sequencing had been carried out until recently. Although mitochondrial genomes are useful for identifying fish species and populations, most genome research has focused on the nuclear genome. Marine microbial assemblages are diverse and unique, and the challenge is to discover the functions performed by these microorganisms. Whole genome sequencing and functional genomics have played key roles in resolving the evolutionary tree of marine organisms. They made it possible to determine the complete deoxyribonucleic acid sequence of organisms and to refine genetic mapping efforts to a fine scale. This led to the development of bioinformatics, which handles the large data sets produced by cytogenetics, molecular genetics, quantitative genetics and population genetics and links raw genome information to meaningful biological information. Marine fishes also exhibit high levels of gene flow, mainly due to the pelagic larval phase and the consequent dispersal of many pelagic and demersal resources. Because marine microbes are important for the earth system, the development of marine genomics resources has focused primarily on them, including both prokaryotic and eukaryotic plankton, as they are involved in significant mineral cycles of the oceans. The marine ecosystem plays a major role in sustaining the global environment, because half of the annual primary production of the earth takes place in the ocean.
Present-day technologies, in particular high-throughput technology for sequencing DNA from the natural marine environment and its resources, have paved the way for generating enormous numbers of sequences. However, genomics resources for marine taxa are currently limited to the full genome sequences of ‘model species’. It will be important to actively channel this process in the future to ensure coverage of all groups, in particular the most important eukaryotes.


Introduction

Genomics is an interdisciplinary science concerned with the structure, function, evolution, mapping and editing of genomes. A genome is the total set of DNA, including all the genes, of an organism. Genomics aims at the collective characterization and quantification of genes, which direct the production of proteins with the assistance of enzymes and messenger molecules [1]. Proteins make up body structures such as organs and tissues, control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes, using high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes [2]. Discovery-based research and systems biology have facilitated the understanding of the most complex biological systems, such as the brain, which triggered a revolution in genomics. The study of intragenomic (within the genome) phenomena such as epistasis (the effect of one gene on another), pleiotropy (one gene affecting more than one trait), heterosis (hybrid vigour) and other interactions between loci and alleles within the genome also forms part of these advances [3]. Central to the understanding of the biotechnological potential of marine organisms is the assessment of their genetic capabilities, i.e. sequencing of their genomes and annotation of their genes [4]. This understanding is the focus of genomics [5]. Currently, about 1000 prokaryotic genomes have been sequenced and annotated. More than half of these genomes are of medical or industrial relevance, and no phylogenetically systematic genome sequencing had been carried out until recently [6]. Sequencing of phylogenetically diverse microbial genomes results in the discovery of many novel proteins per genome, demonstrating the existence of a huge reservoir of undiscovered proteins [7].
About 7500 bacterial species have been validly described; it follows that thousands of new proteins will still be discovered by systematically sequencing all cultured bacterial species [8]. Another level of diversity has to be expected from the uncultured prokaryotes, which make up about 70% of the roughly 100 bacterial phyla. This uncultured diversity became apparent when the first whole genome analysis of marine microbial communities revealed as many new clusters of orthologous groups (COGs) as were already known at the time [9]. At the other end of phylogenetic diversity, i.e. when comparing different strains of a bacterial species, it is clear that each new strain can add hundreds of new genes [10]. This means that the pan-genome of a microbial species, comprising all genes of all strains of that species, is several times larger than the core genome [11]. In addition to bacteria, aquatic ecosystems contain viruses, which are the most common biological entities in the marine environment. The abundance of viruses exceeds that of prokaryotes by at least a factor of ten, and they have an enormous impact on the other microbiota, lysing about 20% of its biomass each day. Recent metagenomic surveys of marine viruses demonstrated their unique gene pool and molecular architecture [12]. Their host range covers all major groups of marine organisms, from archaea to mammals. Algal genome sizes can vary as much as 20-fold within a genus, as illustrated by Thalassiosira species [13]. The overall size range for microalgal genomes is 10 Mb to 20 Gb, with an average size of around 450 Mb, except for Chlorophyta, which are on average four times larger. Many marine microalgae are highly complex single-celled organisms containing chromosomal DNA as well as mitochondrial and chloroplast DNA [14].
They have a complex nucleus that has been subjected to extensive exchange of genes between the organelles and the nucleus (endosymbiotic gene transfer) as well as horizontal gene transfer over hundreds of millions of years of evolution. In addition, the first genome of a macroalga (Ectocarpus) has been sequenced and several others are being completed. The challenge is to investigate this novel ‘terra incognita’ through post-genomics, biochemical approaches and genetic developments [15]. The reward for this challenge is an improved understanding of the biochemical functioning of key players in aquatic ecosystems, with new insights into the regulatory genetic network of eukaryotes and their early evolution, and great potential for the production of a huge variety of bioproducts [16].
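The relationship between the pan-genome and the core genome described above can be illustrated with a small sketch. It assumes that each strain is represented simply as a set of gene identifiers; the strain names, gene identifiers and the helper name pan_and_core are hypothetical and serve only as toy data, not as results from any sequenced species.

```python
def pan_and_core(strain_genes):
    """Return (pan_genome, core_genome) for a dict mapping strain -> set of gene ids.

    The pan-genome is the union of all genes found in any strain;
    the core genome is the intersection of genes shared by every strain.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = gene_sets[0].intersection(*gene_sets[1:])
    return pan, core


# Hypothetical toy data: three strains of one imaginary species.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g6", "g7"},
}

pan, core = pan_and_core(strains)
print("pan-genome size:", len(pan))
print("core genome size:", len(core))
print("pan/core ratio:", len(pan) / len(core))
```

In this toy example every additional strain contributes genes not seen before, so the pan-genome keeps growing while the core genome can only stay the same or shrink.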


Marine Species and Genomics

The majority of marine microbes cannot be cultured in the laboratory and so were not amenable to study by the methods that had proved so successful with medically important microorganisms throughout the 20th century [17]. Only with the development of high-throughput technology to sequence DNA from the natural marine environment did it become possible to demonstrate the exceptional diversity of microbes in the marine environment. In reality, most marine microbes are novel in their characteristics. Marine microbial assemblages are diverse and unique, and the challenge is to discover what functions are performed by these microorganisms [18]. At present, gene resources for other marine taxa are limited to full genome sequences at the level of ‘model species’, such as the purple sea urchin Strongylocentrotus purpuratus. For other model and non-model species, such as the surf clam Spisula solidissima, the sea squirts Ciona intestinalis and Ciona savignyi, the tunicate Oikopleura dioica, the little skate Leucoraja erinacea and the mollusk parasite Perkinsus marinus, sequencing is in progress [19]. The Marine Microbial Genome Sequencing Project, funded in 2004, sequenced nearly 180 marine microorganisms, of which 80% have already been published. Microorganisms are known to be the “gatekeepers” of the ocean’s mineral cycles, so understanding their catalytic activities and their interaction with the environment will enhance the ability to monitor, model and predict changes in the marine ecosystem [20].



Case Studies in Marine Genomics

Because of the vast phyletic diversity of marine organisms, existing genomic model organisms are often of limited relevance, as an enormous evolutionary distance separates these models from an organism of interest. Genome sequencing has been completed for the unicellular green alga Chlamydomonas reinhardtii [21]. Genome projects are in progress for key marine species such as Emiliania huxleyi (a pelagic coccolithophore), Hydra magnipapillata, Litopenaeus vannamei (the Pacific white shrimp) and Amphioxus (the closest living invertebrate relative of the vertebrates). Among the prokaryotes, rapid progress in sequencing has been achieved for several marine organisms, such as multiple strains of the pelagic photosynthetic bacteria Synechococcus and Prochlorococcus, with many genomes now sequenced [22].


Whole Genome Sequence of Diatoms

Based upon the studies of Sogin et al. [23], the sequencing of the genomes of environmentally important organisms such as the diatom Thalassiosira pseudonana provided the first complete genome from the heterokont lineage. In addition to its environmental and phylogenetic importance, the silicate metabolism of this diatom has also gained attention for its biotechnological potential. Like most diatoms, it constructs a silicate exoskeleton, the frustule, and the production of this structure has great applications in nanotechnology.


Whole Genomes of Aquatic Animals

Full genome sequences of some fishes, such as zebrafish, fugu, tetraodon, medaka and three-spined stickleback, are most valuable. However, non-model organisms must be used to answer many questions, because there are close to 30,000 species of fish, with up to more than 300,000,000 years of independent evolution between groups. These species occur in habitats ranging from arctic streams to tropical marine areas, including underground caves and hypoxic tropical lakes. Consequently, a small tropical freshwater cyprinid such as the zebrafish may not be a suitable model for a marine tuna weighing several kilograms. Whenever a new method becomes available for genomic studies, its utility for non-model organisms should be evaluated, as has been done for microarray methodology [24]. Genomic information on many aquatic invertebrate species is even less satisfactory than that on fishes. This is shown by the fact that one third of the genes in the recently sequenced genome of Daphnia pulex have no counterparts in previously sequenced genomes. These Daphnia-specific genes in particular respond rapidly to environmental disturbances [25]. Altogether, the genome data on fish and Daphnia suggest both rapid evolution and rapid development of genetic responses to environmental changes [26]. Recently, scientists from Norway have examined and presented the genome sequence of Atlantic cod, Gadus morhua. The entire genome assembly was produced by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. Atlantic cod lacks the genes for MHC II, CD4 and invariant chain (Ii), which are otherwise conserved features of the adaptive immune system of jawed vertebrates [27].


Genomes of Crustaceans

Crustaceans (lobsters, shrimps, crabs, etc.) are a remarkable group of organisms that occupy all types of ocean habitat with a wide array of adaptations, and they hold the highest species diversity among marine animals. They are not only plentiful in number, but also the most commercially exploited food species for human consumption [28]. However, they are not as well studied as their terrestrial arthropod relatives, the insects. Especially in the Indo-Pacific region, the tiger shrimp (Penaeus monodon) has been one of the most important captured and cultured marine crustaceans [29]. However, the tiger shrimp industry has been weighed down by viral diseases [30] that lead to economic losses. Developments in shrimp genomics have been limited, although a reasonably good EST database is available [31]. Genomic analysis of the tiger shrimp will make a key contribution to deciphering the evolutionary history of the crustacean lineages. The genomic sequence information will benefit the shrimp industry by contributing genomic tools to detect viral diseases and to build up breeding programs [32].


Puffer Fish Genome Features in Draft Sequences

Fugu is a delicacy, but its liver is poisonous. The liver has to be removed in a particular manner before the fish is prepared as a dish. The poison in the liver of a single fugu is able to kill 30 people. The fugu genome was the first vertebrate genome sequenced after the human genome, using the whole-genome shotgun method. These draft sequences reveal many interesting differences in specific protein families between human and fugu [33]. The Tetraodon genome sequence was subsequently produced [34], also with the whole-genome shotgun method but with a higher redundancy in sequence reads (8.3 vs. 5.6). Puffer fish possess about 70 different families of transposable elements, whereas only 20 have been reported for human or mouse. Interestingly, in Tetraodon the SINE and LINE families are distributed in opposite regions of the genome compared to the human or mouse genome. In mammals, SINEs are concentrated in G + C-rich regions, whereas in Tetraodon they occur more in A + T-rich regions; the reverse holds for LINE elements. More surprisingly, these initial studies of Tetraodon and fugu showed a number of differences between their genomes. The G + C-rich region present in both Tetraodon and mammalian genomes is absent in fugu [35].
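The redundancy of sequence reads mentioned above (often called coverage or sequencing depth) is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below shows this calculation together with the idealized Lander-Waterman approximation for the fraction of a genome covered at least once, 1 - e^(-c). It assumes uniformly random reads; the function names are illustrative, and the read counts in the usage example are hypothetical rather than taken from the fugu or Tetraodon projects.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average redundancy c = (number of reads * read length) / genome size."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(c):
    """Idealized fraction of bases covered at least once (Poisson model)."""
    return 1.0 - math.exp(-c)


# Hypothetical example: 1,000 reads of 500 bp over a 100 kb region.
c = coverage(num_reads=1_000, read_length=500, genome_size=100_000)
print(f"redundancy: {c:.1f}x")

# Compare the two redundancy values quoted in the text.
for redundancy in (5.6, 8.3):
    frac = expected_fraction_covered(redundancy)
    print(f"{redundancy}x -> expected fraction covered: {frac:.5f}")
```

Under this simplified model both redundancy levels cover nearly the whole genome, and the higher value mainly reduces the number of remaining gaps; real assemblies deviate from the model because of repeats and cloning bias.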


Challenges in Computations and Trials

Bioinformatics will have to use the latest computing technology to deal with the challenge of analysing data in less time. The steep fall in sequencing cost and the concomitant increase in sequencing speed outpace the improvement in computational power, and this will likely continue for several years. Until recently, the main repository for DNA sequences, GenBank, grew at about the same rate as computing power, following Moore’s law and doubling every 18 months. GenBank contained about 300 Gb of sequence up to 2009. Today, 2 Gb of sequence can be generated owing to the availability of sophisticated next-generation sequencers. Current algorithms for sequence data analysis have their roots in the early days of sequence acquisition. Algorithms that are used for aligning genome sequences tend to have computing time requirements that grow exponentially with the number of sequences analysed. New concepts and approaches will be necessary to reduce this to a linear requirement [36].
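The widening gap between data growth and computing power can be made concrete with a simple doubling model. The sketch below takes the 18-month (1.5-year) doubling time for computing power from the text; the doubling time used for sequence data (0.75 years) is a purely hypothetical value chosen for illustration, and the function names are not from any library.

```python
import math


def doubling_growth(initial, years, doubling_time_years):
    """Quantity after `years` if it doubles every `doubling_time_years`."""
    return initial * 2 ** (years / doubling_time_years)


def years_until_gap(factor, data_doubling, compute_doubling):
    """Years until the data/compute ratio has grown by `factor`.

    The ratio grows as 2 ** (t * (1/data_doubling - 1/compute_doubling)).
    """
    rate = 1.0 / data_doubling - 1.0 / compute_doubling
    if rate <= 0:
        raise ValueError("data must double faster than compute for a gap to open")
    return math.log2(factor) / rate


# A 300 Gb repository doubling every 1.5 years, projected 3 years ahead.
print(doubling_growth(300, years=3, doubling_time_years=1.5), "Gb")

# Hypothetical: data doubling every 0.75 years vs. compute every 1.5 years.
print(years_until_gap(8, data_doubling=0.75, compute_doubling=1.5), "years")
```

The model shows why algorithms whose running time grows faster than linearly with input size become a bottleneck: even a constant mismatch in doubling times produces an exponentially growing shortfall.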


Conclusion

The function of genes has so far largely been studied in a very limited number of species and in the context of individual organisms. In the next decade, gene function analysis will have to shift substantially towards including the ecological and evolutionary context within genomics. As such, genetics will move on from a largely biomedical perspective to an ecological perspective with special relevance for global change questions [4]. A full understanding of the ecosystem, its services and its stability will not be possible without understanding the genetics of adaptations and community interactions.


References

  1. Colin S, Deniaud E, Jam M, Descamps V, Chevolot Y, Kervarec N, et al. Cloning and biochemical characterization of the fucanase FcnA: definition of a novel glycoside hydrolase family specific for sulfated fucans. Glycobiology. 2006; 16: 1021-1032.
  2. Chandonia JM, Brenner SE. The impact of structural genomics: expectations and outcomes. Science. 2006; 311: 347-351.
  3. Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW. Microarrays: biotechnology’s discovery platform for functional genomics. Trends Biotechnol. 1998; 16: 301-306.
  4. Metzker ML. Sequencing technologies-the next generation. Nat Rev Genet. 2010; 11: 31-46.
  5. Gupta PK. Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol. 2008; 26: 602-611.
  6. Rodi CP, Bunch RT, Curtis SW, Kier LD, Cabonce MA, Davila JC, et al. Revolution through genomics in investigative and discovery toxicology. Toxicol Pathol. 1990; 27: 107-110.
  7. Hall N. Advanced sequencing technologies and their wider impact in microbiology. J Exp Biol. 2007; 210: 1518-1525.
  8. Pevsner J. Bioinformatics and functional genomics (2nd Edn.). Hoboken, NJ, 7: Wiley-Blackwell, London. 2009.
  9. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008; 26: 1135-1145.
  10. Leu JH, Chang CC, Wu JL, Hsu CW, Hirono I, Aoki T, et al. Comparative analysis of differentially expressed genes in normal and white spot syndrome virus infected Penaeus monodon. BMC Genomics. 2007; 8: 120-133.
  11. DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard NU, et al. Community genomics among stratified microbial assemblages in the ocean’s interior. Science. 2006; 311: 496-503.
  12. Frias Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC. Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci. 2008; 105: 3805-3810.
  13. Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, et al. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’. Proc Natl Acad Sci. 2006; 103: 12115-12120.
  14. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, et al. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE. 2008; 3: 3042.
  15. Ten Bosch JR, Grody WW. Keeping up with the next generation: massively parallel sequencing in clinical diagnostics. J Mol Diagn. 2008; 10: 484-492.
  16. Hollywood K, Brison DR, Goodacre R. Metabolomics: Current technologies and future trends. Proteomics. 2006; 6: 4716-4723.
  17. Szalay A, Gray J. 2020 Computing: Science in an exponential world. Nature. 2006; 440: 413-414.
  18. Thomas MA, Klaper R. Genomics for the ecological tool box. Trends Ecol Evol. 2004; 19: 439-445.
  19. Davis RH. The age of model organisms. Nat Rev Genet. 2004; 5: 69-76.
  20. Valenzuela-Quinonez F. How fisheries management can benefit from genomics? Brief Funct Genomics. 2016; 15: 352-357.
  21. Reyes-Prieto A, Yoon HS, Bhattacharya D. Marine Algal Genomics and Evolution. Encyclopedia of Ocean Sciences. 2019; 1: 561-568.
  22. Ferrie DEK. The origin of the Hox/ParaHox genes, the Ghost Locus hypothesis and the complexity of the first animal. Brief Funct Genomics. 2016; 15: 333-341.
  23. Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, et al. Microbial diversity in the deep sea and the underexplored ‘rare biosphere’. Proc Natl Acad Sci. 2006; 103: 12115-12120.
  24. Meyer F. Genome Sequencing vs. Moore's Law: Cyber challenges for the next decade. CTWatch Quarterly. 2006; 2: 14-17.
  25. Davidson EH. Emerging properties of animal gene regulatory networks. Nature. 2010; 468: 911-920.
  26. Venter JC, Remington JF, Heidelberg AL, Halpern D, Rusch JA, Eisen DWU. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004; 304: 66-74.
  27. Wilkening J, Wilke AND, Folker M. Using Clouds for Metagenomics: A Case Study. In: Proceedings IEEE Clouds. 2009; 12-19.
  28. Van Straalen NM, Roelofs D. An introduction to ecological genomics. Oxford University Press, Oxford. 2006.
  29. Margulies M, Egholm M, Altman WE, Attiya S. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005; 437: 376-380.
  30. Supungul P, Klinbunga S, Pichyangkura R, Jitrapakdee S, Hirono I, Aoki T, et al. Identification of immune-related genes in hemocytes of black tiger shrimp (Penaeus monodon). Mar Biotechnol. 2002; 4: 487-494.
  31. Leu JH, Chang CC, Wu JL, Hsu CW, Hirono I, Aoki T, et al. Comparative analysis of differentially expressed genes in normal and white spot syndrome virus infected Penaeus monodon. BMC Genomics. 2007; 8: 120-133.
  32. Worm B, Barbier EB, Beaumont N. Impacts of biodiversity loss on ocean ecosystem services. Science. 2006; 314: 787-790.
  33. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007; 5: 16.
  34. Davidson EH. Emerging properties of animal gene regulatory networks. Nature. 2010; 468: 911-920.
  35. Zengler K, Toledo G, Rappé MS, Mathur EJ, Short JM, Keller M. Cultivating the uncultured. Proc Natl Acad Sci. 2002; 99: 15681-15686.
  36. Domazet-Loso M, Haubold B. Efficient estimation of pairwise distances between genomes. Bioinformatics. 2009.
  37. Glöckner FO, Kube M, Bauer M, Teeling H, Lombardot T, Ludwig W. Complete genome sequence of the marine planctomycete Pirellula sp. strain 1. Proc Natl Acad Sci. 2003; 100: 8298-8303.