Article Type : Review Article
Authors : Leena Grace B
Keywords : Marine; Environment; Model species; Genomics; Sequencing; Annotation
The marine environment is the cradle of life, containing 95% of the world’s
biomass and 38 (19 endemic) of the 39 known animal phyla. Fundamental to
understanding the biotechnological potential of marine organisms is the
assessment of their genetic capabilities, i.e. sequencing of their genomes and
annotation of their genes, which is the focus of genomics. Currently, about
1000 prokaryotic genomes have been sequenced and annotated. More than half of
these genomes are of medical or industrial relevance, and no phylogenetically
systematic genome sequencing had been carried out until recently. Although
mitochondrial genomes are useful for identifying fish species and populations,
most genome research has focused on the nuclear genome. Marine microbial
assemblages are diverse and unique, and the challenge is to discover the
functions performed by these microorganisms. Whole genome sequencing and
functional genomics have played key roles in resolving the evolutionary tree of
marine organisms. They have enabled determination of the complete
deoxyribonucleic acid (DNA) sequence of organisms and fine-scale genetic
mapping. This has driven the development of bioinformatics by producing large
data sets in cytogenetics, molecular genetics, quantitative genetics and
population genetics; bioinformatics links the raw genome information to
meaningful biological information. Marine fishes also exhibit high levels of
gene flow, mainly because of the pelagic larval phase and the consequent
dispersal of many pelagic and demersal resources. Because marine microbes are
important for the earth system, the development of marine genomics resources
has focused primarily on them, including both prokaryotic and eukaryotic
plankton, as they are involved in the major mineral cycles of the oceans. The
marine ecosystem plays a great role in sustaining the global environment,
because half of the annual primary production of the earth takes place in the
ocean. Present-day high-throughput technology for sequencing DNA from the
natural marine environment and its resources has paved the way to generating
enormous numbers of sequences. However, genomics resources for marine taxa are
currently limited to the full genome sequences of the ‘model species’. It will
be important to actively channel this process in the future to ensure coverage
of more groups, in particular the most important eukaryotes.
Genomics is an interdisciplinary science concerned with the structure, function, evolution, mapping and editing of genomes. A genome is the complete set of DNA of an organism, including all of its genes. Genomics aims at the collective characterization and quantification of genes, which direct the production of proteins with the assistance of enzymes and messenger molecules [1]. Proteins make up body structures such as organs and tissues, control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes, using high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes [2]. Advances in genomics have triggered a revolution in discovery-based research and systems biology, facilitating the understanding of even the most complex biological systems such as the brain. The field also covers intragenomic (within the genome) phenomena such as epistasis (the effect of one gene on another), pleiotropy (one gene affecting more than one trait), heterosis (hybrid vigour) and other interactions between loci and alleles within the genome [3]. Central to the understanding of the biotechnological potential of marine organisms is the assessment of their genetic capabilities, i.e. sequencing of their genomes and annotation of their genes [4]. This understanding is the focus of genomics [5]. Currently, about 1000 prokaryotic genomes have been sequenced and annotated. More than half of these genomes are of medical or industrial relevance, and no phylogenetically systematic genome sequencing had been carried out until recently [6]. Sequencing of phylogenetically diverse microbial genomes results in the discovery of many novel proteins per genome, demonstrating the existence of a huge reservoir of undiscovered proteins [7].
About 7500 bacterial species have been validly described; it follows that thousands of new proteins will still be discovered by systematically sequencing all cultured bacterial species [8]. Another level of diversity has to be expected from the uncultured prokaryotes, which make up about 70% of the roughly 100 bacterial phyla. This uncultured diversity became apparent when the first whole genome analysis of marine microbial communities revealed as many new clusters of orthologous groups (COGs) as were already known at the time [9]. At the other end of the phylogenetic diversity, i.e. comparing different strains of a bacterial species, it is clear that each new strain can add hundreds of new genes [10]. This means that the pan-genome of a microbial species, comprising all genes of all strains of that species, is several times larger than the core genome [11].
In addition to bacteria, aquatic ecosystems contain viruses, which are the most common biological entities in the marine environment. The abundance of viruses exceeds that of prokaryotes by at least a factor of ten, and they have an enormous impact on the other microbiota, lysing about 20% of its biomass each day. Recent metagenomic surveys of marine viruses demonstrated their unique gene pool and molecular architecture [12]. Their host range covers all major groups of marine organisms from archaea to mammals.
Algal genome sizes can vary about 20-fold even within a genus, as illustrated by Thalassiosira species [13]. The overall size range for microalgal genomes is 10 Mb to 20 Gb, with an average size of around 450 Mb, except for Chlorophyta, which are on average four times larger. Many marine microalgae are highly complex single-celled organisms containing chromosomal DNA as well as mitochondrial and chloroplast DNA [14].
They have a complex nucleus that has been subjected to extensive exchange of genes between the organelles and the nucleus (endosymbiotic gene transfer) as well as horizontal gene transfer during their hundred million years of evolution. In addition, the first genome of a macroalga (Ectocarpus) has been sequenced and several others are being completed. The challenge is to investigate this novel ‘terra incognita’ through post-genomics, biochemical approaches and genetic developments [15]. The reward for meeting this challenge is an improved understanding of the biochemical functioning of key players in aquatic ecosystems, with new insights into the regulatory genetic networks of eukaryotes and their early evolution, and great potential for the production of a huge variety of bioproducts [16].
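The distinction drawn earlier for bacterial strains, between the pan-genome of a species (all genes of all strains) and its core genome (genes shared by every strain), can be illustrated with simple set operations. The following is a minimal Python sketch, not a method taken from the cited studies: the strain names and gene identifiers are hypothetical, and the genes are assumed to have already been grouped into comparable families.

```python
from typing import Dict, Set


def pan_genome(strains: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of all genes found in any strain."""
    genes: Set[str] = set()
    for gene_set in strains.values():
        genes |= gene_set
    return genes


def core_genome(strains: Dict[str, Set[str]]) -> Set[str]:
    """Return the genes that are present in every strain."""
    gene_sets = list(strains.values())
    if not gene_sets:
        return set()
    core = set(gene_sets[0])
    for gene_set in gene_sets[1:]:
        core &= gene_set
    return core


# Hypothetical toy data: three strains of one species (illustration only).
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g6", "g7"},
}

pan = pan_genome(strains)
core = core_genome(strains)
print(f"pan-genome: {len(pan)} genes, core genome: {len(core)} genes")
print(f"pan/core ratio: {len(pan) / len(core):.1f}")
```

In this toy case the pan-genome holds 7 genes and the core genome 2, so every additional strain enlarges the pan-genome while the core can only stay the same or shrink.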
The majority of marine microbes cannot be cultured in the laboratory and so were not amenable to study by the methods that had proved so successful with medically important microorganisms throughout the 20th century [17]. Only with the development of high-throughput technology to sequence DNA from the natural marine environment did information become available that demonstrated the exceptional diversity of microbes in the marine environment. In fact, most marine microbes are entirely novel in their characteristics. Marine microbial assemblages are diverse and unique, and the challenge is to discover what functions are performed by these microorganisms [18]. At present, gene resources for other marine taxa are limited to the full genome sequence of a ‘model species’, the purple sea urchin Strongylocentrotus purpuratus. For other model and non-model species such as the surf clam Spisula solidissima, the sea squirts Ciona intestinalis and Ciona savignyi, the tunicate Oikopleura dioica, the little skate Leucoraja erinacea and the mollusk parasite Perkinsus marinus, sequencing is in progress [19]. The Marine Microbial Genome Sequencing Project, funded in 2004, sequenced nearly 180 marine microorganisms, of which 80% have already been published. Microorganisms are known to be the “gatekeepers” of the mineral cycles of the oceans, so understanding their catalytic activities and interactions with the environment will enhance the ability to monitor, model and predict changes in the marine ecosystem [20].
Because of the vast phyletic diversity of marine organisms, existing genomic
model organisms are often of limited relevance: an enormous evolutionary
distance separates these models from an organism of interest. Genome sequencing
has been completed in the unicellular green alga Chlamydomonas reinhardtii
[21]. Genome projects are in progress for key marine species such as Emiliania
huxleyi (a pelagic coccolithophore), Hydra magnipapillata, Litopenaeus vannamei
(the Pacific white shrimp) and Amphioxus (the closest living invertebrate
relative of the vertebrates). Among the prokaryotes, rapid progress in
sequencing has been achieved for several marine organisms, with many sequenced
genomes from multiple strains of the pelagic photosynthetic bacteria
Synechococcus and Prochlorococcus [22].
According to Sogin et al. [23], sequencing the genomes of environmentally
important organisms such as the diatom Thalassiosira pseudonana provided the
first complete genome from the heterokont lineage. In addition to its
environmental and phylogenetic importance, its silicate metabolism has also
gained attention. Like most diatoms, T. pseudonana constructs a silicate
exoskeleton, the frustule, and the biotechnological potential of silicate
metabolism is considerable because the production of this structure has great
applications in nanotechnology.
Full genome sequences of some fishes such as zebrafish, fugu, Tetraodon, medaka and three-spined stickleback are most valuable. Nevertheless, non-model organisms must be used to answer many questions, because there are close to 30,000 species of fish, with up to more than 300 million years of independent evolution between groups. The species occur in habitats ranging from arctic streams to tropical marine areas, including underground caves and hypoxic tropical lakes. Consequently, a small tropical freshwater cyprinid such as the zebrafish may not be a suitable model for a marine tuna weighing several kilograms. Whenever a new method becomes available for genomic studies, its utility for non-model organisms should be evaluated, as has been done for microarray methodology [24]. Genomic information on many aquatic invertebrate species is even less satisfactory than that on fishes. This is shown by the finding that one third of the genes in the recently sequenced genome of Daphnia pulex have no counterparts in previously sequenced genomes. These Daphnia-specific genes in particular respond rapidly to environmental disturbances [25]. Altogether, the genome data on fish and Daphnia suggest both rapid evolution and rapid development of genetic responses to environmental changes [26]. Recently, scientists from Norway have examined and presented the genome sequence of Atlantic cod, Gadus morhua. The genome assembly was obtained by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. Atlantic cod lacks the genes for MHC II, CD4 and invariant chain (Ii), components of the adaptive immune system that are otherwise conserved among jawed vertebrates [27].
Crustaceans (lobster, shrimp, crab, etc.) are a remarkable group of organisms
that occupy all types of habitat in the ocean with a wide array of adaptations,
and they hold the highest species diversity among marine animals. They are not
only plentiful in number, but also the most commercially exploited food species
for human consumption [28]. However, they are not as well studied as their
terrestrial arthropod relatives, the insects. Especially in the Indo-Pacific
region, the tiger shrimp (Penaeus monodon) has been one of the most important
captured and cultured marine crustaceans [29]. However, the tiger shrimp
industry has been weighed down by viral diseases [30] that lead to economic
losses. Developments in shrimp genomics have been limited, although a
reasonably good EST database is available [31]. Genomic analysis of the tiger
shrimp will make a key contribution to deciphering the evolutionary history of
the crustacean lineages. The genomic sequence information will benefit the
shrimp industry by providing genomic tools to detect viral diseases and to
build up breeding programs [32].
Fugu is a delicacy, but its liver is poisonous and has to be removed in a particular manner before the fish is prepared as a dish. The poison in a single fugu liver can kill 30 people. The fugu genome was the first vertebrate genome sequenced after the human genome, using the whole-genome shotgun method. The draft sequence revealed many interesting differences in specific protein families between human and fugu [33]. The Tetraodon genome sequence was subsequently produced [34], also with the whole-genome shotgun method but with a higher redundancy in sequence reads (8.3 vs. 5.6). Pufferfish possess about 70 different families of transposable elements, whereas only 20 have been reported for human or mouse. Interestingly, in Tetraodon the SINE and LINE families are distributed in opposite regions of the genome compared with the human or mouse genome: in mammals SINEs are found in G + C rich regions, whereas in Tetraodon they are found in A + T rich regions, and vice versa for LINE elements. More surprisingly, these initial studies of Tetraodon and fugu showed a number of differences between their genomes. The G + C rich regions found in both Tetraodon and mammalian genomes are absent in fugu [35].
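The redundancy figures quoted for the two pufferfish genomes (8.3 for Tetraodon vs. 5.6 for fugu) can be put in perspective with the standard Lander-Waterman approximation, under which the expected fraction of a genome left unsequenced at redundancy c is about e^(-c). The Python sketch below is only an illustration under that idealized assumption of uniformly random reads; the function names are hypothetical and the calculation is not taken from the cited genome papers.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Sequencing redundancy: total bases read divided by genome size."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(redundancy: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read.

    Assumes reads are placed uniformly at random, which real shotgun
    libraries only approximate.
    """
    return math.exp(-redundancy)


# Redundancy values as quoted in the text above.
for name, redundancy in [("Tetraodon", 8.3), ("fugu", 5.6)]:
    missing = expected_uncovered_fraction(redundancy)
    print(f"{name}: redundancy {redundancy}, "
          f"expected uncovered fraction {missing:.5f}")
```

Under this idealized model the higher redundancy leaves roughly 0.025% of bases unsampled compared with roughly 0.37% at the lower value; real assemblies deviate from this because of repeats and cloning bias.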
Bioinformatics will have to use the latest computing technology to meet the
challenge of analysing data in less time. The steep fall in sequencing cost and
the concomitant increase in sequencing speed outpace the improvement in
computational power, and this will likely continue for several years. Until
recently, the main repository for DNA sequences, GenBank, grew at about the
same rate as computing power, following Moore’s law and doubling every 18
months. GenBank contained about 300 Gb of sequence up to 2009. Today, the
availability of sophisticated next generation sequencers allows sequences on
the order of 2 Gb to be generated. Current algorithms for sequence data
analysis have their roots in the early days of sequence acquisition. Algorithms
used for aligning genome sequences tend to have computing time requirements
that grow exponentially as the amount of analysed sequence increases. New
concepts and approaches will be necessary to reduce this to a linear
requirement [36].
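The consequence of sequence data growing faster than computing power can be made concrete with a doubling-time calculation. The Python sketch below is purely illustrative: the 18-month doubling time and the 300 Gb starting size come from the text above, whereas the 9-month doubling time for sequencing output is a hypothetical value chosen only to show how quickly a gap opens.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Multiplicative growth after `months` for a given doubling time."""
    return 2.0 ** (months / doubling_time_months)


def projected_size(initial_gb: float, months: float,
                   doubling_time_months: float) -> float:
    """Size in Gb after `months`, starting from `initial_gb`."""
    return initial_gb * growth_factor(months, doubling_time_months)


COMPUTE_DOUBLING = 18.0    # months, Moore's law figure quoted in the text
SEQUENCING_DOUBLING = 9.0  # months, HYPOTHETICAL value for illustration

for months in (18, 36, 72):
    compute = growth_factor(months, COMPUTE_DOUBLING)
    data = growth_factor(months, SEQUENCING_DOUBLING)
    print(f"after {months} months: compute x{compute:.0f}, "
          f"data x{data:.0f}, gap x{data / compute:.0f}")

# A 300 Gb repository that kept doubling every 18 months, after 36 months:
print(projected_size(300.0, 36, COMPUTE_DOUBLING), "Gb")
```

Under these assumed figures, 72 months gives a 16-fold increase at an 18-month doubling time but a 256-fold increase at 9 months, so the shortfall itself grows 16-fold; this is why an algorithm whose cost rises faster than linearly with input size becomes the bottleneck.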
The function of genes has so far largely been studied in a very limited number
of species and in the context of individual organisms. In the next decade,
genomics will need a major shift toward including the ecological and
evolutionary context in gene function analysis. As such, genetics will move on
from a largely biomedical perspective to an ecological perspective with special
relevance for global change questions [4]. A full understanding of the
ecosystem, its services and its stability will not be possible without
understanding the genetics of adaptations and community interactions.