CHAPTER ONE
1.0 INTRODUCTION
Bioinformatics is the science of storing, extracting,
organizing, analyzing, interpreting and utilizing information from biological
sequences and molecules (Khalid, 2010). Bioinformatics is often defined as the
application of computational techniques to understand and organize the information
associated with biological macro-molecules (Luscombe et al., 2001). It has been
mainly fueled by advances in DNA sequencing and mapping techniques (Khalid,
2010). Over the past few decades, rapid developments in genomics, other
molecular research technologies and information technology have combined to
produce a tremendous amount of information related to molecular biology. The
primary goal of bioinformatics is to increase the understanding of biological
processes (Khalid, 2010). As biology is increasingly becoming a
technology-driven science, databases have become indispensable to store not
only data, but also the results of experiments generated by different research
projects around the world (Hey et al., 2009). A biological database is a
collection of information, or data from a biological system, stored in a
computer readable format. Some databases are also called data repositories if
they function as a place where large biological datasets can be stored and
retrieved by users. Sharing of data between scientists accelerates the speed of
discoveries and has the potential to greatly advance a scientific field as a
whole; this is known as the Fourth Paradigm of Data-Driven Scientific Discovery
(Hey et al., 2009). There are two types of biological databases: public
databases that are freely accessible online, and private databases that
require payment for access (Dutilh and Keșmir, 2016).
The genome of a species encodes genes and other functional
elements, interspersed with non-functional nucleotides in a single
uninterrupted string of DNA (IHGSC, 2001).
Recognizing protein-coding genes typically relies on finding
stretches of nucleotides free of stop codons, called Open Reading Frames (ORFs),
that are too long to have likely occurred by chance. Since stop codons occur at
a frequency of roughly 1 in 20 codons in random sequence, ORFs of at least 60
amino acids will occur frequently by chance (about 5% under a simple Poisson
model), and even ORFs of 150 amino acids will appear by chance in a large
genome (about 0.05%). This poses a huge challenge for higher eukaryotes, in
which genes are typically broken into many small exons (on average 125
nucleotides long for internal exons in mammals) (IHGSC, 2001).
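The chance figures quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes that each codon is an independent draw with a fixed probability of being a stop codon, and the function name orf_chance_probability is hypothetical, introduced here for illustration. It returns the probability that a run of a given number of codons contains no stop codon.

```python
def orf_chance_probability(n_codons, stop_freq=1 / 20):
    """Probability that n_codons consecutive random codons contain no stop codon.

    Assumes codons are independent and that each one is a stop codon with
    probability stop_freq (an idealized model, not real genome statistics).
    """
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    if not 0 <= stop_freq <= 1:
        raise ValueError("stop_freq must lie between 0 and 1")
    return (1 - stop_freq) ** n_codons


if __name__ == "__main__":
    for length in (60, 150):
        # Stop frequency rounded to 1 in 20, as in the text.
        rounded = orf_chance_probability(length)
        # 3 stop codons among 64 codons, assuming all codons are equally likely.
        uniform = orf_chance_probability(length, 3 / 64)
        print(f"{length} codons: {rounded:.4%} (1/20), {uniform:.4%} (3/64)")
```

With the rounded frequency of 1 in 20, this gives roughly 4.6% for 60 codons and roughly 0.05% for 150 codons, in line with the figures in the text. Using 3 stop codons out of 64 equally likely codons gives slightly higher values.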
Some regions within a protein sequence are more conserved
than others during evolution (Dutilh and Keșmir, 2016). These regions are
generally important for the function of a protein and/or the maintenance of its
three dimensional structure, or other features related to its localization or
modification. By analyzing the constant and variable properties of such groups
of similar sequences, it is possible to derive a signature for a protein family
or domain that distinguishes its members from unrelated proteins. Sequence
alignment allows us to discover these signatures (Dutilh and
Keșmir, 2016). Sequence alignment is defined as the bioinformatics task of
locating equivalent regions of two or more sequences, and aligning their
nucleotide or amino acid residues side by side, to maximize their similarity
(Dutilh and Keșmir, 2016). Multiple sequence alignments allow for
identification of conserved sequence regions. This is very useful in designing
experiments to test and modify the function of specific proteins, in predicting
the function and structure of proteins, and in identifying new members of
protein families (Dutilh and Keșmir, 2016).
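Pairwise global alignment as defined above can be illustrated with the Needleman-Wunsch dynamic programming algorithm. This is a minimal sketch, not the method of any particular tool or of this study: the scoring values (match +1, mismatch -1, gap -1) and the function name needleman_wunsch are assumptions chosen for illustration, and the example sequences are toy strings, not real gene data.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "ACT")  # toy sequences
    print(total)
    print(top)
    print(bottom)
```

For the toy pair ACGT and ACT, the sketch places a single gap opposite the G, so that the three shared residues line up side by side. Practical tools use substitution matrices and more refined gap penalties than the flat values assumed here.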
DNA sequencing is a technique by which the exact
order of nucleotides within a DNA molecule is determined (Mayor et al., 2000).
Comparative data analysis provides the opportunity to determine what is shared
and what is unique to each species (Mayor et al., 2000).
Growth in animals is controlled by a complex system in
which the somatotropic axis plays a key role. The genes that operate in the
somatotropic axis are responsible for postnatal growth, mainly GH, which acts
on the growth of bones and muscles through mediation by IGF-1 (Sellier, 2000).
The growth hormone (GH) and insulin-like growth factor 1 (IGF-1) genes are
candidate genes for growth in cattle, since they play a key role in growth
regulation and development (Hossner et al., 1997; Tuggle and Trenkle, 1996).
Effects of GH on growth are observed in several tissues, including bone, muscle
and adipose tissue. These effects result from both the direct action of GH on
nutrient partitioning and cellular multiplication and the IGF-1-mediated action
that stimulates cell proliferation and metabolic processes associated with
protein deposition (Boyd and Bauman, 1989). IGF-1 stimulates protein metabolism
and is important for the function of some organs, being considered a factor of
cellular proliferation and differentiation (Andrea et al., 2005). Polymorphisms
in the GH gene have been used as genetic markers associated with
performance and production traits such as body weight, birth weight and
weaning weight in goats (Wickramaratne et al., 2010). The rabbit GH gene has
already been sequenced by Wallis and Wallis (1995) and has been investigated as
a gene associated with the market weight of commercial rabbits (Fontanesi et al.,
2012). Mutations of the GH gene that affect important production traits have
been described in goats (Malveiro et al., 2001) and poultry (Feng et al., 1997).
In chickens divergently selected for high or low growth
rates, IGF-1 mRNA levels were significantly higher in the high growth
rate line than in the low growth rate line (Beccavin et al., 2001). The growth
hormone/insulin-like growth factor-1 (GH-IGF-1) system, which includes the
growth hormone receptor (GHR), controls the number of follicles that are
recruited to the rapid growth phase in animals
(Roberts et al., 1994; Monget et al., 2002). It is also known that the
GH-IGF-1 system has been modified as a result of selection for enhanced growth
rate (Ballard et al., 1990; Ge et al., 2001). The insulin-like growth factor 1
gene (IGF1) is a candidate gene for growth, body composition and metabolism,
skeletal characteristics, and growth of adipose tissue and fat deposition in
chickens (Zhou et al., 2005). Earlier research on GHR, IGF-1 and IGFBP-3 in
cattle, goats and chickens showed genetic polymorphisms and their association
with production traits (Liu et al., 2010). The IGF1 gene is essential for
normal embryonic and postnatal growth in mammals (Bian et al., 2008).
Myostatin (MSTN), previously called growth differentiation
factor 8 (GDF8), is a member of the transforming growth factor-β (TGF-β)
superfamily and a negative regulator of muscle growth (McPherron et al., 1997).
It negatively regulates both embryonic development and
adult homeostasis of skeletal muscle (Tu et al., 2014). Myostatin is
able to negatively control the growth of muscle cells by inhibiting the
transcriptional activity of MyoD family members. Its expression is negatively
correlated with muscle weight (Weber et al., 2005). Mutations in the myostatin
gene have also been shown to cause double muscling in humans and other species
(Clop et al., 2006). These findings suggest that strategies for inhibiting
myostatin function may be applied to improve animal growth. Cattle that are
homozygous or heterozygous for mutations in conserved regions of the MSTN gene
show stronger muscling, increased birth weight, and obvious
double-muscling characteristics (Casas et al., 1999). As a candidate gene
for double muscling in pigs, the MSTN gene has an important impact on the amount
of lean meat and fat deposition (Sonstegard et al., 1998). The rabbit is a
high-quality and efficient meat-producing livestock species as well as a common
experimental animal. Therefore, providing information on the genetic basis and
regulatory mechanisms of its skeletal muscle growth and development has
important theoretical and practical significance (Qiao, 2014). In an F2
resource population of chickens, SNPs of the myostatin gene were associated
with increases in abdominal fat weight, abdominal fat percentage, birth weight
and breast muscle percentage (Zhiliang et al., 2004). Notably, these data
suggest that myostatin could be an ideal molecular marker for marker-assisted
selection for skeletal muscle and adipose growth in chicken breeding programs.
A TTTTA deletion in the MSTN gene was reported to be unique to goats when
compared with sheep, cattle, water buffalo, domestic yak, pigs, and humans
(Grisolia et al., 2009; Zhang et al., 2013). Khichar et al. (2016) found an
important effect of a 5-base pair (bp) deletion on early body weight and size
in goats.
1.1 Justification
Identification of a candidate gene is a powerful method for
understanding the direct genetic basis involved in the expression of
quantitative traits and their differences between individuals (Rothschild and
Soller, 1997; Nagaraja et al., 2000). Mutations in conserved regions of the
MSTN gene in chicken, rabbit and goat can alter the activity of the gene
product, causing a loss or gain of its muscle-growth-inhibiting function;
loss of function results in excessive muscle development
(Lee and McPherron, 1999). Indeed, there have been several
recent examples in which comparative sequence data have led to the discovery
and understanding of function of previously undefined genes. The complete
human/mouse orthologous-sequence dataset proved particularly valuable in the
characterization of gene families in humans and mice (Dehal et al., 2001). For
instance, a computational comparison of olfactory receptor gene families on
human chromosome 19 indicated that humans have approximately 49 olfactory
receptor genes, but only 22 had maintained an open reading frame and appeared
functional. This contrasts with the vast majority of the homologous mouse genes
that have retained an open reading frame. This finding of reduced olfactory
receptor diversity in humans is consistent with the reduced olfactory needs and
capabilities of humans relative to rodents (Pennacchio and Rubin, 2003).
Growth hormone (GH), a single polypeptide produced in the
anterior pituitary gland, is encoded by a gene that is a promising candidate
marker for improving milk and meat production in goats and other farm animals
(Min et al., 2005). IGF1 is a mediator of many biological effects: it increases
the absorption of glucose, stimulates myogenesis and production of progesterone,
inhibits apoptosis, participates in the activation of cell cycle genes,
increases the synthesis of lipids, and intervenes in the synthesis of DNA,
protein and RNA, and in cell proliferation (Mohammadi et al., 2011).
The increasing availability of genomic sequence from
multiple organisms has provided biomedical scientists with a large dataset for
orthologous-sequence comparisons. The rationale for using cross-species
sequence comparisons to identify biologically active regions of a genome is
based on the observation that sequences that perform important functions are
frequently conserved between evolutionarily distant species, distinguishing
them from nonfunctional surrounding sequences (Pennacchio and Rubin, 2003). Sequence
alignment is a good way of predicting the function of a gene or protein.
Moreover, sequences contain much more information, such as the organism from which
the gene or protein is derived and the evolutionary relationships of
the gene or species with other genes or species. Much of this information can
only be discovered by finding homologs of the gene or protein in other species
(Dutilh and Keșmir, 2016).
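The idea of picking out conserved sequence from a cross-species comparison can be sketched in a few lines. The example below is a simplified illustration rather than an established method: it assumes the sequences have already been aligned to equal length with "-" marking gaps, it treats a column as conserved only when every sequence carries the identical residue, and the function names and the three toy sequences are invented for illustration (they are not real GH, IGF-1 or MSTN data).

```python
def conserved_columns(aligned):
    """Return indices of columns where all sequences share one non-gap residue."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        idx
        for idx, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"
    ]


def conserved_blocks(aligned, min_len=3):
    """Group consecutive conserved columns into half-open (start, end) blocks."""
    blocks = []
    start = prev = None
    for idx in conserved_columns(aligned):
        if start is None:
            start = prev = idx
        elif idx == prev + 1:
            prev = idx
        else:
            # A break in conservation closes the current run.
            if prev - start + 1 >= min_len:
                blocks.append((start, prev + 1))
            start = prev = idx
    if start is not None and prev - start + 1 >= min_len:
        blocks.append((start, prev + 1))
    return blocks


if __name__ == "__main__":
    # Toy pre-aligned sequences, made up for illustration only.
    toy_alignment = [
        "ATGCCGTA-A",
        "ATGACGTAGA",
        "ATGCCGTTGA",
    ]
    print(conserved_columns(toy_alignment))
    print(conserved_blocks(toy_alignment, min_len=3))
```

Requiring perfect identity in every column is a deliberately strict assumption. Real comparative analyses typically score conservation over sliding windows and tolerate some substitutions.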
This study is justified in that a comparative genomic analysis
assessing the similarities and differences among three growth genes,
growth hormone (GH), myostatin (MSTN) and insulin-like growth factor-1 (IGF-1),
in chicken, rabbit, and sheep will help identify similarities or
differences relating to the rate of increase in growth and body size to
maturity, final body size at maturity, and body conformation at maturity. The
analysis of sequences conserved among these three species will further enrich
the available information on biologically active sequences in these species.