Many livestock breeds present today have passed through centuries of natural and human selection, and different breeds have therefore adapted to different environmental conditions and production criteria. As a result, livestock animals have phenotypic characteristics that distinguish them from other subgroups of the same species; however, it is difficult to distinguish individuals and groups reliably using only basic morphology or biological samples such as blood, tissue, and secretions. This problem has been addressed by the genetic classification of individuals with different phenotypes through their DNA profiles and genome-level variation (Syvanen, 2001), particularly in mitochondrial DNA (mtDNA) and nuclear DNA. This genetic variation takes various forms, such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels), simple tandem repeats (STRs), and copy number variations (CNVs).
Genomic studies provide the basic information needed to distinguish individuals and varieties at the DNA level (Singh et al., 2014). The discovery of the genetic variation that underlies population diversity allows academia, industry, and animal breeders to develop new strains that can withstand prevailing climatic changes and respond to changes in consumer demand. During the past few decades, genotyping techniques have advanced and their costs have decreased sharply, so the scope of application of genetic markers is continuously expanding. In the animal industry, genetic markers can serve as identification tools for establishing traceability systems and can support genetic improvement by identifying chromosomal regions that harbor loci with significant effects on economically important traits in livestock production (Fontanesi, 2009). Most molecular genetic markers used today, such as SNP and microsatellite markers, are generated by high-throughput systems. Moreover, specific mtDNA and Y chromosome markers are commonly used to identify maternal and paternal lineages in studies of within-population diversity and phylogeny.
To this end, genetic diversity analysis based on genetic variation is discussed here, because it provides basic data for application studies in diverse areas. In addition, the use of molecular markers and high-throughput technologies for evaluating diversity, mapping quantitative trait loci (QTL), and studying candidate genes is presented in detail.
Mitochondrial DNA for diversity study in livestock
The mitochondrion is an intracellular organelle that contains its own circular, double-stranded DNA, separate from the DNA of the cell nucleus. It is maternally inherited, because the mitochondria passed to descendants during reproduction are those of the oocyte (Cummins, 2000). Furthermore, unlike nuclear genomic DNA, mtDNA does not recombine and has a rapid rate of nucleotide substitution, which makes it suitable for genetic diversity and evolutionary studies (Hoque et al., 2013). The mtDNA contains 37 genes in total, all of which are essential for normal mitochondrial function. Within the mtDNA, the non-coding D-loop control region and the cytochrome oxidase I (COI) gene, which is associated with oxygen metabolism (Hebert et al., 2003), are widely used for genetic diversity studies because they contain more genetic variation than other parts of the mtDNA (Jin et al., 2009). In addition, there are seven NADH dehydrogenase subunit (ND) genes, whose products catalyze the oxidation of reduced nicotinamide adenine dinucleotide (NADH) in the mitochondria. Among them, ND4 and ND5 have been reported in previous studies to contain many frameshift mutations affecting 41 amino acids in complex I, and these two genes have been used to confirm the diversity of breeds (Bourges et al., 2004). In particular, they have been used as ideal markers for classifying Bos indicus and Bos taurus breeds (Yoon et al., 2008). Therefore, exploring mtDNA mutations makes it possible to distinguish breeds, identify their origins, and decipher their evolutionary background (Seo and Lee, 2016).
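As an illustration of how variation in a region such as the D-loop is summarized in diversity studies, the following minimal sketch computes two commonly used statistics, haplotype diversity and nucleotide diversity, from aligned sequences. The sequences are hypothetical toy data, the function names are chosen here for illustration, and the sketch assumes equal-length, gap-free alignments.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical aligned D-loop fragments (toy data, not real sequences).
dloop = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTTCGTAA",
]

print("Haplotype diversity:", haplotype_diversity(dloop))
print("Nucleotide diversity:", nucleotide_diversity(dloop))
```

In this toy set, three distinct haplotypes occur among four sequences; real analyses would use many more individuals and a proper multiple sequence alignment.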
SNP (single nucleotide polymorphism): one of the important nuclear DNA markers
A single nucleotide polymorphism (SNP) is a single-base substitution in a DNA sequence. In principle, any of the four nucleotide bases can occur at a given sequence position, but in practice SNPs are usually biallelic (Nielsen, 2000). There are two main reasons for this. First, the rate of single nucleotide substitution is very low, ranging from 1 × 10⁻⁹ to 5 × 10⁻⁹ per nucleotide per year at neutral positions in mammals, so two independent substitutions rarely arise at the same site. Second, the mutation mechanism is biased, favoring one of the two types of substitution. A substitution is either a transition (purine↔purine, A↔G; or pyrimidine↔pyrimidine, C↔T) or a transversion (purine↔pyrimidine: A↔C, A↔T, G↔C, and G↔T) (Collins and Jukes, 1994; Vignal et al., 2002). SNPs detected as differences between DNA sequences can be used for the genetic identification of varieties, and some are causative mutations that affect economically important phenotypes of animals (Seo and Lee, 2016). The discovery of SNPs and their use in a variety of genetic studies therefore suggest that they can be used efficiently to investigate genetic diversity and improve economic traits.
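The distinction between transitions and transversions described above can be expressed compactly in code. The sketch below classifies single-base substitutions and computes a transition/transversion ratio; the SNP list is hypothetical and the function names are illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    # Reject identical bases and anything other than A, C, G, T.
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Same chemical class (purine-purine or pyrimidine-pyrimidine) is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    kinds = [classify_substitution(r, a) for r, a in snps]
    ts = kinds.count("transition")
    tv = kinds.count("transversion")
    return ts / tv if tv else float("inf")


# Hypothetical SNP calls as (reference, alternative) pairs; illustrative only.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]

for ref, alt in snps:
    print(f"{ref}>{alt}: {classify_substitution(ref, alt)}")
print("Ts/Tv ratio:", ts_tv_ratio(snps))
```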
SNP markers generated by Sanger sequencing method
DNA sequencing technology began with the method of Fred Sanger in 1977 and has developed rapidly during the last couple of decades. To identify SNP mutations in DNA sequences, techniques including RFLP (Restriction Fragment Length Polymorphism), RAPD (Random Amplified Polymorphic DNA), and AFLP (Amplified Fragment Length Polymorphism) have been used. RFLP was first used as a genetic analysis tool in 1974 (Grodzicker et al., 1974). It identifies specific base changes in a DNA sequence using restriction enzymes that recognize specific sequences and cleave the DNA endonucleolytically, producing fragments of defined length that are separated by agarose gel electrophoresis (Soller and Beckmann, 1983). RFLP developed into a simple Mendelian codominant marker that allows easy identification of individual genotypes. In addition, it can be used as a genetic marker to trace whole or partial genetic characteristics through pedigree relationships (Botstein et al., 1980). However, gel electrophoresis has the disadvantage that confirming the results takes a long time. To overcome this limitation, an automated electrophoresis technique using the Agilent 2100 Bioanalyzer chip was developed to identify various species such as cattle, sheep, chicken, turkey, and fish (Dooley and Garrett, 2001; Fajardo et al., 2010). The RAPD technique, introduced in 1990, randomly amplifies anonymous segments of nuclear DNA by PCR (Welsh and McClelland, 1990). This method uses short primers of 8-10 bp, amplifies at a low annealing temperature (36 to 40°C), and scores the presence or absence of a band as a polymorphism (Welsh and McClelland, 1990; Williams et al., 1990; Liu and Cordes, 2004). Because it generates multiple products representing different loci, genomic variation can be identified easily and accurately without prior knowledge of the DNA sequence (Huang et al., 2003; Liu and Cordes, 2004).
The RAPD genotyping technique can detect nucleotide sequence differences or the presence of indels in the primer binding sites (Liu and Cordes, 2004). RAPD was therefore regarded as a useful tool for species identification and was developed into a fingerprinting technique. AFLP detects restriction fragments across the genome by PCR amplification (Vos et al., 1995). A restriction enzyme cleaves the double-stranded DNA, adapters are ligated to the ends of the DNA fragments, and the fragments are then amplified as templates using primer sets complementary to the adapters (Vos et al., 1995). This method is a fingerprinting technique for sites of unknown sequence that uses a limited set of generic primer pairs, and it is used as a genome mapping tool in genetic diversity studies of species without a dense marker map (Mueller and Wolfenbarger, 1999).
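The RFLP principle described in this section, in which a base change creates or destroys a restriction site and thereby changes the fragment lengths, can be illustrated with a small in silico digest. This is a minimal sketch under stated assumptions: the amplicon sequences are hypothetical, the enzyme is modeled as EcoRI (recognition site GAATTC, cutting after the first G), only one strand of a linear molecule is considered, and the function names are invented for this example.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the left fragment.
    Returns the fragment lengths in order along the molecule.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


def rflp_bands(allele1, allele2, site="GAATTC", cut_offset=1):
    """Distinct band sizes expected on a gel for a diploid genotype."""
    bands = set(digest(allele1, site, cut_offset))
    bands |= set(digest(allele2, site, cut_offset))
    return sorted(bands, reverse=True)


# Hypothetical 28 bp amplicons that differ by one SNP inside the EcoRI site.
allele_cut = "ATGCCGTA" + "GAATTC" + "GGCTAAGCTTACGT"    # site present
allele_uncut = "ATGCCGTA" + "GAGTTC" + "GGCTAAGCTTACGT"  # A>G removes the site

print("cut/cut:    ", rflp_bands(allele_cut, allele_cut))
print("cut/uncut:  ", rflp_bands(allele_cut, allele_uncut))
print("uncut/uncut:", rflp_bands(allele_uncut, allele_uncut))
```

Each of the three genotypes gives a different band pattern, which is why RFLP behaves as a codominant marker: the heterozygote shows both the uncut band and the two digestion products.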
Development and application of Next Generation Sequencing (NGS) using High-Density SNP data
About thirty years after the development of Sanger sequencing, the 454 system of Roche (454 Life Sciences), the Solexa system of Illumina, and the SOLiD system of Applied Biosystems were developed from 2005 onward. These sequencing technologies are called next generation sequencing (NGS) (Schuster, 2007). The three platforms have been compared in terms of sequencing accuracy, variant accuracy, coverage rate, false positive rate, false negative rate, and variant discrepancy rate (Harismendy et al., 2009). The large amount of genome sequence data accumulated in animal genomics, including human genomics, has revealed nucleotide sequence variation throughout the genome and led to the development of array-based SNP chips. For instance, in cattle, a 50K SNP BeadChip array has been used for genome-wide association studies (GWAS) and for detecting sequence variation, providing a platform for bovine disease gene studies and QTL mapping (Matukumalli et al., 2009). Likewise, porcine genomes have been investigated on a large scale using 60K BeadChip arrays (Ramos et al., 2009). In sheep, a copy number variation (CNV) map was generated using a 50K SNP BeadChip array (Liu et al., 2013). For the chicken, the draft genome sequence was first released in 2004 (Hillier et al., 2004), and a high-density 600K SNP chip with a wide range of applications was subsequently released for SNP genotyping (Kranis et al., 2013).
Thus, SNP arrays have become a popular technology in diverse animal species and can be used for association mapping, genetic diversity, and phylogenetic studies. In addition, large volumes of SNP chip data can be analyzed rapidly and accurately to better understand linkage disequilibrium (LD), tandem repeat elements, indels (insertions and deletions), and copy number variation (CNV).
Understanding the LD structure of a population is necessary for developing phenotype-related genetic markers and for the effective application of marker-assisted selection (MAS) (Abasht et al., 2009). In general, changes in LD structure are caused by factors such as migration, selection, and genetic drift in the population, and the extent of LD is determined by the recombination rate during meiosis. The magnitude of LD (r²) can be obtained by calculating the degree of association between two adjacent loci from their allele and haplotype frequencies (Hill and Robertson, 1968). In addition, the effective population size can be estimated from the calculated r² values (Sved, 1971). The effective population size is the size of an idealized population that would show the same change in allele frequency across generations through genetic drift as the actual population (Wright, 1940). Because LD measures the association between a pair of loci (Devlin and Risch, 1995; Hayes et al., 2003), a multilocus measure, chromosome segment homozygosity (CSH), has also been proposed (Hayes et al., 2003). Evaluating LD is essential for characterizing the genetics of a population and establishing an efficient selection strategy (Barrett and Cardon, 2006). In addition, accurate LD evaluation can provide a basis for tracking signatures of selection for a phenotype and can help to detect causal mutations and genetic markers associated with a phenotype through genome-wide association studies (GWAS). Such work can serve as a foundation for maximizing genetic improvement by increasing the effect of selection. LD evaluation methods are also developing along with recent progress in NGS technology and the various types of high-density SNP chips. In addition, with the help of recent imputation methods, low-density SNP marker genotyping has been widely used for LD evaluation and GWAS (Hayes and Goddard, 2001).
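The two calculations mentioned above, r² between a pair of loci and the estimation of effective population size from r², can be sketched as follows. The example assumes phased haplotypes coded with the hypothetical labels A/a and B/b, uses the standard definition r² = D² / [pA(1 − pA) pB(1 − pB)] with D = pAB − pA pB, and applies Sved's expectation E[r²] = 1 / (1 + 4Nec), where c is the recombination distance in Morgans. Corrections for finite sample size, which are often applied in practice, are omitted here, and the haplotype counts are toy data.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes such as 'AB' or 'ab'."""
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # frequency of allele A
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # frequency of allele B
    p_ab = sum(h == "AB" for h in haplotypes) / n    # frequency of haplotype AB
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def sved_ne(r2, c_morgans):
    """Effective population size from E[r^2] = 1 / (1 + 4 * Ne * c)."""
    if not 0 < r2 <= 1 or c_morgans <= 0:
        raise ValueError("need 0 < r2 <= 1 and c > 0")
    return (1 / r2 - 1) / (4 * c_morgans)


# Hypothetical sample of 100 phased haplotypes (toy counts).
haplotypes = ["AB"] * 40 + ["ab"] * 40 + ["Ab"] * 10 + ["aB"] * 10

r2 = ld_r2(haplotypes)
print("r^2:", r2)
print("Ne for c = 0.001 Morgan:", sved_ne(r2, 0.001))
```

With these toy counts, r² is approximately 0.36; in real data, r² would be averaged over many marker pairs within a distance class before Ne is estimated.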
The variable number tandem repeat (VNTR) technique is based on differences in the number of tandem repeats in eukaryotic genome sequences (Takezaki and Nei, 2008). According to the length of the repeat motif, repeats of 1 to 6 bp are called microsatellites (MS; also short tandem repeats, STRs) (Litt and Luty, 1989), and repeats of 10 to 60 bp are called minisatellites (VNTRs) (Jeffreys et al., 1985). Microsatellites are widely used for genetic diversity analysis and breeding program studies in developing countries because they are relatively easy and inexpensive to analyze (Rege et al., 2011). These markers reveal polymorphisms through differences in the size of the repeated simple sequences. It has been reported that individuals and breeds can be identified if the expected heterozygosity (Hexp) is higher than 0.5 and the polymorphism information content (PIC) is higher than 0.6, respectively (Botstein et al., 1980; Vignal et al., 2002). Because of their high polymorphism, microsatellites were used in quantitative trait loci (QTL) studies to find positional candidate genes related to economic traits in the early 2000s; recently, however, these studies have been replaced by GWAS using high-density (HD) SNPs.
Copy number variation (CNV), a type of structural genetic variation, can contribute to genetic diversity and evolution. In early CNV studies, a CNV was defined as a DNA segment larger than 1 kb that differs in copy number from the reference genome, but the definition has recently been extended to include segments as short as 50 bp (Feuk et al., 2006; Alkan et al., 2011). Recently, NGS data have been used to estimate CNVs more precisely in genome-wide studies, for example in Korean native chicken and in a comparison of CNVs among Hanwoo, Angus, and Holstein populations (Cho et al., 2014; Seo et al., 2015).
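The expected heterozygosity and PIC thresholds mentioned above are computed from allele frequencies. The following sketch uses the standard formulas Hexp = 1 − Σpᵢ² and PIC = 1 − Σpᵢ² − Σᵢ<ⱼ 2pᵢ²pⱼ² (Botstein et al., 1980); the allele frequencies are hypothetical and serve only to contrast a multi-allelic microsatellite with a biallelic SNP.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Hexp = 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * (pi * pi) * (pj * pj) for pi, pj in combinations(freqs, 2))
    return 1 - homozygosity - cross


# Hypothetical allele frequencies (must sum to 1 at each locus).
microsatellite = [0.40, 0.30, 0.15, 0.10, 0.05]  # five alleles
snp = [0.5, 0.5]                                 # two equally frequent alleles

for name, freqs in [("microsatellite", microsatellite), ("SNP", snp)]:
    print(name, "Hexp:", expected_heterozygosity(freqs), "PIC:", pic(freqs))
```

A biallelic marker reaches at most Hexp = 0.5 and PIC = 0.375, both at equal allele frequencies, so a single SNP cannot meet the thresholds quoted for microsatellites; this is one reason SNPs are used in larger panels.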
SNP markers versus MS markers
Although MS markers are highly polymorphic, MS typing is becoming limited in use, mainly because of its technical limitations, labor intensity, and cost. One technical problem with MS markers is the inconsistency of allele sizes, which makes it difficult to compare results generated by different laboratories (Vignal et al., 2002). Allele sizes differ because they are measured differently depending on the PCR conditions. For example, Taq polymerase adds a variable number of adenines to the 3′ end of the PCR fragment, depending on the amplification conditions (Brownstein et al., 1996). These additional nucleotides lead to incorrect PCR fragment sizes (Ginot et al., 1996). Errors in fragment size generate artificial new alleles; such false alleles can be corrected in pedigree analyses, but in random population analyses they can distort the results (Vignal et al., 2002). In addition, MS markers require expert analysis to minimize genotyping errors compared with recently developed SNP analysis methods, and PCR and data analysis are relatively expensive and time-consuming. SNP markers, on the other hand, are scored simply by the presence or absence of an allele. Genotyping errors in which a heterozygote is miscalled as a homozygote, or a homozygote as a heterozygote, do not arise easily. SNP analysis therefore has many advantages in terms of cost, accuracy, and time, and fully automated array-based systems are used to determine genotypes (Vignal et al., 2002). In general, MS markers have been widely used for individual identification and paternity testing, whereas SNP markers have fewer alleles per locus than MS markers, so more SNP loci are generally needed to reach comparable discriminating power.
A previous comparison of the power of 13 MS markers and 37 SNP markers for pig identification and parentage testing indicated that the two panels had similar estimates of the probability of identity (PI) (Lee et al., 2012). Therefore, fast and accurate results can be obtained with a small, customized set of SNP markers. In particular, the SNPs annotated across the genome through the NGS analyses developed in the 2000s became the basis for low-density and high-density SNP platforms, and the resulting array tools have made genomic research easier and cheaper.
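To make the comparison of marker panels by probability of identity more concrete, the sketch below computes PI for unrelated individuals using the standard single-locus formula PI = Σpᵢ⁴ + Σᵢ<ⱼ (2pᵢpⱼ)² and multiplies across loci, which assumes the loci are independent. The panel sizes follow the text, but the allele frequencies are hypothetical and are not those reported by Lee et al. (2012).

```python
from itertools import combinations
from math import prod


def locus_pi(freqs):
    """Probability that two unrelated individuals share a genotype at one locus."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2 * pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    return homozygous + heterozygous


def combined_pi(loci):
    """Multi-locus PI, assuming the loci are independent (unlinked)."""
    return prod(locus_pi(freqs) for freqs in loci)


# Hypothetical panels: allele frequencies are illustrative assumptions only.
ms_panel = [[0.2] * 5 for _ in range(13)]    # 13 microsatellites, 5 equal alleles
snp_panel = [[0.5, 0.5] for _ in range(37)]  # 37 SNPs, minor allele frequency 0.5

print("PI, 13 MS markers: ", combined_pi(ms_panel))
print("PI, 37 SNP markers:", combined_pi(snp_panel))
```

Under these idealized frequencies both panels give a very small PI of broadly similar order of magnitude, which illustrates how a larger number of biallelic SNPs can compensate for their lower per-locus information.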