Hawlader A. Al-Mamun1, Priscila A. Bernardes1,2, Dajeong Lim3, Byoungho Park4 and Cedric Gondro1,5
1School of Environmental and Rural Science, University of New England, Australia
2Faculty of Agronomy and Veterinary Sciences, Universidade Estadual Paulista, Brazil
3Animal Genomics & Bioinformatics Division, National Institute of Animal Science, RDA, Republic of Korea
4Animal Breeding & Genetics Division, National Institute of Animal Science, RDA, Republic of Korea
5College of Agriculture & Natural Resources, Michigan State University, USA
Correspondence to Cedric Gondro, E-mail: gondroce@msu.edu
Volume 1, Number 1, Pages 59-68, September 2017.
Journal of Animal Breeding and Genomics 2017, 1(1), 59-68. https://doi.org/10.12972/jabng.20170007
Received on 3 July 2017, Revised on 20 August 2017, Accepted on 25 August 2017, Published on 30 September 2017.
Copyright © 2017 Korean Society of Animal Breeding and Genetics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0).
Over the years, industry, breeding programs and research initiatives have invested heavily in phenotyping and genotyping large numbers of animals across various SNP platforms. Because sequencing is still relatively expensive, and to make the most of the historical data already collected, a widely used strategy is to sequence key representative animals from a population and then use this information to impute the sequence of the others. In this paper, we describe the main steps currently used in the Korean Hanwoo cattle pipeline to impute 50k SNP data up to sequence level, assisted by a set of reference animals that were sequenced using Illumina sequencing technology. PLINK, VCFtools, Eagle and Minimac3 are used for the imputation steps. Code and a small example dataset are provided to illustrate the process in practice. This simple roadmap can be used for phasing and imputation of livestock genomic datasets, adding value to the datasets already collected across the various platforms.
Sequence Imputation, Hanwoo, SNP, genomic prediction, PLINK, VCFtools, Eagle, Minimac3, R
This project was supported by a grant from the Next-Generation BioGreen 21 Program (PJ01134906 and PJ012611), Rural Development Administration, Republic of Korea, and by the Australian Research Council (DP130100542). We thank Iona Macleod, Bolormaa Sunduimijid and Hans Daetwyler for kindly sharing their broad experience with sequence imputation; we appreciate all the hard work that was needed to underpin a robust imputation strategy.