A guide to imputation of low density single nucleotide polymorphism data up to sequence level

Technical Protocol

Hawlader A. Al-Mamun¹, Priscila A. Bernardes¹,², Dajeong Lim³, Byoungho Park⁴ and Cedric Gondro¹,⁵

¹School of Environmental and Rural Science, University of New England, Australia
²Faculty of Agronomy and Veterinary Sciences, Universidade Estadual Paulista, Brazil
³Animal Genomics & Bioinformatics Division, National Institute of Animal Science, RDA, Republic of Korea
⁴Animal Breeding & Genetics Division, National Institute of Animal Science, RDA, Republic of Korea
⁵College of Agriculture & Natural Resources, Michigan State University, USA

Correspondence to Cedric Gondro, E-mail: gondroce@msu.edu

Volume 1, Number 1, Pages 59-68, September 2017.
Journal of Animal Breeding and Genomics 2017, 1(1), 59-68. https://doi.org/10.12972/jabng.20170007
Received 3 July 2017, revised 20 August 2017, accepted 25 August 2017, published 30 September 2017.
Copyright © 2017 Korean Society of Animal Breeding and Genetics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0).

ABSTRACT

Over the years, industry, breeding programs and research initiatives have invested heavily in phenotyping and genotyping large numbers of animals across the various SNP platforms. Because sequencing is still relatively expensive, and to make the most of the historical data already collected, a widely used strategy is to sequence key representative animals from a population and then use this information to impute the sequence of the others. In this paper, we describe the main steps currently used in the Korean Hanwoo cattle pipeline to impute 50k SNP data up to sequence level, assisted by a set of reference animals that were sequenced using Illumina sequencing technology. PLINK, VCFtools, Eagle and Minimac3 are used for the imputation steps. Code and a small example dataset are provided to illustrate the process in practice. This simple roadmap can be used for phasing and imputation of livestock genomic datasets, adding value to the datasets already collected across the various platforms.
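
As a rough orientation, the workflow summarised above (export the 50k genotypes, phase them against the sequenced reference, then impute) might be sketched as the per-chromosome command builder below. This is a minimal illustration and not the authors' script: the function name, all file names and the genetic map file are hypothetical, the reference VCF is assumed to be already phased, bgzipped and tabix-indexed (reference preparation, for example with VCFtools, is assumed to have been done beforehand), and `tabix` is an extra indexing step assumed here. The sketch only builds the command lines; it does not run them.

```python
import shlex


def imputation_commands(chrom, target_bfile, ref_vcf, genetic_map,
                        out_dir="imputed", threads=4):
    """Build (not run) the per-chromosome command lines as argument lists.

    All file names are hypothetical. ref_vcf is assumed to be a phased,
    bgzipped, tabix-indexed VCF of the sequenced reference animals for
    this chromosome; genetic_map is assumed to be a user-supplied map
    in the format Eagle expects.
    """
    chrom = str(chrom)
    target = f"{out_dir}/target_chr{chrom}"
    phased = f"{out_dir}/target_chr{chrom}.phased"
    imputed = f"{out_dir}/target_chr{chrom}.imputed"
    return [
        # 1. PLINK: export one chromosome of the 50k data as bgzipped VCF.
        ["plink", "--bfile", target_bfile, "--cow", "--chr", chrom,
         "--recode", "vcf-iid", "bgz", "--out", target],
        # 2. tabix: index the exported VCF so that Eagle can read it.
        ["tabix", "-p", "vcf", f"{target}.vcf.gz"],
        # 3. Eagle: phase the target genotypes against the reference panel.
        ["eagle", "--vcfRef", ref_vcf, "--vcfTarget", f"{target}.vcf.gz",
         "--geneticMapFile", genetic_map, "--chrom", chrom,
         "--numThreads", str(threads), "--outPrefix", phased],
        # 4. Minimac3: impute the phased target up to sequence level.
        ["Minimac3", "--refHaps", ref_vcf, "--haps", f"{phased}.vcf.gz",
         "--cpus", str(threads), "--prefix", imputed],
    ]


if __name__ == "__main__":
    # Print the commands for chromosome 1 as a dry run (hypothetical inputs).
    for cmd in imputation_commands(1, "hanwoo_50k",
                                   "ref_chr1.phased.vcf.gz",
                                   "genetic_map_chr1.txt"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to inspect each step, or to hand the lines to a job scheduler, before committing compute time to a whole-genome run.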

KEYWORDS

Sequence Imputation, Hanwoo, SNP, genomic prediction, PLINK, VCFtools, Eagle, Minimac3, R

ACKNOWLEDGEMENTS

This project was supported by grants from the Next-Generation BioGreen 21 Program (PJ01134906 and PJ012611), Rural Development Administration, Republic of Korea, and by the Australian Research Council (DP130100542). We thank Iona Macleod, Bolormaa Sunduimijid and Hans Daetwyler for kindly sharing their broad experience with sequence imputation; we appreciate all the hard work that was needed to underpin a robust imputation strategy.

Journal of Animal Breeding and Genomics (J Anim Breed Genom)

Indexed in KCI

OPEN ACCESS, PEER REVIEWED

pISSN 1226-5543
eISSN 2586-4297
