Abstract:
Imputation accuracy depends, among other factors, on the size of the reference panel, the
marker's minor allele frequency (MAF), and the correct placement of variants on the
reference genome assembly. Using high-density genotypes of 3938 Nellore cattle from
Brazil, we investigated the accuracy of imputation from 50K to 777K SNP density,
using map positions determined according to the bovine genome assemblies UMD3.1
and ARS-UCD1.2. We assessed the effect of reference and target panel sizes on the
quality of pre-phasing-based imputation using ten-fold cross-validation. Further, we compared
the reliability of the model-based imputation quality score (Rsq) from Minimac3 to
empirical imputation accuracy. The overall accuracy of imputation measured as the
squared correlation between true and imputed allele dosages (R2dose) was virtually
identical using either the UMD3.1 or ARS-UCD1.2 genome assembly. When the size of
the reference panel increased from 250 to 2000 animals, R2dose increased from 0.845 to 0.917,
and the number of polymorphic markers in the imputed data set increased from 586,701
to 618,660. Advantages in both accuracy and marker density were also observed when
larger target panels were imputed, likely resulting from more accurate haplotype
inference. When haplotypes were inferred in 2900 rather than 500 target animals, imputation
accuracy increased from 0.903 to 0.913 and the marker density in the imputed data increased
from 593,239 to 595,570. The model-based imputation quality scores from
Minimac3 (Rsq) were highly correlated with, but systematically higher than, empirically
estimated accuracies. The correlation between these metrics increased with the size of
the reference panel and MAF of imputed variants. Accurate imputation of BovineHD
BeadChip markers is possible in Nellore cattle using the new bovine reference genome
assembly ARS-UCD1.2. The use of large reference and target panels improves the
accuracy of the imputed genotypes and provides genotypes for more markers
segregating at low frequency for downstream genomic analyses. The model-based
imputation quality score from Minimac3 (Rsq) can be used to detect poorly imputed
variants, but its reliability depends on the size of the reference panel used and the MAF of
the imputed variants.
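
The empirical accuracy measure used above, R2dose, is defined as the squared correlation between true and imputed allele dosages. The following is a minimal Python sketch of how such a per-marker value could be computed. It is not the authors' pipeline: the function name `r2_dose`, the array layout (animals in rows, markers in columns), and the choice to return NaN for markers without variation are assumptions made for illustration only.

```python
import numpy as np


def r2_dose(true_dosage, imputed_dosage):
    """Squared Pearson correlation between true and imputed dosages, per marker.

    Assumed layout (hypothetical): both inputs have shape
    (n_animals, n_markers), with allele dosages between 0 and 2.
    Markers with zero variance in either input are returned as NaN,
    because the correlation is undefined for them.
    """
    t = np.asarray(true_dosage, dtype=float)
    d = np.asarray(imputed_dosage, dtype=float)
    if t.shape != d.shape or t.ndim != 2:
        raise ValueError("inputs must be 2-D arrays of identical shape")

    # Centre each marker (column) on its own mean.
    tc = t - t.mean(axis=0)
    dc = d - d.mean(axis=0)

    # Sums of cross-products and squares per marker.
    cross = (tc * dc).sum(axis=0)
    ss_true = (tc ** 2).sum(axis=0)
    ss_imputed = (dc ** 2).sum(axis=0)

    denom = ss_true * ss_imputed
    r2 = np.full(t.shape[1], np.nan)
    defined = denom > 0
    r2[defined] = cross[defined] ** 2 / denom[defined]
    return r2


# Toy data: 4 animals, 2 markers; the second marker is monomorphic
# in the true genotypes, so its R2dose is undefined (NaN).
true_toy = np.array([[0, 2], [1, 2], [2, 2], [1, 2]])
imputed_toy = np.array([[0.1, 2.0], [0.9, 2.0], [1.9, 1.9], [1.1, 2.0]])
per_marker = r2_dose(true_toy, imputed_toy)
overall = np.nanmean(per_marker)  # average over markers where R2dose is defined
```

Averaging with `np.nanmean` is only one possible way to summarise per-marker values into an overall accuracy; how undefined markers are handled affects the result and would need to match the study's own convention.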