Scientific Publication

Sampling strategies for conserving diversity when forming core subsets using genetic markers

Abstract

Core subsets can be formed on the basis of molecular markers and different sampling strategies. This research used genetic markers on three maize data sets to compare 24 stratified sampling strategies and determine which conserved the most diversity in the core subset relative to the original sample. The strategies were formed by combining three factors: (i) two clustering methods (UPGMA and Ward), based on (ii) two initial genetic distance measures, and using (iii) six allocation criteria [two based on the size of the cluster and four based on maximizing distances in the core (the D method) used with four diversity indices]. The objectives were (i) to study the influence of these factors and their interactions on the diversity of the core subsets and (ii) to compare the 24 stratified sampling strategies with the M strategy implemented in the MSTRAT algorithm. The success of each strategy was measured by how well it maximized genetic distances (Modified Rogers and Cavalli-Sforza and Edwards distances) and genetic diversity indices (Shannon index, proportion of heterozygous loci, and number of effective alleles) in each core. Twenty independent stratified random samples were obtained for each strategy using a sampling intensity of 20% of the collection. For the three data sets, UPGMA combined with the D allocation methods produced core subsets with significantly more diversity than the other methods and outperformed the M strategy in maximizing genetic distance. For most of the diversity indices, however, the M strategy outperformed the D method.
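
As an illustration only, the following minimal Python sketch shows how a stratified random core subset might be drawn at a 20% sampling intensity and how two of the diversity indices named above can be computed from allele frequencies. It is not the procedure used in this study: proportional allocation is assumed here as one possible size-based criterion (the abstract does not name the two size-based criteria), the D method and the M strategy are not implemented, and all function names (`shannon_index`, `effective_alleles`, `proportional_allocation`, `stratified_sample`) are hypothetical.

```python
import math
import random


def shannon_index(freqs):
    """Shannon index H = -sum(p * ln p) over alleles with p > 0 at one locus."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def effective_alleles(freqs):
    """Effective number of alleles Ne = 1 / sum(p^2) at one locus."""
    return 1.0 / sum(p * p for p in freqs)


def proportional_allocation(cluster_sizes, intensity=0.20):
    """Assumed size-based criterion: core slots proportional to cluster size.

    Uses integer arithmetic and the largest-remainder rule so that the
    allocations sum exactly to round(intensity * total).
    """
    total = sum(cluster_sizes)
    target = round(intensity * total)
    # Integer part and remainder of the exact quota target * n / total.
    alloc = [(target * n) // total for n in cluster_sizes]
    remainders = [(target * n) % total for n in cluster_sizes]
    leftover = target - sum(alloc)
    # Give the remaining slots to the clusters with the largest remainders.
    order = sorted(range(len(cluster_sizes)),
                   key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def stratified_sample(clusters, intensity=0.20, rng=None):
    """Draw a simple random sample without replacement inside each cluster."""
    rng = rng or random.Random()
    alloc = proportional_allocation([len(c) for c in clusters], intensity)
    core = []
    for members, k in zip(clusters, alloc):
        core.extend(rng.sample(members, k))
    return core
```

Under these assumptions, a collection of 100 accessions already grouped into clusters of 50, 30, and 20 would contribute 10, 6, and 4 accessions to the core. Repeating `stratified_sample` with 20 different seeds would mimic the 20 independent samples per strategy described above.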