Consensus Genetic Maps

Because constructing a genetic map is an involved process, scientists have traditionally created a single map for each organism. However, a marker can be mapped only in a mapping population that is polymorphic for that marker [21], and not every population is polymorphic for all markers of interest. For this reason, scientists have recently begun constructing multiple maps for a single organism, each from a different mapping population.

In response, we have developed methods for finding a consensus among these multiple maps [1], [2]. Our work was inspired by research on finding consensus rankings, a problem traditionally studied in the context of social choice [9] and, more recently, in ranking web query results [8]. An overview of our results follows:

  1. We have modeled a genetic map as a partial order on its markers. For two markers u and v, we write u < v to denote that u precedes v, and u||v to denote that u and v are unrelated.
  2. We have defined the weighted symmetric difference distance, a generalization of both the symmetric difference distance [16] and the Kemeny distance [18].
  3. We have developed a method for finding a consensus and proved that it finds a median partial order under our distance metric, that is, a partial order minimizing the total distance to the input maps. Finding such a consensus is NP-hard [16], but an exact solution can often be found. Our solution makes use of the following concepts:
    • Transitive reduction and closure in graphs [7], [13]
    • Strongly connected components [26]
    • All pairs shortest paths [12]
    • Cycle enumeration [17]
    • Translation of minimum feedback arc set to set cover
  4. We have shown that the median satisfies the following properties:
    • Unanimity Criterion: If u < v in all inputs, then u < v in the median.
    • Positive Responsiveness Criterion: If u||v in the median, then changing v < u to u < v, or u||v to u < v, in one of the inputs results in u < v in the median.
    • Extended Condorcet Criterion: If u < v in a majority of inputs, and there is no w such that w < u and v < w in a majority of inputs, then u < v in the median.
  5. We validated our method using six genetic maps generated for the crop plant Zea mays (maize) and verified the results with a separate wet-lab process, which showed the resulting consensus map to be 99.5 percent accurate.
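The partial-order model in item 1 can be sketched as follows. This is a minimal illustration, assuming markers are plain strings and a map is stored as the set of all ordered pairs (u, v) with u < v; that representation and the helper names `is_strict_partial_order` and `relation` are hypothetical choices for this sketch, not necessarily those used in [1], [2].

```python
from itertools import product

# A genetic map as a strict partial order: the set of pairs (u, v) with u < v.
Order = set[tuple[str, str]]

def is_strict_partial_order(order: Order) -> bool:
    """True if the relation is irreflexive and transitive (asymmetry then follows)."""
    if any(u == v for u, v in order):
        return False
    # Whenever a < b and b < d, the pair a < d must also be present.
    return all((a, d) in order
               for (a, b), (c, d) in product(order, repeat=2) if b == c)

def relation(order: Order, u: str, v: str) -> str:
    """'<' if u precedes v, '>' if v precedes u, '||' if they are unrelated."""
    if (u, v) in order:
        return "<"
    if (v, u) in order:
        return ">"
    return "||"

# Markers A < B < C, with D placed after A but unrelated to B and C.
genetic_map = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")}
print(is_strict_partial_order(genetic_map))   # True
print(relation(genetic_map, "B", "D"))        # ||
```

Storing every implied pair makes each relation query a constant-time set lookup, at the cost of space quadratic in the number of markers.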
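The distance in item 2 can be sketched for maps stored as sets of ordered pairs (u, v) with u < v. The plain symmetric difference distance counts the pairs present in exactly one of two orders. The exact weighting of the weighted symmetric difference distance is not given in this overview, so the `weight` argument below is a hypothetical placeholder defaulting to 1 per pair, and `sym_diff_distance` and `total_distance` are illustrative names. `total_distance` sums the distance from a candidate to every input map, which is the quantity a median partial order minimizes.

```python
from typing import Callable, Iterable

# A partial order on markers: the set of pairs (u, v) with u < v.
Order = set[tuple[str, str]]
Weight = Callable[[str, str], float]

def unit_weight(u: str, v: str) -> float:
    return 1.0

def sym_diff_distance(p: Order, q: Order, weight: Weight = unit_weight) -> float:
    """Sum of weights over pairs that appear in exactly one of p and q.

    With unit_weight this is the plain symmetric difference distance |p ^ q|.
    The weight argument is a hypothetical placeholder, not the published
    weighted definition.
    """
    return sum(weight(u, v) for u, v in p ^ q)

def total_distance(candidate: Order, inputs: Iterable[Order],
                   weight: Weight = unit_weight) -> float:
    """Objective a median minimizes: summed distance from candidate to all inputs."""
    return sum(sym_diff_distance(candidate, m, weight) for m in inputs)

p = {("A", "B"), ("B", "C"), ("A", "C")}   # A < B < C
q = {("B", "A"), ("B", "C"), ("A", "C")}   # B < A < C
r = {("B", "C")}                           # only B < C; A is unrelated
print(sym_diff_distance(p, q))             # 2.0
print(total_distance(p, [p, q, r]))        # 4.0
```

For two total orders, every pair of markers placed in opposite order contributes two ordered pairs to the symmetric difference, whereas a pair ordered in one map and unrelated in the other contributes one.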
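Transitive closure and reduction (item 3) convert between two views of the same partial order: the closure lists every implied precedence, while the reduction keeps only the edges not implied by others. A minimal sketch, assuming the input relation is acyclic (the reduction below is only valid in that case); the function names are hypothetical.

```python
from collections import defaultdict

def transitive_closure(edges):
    """All pairs (u, v) such that v is reachable from u, via one DFS per node."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    closure = set()
    for start in list(succ):
        seen = set()
        stack = list(succ[start])
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            closure.add((start, x))
            stack.extend(succ.get(x, ()))
    return closure

def transitive_reduction(edges):
    """Hasse diagram of an ACYCLIC relation: closure pairs with no intermediate w."""
    closure = transitive_closure(edges)
    reach = defaultdict(set)
    for u, v in closure:
        reach[u].add(v)
    # Drop (u, v) whenever some w satisfies u < w < v.
    return {(u, v) for u, v in closure
            if not any((w, v) in closure for w in reach[u] if w != v)}

edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
print(sorted(transitive_closure(edges)))
print(sorted(transitive_reduction(edges)))   # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

For an acyclic relation the reduction is unique, so it is a compact, canonical way to display a consensus order.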
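Strongly connected components (item 3) are useful here because every cycle of conflicting precedences lies entirely within one component, so conflicts can be examined one component at a time. The sketch below is Tarjan's recursive algorithm written from its standard description, not taken from our implementation. The function name is hypothetical, and Python's recursion limit restricts it to modest graph sizes.

```python
from collections import defaultdict

def strongly_connected_components(edges):
    """Tarjan's algorithm: return the SCCs of a directed graph as a list of sets."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    nodes = sorted({x for edge in edges for x in edge})
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)        # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:                # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return components

edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(strongly_connected_components(edges))   # [{'D'}, {'A', 'B', 'C'}] (set order may vary)
```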
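All-pairs shortest paths (item 3) can be computed with the Floyd-Warshall algorithm, sketched below on a dictionary-of-dictionaries distance table. This overview does not say how shortest paths enter our method; as a hypothetical illustration only, the example computes the length of the shortest cycle through an arc (u, v), which equals w(u, v) + dist(v, u). The function name and the assumption that there are no negative-weight cycles are illustrative.

```python
import math

def floyd_warshall(nodes, weighted_edges):
    """All-pairs shortest path lengths; assumes no negative-weight cycles."""
    dist = {u: {v: 0.0 if u == v else math.inf for v in nodes} for u in nodes}
    for u, v, w in weighted_edges:
        dist[u][v] = min(dist[u][v], w)
    for k in nodes:                      # allow k as an intermediate node
        for i in nodes:
            if dist[i][k] == math.inf:
                continue
            for j in nodes:
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist

nodes = ["A", "B", "C"]
edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "A", 1.0), ("A", "C", 1.0)]
dist = floyd_warshall(nodes, edges)
# Shortest cycle through arc (A, B): w(A, B) + dist[B][A]
print(1.0 + dist["B"]["A"])   # 3.0
```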
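Cycle enumeration (item 3) lists every elementary cycle of a directed graph. The sketch below is a plain backtracking search, not a specialized method such as Johnson's algorithm, and its worst-case running time is exponential in the size of the graph. It reports each cycle exactly once by starting from the cycle's smallest node, so it assumes node labels are mutually comparable (strings here); `simple_cycles` is a hypothetical name.

```python
from collections import defaultdict

def simple_cycles(edges):
    """Enumerate each elementary cycle once, as a tuple starting at its least node."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    nodes = sorted({x for edge in edges for x in edge})
    cycles = []

    def extend(start, path, on_path):
        for w in succ[path[-1]]:
            if w == start:                       # closed a cycle back to start
                cycles.append(tuple(path))
            elif w > start and w not in on_path:  # only visit nodes larger than start
                on_path.add(w)
                path.append(w)
                extend(start, path, on_path)
                path.pop()
                on_path.discard(w)

    for s in nodes:
        extend(s, [s], {s})
    return cycles

edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
print(simple_cycles(edges))   # [('A', 'B'), ('A', 'B', 'C')]
```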
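One way to read the translation of minimum feedback arc set to set cover (item 3): every cycle is an element to be covered, and every arc is a set containing the cycles it lies on. Removing arcs that cover all elementary cycles leaves the graph acyclic, so a minimum cover is a minimum feedback arc set. The sketch below solves the cover exactly by trying subsets in increasing size, which is exponential and only suitable for small instances. It assumes cycles are given as node tuples and uses unit arc costs; weighting arcs (for example, by their support among the input maps) would be a hypothetical extension. The function names are illustrative.

```python
from itertools import combinations

Arc = tuple[str, str]

def cycle_arcs(cycle: tuple[str, ...]) -> set[Arc]:
    """Arcs of a cycle given as a node tuple, e.g. ('A', 'B', 'C') -> AB, BC, CA."""
    n = len(cycle)
    return {(cycle[i], cycle[(i + 1) % n]) for i in range(n)}

def min_feedback_arc_set(cycles: list[tuple[str, ...]]) -> set[Arc]:
    """Exact minimum set cover in which arcs are the sets and cycles the elements."""
    covers: dict[Arc, set[int]] = {}
    for idx, cycle in enumerate(cycles):
        for arc in cycle_arcs(cycle):
            covers.setdefault(arc, set()).add(idx)
    universe = set(range(len(cycles)))
    arcs = sorted(covers)
    for k in range(len(arcs) + 1):            # try smaller covers first
        for chosen in combinations(arcs, k):
            covered = set().union(*(covers[a] for a in chosen))
            if covered == universe:
                return set(chosen)
    return set()  # unreachable: choosing every arc covers every cycle

cycles = [("A", "B"), ("A", "B", "C"), ("C", "D")]
print(sorted(min_feedback_arc_set(cycles)))   # [('A', 'B'), ('C', 'D')]
```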
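The Unanimity and Extended Condorcet criteria in item 4 can be checked on small examples. The sketch below, using hypothetical helper names, computes the pairs each criterion's premise identifies from the inputs and tests whether a candidate median contains them. It reads "w < u and v < w in a majority of inputs" as both pairs individually being majority pairs, an interpretation assumed for this sketch. The Positive Responsiveness Criterion is not checked, since it compares medians before and after an input changes.

```python
from collections import Counter

# Each map is the set of pairs (u, v) with u < v.
Order = set[tuple[str, str]]

def unanimous_pairs(inputs: list[Order]) -> Order:
    """Pairs u < v present in every input (premise of the Unanimity Criterion)."""
    return set.intersection(*inputs) if inputs else set()

def majority_pairs(inputs: list[Order]) -> Order:
    """Pairs u < v present in more than half of the inputs."""
    counts = Counter(pair for order in inputs for pair in order)
    return {pair for pair, c in counts.items() if 2 * c > len(inputs)}

def condorcet_pairs(inputs: list[Order]) -> Order:
    """Majority pairs (u, v) with no w such that (w, u) and (v, w) are also majority pairs."""
    maj = majority_pairs(inputs)
    nodes = {x for pair in maj for x in pair}
    return {(u, v) for u, v in maj
            if not any((w, u) in maj and (v, w) in maj for w in nodes)}

def satisfies(median: Order, required: Order) -> bool:
    """True if the candidate median contains every required pair."""
    return required <= median

ordered = [{("A", "B"), ("B", "C"), ("A", "C")},
           {("A", "B"), ("B", "C"), ("A", "C")},
           {("B", "A"), ("B", "C"), ("A", "C")}]
cyclic = [{("A", "B"), ("B", "C"), ("A", "C")},   # A < B < C
          {("B", "C"), ("C", "A"), ("B", "A")},   # B < C < A
          {("C", "A"), ("A", "B"), ("C", "B")}]   # C < A < B
print(sorted(condorcet_pairs(ordered)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(condorcet_pairs(cyclic))           # set()
```

In the cyclic example every majority pair is part of a majority cycle, so the Extended Condorcet Criterion forces no pair into the median.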