Consensus Genetic Maps
Because constructing a genetic map is an involved
process, scientists have traditionally created a single map
for each organism. However, a mapping population must
be polymorphic in a particular marker in order for that
marker to be mapped [21], and not every population can
be polymorphic in all markers of interest. For this reason,
scientists have recently begun making multiple maps for
a single organism.
In response, we have developed methods for finding a
consensus from these multiple maps [1], [2]. Our work
was inspired by research on finding consensus rankings,
a problem traditionally studied in the context of social choice [9]
and more recently in ranking web query results [8]. An
overview of our results follows:
- We have modeled the genetic map as a partial
order. For two markers u and v, we use the notation
u < v to denote that u precedes v and u||v to denote
that u and v are not related.
- We have defined the weighted symmetric difference
distance, a generalization of the
symmetric difference distance [16] and the Kemeny
distance [18].
- We have developed a method for finding a consensus
and proved that our method finds a median
partial order under our distance metric. The problem
of finding the consensus is NP-hard [16], but
an exact solution can often be found. Our solution
makes use of the following concepts:
  - Transitive reduction and closure in graphs [7],
    [13]
  - Strongly connected components [26]
  - All pairs shortest paths [12]
  - Cycle enumeration [17]
  - Translation of minimum feedback arc set to set
    cover
- We have shown that the median obeys certain
interesting properties:
  - Unanimity Criterion: If u < v in all inputs,
    then u < v in the median.
  - Positive Responsiveness Criterion: If u||v in
    the median, then changing v < u to u < v or
    u||v to u < v in some input results in u < v
    in the median.
  - Extended Condorcet Criterion: If u < v in
    a majority of inputs, and there is no
    w such that w < u and v < w in a
    majority of inputs, then u < v in the median.
- We validated our method using six
genetic maps generated for the crop plant Zea
mays and verified the results using a separate wet-lab
process, showing the resulting consensus map
to be 99.5 percent accurate.
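
The partial-order model and the unanimity criterion listed above can be illustrated with a small sketch. It is not our consensus algorithm. It assumes each map is given as a set of (u, v) pairs meaning u < v, and it uses the plain, unweighted symmetric difference distance, since the weights of the weighted version are not specified in this overview. All function names and the marker names m1, m2, m3 are hypothetical.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (u, v) pairs meaning u < v."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def relation(order, u, v):
    """Classify two markers in a transitively closed partial order."""
    if (u, v) in order:
        return "u < v"
    if (v, u) in order:
        return "v < u"
    return "u || v"  # the markers are not related


def symmetric_difference_distance(p, q):
    """Unweighted distance (assumption): count ordered pairs in exactly one order.

    Under this convention a reversed pair contributes 2 and a pair ordered
    in only one of the two maps contributes 1.
    """
    return len(p ^ q)


def unanimous_pairs(orders):
    """Pairs u < v that hold in every input order."""
    return set.intersection(*orders)


# Two hypothetical maps over markers m1, m2, m3.
map_a = transitive_closure({("m1", "m2"), ("m2", "m3")})
map_b = transitive_closure({("m1", "m3"), ("m2", "m3")})

print(relation(map_b, "m1", "m2"))
print(symmetric_difference_distance(map_a, map_b))
print(sorted(unanimous_pairs([map_a, map_b])))
```

In this example map_b leaves m1 and m2 unrelated, so the two maps differ only in the pair (m1, m2). By the unanimity criterion, any median of these two inputs must contain the pairs returned by unanimous_pairs.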