# Results for solving Zebra and Sherlock

## Results

### Hypotheses 1 and 2

The hypothesis that achieving arc consistency using AC-3 is sufficient to solve the Zebra problem can be quickly disproved using the benchmark Zebra problem presented earlier in this paper: once arc consistency has been achieved, several variables still have more than one possible value in their domains. Similar results can be seen for the Sherlock problem by applying AC-3 to a random case generated by the GenerateRandomProblem() procedure of the preceding section.

One such sample problem appears in Appendix A.

Kumar [Kumar92] reports that arc consistency is not, in general, sufficient to find a solution (or to determine that there is no solution). He then shows that a constraint graph with n nodes can be solved using arc consistency alone if the graph is n-consistent, and that it is sometimes possible to solve an n-node bcsp with less than n-consistency.
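Kumar's observation is easy to reproduce on a toy problem. The sketch below is a minimal AC-3 (illustrative only, not the implementation benchmarked in this paper) applied to two-colouring a triangle of inequality constraints: every arc is consistent and AC-3 prunes nothing, yet every domain still holds two values and the problem in fact has no solution.

```python
from collections import deque

def ac3(domains, constraints):
    """Prune `domains` to arc consistency.

    domains: dict mapping variable -> set of values (mutated in place)
    constraints: dict mapping directed arc (x, y) -> binary predicate
    Returns False on a domain wipe-out, True otherwise."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # Drop values of x that have no supporting value in y's domain.
        removed = {vx for vx in domains[x]
                   if not any(pred(vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False        # empty domain: no solution
            # Re-examine every arc that points into x.
            queue.extend(arc for arc in constraints if arc[1] == x)
    return True

# Two-colouring a triangle: arc consistent, yet unsolvable.
ne = lambda a, b: a != b
doms = {v: {0, 1} for v in "XYZ"}
arcs = {(a, b): ne for a in "XYZ" for b in "XYZ" if a != b}
ac3(doms, arcs)                     # succeeds, but prunes nothing
```

After the call every domain is still {0, 1}, so arc consistency alone cannot decide this three-variable problem.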

### Hypotheses 3 and 4

Given that arc consistency by itself cannot solve all binary constraint satisfaction problems, it follows that an algorithm such as BM-CBJ2, which can, is the better general-purpose choice. Having said that, comparing the performance of AC-3 to that of BM-CBJ2 on problems that AC-3 can solve is still an interesting exercise. Therefore, using problems generated by GenerateStrongKProblem(), we obtain the following results over 1000 test cases for each of the Zebra and Sherlock problems.

Measurement | AC-3 | BM-CBJ2
---|---|---
Fewer Checks (number of cases) | 25 | 975
Less Time (number of cases) | 277 | 710
Average Number of Checks | 23410 | 3755
Minimum Number of Checks | 9876 | 137
Maximum Number of Checks | 42086 | 134394
Average Time (CPU + System) (seconds) | 1.77 | 1.72
Minimum Time (CPU + System) (seconds) | 1.57 | 1.39
Maximum Time (CPU + System) (seconds) | 1.98 | 5.24

AC-3 vs. BM-CBJ2: Zebra

Measurement | AC-3 | BM-CBJ2
---|---|---
Fewer Checks (number of cases) | 124 | 876
Less Time (number of cases) | 417 | 573
Average Number of Checks | 89090 | 139170
Minimum Number of Checks | 33529 | 294
Maximum Number of Checks | 164391 | 61250399
Average Time (CPU + System) (seconds) | 2.16 | 4.67
Minimum Time (CPU + System) (seconds) | 1.83 | 1.50
Maximum Time (CPU + System) (seconds) | 2.56 | 1145.12

AC-3 vs. BM-CBJ2: Sherlock

What is most interesting about these results is that in a small number of cases BM-CBJ2 is *much* worse than AC-3, so much so that for Sherlock most of the performance measures look worse for BM-CBJ2 than for AC-3, despite the fact that BM-CBJ2 was better more than half the time. Exploring the properties of the problems that cause such bad performance is a useful future exercise.
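The "number of checks" rows in the tables above count evaluations of binary constraint predicates. One simple way to collect that statistic (a hypothetical instrumentation sketch, not the harness actually used for these experiments) is to wrap each constraint in a counting object that any solver calls transparently:

```python
class CountingConstraint:
    """Wrap a binary constraint predicate and tally every evaluation."""

    def __init__(self, pred):
        self.pred = pred
        self.checks = 0

    def __call__(self, a, b):
        self.checks += 1           # one constraint check
        return self.pred(a, b)

# The solver is unaware of the wrapper; it just calls the predicate.
ne = CountingConstraint(lambda a, b: a != b)
ne(1, 2)
ne(2, 2)
print(ne.checks)                   # 2
```

Summing `checks` over all wrapped constraints at the end of a run yields the per-case totals reported above.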

### Hypothesis 5

The hypothesis that BM-CBJ2 is more efficient at solving most of a large random sample of Zebra problems is true, as can be seen below. Further, the algorithms tested can be ordered, both in terms of constraint checks and in terms of time, as follows: BM-CBJ2 < DoubleX < Xover < Mutate, although DoubleX is only slightly better than Xover. The change to Xover that yields DoubleX is not a clear winner, but crossover plus mutation is without a doubt better than mutation alone. The numerical results follow.

Measurement | BM-CBJ2 | Mutate | Xover | DoubleX
---|---|---|---|---
Average Number of Checks | 3554 | 12588855 | 3897429 | 3315161
Minimum Number of Checks | 412 | 541110 | 379620 | 499423
Maximum Number of Checks | 15093 | 132081622 | 323868611 | 212341720
Average Time (CPU + System) (seconds) | 2.17 | 62.65 | 22.00 | 19.30
Minimum Time (CPU + System) (seconds) | 1.82 | 5.44 | 4.66 | 5.14
Maximum Time (CPU + System) (seconds) | 6.49 | 686.84 | 1603.49 | 986.12

Zebra, BM-CBJ2 vs Mutate vs Xover vs DoubleX
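The chromosomes in these genetic algorithms are simply candidate assignments, one gene per variable. The sketch below shows plausible minimal forms of mutation and one-point crossover; the function names and details are illustrative assumptions, since the operators actually benchmarked (including DoubleX) are defined earlier in this paper.

```python
import random

def mutate(chrom, domains, rate=0.1):
    """With probability `rate`, reassign each gene to a random value
    drawn from that variable's domain."""
    return [random.choice(sorted(domains[i])) if random.random() < rate else g
            for i, g in enumerate(chrom)]

def xover(p1, p2):
    """One-point crossover: swap the tails of two parent assignments."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

# Crossover rearranges existing genes but never introduces new values,
# which is one reason it benefits from being paired with mutation.
c1, c2 = xover([1, 1, 1, 1], [2, 2, 2, 2])
```

Note that, whatever the cut point, the two children of `xover` together contain exactly the genes of the two parents.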

## Conclusions and Future Work

This paper studied a number of hypotheses and, in the process, developed some algorithms. The algorithms for generating random problems proved harder to develop than expected. Initial attempts used a heuristic rather than solving the bcsp for each newly added constraint, but that failed, so the current solution (using AC-3 for strongly k-consistent generation and CBJ-Multi for ‘totally random’ problems) was developed.

The genetic algorithm development proved easier, although no genetic algorithm that bested BM-CBJ2 was found in producing this paper (a possible route forward is to make the chromosomes more complex, rather than simply being a candidate solution to the bcsp). BM-CBJ2 proved to be the best complete (i.e. works for all problems) algorithm, as AC-3 cannot always find a solution, but for strongly k-consistent problems AC-3 was found to be the better choice for both the Zebra and the Sherlock problems. If determining k-consistency before solving a problem proves to be both possible and efficient, a hybrid approach could prove ideal.

Further work on random problem generation is needed. It would, for example, be useful to test the ‘totally random’ problems to determine whether they are a good cross-section of real-world problems or whether they cluster around a particular type of problem. In addition, it could be informative to determine how many of a good random selection of problems are strongly k-consistent (and therefore solvable by AC-3).

Additional questions include the value of AC-3 preprocessing, which according to [Kumar92] is common practice, and how BM-CBJ2 stacks up against other backtracking algorithms ([Kondrak97] proves that BM-CBJ2 performs fewer constraint checks, but whether it is faster depends on whether the reduction in constraint checking comes at a greater overhead cost). BM-CBJ2 also needs to be compared to non-backtracking algorithms. Finally, this paper makes no attempt to compare variable instantiation orders for BM-CBJ2, and it is known that instantiation order can have a significant impact on the performance of backtracking and backjumping algorithms [Prosser93].
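One natural starting point for the instantiation-order question is a minimum-remaining-values ordering, which instantiates the most constrained variable first. The sketch below illustrates the generic idea only; it is not an ordering that BM-CBJ2 is claimed to use in this paper.

```python
def mrv_order(domains):
    """Order variables smallest-domain-first (minimum remaining values),
    a common dynamic ordering heuristic for backtracking search."""
    return sorted(domains, key=lambda v: len(domains[v]))

mrv_order({"A": {1, 2, 3}, "B": {7}, "C": {4, 5}})   # ['B', 'C', 'A']
```

Re-running the benchmarks above under orderings such as this one would show how much of BM-CBJ2's performance is attributable to the ordering rather than to backjumping itself.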