Linear Ordering in the SAT Encoding of the All-Different Constraint over Bit-Vectors

By Pavel Surynek
Charles University in Prague, Faculty of Mathematics and Physics

Abstract.

A novel eager encoding of the ALLDIFFERENT constraint over bit-vectors is presented in this short paper. It is based on 1‑to‑1 mapping of the input bit-vectors to a linearly ordered set of auxiliary bit-vectors. Experiments with four SAT solvers showed that the new encoding could be solved order of magnitudes faster than the standard encoding in a hard unsatisfiable case.

1      INTRODUCTION AND MOTIVATION

Models of many real-life problems require a subset of modeling variables to be pair-wise distinct. This requirement is known as an ALLDIFFERENT constraint [4] in the constraint programming context. As the SAT solving technology [1], [3], [5] is becoming a tool of choice in many practical applications, efficient manipulation with the ALLDIFFERENT constraint in SAT solvers is of interest. Unlike other works on translating the ALLDIFFERENT constraint into SAT that use direct encoding of variable’s domains [4], we study how to encode the constraint over the set of bit-vectors, which essentially use binary encoding. We present a new eager encoding that maps the given set of bit-vectors to a linearly ordered set of auxiliary bit-vectors. We show that the new encoding is more efficient for hard unsatisfiable cases of the constraint on which SAT solvers struggle with the existent encoding for bit-vectors [2].

2      BACKGROUND – STANDARD MODEL

Suppose to have a set of bit-vectors  {B1, … , Bn } each of length l. Bit-vectors are interpreted as non-negative integers. The ALLDIFFERENT constraint over  B1, … , Bn – denoted as ALLDIFFERENT(B1, … , Bn ) – requires that numbers represented by the bit-vectors are all distinct.

The standard encoding [2] basically follows the scheme where pair-wise differences are encoded:

Trivially, it is possible to encode the individual inequalities as follows. Let the i-th bit of the k-th bit-vector with i in {1,…,l} and k in {1,…,n} be denoted as Bki

However, if unfolded into the CNF representation though the distributive rule it results into too many clauses which is impractical. Therefore encoding using auxiliary propositional variables is used. It follows the standard technique of Tseitin’s hierarchical encoding. A fresh propositional variable is introduced for each inequality between individual bits of the involved bit-vectors. That is, there is a new variable aii,j for every i,j in {1,…,n} with i<j and I in {1,…,l} (remark by Editors: boldface i and normal i refer to different counters, the difference is clear in the original, word, submission-see the PDF in case of doubt). The auxiliary variable indicates if the corresponding bits in the inequality between bit-vectors differ or not. Thus, the following clauses are included to express this interpretation:

Bit-vectors  Bi and Bj differ if they differ in at least one position; that is, the following clauses should be included:

Notice that auxiliary variables are linked to the original bits only in one direction. If aii,j is set to TRUE then Bii and Bij  are forced to differ. However, if   aii,j is set to FALSE then Bii and Bij are left unconstrained.

Proposition 1 (STANDARD ENCODING SIZE).  The standard encoding of the ALLDIFFERENT constraint requires l n propositonal variables to represent the bit-vectors and l n (n+1) / 2 auxiliary propositional variables; that is O(l n2) variables altogether. The number of clauses is  1 + l n (n+1), ; that is, that is O(l n2). ■

Figure 1: Illustration of the standard and the bijection ALLDIFFERENT encodings. In the bijection encoding, a 1-to-1 mapping of the bit-vectors is found first. Then the values of bit-vectors are forced to be linearly ordered according to their position in the mapping.

3      ALTERNATIVE BIJECTION ENCODING

We observed that a SAT solver struggles over the standard encoding especially in the unsatisfiable case according to our preliminary experiments. Therefore we developed an alternative encoding that is more suitable for this case. It maps the original bit-vectors to a linearly ordered set of auxiliary bit-vectors. First, a 1-to-1 mapping (bijection) between sets of bit-vectors needs to be modeled to enable this encoding style (see Figure 1 for illustration).

Let the new linearly ordered bit-vectors be denoted as A1, … , An . Additionally bit‑vectors α1,..,αn and β1,…,βn  of size ⌈lg2 n⌉ are introduced to model the bijection. The bit-vector αk indicates what  Ai  with i in {1,…,m} the original Bk  will be mapped to. Bit-vectors βj are used to enforce that at most one original bit-vector is mapped to a single ordered bit-vector. The following integer constraints are to establish the bijection:

It is crucial that domains of bit-vectors   αand βj  consists of exactly n values to ensure that the bijection is modeled correctly (extra values are forbidden). The individual integer implication is encoded with a single auxiliary propositional variable ekj  as follows:

where in the last line integer constraints enforcing the ordering are introduced.

The individual inequality is encoded as a strict lexicographic ordering over the two bit-vectors. Now, l fresh propositional variables are introduced to indicate the first bit where Ai is less than Aj. The ordering itself then just means that there exists such a first bit where bit-vectors differ:

Proposition 2 (BIJECTION ENCODING SIZE).  The bijection encoding requires 2l n  propositional variables to represent the bit-vectors, 2 n ⌈lg2 n⌉ variables to represent the bijection, and n2 + l  auxiliary propositional variables; that is  propositional variables altogether. The number of clauses is n2 ( 1 + l + ⌈lg2 n⌉) + (n-1)l(l+1); that is, O(n2 max{⌈lg2 n⌉,l) + n l2). ■

Table 1. Comparison of sizes of the standard and the bijection encoding.

 #bit-vectors (16-bits) Standard Bijection #Variables #Clauses #Variables #Clauses 64 67584 133056 9968 176943 128 266240 536448 28400 690031 256 1056768 2154240 90096 2756591

4      EXPERIMENTAL EVALUATION

As shown in Table 1, the bijection encoding has fewer variables while the number of clauses is slightly higher than in the standard encoding. Nevertheless, we also need runtime comparison. A setup where a transition-phase behavior was observed is presented. We used 32 bit-vectors consisting of 6 bits. Additionally, there was a lower bound and an upper bound per each bit‑vector.

If d ≤ 34 is a given domain size, then the lower bound  bkL  and the upper bound  bk for the bit-vector   Bk were generated randomly as follows: bkL was selected uniformly from [0..34-d] and  bwas set to bkU+d. Thus,   bkL  ≤ Bk bkis enforced for each k. Finally, a single ALLDIFFERENT over 32 bit-vectors was added.

Four SAT solvers were used in the evaluation: MINISAT [3], GLUCOSE [1], and CRYPTOMINISAT [5]. The runtime was measured for different domain sizes  ranging from  to  – Figure 2. For small  unsatisfiability could be checked easily; for large  the same could be done for satisfiability. The most interesting behavior occurred around d=13 which represent difficult cases.

None of the tested SAT solvers was able to solve all the instances over the standard encoding in the time limit of 1 hour (wall clock limit per instance). The best performing over the standard encoding was GLUCOSE, which solved 29 instances out of 33 and was also the fastest. Over the bijection encoding, MINISAT and CRYPTOMINISAT solved all the instances and very importantly, the runtime of CRYPTOMINISAT was always below 2 seconds. GLUCOSE also performed relatively well compared to the standard encoding with 30 solved instances.

Generally, the standard encoding can be solved faster in the satisfiable case. However, the bijection encoding is significantly better in the hard unsatisfiable case. This is because it can be checked more easily for this encoding if there are enough values in domains of bit-vectors to establish the required pair-wise difference (at least by some SAT solvers). A single linearly ordered set of bit-vectors is matched into the domains while in case of the standard encoding all the orderings (permutations) of the original bit-vectors may be checked.

Figure 2. Instances are sorted according to the increasing runtime.

5      CONCLUSION

A new encoding for the ALLDIFFERENT constraint over bit-vectors based on 1-to-1 mapping has been proposed. It has fewer variables and it is more efficient in difficult unsatisfiable cases than the existent encoding [2] that uses pair-wise differences. In the future work, it would be also interesting to investigate how the presented eager encodings performs with respect to the strong ALLDIFFERENT propagators [4] integrated with the solver lazily via the SMT framework and also how it performs in applications.

REFERENCES

[1]      G. Audemard, L. Simon, ‘Predicting Learnt Clauses Quality in Modern SAT Solver’, Proceedings of IJCAI 2009, (2009).

[2]      A. Biere, R. Brummayer, ‘Consistency Checking of All Different Constraints over Bit-Vectors within a SAT Solver’, Proceedings of FMCAD 2008, 1-4, (2008).

[3]      N. Eén, N. Sörensson, ‘An Extensible SAT-solver’, Proceedings of SAT 2003, 502-518, (2003).

[4]      P.  Nightingale, I. Gent, ‘A New Encoding of AllDifferent into SAT’, CP 2004 Workshop on Modelling and Reformulating CSPs, (2004).

[5]      J.-C. Régin, ‘A Filtering Algorithm for Constraints of Difference in CSPs’, Proceedings of AAAI 1994, 362-367, (1994).

[6]      M. Soos, K. Nohl, C. Castelluccia, ‘Extending SAT Solvers to Cryptographic Problems’, Proceedings of SAT 2009, 244-257, (2009).