Let B be a collection of regions with added pieces. That is,
each B Î B is a region BR of size k to be tiled, together
with a set BP of n pieces from the 209 (of course n ³ k).
For B Î B, let A(B) be the collection of all subsets of BP
of size k, and let the solution set S(B) Ì A(B) be the
collection of choices of k pieces which will tile the region. Let's
pretend that each S Î S(B) can only tile the region in one way
(although it may be that due to a small symmetric subregion in the
solution there are `duplicates').
We will assume that each piece has a `goodness' pi such that the
probability of being able to tile a region of size k is proportional
to the product of these numbers. Throughout, li = log(pi).
Let F be a set of boundary features and for j Î F, cj(R) be
the value of the jth boundary feature of the region R. For
example, we used c0(R) = the number of corners (changes in
direction) in the boundary of R, c1(R) = total length of bad edge
in R in units of altitudes, and c2(R) = the total length of good
edge in R in units of half-base of equilateral triangle. (`bad
edge' is our name for edge which consists of altitudes of the
equilateral triangles, and `good edge' our name for the other
sort). We shall assume there are numbers mj such that the
probability of being able to tile a region R is proportional to
|
exp |
æ è
|
å
j Î F
|
mjcj(R) |
ö ø
|
. |
|
Actually the above assumptions are not really assumptions since you
can always declare the probability of being able to tile something is
such-and-such. The above `assumptions' are really just defining the
information on which our `distinguisher' is allowed to depend.
Using the above definitions the probability of being able to tile a region R
with a set of pieces X is according to our model
|
P(R,X) = exp |
æ è
|
å
i Î X
|
li+ |
å
j Î F
|
mj cj(R) |
ö ø
|
. |
| (1) |
We need to choose li and mj to make P(R,X) high when X can
tile R and low when it can't.
The likelihood of the model given the data is
|
|
Õ
B Î B
|
|
é ë
|
Õ
S Î S(B)
|
P(BR,S) |
Õ
X Î A(B)\S(B)
|
(1-P(BR,X)) |
ù û
|
. |
|
We're going to make a few modifications (described below) to this
formula, and after that we'll choose li and mj to maximise the
(new) likelihood. [By the way S = the negative logarithm of the likelihood
has another interpretation as `conditional entropy', or residual
uncertainty. Imagine trying to explain (or compress) the raw binary
sequence of 0,1 formed by taking all pairs (R,X) (in some random
order) and writing 0 if the pieces X cannot tile the region R
and writing 1 if they can. If there are N digits altogether in
this 0,1 sequence and proportion p of them are 1 and you
compressed it without knowing anything about tiling you would probably
reduce it to length S0 = -N(plog(p)+(1-p)log(1-p)). This is what
you would get for S if you didn't allow P(R,X) to depend on anything,
in which case you would choose P(R,X) to be equal to the constant
p. At the other extreme we could allow our probabilities to depend
on everything and we would get P(R,X) = 1 when the set of pieces X
can tile R and P(R,X) = 0 when it can't. Then the residual
uncertainty S would be 0. Of course the problem with such a
P(R,X) is that it presumably can only be evaluated by an actual
search. We're trying to find the minimum S given we're allowed to
do some computation of the form (1). We'll end up with S between 0
and S0.]
Now we'll describe the modifications to the above formula, but first let's talk
instead in terms of the more convenient log likelihood, L:
|
L = |
å
B Î B
|
|
é ë
|
å
S Î S(B)
|
log(P(BR,S))+ |
å
X Î A(B)\S(B)
|
log(1-P(BR,X)) |
ù û
|
. |
| (2) |
What sort of size are the numbers? We mainly used k = 24, and n
is likely to be at least 60. Say n = 60. The typical
number of solutions will vary according to how good the pieces and
regions are, but let's aim (very roughly) at a few hundred solutions
per B Î B. There are (60 || 24) » 4×1016 elements
of A(B) (choices of 24 pieces) so P(R,X) will be very small, of the
order 10-14, and so we can replace log(1-P(BR,X)) with -P(BR,X)
and A(B)\S(B) with A(B) in (2).
(Actually this is not really an approximation - we're just changing to a model
where the number of solutions to (R,X) has Poisson distribution with parameter
P(R,X), which happens to be a very similar model because P(R,X) is so small.)
Now the formula for L looks like
|
L = |
å
B Î B
|
|
é ë
|
æ è
|
å
S Î S(B)
|
logP(BR,S) |
ö ø
|
- |
å
X Î A(B)
|
P(BR,X) |
ù û
|
. |
|
Let nB = |S(B)| = the number of solutions in B.
The trouble with using the above L is that nB will vary a lot. Some
B might have nB = 0, others nB = 5 and a few B might have
nB = 100000, because it is hard to accurately control how many
solutions you get when you add a lot of pieces. This L will be
dominated by the terms arising from B for which nB is
particularly large. It will turn out that we may as well not have
bothered exhausting 99.9% of the regions, because the pi and mj
will end up being geared almost entirely towards minimising the terms in
L arising from B for which nB is very large. So most of our
data will be ignored and the model will spend all of its effort trying
to `explain' the B for which nB is very large, and the values it
gets for pi and mj will be pretty useless. When we chose lots of
different B, we were trying to make a uniform sample from the (very
large) space of all possible B. In effect with the above L we are weighting each
sample point according to nB, but this is bad because the results
associated with a given B are not independent. We decided to smooth
this problem out by using a fiddle factor, wB, to weight each B.
wB is defined as
|
wB = |
1 nB
|
|
æ ç ç
ç è
|
1- |
3
|
ö ÷ ÷
÷ ø
|
. |
|
This definition is obviously a bit arbitrary, but it probably doesn't matter
too much. wBnB is chosen to be between 0 and 1 and increasing in nB.
Our final version of L is then
|
L = |
å
B Î B,nB > 0
|
wB |
é ë
|
æ è
|
å
S Î S(B)
|
logP(BR,S) |
ö ø
|
- |
å
X Î A(B)
|
P(BR,X) |
ù û
|
. |
|
This can be rewritten equivalently as
|
L = |
å
i
|
aili+ |
å
j
|
bjmj- |
å
B Î B, nB > 0
|
wBexp |
æ è
|
å
j
|
mjcj(BR) |
ö ø
|
Ak(BP), |
| (3) |
where
|
Ak(X) = |
å
[(Y Ì X) || (|Y| = k)]
|
|
Õ
i Î Y
|
pi, |
|
and the constants ai, bj are defined by
|
ai = |
å
B Î B,nB > 0
|
|
é ë
|
wB |
å
[(S Î S(B)) || (i Î S)]
|
1 |
ù û
|
|
|
and
|
bj = |
å
B Î B,nB > 0
|
|
é ë
|
wB |
å
S Î S(B)
|
cj(BR) |
ù û
|
. |
|
The nice thing about L is that it can be evaluated quickly. The only
problematic term might be Ak(BP), but in fact we can do the
necessary sum over all possible piece subsets even though there might
be, e.g., (60 || 24) » 4×1016 of them. If we really
had to do 1016 work to evaluate L, then everything above would
be useless because we would never know how to choose pi and mj
to maximise L. So we weren't really free to choose any model we
liked - we had to keep in mind the need to actually be able to
calculate the likelihood. To evaluate Ak(X) we list the elements of
X in order and make use of the fact that for i Î X,
|
Ak(X) = piAk-1(X\{i})+Ak(X\{i}). |
|
So if
X = {x1,¼,xn} we can build up to Ak(X) in n stages,
where at the rth stage we keep track of
Ai({x1,¼,xr}) for all relevant i.
Now it is pretty clean optimising (3) over li and mj because it
is smooth and concave (so a local maximum is a global maximum and there
are no places to get stuck), and it is easy to evaluate first and
second derivatives. For example if i Î X,
|
|
¶ ¶li
|
Ak(X) = piAk-1(X\{i}). |
|
By the way, the main problem with broadening this model to include
interaction terms between pieces is that L will no longer be
possible to evaluate, because we'll be faced instead of Ak(X) with
sums like this
|
|
å
[(Y Ì X) || (|Y| = k)]
|
|
Õ
i,j Î Y
|
bi,j, |
|
which I think
are impossible in general. We did experiment with a restricted set of
interaction terms, e.g. the 100 most important pairs of pieces, but
this did not obviously make a great improvement, and it slowed
everything down so we put it aside, intending perhaps to return to it
when we had done other things that seemed more important! There are
other (non maximum likelihood) ways of evaluating the effectiveness of
model parameters, but we weren't sure how good they would be, and
we didn't pursue them. So our final model never included piece-piece
interaction terms, although we do have some fairly good data about the
100 or so most important interactions. piece-boundary interactions,
on the other hand, would be easy to introduce, but we did not get
around to trying this.
File translated from
TEX
by
TTH,
version 2.73.
On 6 Nov 2000, 10:35.