1. Introduction
The problem of pattern generation was first investigated by [1] and then extended by [2] and [3]. The idea of playing patterns against each other was then explored by [4], who considered only the case of p = 1/2 and all m patterns being of the same length, and generalized by [5] to any admissible value of p and patterns of arbitrary lengths. This article summarizes and streamlines these results, pointing out that the existing formulas fail when m becomes large (as symbolic inversion of matrices is no longer feasible) and resolving this problem by deriving (with the help of the Sherman-Morrison identity) formulas which require numerical evaluation only.
The main mathematical tool of this article is a skillful application of generating functions; a generating function of a sequence, say a_0, a_1, a_2, ..., is defined as A(z) = Σ_{n≥0} a_n z^n. It is easy to see that a product of two such generating functions, say A(z) and B(z), corresponds to a new sequence (called a convolution of the two sequences) with terms
a_0·b_0,
a_0·b_1 + a_1·b_0,
a_0·b_2 + a_1·b_1 + a_2·b_0,
a_0·b_3 + a_1·b_2 + a_2·b_1 + a_3·b_0,
etc.
When a sequence p_0, p_1, p_2, ... is defined by p_n = Prob(Z = n), where Z is an integer-valued random variable, the corresponding generating function, say P(z), is called a probability generating function (PGF); to find the expected value of Z, denoted E(Z) or μ, one needs to evaluate P'(z) at z = 1. Similarly, P''(1) yields the second factorial moment of Z, namely E(Z(Z-1)), which can be easily converted into the variance of Z, thus equal to P''(1) + P'(1) - P'(1)^2.
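The article's own code is in Mathematica (not reproduced in this excerpt); as a quick numerical illustration of these definitions, the following self-contained Python sketch (all names are mine) checks the two moment formulas on a geometric random variable, whose PGF P(z) = pz/(1 - qz) is easily differentiated by hand.

```python
from fractions import Fraction

# Z = number of Bernoulli trials up to and including the first S, with
# Prob(S) = p; its PGF is P(z) = p z / (1 - q z), so that, by hand,
#   P'(z)  = p / (1 - q z)^2      ->  P'(1)  = p / (1 - q)^2 = 1/p
#   P''(z) = 2 p q / (1 - q z)^3  ->  P''(1) = 2 q / p^2
p = Fraction(1, 2)
q = 1 - p
mean = p / (1 - q) ** 2                       # P'(1)
second_factorial = 2 * p * q / (1 - q) ** 3   # P''(1)
variance = second_factorial + mean - mean ** 2

# cross-check the derivatives against the series coefficients
# p_n = Prob(Z = n) = q^(n-1) p, truncated at a large N
N = 200
mean_series = sum(n * q ** (n - 1) * p for n in range(1, N))
sf_series = sum(n * (n - 1) * q ** (n - 1) * p for n in range(1, N))
```

For a fair coin this gives a mean of 2 trials and (via the conversion quoted above) a variance of 2, in agreement with the truncated series.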
2. Generating Patterns
We first review the technique for constructing a probability generating function (PGF) of the number of trials to generate one such pattern (of specific consecutive outcomes, such as FFSF) for the first time. This requires introducing the following notation: for any pattern P, let P^(k) be the first k symbols of P and P_(k) be its last k symbols, while Prob(P) is the probability of generating P in L trials, where L is the length of P, e.g. Prob(FFSF) = pq^3, where p and q = 1 - p denote the probabilities of S and F respectively.
For any two patterns, say A and B, K_AB is a set of all positive integers k such that A_(k) ≡ B^(k), where ≡ indicates a perfect match of the corresponding strings; note that this set may be empty. Let k_AB be the largest of these integers (equal to 0 when the set is empty; also: k_AB cannot be bigger than the length of either pattern), and A_(k_AB) ≡ B^(k_AB) be the corresponding overlap string (a string of zero length when k_AB = 0). Thus, for example, when A = FFSF and B = SFFF, we get K_AB = {2} and K_BA = {1, 2}, and k_AB = k_BA = 2, the corresponding overlap strings being SF and FF respectively. Furthermore, the full length of any pattern always belongs to K_PP (every pattern perfectly matches itself), implying that k_PP = L for any pattern. Finally, let A ⊕ B denote the merger of the two patterns, namely A followed by B stripped of its first k_AB symbols. The ⊕ operation is easily performed by the following Mathematica code
(1)
where each pattern needs to be represented by a list of zeros (for every F) and ones (for every S); to find FFSF ⊕ SFFF thus requires typing
(2)
to obtain {0, 0, 1, 0, 0, 0}, which represents FFSFFF.
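The Mathematica listing itself is not reproduced in this excerpt; the following Python equivalent (function names are my own) performs the same ⊕ operation on patterns represented, as above, by lists of zeros and ones.

```python
def biggest_overlap(a, b):
    # largest k such that the last k symbols of a match the first k of b
    # (0 when no such k exists); k cannot exceed either pattern's length
    return max((k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]),
               default=0)

def join(a, b):
    # append b to a, deleting the first biggest_overlap(a, b) symbols of b
    return a + b[biggest_overlap(a, b):]

# patterns as lists: 0 for F, 1 for S
FFSF = [0, 0, 1, 0]
SFFF = [1, 0, 0, 0]
result = join(FFSF, SFFF)   # FFSF merged with SFFF gives FFSFFF
```

Note that joining a pattern with itself returns the pattern unchanged, since the largest self-overlap is the full length.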
Consider generating an infinite (in principle) sequence of Bernoulli trials with repeated occurrences of a specific pattern, say P. Note that it is assumed that consecutive occurrences are not allowed to overlap, i.e. upon a completion of the pattern, none of its symbols can be used to help generate its next occurrence (which needs to be built from scratch, to put it yet another way).
Let f_n be the probability that the first occurrence of P is completed at Trial n, while g_n is the probability that P is completed (from scratch, with no help from a potential previous occurrence, but not necessarily for the first time) at Trial n. Using the total-probability formula (TPF), we can partition g_n based on where the first occurrence of P has happened, thus getting
(3) g_n = Σ_{k=1}^{n} f_k·g_{n-k}
Note that this identity is correct for all positive integers n (not when n = 0), with the understanding that g_0 = 1 and f_0 = 0 (assumed from now on).
Multiplying each side of (3) by z^n and summing over n from 1 to ∞ yields
(4) G(z) - 1 = F(z)·G(z)
where F(z) = Σ_{n≥0} f_n z^n is the probability generating function (PGF) of the f_n sequence while G(z) = Σ_{n≥0} g_n z^n is the generating function of the g_n sequence. This follows from the fact that the RHS of (3) represents the nth term of a convolution of the two sequences; note that the summation missed the n = 0 term on the LHS but (since f_0 = 0) no term on the RHS; subtracting 1 from G(z) on the LHS of (4) corrects for this discrepancy.
The last equation implies that
(5) F(z) = (G(z) - 1)/G(z)
The importance of this formula rests in a relatively simple way (described shortly) of finding G(z); (5) then easily converts it into the (otherwise inaccessible) F(z).
To find G(z), we apply the TPF to the probability of the consecutive outcomes which define P being completed at Trial n (due to the non-overlapping rule, this may not constitute a legitimate occurrence of Pattern P). Partitioning this (trivial) probability according to where a legitimate occurrence of P may have been completed yields, for any n ≥ L,
(6) Prob(P) = Σ_{k∈K_PP} g_{n-L+k}·Prob(P_(L-k))
where L is the length of P and K_PP is the overlap set of P with itself (always containing k = L), with the understanding that Prob(P_(0)) = 1. Multiplying each side of the last equation by z^n and summing over n, from L to ∞, results in
(7) Prob(P)·z^L/(1 - z) = (G(z) - 1)·C_PP(z)
where C_PP(z) = Σ_{k∈K_PP} Prob(P_(L-k))·z^{L-k}, since Σ_{n≥L} z^n = z^L/(1 - z) and, for each k, Σ_{n≥L} g_{n-L+k}·z^{n-L+k} = G(z) - 1 (the g_0 = 1 term is excluded, while g_1 = g_2 = ... = g_{L-1} = 0).
Based on (5), this further implies that
(8) F(z) = Prob(P)·z^L / ((1 - z)·C_PP(z) + Prob(P)·z^L) = 1 + μ·(z - 1) + c_2·(z - 1)^2 + ...
The last expression represents the corresponding Taylor expansion of F(z) at z = 1. The expected number of trials to generate P for the first time is given by F'(1) or equivalently (and more easily) by the linear coefficient of this expansion, namely μ = C_PP(1)/Prob(P); similarly, the second factorial moment equals the quadratic coefficient c_2 further multiplied by 2; this yields
(9) σ² = 2·c_2 + μ - μ²
for the corresponding variance.
Example 1: When P = FFFSFFF (whose overlap set K_PP equals {1, 2, 3, 7}),
(10) pq^6 = g_{n-6}·pq^5 + g_{n-5}·pq^4 + g_{n-4}·pq^3 + g_n
F(z) = pq^6·z^7 / ((1 - z)·(1 + pq^3·z^4 + pq^4·z^5 + pq^5·z^6) + pq^6·z^7)
since, when observing FFFSFFF at the last 7 trials, the actual (legitimate) pattern may have been completed at any one of the first three trials of this stretch, or at its very last one. This then yields
μ = (1 + pq^3 + pq^4 + pq^5)/(pq^6)
and the corresponding second factorial moment, easily converted into a variance formula.
Finding F(z) for any specific pattern is made routine by the following Mathematica function
(11)
Thus, using the same P as our example, typing Q[{0, 0, 0, 1, 0, 0, 0}] results in the last line of (10); the expected number of trials to complete P and the corresponding variance can be easily found by evaluating this expression and its z derivatives at z = 1, as already indicated.
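In place of the Mathematica function Q (not reproduced in this excerpt), the mean and variance can be computed directly from (8) and (9): differentiating (8) twice, a short calculation (not spelled out in the article) gives μ = C(1)/Prob(P) and, for the second factorial moment, S = 2(μ² - Lμ + C'(1)/Prob(P)), where C(z) is the self-overlap polynomial of (7). A self-contained Python sketch (names are mine), applied to P = FFFSFFF with a fair coin:

```python
from fractions import Fraction

def prob(pat, p):
    # probability of one specific string of outcomes (1 = S, 0 = F)
    q = 1 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    # all k such that the last k symbols of a match the first k symbols of b
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def mean_and_variance(pat, p):
    l, c = len(pat), prob(pat, p)
    K = overlaps(pat, pat)                            # always contains k = l
    C1 = sum(prob(pat[k:], p) for k in K)             # C(1)
    dC1 = sum((l - k) * prob(pat[k:], p) for k in K)  # C'(1)
    mu = C1 / c                                       # expected trials
    S = 2 * (mu ** 2 - l * mu + dC1 / c)              # second factorial moment
    return mu, S + mu - mu ** 2                       # variance, as in (9)

P = [0, 0, 0, 1, 0, 0, 0]                             # FFFSFFF
mu, var = mean_and_variance(P, Fraction(1, 2))
```

For a fair coin this gives μ = 142 (the familiar Conway-style sum 2 + 4 + 8 + 2^7 over the overlaps {1, 2, 3, 7}).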
Subsequently, we will also need the PGF of the number of extra trials to generate the next occurrence of a pattern B, given that pattern A has just been completed and B is allowed to utilize any of its symbols (i.e. B may overlap with A whenever possible, up to and including the case of B ≡ A, which requires no extra trials at all). This PGF, denoted F_{B|A}(z) and called conditional, is found by realizing that the argument behind (6) and (7) still applies when the trials are counted from the completion of A, the only difference being that a stretch of B's symbols can now also be completed by overlapping into A, namely at Trial L - k for each k ∈ K_AB (L now being the length of B). Repeating the derivation with this modification yields
(12) (1 - z)·C_AB(z) + Prob(B)·z^L = F_{B|A}(z)·((1 - z)·C_BB(z) + Prob(B)·z^L)
where C_AB(z) = Σ_{k∈K_AB} Prob(B_(L-k))·z^{L-k} is constructed from the overlaps of A with B in exactly the same manner in which C_BB(z) (just a new notation for the C(z) of (8), built from B) is constructed from the overlaps of B with itself, and the conditional F_{B|A}(z) is a simple solution of (12). Note that (12) yields F_{B|A}(z) ≡ 1 when B ≡ A.
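Differentiating (12) at z = 1 (a step left to the reader here) gives the conditional mean in a particularly transparent form: M_{B|A} = M_B - Σ_{k∈K_AB} 1/Prob(B^(k)), i.e. every overlap of A's tail with B's head removes one term from the Conway-style sum for M_B. A Python sketch (my names), using a small hypothetical illustration, B = SS after A = FS with a fair coin, which is not taken from the article:

```python
from fractions import Fraction

def prob(pat, p):
    q = 1 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    # all k such that the last k symbols of a match the first k symbols of b
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def mean_trials(pat, p):
    # M = sum of 1/Prob(pattern's first k symbols) over the self-overlaps
    # (equivalent to C(1)/Prob(P) of the text)
    return sum(1 / prob(pat[:k], p) for k in overlaps(pat, pat))

def conditional_mean(b, a, p):
    # M_{B|A}: expected extra trials to complete B right after A; every
    # overlap of A's tail with B's head removes one term from M_B
    return mean_trials(b, p) - sum(1 / prob(b[:k], p) for k in overlaps(a, b))

p = Fraction(1, 2)
SS, FS = [1, 1], [0, 1]
```

Here M_SS = 6 and M_FS = 4, while the trailing S of FS gives SS a head start (M_{SS|FS} = 4); a pattern conditioned on itself needs no extra trials at all.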
3. Competing Patterns
Let us now consider m such patterns with individual PGFs denoted F_i(z), where i = 1, 2, ..., m, and the corresponding μ coefficients denoted M_i. Similarly, the conditional PGFs for the number of trials to generate Pattern i with the help of Pattern j are denoted F_{i|j}(z) when j ≠ i, and their μ coefficients are M_{i|j} (where M_{i|i} = 0). The probability that Pattern i is generated, for the first time (and thus winning the game) before any of the other patterns occurs, at Trial n is denoted w_n^(i) (with w_0^(i) = 0); the corresponding generating function is W_i(z). We assume that none of these patterns is a sub-string of any of the others (which would either prevent it from winning the game, or allow it to be completed at the same time as the other pattern).
We assume that n trials have been completed, and we use the TPF to partition f_n^(i) (the probability that Pattern i has been completed, for the first time, at Trial n) according to which of the patterns may have won the game at Trial k (for k ≤ n, including Pattern i itself winning at Trial n). This yields, for any non-negative n,
(13) f_n^(i) = Σ_{j=1}^{m} Σ_{k=0}^{n} w_k^(j)·f_{n-k}^(i|j)
where the f^(i|j) are the probabilities whose PGF is F_{i|j}(z). Multiplying by z^n and summing over n from 0 to ∞ converts these equations into
(14) F_i(z) = Σ_{j=1}^{m} W_j(z)·F_{i|j}(z)
by setting F_{i|i}(z) ≡ 1, in agreement with (12). Dividing by F_i(z) and switching sides makes these m linear equations for W_1(z), W_2(z), ..., W_m(z) into
(15) Σ_{j=1}^{m} (F_{i|j}(z)/F_i(z))·W_j(z) = 1    for i = 1, 2, ..., m
which has a simple matrix-algebra solution, namely
(16) W(z) = A(z)^{-1} ∘ 1
where W(z) is a vector whose components are the W_i(z) generating functions, A(z) is a matrix whose (i, j) element is equal to F_{i|j}(z)/F_i(z), 1 is a column vector with all m components equal to 1, and the small circle implies matrix multiplication.
The probability of Pattern i winning the game is then given by the value of W_i(z) at z = 1; since A(1) is a highly singular matrix of rank 1, this value can be found only by taking the z → 1 limit of the RHS of (16). But the RHS itself is increasingly more difficult to compute as m increases, since inverting large matrices symbolically (note that the elements of A(z) are all functions of z) is practically impossible.
We can bypass this problem by expanding each F_{i|j}(z)/F_i(z) at z = 1, which yields 1 + (M_i - M_{i|j})·(1 - z) + O((1 - z)²), where M_i = F_i'(1), i.e. the expected number of trials to generate Pattern i. This further implies that, to the O((1 - z)²) accuracy,
(17) A(z) ≃ 1 ∘ 1^T + (1 - z)·D
where D is a matrix of the M_i - M_{i|j} values. Using the Sherman-Morrison formula, we get, to the same accuracy,
(18) A(z)^{-1} ≃ D^{-1}/(1 - z) - (D^{-1} ∘ 1 ∘ 1^T ∘ D^{-1}) / ((1 - z)·(1 - z + 1^T ∘ D^{-1} ∘ 1))
Post-multiplying by 1 then enables us to approximate W(z) by
(19) W(z) ≃ (D^{-1} ∘ 1) / (1 - z + 1^T ∘ D^{-1} ∘ 1)
resulting in
(20) W(1) = (D^{-1} ∘ 1) / (1^T ∘ D^{-1} ∘ 1)
Example 2: When playing patterns FFFSF, SFSSS, FFFSS and SSSFS against each other while p = q = 1/2 (tossing a fair coin), the four probabilities of winning are computed by the following Mathematica code
(21)
resulting in 0.3113, 0.2058, 0.3113 and 0.1716 respectively (these are huge differences by gambling standards). Note that now (after p has been assigned a specific value) the computation of the matrix inverse is purely numerical, posing no problem even when hundreds of patterns are involved.
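The Mathematica code itself is omitted from this excerpt, but the computation behind (20) and (23) is easy to replicate; the Python sketch below (all names mine) builds the matrix D directly from the pattern overlaps, using D_ij = M_i - M_{i|j} = Σ_{k∈K_ji} 1/Prob(i's first k symbols), a formula which also covers the diagonal D_ii = M_i.

```python
from fractions import Fraction

def prob(pat, p):
    # probability of one specific string of outcomes (1 = S, 0 = F)
    q = 1 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    # all k such that the last k symbols of a match the first k symbols of b
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def solve(A, b):
    # Gaussian elimination; exact when fed Fractions
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def win_probs(pats, p):
    # D_ij = M_i - M_{i|j}; then, by (20) and (23):
    #   W = D^-1 1 / (1' D^-1 1)  and  M = 1 / (1' D^-1 1)
    D = [[sum(1 / prob(pi[:k], p) for k in overlaps(pj, pi)) for pj in pats]
         for pi in pats]
    x = solve(D, [1] * len(pats))
    s = sum(x)
    return [xi / s for xi in x], 1 / s

def parse(s):
    return [1 if ch == "S" else 0 for ch in s]

pats = [parse(s) for s in ("FFFSF", "SFSSS", "FFFSS", "SSSFS")]
W, M = win_probs(pats, Fraction(1, 2))   # winning probabilities, mean duration
```

Exact rational arithmetic gives W = (127, 84, 127, 70)/408, matching the quoted 0.3113, 0.2058, 0.3113 and 0.1716, and an expected game duration of 2159/204 ≈ 10.58 trials.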
4. Game’s Duration
The PGF of the number of trials to complete any such game is given by
(22) T(z) = Σ_{i=1}^{m} W_i(z) = 1^T ∘ W(z) ≃ (1^T ∘ D^{-1} ∘ 1) / (1 - z + 1^T ∘ D^{-1} ∘ 1)
where the last approximation no longer lets us compute individual probabilities of the distribution, but it is sufficient to find the corresponding expected value; all we need to do is to differentiate the expression with respect to z and substitute z = 1, resulting in
(23) M = 1 / (1^T ∘ D^{-1} ∘ 1)
An alternate way of deriving both (20) and the last formula is based on an elegant probabilistic argument expounded in [5].
To derive a similar formula for the corresponding variance (which, unlike M, necessitates using the novel approach of this article), we need to extend the expansion of each F_{i|j}(z)/F_i(z) to reach an O((1 - z)³) accuracy, thus getting
(24) F_{i|j}(z)/F_i(z) ≃ 1 + (M_i - M_{i|j})·(1 - z) + ((S_{i|j} - S_i)/2 + M_i·(M_i - M_{i|j}))·(1 - z)²
where S_i and S_{i|j} denote the corresponding second factorial moments (with S_{i|i} = 0). This further implies that, to the same accuracy,
(25) A(z) ≃ 1 ∘ 1^T + (1 - z)·(D + (1 - z)·E)
where E is the matrix of the (S_{i|j} - S_i)/2 + M_i·(M_i - M_{i|j}) values. To correspondingly extend the accuracy of the RHS of (22), we only need to replace 1^T ∘ D^{-1} ∘ 1 by
(26) 1^T ∘ (D + (1 - z)·E)^{-1} ∘ 1
therein. The second derivative of the resulting expression, evaluated at z = 1, yields the corresponding second factorial moment, namely
(27) S = 2·M²·(1 - 1^T ∘ D^{-1} ∘ E ∘ D^{-1} ∘ 1)
Adding M and subtracting M² then yields a formula for the variance of the number of trials to complete the game, namely
(28) σ² = M²·(1 - 2·(1^T ∘ D^{-1} ∘ E ∘ D^{-1} ∘ 1)) + M
Continuing Example 2: Extending the previous Mathematica program (where the matrix D has already been computed) by
(29)
results in 31.96 for this variance, and 5.65 for the corresponding standard deviation.
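The variance formula (28) can be replicated in Python as well; in the sketch below (helper names are mine) the second factorial moments S_i and S_{i|j} are obtained by differentiating (8) and (12) twice at z = 1, which yields S = 2(μ² - Lμ + C'(1)/Prob) for a single pattern and an analogous two-term expression for the conditional case (these closed forms are my own elaboration of the article's prescription).

```python
from fractions import Fraction

def prob(pat, p):
    q = 1 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def solve(A, b):
    # Gaussian elimination; exact when fed Fractions
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def game_mean_and_variance(pats, p):
    m = len(pats)
    D = [[0] * m for _ in range(m)]
    E = [[0] * m for _ in range(m)]
    for i, pi in enumerate(pats):
        l, c = len(pi), prob(pi, p)
        Kii = overlaps(pi, pi)
        Cii = sum(prob(pi[k:], p) for k in Kii)             # C_ii(1), cf. (8)
        dCii = sum((l - k) * prob(pi[k:], p) for k in Kii)  # C_ii'(1)
        Mi = Cii / c
        Si = 2 * (Mi ** 2 - l * Mi + dCii / c)              # 2nd factorial moment
        for j, pj in enumerate(pats):
            Kji = overlaps(pj, pi)
            Cji = sum(prob(pi[k:], p) for k in Kji)         # C_ji(1), cf. (12)
            dCji = sum((l - k) * prob(pi[k:], p) for k in Kji)
            Mij = (Cii - Cji) / c                           # conditional mean
            Sij = (2 * (dCii - dCji) / c
                   - 2 * (l * c - Cii) * (Cii - Cji) / c ** 2)
            D[i][j] = Mi - Mij
            E[i][j] = (Sij - Si) / 2 + Mi * (Mi - Mij)      # matrix of (25)
    ones = [1] * m
    x = solve(D, ones)                           # D^-1 1
    w = solve([list(r) for r in zip(*D)], ones)  # (D^T)^-1 1, i.e. 1' D^-1
    M = 1 / sum(x)                               # formula (23)
    t = sum(w[i] * E[i][j] * x[j] for i in range(m) for j in range(m))
    return M, M ** 2 * (1 - 2 * t) + M           # formula (28)

pats = [[1 if ch == "S" else 0 for ch in s]
        for s in ("FFFSF", "SFSSS", "FFFSS", "SSSFS")]
M, var = game_mean_and_variance(pats, Fraction(1, 2))
```

This reproduces the quoted variance of 31.96 (standard deviation 5.65) together with the mean duration of about 10.58 trials.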
Monte-Carlo Verification
We have randomly generated about one hundred thousand rounds of the game of Example 2, using the following Mathematica code
(30)
This produces a set of relative frequencies (one for each pattern) of winning the game; in our case, we obtained 0.310, 0.205, 0.313 and 0.171, in excellent agreement with the earlier computed theoretical probabilities. Similarly, the following extra line
(31)
yields the average number of trials per game and the corresponding (sample) standard deviation; in our case these are 10.52 and 5.60 respectively (again, in good agreement with the theoretical values of Example 2).
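An equivalent Monte-Carlo check can be run in Python (a sketch with a fixed seed for reproducibility; the exact frequencies will of course differ slightly from the article's run):

```python
import random

patterns = ("FFFSF", "SFSSS", "FFFSS", "SSSFS")
maxlen = max(map(len, patterns))
random.seed(2024)
games = 100_000
wins = dict.fromkeys(patterns, 0)
durations = []
for _ in range(games):
    history = ""
    n = 0
    while True:
        n += 1
        # keep only the last few symbols; that is all a pattern can match
        history = (history + ("S" if random.random() < 0.5 else "F"))[-maxlen:]
        winner = next((pat for pat in patterns if history.endswith(pat)), None)
        if winner is not None:
            wins[winner] += 1
            durations.append(n)
            break
freqs = [wins[pat] / games for pat in patterns]
mean = sum(durations) / games
sd = (sum((d - mean) ** 2 for d in durations) / (games - 1)) ** 0.5
```

The relative frequencies land within Monte-Carlo noise of the theoretical 0.3113, 0.2059, 0.3113 and 0.1716, and the sample mean and standard deviation similarly agree with 10.58 and 5.65.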
5. More Examples
Example 3: To demonstrate that the algorithm (and the corresponding Mathematica code) can easily accommodate patterns of different lengths, we now play SSS, SFFFFFFFFFFFS, and twenty-one F’s followed by a single S against each other, where S represents getting a six and F stands for any other value when rolling a die.
This requires changing the definition of the list of patterns in (21) to
(32)
and replacing p = q = 0.5; by p = 1/6.; q = 5/6., thus getting 0.343, 0.323 and 0.335 respectively for the probabilities of winning, and 92.35 and 79.99 for the expected number of rolls and the corresponding standard deviation.
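With the same machinery used for Example 2's sketch, these numbers follow from nothing more than a new pattern list and p = 1/6 (the helpers are repeated below so the sketch stays self-contained; names are mine):

```python
from fractions import Fraction

def prob(pat, p):
    q = 1 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def solve(A, b):
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def win_probs(pats, p):
    D = [[sum(1 / prob(pi[:k], p) for k in overlaps(pj, pi)) for pj in pats]
         for pi in pats]
    x = solve(D, [1] * len(pats))
    s = sum(x)
    return [xi / s for xi in x], 1 / s

# SSS; S followed by eleven F's and an S; twenty-one F's followed by an S
pats = [[1, 1, 1], [1] + [0] * 11 + [1], [0] * 21 + [1]]
W, M = win_probs(pats, Fraction(1, 6))   # S = rolling a six
```

Despite the wildly different lengths, the three patterns turn out to be nearly evenly matched, in agreement with the quoted 0.343, 0.323 and 0.335 and the expected duration of about 92.35 rolls.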
Example 4: Here we investigate how changing the p value affects the winning probabilities when all six patterns of length four with exactly two S's each are played against each other. This now requires the following changes in (21)
(33)
After executing the program, the following extra line
(34)
results in the six graphs of Figure 1, indicating that FFSS (red) always has a better chance of winning than FSFS (blue), which in turn has a higher probability than SFFS (green); for the remaining three patterns, it is the corresponding complement (with the roles of p and q interchanged). Also, the former group of patterns has a substantial advantage over the latter group when p is small, and the opposite happens when p is large.
To find the value of p which maximizes the probability of FFSS winning, we have to add another line of code, namely
(35)
to get 0.1039 (similarly, SSFF would have its best chance of winning at p = 1 - 0.1039 = 0.8961, by the S ↔ F symmetry).
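These claims can be spot-checked numerically; the floating-point Python sketch below (my names) uses a plain grid search in place of Mathematica's built-in maximization:

```python
def prob(pat, p):
    q = 1.0 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def solve(A, b):
    # Gaussian elimination with partial pivoting (floats)
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def win_probs(pats, p):
    D = [[sum(1.0 / prob(pi[:k], p) for k in overlaps(pj, pi)) for pj in pats]
         for pi in pats]
    x = solve(D, [1.0] * len(pats))
    s = sum(x)
    return [xi / s for xi in x]

names = ("FFSS", "FSFS", "FSSF", "SFFS", "SFSF", "SSFF")
pats = [[1 if ch == "S" else 0 for ch in s] for s in names]

def w_ffss(p):
    return win_probs(pats, p)[0]

# locate the p maximizing FFSS's winning probability by a simple grid search
grid = [i / 10000.0 for i in range(50, 5001)]
best_p = max(grid, key=w_ffss)
```

The ordering FFSS > FSFS > SFFS indeed holds across the grid, and the maximum lands at the quoted p ≈ 0.1039.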
Example 5: Finally, assuming that each of our opponents has selected his pattern already, and we are then allowed to choose ours, the question is how to do that to maximize our chances of winning. The following program indicates (using a specific situation) how to get the answer by simply trying all permissible possibilities.
Using p = q = 1/2 and SSFSFF, SSFFFF, FFSFFF, SFFSFS for the opponents' choices (when all patterns must be of length six), the following few lines of code
(36)
yield (practically instantaneously) the best winning chance of 0.2627 if we select SSFFSF. Note that FFFFFF would be the worst possible choice, reducing our probability of winning to 0.0403 only.
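The search itself is easy to reproduce (a Python sketch with my own names; exact rational arithmetic keeps the sixty candidate evaluations robust):

```python
from fractions import Fraction
from itertools import product

def prob(pat, p):
    q = 1 - p
    return p ** sum(pat) * q ** (len(pat) - sum(pat))

def overlaps(a, b):
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]

def solve(A, b):
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def win_probs(pats, p):
    D = [[sum(1 / prob(pi[:k], p) for k in overlaps(pj, pi)) for pj in pats]
         for pi in pats]
    x = solve(D, [1] * len(pats))
    s = sum(x)
    return [xi / s for xi in x]

opponents = ["SSFSFF", "SSFFFF", "FFSFFF", "SFFSFS"]
opp = [[1 if ch == "S" else 0 for ch in s] for s in opponents]
results = {}
for bits in product((0, 1), repeat=6):
    ours = list(bits)
    if ours in opp:          # a pattern already taken cannot be chosen
        continue
    W = win_probs([ours] + opp, Fraction(1, 2))
    results["".join("S" if b else "F" for b in ours)] = float(W[0])
best = max(results, key=results.get)
best_value = results[best]
```

Trying all sixty admissible candidates confirms the quoted best response SSFFSF with a winning probability of 0.2627, against only 0.0403 for FFFFFF.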
6. Conclusions
We have summarized and extended the existing theory of pattern generation with the aim of making it accessible to a wide audience; the only prerequisites for understanding our derivation of the key formulas are a rudimentary knowledge of matrices and generating functions. We have deliberately tried to make our exposition as brief as possible, so as not to obscure fundamental issues with less important details. In this closing section we only indicate the directions in which the study of pattern competition can be extended, in most cases rather routinely.
The first modification assumes that, instead of having two possibilities for an outcome (our S and F), there may be three or more (e.g. rolling a die can result in showing one of six numbers); on closer reflection one can see that the formulas presented in this article remain practically unchanged (only the p and q probabilities need to be replaced by the probabilities of the individual outcomes).
Secondly, it is also possible to employ generating functions to investigate the distribution of the number of occurrences of a specific pattern, or of the number of games (playing several patterns against each other), to be completed in a fixed number of trials.
Finally, one can modify the game in such a way that a pattern can win only after completing a specific number of its occurrences (before any of the other patterns can do the same); here, there are several possibilities in terms of which occurrences (e.g. of like patterns only, dissimilar patterns only, all patterns, etc.) may be allowed to overlap and thus help generate the next one. We leave these for our readers to explore.