Competing Patterns in Bernoulli Sequence of Trials

Abstract

Consider performing a sequence of Bernoulli trials (each resulting in either a success, denoted S, or a failure F, with probabilities p and q := 1 - p, respectively) until one of m specific strings (or patterns) of consecutive outcomes is generated. This can be seen as a game where m players select one such pattern each and the one whose pattern occurs first wins. We present symbolic formulas for the m probabilities of winning, and for the mean and standard deviation of the number of trials needed to complete this game. Several numerical examples are presented, including a search for an optimal strategy.


1. Introduction

The problem of pattern generation was first investigated by [1] and then extended by [2] and [3]. The idea of playing patterns against each other was then explored by [4], who considered only the case of $p = \frac{1}{2}$ with all m patterns being of the same length, and generalized by [5] to any admissible value of p and patterns of arbitrary lengths. This article summarizes and streamlines these results, pointing out that the existing formulas fail when m becomes large (as symbolic inversion of matrices is no longer feasible) and resolving this problem by deriving (with the help of the Sherman-Morrison identity) formulas which require numerical evaluation only.

The main mathematical tool of this article is a skillful application of generating functions; the generating function of a sequence $a_0, a_1, a_2, a_3, \ldots$ is defined as $A(z) := \sum_{j=0}^{\infty} a_j z^j$. It is easy to see that a product of two such generating functions, say $A(z)B(z)$, corresponds to a new sequence (called the convolution of the two sequences) with terms $a_0 b_0,\ a_0 b_1 + a_1 b_0,\ a_0 b_2 + a_1 b_1 + a_2 b_0,\ a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0, \ldots$

When a sequence $p_0, p_1, p_2, p_3, \ldots$ is defined by $p_j = \Pr(Z = j)$, where Z is an integer-valued random variable, the corresponding generating function, say $P(z)$, is called a probability generating function (PGF); to find the expected value of Z, denoted $E(Z)$ or $\mu_Z$, one needs to evaluate $P'(z) = p_1 + 2p_2 z + 3p_3 z^2 + 4p_4 z^3 + \cdots$ at $z = 1$. Similarly, $P''(z = 1)$ yields the second factorial moment of Z, namely $E(Z^2) - E(Z)$, which can be easily converted into the variance of Z, thus equal to $P''(1) + P'(1) - P'(1)^2$.
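As a minimal sketch of this machinery (our own illustration, not part of the paper), consider the number of trials up to and including the first success; its PGF is the geometric $P(z) = pz/(1-qz)$, and the mean and variance follow from the derivatives at $z = 1$:

(* mean and variance of the geometric number of trials to the first success *)
p = 1/2; q = 1 - p;
P[z_] = p z/(1 - q z);
mean = D[P[z], z] /. z -> 1                          (* 1/p = 2 here *)
var = (D[P[z], {z, 2}] /. z -> 1) + mean - mean^2    (* q/p^2 = 2 here *)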

2. Generating Patterns

We first review the technique for constructing a probability generating function (PGF) of the number of trials to generate one such pattern (of specific consecutive outcomes, such as FFSF) for the first time. This requires introducing the following notation: for any pattern $T$, let $T_k$ be the first k symbols of $T$ and $T^k$ be its last k symbols, while $\Pr(T)$ is the probability of generating $T$ in $\ell$ trials, where $\ell$ is the length of $T$, e.g. $\Pr(\text{FFSFSSS}) = p^4 q^3$.

For any two patterns, say $T$ and $W$, $W \odot T$ is the set of all positive integers k such that $W_k \equiv T^k$, where $\equiv$ indicates a perfect match of the corresponding strings; note that this set may be empty. Let K be the largest of these integers (equal to 0 when the set is empty; also, K cannot be bigger than the length of either pattern), and let $W \oslash T$ be the corresponding $T^K$ (a string of zero length when $K = 0$). Thus, for example, when $T := \text{FFFSFFF}$ and $W := \text{FFSFFFFSF}$, we get $W \odot T = \{1, 2, 6\}$ and $W \oslash T = \text{FFSFFF}$, while $T \odot W = \{1, 5\}$ and $T \oslash W = \text{FFFSF}$. Furthermore, $T \odot T = \{1, 2, 3, 7\}$ and $W \odot W = \{1, 4, 9\}$, implying that $T \oslash T = T$ for any pattern. The $\oslash$ operation is easily performed by the following Mathematica code

H[a_, b_] := Module[{n = Min[Length[a], Length[b]]},
  While[Take[a, n] =!= Take[b, -n], n = n - 1];
  Take[a, n]]                                                    (1)

where each pattern needs to be represented by a list of zeros (for every F) and ones (for every S); to find $W \oslash T$ thus requires typing

H[{0, 0, 1, 0, 0, 0, 0, 1, 0}, {0, 0, 0, 1, 0, 0, 0}]           (2)

to obtain {0, 0, 1, 0, 0, 0}, which represents FFSFFF.
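Should the whole set $W \odot T$ be needed (rather than only the longest match returned by H), a one-line variant can do it; HSet below is our own hypothetical helper, not part of the original code:

HSet[a_, b_] := Select[Range[Min[Length[a], Length[b]]],
   Take[a, #] === Take[b, -#] &]
HSet[{0, 0, 1, 0, 0, 0, 0, 1, 0}, {0, 0, 0, 1, 0, 0, 0}]   (* {1, 2, 6} *)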

Consider generating an infinite (in principle) sequence of Bernoulli trials with repeated occurrences of a specific pattern, say $T$. Note that it is assumed that consecutive occurrences are not allowed to overlap, i.e. upon a completion of the pattern, none of its symbols can be used to help generate its next occurrence; put another way, each new occurrence needs to be built from scratch. For example, under this rule the sequence SSSS contains occurrences of SS at Trials 2 and 4 only, the one at Trial 3 not being counted.

Let $f_n$ be the probability that the first occurrence of $T$ is completed at Trial n, while $u_n$ is the probability that $T$ is completed at Trial n from scratch (no help from a potential previous occurrence) but not necessarily for the first time. Using the total-probability formula (TPF), we can partition $u_n$ based on where the first occurrence of $T$ has happened, thus getting

$$u_n = f_0 u_n + f_1 u_{n-1} + f_2 u_{n-2} + \cdots + f_{n-1} u_1 + f_n u_0 \tag{3}$$

Note that this identity is correct for all positive integers n (but not for n = 0), with the understanding that $f_0 = 0$ and $u_0 = 1$ (assumed from now on).

Multiplying each side of (3) by $z^n$ and summing over n from 1 to $\infty$ yields

$$U(z) - 1 = F(z)\, U(z) \tag{4}$$

where $F(z)$ is the probability generating function (PGF) of the $f_0, f_1, f_2, \ldots$ sequence while $U(z)$ is the generating function of the $u_0, u_1, u_2, \ldots$ sequence. This follows from the fact that the RHS of (3) represents the nth term of a convolution of the two sequences; note that the summation missed the $u_0 = 1$ term on the LHS but (since $f_0 u_0 = 0$) no term on the RHS; subtracting 1 from $U(z)$ on the LHS of (4) corrects for this discrepancy.

The last equation implies that

$$F(z) = \left(1 + \frac{1}{U(z) - 1}\right)^{-1} \tag{5}$$

The importance of this formula rests in a relatively simple way (described shortly) of finding $U(z)$; (5) then easily converts it into the (otherwise inaccessible) $F(z)$.
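As a quick sanity check (our own illustration), take the single-symbol pattern $T := \text{S}$: clearly $u_n = p$ for every $n \ge 1$, so that

$$U(z) = 1 + \frac{pz}{1-z} = \frac{1-qz}{1-z} \quad\Rightarrow\quad F(z) = \left(1 + \frac{1-z}{pz}\right)^{-1} = \frac{pz}{1-qz},$$

the familiar PGF of the geometric distribution with mean $1/p$, as expected.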

To find U ( z ) , we apply the TPF to the probability of the consecutive outcomes which define T being completed at Trial n (due to the non-overlapping rule, this may not constitute a legitimate occurrence of Pattern T ). Partitioning this (trivial) probability according to where a legitimate occurrence of T may have been completed yields

$$\Pr(T) = \sum_{k \in T \odot T} u_{n-\ell+k}\, \Pr(T^{\ell-k}) \tag{6}$$

where $\ell$ is the length of $T$ and $n \ge \ell$, with the understanding that $\Pr(T^0) = 1$. Multiplying each side of the last equation by $z^n$ and summing over n from $\ell$ to $\infty$ results in

$$\frac{\Pr(T)\, z^{\ell}}{1 - z} = \bigl(U(z) - 1\bigr) \sum_{k \in T \odot T} z^{\ell-k}\, \Pr(T^{\ell-k}) \tag{7}$$

since $u_1 = u_2 = \cdots = u_{\ell-1} = 0$ and $u_0 = 1$.

Based on (5), this further implies that

$$F(z) = \left(1 + (1 - z)\, \frac{\sum_{k \in T \odot T} z^{\ell-k} \Pr(T^{\ell-k})}{\Pr(T)\, z^{\ell}}\right)^{-1} =: \frac{1}{1 + (1 - z)\, Q(z)} \simeq 1 + Q(1)(z - 1) + \bigl(Q'(1) + Q(1)^2\bigr)(z - 1)^2 + \cdots \tag{8}$$

The last line represents the corresponding Taylor expansion of $F(z)$ at $z = 1$. The expected number of trials to generate $T$ for the first time is given by $F'(1)$ or, equivalently (and more easily), by the linear coefficient of this expansion, namely $\mu_T = Q(1)$; similarly, the second factorial moment equals the quadratic coefficient multiplied by 2; this yields

$$2\, Q'(1) + \mu_T + \mu_T^2 \tag{9}$$

for the corresponding variance.

Example 1: When $T := \text{FFFSFFF}$,

$$Q(z) = \frac{1 + \Pr(\text{SFFF})\, z^4 + \Pr(\text{FSFFF})\, z^5 + \Pr(\text{FFSFFF})\, z^6}{\Pr(\text{FFFSFFF})\, z^7} = \frac{1 + p q^3 z^4 + p q^4 z^5 + p q^5 z^6}{p q^6 z^7} \tag{10}$$

since, when observing $\dot{\text{F}}\dot{\text{F}}\dot{\text{F}}\text{SFF}\dot{\text{F}}$ over the last $\ell$ trials, the actual pattern may have been completed at any one of the trials indicated by a dot. This then yields $\mu_T = Q(1) = \frac{1 + pq^3 + pq^4 + pq^5}{pq^6}$ and $Q'(1) = -\frac{7}{pq^6} - \frac{3}{q^3} - \frac{2}{q^2} - \frac{1}{q}$, easily converted into a variance formula.

Finding Q ( z ) for any specific pattern is made routine by the following Mathematica function

Q[x_] := Module[{h = 0, n = Length[x]},
  Do[h = h + If[Take[x, k] == Take[x, -k],
      (p z)^Total[Drop[x, k]] (q z)^Total[1 - Drop[x, k]], 0], {k, n}];
  h/(p z)^Total[x]/(q z)^Total[1 - x]]                           (11)

Thus, using the same T as our example, typing Q[{0, 0, 0, 1, 0, 0, 0}] results in the last line of (10); the expected number of trials to complete T and the corresponding variance can be easily found by evaluating this expression and its z derivative at z = 1 , as already indicated.
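For instance (our own check, for the fair-coin case $p = q = \frac{1}{2}$):

p = 1/2; q = 1/2;
mu = Q[{0, 0, 0, 1, 0, 0, 0}] /. z -> 1                     (* 142 *)
2 (D[Q[{0, 0, 0, 1, 0, 0, 0}], z] /. z -> 1) + mu + mu^2    (* variance, by (9): 18446 *)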

Subsequently, we will also need the PGF of the number of extra trials to generate the next occurrence of a pattern $T$, given that pattern $W$ has just been completed and $T$ is allowed to utilize any of its symbols (i.e. $T$ may overlap with $W$ whenever possible, up to and including $T \oslash W$). This PGF, denoted $F_{T|W}(z)$ and called conditional, is found by realizing that, to generate the first occurrence of $T$ from scratch, $T \oslash W$ must be generated first, while the remaining (independent) trials to complete $T$ are those of $T|W$. This implies that

$$F_T(z) = F_{T \oslash W}(z)\, F_{T|W}(z) \tag{12}$$

where $F_T(z)$ is just a new notation for the $F(z)$ of $T$, $F_{T \oslash W}(z)$ is the similarly constructed $F(z)$ of $T \oslash W$, and the conditional $F_{T|W}(z)$ is then a simple solution of (12).
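Note that differentiating (12) at $z = 1$ shows that the corresponding means simply add, i.e. $\mu_T = \mu_{T \oslash W} + \mu_{T|W}$. As our own small illustration (using H of (1) and Q of (11), with $p = q = \frac{1}{2}$), the expected number of extra trials to complete $T = \text{FFFSFFF}$ right after $W = \text{FFSFFFFSF}$ has occurred is

p = 1/2; q = 1/2;
muT = Q[{0, 0, 0, 1, 0, 0, 0}] /. z -> 1;                                    (* 142 *)
muTW = Q[H[{0, 0, 0, 1, 0, 0, 0}, {0, 0, 1, 0, 0, 0, 0, 1, 0}]] /. z -> 1;  (* 34, for FFFSF *)
muT - muTW                                                                   (* 108 *)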

3. Competing Patterns

Let us now consider m such patterns with individual PGFs denoted $F_i(z)$, where $i = 1, 2, \ldots, m$, and the corresponding $z^n$ coefficients $f_{i,n}$. Similarly, the conditional PGFs for the number of trials to generate Pattern i with the help of Pattern j are denoted $F_{i|j}(z)$ when $i \ne j$, and their coefficients are $f_{i|j,n}$ (where $f_{i|j,0} = 0$). The probability that Pattern i is generated for the first time (thus winning the game), before any of the other patterns occurs, at Trial n is denoted $x_{i,n}$ (with $x_{i,0} = 0$); the corresponding generating function is $X_i(z)$. We assume that none of these patterns is a sub-string of any of the others (which would either prevent it from winning the game, or allow it to be completed at the same time as the other pattern).

Assuming that n trials have been completed, we use the TPF to partition $f_{i,n}$ (the probability that Pattern i has been completed, for the first time, at Trial n) according to which pattern has won the game and at which Trial k (for $k = 1, 2, \ldots, n$): either some other Pattern j has won at Trial k and Pattern i was then completed $n - k$ trials later, or Pattern i itself has won at Trial n. This yields, for any non-negative n,

$$f_{i,n} = \sum_{j \ne i}^{m} \sum_{k=1}^{n} x_{j,k}\, f_{i|j,n-k} + x_{i,n} \tag{13}$$

Multiplying by $z^n$ and summing over n from 0 to $\infty$ converts these equations into

$$F_i(z) = \sum_{j \ne i}^{m} X_j(z)\, F_{i|j}(z) + X_i(z) = \sum_{j=1}^{m} X_j(z)\, F_{i|j}(z) \tag{14}$$

by setting $F_{i|i}(z) = 1$, in agreement with (12). Dividing each of these equations by $F_i(z)$ and noting that, by (12), $F_{i|j}(z)/F_i(z) = F_{i \oslash j}(z)^{-1}$, converts them into m linear equations for the $X_j(z)$, namely

$$\sum_{j=1}^{m} F_{i \oslash j}(z)^{-1}\, X_j(z) = 1, \qquad i = 1, 2, \ldots, m \tag{15}$$

which has a simple matrix-algebra solution, namely

$$\mathbf{X}(z) = G(z)^{-1} \circ \mathbf{1} \tag{16}$$

where $\mathbf{X}(z)$ is a vector whose components are the $X_j(z)$ generating functions, $G(z)$ is a matrix whose $(i,j)$th element is equal to $F_{i \oslash j}(z)^{-1}$, $\mathbf{1}$ is a column vector with all m components equal to 1, and the small circle indicates matrix multiplication.

The probability of Pattern j winning the game is then given by the value of $X_j(z)$ at $z = 1$; since $G(1)$ is a highly singular matrix of rank 1, this value can be found only by taking the $z \to 1$ limit of the RHS of (16). But the RHS itself becomes increasingly difficult to compute as m increases, since inverting large matrices symbolically (note that the $F_{i \oslash j}(z)^{-1}$ are all functions of z) is practically impossible.

We can bypass this problem by expanding each $F_{i \oslash j}(z)^{-1}$ at $z = 1$, which yields $1 - \mu_{i \oslash j}(z - 1) + \cdots$, where $\mu_{i \oslash j} = F'_{i \oslash j}(1) = Q_{i \oslash j}(1)$, i.e. the expected number of trials to generate $i \oslash j$. This further implies that, to $O((z-1)^2)$ accuracy,

$$G(z) \simeq \mathbf{1}\mathbf{1}^{\mathsf T} + K(1 - z) + \cdots \tag{17}$$

where $K$ is the matrix of the $Q_{i \oslash j}(1)$ values. Using the Sherman-Morrison formula, we get, to the same accuracy,

$$\bigl(\mathbf{1}\mathbf{1}^{\mathsf T} + K(1 - z)\bigr)^{-1} \simeq \frac{K^{-1}}{1 - z} - \frac{K^{-1}}{1 - z} \cdot \frac{\mathbf{1}\mathbf{1}^{\mathsf T} K^{-1}}{1 - z + \mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}} \tag{18}$$
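For completeness: the Sherman-Morrison identity used here states that, for an invertible matrix $A$ and column vectors $\mathbf{u}$, $\mathbf{v}$ (with $1 + \mathbf{v}^{\mathsf T} A^{-1} \mathbf{u} \ne 0$),

$$\left(A + \mathbf{u}\mathbf{v}^{\mathsf T}\right)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{u}\, \mathbf{v}^{\mathsf T}A^{-1}}{1 + \mathbf{v}^{\mathsf T}A^{-1}\mathbf{u}},$$

applied above with $A = K(1-z)$ and $\mathbf{u} = \mathbf{v} = \mathbf{1}$.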

Post-multiplying by $\mathbf{1}$ then enables us to approximate $\mathbf{X}(z)$ by

$$\frac{K^{-1}\mathbf{1}}{1 - z} \left(1 - \frac{\mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}}{1 - z + \mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}}\right) = \frac{K^{-1}\mathbf{1}}{1 - z + \mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}} \tag{19}$$

resulting in

$$\lim_{z \to 1} \mathbf{X}(z) = \frac{K^{-1}\mathbf{1}}{\mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}} \tag{20}$$

Example 2: When playing the patterns FFFSF, SFSSS, FFFSS and SSSFS against each other with $p = q = \frac{1}{2}$ (tossing a fair coin), the four probabilities of winning are computed by the following Mathematica code

T = {{0, 0, 0, 1, 0}, {1, 0, 1, 1, 1}, {0, 0, 0, 1, 1}, {1, 1, 1, 0, 1}};
p = q = 0.5;
Kinv = Inverse[(L = Map[Q, Outer[H, T, T, 1], {2}]) /. z -> 1];
(Kinv . Table[1, Length[T]]) (M = 1/Total[Kinv, 2]) // Simplify            (21)

resulting in 0.3113, 0.2058, 0.3113 and 0.1716 respectively (these are huge differences by gambling standards). Note that now (after p has been assigned a specific value) the computation of the matrix inverse is purely numerical, posing no problem even when hundreds of patterns are involved.

4. Game's Duration

The PGF of the number of trials to complete any such game is given by

$$\sum_{j=1}^{m} X_j(z) = \mathbf{1}^{\mathsf T} G(z)^{-1} \mathbf{1} \simeq \frac{\mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}}{1 - z + \mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}} \tag{22}$$

where the last approximation no longer lets us compute individual probabilities of the distribution, but it is sufficient for finding the corresponding expected value; all we need to do is differentiate the expression with respect to z and substitute $z = 1$, resulting in

$$M := \frac{1}{\mathbf{1}^{\mathsf T} K^{-1} \mathbf{1}} \tag{23}$$

An alternate way of deriving both (20) and the last formula is based on an elegant probabilistic argument expounded in [5] .

To derive a similar formula for the corresponding variance (which, unlike M, necessitates the novel approach of this article), we need to extend the expansion of $F_{i \oslash j}(z)^{-1}$ to $O((z-1)^3)$ accuracy, thus getting

$$1 - \mu_{i \oslash j}(z - 1) - Q'_{i \oslash j}(1)(z - 1)^2 + \cdots \tag{24}$$

This further implies that, to the same accuracy,

$$G(z) \simeq \mathbf{1}\mathbf{1}^{\mathsf T} + K(1 - z) - L(1 - z)^2 + \cdots \tag{25}$$

where $L$ is the matrix of the $Q'_{i \oslash j}(1)$ values. To correspondingly extend the accuracy of the RHS of (22), we only need to replace $K^{-1}$ by

$$\bigl(K - L(1 - z)\bigr)^{-1} \simeq K^{-1} + (1 - z)\, K^{-1} L\, K^{-1} \tag{26}$$

therein. The second derivative of the resulting expression, evaluated at z = 1 , yields the corresponding second factorial moment, namely

$$2 M^2 \bigl(1 + \mathbf{1}^{\mathsf T} K^{-1} L\, K^{-1} \mathbf{1}\bigr) \tag{27}$$

Adding M and subtracting $M^2$ then yields a formula for the variance of the number of trials to complete the game, namely

$$M^2 \bigl(1 + 2\, \mathbf{1}^{\mathsf T} K^{-1} L\, K^{-1} \mathbf{1}\bigr) + M \tag{28}$$

Continuing Example 2: Extending the previous Mathematica program (where M = 10.58 has already been computed) by

M^2 (1 + 2 Total[Kinv . (D[L, z] /. z -> 1) . Kinv, 2]) + M                (29)

results in 31.96 for this variance, and 5.65 for the corresponding standard deviation.

Monte-Carlo Verification

We have randomly generated about a hundred thousand rounds of the game of Example 2, using the following Mathematica code

T = {{0, 0, 0, 1, 0}, {1, 0, 1, 1, 1}, {0, 0, 0, 1, 1}, {1, 1, 1, 0, 1}};
p = q = 0.5; r = {};
Do[X = RandomInteger[{0, 1}, 10^4];
  While[Length[X] > 40, j = 5;
    While[Length[a = Position[Map[Take[X, 5] == # &, T], True]] == 0,
      j = j + 1; X = Drop[X, 1]];
    X = Drop[X, 5]; r = Append[r, {a[[1, 1]], j}]], 106]
BinCounts[(a = Transpose[r])[[1]], {1, 5, 1}]/Length[r] // N               (30)

This produces a set of relative frequencies (one for each pattern) of winning the game; in our case, we obtained 0.310, 0.205, 0.313 and 0.171, in excellent agreement with the earlier computed theoretical probabilities. Similarly, the following extra line

{Mean[a[[2]]], StandardDeviation[a[[2]]]} // N                             (31)

yields the average number of trials per game and the corresponding (sample) standard deviation; in our case these are 10.52 and 5.60 respectively (again, in good agreement with the theoretical values of Example 2).
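This level of agreement is to be expected: with roughly $N = 10^5$ simulated games, the standard error of each relative frequency is about $\sqrt{0.31 \times 0.69 / 10^5} \approx 0.0015$, and the observed deviations from the theoretical probabilities are all of that order.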

5. More Examples

Example 3: To demonstrate that the algorithm (and the corresponding Mathematica code) can easily accommodate patterns of different lengths, we now play SSS, SFFFFFFFFFFFS, and twenty-one F’s followed by a single S against each other, where S represents getting a six and F stands for any other value when rolling a die.

This requires changing the definition of T in (21) to

T = {{1, 1, 1}, Flatten[{1, Table[0, 11], 1}], Append[Table[0, 21], 1]}    (32)

and replacing p = q = 0.5; with p = 1/6.; q = 5/6., thus getting 0.343, 0.323 and 0.335 respectively for the probabilities of winning, and 92.35 and 79.99 for the expected number of rolls and the corresponding standard deviation.
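For convenience, the fully assembled program (a straightforward re-combination of (21) with the above changes) reads

T = {{1, 1, 1}, Flatten[{1, Table[0, 11], 1}], Append[Table[0, 21], 1]};
p = 1/6.; q = 5/6.;
Kinv = Inverse[(L = Map[Q, Outer[H, T, T, 1], {2}]) /. z -> 1];
(Kinv . Table[1, Length[T]]) (M = 1/Total[Kinv, 2])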

Example 4: Here we investigate how changing the p value affects the winning probabilities when all six patterns of length four with exactly two S’s each are played against each other. This now requires the following changes in (21)

Clear[p]; q = 1 - p; T = Permutations[{0, 0, 1, 1}];                       (33)

After executing the program, the following extra line

Plot[%, {p, 0, 1}, PlotStyle -> {Red, Blue, Yellow, Green, Brown, Black}]  (34)

results in the six graphs of Figure 1, indicating that FFSS (red) always has a better chance of winning than FSFS (blue), which in turn has a higher probability than SFFS (green); for the remaining three patterns, the corresponding complements of these statements hold. Also, the former group of patterns has a substantial advantage over the latter group when p is small, and the opposite happens when p is large.

Figure 1. Probabilities of winning.

To find the value of p which maximizes the probability of FFSS winning, we have to add another line of code, namely

p /. NSolve[D[%%[[1]], p] == 0, p][[6]]                                    (35)

to get 0.1039 (similarly, SSFF would have its best chance of winning at p = 0.8961 ).

Example 5: Finally, assuming that each of our opponents has already selected his pattern, and we are then allowed to choose ours, the question is how to do so in order to maximize our chances of winning. The following program indicates (using a specific situation) how to get the answer by simply trying all permissible possibilities.

Using p = 0.5 and SSFSFF, SSFFFF, FFSFFF, SFFSFS for the opponents’ choices (when all patterns must be of length six), the following few lines of code

p = q = 0.5;
T = {{1, 1, 0, 1, 0, 0}, {1, 1, 0, 0, 0, 0}, {0, 0, 1, 0, 0, 0}, {1, 0, 0, 1, 0, 1}};
W = Complement[Flatten[Apply[Outer[List, ##] &, Table[{0, 1}, 6]], 5], T];
r = {};
Do[WT = Prepend[T, W[[i]]];
  Kinv = Inverse[Map[Q, Outer[H, WT, WT, 1], {2}] /. z -> 1];
  r = Append[r, (Kinv . Table[1, 5])[[1]]/Total[Kinv, 2]], {i, 60}]
{a = Max[r], W[[Position[r, a][[1, 1]]]]}                                  (36)

yield (practically instantaneously) the best winning chance of 0.2627 if we select SSFFSF. Note that FFFFFF would be the worst possible choice, reducing our probability of winning to a mere 0.0403.
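Similarly, one extra line (our own addition) identifies that worst choice directly:

{b = Min[r], W[[Position[r, b][[1, 1]]]]}   (* {0.0403, {0, 0, 0, 0, 0, 0}}, i.e. FFFFFF *)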

6. Conclusions

We have summarized and extended the existing theory of pattern generation with the aim of making it accessible to a wide audience; the only prerequisites for understanding our derivation of the key formulas are a rudimentary knowledge of matrices and generating functions. We have deliberately tried to make our exposition as brief as possible, so as not to obstruct fundamental issues with less important details. In this closing section we will only indicate the directions in which the study of pattern competition can be extended, in most cases rather routinely.

The first modification assumes that, instead of having two possibilities for an outcome (our S and F), there may be three or more (e.g. rolling a die can result in one of six numbers); on closer reflection, one can see that the formulas presented in this article remain practically unchanged (only the p and q probabilities need to be replaced by $p_1, p_2, p_3, \ldots$).
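To illustrate (a hypothetical sketch of ours, not the paper's code), the function Q of (11) would generalize to an alphabet of several outcomes as follows, a pattern now being a list of outcome indices and prob the list of their probabilities:

(* generalized Q: x is a list of outcome indices, prob their probabilities *)
QGen[x_, prob_] := Module[{h = 0, n = Length[x]},
  Do[h = h + If[Take[x, k] == Take[x, -k],
      Product[prob[[x[[i]]]] z, {i, k + 1, n}], 0], {k, n}];
  h/Product[prob[[x[[i]]]] z, {i, n}]]
QGen[{6, 6, 6}, Table[1/6, 6]] /. z -> 1   (* 258 rolls, on average, to get three consecutive sixes *)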

Secondly, it is also possible to employ generating functions to investigate the distribution of the number of occurrences of a specific pattern, or of the number of games (playing several patterns against each other) completed in a fixed number of trials.

Finally, one can modify the game in such a way that a pattern can win only after completing a specific number of its occurrences (before any of the other patterns can do the same); here, there are several possibilities in terms of which occurrences (e.g. of like patterns only, dissimilar patterns only, all patterns, etc.) may be allowed to overlap and thus help generate the next one. We leave these for our readers to explore.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Feller, W. (1968) An Introduction to Probability Theory and Its Applications, Vol. 1. 3rd Edition, John Wiley & Sons, New York.
[2] Li, S.R. (1980) A Martingale Approach to the Study of Occurrences of Sequence Patterns in Repeated Experiments. Annals of Probability, 8, 1171-1176.
https://doi.org/10.1214/aop/1176994578
[3] Guibas, L.J. and Odlyzko, A.M. (1981) String Overlaps, Pattern Matching and Nontransitive Games. Journal of Combinatorial Theory A, 30, 183-208.
https://doi.org/10.1016/0097-3165(81)90005-4
[4] Blom, G. and Thorburn, D. (1982) How Many Random Digits Are Required Until Given Sequences Are Obtained? Journal of Applied Probability, 19, 518-531.
https://doi.org/10.2307/3213511
[5] Vrbik, J. and Vrbik, P. (2015) Playing Several Patterns against One Another. arXiv:1507.01322
