1. Introduction
Various sophisticated statistical methods have been employed to analyze and predict lottery numbers: frequency analysis [1], regression analysis [2], machine learning [3], artificial intelligence [4], computer simulation [5], and clustering and pattern recognition [6]. These methods are based on previous draws and the patterns that arise from them; their premise is that previous draws dictate the future probability of a certain number being drawn. In this work, we use elementary probability to analyze which numbers are likely to be drawn, independent of past draws.
2. Order Statistics
We choose $K$ balls among $N$ balls numbered $1, 2, \dots, N$ and arrange them in ascending order. Let $X_k$ be the $k$th smallest. For example, $X_1$ is the smallest and $X_K$ is the largest among the $K$ chosen balls. For $k = 1, 2, \dots, K$, $X_k$ has the following probability mass function:
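Theorem 1
$$P(X_k = n) = \frac{\binom{n-1}{k-1}\binom{N-n}{K-k}}{\binom{N}{K}}, \qquad n = k, k+1, \dots, N-K+k.$$
Proof The event $\{X_k = n\}$ means that ball $n$ is chosen, and we need to choose $k-1$ numbers among $\{1, 2, \dots, n-1\}$ and $K-k$ numbers among $\{n+1, n+2, \dots, N\}$. Since each of the $\binom{N}{K}$ possible choices of $K$ balls is equally likely, hence,
$$P(X_k = n) = \frac{\binom{n-1}{k-1}\binom{N-n}{K-k}}{\binom{N}{K}}.$$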
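The pmf $P(X_k = n) = \binom{n-1}{k-1}\binom{N-n}{K-k}/\binom{N}{K}$ is easy to evaluate exactly. The sketch below is an illustration only: the helper names `order_stat_pmf` and `brute_force_pmf` are hypothetical, and it relies only on Python's standard `math.comb`, `fractions.Fraction`, and `itertools.combinations`. It computes the formula in exact rational arithmetic and cross-checks it against a brute-force enumeration of every $K$-subset for a small case ($N = 10$, $K = 3$, $k = 2$).

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def order_stat_pmf(N: int, K: int, k: int) -> dict[int, Fraction]:
    """Exact pmf of X_k, the k-th smallest of K balls drawn from 1..N."""
    if not 1 <= k <= K <= N:
        raise ValueError("need 1 <= k <= K <= N")
    total = comb(N, K)
    # X_k = n requires k-1 chosen numbers below n and K-k chosen numbers above n.
    return {
        n: Fraction(comb(n - 1, k - 1) * comb(N - n, K - k), total)
        for n in range(k, N - K + k + 1)
    }


def brute_force_pmf(N: int, K: int, k: int) -> dict[int, Fraction]:
    """Same pmf obtained by enumerating every K-subset of 1..N (small N only)."""
    subsets = list(combinations(range(1, N + 1), K))  # each tuple is already sorted
    counts: dict[int, int] = {}
    for s in subsets:
        counts[s[k - 1]] = counts.get(s[k - 1], 0) + 1
    return {n: Fraction(c, len(subsets)) for n, c in sorted(counts.items())}


if __name__ == "__main__":
    pmf = order_stat_pmf(10, 3, 2)
    print(sum(pmf.values()))                 # total probability
    print(pmf == brute_force_pmf(10, 3, 2))  # agreement with enumeration
```

Exact fractions are used so that the comparison with enumeration is an equality test rather than a floating-point tolerance check.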
Remark This is not the same as the hypergeometric distribution discussed in [7].
Example For the Mega Millions in the U.S. [8], players pick six numbers from two separate pools: five different numbers from 1 to 70 and one bonus number (Mega Ball) from 1 to 25. Here, we ignore the bonus number because it is drawn from a separate pool and does not affect the distribution of the order statistics of the five main numbers. Using Theorem 1 with $N = 70$ and $K = 5$, the table on the next page displays, for each of $X_1$, $X_2$, $X_3$, $X_4$ and $X_5$, the five values with the highest probabilities.
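A table of this kind can be regenerated from the pmf $\binom{n-1}{k-1}\binom{N-n}{K-k}/\binom{N}{K}$ with $N = 70$ and $K = 5$. The following is a minimal sketch; the function name `top_values` is hypothetical, and the tie-breaking rule (smaller number first) is our own choice, not something stated in the text.

```python
from math import comb


def top_values(N: int, K: int, k: int, top: int = 5) -> list[tuple[int, float]]:
    """Return the `top` most likely values of X_k together with their probabilities."""
    total = comb(N, K)
    # Integer numerators C(n-1, k-1) * C(N-n, K-k) keep the ranking exact.
    counts = [
        (n, comb(n - 1, k - 1) * comb(N - n, K - k))
        for n in range(k, N - K + k + 1)
    ]
    # Highest probability first; ties broken by the smaller number.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(n, c / total) for n, c in counts[:top]]


if __name__ == "__main__":
    N, K = 70, 5  # Mega Millions main pool: 5 distinct numbers from 1 to 70
    for k in range(1, K + 1):
        ranked = ", ".join(f"{n} ({p:.4f})" for n, p in top_values(N, K, k))
        print(f"X{k}: {ranked}")
```

Ranking on the integer numerators rather than on floats avoids any rounding effect when two values have equal probability, which happens for symmetric positions such as $X_3$.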
Remark At the time this work was carried out, according to Lotto America [9] and USAMega [10], the most frequent Mega Millions numbers were 3, 10, 14, 17, 31, 46, 64, … Some of these numbers do not appear in our calculation because those frequencies are based only on draws from the sixth (current) version of Mega Millions (October 31, 2017 to present: the first 5 numbers are chosen from 1 to 70 and the Mega Ball is chosen from 1 to 25). Statistical analysis requires a sufficiently large sample to draw meaningful conclusions, but for lotteries the number of past draws available for analysis is often limited. With a small sample, it is difficult to identify statistically significant patterns or trends.
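The small-sample effect can be illustrated by simulation: even when every draw is perfectly fair and independent of the past, the observed frequency counts spread noticeably around their common expected value, so a list of "most frequent numbers" partly reflects chance. The sketch below assumes a hypothetical 700 draws purely for illustration; this is not the actual number of Mega Millions draws, and `simulate_frequencies` is a hypothetical helper built on Python's standard `random` and `collections.Counter`.

```python
import random
from collections import Counter


def simulate_frequencies(num_draws: int, N: int = 70, K: int = 5, seed: int = 0) -> Counter:
    """Count how often each number in 1..N appears in `num_draws` fair draws of K balls."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(num_draws):
        # K distinct numbers per draw; each draw ignores all previous draws.
        counts.update(rng.sample(range(1, N + 1), K))
    return counts


if __name__ == "__main__":
    num_draws = 700  # hypothetical sample size, chosen only for illustration
    counts = simulate_frequencies(num_draws)
    expected = num_draws * 5 / 70  # expected count of every number under fair draws
    lo = min(counts[n] for n in range(1, 71))  # Counter returns 0 for unseen numbers
    hi = max(counts[n] for n in range(1, 71))
    print(f"expected count per number: {expected:.1f}; observed range: {lo} to {hi}")
```

Rerunning with different seeds changes which numbers come out "most frequent", which is the point: in a sample of this size the ranking is driven largely by random variation.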
We next describe the long-term behavior of Xk.
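Corollary 2 The expectation of $X_k$ is
$$E[X_k] = \sum_{n=k}^{N-K+k} n\,\frac{\binom{n-1}{k-1}\binom{N-n}{K-k}}{\binom{N}{K}}$$
for $k = 1, 2, \dots, K$.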
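Corollary 2 gives $E[X_k]$ as the sum of $n\,P(X_k = n)$ over $n = k, \dots, N-K+k$. A direct evaluation in exact arithmetic is sketched below; the helper name `expected_order_stat` is hypothetical, and the Mega Millions parameters $N = 70$, $K = 5$ are taken from the example above.

```python
from fractions import Fraction
from math import comb


def expected_order_stat(N: int, K: int, k: int) -> Fraction:
    """E[X_k] evaluated term by term from the sum in Corollary 2, as an exact rational."""
    total = comb(N, K)
    numerator = sum(
        n * comb(n - 1, k - 1) * comb(N - n, K - k)
        for n in range(k, N - K + k + 1)
    )
    return Fraction(numerator, total)


if __name__ == "__main__":
    for k in range(1, 6):
        e = expected_order_stat(70, 5, k)
        print(f"E[X{k}] = {e} (about {float(e):.2f})")
```

The sum has up to $N-K+1$ terms, which is cheap for lottery-sized $N$, but it hides the simple structure of the answer; that structure is what the next result makes explicit.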
We now simplify $E[X_k]$ by using a different approach.
Theorem 3
$$E[X_k] = \frac{k(N+1)}{K+1}$$
for $k = 1, 2, \dots, K$.
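Proof If $K$ numbers are randomly selected in the interval $(0, 1)$ and each number is equally likely to be picked, let $Y_1 < Y_2 < \cdots < Y_K$ denote their order statistics over the unit interval $(0, 1)$. $Y_k$ satisfies [7] [11] [12]
$$Y_k \sim \mathrm{Beta}(k,\, K-k+1), \qquad E[Y_k] = \frac{k}{K+1}.$$
The mean reflects the fact that the $K$ points cut $(0, 1)$ into $K+1$ exchangeable spacings, each with mean $1/(K+1)$, and $Y_k$ is the sum of the first $k$ of them. The same argument applies to $X_k$: the $N-K$ unchosen balls fall into $K+1$ gaps $G_0, G_1, \dots, G_K$ (below $X_1$, between consecutive chosen balls, and above $X_K$), with $G_0 + G_1 + \cdots + G_K = N-K$. Choices of $K$ balls correspond one-to-one to such gap vectors, so a uniformly random choice gives a uniformly random gap vector; in particular the $G_i$ are exchangeable and $E[G_i] = (N-K)/(K+1)$. Since $X_k = k + G_0 + G_1 + \cdots + G_{k-1}$, we have $E[X_k] = k + k(N-K)/(K+1)$. Hence,
$$E[X_k] = \frac{k(N+1)}{K+1} = (N+1)\,E[Y_k].$$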
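The closed form $E[X_k] = k(N+1)/(K+1)$ can be checked exhaustively for small $N$ and $K$. The sketch below (hypothetical helper names, standard library only) enumerates every $K$-subset of $\{1, \dots, N\}$, averages the $k$th smallest element, and compares it with the formula. It also averages the $K+1$ gaps of unchosen balls around the chosen ones; a standard way to see why the formula holds is that these gaps all share the mean $(N-K)/(K+1)$ and $X_k$ equals $k$ plus the first $k$ gaps.

```python
from fractions import Fraction
from itertools import combinations


def check_theorem3(N: int, K: int) -> bool:
    """Enumerate all K-subsets of 1..N and compare E[X_k] with k(N+1)/(K+1)."""
    subsets = list(combinations(range(1, N + 1), K))
    for k in range(1, K + 1):
        mean = Fraction(sum(s[k - 1] for s in subsets), len(subsets))
        if mean != Fraction(k * (N + 1), K + 1):
            return False
    return True


def mean_gaps(N: int, K: int) -> list[Fraction]:
    """Average number of unchosen balls in each of the K+1 gaps around the chosen balls."""
    subsets = list(combinations(range(1, N + 1), K))
    totals = [0] * (K + 1)
    for s in subsets:
        points = (0,) + s + (N + 1,)  # sentinels at 0 and N+1 bound the outer gaps
        for i in range(K + 1):
            totals[i] += points[i + 1] - points[i] - 1
    return [Fraction(t, len(subsets)) for t in totals]


if __name__ == "__main__":
    print(all(check_theorem3(N, K) for N in range(1, 12) for K in range(1, N + 1)))
    print(mean_gaps(10, 3))  # each gap averages (N-K)/(K+1) = 7/4 here
```

Enumeration grows like $\binom{N}{K}$, so this is only a sanity check for small cases; the proof is what covers lottery-sized parameters.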
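Corollary 4
$$\sum_{n=k}^{N-K+k} n\binom{n-1}{k-1}\binom{N-n}{K-k} = \frac{k(N+1)}{K+1}\binom{N}{K} = k\binom{N+1}{K+1}$$
for $k = 1, 2, \dots, K$.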
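Equating the sum in Corollary 2 with the closed form of Theorem 3 and multiplying by $\binom{N}{K}$ yields the integer identity $\sum_{n=k}^{N-K+k} n\binom{n-1}{k-1}\binom{N-n}{K-k} = k\binom{N+1}{K+1}$, using $\frac{N+1}{K+1}\binom{N}{K} = \binom{N+1}{K+1}$. The short sketch below (hypothetical helper name, standard `math.comb` only) checks it in exact integer arithmetic over a range of parameters.

```python
from math import comb


def identity_holds(N: int, K: int, k: int) -> bool:
    """Check sum_n n*C(n-1,k-1)*C(N-n,K-k) == k*C(N+1,K+1) using exact integers."""
    lhs = sum(
        n * comb(n - 1, k - 1) * comb(N - n, K - k)
        for n in range(k, N - K + k + 1)
    )
    return lhs == k * comb(N + 1, K + 1)


if __name__ == "__main__":
    ok = all(
        identity_holds(N, K, k)
        for N in range(1, 40)
        for K in range(1, N + 1)
        for k in range(1, K + 1)
    )
    print(ok)
```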
3. Conclusion
It is important to note that while statistical analysis can provide insights into patterns and frequencies, lottery drawings are random, and there is no guaranteed method to predict future winning numbers. These methods should be used for informational purposes and to assist in making informed choices, but chance always remains the dominant element in lottery games. Moreover, lottery systems are complex, involving factors such as the ball machines, the condition of the balls, the number selection methods, and multiple games within a lottery, and it can be challenging to capture all of these variables accurately in a statistical model. Finally, lottery games are games of chance, and the odds of winning are typically very low. It is essential to approach playing the lottery with the understanding that it is purely for entertainment.