A New Approach for the DFT NIST Test Applicable for Non-Stationary Input Sequences ()
1. Introduction
In this paper, the problem of estimating the probability of signal randomness degree is addressed, which is widespread used in cryptography applications [1] [2]. The key component of the encryption process is the Random Number Generator (RNG) [3] [4], that is a physical apparatus or a software algorithm, that generates a sequence of signs (numbers) without any visible deterministic pattern or order. A True Random Number Generator (TRNG) is an apparatus that is based on a non-deterministic entropy source while a Pseudo Random Number Generator (PRNG) is based on software or hardware [5]. The quality of an RNG is not always perfect due to the fact that physical generators (TRNG) are sensitive to external influences [6] such as temperature [7], power supply, obsolescence, may suffer from a slow and persistent failure that is not noticeable or may suffer from a sharp decline in quality [8]. In addition, generators based on software (PRNG) are defective since they are based on deterministic algorithms that generate the numbers by some formula and some initial value [9] [10].
The series obtained from the RNG should be checked if this is not a lacking random series [11]. There is a great variety of statistical tests for checking the quality of a random generator [12] [13] [14]. These tests are not meant to prove mathematically that the sequence that the generator produces is random, but rather to find out if there are any vulnerabilities in the generator that may indicate poor randomness [15]. The tests on the input sequence (a sequence of bits of ones and zeros), determine whether the tested sequence contains certain predictable characteristics in a truly random sequence. Such a test result is not deterministic but probabilistic. For example, a truly random sequence should contain zero and one bits approximately equally [14]. If the tested sequence does not meet this requirement, the generator that created the sequence fails experimentally. Conversely, if the one-bit number and zero equals a certain predetermined divergence rate, it can be said that the random sequence was “accepted”. The term was “accepted” means that the claim of randomness is not rejected [16]. Even if the sequence successfully undergoes a given statistical test, it is not an absolute proof of its randomness but more than a probabilistic confirmation [17].
The NIST tests [18] are very popular [19] [20]. However, these tests (NIST) have several problems [21] [22] [23] [24] and as stated in [25], the NIST tests [18] require that the behavior of the tested sequence shall be stationary. But, according to [25], it is not rare for entropy sources (TRNGs) to generate time-varying data, since entropy sources usually depend on physical phenomena (e.g. thermal noise) and some of them are fragile and sensitive to external factors (e.g. temperature), which means the distribution of their outputs or the dependency among the outputs may change with external factors. In addition, much software based TRNGs use the computer’s workload as their entropy sources, which are also variable [25].
The NIST tests [18] contain fifteen tests, based on probabilistic methods that examine the statistical independency between the bits in the tested input sequence. The result is a probability value for the “degree of randomness”. This value is called “P value” [18].
In this paper, we deal with non-stationary sequences for which test number six (the DFT test from the NIST document [18] ) is not suitable. Although the non-overlapping template matching test and the overlapping template matching test (test number seven and eight of the NIST document [18] ) are designed to compensated for the inability of the DFT test to test non-stationary input sequences, still, inaccurate results are obtained for the non-stationary input sequence case. Thus, the NIST tests [18], do not supply satisfying results for non- stationary input sequences.
The various tests from NIST [18] supply us a probabilistic confirmation that the tested sequence is random. But it does not give us the location of the periodic part that turns the whole sequence to be non-random. Thus, in cases where the tested sequence failed to be random, the whole sequence is disqualified even if only a short part of it, caused it.
In this paper, we propose a time-frequency approach, that analysis the input sequence in the frequency and time domain in parallel, by using innovative functions such as the Wigner Distribution [26] [27] and the GGD [28] [29]. This new approach solves two open issues:
1) For the first time, the ability to test non-stationary sequences is possible.
2) For the first time, it is possible to locate the periodic section in the tested input sequence.
This approach offers a test that can replace the DFT test in the NIST document [18] and also may be a replacement for test number seven and eight from the NIST tests [18]. The proposed algorithm (test) also finds the location of the periodic part in the tested sequence if it is present. Thus, the periodic part can be cut out from the tested sequence or it can be maybe corrected. Obviously, the corrected and shortened sequences have to be tested again for randomness. The whole issue of how correcting the sequence will not be discussed in this paper. Simulation results will confirm the effectiveness of the proposed approach for non-stationary sequences compared to tests six, seven and eight from the NIST tests [18].
This paper is organized as follows. In Section 2, we explain the motivation of using the Wigner function in estimating the probability of signal randomness degree. In Section 3, we present the various parameters used for the Wigner test which will be shown in Section 4 to be part of the whole proposed algorithm. In Section 4, we present the structure of the whole algorithm, by using both the Wigner and the GGD functions. In Section 5, we test our new proposed algorithm via simulation and compare the results to those obtained from the NIST (test six, seven and eight [18] ). Finally, the conclusion is given in Section 6.
2. Motivation
The Wigner function [27] is the Fourier transform for the Autocorrelation function which allows us to get the spectrum as a function of time. In this way we can track the locations of cyclic parts in the tested sequence, by detecting frequency deviations at certain time points. The result of the Wigner function [27] is a matrix, where the lines represent the time domain, the columns represent the frequency domain.
The absolute output values from the Wigner function [27] can be shown in a 3-D image which shows the tested signal passed via the Wigner function [27] as a function of frequency and time. In Figure 1, the absolute output values from the Wigner function [27] are shown in a 3-D image, where the input signal to the Wigner function [27] is a cosine signal with a frequency of 10 [MHz]. Please note, that in the time domain the signal at the specified frequency of 10 [MHz] is present throughout the time being tested, while in the frequency domain it has a single frequency. Figure 2 shows the absolute output values from the Wigner function [27] in a 3-D image, where the input signal to the Wigner function [27] is a combination of three cosine signals with frequencies of 3, 7 and 10 [MHz]. Please note that according to Figure 2 we have more than only three frequencies due to the cross-correlation effect between the input frequencies. Thus, for signals containing multiple frequencies, the Wigner function [27] creates a difficulty in identifying the frequencies of the input signal. Thus, making a difficulty in identifying the right image (frequency).
The random signal has no specific frequency, otherwise, the signal is not random. In fact, the random signal behaves like the assembly of multiple signals, which causes some difficulties in identifying processes within the signal. Figure 3 shows a 3-D image obtained from the absolute output values from the Wigner function [27] for a random input sequence.
Figure 1. Mesh—one cosine signal, signal frequency:
, Sampling Frequency:
,
. Length of the signal: 1024 samples
.
Figure 2. Mesh—two cosine signal, signal frequency:
, sampling frequency:
, length of the signal: 1024 samples.
Figure 3. Mesh—random signal, length of the signal: 1024 samples.
Our new method should respond to changes in the time domain, where the existing methods (tests) (NIST test [18] ) fail. Thus, we test our method with non- stationary random signal. Non-stationary signals can be artificially assembled by using a number of random signals, as shown in Figure 4. where the “cyclic” part is also a random signal that is repeated during the signal.
Figure 5 shows two images obtained from the absolute output values from the Wigner function [27] where the left image was obtained for a random input signal while the right image was obtained for a non-random input signal having periodic parts in the signal. According to Figure 5, the signal intensity (z-axis) for the non-random input case, is approximately double the size compared with the signal intensity for the random input case. This shows that the values in the matrix obtained by the Wigner function [27] may be helpful in conjunction with some predefined parameters for classifying the probability of signal randomness degree.
Figure 4. Non-stationary signal structure.
Figure 5. Comparison of 2 tested sequences. Left: non-random sequence, right: random sequence.
With the help of the Wigner function [27] we can detect changes in the time plane. In order to see this, please refer to Figure 6, in which we show three different cases where the periodic part is placed in the tested sequence at different places: at the beginning, at the center and at the end of the tested sequence. Please note that in Figure 6 the y-axis shows the intensity of the signal. According to Figure 6, the location of the periodic segment can be detected via the matrix obtained from the output of the Wigner function [27], by tracing the changes in the intensity (y-axis in Figure 6).
3. The Wigner Test
In the previous section, we have mentioned that the output values from the matrix obtained from the Wigner function [27] may be helpful in conjunction with some predefined parameters for classifying the probability of signal randomness degree. In the following, we normalized the absolute values from the matrix obtained by the Wigner function [27] by a predefined parameter that depends on the input sequence length. For a sequence length of 1024, 2048 and 4096 the predefined parameter is set to 350, 550 and 850 respectively. These parameters were determined according to signal intensity (z-axis) expected to be obtained from a random sequence, so that a division of this value normalizes the matrix in such a way that it is possible to detect increases in the signal intensity, when the sequence is not random. The value was obtained by performing simulation experiments on 20 random sequences.
In the following we present four parameters (“Power”, “Distance”, “Density” and “Highest”) with their range of values for which the input sequence is declared as a random sequence. Please note, in order to declare that the sequence is a random sequence, the four parameters (“Power”, “Distance”, “Density” and “Highest”) must be in the range for which the sequence is declared as a random
Figure 6. Three tested sequences with a periodic section—a look at the time plane, the location of a periodic segment at the beginning of the sequence (left side), middle (center) and end (right side).
sequence. Please note that in the following we mean by “samples” the absolute values from the matrix obtained from the Wigner function [27]. It should be pointed out that all the values or range of values of the listed parameters from below, were found by simulation experiments only.
The parameters are:
1) Power—Describes the height of the samples. To obtain a normalized range— the samples are divided by their maximum obtained value. For the random signal case, the received range must be between: Power = 0.7 - 0.8.
2) Distance—Describes the “distance” and is equal to the number of samples between two samples with “Power” 0.8. To obtain a normalized range—the received value is divided by the number of samples of the tested sequence. In the random signal case, the received “Distance” is: Distance = 0.1 - 0.3.
3) Density—Describes the ratio of the number of samples above the “Power” of 0.75 in the central part of the matrix to the numbers of samples above the “Power” of 0.75 outside the central part of the matrix. For a random signal, a high ratio is being received, meaning that most of the high samples are in the centre of the signal. In order to declare the input sequence as a random sequence, the received “Density” should be: Density > 0.87.
4) Highest—Describes the number of samples above the “Power” of 0.9. In a random signal the received number is Highest < 15.
Simulation Wigner Test
Table 1 shows some simulation results using the Wigner test (with the above four parameters test). For this purpose, eight sequences were applied with different number of cycles, with different cycle length and segment position of the cyclic part. Each structure was tested with sequence length of 1024, 2048 and 4096. In the following, the highlighted red section is the periodic part.
The sequences structure:
For sequence length of 1024 samples:
Sequence number 1—8 × 128 (8 cycles of 128 bits sequence),
Table 1. Result of the Wigner function.
Sequence number 2—16 × 64 (16 cycles of 64 bits sequence),
Sequence number 3—768 + 32 × 8 (periodic part in the end of the sequence),
Sequence number 4—32 × 8 + 768 (periodic part in the start of the sequence),
Sequence number 5—384 + 32 × 8 + 384 (periodic part in the middle of sequence),
Sequence number 6—128 + 320 + 128 + 320 + 128,
Sequence number 7—64 + 128 + 64 + 128 + 64 + 128 + 64 + 128 + 64 + 128 + 64,
Sequence number 8—32 × 2 + 128 + 32 × 2 + 128 + 32 × 2 + 128 + 32 × 2 + 128 + 32 × 2 + 128 + 32 × 2.
For sequence length of 2048 samples:
Sequence number 9—8 × 256 (8 cycles of 256 bits sequence),
Sequence number 10—16 × 128 (16 cycles of 128 bits sequence),
Sequence number 11—1536 + 64 × 8 (periodic part in the end of the sequence),
Sequence number 12—64 × 8 + 1536 (periodic part in the start of the sequence),
Sequence number 13—768 + 64 × 8 + 768 (periodic part in the middle of sequence),
Sequence number 14—256 + 640 + 256 + 640 + 256,
Sequence number 15—128 + 256 + 128 + 256 + 128 + 256 + 128 + 256 + 128 + 256 + 128,
Sequence number 16—64 × 2 + 256 + 64 × 2 + 256 + 64 × 2 + 256 + 64 × 2 + 256 + 64 × 2 + 256 + 64 × 2.
For sequence length of 4096 samples:
Sequence number 17—8 × 512 (8 cycles of 512 bits sequence),
Sequence number 18—16 × 256 (16 cycles of 256 bits sequence),
Sequence number 19—3072 + 128 × 8 (periodic part in the end of the sequence),
Sequence number 20—128 × 8 + 3072 (periodic part in the start of the sequence),
Sequence number 21—1536 + 128 × 8 + 1536 (periodic part in the middle of sequence),
Sequence number 22—512 + 1280 + 512 + 1280 + 512,
Sequence number 23—256 + 512 + 256 + 512 + 256 + 512 + 256 + 512 + 256 + 512 + 256,
Sequence number 24—128 × 2 + 512 + 128 × 2 + 512 + 128 × 2 + 512 + 128 × 2 + 512 + 128 × 2 + 512 + 128 × 2
Simulation results:
Please note, in Table 1 the red colored values are the indices that classify the sequence as non-random. The sequence is considered as random only when the four parameters (“Power”, “Distance”, “Density” and “Highest”) meet the criteria of the random sequence.
According to Table 1, the four parameters (“Power”, “Distance”, “Density” and “Highest”) are effective in classifying if the given sequence is a random sequence or not. But it should be pointed out here that the good results were obtained only for those sequences that have a high periodicity within the sequence or have a long cyclic segment. In addition, with the four parameters (“Power”, “Distance”, “Density” and “Highest”) we are not able to locate the cyclic section in the tested sequence. Thus, as will be explained in the next section we need the GGD [28] [29].
4. The Proposed Algorithm
The general structure of the algorithm can be seen in the block diagram presented in Figure 7. In general, the tested sequence is a vector containing only ones and minus ones in the digital time domain. By using the Wigner function [27] this vector is converted to a quadratic matrix which represents the sequence behavior in the time and frequency domains. This matrix allows us to decide whether the tested sequence has a high periodicity within the sequence and should be announced as a non-random sequence, or at this stage, no periodicity was detected within the sequence, thus we should move on to the next step of the algorithm. To detect changes over the time domain, the GGD [28] [29] is being used as will be explained later on in this section.
According to Figure 7,
is the sampled output from the RNG with sample rate of
, where
and T is the sampling period
. Next, this signal
is converted to an analytic signal with the help of the Hilbert transform [30]. Thus now, the analytical signal
is driven to the Wigner function [27]. Please note, the input to the Wigner function [27] must be an analytical signal to avoid the aliasing problem with the Wigner transform [27] for non-analytical signals. The output from the Wigner function [27] is
and is given by:
(1)
where
, f stands for the frequency,
is the Fast Fourier Transform on
and
is the Autocorrelation of
given by:
(2)
where “*” is the multiplication operation,
is the conjugate part of
and
stands for the expectation operation.
At this point, the decision part of Figure 7 which receives the output from the Wigner function [27] is used to decide whether the incoming sequence has a high periodicity within the sequence or whether the repetitive sequence has a high length. The purpose of this examination is to determine whether the sequence can be disqualified for being a random sequence or can be moved on to the next step of the algorithm. In other words, if the algorithm does not detect a high periodicity within the sequence or a high length of a repetitive sequence at this point, we continue to the next step in the algorithm which is the GGD function [28] [29]. This step is based on examining the changes in the shape parameter associated with the GGD function [28] [29].
Changes in the shape parameter of the GGD [28] [29] presentation, change the shape of the probability density function (pdf) of the given input sequence to the GDD [28] [29] which may have a Laplacian, or double exponential distribution, a Gaussian distribution or a uniform distribution for a shape parameter equal to one, two and infinity respectively. Obviously, changes in the pdf of the tested input sequence to the GDD [28] [29], indicates that the input sequence is not random. The shape parameter allows us to characterize the absolute values from the matrix obtained from the Wigner function [27], so that each row or column in the matrix can be expressed by a single value. Thus, a vector that characterizes the time or the frequency domain through the shape parameter can be obtained. In our algorithm, the time domain is in our interest, therefore we calculate the shape parameter on each column (representing the frequency domain) in the matrix. The obtained vector will be denoted as
(Figure 8).
According to the GGD function [28] [29], the shape parameter is given by:
(3)
where
According to [28] [29],
can be calculated by:
(4)
where
is the i-th column (representing the frequency domain) in the matrix
.
Based on Equation (3) and Equation (4) the vector
can be defined:
where N is the length of the tested input sequence.
Figure 8. Signal flow.
is the i-th column (representing the frequency domain) in the matrix.
The vector
holds all the changes in the time domain of the shape parameter. An example of the shape parameter vector
can be seen in Figure 9 for a random input sequence of length 1024.
To simplify the shape parameter signal classification, an average operation is carried out on
by averaging every eight following samples of
for an input sequence with length of 1024. In generally, the number of samples for the average operation is 8, 16 and 32 for an input sequence with length of 1024, 2048 and 4096 respectively. The averaged signal
from Figure 9 is presented in Figure 10. Please note, in order to maintain the same vector length, each of these eight samples (for an input sequence with length of 1024) received the averaged value obtained by the averaging operation.
Figure 9. The vector G[n]. n represents the sample number.
Figure 10. Average of G[n] (an average of 8 samples).
In the following we define:
(5)
(6)
According to Equation (4), Equation (5) and Equation (6) we can define:
(7)
(8)
Therefore, the vectors
and
can be defined:
(9)
(10)
After the average operation on
and
, we get the vectors
and
respectively. In the following we set q equal to 1, 2, 4 for an input sequence length of 1024, 2048 and 4096 respectively.
For simplicity we show in the following
and
for
where
. Please refer to the Appendix for the definition of
and
for q equal to two and four.
Detection Algorithm
The signal
is the output from the detection algorithm (Figure 7), that classifies whether the input sequence is a random sequence or not, based on passing or not passing seven test cases that will be presented in this section. In addition, this algorithm marks the places that are suspicious of having a repeated sequence within the tested sequence. Thus, there are two steps in the detection algorithm:
1) Determine whether the sequence is random or not random.
2) Finding the periodic part in the tested input sequence if it exists.
Let us start explaining the first step. Here we apply the histogram function (from Matlab version R2018a) on the vectors
and
(obtained from the tested input sequence) in order to have the distribution of the appearance of the values of the shape parameter. In addition, for comparison, we apply also the histogram function on the vectors
and
obtained from a random sequence. In order to generate this random sequence, the “randn” function from the Matlab version R2018a is applied which generates “random” numbers from a Gaussian distribution with variance of one and with a zero mean. Next, if the generated number is positive or equal to zero, then the used number is set to one, else it is set to zero. Please note that for some tests in the NIST test [18] a sequence of ones and zeros are required. Thus, our generated sequence is converted to ones and zeros following [18]. It should be pointed out that the various parameters for the following seven test cases were found by simulation experiments only. In the following, we use thirty bins for representing the histogram. But those bins that have a height less than 0.01 will be deleted and not considered in the following tests. In addition, after the deletion of those bins that have a height of less than 0.01, the bins that are not close to the center bins are also deleted. In order to declare a sequence to be a random sequence, the seven test cases have to be passed.
Please note that in Figures 11-17 the x-axis represents the various values of the shape parameter associated with the tested input sequence, while the y-axis represents the probability density of their appearance. As already was mentioned earlier in this paper, the shape parameter allows us to characterize the absolute values from the matrix obtained from the Wigner function [27], so that each row or column in the matrix can be expressed by a single value. Thus, a vector that characterizes the time or the frequency domain through the shape parameter can be obtained.
In the following, we mean by saying that the test is performed in the time domain that the shape parameter was calculated on each column (representing the frequency domain) in the matrix. In addition, if it is stated that the test is performed in the frequency domain, it means that the shape parameter was calculated on each row (representing the time domain) in the matrix.
The seven test cases are:
Test 1—Checks, whether the maximum height is at the center of the graph and if the width of the graph is smaller than 0.2. This test is performed in the time domain. If the conditions are not met—the signal will be marked as a non- random sequence. An example of this test can be seen in Figure 11, where at the left figure we have the result for a random sequence while at the right figure we have the result for a non-random input sequence. According to Figure 11, the width of the graph in the right figure is higher than 0.2 and the maximum height is not at the center of the graph. Thus, the tested input sequence is suspected to be a non-random sequence. Please note, that we have marked all the bins that were already deleted in the early stage of our test.
Test 2—Checks if there is more than one peak in the graph. It looks for the two highest bins in the graph and declares that the tested input sequence is suspected to be a non-random sequence if at least there is one bin between the two highest found bins, and the difference between the heights of those two highest bins is less than 15%. This test is performed in the time domain. An example for this test can be seen in Figure 12 where at the right figure we have the result for a non-random input sequence.
Figure 11. Example for test 1. The histogram in the left is for a random sequence and on the right is for a non-random sequence.
Figure 12. Example for test 2. The histogram in the left is for a random sequence and on the right is for a non-random sequence.
Figure 13. Example for test 3. The histogram in the left is for a random sequence and on the right is for a non-random sequence.
Figure 14. Example for test 4. The histogram in the left is for a random sequence and on the right is for a non-random sequence.
Figure 15. Example for test 5. The histogram in the left is for a random sequence and on the right is for a non-random sequence.
Figure 16. Example to test 6, the histogram in the left is random sequence and on the right is non-random sequence.
Figure 17. Example to test 7. The histogram in the left is for a random sequence and on the right is for a non-random sequence.
Test 3—Checks whether the maximum height is obtained in the x-axis between 1.29 and 1.4. If it is outside of this boundary, the sequence is marked as a non-random sequence. This test performed in the time domain. An example of this test can be seen in Figure 13 where on the right figure we have the result for a non-random sequence where the maximum height is not obtained in the x-axis between 1.29 and 1.4.
Test 4—Checks whether the maximum height is above 0.25. If it is, the tested input sequence is marked as a non-random sequence. This test is performed in the frequency domain. The main idea behind this test is that if there is a concentration around a certain frequency, it indicates that the tested sequence is suspected to have a periodicity within the sequence. An example of this test can be seen in Figure 14, where we have at the right figure the result for a non-random input sequence where the maximum bin is more than 0.25.
Test 5—Checks the position of the maximum height of the bin between the time domain and the frequency domain. If there is no match, this indicates that the tested input sequence is a non-random sequence. An example for this test can be seen in Figure 15, where we have at the right figures the result for a non-random input sequence where there is no match between the position of the maximum height of the bin between the time domain and the frequency domain.
Test 6—This test checks two conditions concerning the heights of the bins. First, it checks if a bin can be found with a height lower than 0.05. Next, it checks whether a bin can be found with a height above 0.2. If no bin is left with a height lower than 0.05 and a bin is found with a height above 0.2, the sequence is announced as a non-random sequence. This test is performed in the time domain. An example for this test can be seen in Figure 16, where we have at the right figure the result for a non-random input sequence where we have marked all the bins that were already deleted before entering into this test. Now, according to Figure 16, we have a bin with a height above 0.2 and there is no bin with a height lower than 0.05. Thus, the result presented in the right figure of Figure 16, corresponds to a non-random sequence. Please note, that we have marked in Figure 16 all the bins that were already deleted in the early stage of our test.
Test 7—The pdf should increase monotonically towards the center bin (or to the bin with the maximum height) of the pdf and decrease monotonically from the center bin (or from the bin with the maximum height) downwards. If there is a 40% decrease in the height of a bin with the bin before it, the tested sequence is announced as a non-random sequence. An example for this test can be seen in Figure 17, where we have at the right figure the result for a non-random input sequence where we have circled the places where the decrease in the height of 40% was found. Please note, that we have marked all the bins that were already deleted in the early stage of our test.
Next, we determine the mathematical description of the above seven tests.
First, we define:
where
and
holds the number of observations of
and
respectively that fall into each of the disjoint bins divided by the length of
and
respectively, and L is the total number of bins.
where
holds the value in the x-axis of the edge where the bin begins.
where
holds the value in the x-axis of the edge where the bin begins. It was already mentioned earlier that those bins that have a height less than 0.01 will be deleted and not considered in the seven tests. In addition, after the deletion of those bins that have a height of less than 0.01, the bins that are not close to the center bins are also deleted. Thus, from the mathematical point of view we may write:
Step 1: find the index or indexes where (
) is true and set those
indexes to
.
Step 2: set
; for
according to the founded
indexes from Step 1.
Step 3: find
and set the index i where the maximum
was achieved as
Step 4:
;
for
to (
) checking if there is a bin with
in order to
set “a” as the start of the histogram after this bin.
if
then
end
end
Step 5:
;
;
while (
) checking if there is a bin with
in order to
set “b” as the end of the histogram before this bin.
if (
) then
end
if (
) then
end
end
Test number 1:
Step 1: find
and set the index
where the
maximum was achieved as
.
Step 2: if (
&
) then
Random signal
else
Not Random signal
end
Test number 2:
Step 1: find
and set the index
where the
maximum was achieved as
.
Step 2: find
and set the index i
(
) where the maximum was achieved as
.
Step 3: if
& (
) then
Not Random signal
else
Random signal
end
Test number 3:
Step 1: find
and set the index i
(
) where the maximum was achieved as
.
Step 2: if
then
Random signal
else
Not Random signal
end
Test number 4:
Step 1: if
then
Not Random signal
else
Random signal
end
Test number 5:
Step 1: find
and set the index i
(
) where the maximum was achieved as
.
Step 2: find
and set the index i
(
) where the maximum was achieved as
.
Step 3: if
then
Random signal
else
Not Random signal
end
Test number 6:
Step 1: if (
) & (
)
then
Not Random signal
else
Random signal
end
Test number 7:
Step 1: find
and set the index i
(
) where the maximum was achieved as
.
Step 2:
Random signal
for
to
checking if there is an increase monotonically
towards the center bin
if (
) & (
) then
Not Random signal
end
end
for
to b checking if there is a decrease monotonically
from the center bin down
if (
) & (
) then
Not Random signal
end
end
Outcome from the seven tests:
if
then
the tested sequence is Random
else
the tested sequence is not Random
end
Please note that when
we continue to the next step in the algorithm of locating the periodic part.
The second step in the detection algorithm is to find the periodic part in the tested input sequence. For this purpose, we apply three different kind of averages on the tested averaged vector
(
). This part of the algorithm is performed only on
which represents the time domain, because in this stage we want to locate the periodic part. The three different kind of averages are as follows:
1) The average of the whole vector
(except for some samples at the beginning and at the end) is carried out here. Thus, one average value is obtained. In the following, we denote this kind of averaging as “Average number 1”.
2) The vector
(except for some samples at the beginning and at the end) is divided into 3 parts (40% - 20% - 40%) and then the average operation is carried out on each part. Thus, here we have three average values. In the following, we denote this kind of averaging as “Average number 2”.
3) The vector
(except for some samples at the beginning and at the end) is divided into 3 parts (25% - 50% - 25%) and then the average operation is carried out on each part. Thus, here we have three average values. In the following, we denote this kind of averaging as “Average number 3”.
Figure 18 presents the three different kind of averages carried out on the tested averaged vector
(
).
where by “Margin” we mean that at the beginning and at the end of the averaged tested
no averaging operation is carried out. For the case of a 1024, 2048 and 4096 length sequence, the margin on each side is 32 samples, 64 samples and 128 samples respectively.
Now, in order to declare an area to be suspicious of not being a random sequence, we compare the values of the averaged vector
to the values of Average number 1, Average number 2 and to Average number 3. In general, if the values of the averaged vector
are above or below the average number (Average number
, Average number
, Average number
where
,
and
are predefined values) for some tested values that come in sequence, a suspicious place is declared. For more details, please refer to Figures 19-21 which describe in more details the comparison operation between the values of the averaged vector
and the Average number 1 to Average number 3.
Figures 22-25 show the simulated results obtained for a non-random sequence with length of 1024 where the averaged vector
(
) was compared to the three different kind of averages (Average number 1, Average number 2, Average number 3) while Figure 26 holds the total result (collected from Figure 22, Figure 23, Figure 25) of the suspicious places where the tested sequence is suspicious not to be a random sequence.
In the following, we describe the flowchart for finding the suspicious places of the periodic parts in the tested input sequence by using the values for Average number 1, Average number 2 and Average number 3.
Figure 18. 3 kinds of averages are performed on the sequence.
Figure 19. Matlab code, associated with Average number 1.
Figure 20. Matlab code, associated with Average number 2.
Figure 21. Matlab code, associated with Average number 3.
Figure 24. Zoomed in version of Figure 23, please note that we have two different levels at
. This cannot be seen in Figure 23 because the two levels are very close.
Figure 26. The result after collecting the three images from Figure 22, Figure 23, Figure 25.
Comparison to Average number 1:
Step 1: set
.
“edge” is the length of the margin defined as 32, 64 and 128 for a 1024,
2048 and 4096 length of input sequence respectively.
The input sequence length is defined as N.
Step 2: set Average number 1 =
Step 3: set
where
is the rounding down operation on
.
Step 4:find the index or indexes where (
)
is true and set those indexes to
Step 5: set
;
for
according to the founded indexes of Step 4.
Step 6: find the number of groups in
of consecutive samples
equal to one and set this number to K1.
Step 7: find the length of those groups of consecutive samples equal to one
from Step 6 and hold them in the vector:
; for
Step 8: find the start and end position of the founded groups from
Step 6 and hold them in the matrix:
; for
;
Step 9: set
A1 = 20, 40 and 80 for 1024, 2048 and 4096 length of sequence respectively.
for
to K1
if
then
for
to
end
end
end
Step 10: set
Step 11: find the index or indexes where (
)
is true and set those indexes to
Step 12: set
;
for
according to the
founded indexes of Step 11.
Step 13: find the number of groups in
of consecutive samples
equal to one and set this number to K2.
Step 14: find the length of those groups of consecutive samples equal to one
from Step 13 and hold them in the vector:
; for
Step 15: find the start and end position of the founded groups from
Step 13 and hold them in the matrix:
; for
;
Step 16: set
A2 = 33, 56 and 95 for 1024, 2048 and 4096 length of sequence respectively.
for
to K2
if
then
for
to
end
end
end
Step 17: set
.
Step 18: find the index or indexes where (
)
is true and set those indexes to
Step 19: set
;
for
according to the founded indexes of Step 18.
Step 20: find the number of groups in
of consecutive samples
equal to one and set this number to K3.
Step 21: find the length of those groups of consecutive samples equal to one
from Step 20 and hold them in the vector:
; for
Step 22: find the start and end position of the founded groups from
Step 20 and hold them in the matrix:
; for
;
Step 23: set
for
to K3
if
then
for
to
end
end
end
Comparison to Average number 2:
Step 1: set
Step 2: set
Step 3: set
Step 4: set
Step 5: find the index or indexes where
(
) is true and set those
indexes to
Step 6: set
;
for
according to the founded indexes of Step 5.
Step 7: find the number of groups in
of consecutive samples
equal to one and set this number to K4.
Step 8: find the length of those groups of consecutive samples equal to one
from Step 7 and hold them in the vector:
; for
Step 9: find the start and end position of the founded groups from
Step 7 and hold them in the matrix:
; for
;
Step 10: set
A3 = 17, 48 and 129 for 1024, 2048 and 4096 length of sequence respectively.
for
to K4
if
then
for
to
end
end
end
Step 11: set
Step 12: find the index or indexes where
(
) is true and set those indexes to
Step 13: set
;
for
according to the founded indexes of Step 12.
Step 14: find the number of groups in
of consecutive samples
equal to one and set this number to K5.
Step 15: find the length of those groups of consecutive samples equal to one
from Step 14 and hold them in the vector:
; for
Step 16: find the start and end position of the founded groups from
Step 14 and hold them in the matrix:
; for
;
Step 17: set
for
to K5
if
then
for
to
end
end
end
Comparison to Average number 3:
Step 1: set
Step 2: set
Step 3: set
Step 4: set
Step 5: find the index or indexes where (
is true and set those indexes to
Step 6: set
;
for
according to the founded indexes of Step 5.
Step 7: find the number of groups in
of consecutive samples
equal to one and set this number to K6.
Step 8: find the length of those groups of consecutive samples equal to one
from Step 7 and hold them in the vector:
; for
Step 9: find the start and end position of the founded groups from
Step 7 and hold them in the matrix:
; for
;
Step 10: set
for
to K6
if
then
for
to
end
end
end
Step 11: set
Step 12: find the index or indexes where (
is true and set those indexes to
Step 13: set
;
for
according to the founded indexes of Step 12.
Step 14: find the number of groups in
of consecutive samples
equal to one and set this number to K7.
Step 15: find the length of those groups of consecutive samples equal to one
from Step 14 and hold them in the vector:
; for
Step 16: find the start and end position of the founded groups from
Step 14 and hold them in the matrix:
; for
;
Step 17: set
for
to K7
if
then
for
to
end
end
end
Step 18: set
Step 19: find the index or indexes where
(
is true and set those indexes to
Step 20: set
;
for
according to the founded indexes of Step 19.
Step 21: find the number of groups in
of consecutive samples
equal to one and set this number to K8.
Step 22: find the length of those groups of consecutive samples equal to one
from Step 21 and hold them in the vector:
; for
Step 23: find the start and end position of the founded groups from
Step 21 and hold them in the matrix:
; for
;
Step 24:
A4 = 9, 15 and 63 for 1024, 2048 and 4096 length of sequence respectively.
for
to K8
if
then
for
to
end
end
end
Fine tuning on the above obtained results:
The algorithm has also false alarms at the stage of marking the suspected parts as periodic. Therefore, there is a mechanism which deletes some of the false alarms which is described in the following.
Step 1: if (
) & (
) then
end
Step 2: if (
) & (
) then
end
Step 3: if (
) & (
) then
end
Step 4: if (
) & (
) then
end
Step 5: if (
) or (
) then
end
The last stage is to connect all the vectors with the suspicious locations as a periodic part to one outcome vector.
Step 6: Initialize the following vectors with zeros of length N:
Step 7: Arrange the values in the vectors
to their position in the original vector
by using the vectors from
Step 6.
;
for
;
for
;
for
;
for
;
for
;
for
;
for
;
for
Step 8: Obtain one output vector which holds the suspicious places in
.
5. Simulation
Wigner Test Vs DFT NIST Test
In this section we first show the simulation results (Table 2) using the Wigner test (the test with the four parameters “Power”, “Distance”, “Density” and “Highest”) for identifying if the tested input sequence is a random sequence or not, compared to those obtained with the DFT test from NIST [18]. For this purpose, eight sequences were applied with different number of cycles, with different cycle length and segment position of the cyclic part in the tested sequence. Each structure was tested with a sequence length of 1024, 2048 and 4096 (please refer to Section III where we have already presented the results for the Wigner test and where the tested sequences were defined). According to Table 2, the DFT test from NIST [18], does not detect in most cases that the non-stationary input sequence is not a random sequence, while the Wigner test (the test with the four parameters “Power”, “Distance”, “Density” and “Highest”) supplies in most cases the right answer. Please note that in Table 2 the red colored values are the indices that classify the sequence as non-random.
New Test Vs NIST Tests
It should be pointed out that the tested input sequences used for generating the results in Table 2 are sequences with a high periodicity within the sequence or have a high length of a repetitive sequence. For this case, the Wigner test (the test with the four parameters “Power”, “Distance”, “Density” and “Highest”) is enough for declaring if the tested input sequence is random or not random. But, for sequences with a low periodicity within the sequence the GGD function [28] [29] is needed. In the following, we denote as the “New test” our new proposed method as described in the previous section and described in Figure 7 containing the Wigner and the GDD functions [27] [28] [29].
Table 3 shows a summary of results obtained by the NIST tests number 1, 2, 3, 6, 7 and 8 [18] and by the “New test” algorithm for low-periodic input sequences (2 periods in a signal) and input sequences having a short periodic segment length (100 - 170 samples of a periodic segment length out of 1024 samples). It can be clearly seen from Table 3, that in most cases the “New test” classifies the incoming sequence as non-random as it should, while the opposite is true for the NIST tests number 1, 2, 3, 6, 7 and 8 [18].
Table 2. Comparison between the DFT test to Wigner.
Table 3. Comparison of NIST test results with the “New test”.
The sequences structure:
Structure number 1: 128, 320, 128, 320, 128.
Structure number 2: 150, 170, 384, 170, 150.
Structure number 3: 100, 824, 100.
Structure number 4: 50, 924, 50.
Structure number 5: 200, 100, 324, 100, 300.
Structure number 6: 312, 100, 200, 100, 312.
(The number marked in red marks the periodic part in the sequence.)
Simulation results—Random or Non-random:
Simulation results—Estimating the Random places:
Next, we compare the simulated results obtained from our new algorithm “New test” to those obtained by the 15 tests from NIST [18] (Figures 27-30).
Figure 27. Result of a sequence from the structure 1, on the left the results from the “New test”, on the right the results from the NIST tests.
Figure 28. Result of a sequence from the structure 2, on the left the results from the “New test”, on the right the results from the NIST tests.
Figure 29. Result of a sequence from the structure 3, on the left the results from the “New test”, on the right the results from the NIST tests.
Figure 30. Result of a sequence from the structure 4, on the left the results from the “New test”, on the right the results from the NIST tests.
According to Figures 27-30, our new proposed algorithm (“New test”) finds that all the input sequences are non-random as it should in contrary to the NIST tests [18]. In addition, our new proposed algorithm (“New test”) also indicates to those places where repeated samples are found that have already appeared in the tested sequence.
Although the presented examples (Figures 27-30) refer to a sequence length of 1024 samples only, the algorithm can also be applied successfully for longer sequences such as for 2048 or 4096 samples by changing only a few parameters within the algorithm. It should be pointed out that according to [18], the assumption has been made that the size of the sequence length, is large (of the order 103 to 107) for many of the tests in [18]. For such large sample sizes of the sequence length, asymptotic reference distributions have been derived and applied to carry out the tests [18]. Most of the tests are applicable for smaller values of the sequence length [18]. However, if used for smaller values of the sequence length, the asymptotic reference distributions would be inappropriate and would need to be replaced by exact distributions that would commonly be difficult to compute [18]. As already mentioned earlier in this work, the Wigner function [27] allows us to get the spectrum as a function of time which makes it applicable for the non-stationary input sequence case. The DFT function is not suitable for the non-stationary input sequence case. Different sizes of the input sequence length do not affect the fact that the Wigner function [27] is a proper choice for the non-stationary input sequence. Thus, we just looked for the minimum input sequence length allowed according to [18] in order to have a low computation time.
6. Conclusion
In this paper, we proposed a new approach for estimating the probability of signal randomness degree based on the Wigner and GGD functions which allowed us to classify the input sequence in the time and frequency domains at the same time. Our new proposed approach is suitable also for non-stationary input sequences where the NIST tests fail. In addition, this new proposed algorithm has the ability to indicate on suspicious places of cyclic sections in the tested sequence. Thus, the option to repair or to remove the suspicious places of cyclic sections in the tested input sequence is given with this new approach which was not available until now. Simulation results have confirmed the effectiveness of our new proposed method for estimating the probability of signal randomness degree for non-stationary input sequences. It should be pointed out here that the values for the four parameters (“Power”, “Distance”, “Density” and “Highest”) were not optimized. Thus, it is possible to get even better results in identifying if the tested sequence is random or not.
Funding Statement
The authors would like to thank the Israeli Innovation Authority and Defender Cyber Technologies LTD for their support for research under the Academic Knowledge Guidance Program (Nofar) File No. 63907.
Patent Statement
A Patent Application incorporating this paper has been filed.
Appendix
The mathematical description for
and
with an input sequence length of 2048 and 4096 respectively is:
and
where
.
For
:
For
: