A First Order Stationary Branching Negative Binomial Autoregressive Model with Application ()
1. Introduction
In the area of time series modelling, several applications in real-life involve analysis of count time series data, which motivated researchers to develop models that can handle count data in term of time series. Such applications include monthly polio counts (see e.g. [1] [2]), traffic accident counts (see e.g. [3] [4]), daily asthma presentation at a hospital (see e.g. [2]), crime counts analysis (see e.g. [5] [6]), stock market application (see e.g. [7] [8]), sandstorm counts (see e.g. [9]), among others in the fields like demography, economic, meteorology, sociology, epidemiology, finance, and education. Facing such kind of data, Poisson and Binomial marginal distributions are some of the most used to develop serial dependence models. An important number of counts time series data are characterized by overdispersion, under-dispersion, equi-dispersion and excess of zeros. Handling this issues [10] discussed modelling time series of counts with overdispersion and [11] modelled an INAR (1) process for count time series with equi-dispersion, under-dispersion and over-dispersion. Beyond that, several strategical approaches of serial dependence models have been proposed including autoregressive AR (1) (see e.g. in [12]), Integer-valued autoregressive (INAR (1)) (see e.g. in [13]), Hidden Markov Model (HMM) (see e.g. in [14] [15]), Generalized Linear Model (GLM) (see e.g. in [16]), thinning-based models (see e.g. in [17]) and others in order to contribute in this area. Although many autoregressive models has been proposed, for example [18] discussed the Autoregressive Conditional Poisson (ACP) models able to handle overdispersion but also the Integer Autoregressive modeling framework is the popular approach used to preserve the discrete nature of data, which has been exploited by various authors both in a univariate and multivariate setting. Among them we have Bayesian INAR models, New Geometric INAR models, Zero truncated Poisson INAR models, Zero-modified geometric INAR, etc. [19] introduced a process of new stationary first-order integer-valued autoregressive with geometric marginal distributions based on negative binomial thinning. Conditional least squares, Yule-Walker and maximum likelihood methods are used to estimate the parameters. [20] investigated negative binomial time series models based on the binomial thinning and two other expectation thinning operations, and showed how they differ in conditional variance or heteroscedasticity. [16] discussed the approach based on generalized linear models and the class of integer autoregressive processes, which provides a natural extension to the traditional ARMA methodology. [21] proposed a new autoregressive time series of counts model with Poisson-negative binomial (PNB) distribution, which is the convolution of Poisson and Negative binomial random variables. They also introduced the Geometric PNB and the Geometric semi PNB distributions. [11] presented with Bernoulli-geometric marginals based on a new type of generalized thinning operator, a first-order non-negative integer-valued autoregressive model for stationary count data process. The maximum likelihood method is used for estimating the model parameters. Few years ago, threshold negative binomial autoregressive model considered as an observation-driven model for time series of counts which allows for overdispersion and negative serial dependence in the observation, is studied by [22]. However, in this paper, we develop a new approach motivated by an offspring generating process which in turn leads to a flexible Negative Binomial modeling approach based on the branching process which is a recursive dependence mechanism instead of thinning process, to avoid scalar multiplication in the autoregressive model. The obtained model can be able to deal with overdispersed counts time series data and its serial dependence. So far, it is applied to a real-life dataset.
The rest of the paper is organized as follows; Section 2 outlines the approach adopted in developing the stationary first order branching negative binomial (bNB) autoregressive model as well the procedure for inference, Section 3 provides the results for the simulation of stationary bNB autoregressive model, Section 4 presents the results from the application of the model to real-life data. Finally, Section 5 concludes the work then followed by acknowledgement.
2. Methods and Materials
In this section, the adopted modelling approach for this study is discussed. Specific attention is given to the negative binomial distribution which is able to capture overdispersion, a common feature in count observations. Further, the specific autoregressive mechanism for the serial dependence is also demonstrated. In particular, we consider a branching mechanism, a concept derived from the field of demography and epidemiology whereby a finite number of elements is consider as the starting stage of the process. With the passage of time, each element can either vanish with probability
or produce k new elements with probability
, where
. Each of these k elements behaves in the same way as their parents do. Let
represent the population size following n such incidents. The process
is a branching process, which is a Markov chain. Many problems in science and engineering are modelled with branching processes, including population expansion, pandemic spread, and nuclear fission are all issues that must be addressed.
Overall, the idea is adopted to propose model for stationary time series of count data and consider real data examples. The software package R is used in this work for implementing simulation study and application.
2.1. Model Specification
The classical equation for an AR (1) process
is
(1)
where
is a sequence of independent and identically distributed random variables. Thus if
is a positive random variable then
. Since the output of the Equation (1) does not guaranty the issue of non-negative integer values. Therefore, to vanish this issue, some researchers mentioned in Section 1 have used the thinning process with Equation (1) to become
(2)
The operator
is defined as
where
are i.i.d random variable following a discrete law and for each fixed non-negative integer-value x of X,
is a binomial random variable with parameters x and
. The
are i.i.d negative binomial random variable, see [12].
The negative binomial distribution
is usually parametrized with pmf
(3)
For any
there is one process
with
marginal distributions that is stationary, Markov, time-reversible, has infinitely-divisible marginal distribution and with correlation
, (see theorem in [23]).
As an alternative to thinning process (Equation (2)), we define an observation driven model
with a Negative Binomial marginal distribution which is the recursive scheme from
to
using the branching mechanism, where the Binomial law is involved to obtain model defined as follows
(4)
where
We denote the obtained model as
.
In terms of the branching process
arises from
according to the binomial law (Bin) with probability
. The dependence parameter
determines how many off springs are generated from the parent
which in turn specifies the dependence in the autoregressive process
and
.
The error component
has a Negative Binomial distribution which ensures the marginal distribution of the arising process is Negative Binomial.
The distribution of the two components is given bellow;
(5)
(6)
2.2. Model Estimation
The process,
has both marginal and dependence parameters
and
. These parameters are estimated via Maximum Likelihood Estimation.
2.2.1. Likelihood Function
Let
be an increasing set of times and
an arbitrary set of non-negative integers. Then the joint pmf for
is the product of the marginal and the conditionals.
(7)
Where
,
, and
. Using marginal and conditionals probabilities, we can explicitly define Equation (7).
Conditional or transition probability mass function (pmf)
(8)
where
and
is Gauss’ hypergeometric function with parameters
and k.
Marginal distribution
The marginal distribution for
is:
(9)
Hence, by using conditional probabilities (Equation (8)) and marginal distribution (Equation (7)) in equation (Equation (7)), the likelihood is as follow
(10)
where
and
stand for the initial values of the branching process.
2.2.2. Log-Likelihood Function
We derived Equation (11) by taking the natural logarithm of Equation (10), and it follows;
(11)
3. Simulation Study
Monte Carlo experiments have been conducted several times to evaluate the performance of the Branching Negative Binomial estimators for the parameters. The experiment entails generating n time series of size N from the models and then estimating the parameter vector
. For each combination (θ, N), we compute the mean, bias, and the mean-squared error (MSE) given by,
where
is the estimated parameter vector values from the ith simulated series.
Using the software package R, we generate by Monte Carlo replicates n = 1000 random samples of length N = 250, 500 and 1000 from bNB, then calculate the mean, bias and mean-squared error of the estimator. Table 1 shows the simulation results for the model. From that, it’s clear that with the estimation of the parameter m for a sample size of N = 250 and the performance seems to improve with increasing sample size. It’s observed that the bias and MSE reduce as the sample size increases which confirms that the parameter vector
can be consistently estimated by likelihood function. Feature work.
3.1. Simulated Data
The following, Figures 1-4 display the time series plot and their corresponding autocorrelation functions of simulated data by varying each one of the parameters. When parameters m increase, the range of the entries (observations) increases (Figure 2) while it decreases when parameter
increases (Figure 3). When parameter
increases, the dependence of data becomes strong which is shown in Figure 4. For Figures 1-3 cases, the huge difference is only the time series plot while ACF and PACF are almost the same, which implies that the autoregressive model is order one.
3.2. Simulation Results
Figures 5-8 provide graphical inspection of the results from the simulation experiment. They depict the effect of different levels of temporal dependence and sample size on the performance of the developed model. The results are consistent with those reported in Table 1. Overall, the performance of the model is better when the dependence is increased and increasing the sample size improves the performance (median estimates are closer to the true values).
4. Application to Real-Life Data
In this section, the Branching Negative Binomial model is applied to New South Wales (NSW) dataset police reports provided by the NSW Bureau of Crime Statistics and Research, see http://www.bocsar.nsw.gov.au/. The entire dataset is organized by offence type, month and Local Government Area (LGA). In particular, we study the monthly number of Domestic violence related assault reported in the city of Kempsey, Australia, over the period January 1995 to Jun 2022. Table 2 shows the summary statistics of the dataset, the sample mean and variance are 17.78 and 39.93837 respectively, which indicates that we are facing to an overdispersed data. Figure 5 depicts the time series plot and the corresponding autocorrelation functions (ACF and PACF).
Figure 9 displays the visualization of real-data considered for the application of the branching Negative Binomial autoregressive model. The time series plot
Table 1. Summary statistics for the estimator for different parameter values
and different sample sizes N. These statistics are obtained from 1000 Monte Carlo replications of the developed model.
Figure 1. Time series and autocorrelation functions plots by varying sample sizes respectively to 250, 500 and 1000.
Figure 2. Time series and autocorrelation functions plots by varying parameter m respectively to 5, 15 and 25.
Figure 3. Time series and autocorrelation functions plots by varying probability respectively to 0.3, 0.5 and 0.9.
Figure 4. Time series and autocorrelation functions plots by varying dependence parameter
respectively to 0.3, 0.5 and 0.9.
Figure 5. Boxplots with results from the simulation experiment for T = 250. The red line corresponds to the true parameter values, while the black line is the median.
Figure 6. Boxplots with results from the simulation experiment for T = 1000.
of Crime count data showed that data is stationary in time range considered. The ACF showed that there are some serial dependences between consecutive observations, and through the Partial ACF, we can see that the order of the autoregressive model is one. Therefore, the selected data matches well to this first order branching Negative Binomial autoregressive model.
By fitting the developed model to the crime count data, the estimate parameters and their standard errors obtained are displayed in Table 3.
Figure 7. Boxplots with results from the simulation experiment for T = 250.
Figure 8. Boxplots with results from the simulation experiment for T = 1000.
Table 2. Summary statistics for the monthly number of domestic violence related assault reported in Kempsey from January 1995 to Jun 2022.
These obtained parameters have been used in the model by fitting the crime data to branching Negative Binomial model, see Figure 10 which implies that the model fits well these crime data.
The summary Table 4 displayed below is the data generated by the model. We
Table 3. Estimate parameters obtained by fitting crime data in the model.
Figure 9. Time series of monthly number of domestic violence related assault reported in Kempsey from January 1995 to Jun 2022 and the autocorrelation functions.
Figure 10. Time series plots of Crime data in blue and its Estimates values in red from the model.
Table 4. Summary statistics of the estimate crime data obtained from the branching Negative Binomial model.
see a good coherent with the summary of the crime data in Table 2.
5. Conclusion and Suggestions
In this paper we introduce an autoregressive branching Negative Binomial (bNB) as an alternative model for time series of count data. It allows for overdispersion and serial dependence in count observations. The bNB simulation study provides a performance evaluation with maximum likelihood as the estimation method of model parameters. Finally, crime count data from Kempsey, a city of Australia is used to display the usefulness of the developed model. Suggestions for future works include extending the model to account for higher order dependence, allowing for covariates and multivariate extensions.
Acknowledgements
First and foremost, I would like to express my sincere gratitude to my supervisors, Dr. BONFACE MIYA MALENJE and Dr. HERBERT IMBOGA, for their support, encouragement, and insightful guidance during my study at Pan African University, Institute of Basic Science, Technology and Innovation (PAUIBSTI). They guided me to sharpen my thinking and brought my academic work to a higher level. Without them, I could not reach this point. I want to thank African Union Commission for funding the masters studies. Particularly, I thank all the faculty members of the Mathematics Department of PAUSTI, for their help in many ways to improve my knowledge. Finally, I would like to extend my special and sincere thanks to Dr. IBRAHIMA BAKAYOKO from “the University of N’Zerekore (UZ), Guinea” (in Guinea), for not only the basic knowledge but also the wonderful advice and motivation that he provided to me during my undergraduate studies, and Dr. TAMBA NICOLAS MILLIMONO from “the Institute of Science and Technology of Mamou (ISTM), Guinea”, for the support and motivation that he is giving me to succeed in my scientific career. Last but not least, I want to thank God for sustaining me throughout the previous years and for placing wonderful people around me.