The Table Auto-Regressive Moving-Average Model for (Categorical) Stationary Series: Mathematical Perspectives (Invertibility; Maximum Likelihood Estimation)

Abstract

Once invertibility for a causal TARMA series is defined and accompanied by conditions on the probability parameters of the model, all focus concentrates on the maximum likelihood estimators. Under the coexistence of essential causality and invertibility, the estimators are shown to be convergent to the real values and asymptotically obedient to the Gaussian distribution: their variance matrix identifies with a classic result. Some real-like examples are simulated and simplification attempts include the derivation of the non-parametric chi-square test extension for stationary TAR series.

Keywords

Share and Cite:

Dimitriou-Fakalou, C. (2022) The Table Auto-Regressive Moving-Average Model for (Categorical) Stationary Series: Mathematical Perspectives (Invertibility; Maximum Likelihood Estimation). Open Journal of Statistics, 12, 385-407. doi: 10.4236/ojs.2022.123025.

1. Introduction

The scientific progress, from defining a valid time series model to making it useful in practice, depends on the consolidation of inference results. With regard to the non-linear stationary time series, the general TARMA model ( [1]) has recently pledged an unprecedented flexibility: given the mild requirement to categorize the variables’ values, its competence to express any serial order conditional or joint probability dependence is unquestionable. So it will be the aim of this paper to bond the theoretical with the most deserving TARMA sample properties. Before that, a scent of other results and popular models in the field is offered.

The DARMA model undoubtedly triggered a significant number of scientists’ interest for the discrete time series analysis: the reader may skim through [2] to get acquainted with the definition and some asymptotic properties. A recent fix to replication issues of the older model, can be discovered in [3]. Having to do with an additive model though, it should be crystal clear that its (and those pieces’ following it) simplicity and parsimony contribution to the discrete stationary series modelling, are due to its inarguable impairment to manage any better than the marginal and covariance dependence (which is not the case for the TARMA).

A valuable equivalence has been established in the past between the analysis of Bernoulli variables and the strictly stationary series obeying any law ( [4]): this saves the trouble of generalizing the Gaussian ARMA as in [5] for a family of infinitely divisible distributions. Nevertheless, most of the derivations concern the Gaussian stationarity, such as [6] who deals with the alternative (via 0 - 1) estimation of autoregressive parameterizations: those binary data have traditionally benefited from a rich bibliography anyway.

A promotion to the maximum likelihood estimation together with some solid statements can be found in [7], but the dense and short paper uses an approximation to the fixed Markovian or auto-regressive order; the inclusion of a moving-average part only presents itself with the Gaussian ARMA (1, 1) illustration, but it is exactly converting the AR to the ARMA that makes the problem challenging, ever more so when the distribution is not Gaussian. In the best estimators’ properties department, it is worth reading [8] to view the well-known ARMA efficiency result.

Notable attempts have taken place for count series as well; besides modelling as usual, [9] reproduced standard discrete-time renewal process inference results. In the other hand there are the INARMA models, for example, the INAR (1) (with random coefficient) asymptotic behaviour has been studied extensively by [10]. A recent review on the subject of count time series is [11]. One can always incorporate a count variable into a categorical one, encouraging the reader to elect the TARMA model as superior for the series stationarity. For bilinear models (and ‘any’-valued variables) there has been a plethora of inference results too.

Though a distinction wall is often raised between the time series and the Markov chain schemes, the causal TAR model embodies the homogeneous Markov dependence attached to a unique stationary distribution. Hence, [12] were the original contributors for the sake of the Markov chain inference. Later, [13] restricted the derivations to two states basic chain dependence managing explicit efficiency-related results, when (for two parameters only) the inverse of a square matrix can be computed with ease.

In this paper, the main objective is to perform the TARMA parameters inference, filling the gap of estimation for the infinite Markovian dependence: Section 2 summarizes the TARMA definition together with some raw statements concerning the use of the model parameters for a stationary series. Then Section 3 newly defines the invertible TARMA series and Section 3.1 invents a condition that safeguards invertibility, using a parallel with the existing definition and condition for causality. Section 3.2 continues with the invertibility topic, delivering news on the limit of probabilities based on a finite to the countably infinite past. Then all Section 4 is to establish the two grand properties of the maximum likelihood estimators: the weak convergence to the parameters is in Section 4.1, while Section 4.2 uses numerous lemmas and a theorem to conclude with the asymptotic normality, as this is explicitly stated in the beginning of Section 6; there, the normally distributed estimator vector transforms to a chi-square statistic to test a null hypothesis on the specified TARMA model fitting stationarity. The purely theoretical presentations are quickly reminded (Sections 2, 3), established (Sections 3, 4 and 6) or discussed (Section 6). In addition, the simulation Section 5 examines the performance of TARMA estimators (or how to compute the estimates) for small sample sizes, and returns favorable conclusions as well.

2. Reminders

Previously in [1], a strictly stationary time series $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ was considered: the variables can be of any (bounded or unbounded) range, which for the serial stationarity clothing must be assigned to $\left(k+1\right)$ categories (for fixed $k\in ℕ$ ) with conventional (or, natural, if that is the case) category codes, say, 0 and ${v}_{1},\cdots ,{v}_{k}\ne 0$. The X-variables are built on a multivariate sequence of independent in time and identically distributed ${\left(k+1\right)}^{p+q}×1$ random vectors: for each $i=\left({i}_{1},\cdots ,{i}_{p}\right)$, $j=\left({j}_{1},\cdots ,{j}_{q}\right)$, ${i}_{1},\cdots ,{i}_{p},{j}_{1},\cdots ,{j}_{q}=0,{v}_{1},\cdots ,{v}_{k}$, a univariable ${I}_{t}^{\left(i|j\right)}$ ( ${I}_{t}^{\left({0}_{p}|{0}_{q}\right)}\equiv {I}_{t}$ ) with same range as X is considered at time t, and marginally it has to hold $\left\{{I}_{t}^{\left(i|j\right)}\right\}~\text{IID}$ with probabilities

${\pi }_{x}^{\left(i|j\right)}=ℙ\left({I}_{t}^{\left(i|j\right)}=x\right)\in \left(0,1\right),\text{\hspace{0.17em}}x=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\underset{x=0,{v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{\pi }_{x}^{\left(i|j\right)}=1$

(and ${\pi }_{x}^{\left({0}_{p}|{0}_{q}\right)}\equiv {\pi }_{x}$ ).

It is reminded that a multivariate IID time series is such that the random vectors are independent in time only, or that any two univariables indexed at different times are independent: the variables’ dependence at the same time within the same vector will be referred to as interdependence with interindependence being a special case; hence the ${\pi }_{x}^{\left(i|j\right)}$ above are marginal probabilities that do not necessarily suffice to determine the joint dependence at the same timing.

In the general setting, $\left\{{I}_{t}^{\left(i|j\right)}=x\right\}$ will simplify to the event that the variable ${I}_{t}^{\left(i|j\right)}$ belongs into the category with code $x=0,{v}_{1},\cdots ,{v}_{k}$. The Markov chain terminology of a state space, say $\mathcal{S}:=\left\{0,{v}_{1},\cdots ,{v}_{k}\right\}$ and $\cdots \in \mathcal{S}$ will be avoided deliberately. Then the definition introduced in [1] is reminded here.

For fixed $p,q\in {ℕ}_{0}$ (such that $p+q\in ℕ$ ), $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ is a TableAuto-Regressive Moving-Average process of order $\left(k,p,q\right)$, and it is written $\left\{{X}_{t}\right\}~\text{TARMA}\left(k,p,q\right)$, if (it is causal and invertible and) it holds that

${X}_{t}:={I}_{t}^{\left(\left({X}_{t-1}\cdots {X}_{t-p}\right)|\left({I}_{t-1}\cdots {I}_{t-q}\right)\right)},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{for}\text{ }\text{\hspace{0.17em}}\text{ }\text{every}\text{ }\text{\hspace{0.17em}}t\in ℤ.$ (1)

In explanation, ${X}_{t}$ is set equal to a ${I}_{t}^{\left(...\right)}$ (at the original range); nevertheless, which I (out of the ${\left(k+1\right)}^{p+q}$ ) contributes its value, depends on the previous ${X}_{t-1},\cdots ,{X}_{t-p},{I}_{t-1},\cdots ,{I}_{t-q}$, i.e., which category each one of these $\left(p+q\right)$ variable realization falls into.

As an example of a TARMA (1, 1), the real-valued variables may be categorized into $k+1=3$ groups, say $\left(-\infty ,-3\right]$, $\left(-3,3\right)$ and $\left[3,\infty \right)$ with codes “-3”, “0” and “3”, respectively; then the (9 × 1) random vectors ${I}_{t}^{\left(i|j\right)}$, $i,j=-3,0,3$ are considered with fixed distribution for every $t\in ℤ$, and then over time with ${I}_{t}^{\left(i|j\right)}$ being independent of ${I}_{{t}^{*}}^{\left({i}^{*}|{j}^{*}\right)}$ for $t\ne {t}^{*}$. The ${X}_{t}$ is defined to be equal to ${I}_{t}^{\left(-3|-3\right)}$ if both ${X}_{t-1}$ and ${I}_{t-1}^{\left(0|0\right)}$ are less than or equal to −3, or equal to ${I}_{t}^{\left(0|-3\right)}$ if $-3<{X}_{t-1}<3$ and ${I}_{t-1}^{\left(0|0\right)}\le -3$, or...., or equal to ${I}_{t}^{\left(3|3\right)}$ if both ${X}_{t-1}$ and ${I}_{t-1}^{\left(0|0\right)}$ are greater than or equal to 3.

Hopefully, the example above clarifies that it is not necessarily a count series that is under study. This is an ‘occurrence-not occurrence’ analysis of categorical variables and, for the serial evolution, it matters not whether ${X}_{t}$ is set equal to the exact value or the relevant category code: it is a probability of occurrence analysis and it works subject to the categorization (together with k of course).

Then (1) can be re-arranged as

$\begin{array}{l}{f}_{x}\left({X}_{t}\right):=\underset{{i}_{1},\cdots ,{i}_{p},{j}_{1},\cdots ,{j}_{q}=0,{v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}\left({I}_{t}^{\left(i|j\right)}\right)\underset{l=1}{\overset{p}{\prod }}\text{ }\text{ }{f}_{{i}_{l}}\left({X}_{t-l}\right)\underset{n=1}{\overset{q}{\prod }}\text{ }\text{ }{f}_{{j}_{n}}\left({I}_{t-n}\right),\\ \text{for}\text{ }\text{\hspace{0.17em}}x=0,{v}_{1},\cdots ,{v}_{k},\end{array}$ (2)

where for $x,y=0,{v}_{1},\cdots ,{v}_{k}$, the index functions

${f}_{x}\left(y\right)=\frac{{\prod }_{{x}^{*}=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}{x}^{*}\ne x}\left(y-{x}^{*}\right)}{{\prod }_{{x}^{*}=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}{x}^{*}\ne x}\left(x-{x}^{*}\right)}=\left\{\begin{array}{ll}1,\hfill & \text{ }\text{if}\text{ }\text{\hspace{0.17em}}y=x\hfill \\ 0,\hfill & \text{ }\text{if}\text{ }\text{\hspace{0.17em}}y=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}y\ne x\hfill \end{array}$

have been set.

For (2) to hold for every $t\in ℤ$, at the very least the assumption of uniqueness of a stationary distribution solution should be applicable; in fact, $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ should not be considered at all, unless it is a (table) causal process (based on $\left\{{I}_{t}\right\}$ ) according to the definition in [1]. The causality element will be revisited here together with the new element of invertibility.

Since the definition of this model has been the focus of past work, the reader is encouraged to look for examples and form a clearer picture from there ( [1]). The understanding and appreciation of the TARMA definition must have already taken place to proceed with this paper’s inference related goals. For the reader that wishes to derive the auto-covariance function of TARMA series, it is highlighted that the second moments stationarity is a special case of the all moments stationarity: the work of [1] has handed over a methodology to compute all joint probabilities, and hence the auto-covariance function is a special case, as it demands the joint probabilities of two variables. Not only are those derivations outside the scope of this paper, but also taming the all- rather than just second-order stationarity is the great TARMA contribution over other models.

From this point on, it will be taken for granted (and without a formal proof) that the TARMA construction may represent any strictly stationary time series, particularly under the convention- if there needs to be one- of tabulating the variables’ values. Together with a solution on identification (of k and p, q) issues, the TARMA theory is a contribution to the non-parametric version of statistical science. When $q=0$ the dependence is pth order Markovian, while for $q\ge 1$ a valid Markovian dependence of infinite order is achieved; when $p=0$, a q-dependent strictly stationary series is concerned, and the joint dependence limits to infinity for $p\ge 1$. Hence it will be considered ( [1]) that, subject to the categorization, the TARMA can be the all-moments stationary analogue to the ARMA model for the covariance stationary series.

Some Basic Requirements

The probabilities $\pi$ will be taking the role of parameters of the model, for which the estimation results will be established. The conditions in [1] are remembered:

(C1): $\left\{{I}_{t}^{\left(i|j\right)},\text{\hspace{0.17em}}t\in ℤ\right\}$ are jointly interindependent series, i.e., it holds that

$ℙ\left({I}_{t}^{\left({i}_{1}|{j}_{1}\right)}={x}_{1},\cdots ,{I}_{t}^{\left({i}_{n}|{j}_{n}\right)}={x}_{n}\right)=\underset{m=1}{\overset{n}{\prod }}\text{ }\text{ }ℙ\left({I}_{t}^{\left({i}_{m}|{j}_{m}\right)}={x}_{m}\right),\text{\hspace{0.17em}}t\in ℤ$

for any $n=2,\cdots ,{\left(k+1\right)}^{p+q}$, $\left({i}_{m},{j}_{m}\right)=\left({i}_{m,1},\cdots ,{i}_{m,p},{j}_{m,1},\cdots ,{j}_{m,q}\right)$, ${i}_{m,1},\cdots ,{i}_{m,p},{j}_{m,1},\cdots ,{j}_{m,q}=0,{v}_{1},\cdots ,{v}_{k}$, $m=1,\cdots ,n$, $\left({i}_{{m}_{1}},{j}_{{m}_{1}}\right)\ne \left({i}_{{m}_{2}},{j}_{{m}_{2}}\right)$, ${m}_{1},{m}_{2}=1,\cdots ,n$ ( ${m}_{1}\ne {m}_{2}$ ) and ${x}_{1},\cdots ,{x}_{n}=0,{v}_{1},\cdots ,{v}_{k}$.

Condition (C1) enforces the ${I}_{t}^{\left(...\right)}$ variables (at the same time t) to be independent: otherwise, the form of their interdependence should accompany (1), in order to properly define a TARMA process. Under an interdependence scenario, the computation of all joint TARMA probabilities heavily relies on

${\pi }_{x|y}^{*\left(i|j\right)}:=ℙ\left({I}_{t}^{\left(i|j\right)}=x\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{I}_{t}=y\right),\text{\hspace{0.17em}}x,y=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}\left(i,j\right)\ne {0}_{p+q}$

(as well as $\pi$ ) according to [1]. Hence in this case, ${\pi }^{*}$ are also parameters for an apposite TARMA definition.

In the absence of (C1), an “h to h + 1” way to plex the interdependence may work by first considering ${\pi }_{x}$ and then gradually building for $n=1,\cdots ,p+q$ and every ${l}_{1},\cdots ,{l}_{n}=1,\cdots ,p+q$ ( ${l}_{1}<\cdots <{l}_{n}$ ), ${i}_{{l}_{1}},\cdots ,{i}_{{l}_{n}}={v}_{1},\cdots ,{v}_{k}$, the conditional probabilities $ℙ\left({I}_{t}^{\left({0}_{{l}_{1}-1}\text{\hspace{0.17em}}{i}_{{l}_{1}}\cdots {i}_{{l}_{n}}\text{\hspace{0.17em}}{0}_{p+q-{l}_{n}}\right)}={x}_{n}|{I}_{t}=x,\cdots \right)$ : “ $\cdots$ ” in the condition sets all ${I}_{t}^{\left({0}_{{l}_{1}^{*}-1}\text{\hspace{0.17em}}{i}_{{l}_{1}^{*}}\cdots {i}_{{l}_{{n}^{*}}^{*}}\text{\hspace{0.17em}}{0}_{p+q-{l}_{{n}^{*}}^{*}}\right)}$, ${n}^{*}=1,\cdots ,n-1$, ${l}_{1}^{*},\cdots ,{l}_{{n}^{*}}^{*}={l}_{1},\cdots ,{l}_{n}$ ( ${l}_{1}^{*}<\cdots <{l}_{{n}^{*}}^{*}$ ). That way one gets a feel of how (C1) can be replaced, especially when other attributes need to be accomplished.

Other conditions in [1] are listed below:

(C2): $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ is not a deterministic process, i.e.,

$ℙ\left({X}_{t}=x\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{t-1}={x}_{1},{X}_{t-2}={x}_{2},\cdots \right)\in \left(0,1\right)$

for all $x,{x}_{n}=0,{v}_{1},\cdots ,{v}_{k}$, $n\in ℕ$.

(C3): $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ is not overparameterized, i.e., in the simplest of cases, it cannot be that ${\pi }_{x}^{\left(i|j\right)}={\pi }_{x}^{\left({i}^{\prime }|{j}^{\prime }\right)}$, $x=0,{v}_{1},\cdots ,{v}_{k}$ for $\left(i,j\right)\ne \left({i}^{\prime },{j}^{\prime }\right)$.

It would be misleading that the process $\left\{{X}_{t}\right\}$ is of order $\left(p,q\right)$ with $\left(k+1\right)$ categories, when in fact there are fewer parameters that are active (due to identical distributions), so this is preserved by (C3): the reader may reflect on the extreme example that all distributions are identical and $\left\{{X}_{t}\right\}$ is in fact an IID process! Nevertheless, the usual (opposite) notion that the different I-distributions should be as alike as possible, will be fundamental for the process validity according to other conditions (causality/invertibility) as in the next section.

Finally, it is stressed that the assumption of a convoluted multivariate IID time series, which is attached to the definition of the TARMA, is the necessary block for the inference that will be presented in this paper, the same way that the univariate error IID series is for the well-known ARMA.

3. Invertibility

In [1], the process $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ was called table causal (based on $\left\{{I}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ ) also implying that it is strictly stationary. After setting

$\begin{array}{l}{d}^{h}{f}_{x}\left({I}_{t}^{\left({0}_{{l}_{1}-1}\text{\hspace{0.17em}}{i}_{{l}_{1}}\text{\hspace{0.17em}}{0}_{{l}_{2}-{l}_{1}-1}\text{\hspace{0.17em}}{i}_{{l}_{2}}\cdots {i}_{{l}_{h}}\text{\hspace{0.17em}}{0}_{p+q-{l}_{h}}\right)}\right)\\ :={f}_{x}\left({I}_{t}^{\left({0}_{{l}_{1}-1}\text{\hspace{0.17em}}{i}_{{l}_{1}}\text{\hspace{0.17em}}{0}_{{l}_{2}-{l}_{1}-1}\text{\hspace{0.17em}}{i}_{{l}_{2}}\cdots {i}_{{l}_{h}}\text{\hspace{0.17em}}{0}_{p+q-{l}_{h}}\right)}\right)-\underset{n=1}{\overset{h}{\sum }}\text{ }\text{ }{f}_{x}\left({I}_{t}^{\left({0}_{{l}_{1}-1},{i}_{{l}_{1}},\cdots ,{i}_{{l}_{n-1}},{0}_{{l}_{n+1}-{l}_{n-1}-1},{i}_{{l}_{n+1}},\cdots ,{i}_{{l}_{h}},{0}_{p+q-{l}_{h}}\right)}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+\cdots +{\left(-1\right)}^{h-1}\underset{n=1}{\overset{h}{\sum }}\text{ }\text{ }{f}_{x}\left({I}_{t}^{\left({0}_{{l}_{n}-1},{i}_{{l}_{n}},{0}_{p+q-{l}_{n}}\right)}\right)+{\left(-1\right)}^{h}{f}_{x}\left( I t \right)\end{array}$

for $h=1,\cdots ,p+q$, ${l}_{1},\cdots ,{l}_{h}=1,\cdots ,p+q$ ( ${l}_{1}<\cdots <{l}_{h}$ ), ${i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}={v}_{1},\cdots ,{v}_{k}$, it can be derived from (2), that

$\begin{array}{l}{f}_{x}\left({I}_{t}\right)={f}_{x}\left({X}_{t}\right)-\underset{{h}_{p}=1}{\overset{p}{\sum }}\text{\hspace{0.17em}}\underset{\begin{array}{c}{l}_{1},\cdots ,{l}_{{h}_{p}}=1\\ {l}_{1}<\cdots <{l}_{{h}_{p}}\end{array}}{\overset{p}{\sum }}\text{\hspace{0.17em}}\underset{{i}_{{l}_{1}},\cdots ,{i}_{{l}_{{h}_{p}}}={v}_{1},\cdots ,{v}_{k}}{\sum }{d}^{{h}_{p}}{f}_{x}\left({I}_{t}^{\left(\left({0}_{{l}_{1}-1},{i}_{{l}_{1}},\cdots ,{i}_{{l}_{{h}_{p}}},{0}_{p-{l}_{{h}_{p}}}\right)|{0}_{q}\right)}\right)\\ \cdot \left(\underset{r=1}{\overset{{h}_{p}}{\prod }}\text{ }\text{ }{f}_{{i}_{{l}_{r}}}\left({X}_{t-{l}_{r}}\right)\right)-\underset{{h}_{q}=1}{\overset{q}{\sum }}\text{\hspace{0.17em}}\underset{{h}_{p}=0}{\overset{p}{\sum }}\text{\hspace{0.17em}}\underset{\begin{array}{c}{l}_{1},\cdots ,{l}_{{h}_{p}}=1\\ {l}_{1}<\cdots <{l}_{{h}_{p}}\end{array}}{\overset{p}{\sum }}\text{\hspace{0.17em}}\underset{\begin{array}{c}{n}_{1},\cdots ,{n}_{{h}_{q}}=1\\ {n}_{1}<\cdots <{n}_{{h}_{q}}\end{array}}{\overset{q}{\sum }}\text{\hspace{0.17em}}\underset{{i}_{{l}_{1}},\cdots ,{i}_{{l}_{{h}_{p}}},\text{\hspace{0.17em}}{j}_{{n}_{1}},\cdots ,{j}_{{n}_{{h}_{q}}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{\hspace{0.17em}}{d}^{{h}_{p}+{h}_{q}}\\ \cdot {f}_{x}\left({I}_{t}^{\left(\left({0}_{{l}_{1}-1},{i}_{{l}_{1}},\cdots ,{i}_{{l}_{{h}_{p}}},{0}_{p-{l}_{{h}_{p}}}\right)|\left({0}_{{n}_{1}-1},{j}_{{n}_{1}},\cdots ,{j}_{{n}_{{h}_{q}}},{0}_{q-{n}_{{h}_{q}}}\right)\right)}\right)\left(\underset{r=1}{\overset{{h}_{q}}{\prod }}\text{ }\text{ }{f}_{{j}_{{n}_{r}}}\left({I}_{t-{n}_{r}}\right)\right)\left(\underset{r=1}{\overset{{h}_{p}}{\prod }}\text{ }\text{ }{f}_{{i}_{{l}_{r}}}\left({X}_{t-{l}_{r}}\right)\right):\end{array}$ (3)

similarly was done for causality ( [1]) and the representation

$\begin{array}{c}{f}_{x}\left({X}_{t}\right)={f}_{x}\left({I}_{t}\right)+\underset{h\in ℕ}{\sum }\text{\hspace{0.17em}}\underset{\begin{array}{c}{n}_{1},\cdots ,{n}_{h}\in ℕ\\ {n}_{1}<\cdots <{n}_{h}\end{array}}{\sum }\text{\hspace{0.17em}}\underset{{j}_{{n}_{1}},\cdots ,{j}_{{n}_{h}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Psi }_{t,\left({n}_{1},\cdots ,{n}_{h}\right)}\left({j}_{{n}_{1}},\cdots ,{j}_{{n}_{h}}\right)\\ \text{\hspace{0.17em}}\cdot \left(\underset{r=1}{\overset{h}{\prod }}\text{ }\text{ }{f}_{{j}_{{n}_{r}}}\left({I}_{t-{n}_{r}}\right)\right).\end{array}$ (4)

Here, a new substitution of $\left({\prod }_{r=1}^{{h}_{q}}\text{ }\text{ }{f}_{{j}_{{n}_{r}}}\left({I}_{t-{n}_{r}}\right)\right)$ and so on, will also lead to a countably infinite representation (of the upcoming form (5)). An adequate definition for invertibility follows.

Definition 3.1: The process $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ as defined in (1) will be called invertible, in the sense that it can be written

$\begin{array}{c}{f}_{x}\left({X}_{t}\right)={f}_{x}\left({I}_{t}\right)+\underset{h\in ℕ}{\sum }\text{\hspace{0.17em}}\underset{\begin{array}{c}{l}_{1},\cdots ,{l}_{h}\in ℕ\\ {l}_{1}<\cdots <{l}_{h}\end{array}}{\sum }\text{\hspace{0.17em}}\underset{{i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Phi }_{t,\left({l}_{1},\cdots ,{l}_{h}\right)}\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\\ \text{\hspace{0.17em}}\cdot \left(\underset{r=1}{\overset{h}{\prod }}\text{ }\text{ }{f}_{{i}_{{i}_{r}}}\left({X}_{t-{l}_{r}}\right)\right)\end{array}$ (5)

for $x={v}_{1},\cdots ,{v}_{k}$ and $t\in ℤ$ ; the random variable ${f}_{x}{\Phi }_{t,\left({l}_{1},\cdots ,{l}_{h}\right)}\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)$, $\left(h\ge 1\right)$ is independent of ${I}_{t+n}^{\left(...\right)}$, $n\in ℕ$, ${I}_{t-l}^{\left(...\right)}$, $l\ge {l}_{h}$, $l\in ℕ$ (it is a function of ${I}_{t-l}^{\left(i|j\right)}$, $0\le l\le {l}_{h}-1$ ) and remains unchanged for $t\in ℤ$.

Additionally, it must hold that the probabilities

$ℙ\left({X}_{t}=x|{X}_{t-l}={i}_{l},\text{\hspace{0.17em}}l\in ℕ\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}{i}_{l}=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}l\in ℕ$ (6)

can be uniquely determined from (5).

Remark 1: For $h,{h}^{*},{h}_{L}\in ℕ$, ${h}_{L}\le {h}^{*}\le h$, ${l}_{1},\cdots ,{l}_{h}\in ℕ$, ( ${l}_{1}<\cdots <{l}_{h}$ ), consider $\left\{{l}_{1}^{*},\cdots ,{l}_{{h}_{L}}^{*}\equiv {l}_{{h}^{*}}\right\}\subseteq \left\{{l}_{1},\cdots ,{l}_{h}\right\}$, ${l}_{1}^{*}<\cdots <{l}_{{h}_{L}}^{*}$ and any ${i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}={v}_{1},\cdots ,{v}_{k}$, and write the events

${\mathcal{B}}_{\left(t,h\right),\left(\left({l}_{1},\cdots ,{l}_{h}\right),\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\right)}:=\left\{{I}_{t-{l}_{1}}={i}_{{l}_{1}},\cdots ,{I}_{t-{l}_{h}}={i}_{{l}_{h}},\text{\hspace{0.17em}}{I}_{t-l}=0,\text{\hspace{0.17em}}l\in ℕ,\text{\hspace{0.17em}}l\ne {l}_{1},\cdots ,{l}_{h}\right\},$

${\mathcal{C}}_{\left(t,h\right),\left(\left({l}_{1},\cdots ,{l}_{h}\right),\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\right)}:=\left\{{X}_{t-{l}_{1}}={i}_{{l}_{1}},\cdots ,{X}_{t-{l}_{h}}={i}_{{l}_{h}},\text{\hspace{0.17em}}{X}_{t-l}=0,\text{\hspace{0.17em}}l\in ℕ,\text{\hspace{0.17em}}l\ne {l}_{1},\cdots ,{l}_{h}\right\}$

and the probability of interest

$ℙ\left({f}_{x}{\Phi }_{t,\left({l}_{1}^{*},\cdots ,{l}_{{h}_{L}}^{*}\right)}\left({i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{{h}_{L}}^{*}}\right)\le y|{\mathcal{C}}_{\left(t,h\right),\left(\left({l}_{1},\cdots ,{l}_{h}\right),\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\right)}\right).$

As opposed to the probability

$\begin{array}{l}ℙ\left({f}_{x}{\Psi }_{t,\left({l}_{1}^{*},\cdots ,{l}_{{h}_{L}}^{*}\right)}\left({i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{{h}_{L}}^{*}}\right)\le y|{\mathcal{B}}_{\left(t,h\right),\left(\left({l}_{1},\cdots ,{l}_{h}\right),\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\right)}\right)\\ \equiv ℙ\left({f}_{x}{\Psi }_{t,\left({l}_{1}^{*},\cdots ,{l}_{{h}_{L}}^{*}\right)}\left({i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{{h}_{L}}^{*}}\right)\le y|{I}_{t-{l}_{1}}={i}_{{l}_{1}},\cdots ,{I}_{t-{l}_{{h}^{*}}}={i}_{{l}_{{h}^{*}}},\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{}{\text{ }}{I}_{t-l}=0,\text{\hspace{0.17em}}1\le l<{l}_{{h}^{*}},\text{\hspace{0.17em}}l\ne {l}_{1},\cdots ,{l}_{{h}^{*}}\right)\end{array}$

from the “causality” topic in [1] when it is used that $\left\{{I}_{t}^{\left(i|j\right)}\right\}~\text{IID}$ ( ${f}_{x}{\Psi }_{t,\left({l}_{1},\cdots ,{l}_{h}\right)}$ are functions of ${I}_{t-l}^{\left(i|j\right)}$, $0\le l<{l}_{h}$ ), in the case of “invertibility” $\left\{{X}_{t}\right\}$ is a serially dependent series. Nevertheless, a simplification is in order, i.e., it holds that

$\begin{array}{l}ℙ\left({f}_{x}{\Phi }_{t,\left({l}_{1}^{*},\cdots ,{l}_{{h}_{L}}^{*}\right)}\left({i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{{h}_{L}}^{*}}\right)\le y|{\mathcal{C}}_{\left(t,h\right),\left(\left({l}_{1},\cdots ,{l}_{h}\right),\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\right)}\right)\\ =ℙ\left({f}_{x}{\Phi }_{t,\left({l}_{1}^{*},\cdots ,{l}_{{h}_{L}}^{*}\right)}\left({i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{{h}_{L}}^{*}}\right)\le y|{\Phi }_{\left(t-1,h\right),\left(\left({l}_{1}-1,\cdots ,{l}_{h}-1\right),\left({i}_{{l}_{1}},\cdots ,{i}_{{l}_{h}}\right)\right)}^{*}=0,\cdots \\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\Phi }_{\left(t-{l}_{1},h-1\right),\left(\left({l}_{2}-{l}_{1},\cdots ,{l}_{h}-{l}_{1}\right),\left({i}_{{l}_{2}},\cdots ,{i}_{{l}_{h}}\right)\right)}^{*}={i}_{{l}_{1}},\cdots ,{\Phi }_{\left(t-{l}_{h-1},1\right),\left({l}_{h}-{l}_{h-1},{i}_{{l}_{h}}\right)}^{*}={i}_{{l}_{h-1}},\cdots \\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\Phi }_{\left(t-{l}_{h}+1,1\right),\left(1,{i}_{{l}_{h}}\right)}^{*}=0,{I}_{t-{l}_{h}}={i}_{{l}_{h}}\right)\end{array}$ (7)

as it can be shown (together with the notation ${\Phi }^{*}$ ).

3.1. Conditions Relating to Causality and Invertibility

To perform the TARMA parameters inference, there will be the need to somehow “squeeze” the random coefficients ${f}_{x}{\Phi }_{t,\left({l}_{1},\cdots ,{l}_{h}\right)}$ to become smaller as ${l}_{h}\to \infty$.

To manage convergence results, first set the remainder

$\begin{array}{c}{f}_{x}{R}_{t,>l}:=\underset{r=l+1}{\overset{\infty }{\sum }}\text{ }\text{ }\underset{{i}_{r}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Phi }_{t,r}\left({i}_{r}\right)\cdot {f}_{{i}_{r}}\left({X}_{t-r}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }+\underset{{r}_{1}=1}{\overset{l}{\sum }}\text{ }\text{ }\underset{{r}_{2}=l+1}{\overset{\infty }{\sum }}\text{ }\text{ }\underset{{i}_{{r}_{1}},{i}_{{r}_{2}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Phi }_{t,\left({r}_{1},{r}_{2}\right)}\left({i}_{{r}_{1}},{i}_{{r}_{2}}\right)\cdot {f}_{{i}_{{r}_{1}}}\left({X}_{t-{r}_{1}}\right){f}_{{i}_{{r}_{2}}}\left({X}_{t-{r}_{2}}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }+\underset{\begin{array}{c}{r}_{1},{r}_{2}=l+1\\ {r}_{1}<{r}_{2}\end{array}}{\overset{\infty }{\sum }}\text{ }\text{ }\underset{{i}_{{r}_{1}},{i}_{{r}_{2}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Phi }_{t,\left({r}_{1},{r}_{2}\right)}\left({i}_{{r}_{1}},{i}_{{r}_{2}}\right)\cdot {f}_{{i}_{{r}_{1}}}\left({X}_{t-{r}_{1}}\right){f}_{{i}_{{r}_{2}}}\left({X}_{t-{r}_{2}}\right)+\cdots \end{array}$

from the “invertible” representation (5). Similarly, it has been set by [1] that

$\begin{array}{c}{f}_{x}{r}_{t,>l}:=\underset{r=l+1}{\overset{\infty }{\sum }}\text{ }\text{ }\underset{{j}_{r}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Psi }_{t,r}\left({j}_{r}\right)\cdot {f}_{{j}_{r}}\left({I}_{t-r}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }+\underset{{r}_{1}=1}{\overset{l}{\sum }}\text{ }\text{ }\underset{{r}_{2}=l+1}{\overset{\infty }{\sum }}\text{ }\text{ }\underset{{j}_{{r}_{1}},{j}_{{r}_{2}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Psi }_{t,\left({r}_{1},{r}_{2}\right)}\left({j}_{{r}_{1}},{j}_{{r}_{2}}\right)\cdot {f}_{{j}_{{r}_{1}}}\left({I}_{t-{r}_{1}}\right){f}_{{j}_{{r}_{2}}}\left({I}_{t-{r}_{2}}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }+\underset{\begin{array}{c}{r}_{1},{r}_{2}=l+1\\ {r}_{1}<{r}_{2}\end{array}}{\overset{\infty }{\sum }}\text{ }\text{ }\underset{{j}_{{r}_{1}},{j}_{{r}_{2}}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Psi }_{t,\left({r}_{1},{r}_{2}\right)}\left({j}_{{r}_{1}},{j}_{{r}_{2}}\right)\cdot {f}_{{j}_{{r}_{1}}}\left({I}_{t-{r}_{1}}\right){f}_{{j}_{{r}_{2}}}\left({I}_{t-{r}_{2}}\right)+\cdots \end{array}$

from the “causal” representation (4).

Consider generic constants $C>0$ and $\alpha \in \left(0,1\right)$. The condition is presented below:

(C4): The parameters of the TARMA equation are such that it holds that

$\begin{array}{l}\mathbb{E}\left\{|{f}_{x}{R}_{t,>l}||{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ\right\}\le C\cdot {\alpha }^{l+1},\text{\hspace{0.17em}}\text{ }\text{for}\text{ }\text{\hspace{0.17em}}x={v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}\\ t\in ℤ,\text{\hspace{0.17em}}l\in {ℕ}_{0}\text{\hspace{0.17em}}\text{and}\text{ }\text{\hspace{0.17em}}\text{ }\text{any}\text{ }\text{\hspace{0.17em}}{x}_{n}=0,{v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}n\in ℕ.\end{array}$ (8)

(C5): The parameters of the TARMA equation are such that it holds that

$\mathbb{E}|{f}_{x}{r}_{t,>l}|\le C\cdot {\alpha }^{l+1},\text{\hspace{0.17em}}\text{ }\text{for}\text{ }\text{\hspace{0.17em}}x={v}_{1},\cdots ,{v}_{k},\text{\hspace{0.17em}}t\in ℤ,\text{\hspace{0.17em}}l\in {ℕ}_{0}.$ (9)

To justify (C4), see that it can be written that

${f}_{x}\left({X}_{t}\right)={f}_{x}\left({I}_{t}\right)+\underset{l\in {ℕ}_{0}}{\sum }\left({f}_{x}{R}_{t,>l}-{f}_{x}{R}_{t,>l+1}\right),$

so that the conditional probability of interest (6) is bounded by

$ℙ\left({I}_{t}=x|{X}_{t-n}={i}_{n},\text{\hspace{0.17em}}n\in ℕ\right)+\underset{l\in {ℕ}_{0}}{\sum }\text{ }\text{ }{C}^{*}\cdot {\alpha }^{l+1},$

where a converging geometric series is involved and, under causality, it is $ℙ\left({I}_{t}=x|{X}_{t-n}={i}_{n},\text{\hspace{0.17em}}n\in ℕ\right)\equiv {\pi }_{x}$ ; straight from (8), it holds that $\begin{array}{l}\mathbb{E}\left\{|{f}_{x}{R}_{t,>l}-{f}_{x}{R}_{t,>l+1}||{X}_{t-n}={i}_{n},\text{\hspace{0.17em}}n\in ℕ\right\}\\ \le \mathbb{E}\left\{|{f}_{x}{R}_{t,>l}||{X}_{t-n}={i}_{n}\right\}+\mathbb{E}\left\{|{f}_{x}{R}_{t,>l+1}||{X}_{t-n}={i}_{n}\right\}\end{array}$, so that it has been inserted ${C}^{*}:=C\cdot \left(1+\alpha \right)$.

Regarding (C5) and how it secures that all joint probabilities can be safely bounded, the answer is easy to show.

Since (C4) (as opposed to (C5)) involves conditional expectations, it might be wished for $|{f}_{x}{R}_{t,>l}|$ given “...” to simplify the condition “...” from $\left\{{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ\right\}$ to $\left\{{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n=1,\cdots ,l\right\}$. According to the argument laid out above, it is in fact the absolute value of

$\begin{array}{l}{f}_{x}{R}_{t,>l}-{f}_{x}{R}_{t,>l+1}\equiv \underset{{i}_{l+1}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Phi }_{t,l+1}\left({i}_{l+1}\right)\cdot {f}_{{i}_{l+1}}\left({X}_{t-\left(l+1\right)}\right)\\ +\underset{{r}_{1}=1}{\overset{l}{\sum }}\text{ }\text{ }\underset{{i}_{{r}_{1}},{i}_{l+1}={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{f}_{x}{\Phi }_{t,\left({r}_{1},l+1\right)}\left({i}_{{r}_{1}},{i}_{{r}_{2}}\right)\cdot {f}_{{i}_{{r}_{1}}}\left({X}_{t-{r}_{1}}\right){f}_{{i}_{l+1}}\left({X}_{t-\left(l+1\right)}\right)+\cdots \end{array}$

(instead of that of ${f}_{x}{R}_{t,>l}$ ) that is needed, which is “contained” within the random coefficients ${f}_{x}{\Phi }_{t,\left({l}_{1},\cdots ,{l}_{h}\right)}$, ${l}_{1}<\cdots <{l}_{h}\le l+1$ (that are functions of ${I}_{t-n}^{\left(...\right)}$, $n=0,1,\cdots ,l$ ). In the case that ${X}_{t-n}=0$, $n\ge l+1$, it is clear from Remark 1 that (under causality) a simplification is possible, though not concerning the variables X in the condition: otherwise, it’d better not be attempted. The interested reader may look at Section 3.2 to verify what happens in the general case given the requirement in (C4) as it is.

For causality, due to the difficulties in determining the random coefficients ${f}_{x}{\Psi }_{t}$ (as functions of ${I}_{t-l}^{\left(...\right)}$, $l\in {ℕ}_{0}$ ), [1] resorted to an alternative (to (4)) representation: then based on the new (rather than (4)) form, salvaged a condition relating to causality. The equivalent representation for invertibility may be demonstrated with a Proposition 1 and Propositions 2 and 3 may lead to an invertibility-related condition. Nevertheless, it might be stressed here that Proposition 2 (and, consequently, Proposition 3) is established using the prerequisite of causality; this is because an “invertible” representation relies on the index variables $f\left({X}_{t-l}\right)$ (not $f\left({I}_{t-l}\right)$ ) and, under causality, it can be certified that ${X}_{t-l}$, $l\in ℕ$ is independent of ${I}_{t}^{\left(...\right)}$.

Furthermore, remember that the causality consideration is attached to the definition of a TARMA process, as it guarantees probability stationarity, which is what this theory is about: so this is before the results for the inference are seeked. In the contrary, the contribution of invertibility will shine, when the weak convergence (consistency) of the maximum likelihood estimators for the relevant TARMA probabilities is established. Nevertheless, it is not wished to undermine the value of invertibility as compared to that of causality: after all, both type coefficients ${f}_{x}{\Psi }_{t}$ and ${f}_{x}{\Phi }_{t}$ are built as functions of the ${I}^{\left(...\right)}$ variables from present and past, using a similar mechanism. The conditions obtained seem to complete rather than contradict each other too, so it is clear that causality and invertibility need to work together for the TARMA body to stand straight.

The reader needs to acquaint themselves with the following notation: it is set for $x={v}_{1},\dots ,{v}_{k}$, that

${\gamma }_{x}^{\left(\nu \right)}:=\left\{\begin{array}{ll}\mathrm{max}\left\{\mathbb{E}|{d}^{\nu }{f}_{x}\left({I}_{t}^{\left({0}_{p}|\left({0}_{{n}_{1}-1}\text{\hspace{0.17em}}{j}_{{n}_{1}}\cdots {j}_{{n}_{\nu }}\text{\hspace{0.17em}}{0}_{q-{n}_{\nu }}\right)\right)}\right)|\right\},\hfill & \text{ }\text{if}\text{ }\text{\hspace{0.17em}}\nu =1,\cdots ,q\hfill \\ \underset{i,j}{\mathrm{max}}\left\{{\pi }_{x}^{\left(i|j\right)}\right\},\hfill & \text{ }\text{if}\text{ }\text{\hspace{0.17em}}\nu =0\hfill \end{array},$

as well as for each ${h}_{p}=1,\cdots ,p$, $\nu =0,1,\cdots ,q$, that

${\gamma }_{p,x}^{\left(\nu \right)}\left({h}_{p}\right):=\mathrm{max}\left\{\mathbb{E}|{d}^{{h}_{p}+\nu }{f}_{x}\left({I}_{t}^{\left(\left({0}_{{l}_{1}-1}\text{\hspace{0.17em}}{i}_{{l}_{1}}\cdots {i}_{{l}_{{h}_{p}}}\text{\hspace{0.17em}}{0}_{p-{l}_{{h}_{p}}}\right)|\left({0}_{{n}_{1}-1}\text{\hspace{0.17em}}{j}_{{n}_{1}}\cdots {j}_{{n}_{\nu }}\text{\hspace{0.17em}}{0}_{q-{n}_{\nu }}\right)\right)}\right)|\right\},$

where both maximums take place for any ${n}_{1},\cdots ,{n}_{\nu }=1,\cdots ,q$ ( ${n}_{1}<\cdots <{n}_{\nu }$ ), ${j}_{{n}_{1}},\cdots ,{j}_{{n}_{\nu }}={v}_{1},\cdots ,{v}_{k}$ ; the second also taking place for any ${l}_{1},\cdots ,{l}_{{h}_{p}}=1,\cdots ,p$ ( ${l}_{1}<\cdots <{l}_{{h}_{p}}$ ), ${i}_{{l}_{1}},\cdots ,{i}_{{l}_{{h}_{p}}}={v}_{1},\cdots ,{v}_{k}$. It is defined for $x={v}_{1},\cdots ,{v}_{k}$

${\gamma }_{x}:=\underset{\nu =1,\cdots ,q}{\mathrm{max}}\left\{{\gamma }_{x}^{\left(\nu \right)}+\underset{{h}_{p}=1}{\overset{p}{\sum }}\left(\begin{array}{c}p\\ {h}_{p}\end{array}\right){\gamma }_{p,x}^{\left(\nu \right)}\left({h}_{p}\right)\right\}$ and

${\gamma }^{*}:=\underset{\nu =0,\cdots ,q}{\mathrm{max}}\left\{\left(\underset{x={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{\gamma }_{x}^{\left(\nu \right)}\right)+\underset{{h}_{p}=1}{\overset{p}{\sum }}\left(\begin{array}{c}p\\ {h}_{p}\end{array}\right)\left(\underset{x={v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }{\gamma }_{p,x}^{\left(\nu \right)}\left({h}_{p}\right)\right)\right\}.$

By relaxing the sum of maximums into a maximum of sums, ${\sum }_{x={v}_{1},\cdots ,{v}_{k}}\text{ }\text{ }{\gamma }_{x}^{\left(0\right)}$ is replaced by ${\mathrm{max}}_{i,j}\left\{1-{\pi }_{0}^{\left(i|j\right)}\right\}$. More research is welcome on the subject of the equivalence with (C4), which was demonstrated to be sufficient for invertibility.

Remark 2: (i) Under (C4) and (C5), all joint probabilities and all probabilities $ℙ\left({X}_{t}=x\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ\right)$, $x,{x}_{n}=0,{v}_{1},\cdots ,{v}_{k}$ are contained (away from infinity): it can be concluded that any joint probability is away from zero (hence away from one as well), because if any such probability, say $ℙ\left({\mathcal{X}}_{t}^{\left(a\right)}\right)$, was 0 it would have to be that any conditional probability as above- with ${\mathcal{X}}_{{t}^{*}}^{\left(a\right)}\subseteq \left\{{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ\right\}$ for some ${t}^{*}$ - would not be bounded and properly defined; then it can be concluded (by division of a joint non zero probability over a joint non infinity probability) that all conditional probabilities are strictly larger than zero (hence smaller than one as well). Consequently, it will be taken that (C4) and (C5) can suffice for (C2).

(ii) For the $\pi$ that yield a causal and invertible TARMA series $\left\{{X}_{t}\right\}$, it can be arranged under (8) in (C4) (or, (9) in (C5)) that

$|\frac{\partial }{\partial \pi }\mathbb{E}\left({f}_{x}{R}_{t,>l}|{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ\right)|\le C\cdot {\alpha }^{l+1},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\left(\text{or}\text{ }\text{\hspace{0.17em}}|\frac{\partial }{\partial \pi }\mathbb{E}\left({f}_{x}{r}_{t,>l}\right)|\le C\cdot {\alpha }^{l+1}\right)$ (10)

for some $0 and $\alpha \in \left(0,1\right)$, and

$|\frac{{\partial }^{2}}{\partial \pi \text{ }\partial \pi }\mathbb{E}\left({f}_{x}{R}_{t,>l}|{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ\right)|\le C\cdot {\alpha }^{l+1}\text{ }\text{\hspace{0.17em}}\left(\text{or}\text{ }\text{\hspace{0.17em}}|\frac{{\partial }^{2}}{\partial \pi \text{ }\partial \pi }\mathbb{E}\left({f}_{x}{r}_{t,>l}\right)|\le C\cdot {\alpha }^{l+1}\right)$ (11)

under (8) (or (9)): this can be shown.

It is being taken for granted that the $\frac{\partial }{\partial \pi }ℙ\left({X}_{t}=x|{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ,\pi \right)$, $\frac{{\partial }^{2}}{\partial \pi \text{ }\partial \pi }ℙ\left({X}_{t}=x|{X}_{t-n}={x}_{n},\text{\hspace{0.17em}}n\in ℕ,\pi \right)$ derivatives exist.

3.2. Random Coefficient Modelling Based on the Past

For $l,n\in {ℕ}_{0}$, $n\le l$ and a set $\left\{{l}_{1}^{*},\cdots ,{l}_{n}^{*}\right\}\subseteq \left\{1,\cdots ,l\right\}$, it is written (for fixed ${i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{n}^{*}}={v}_{1},\cdots ,{v}_{k}$ but this is omitted from the symbol $\mathcal{N}$ ), that

${\mathcal{N}}_{t,n\le l}:=\left\{{X}_{t-{l}_{1}^{*}}={i}_{{l}_{1}^{*}},\cdots ,{X}_{t-{l}_{n}^{*}}={i}_{{l}_{n}^{*}},\text{\hspace{0.17em}}{X}_{t-{l}^{*}}=0,\text{\hspace{0.17em}}{l}^{*}=1,\cdots ,l,\text{\hspace{0.17em}}{l}^{*}\ne {l}_{1}^{*},\cdots ,{l}_{n}^{*}\right\}.$

Eventually it can be shown that

$\underset{l\to \infty }{\mathrm{lim}}ℙ\left(|ℙ\left({X}_{t}=x|{\mathcal{N}}_{t,n\le l},{X}_{t-l-{l}^{*}},{l}^{*}\in ℕ\right)-ℙ\left({X}_{t}=x|{\mathcal{N}}_{t,n\le l}\right)|\le C\cdot {\alpha }^{l+1}\right)=1$ (12)

( $n,{l}_{1}^{*},\cdots ,{l}_{n}^{*},{i}_{{l}_{1}^{*}},\cdots ,{i}_{{l}_{n}^{*}}$ remain unchanged as $l\to \infty$ ).

Remark 3: From (10) and (11) and in the same manner, it will be considered occasionally that

$\underset{l\to \infty }{\mathrm{lim}}ℙ\left(|\frac{\partial }{\partial \pi }ℙ\left({X}_{t}=x|{\mathcal{N}}_{t,n\le l}\right)-\frac{\partial }{\partial \pi }ℙ\left({X}_{t}=x|{\mathcal{N}}_{t,n\le l},\text{\hspace{0.17em}}{X}_{t-{l}^{*}},\text{\hspace{0.17em}}{l}^{*}\ge l+1\right)|\le C\cdot {\alpha }^{l+1}\right)=1$ (13)

and

$\underset{l\to \infty }{\mathrm{lim}}ℙ\left(|\frac{{\partial }^{2}}{\partial \pi \text{ }\partial \pi }ℙ\left({X}_{t}=x|{\mathcal{N}}_{t,n\le l}\right)-\frac{{\partial }^{2}}{\partial \pi \text{ }\partial \pi }ℙ\left({X}_{t}=x|{\mathcal{N}}_{t,n\le l},\text{\hspace{0.17em}}{X}_{t-{l}^{*}},\text{\hspace{0.17em}}{l}^{*}\ge l+1\right)|\le C\cdot {\alpha }^{l+1}\right)=1,$ (14)

respectively. A justification may be offered.

4. Maximum Likelihood Estimation

First, the parameter space, say $\Theta$, must be considered; the fixed $k\in ℕ$, categories ${v}_{1},\cdots ,{v}_{k}$ and order $p,q\in {ℕ}_{0}$, $p+q\in ℕ$ are attached to it:

The parameter space $\Theta$ is a set that includes all candidate parameter vectors $\pi \in \Theta$ that model a process of interest $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ using a TARMA (k, p, q) equation with a predetermined form of interdependence, such that conditions (C3), (C4) and (C5) are satisfied (plus any extra requirements added in this section).

Now, suppose that $\left\{{X}_{1},\cdots ,{X}_{T}\right\}$ have been made available from (1), and the target is to estimate the true parameters of the model, say ${\pi }_{0}\in \Theta$, based on these observations. The likelihood expressed as a function of $\pi$, takes the form

$L\left(\pi \right)=\underset{t=1}{\overset{T}{\prod }}\left({\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)\right),$

where

${\stackrel{˜}{p}}_{t,0}\left(\pi \right):=ℙ\left({X}_{t}=x;\pi \right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{if}\text{ }\text{\hspace{0.17em}}{X}_{t}=x,\text{\hspace{0.17em}}\text{ }\text{where}\text{ }\text{\hspace{0.17em}}x=0,{v}_{1},\cdots ,{v}_{k},$

and for the general $i\in ℕ$, it is written

${\stackrel{˜}{p}}_{t,i}\left(\pi \right):=ℙ\left({X}_{t}=x|{X}_{t-1},\cdots ,{X}_{t-i};\text{\hspace{0.17em}}\pi \right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{if}\text{ }\text{\hspace{0.17em}}{X}_{t}=x,\text{\hspace{0.17em}}\text{ }\text{where}\text{ }\text{\hspace{0.17em}}x=0,{v}_{1},\cdots ,{v}_{k};$

${\stackrel{˜}{p}}_{t,i}$ are functions of ${X}_{t-1},\cdots ,{X}_{t-i}$ as well as ${X}_{t}$ and it is considered that these (conditional) probabilities result from $\left\{{X}_{t},\text{\hspace{0.17em}}t\in ℤ\right\}$ as if those have been generated by $\pi \in \Theta$. Of course, the natural logarithm of the likelihood may be taken

$l\left(\pi \right)=\underset{t=1}{\overset{T}{\sum }}\mathrm{ln}\left({\stackrel{˜}{p}}_{t,t-1}\left( \pi \right)\right)$

as the maximum likelihood estimators $\stackrel{^}{\pi }$ then satisfy the equations

${\frac{\partial }{\partial \pi }l\left(\pi \right)|}_{\pi =\stackrel{^}{\pi }}=0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{or}\text{ }\text{\hspace{0.17em}}\text{\hspace{0.17em}}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left(\stackrel{^}{\pi }\right)}{\frac{\partial }{\partial \pi }{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi =\stackrel{^}{\pi }}=0.$ (15)

4.1. Weak Convergence of Estimators

For $\pi \in \Theta$ and any $t\in ℤ$, it is written

${p}_{t}\left(\pi \right):=ℙ\left({X}_{t}=x\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{X}_{t-i},\text{\hspace{0.17em}}i\in ℕ;\pi \right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{if}\text{ }\text{\hspace{0.17em}}{X}_{t}=x,\text{\hspace{0.17em}}\text{ }\text{where}\text{ }\text{\hspace{0.17em}}x=0,{v}_{1},\cdots ,{v}_{k},$

which is a function of ${X}_{t-i}$, $i\in {ℕ}_{0}$ and the (conditional) distribution law is generated by $\pi \in \Theta$.

According to (12) (thanks to invertibility) and for any $\pi \in \Theta$ and ${x}_{1},\cdots ,{x}_{T}=0,{v}_{1},\cdots ,{v}_{k}$, it holds as $T\to \infty$ that

$\begin{array}{l}\left(\frac{ℙ\left({X}_{1}={x}_{1};\pi \right)}{ℙ\left({X}_{1}={x}_{1}|{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\pi \right)}\frac{ℙ\left({X}_{2}={x}_{2}|{X}_{1}={x}_{1};\pi \right)}{ℙ\left({X}_{2}={x}_{2}|{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\pi \right)}\cdots \\ {\frac{ℙ\left({X}_{T}={x}_{T}|{X}_{T-1}={x}_{T-1},\dots ,{X}_{1}={x}_{1};\pi \right)}{ℙ\left({X}_{T}={x}_{T}|{X}_{T-1}={x}_{T-1},\cdots ,{X}_{1}={x}_{1},{X}_{1-n},n\in ℕ;\pi \right)}\right)}^{1/T}\stackrel{P}{\to }1,\end{array}$ (16)

because the geometric mean weighs with (rather than just multiplies) the previous values: as the new values get closer to 1, so must the mean itself. The geometric mean from (T + 1) observations is a weighted product mean of the geometric mean of the T observations and the (T + 1)th observation (with a ${C}_{A}$ adjusted constant for the ratio of probabilities to one, by allowing the exponential rate for the geometric mean of ratios, this becomes apparent with the bound ${\left\{{C}_{A}^{T}\cdot {\alpha }^{\frac{\left(T+1\right)T}{2}}\right\}}^{1/T}\to 0$ as $T\to \infty$ ).

Once the fixated sample series ${x}_{1},\cdots ,{x}_{T}$ has been collected (this has been generated by ${\pi }_{0}$ ), remember that the maximum likelihood estimate $\stackrel{^}{\pi }$ is set that way, such that it is true for any $\pi \in \Theta$, that

$\begin{array}{l}\underset{t=1}{\overset{T}{\prod }}\text{ }\text{ }ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1};\stackrel{^}{\pi }\right)\\ \ge \underset{t=1}{\overset{T}{\prod }}\text{ }\text{ }ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1};\pi \right)\end{array}$

where the (conditional) probabilities (the distribution law of ${X}_{1},\cdots ,{X}_{T}$ ) are calculated as if $\pi$ are the real parameters that generated this realization. Thanks to (16) (and invertibility), this becomes

${\left(\underset{t=1}{\overset{T}{\prod }}\frac{ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\stackrel{^}{\pi }\right)}{ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\pi \right)}\right)}^{1/T}\ge 1$

with probability that tends to one as $T\to \infty$. Thinking about the random variables (rather than the realizations) now, it can be written

$\begin{array}{l}{\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\stackrel{^}{\pi }\right)}{{p}_{t}\left(\pi \right)}\right)}^{1/T}\\ =\underset{{x}_{1},\cdots ,{x}_{T}=0,{v}_{1},\cdots ,{v}_{k}}{\prod }{\left\{{\left(\underset{t=1}{\overset{T}{\prod }}\frac{ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\stackrel{^}{p}\right)}{ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;p\right)}\right)}^{1/T}\right\}}^{{f}_{{x}_{1}}\left({X}_{1}\right)\cdots {f}_{{x}_{T}}\left({X}_{T}\right)},\end{array}$

where ${X}_{1},\cdots ,{X}_{T}$ are the same random variables for both numerator/denominator with realizations as in ${f}_{{x}_{1}}\left({X}_{1}\right)\cdots {f}_{{x}_{T}}\left({X}_{T}\right)$ (those have been generated by the real ${\pi }_{0}$ ), and it is verified that

$\underset{T\to \infty }{\mathrm{lim}}ℙ\left({\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\stackrel{^}{\pi }\right)}{{p}_{t}\left(\pi \right)}\right)}^{1/T}\ge 1\right)=1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{for}\text{ }\text{\hspace{0.17em}}\text{ }\text{any}\text{ }\text{\hspace{0.17em}}\pi \in \Theta .$ (17)

In the other hand, it is true for any $\pi \in \Theta$, that

$\mathbb{E}\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\pi \right)}{{p}_{t}\left({\pi }_{0}\right)}\right)=\mathbb{E}\left(\mathbb{E}\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\pi \right)}{{p}_{t}\left({\pi }_{0}\right)}|{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;{\pi }_{0}\right)\right)$

and it is essential that it is the “real” ${\pi }_{0}$ that generates the random variables and governs the (conditional) expectation, i.e.,

$\begin{array}{l}\mathbb{E}\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\pi \right)}{{p}_{t}\left({\pi }_{0}\right)}|{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;{\pi }_{0}\right)\\ =\underset{{x}_{1},\cdots ,{x}_{T}=0,{v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }ℙ\left({X}_{1}={x}_{1},\cdots ,{X}_{T}={x}_{T}|{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;{\pi }_{0}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\cdot \underset{t=1}{\overset{T}{\prod }}\frac{ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\pi \right)}{ℙ\left({X}_{t}={x}_{t}|{X}_{t-1}={x}_{t-1},\cdots ,{X}_{1}={x}_{1},\text{\hspace{0.17em}}{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;{\pi }_{0}\right)}\\ \equiv \underset{{x}_{1},\cdots ,{x}_{T}=0,{v}_{1},\cdots ,{v}_{k}}{\sum }\text{ }\text{ }ℙ\left({X}_{1}={x}_{1},\cdots ,{X}_{T}={x}_{T}|{X}_{1-n},\text{\hspace{0.17em}}n\in ℕ;\pi \right)=1:\end{array}$

the last statement is true since these are probabilities adding to one (regardless of the $\pi \in \Theta$ ). Then it holds that

$\mathbb{E}\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\pi \right)}{{p}_{t}\left({\pi }_{0}\right)}\right)=1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{for}\text{ }\text{\hspace{0.17em}}\text{ }\text{any}\text{ }\text{\hspace{0.17em}}\pi \in \Theta .$

As in Jensen’s inequality (use the function $g\left(y\right)={y}^{1/T}$, $y>0$ with ${g}^{″}\left(y\right)=\left(1/T\right)\left(\left(1-T\right)/T\right){y}^{\left(1/T\right)-2}<0$ for $T>1$ ), this last statement can be transformed to

$\mathbb{E}\left\{{\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\pi \right)}{{p}_{t}\left({\pi }_{0}\right)}\right)}^{1/T}\right\}\le 1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{for}\text{ }\text{\hspace{0.17em}}\text{ }\text{any}\text{ }\text{\hspace{0.17em}}\pi \in \Theta ,$ (18)

with the equality holding if and only if $\pi ={\pi }_{0}$ (it is considered here that ${p}_{t}\left({\pi }_{1}\right)={p}_{t}\left({\pi }_{2}\right)$, ${\pi }_{1},{\pi }_{2}\in \Theta$ can only result from ${\pi }_{1}={\pi }_{2}$ ).

In order to conclude, one can combine (17) and write

$\underset{T\to \infty }{\mathrm{lim}}ℙ\left({\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\stackrel{^}{\pi }\right)}{{p}_{t}\left({\pi }_{0}\right)}\right)}^{1/T}\ge 1\right)=1$

with (18) that writes

$\mathbb{E}\left\{{\left(\underset{t=1}{\overset{T}{\prod }}\frac{{p}_{t}\left(\stackrel{^}{\pi }\right)}{{p}_{t}\left({\pi }_{0}\right)}\right)}^{1/T}\right\}\le 1$

and they are both satisfied when

${\left(\underset{t=1}{\overset{T}{\prod }}\text{ }\text{ }{p}_{t}\left(\stackrel{^}{\pi }\right)\right)}^{1/T}-{\left(\underset{t=1}{\overset{T}{\prod }}\text{ }\text{ }{p}_{t}\left({\pi }_{0}\right)\right)}^{1/T}\stackrel{P}{\to }0\text{\hspace{0.17em}}\text{ }\text{ }\text{as}\text{ }\text{ }\text{\hspace{0.17em}}T\to \infty$

or, in other words, when

$\stackrel{^}{\pi }\stackrel{P}{\to }{\pi }_{0}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty .$ (19)

4.2. Asymptotic Distribution

Thanks to a Taylor’s expansion, (15) can be turned to

$\begin{array}{l}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\frac{\partial }{\partial {\pi }_{i}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi ={\pi }_{0}}+\underset{t=1}{\overset{T}{\sum }}\text{\hspace{0.17em}}\underset{j}{\sum }\left\{\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\frac{{\partial }^{2}}{\partial {\pi }_{j}\partial {\pi }_{i}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi ={\pi }_{0}}\\ -\frac{1}{{\stackrel{˜}{p}}_{t,t-1}^{2}\left({\pi }_{0}\right)}\left({\frac{\partial }{\partial {\pi }_{j}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi ={\pi }_{0}}\right)\left({\frac{\partial }{\partial {\pi }_{i}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi ={\pi }_{0}}\right)\right\}\left({\stackrel{^}{\pi }}_{j}-{\pi }_{j,0}\right)\\ +\underset{t=1}{\overset{T}{\sum }}\text{ }\text{ }{E}_{t,i}\left(\stackrel{^}{\pi }\right)=0,\end{array}$

where i (and j) are indexes that refer to all different scalar $\pi$ in $\pi \in \Theta$ and ${E}_{t,i}\left(\stackrel{^}{\pi }\right)$ are scalars; to understand their role better, consider the expansion for fixed t and define the function

$\begin{array}{c}{e}_{t,i}\left(\pi \right):=\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)}\frac{\partial }{\partial {\pi }_{i}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}-\underset{x=0,1}{\sum }\frac{{\partial }^{x}}{\partial {\pi }^{x}}{\left(\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)}\frac{\partial }{\partial {\pi }_{i}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)\right)|}_{\pi ={\pi }_{0}}\frac{{\left(\pi -{\pi }_{0}\right)}^{x}}{x!}\end{array}$

with “ $x=0$ ” applying no derivative at all to the function ( ${\left(\pi -{\pi }_{0}\right)}^{0}\equiv 1$ ) and “ $x=1$ ” being the usual row vector of first derivatives (times a column vector): then it is clear that $e\left({\pi }_{0}\right)=0$. Due to (19) and the continuity of the function e at ${\pi }_{0}$, it can be concluded that $e\left(\stackrel{^}{\pi }\right)\stackrel{P}{\to }0$ as $T\to \infty$ (without worrying about the $\left(t,t-1\right)$ label, i.e., the convergence to zero takes place anyway for $T\to \infty$ ). It will be taken for granted then that

$\frac{{\sum }_{t=1}^{T}{E}_{t,i}\left(\stackrel{^}{\pi }\right)}{...}\stackrel{P}{\to }0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty ,$ (20)

where “…” is the convenient divisor T or even T1/2. Note that (20) has been justified without the presumption of existence of a higher than second derivative of the conditional probabilities under study.

After omitting the extra terms, all the equations are stacked to come up with

$\underset{t=1}{\overset{T}{\sum }}\left(\frac{1}{{\stackrel{˜}{p}}_{t,t-1}^{2}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t}{\stackrel{˜}{\delta }}_{t}^{\tau }-\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{D}}_{t}\right)\left(\stackrel{^}{\pi }-{\pi }_{0}\right)=\underset{t=1}{\overset{T}{\sum }}\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t},$

where ${\stackrel{˜}{\delta }}_{t}$ is the column vector of ${\frac{\partial }{\partial {\pi }_{i}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi ={\pi }_{0}}$, $i=1,2,\cdots$ and ${\stackrel{˜}{D}}_{t}$ is the matrix with elements ${\frac{{\partial }^{2}}{\partial {\pi }_{i}\partial {\pi }_{j}}{\stackrel{˜}{p}}_{t,t-1}\left(\pi \right)|}_{\pi ={\pi }_{0}}$, $i,j=1,2,\cdots$ and “ $\tau$ ” stands for the transpose operator. It is re-written that

$\begin{array}{l}{T}^{1/2}\left(\stackrel{^}{\pi }-{\pi }_{0}\right)\\ ={\left\{\frac{1}{T}\underset{t=1}{\overset{T}{\sum }}\left(\frac{1}{{\stackrel{˜}{p}}_{t,t-1}^{2}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t}{\stackrel{˜}{\delta }}_{t}^{\tau }-\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{D}}_{t}\right)\right\}}^{-1}{T}^{-1/2}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t}.\end{array}$ (21)

Next, consider ${\delta }_{t}$ to be the column vector of ${\frac{\partial }{\partial {\pi }_{i}}{p}_{t}\left(\pi \right)|}_{\pi ={\pi }_{0}}$, $i=1,2,\cdots$.

Lemma 4.1: It holds that

${T}^{-1/2}\underset{t=1}{\overset{T}{\sum }}\left(\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}-\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t}\right)\stackrel{P}{\to }0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty .$

Proof: This may be presented.

Once Lemma 4.1 has been established, Equation (22) should be replaced by

${T}^{1/2}\left(\stackrel{^}{\pi }-{\pi }_{0}\right)={\left\{\frac{1}{T}\underset{t=1}{\overset{T}{\sum }}\left(\frac{1}{{\stackrel{˜}{p}}_{t,t-1}^{2}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t}{\stackrel{˜}{\delta }}_{t}^{\tau }-\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{D}}_{t}\right)\right\}}^{-1}{T}^{-1/2}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}.$ (22)

To proceed further, consider ${D}_{t}$ to be the matrix with elements ${\frac{{\partial }^{2}}{\partial {\pi }_{i}\partial {\pi }_{j}}{p}_{t}\left(\pi \right)|}_{\pi ={\pi }_{0}}$, $i,j=1,\cdots$.

Lemma 4.2: It holds that

$\frac{1}{T}\underset{t=1}{\overset{T}{\sum }}\left(\left(\frac{1}{{\stackrel{˜}{p}}_{t,t-1}\left({\pi }_{0}\right)}{\stackrel{˜}{D}}_{t}-\frac{1}{{\stackrel{˜}{p}}_{t,t-1}^{2}\left({\pi }_{0}\right)}{\stackrel{˜}{\delta }}_{t}{\stackrel{˜}{\delta }}_{t}^{\tau }\right)-\left(\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{D}_{t}-\frac{1}{{p}_{t}^{2}\left({\pi }_{0}\right)}{\delta }_{t}{\delta }_{t}^{\tau }\right)\right)\stackrel{P}{\to }O$

as $T\to \infty$, where $O$ (use bold for emphasis) is the square matrix of zeros.

Proof: This may be presented.

Once Lemma 4.2 has been established, Equation (22) should be replaced by

${T}^{1/2}\left(\stackrel{^}{\pi }-{\pi }_{0}\right)={\left\{\frac{1}{T}\underset{t=1}{\overset{T}{\sum }}\left(\frac{1}{{p}_{t}^{2}\left({\pi }_{0}\right)}{\delta }_{t}{\delta }_{t}^{\tau }-\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{D}_{t}\right)\right\}}^{-1}{T}^{-1/2}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}.$ (23)

Theorem 4.3: It holds that

${T}^{-1/2}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}\stackrel{D}{\to }N\left(0,\text{Var}\left(\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}\right)\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty .$

Proof: This may be presented.

Lemma 4.4: It holds that

$\frac{1}{T}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{p}_{t}^{2}\left({\pi }_{0}\right)}{\delta }_{t}{\delta }_{t}^{\tau }\stackrel{P}{\to }\text{Var}\left(\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty .$

Proof: This may be presented.

Lemma 4.5: It holds that

$\frac{1}{T}\underset{t=1}{\overset{T}{\sum }}\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{D}_{t}\stackrel{P}{\to }O\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty .$

Proof: This may be presented.

5. Empirical Illustrations

This section serves in practice, when for special cases of TARMA models the maximum likelihood estimation takes place, so that (i) it is refreshed how to compute joint probabilities to insert in the likelihood (and estimate) and, more importantly, (ii) it is examined how well the estimators perform for moderate sample sizes: for both (i), (ii) particular interest lies in $q\ge 1$, when “naïve” moments estimates cannot be computed directly from the data, as opposed to the more traditional TAR cases. At this point, it is reminded that the TAR is a special case of the TARMA model with AR parameters only and explicit results that have been obtained for the Markov chains. The TARMA with the inclusion of MA parts is a parsimonious way to succeed the infinite order of a TAR model and where the paper interest lies.

Hence the groundwork equation

$\begin{array}{c}{X}_{t}={I}_{t}\left(1-{X}_{t-1}\right)\left(1-{I}_{t-1}\right)+{I}_{t}^{\left(1|0\right)}{X}_{t-1}\left(1-{I}_{t-1}\right)\\ \text{\hspace{0.17em}}+{I}_{t}^{\left(0|1\right)}\left(1-{X}_{t-1}\right){I}_{t-1}+{I}_{t}^{\left(1|1\right)}{X}_{t-1}{I}_{t-1}\end{array}$ (24)

will, in general, govern a series $\left\{{X}_{t}\right\}~\text{TARMA}\left(1,1,1\right)$ (with realizations in $\left\{0,1\right\}$ ). By writing ${\psi }_{\left(i|j\right)}:=ℙ\left({X}_{t}=i,{I}_{t}=j\right)$, $i,j=0,1$, then from the parameters $\pi =ℙ\left({I}_{t}=1\right)$ and ${\pi }_{|v}^{\ast \left(i|j\right)}=ℙ\left({I}_{t}^{\left(i|j\right)}=1|{I}_{t}=v\right)$, $\left(i,j\right)\ne \left(0,0\right)$, $v=0,1$ ( ${\pi }^{\left(i|j\right)}=ℙ\left({I}_{t}^{\left(i|j\right)}=1\right)\equiv \pi {\pi }_{|1}^{\ast \left(i|j\right)}+\left(1-\pi \right){\pi }_{|0}^{\ast \left(i|j\right)}$ ), [1] has contributed a methodology that eventually (under causality) computes the $\psi$ as in

$\begin{array}{l}{\psi }_{\left(1|1\right)}=\left\{\pi \left[\left(1-\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|0\right)}\right)\left(1-\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right)\right)-\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}\pi \left(1-{\pi }_{|1}^{\ast \left(1|0\right)}\right)\right]\right\}\\ \text{ }/\left\{\left(1+\pi -\pi {\pi }_{|1}^{\ast \left(1|1\right)}\right)\left[\left(1-\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|0\right)}\right)\left(1-\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right)\right)-\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}\pi \left(1-{\pi }_{|1}^{\ast \left(1|0\right)}\right)\right]\\ +\left(\pi -\pi {\pi }_{|1}^{\ast \left(1|0\right)}\right)\left[\left(1-\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right)\right)\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|1\right)}+\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}\pi \left(1-{\pi }_{|1}^{\ast \left(1|1\right)}\right)\right]\\ +\left(\pi -\pi {\pi }_{|1}^{\ast \left(0|1\right)}\right)\left[\pi \left(1-{\pi }_{|1}^{\ast \left(1|0\right)}\right)\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|1\right)}+\left(1-\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|0\right)}\right)\pi \left(1-{\pi }_{|1}^{\ast \left(1|1\right)}\right)\right]\right\},\end{array}$ (25)

followed by

${\psi }_{\left(1|0\right)}=\frac{\left(1-\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right)\right)\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|1\right)}+\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}\pi \left(1-{\pi }_{|1}^{\ast \left(1|1\right)}\right)}{\left(1-\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|0\right)}\right)\left(1-\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right)\right)-\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}\pi \left(1-{\pi }_{|1}^{\ast \left(1|0\right)}\right)}{\psi }_{\left(1|1\right)}$ (26)

and

${\psi }_{\left(0|1\right)}=\frac{\pi \left(1-{\pi }_{|1}^{\ast \left(1|0\right)}\right)\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|1\right)}+\left(1-\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|0\right)}\right)\pi \left(1-{\pi }_{|1}^{\ast \left(1|1\right)}\right)}{\left(1-\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|0\right)}\right)\left(1-\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right)\right)-\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}\pi \left(1-{\pi }_{|1}^{\ast \left(1|0\right)}\right)}{\psi }_{\left(1|1\right)}$

(and ${\psi }_{\left(0|0\right)}=1-{\psi }_{\left(1|0\right)}-{\psi }_{\left(0|1\right)}-{\psi }_{\left(1|1\right)}$ ). (27)

Additionally, after writing for $h\in ℕ$ that ${\psi }_{\left(\left({i}_{1},\cdots ,{i}_{h+1}\right)|j\right)}\left(h\right):=ℙ\left({X}_{t}={i}_{1},\cdots ,{X}_{t-h}={i}_{h+1},{I}_{t}=j\right)$ (and ${\psi }_{\left(i|j\right)}\left(0\right)\equiv {\psi }_{\left(i|j\right)}$ ), the methodology also contributes the recursive formulae

${\psi }_{\left(\left(1,0,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h+1\right)=\pi {\psi }_{\left(\left(0,{i}_{2},\cdots ,{i}_{h+1}\right)|0\right)}\left(h\right)+\pi {\pi }_{|1}^{\ast \left(0|1\right)}{\psi }_{\left(\left(0,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h\right),$ (28)

${\psi }_{\left(\left(1,1,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h+1\right)=\underset{j=0,1}{\sum }\text{ }\text{ }\pi {\pi }_{|1}^{\ast \left(1|j\right)}{\psi }_{\left(\left(1,{i}_{2},\cdots ,{i}_{h+1}\right)|j\right)}\left(h\right),$

${\psi }_{\left(\left(1,0,{i}_{2},\cdots ,{i}_{h+1}\right)|0\right)}\left(h+1\right)=\left(1-\pi \right){\pi }_{|0}^{\ast \left(0|1\right)}{\psi }_{\left(\left(0,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h\right),$

${\psi }_{\left(\left(1,1,{i}_{2},\cdots ,{i}_{h+1}\right)|0\right)}\left(h+1\right)=\underset{j=0,1}{\sum }\left(1-\pi \right){\pi }_{|0}^{\ast \left(1|j\right)}{\psi }_{\left(\left(1,{i}_{2},\cdots ,{i}_{h+1}\right)|j\right)}\left(h\right),$

and the recursive formulae

${\psi }_{\left(\left(0,0,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h+1\right)=\pi \left(1-{\pi }_{|1}^{\ast \left(0|1\right)}\right){\psi }_{\left(\left(0,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h\right),$

${\psi }_{\left(\left(0,1,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h+1\right)=\underset{j=0,1}{\sum }\text{ }\text{ }\pi \left(1-{\pi }_{|1}^{\ast \left(1|j\right)}\right){\psi }_{\left(\left(1,{i}_{2},\cdots ,{i}_{h+1}\right)|j\right)}\left(h\right),$

$\begin{array}{l}{\psi }_{\left(\left(0,0,{i}_{2},\cdots ,{i}_{h+1}\right)|0\right)}\left(h+1\right)\\ =\left(1-\pi \right){\psi }_{\left(\left(0,{i}_{2},\dots ,{i}_{h+1}\right)|0\right)}\left(h\right)+\left(1-\pi \right)\left(1-{\pi }_{|0}^{\ast \left(0|1\right)}\right){\psi }_{\left(\left(0,{i}_{2},\cdots ,{i}_{h+1}\right)|1\right)}\left(h\right),\end{array}$

${\psi }_{\left(\left(0,1,{i}_{2},\cdots ,{i}_{h+1}\right)|0\right)}\left(h+1\right)=\underset{j=0,1}{\sum }\left(1-\pi \right)\left(1-{\pi }_{|0}^{\ast \left(1|j\right)}\right){\psi }_{\left(\left(1,{i}_{2},\cdots ,{i}_{h+1}\right)|j\right)}\left(h\right),$ (29)

from which the probabilities

$ℙ\left({X}_{t}={i}_{0},{X}_{t-1}={i}_{1},\cdots ,{X}_{t-\left(h+1\right)}={i}_{h+1}\right)=\underset{j=0,1}{\sum }\text{ }\text{ }{\psi }_{\left(\left({i}_{0},{i}_{1},\dots ,{i}_{h+1}\right)|j\right)}\left(h+1\right)$

can be computed for ${i}_{0},{i}_{1},\cdots ,{i}_{h+1}=0,1$.

To generate ${X}_{t}$, $t=1,\cdots ,T$, it has been used that ${X}_{t}={Y}_{t}+{\sum }_{i\in ℕ}\left({\prod }_{j=0}^{i-1}{\left(DY\right)}_{t-j}\right){Y}_{t-i}$, ${Y}_{t}:={I}_{t}^{\left(0|1\right)}{I}_{t-1}+{I}_{t}\left(1-{I}_{t-1}\right)$, ${\left(DY\right)}_{t}:=\left({I}_{t}^{\left(1|1\right)}-{I}_{t}^{\left(0|1\right)}\right){I}_{t-1}+\left({I}_{t}^{\left(1|0\right)}-{I}_{t}\right)\left(1-{I}_{t-1}\right)$, with the sum being held up to a finite constant $c=20$ for $t=1$, and so on. The two presented conditions for causality and invertibility write together that

$\pi +\mathbb{E}|d{f}_{1}\left({I}_{t}^{\left(0|1\right)}\right)|,\mathbb{E}|d{f}_{1}\left({I}_{t}^{\left(1|0\right)}\right)|+\mathbb{E}|{d}^{2}{f}_{1}\left({I}_{t}^{\left(1|1\right)}\right)|,$

$\underset{i,j=0,1}{\mathrm{max}}\left\{{\pi }^{\left(i|j\right)}\right\}+\mathbb{E}|d{f}_{1}\left({I}_{t}^{\left(1|0\right)}\right)|,\mathbb{E}|d{f}_{1}\left({I}_{t}^{\left(0|1\right)}\right)|+\mathbb{E}|{d}^{2}{f}_{1}\left({I}_{t}^{\left(1|1\right)}\right)|<1$

(the ${d}^{h}{f}_{1}\left(..\right)$, $h=1,2$ notation was re-introduced in the very beginning of Section 3). Hence by imposing the strong requirement ${\mathrm{max}}_{i,j=0,1}\left\{{\pi }^{\left(i|j\right)}\right\}<1/6$, the series is secured to be where it should be (without worrying about the interdependence scenario).

Firstly the case $\pi =0.165$, ${\pi }^{\left(1|0\right)}=0.152$, ${\pi }^{\left(0|1\right)}=0.125$ and ${\pi }^{\left(1|1\right)}=0.108$ (and under interindependence) has been studied, for a number of observations $T=6,10,14$ (the maximum sample size 14 has been picked due to a purely technical reason of restoring in one vector ${2}^{T+1}$ probabilities for the last recursion, or otherwise a modification must take place in the code). A search of the four parameters $0<{\pi }^{\left(...\right)}<0.1667$, ${\pi }^{\left(...\right)}={\pi }^{\left(...\right)}+0.01$ has materialized and for each attempted value the probabilities (25), (26), (27) and recursions (28)...(29) must be computed: the likelihood is none other than one of these final probabilities indicated by the position of zero-one(s) in the sample series (this is the same position regardless of the parameter values under search). Then the attempted parameter values that offered the biggest of those probabilities are taken to be the maximum likelihood estimates.

Meanwhile, an opportunity of a comparison with the TAR model (of order $p=2$ ) with same parameters ( $\pi =0.165$, ${\pi }^{\left(10\right)}=0.152$, ${\pi }^{\left(01\right)}=0.125$ and ${\pi }^{\left(11\right)}=0.108$ ) has arisen, i.e., $\left\{{X}_{1},\cdots ,{X}_{T}\right\}$ have now been born from

$\begin{array}{c}{X}_{t}={I}_{t}\left(1-{X}_{t-1}\right)\left(1-{X}_{t-2}\right)+{I}_{t}^{\left(10\right)}{X}_{t-1}\left(1-{X}_{t-2}\right)\\ \text{\hspace{0.17em}}+{I}_{t}^{\left(01\right)}\left(1-{X}_{t-1}\right){X}_{t-2}+{I}_{t}^{\left(11\right)}{X}_{t-1}{X}_{t-2}.\end{array}$

Note that it can be derived ${X}_{t}={I}_{t}+{\sum }_{n\in ℕ}\text{ }\text{ }{C}_{t,n}^{\left(n-1\right)}{I}_{t-n}$ with ${C}_{t,1}^{\left(0\right)}:=D{I}_{t}^{\left(10\right)}\equiv {I}_{t}^{\left(10\right)}-{I}_{t}$, ${C}_{t,2}^{\left(0\right)}:=D{I}_{t}^{\left(01\right)}\equiv {I}_{t}^{\left(01\right)}-{I}_{t}$, ${C}_{t,\left(1,2\right)}^{\left(0\right)}:={D}^{2}{I}_{t}^{\left(11\right)}\equiv {I}_{t}^{\left(11\right)}-{I}_{t}^{\left(10\right)}-{I}_{t}^{\left(01\right)}+{I}_{t}$ and for $n\in ℕ$

$\begin{array}{l}{C}_{t,n+1}^{\left(n\right)}:={C}_{t,n+1}^{\left(n-1\right)}+{C}_{t,n}^{\left(n-1\right)}D{I}_{t-n}^{\left(10\right)}+{C}_{t,\left(n,n+1\right)}^{\left(n-1\right)}\text{\hspace{0.17em}}{I}_{t-n}^{\left(10\right)},\text{\hspace{0.17em}}{C}_{t,n+2}^{\left(n\right)}:={C}_{t,n}^{\left(n-1\right)}\text{\hspace{0.17em}}D{I}_{t-n}^{\left(01\right)},\text{\hspace{0.17em}}\text{ }\text{and}\text{ }\\ {C}_{t,\left(n+1,n+2\right)}^{\left(n\right)}:={C}_{t,n}^{\left(n-1\right)}\text{\hspace{0.17em}}{D}^{2}{I}_{t-n}^{\left(11\right)}+{C}_{t,\left(n,n+1\right)}^{\left(n-1\right)}\text{\hspace{0.17em}}{D}^{*}{I}_{t-n}^{\left(11\right)},\text{\hspace{0.17em}}{D}^{*}{I}_{t}^{\left(11\right)}:={I}_{t}^{\left(11\right)}-{I}_{t}^{\left(10\right)}.\end{array}$

Computing the probabilities $ℙ\left({X}_{t}=i,{X}_{t-1}=j\right)$, $i,j=0,1$ (from the same methodology and under causality) is straightforward, while the condition for causality now requires $\left[{\left(1+{\beta }^{*}{\sum }_{m=0}^{1}\text{ }\text{ }{\sum }_{{m}_{1}=0}^{m}\text{ }\text{ }1\right)}^{2}-1\right]<1$ or ${\beta }^{*}<\frac{\sqrt{2}-1}{3}\approx 0.138$, where

${\beta }^{*}:=\mathrm{max}\left\{\pi ,\mathrm{max}\left\{\mathbb{E}|D{I}_{t}^{\left(10\right)}|,\mathbb{E}|D{I}_{t}^{\left(01\right)}|\right\},\mathbb{E}|{D}^{2}{I}_{t}^{\left(11\right)}|\right\}:$

so it might be that it is violated (the search is no different than for the TARMA above).

According to Table 1, it is indeed remarkable how the performances for the two models almost coincide. That is such an encouraging sign for the TARMA, as it tempts the researcher to dare include moving-average parts and benefit from the rare flexibility, if they are willing to take the extra computational burden of the recursions: exact likelihoods can be computed as functions of the parameters and, even for small sample sizes, the estimators seem to perform quite satisfactorily. It is highlighted that the precision here is as returned by the code with no formal justification attached to it (for a smaller $R=100$ repetitions, the differences to the table were still minimal). From Table 1 it looks like there is a better bias (and MSE) performance of the $\pi$ estimator over the other three ones, so this is investigated further.

Table 2 picture still favors the $\pi$ estimator with a low absolute bias: the two ${\pi }^{\left(0|1\right)}$, ${\pi }^{\left(1|1\right)}$ estimators score the highest bias, naturally resulting from the real values being quite high. Similarly, for both numerical cases the ${\pi }^{\left(1|0\right)}$ estimator, with the lowest real value, escapes the high bias values without outperforming the $\pi$ in the same department though. The conclusions for the variance can be reversed. At this stage, it is highlighted that the easily obtainable (conditional

Table 1. Approximate (from $R=1000$ replications) bias, variance and Mean Squared Error of the Maximum Likelihood Estimators based on T consecutive observations from the TARMA (1, 1) and TAR (2) models ( $k=1$, Bernoulli variables) with real values $\pi =0.165$, ${\pi }^{\left(1|0\right)}=0.152$, ${\pi }^{\left(0|1\right)}=0.125$ and ${\pi }^{\left(1|1\right)}=0.108$ (under interindependence).

Table 2. Approximate (from $R=10000$ replications) bias, variance and Mean Squared Error of the Maximum Likelihood Estimators based on $T=6$ consecutive observations from the TARMA (1, 1) model ( $k=1$, Bernoulli variables) with real values, for the “Case 1.1”, $\pi =0.105$, ${\pi }^{\left(1|0\right)}=0.081$, ${\pi }^{\left(0|1\right)}=0.148$, ${\pi }^{\left(1|1\right)}=0.152$, or for the “Case 1.2”, $\pi =0.165$, ${\pi }^{\left(1|0\right)}=0.081$, ${\pi }^{\left(0|1\right)}=0.152$ and ${\pi }^{\left(1|1\right)}=0.148$ (both under interindependence).

likelihood) asymptotic result (for the estimators ${\stackrel{^}{\pi }}^{\left(1\right)}$, $\stackrel{^}{\pi }$ of ${\pi }^{\left(1\right)}:=ℙ\left({X}_{t}=1|{X}_{t-1}=1\right)$, $\pi :=ℙ\left({X}_{t}=1|{X}_{t-1}=0\right)$, respectively) regarding the $\left\{{X}_{t}\right\}~\text{TAR}\left(1\right)$ model, i.e.,

${T}^{1/2}\left(\begin{array}{c}{\stackrel{^}{\pi }}^{\left(1\right)}-{\pi }^{\left(1\right)}\\ \stackrel{^}{\pi }-\pi \end{array}\right)\stackrel{D}{\to }N\left(\left(\begin{array}{c}0\\ 0\end{array}\right),\left(\begin{array}{cc}{\pi }^{\left(1\right)}\left(1-{\pi }^{\left(1\right)}\right)/\mathbb{E}\left\{{X}_{t}\right\}& 0\\ 0& \pi \left(1-\pi \right)/\mathbb{E}\left\{1-{X}_{t}\right\}\end{array}\right)\right)$

has also been approved by simulation indications (for sample sizes $T=10,20,30$ and $R=100$ ), but those will not be presented here.

The investigation is not over yet, as it is desirable to smooth out the differences in the bias of estimators for the TARMA (1, 1) model, and the solution to that could be an “agreement” between the different ${I}^{\left(...\right)}$ variables (at the same time t) via the presence of interdependence. Indeed according to Table 3 and for the only small sample size ( $T=6$ ), “Case 2.1” exhibits low biases in all but the -0.11726 approximation from the 1000 estimates of ${\pi }_{|1}^{\ast \left(0|1\right)}$ ; similarly, for the “Case 2.2” and excluding the highest (absolute) bias for $\pi$ (its true value has been set high), almost all other estimates are approximated to be near their real value. The variance results for both cases are very impressive too. Note that the conditions for causality-invertibility might not have been followed faithfully in Table 3 (hence the real value $\pi =0.25$ in “Case 2.2”) and the “agreement” search has been set to ≥0.85 for ${\pi }_{|1}^{\ast \left(...\right)}$, or ≤0.15 for ${\pi }_{|0}^{\ast \left(...\right)}$ (together with a search $\pi \le 0.1667$ that justifies the -0.11655 bias in “Case 2.2”).

Again it is reminded that the results obtained from fewer simulations (Table 2: $R=100,1000$, Table 3: $R=100$ ) are hardly any different. The simulation indications of this section favor the addition of moving-average parts for the modelling of strict stationarity, if one is willing to slightly inconvenience themselves with a more sophisticated code for the likelihood computation. There

Table 3. Approximate (from $R=1000$ replications) bias, variance and Mean Squared Error of the Maximum Likelihood Estimators based on $T=6$ consecutive observations from the TARMA (1, 1) model ( $k=1$, Bernoulli variables) with real values, for the “Case 2.1”, $\pi =0.165$, ${\pi }_{|1}^{\ast \left(1|0\right)}=0.95$, ${\pi }_{|0}^{\ast \left(1|0\right)}=0.03$, ${\pi }_{|1}^{\ast \left(0|1\right)}=0.98$, ${\pi }_{|0}^{\ast \left(0|1\right)}=0.05$, ${\pi }_{|1}^{\ast \left(1|1\right)}=0.966$, ${\pi }_{|0}^{\ast \left(1|1\right)}=0.044$, or for the “Case 2.2”, $\pi =0.25$, ${\pi }_{|1}^{\ast \left(1|0\right)}=0.95$, ${\pi }_{|0}^{\ast \left(1|0\right)}=0.03$, ${\pi }_{|1}^{\ast \left(0|1\right)}=0.98$, ${\pi }_{|0}^{\ast \left(0|1\right)}=0.05$ and ${\pi }_{|1}^{\ast \left(1|1\right)}=0.966$, ${\pi }_{|0}^{\ast \left(1|1\right)}=0.044$ (the interdependence is implied for both).

have been no signs of distinction between the auto-regressive and moving-average estimators’ merits: not only does this strengthen the views of Sections 4.1 and 4.2, but also it brings to mind the classic Gaussian ARMA gem of [8], that the moving-average estimation transforms in theory to an auto-regression like situation.

6. Conclusions and Extending to the χ2 Test for Stationarity

Straight from (23), Theorem 4.3 and Lemmas 4.4, 4.5, there is the important derivation

${T}^{1/2}\left(\stackrel{^}{\pi }-{\pi }_{0}\right)\stackrel{D}{\to }N\left(0,\text{Var}{\left(\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}\right)}^{-1}\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{as}\text{ }\text{\hspace{0.17em}}T\to \infty$ (30)

for the Maximum Likelihood Estimators of the parameters of a process that assigns its appearance to a causal and invertible TARMA model. The normal distribution convergence (30) implies the chi-square distribution convergence

$T{\left(\stackrel{^}{\pi }-{\pi }_{0}\right)}^{\tau }\text{Var}\left(\frac{1}{{p}_{t}\left({\pi }_{0}\right)}{\delta }_{t}\right)\left(\stackrel{^}{\pi }-{\pi }_{0}\right)\stackrel{D}{\to }{\chi }_{df}^{2},$ (31)

where $df$ equals the size of the parameter vector.

The asymptotic result (31) can be a real asset when testing whether a sample series has been generated by a stationary ( $ℤ$ -indexed) process: the null hypothesis will be of the form

H0: {Xt} is stationary with “distribution”...

where “distribution” is being used in the wide sense of a hyper-model that might specify as much information is allowed for the researcher (for example, marginal distribution, pairwise distribution,... or marginal and conditional distributions,... with or without knowledge of parameters etc). Under H0, a proper identification of k (this might not be necessary if the variables are discrete with “Bernoulli” being the best of cases), as well as p and q (some faint suggestions for setting $p=0$ or $q=0$ might be found in [1]) must take place to proceed. Identification issues must still be resolved in the case that the researcher is not testing an assumption, but opts for the alternative task of point and interval estimation. For either inference route taken, further research is mandatory, in order to approximate the variance matrix $\text{Var}\left({p}_{t}^{-1}{\delta }_{t}\right)$. The reader might wish to look for the special case of a causal TAR model, which due to its simplicity, is believed here that it should be supported as the most appropriate test for the non-parametric time series stationarity.

In a more structured context, an equivalent to the [14] ARMA methodology of identification, estimation, validation and forecasting can now become reality for the strictly stationary processes: any distribution (heavy-tailed, with extreme values, asymmetric or with levels of skewness) can be tamed by a proper categorization and then p and q need to be selected as well. This paper has contributed the complete guide for the second step with the best of all methods of estimation, by putting on the table all necessary properties of the TARMA maximum likelihood estimators.

In order to use the TARMA equation, the sample series needs to be produced by a stationary one. Any sign of persistent tendency, such as trends, seasonality or cycles forbid the use of TARMA: joint distributions (of equal “windows”) must look alike in time. For the second-order stationarity [15] give answers on how the series can be fixed to ARMA; or, there is a whole philosophy of modelling those tendencies directly. For the strict stationarity, one can still resort to (31), testing and computing whether the sample series is close enough to a hypothesis of hyper- dependence: a large value of the statistic, as compared to a chi-square value, will suggest that the hypothesis collapses. Otherwise, there are parametric/theoretical approaches that work under specific distributional assumptions.

To wrap it up, it is strongly believed that a precious link has been solidified between the “multiplicative linear” (as called in [1])—strictly stationary- times series and the non-parametric inferential statistics. The well-known ${\chi }^{2}$ test for the distribution of a variable that has generated a random sample, can be extended now to the ${\chi }^{2}$ test for the stationary-principled hyper-model, dominating a $ℤ$ -indexed process that has generated a sample series: as in the former case, when extra (than the prespecified) parameters need to be estimated, it might be worth investigating the reduction of degrees of freedom in the statistic ${\chi }^{2}$ distribution. Regardless of various such minor issues that might be studied further, the main achievement of this paper is that it has dealt quite satisfactorily with a complex problem: the inference for any stationary time series that can be clothed by a causal and invertible TARMA equation.

Acknowledgments

This is for the author’s beloved father (deceased).

Conflicts of Interest

The authors declare no conflicts of interest.

 [1] Dimitriou-Fakalou, C. (2019) The Table Auto-Regressive Moving-Average Model for (Categorical) Stationary Series: Statistical Properties (Causality; from the All Random to the Conditional Random). Journal of Nonparametric Statistics, 31, 31-63. https://doi.org/10.1080/10485252.2018.1527912 [2] Jacobs, P.A. and Lewis, A.W. (1978) Discrete Time Series Generated by Mixtures II: Asymptotic Properties. Journal of the Royal Statistical Society: Series B, 40, 222-228. https://doi.org/10.1111/j.2517-6161.1978.tb01667.x [3] Möller, T.A. and Wei, C.H. (2020) Generalized Discrete Autoregressive Moving-Average Models. Applied Stochastic Models in Business and Industry, 36, 641-659. https://doi.org/10.1002/asmb.2520 [4] Lomnicki, Z.A. and Zaremba, S.K. (1955) Some Applications of Zero-One Processes. Journal of the Royal Statistical Society: Series B, 17, 243-255. https://doi.org/10.1111/j.2517-6161.1955.tb00198.x [5] Joe, H. (1996) Time Series Models with Univariate Margins in the Convolution-Closed Infinitely Divisible Class. Journal of Applied Probability, 33, 664-677. https://doi.org/10.1017/S0021900200100105 [6] Kedem, B. (1980) Estimation of the Parameters in Stationary Autoregressive Processes after Hard Limiting. Journal of the American Statistical Association, 75, 146-153. https://doi.org/10.1080/01621459.1980.10477445 [7] Azzalini, A. (1983) Maximum Likelihood Estimation of Order m for Stationary Stochastic Processes. Biometrika, 70, 381-387. https://doi.org/10.1093/biomet/70.2.381 [8] Hannan, E.J. (1973) The Asymptotic Theory of Linear Time Series Models. Journal of Applied Probability, 10, 130-145. https://doi.org/10.1017/S0021900200042145 [9] Cui, Y. and Lund, R.B. (2009) A New Look at Time Series of Counts. Biometrika, 96, 781-792. https://doi.org/10.1093/biomet/asp057 [10] Roitershtein, A. and Zhong, Z. (2013) On Random Coefficient INAR(1) Processes. Science China Mathematics, 56, 177-200. https://doi.org/10.1007/s11425-012-4547-z [11] Davis, R.A., Fokianos, K., Holan, S.H., Joe, H., Livsey, J., Lund, R., Pipiras, V. and Ravishanker, N. (2021) Count Time Series: A Methodological Review. Journal of the American Statistical Association, 116, 1533-1547. https://doi.org/10.1080/01621459.2021.1904957 [12] Anderson, T.W. and Goodman, L.A. (1957) Statistical Inference about Markov Chains. The Annals of Mathematical Statistics, 28, 89-110. https://doi.org/10.1214/aoms/1177707039 [13] Klotz, J. (1973) Statistical Inference in Bernoulli Trials with Dependence. The Annals of Statistics, 1, 373-379. https://doi.org/10.1214/aos/1176342377 [14] Box, G.E.P. and Jenkins, G.M. (1970) Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco. [15] Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods. 2nd Edition, Springer-Verlag, New York. https://doi.org/10.1007/978-1-4419-0320-4