Discrete-Time Nonlinear Stochastic Optimal Control Problem Based on Stochastic Approximation Approach

In this paper, a computational approach is proposed for solving the discrete-time nonlinear optimal control problem, which is disturbed by a sequence of random noises. Because of the exact solution of such optimal control problem is impossible to be obtained, estimating the state dynamics is currently required. Here, it is assumed that the output can be measured from the real plant process. In our approach, the state mean propagation is applied in order to construct a linear model-based optimal control problem, where the model output is measureable. On this basis, an output error, which takes into account the differences between the real output and the model output, is defined. Then, this output error is minimized by applying the stochastic approximation approach. During the computation procedure, the stochastic gradient is established, so as the optimal solution of the model used can be updated iteratively. Once the convergence is achieved, the iterative solution approximates to the true optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, an example on a continuous stirred-tank reactor problem is studied, and the result obtained shows the applicability of the approach proposed. Hence, the efficiency of the approach proposed is highly recommended.

Share and Cite:

Kek, S. , Sim, S. , Leong, W. and Teo, K. (2018) Discrete-Time Nonlinear Stochastic Optimal Control Problem Based on Stochastic Approximation Approach. Advances in Pure Mathematics, 8, 232-244. doi: 10.4236/apm.2018.83012.

1. Introduction

Nonlinear optimal control problem, which is disturbed by random noises, is an interesting research topic. In presence of the random noises, the entire state trajectory could not be measured exactly. Due to the nonlinear structure and the fluctuation behavior of the dynamical system, an efficient computational approach is, therefore, necessarily required to estimate the state dynamics. Further from this, the state estimate shall be used to optimize and control the dynamical system, where the optimal control policy is drawn apparently      . From literatures, the applications of the nonlinear stochastic optimal control are widely studied, see for examples, vehicle trajectory planning  , portfolio selection problem  , building structural system  , investment in insurance  , switching system  , machine maintenance problem  , nonlinear differential game problem  , and viscoelastic systems  .

In recent years, using the linear optimal control model with model-reality differences in solving the nonlinear optimal control problem, especially for discrete-time nonlinear stochastic optimal control problem, is proposed     . Such method is known as the integrated optimal control and parameter estimation (IOCPE) algorithm. In this approach, the adjusted parameters are introduced into the model used, so as the differences between the real plant and the model used can be calculated repeatedly. This algorithm is an iterative procedure, where system optimization and parameter estimation are integrated interactively. During the computation procedure, the optimal solution of the model used is updated iteratively. Once the convergence is achieved, the iterative solution of the model used approximates to the true optimal solution of the original optimal control problem, in spite of model-reality differences.

Besides, the applications of the IOCPE algorithm in providing the expectation solution as well as the filtering solution of the discrete-time nonlinear stochastic optimal control problem have been well-demonstrated   . In addition, the optimal output solution obtained from the IOCPE algorithm has been improved by using the weighted output residual  , which is introduced into the model cost function, and the output matching scheme  , where the adjusted parameter is introduced into the model output. Moreover, the application of the approaches on the least-square and the Gauss-Newton with the principle of model-reality differences, which omits from using the adjusted parameters, enhance the practical usage of the IOCPE algorithm for delivering the optimal solution of the original optimal control problem   .

By virtue of the improvement done, it is simply seen that the efficiency of the IOCPE algorithm for solving the discrete-time nonlinear stochastic optimal control problem is shown. However, we find that the output residual from the Kalman filtering theory could be further reduced, in turn, having an efficient output solution for representing the original output. Hence, in this paper, we aim to improve the accuracy of the output solution of the model used. In our approach, the stochastic approximation approach, which is an iterative stochastic optimization algorithm     , is applied. The advantage of the stochastic approximation algorithm is to find the optimum of a function, which cannot be computed directly, but only be estimated from noisy observations     , and its applications to control systems have been well-defined      . This advantage motivates us on applying the stochastic approximation algorithm into the IOCPE algorithm can significantly reduce the output residual compared to those output residual from the Kalman filtering theory. Here, the optimal control law, which is based on the state mean propagation, is constructed. At the end of iteration, the trajectories of state and control, which are in expectation manner, are obtained, while the output trajectory could track the real output closely. Hence, the efficiency of the approach proposed is highly recommended.

The rest of the paper is organized as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem is described. In Section 3, the stochastic approximation scheme, which is combined with the principle of model-reality differences, is discussed. The calculation procedure is then formulated as an iterative algorithm. In Section 4, an illustrative example on a continuous stirred-tank reactor problem is studied and the applicability of the approach proposed is presented. Finally, some concluding remarks are made.

2. Problem Statement

Consider a general discrete-time nonlinear stochastic optimal control problem given by

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{0}\left(u\right)=E\left[\phi \left(x\left(N\right),N\right)+\underset{k=0}{\overset{k-1}{\sum }}L\left(x\left(k\right),u\left(k\right),k\right)\right]\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{ }x\left(k+1\right)=f\left(x\left(k\right),u\left(k\right),k\right)+G\omega \left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}y\left(k\right)=h\left(x\left(k\right),k\right)+\eta \left(k\right)\end{array}$ (1)

where $u\left(k\right)\in {\Re }^{m},k=0,1,\cdots ,N-1$ , $x\left(k\right)\in {\Re }^{n},k=0,1,\cdots ,N$ and $y\left(k\right)\in {\Re }^{p},$ $k=0,1,\cdots ,N$ are, respectively, control sequence, state sequence and output sequence. The process noise sequence $\omega \left(k\right)\in {\Re }^{q},k=0,1,\cdots ,N-1$ and the measurement noise sequence $\eta \left(k\right)\in {\Re }^{p},k=0,1,\cdots ,N$ are the stationary Gaussian white noise sequences with zero mean and their covariance matrices are, respectively, given by ${Q}_{\omega }\in {\Re }^{q×q}$ and ${R}_{\eta }\in {\Re }^{p×p}$ , which both are positive definite matrices. While, $G\in {\Re }^{n×q}$ is a process noise coefficient matrix, $f:{\Re }^{n}×{\Re }^{m}×\Re \to {\Re }^{n}$ represents the real plant, and $h:{\Re }^{n}×\Re \to {\Re }^{p}$ is the output measurement, whereas $\phi :{\Re }^{n}×\Re \to \Re$ is the terminal cost and $L:{\Re }^{n}×{\Re }^{m}×\Re \to \Re$ is the cost under summation. Here, ${J}_{0}$ is the scalar cost function and $E\left[\cdot \right]$ is the expectation operator. It is assumed that all functions in (1) are continuously differentiable with respect to their respective arguments.

The initial state

$x\left(0\right)={x}_{0}$ ,

where ${x}_{0}\in {\Re }^{n}$ is a random vector with mean and covariance are, respectively, given by

$E\left[{x}_{0}\right]={\stackrel{¯}{x}}_{0}\text{and}E\left[\left({x}_{0}-{\stackrel{¯}{x}}_{0}\right){\left({x}_{0}-{\stackrel{¯}{x}}_{0}\right)}^{\text{T}}\right]={M}_{0}.$

Here, ${M}_{0}\in {\Re }^{n×n}$ is a positive definite matrix. It is assumed that initial state, process noise and measurement noise are statistically independent.

This problem, which is regarded as the discrete-time stochastic optimal control problem, is referred to as Problem (P). Notice that the exact solution of Problem (P) is, in general, unable to be obtained. Moreover, applying the nonlinear filtering theory to estimate the state of the real plant is computationally demanding. Nevertheless, the output can be measured from the real plant process.

In view of these weaknesses, a linear model-based optimal control problem, which is referred to as Problem (M), is constructed, given by

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{1}\left(u\right)=\frac{1}{2}\stackrel{¯}{x}{\left(N\right)}^{\text{T}}S\left(N\right)\stackrel{¯}{x}\left(N\right)+\underset{k=0}{\overset{N-1}{\sum }}\frac{1}{2}\left(\stackrel{¯}{x}{\left(k\right)}^{\text{T}}Q\stackrel{¯}{x}\left(k\right)+u{\left(k\right)}^{\text{T}}Ru\left(k\right)\right)\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{¯}{x}\left(k+1\right)=A\stackrel{¯}{x}\left(k\right)+Bu\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\stackrel{¯}{y}\left(k\right)=C\stackrel{¯}{x}\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{\hspace{0.17em}}\stackrel{¯}{x}\left(0\right)={\stackrel{¯}{x}}_{0}\end{array}$ (2)

where $\stackrel{¯}{x}\left(k\right)\in {\Re }^{n},k=0,1,\cdots ,N$ and $\stackrel{¯}{y}\left(k\right)\in {\Re }^{p},k=0,1,\cdots ,N$ are, respectively, the expected state sequence and the expected output sequence; $A\in {\Re }^{n×n}$ is a state transition matrix, $B\in {\Re }^{n×m}$ is a control coefficient matrix, and $C\in {\Re }^{p×n}$ is an output coefficient matrix, while $S\left(N\right)\in {\Re }^{n×n}$ and $Q\in {\Re }^{n×n}$ are positive semi-definite matrices and $R\in {\Re }^{m×m}$ is a positive definite matrix. Here, ${J}_{1}$ is the scalar cost function.

It is emphasized that only solving Problem (M) would not give the optimal solution of Problem (P). However, by establishing an efficient matching scheme based on the output error, which is the differences between the real output and the model output, to Problem (M), it is possible to obtain the optimal solution of Problem (P) as solving Problem (M) iteratively. In this point of view, we are motivated to look into the possibility of constructing an expanded optimal control model with the output error. This model formulation is for obtaining the true optimal solution of Problem (P) despite model-reality differences.

3. Optimal Control with Stochastic Approximation

Now, let us define the expanded optimal control problem, which is referred to as Problem (E), is formulated by

$\underset{\alpha \left(k\right)}{\mathrm{min}}{J}_{2}\left(\alpha \right)=\underset{u\left(k\right)}{\mathrm{min}}{J}_{1}\left(u\right)+\frac{1}{2}\underset{k=0}{\overset{N}{\sum }}\alpha {\left(k\right)}^{\text{T}}\alpha \left(k\right)$

$\begin{array}{l}\text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{¯}{x}\left(k+1\right)=A\stackrel{¯}{x}\left(k\right)+Bu\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{¯}{y}\left(k\right)=C\stackrel{¯}{x}\left(k\right)+\alpha \left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{¯}{x}\left(0\right)={\stackrel{¯}{x}}_{0}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{^}{y}\left(k\right)+\alpha \left(k\right)=y\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{^}{y}\left(k\right)=\stackrel{¯}{y}\left(k\right)\end{array}$ (3)

where $\stackrel{^}{y}\left(k\right)\in {\Re }^{p},k=0,1,\cdots ,N$ is introduced to separate the output sequence from the respective signals in the output error problem. It is important to note that the algorithm is to be designed such that the constraint $\stackrel{^}{y}\left(k\right)=\stackrel{¯}{y}\left(k\right)$ will be satisfied at the end of the iterations. In this situation, the output $\stackrel{^}{y}\left(k\right)$ will be used for the output error problem and the establishment of the matching scheme, whereas the corresponding output $\stackrel{¯}{y}\left(k\right)$ will be reserved for the model output after optimizing the model-based optimal control problem. Here, the output error is defined as

$\alpha \left(k\right)=y\left(k\right)-\stackrel{^}{y}\left(k\right),\text{\hspace{0.17em}}k=0,1,\cdots ,N.$ (4)

3.1. Necessary Optimality Conditions

Define the Hamiltonian function as follows

$\begin{array}{c}{H}_{e}\left(k\right)=\frac{1}{2}\left(\stackrel{¯}{x}{\left(k\right)}^{\text{T}}Q\stackrel{¯}{x}\left(k\right)+u{\left(k\right)}^{\text{T}}Ru\left(k\right)\right)+\frac{1}{2}\alpha {\left(k\right)}^{\text{T}}\alpha \left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+p{\left(k+1\right)}^{\text{T}}\left(A\stackrel{¯}{x}\left(k\right)+Bu\left(k\right)\right)-r{\left(k\right)}^{\text{T}}\stackrel{¯}{y}\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+q{\left(k\right)}^{\text{T}}\left(C\stackrel{¯}{x}\left(k\right)+\alpha \left(k\right)-\stackrel{¯}{y}\left(k\right)\right)\end{array}$ (5)

then, the augmented cost function becomes

$\begin{array}{c}{{J}^{\prime }}_{2}\left(\alpha \right)=\frac{1}{2}\stackrel{¯}{x}{\left(N\right)}^{\text{T}}S\left(N\right)\stackrel{¯}{x}\left(N\right)+\frac{1}{2}\alpha {\left(N\right)}^{\text{T}}\alpha \left(N\right)+p{\left(0\right)}^{\text{T}}\stackrel{¯}{x}\left(0\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-p{\left(N\right)}^{\text{T}}\stackrel{¯}{x}\left(N\right)+\underset{k=0}{\overset{N-1}{\sum }}{H}_{e}\left(k\right)-p{\left(k\right)}^{\text{T}}\stackrel{¯}{x}\left(k\right)+r{\left(k\right)}^{\text{T}}\stackrel{^}{y}\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+s{\left(k\right)}^{\text{T}}\left(y\left(k\right)-\stackrel{^}{y}\left(k\right)-\alpha \left(k\right)\right)\end{array}$ (6)

where $p\left(k\right),q\left(k\right),r\left(k\right)$ and $s\left(k\right)$ are the appropriate multipliers to be determined later.

Applying the calculus of variation    to the augmented cost function (6), the following necessary optimality conditions are obtained:

1) Stationary condition:

$Ru\left(k\right)+{B}^{\text{T}}p\left(k+1\right)=0$ (7a)

2) Co-state equation:

$p\left(k\right)=Q\stackrel{¯}{x}\left(k\right)+{A}^{\text{T}}p\left(k+1\right)$ (7b)

3) State equation:

$\stackrel{¯}{x}\left(k+1\right)=A\stackrel{¯}{x}\left(k\right)+Bu\left(k\right)$ (7c)

with the boundary conditions $\stackrel{¯}{x}\left(0\right)={\stackrel{¯}{x}}_{0}$ and $p\left(N\right)=0$ .

4) Output equation:

$\stackrel{¯}{y}\left(k\right)=C\stackrel{¯}{x}\left(k\right)+\alpha \left(k\right)$ (7d)

5) Separable variables:

$\stackrel{^}{y}\left(k\right)=\stackrel{¯}{y}\left(k\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{^}{p}\left(k\right)=p\left(k\right)$ (7e)

with the multipliers $r\left(k\right)=-q\left(k\right)$ , $s\left(k\right)=q\left(k\right)$ and $q\left(k\right)=0$ .

In view of these necessary optimality conditions, the conditions (7a), (7b) and (7c) are the necessary conditions for Problem (M), while the necessary condition (7d) is an adjustable output measurement. Notice that with this adjustable output, the real output could be tracked by the model output as closely as possible once the output residual is significantly minimized.

3.2. Feedback Optimal Control Law

From (7a), the feedback optimal control law can be calculated from

$u\left(k\right)=-K\left(k\right)\stackrel{¯}{x}\left(k\right),\text{\hspace{0.17em}}k=0,1,\cdots ,N-1$ (8)

where

$K\left(k\right)={\left(R+{B}^{\text{T}}S\left(k+1\right)B\right)}^{-1}{B}^{\text{T}}S\left(k+1\right)A$ , (9a)

$S\left(k\right)=Q+{A}^{\text{T}}S\left(k+1\right)\left(A-BK\left(k\right)\right)$ . (9b)

For more detail, see     for the proof of the derivation on this feedback optimal control law.

Applying (8), the state equation is written as

$\stackrel{¯}{x}\left(k+1\right)=\left(A-BK\left(k\right)\right)\stackrel{¯}{x}\left(k\right),\text{\hspace{0.17em}}k=0,1,\cdots ,N-1$ (10)

and the co-state equation is given by

$p\left(k\right)=S\left(k\right)\stackrel{¯}{x}\left(k\right),\text{\hspace{0.17em}}k=0,1,\cdots ,N.$ (11)

3.3. Stochastic Approximation Scheme

In general, the recursive equation for the stochastic approximation (SA) algorithm     is defined by

$\theta \left(k+1\right)=\theta \left(k\right)-a\left(k\right)g\left(\theta \left(k\right),k\right)$ (12)

where $\theta \left(k\right)$ is the set of the parameters to be estimated, $g\left(\theta \left(k\right),k\right)$ is the stochastic gradient, and $a\left(k\right)$ is the gain sequence. On this basis, refer to Problem (E), let us define $\theta \left(k\right)={\left(u\left(k\right),\stackrel{¯}{x}\left(k\right),\stackrel{^}{y}\left(k\right)\right)}^{\text{T}}$ and the stochastic gradient, which is assumed to be measurable for the objective function given in (3), is introduced as

$g\left(\theta \left(k\right),k\right)={\left(\frac{\partial {J}_{2}\left(k\right)}{\partial u\left(k\right)},\frac{\partial {J}_{2}\left(k\right)}{\partial \stackrel{¯}{x}\left(k\right)},\frac{\partial {J}_{2}\left(k\right)}{\partial \stackrel{^}{y}\left(k\right)}\right)}^{\text{T}}$ .

Refer to the SA algorithm (12), it leads to the following iterative equations:

$u{\left(k\right)}^{i+1}=u{\left(k\right)}^{i}-a\left(k\right)\frac{\partial {J}_{2}\left(\alpha \right)}{\partial u{\left(k\right)}^{i}}$ (13a)

$\stackrel{¯}{x}{\left(k\right)}^{i+1}=\stackrel{¯}{x}{\left(k\right)}^{i}-a\left(k\right)\frac{\partial {J}_{2}\left(\alpha \right)}{\partial \stackrel{¯}{x}{\left(k\right)}^{i}}$ (13b)

$\stackrel{^}{y}{\left(k\right)}^{i+1}=\stackrel{^}{y}{\left(k\right)}^{i}-a\left(k\right)\frac{\partial {J}_{2}\left(\alpha \right)}{\partial \stackrel{^}{y}{\left(k\right)}^{i}}$ . (13c)

These equations would be used to update the optimal solution of Problem (E), in turn, to approximate the optimal solution of Problem (P), in spite of model-reality differences.

Consequently, to evaluate the stochastic gradient, rewrite the output error defined in (4), for k = k +1, as

$\alpha \left(k+1\right)=y\left(k+1\right)-\stackrel{^}{y}\left(k+1\right)=y\left(k+1\right)-\stackrel{¯}{y}\left(k+1\right)$ (14)

where the separable variable in (7e) is satisfied. After that, taking the expected output measured (7d) for k = k +1, and substituting $\stackrel{¯}{x}\left(k+1\right)$ by the state equation (10), we have

$\alpha \left(k+1\right)=\frac{1}{2}\left(y\left(k+1\right)-C\left(A\stackrel{¯}{x}\left(k\right)+Bu\left(k\right)\right)\right)$ . (15)

Hence, from the objective function (3) in Problem (E), the stochastic gradient, which the chain rule differentiation is applied, is calculated from

$\frac{\partial {J}_{2}\left(\alpha \right)}{\partial u\left(k\right)}={\left[\frac{\partial \alpha \left(k\right)}{\partial u\left(k\right)}\right]}^{\text{T}}\left[\frac{\text{d}{J}_{2}\left(k\right)}{\text{d}\alpha \left(k\right)}\right]=-{\left(CB\right)}^{\text{T}}\alpha \left(k\right)$ (16a)

$\frac{\partial {J}_{2}\left(\alpha \right)}{\partial \stackrel{¯}{x}\left(k\right)}={\left[\frac{\partial \alpha \left(k\right)}{\partial \stackrel{¯}{x}\left(k\right)}\right]}^{\text{T}}\left[\frac{\text{d}{J}_{2}\left(k\right)}{\text{d}\alpha \left(k\right)}\right]=-{\left(CA\right)}^{\text{T}}\alpha \left(k\right)$ (16b)

$\frac{\partial {J}_{2}\left(\alpha \right)}{\partial \stackrel{^}{y}\left(k\right)}={\left[\frac{\partial \alpha \left(k\right)}{\partial \stackrel{^}{y}\left(k\right)}\right]}^{\text{T}}\left[\frac{\text{d}{J}_{2}\left(k\right)}{\text{d}\alpha \left(k\right)}\right]=-\alpha \left(k\right)$ (16c)

On the other hand, the gain sequence $a\left(k\right)$ , which is given in (12), has the asymptotic normality and its convergence property has been well-defined      . In particular, the formulation form of the gain sequence $a\left(k\right)$ is given from

$a\left(k\right)=\frac{a}{{\left(k+1+A\right)}^{b}}$ (17)

where a and b are strictly positive and the stability constant A ≥ 0. The practical value of b is 0.602, which provides the generally more desirable slowly decaying gain (17).

3.4. Computational Algorithm

From the discussion above, the resulting algorithm provides the optimal solution of the linear model-based optimal control problem. This optimal solution is then updated based on the stochastic approximation algorithm to approximate the true optimal solution of the original optimal control problem. As a result, the computation procedure of the iterative algorithm is summarized as follows.

Iterative algorithm with SA scheme

Data: Given $A,B,C,G,Q,R,{Q}_{\omega },{R}_{\eta },S\left(N\right),{M}_{0},{\stackrel{¯}{x}}_{0},N,f,L,h,\phi$ .

Step 0: Compute a nominal solution. Calculate $K\left(k\right)$ and $S\left(k\right)$ from (9a) and (9b), respectively, Then, solve Problem (M) defined by (2) to obtain $u\left(k\right),k=0,1,\cdots ,N-1$ , $\stackrel{¯}{x}\left(k\right),k=0,1,\cdots ,N$ and $\stackrel{¯}{y}\left(k\right),k=0,1,\cdots ,N$ . Set $i=0$ , $u{\left(k\right)}^{0}=u\left(k\right)$ , $\stackrel{¯}{x}{\left(k\right)}^{0}=\stackrel{¯}{x}\left(k\right)$ and $\stackrel{¯}{y}{\left(k\right)}^{0}=\stackrel{¯}{y}\left(k\right)$ .

Step 1: Compute the output error $\alpha {\left(k\right)}^{i},k=0,1,\cdots ,N$ from (4).

Step 2: With the determined $\alpha {\left(k\right)}^{i}$ , solve Problem (E) defined by (3) to obtain the new $u{\left(k\right)}^{i},k=0,1,\cdots ,N-1$ , the new $\stackrel{¯}{x}{\left(k\right)}^{i},k=0,1,\cdots ,N$ , and the new $\stackrel{¯}{y}{\left(k\right)}^{i},k=0,1,\cdots ,N$ , respectively, from (8), (10) and (7d).

Step 3: Update the optimal solution given, respectively, by (13a), (13b) and (13c). If $u{\left(k\right)}^{i+1}=u{\left(k\right)}^{i},k=0,1,\cdots ,N-1$ , $\stackrel{¯}{x}{\left(k\right)}^{i+1}=\stackrel{¯}{x}{\left(k\right)}^{i},k=0,1,\cdots ,N$ and $\stackrel{¯}{y}{\left(k\right)}^{i+1}=\stackrel{¯}{y}{\left(k\right)}^{i},k=0,1,\cdots ,N$ , within a given tolerance, stop; else set $i=i+1$ and repeat from Step 1.

Remarks

1) The off-line computation is done, as stated in Step 0, to calculate $K\left(k\right),k=0,1,\cdots ,N-1$ and $S\left(k\right),k=0,1,\cdots ,N$ , for the control law design. Then, these parameters are used for solving Problem (M) in Step 0 and for solving Problem (E) in Step 2, respectively.

2) The variable $\alpha {\left(k\right)}^{i}$ is zero in Step 0 and the calculated value of $\alpha {\left(k\right)}^{i}$ changes from iteration to iteration.

3) Problem (P) is not necessary to be linear or to have a quadratic cost function.

4) The conditions $u{\left(k\right)}^{i+1}=u{\left(k\right)}^{i}$ and $\stackrel{¯}{x}{\left(k\right)}^{i+1}=\stackrel{¯}{x}{\left(k\right)}^{i}$ are required to be satisfied for the converged optimal control sequence and the converged state estimate sequence. The following averaged 2-norms are computed and then they are compared with a given tolerance to verify the convergence of $u\left(k\right)$ and $\stackrel{¯}{x}\left(k\right)$ :

${‖{u}^{i+1}-{u}^{i}‖}_{2}={\left(\frac{1}{N-1}\underset{k=0}{\overset{N-1}{\sum }}‖u{\left(k\right)}^{i+1}-u{\left(k\right)}^{i}‖\right)}^{1/2}$ (18a)

${‖{\stackrel{¯}{x}}^{i+1}-{\stackrel{¯}{x}}^{i}‖}_{2}={\left(\frac{1}{N}\underset{k=0}{\overset{N}{\sum }}‖\stackrel{¯}{x}{\left(k\right)}^{i+1}-\stackrel{¯}{x}{\left(k\right)}^{i}‖\right)}^{1/2}$ (18b)

5) The gain sequence $a\left(k\right)$ , which is considered in the algorithm proposed, is

$a\left(k\right)=\frac{2}{{\left(1+k\right)}^{0.602}}$ (19)

where A = 0 from (17).

4. Illustrative Example

Consider the optimal control of a continuous stirred-tank reactor problem  :

$\underset{u\left(k\right)}{\mathrm{min}}{J}_{0}\left(u\right)=0.01\underset{k=0}{\overset{76}{\sum }}E\left[{\left({x}_{1}\left(k\right)\right)}^{2}+{\left({x}_{2}\left(k\right)\right)}^{2}+0.1{\left(u\left(k\right)\right)}^{2}\right]$

subject to

$\begin{array}{c}{x}_{1}\left(k+1\right)={x}_{1}\left(k\right)-0.02\left({x}_{1}\left(k\right)+0.25\right)+0.01\left({x}_{2}\left(k\right)+0.5\right)\mathrm{exp}\left[\frac{25{x}_{1}\left(k\right)}{{x}_{1}\left(k\right)+2}\right]\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-0.01\left({x}_{1}\left(k\right)+0.25\right)u\left(k\right)+{\omega }_{1}\left(k\right)\end{array}$

${x}_{2}\left(k+1\right)=0.99{x}_{2}\left(k\right)-0.005-0.01\left({x}_{2}\left(k\right)+0.5\right)\mathrm{exp}\left[\frac{25{x}_{1}\left(k\right)}{{x}_{1}\left(k\right)+2}\right]+{\omega }_{2}\left(k\right)$

$y\left(k\right)={x}_{1}\left(k\right)+\eta \left(k\right)$

with the initial condition

${x}_{1}\left(0\right)=0.05,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{x}_{2}\left(0\right)=0.$

Here, $\omega \left(k\right)={\left[\begin{array}{cc}{\omega }_{1}\left(k\right)& {\omega }_{2}\left(k\right)\end{array}\right]}^{\text{T}}$ and $\eta \left(k\right)$ are Gaussian white noise sequences with their respective covariance given by ${Q}_{\omega }={10}^{-3}{I}_{2}$ and ${R}_{\eta }={10}^{-3}$ .

This problem is referred to as Problem (P).

The linear model-based optimal control problem, which is simplified from Problem (P) and is referred to as Problem (M), is defined by

$\underset{u\left(k\right)}{\mathrm{min}}{J}_{1}\left(u\right)=\frac{1}{2}\underset{k=0}{\overset{76}{\sum }}\left[{\left({\stackrel{¯}{x}}_{1}\left(k\right)\right)}^{2}+{\left({\stackrel{¯}{x}}_{2}\left(k\right)\right)}^{2}+0.1{\left(u\left(k\right)\right)}^{2}\right]$

subject to

$\left[\begin{array}{c}{\stackrel{¯}{x}}_{1}\left(k+1\right)\\ {\stackrel{¯}{x}}_{2}\left(k+1\right)\end{array}\right]=\left[\begin{array}{cc}1.0895& 0.0184\\ -0.1095& 0.9716\end{array}\right]\left[\begin{array}{c}{\stackrel{¯}{x}}_{1}\left(k\right)\\ {\stackrel{¯}{x}}_{2}\left(k\right)\end{array}\right]+\left[\begin{array}{c}-0.003\\ 0.000\end{array}\right]u\left(k\right)$

$\stackrel{¯}{y}\left(k\right)={\stackrel{¯}{x}}_{1}\left(k\right)+\alpha \left(k\right)$

with the initial condition

${\stackrel{¯}{x}}_{1}\left(0\right)=0.05,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\stackrel{¯}{x}}_{2}\left(0\right)=0$

and the adjusted parameter $\alpha \left(k\right)$ is added into the output measurement channel.

By running the approach proposed, the simulation result is shown in Table 1, where it is compared to the result of the filtering solution  . It can be seen that the iteration number of the approach proposed is more than the iteration number of filtering model, and the final cost of the approach proposed is greater than the final cost of filtering model. But, it is found that the output residual of the approach proposed is dramatically reduced to 0.000216 unit, which is a 99 percent reduction. This percentage shows that the model output solution obtained by the approach proposed is significantly closely to the real output trajectory. Hence, this indicates that the approach proposed is practically useful in obtaining the real output solution.

The trajectories of control, state, and output are, respectively, shown in Figures 1-3. It is noticed that the trajectories of control and state are smoothly freely from the disturbance of random noise sequences. This is because of they are an ideal deterministic optimal solution to the nonlinear model-based optimal control problem. However, the real output that is disturbed by the random noise sequences is really fluctuated. By applying the approach proposed, the model

Table 1. Simulation result.

Figure 1. Control trajectory.

Figure 2. State trajectories.

Figure 3. Output trajectories.

Figure 4. Output error.

output trajectory could follow the real output trajectory as closely as possible. Additionally, the output error, which is presenting the differences between the real output and the model output, is shown in Figure 4. As a result of this, it is concluded that the approach proposed is efficient and its applicability is demonstrated.

5. Concluding Remarks

Applying the stochastic approximation scheme into the IOCPE algorithm was discussed in this paper. The aim is to improve the output solution of the model used. From previous studies, the IOCPE algorithm is for solving the discrete-time nonlinear stochastic optimal control problem, while the stochastic approximation is for the stochastic optimization. In combining these two approaches, the state mean propagation is constructed, where the adjusted parameter is added into the model output used. During the calculation procedure, the differences between the real plant and the model used are taken into account for updating the iterative solution repeatedly. On the other hand, the least square output error is established such that the stochastic gradient is derived. Consequently, the iterative solution approximates to the optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, an example on a continuous stirred-tank reactor problem was studied to show the applicability of the approach proposed. In conclusion, the efficiency of the approach proposed is highly recommended.

For the future research direction, it is suggested to apply the SA algorithm to solve the linear model-based optimal control problem, without calculating the adjusted parameter, in order to obtain the true optimal solution of the nonlinear optimal control problem. The result would be compared to the result which is obtained by using the Gauss-Newton method   . Hence, the calculation procedure in the IOCPE could be simplified.

Conflicts of Interest

The authors declare no conflicts of interest.

  Kalman, R.E. (1960) Contributions to the Theory of Optimal Control. Boletín de la Sociedad Matemática Mexicana, 5, 102-119.  Bryson, A.E. and Ho, Y.C. (1975) Applied Optimal Control. Hemisphere, Washington, DC.  Bagchi, A. (1993) Optimal Control of Stochastic Systems. Prentice-Hall, New York.  Ahmed, N.U. (1999) Linear and Nonlinear Filtering for Scientists and Engineers. World Scientific Publishers, Singapore. https://doi.org/10.1142/3911  Simon, D. (2006) Optimal State Estimation: Kalman, H-Infinity and Nonlinear Approaches. John Wiley & Sons, Hoboken, NJ. https://doi.org/10.1002/0470045345  Liu, H.F., Zhang, Y., Chen, S.F. and Chen, J. (2012) Autonomous Vehicle Trajectory Planning under Uncertainty Using Stochastic Collocation. Advanced Materials Research, 580, 175-179. https://doi.org/10.4028/www.scientific.net/AMR.580.175  Zhou, Y. and Wu, Z. (2013) Mean-Variance Portfolio Selection with Margin Requirements. Journal of Mathematics, 2013, Article ID 726297.  Li, X.P., Yu, C., Zhang, J.Y., Zhou, J.J. and Zhang, L.M. (2013) Instantaneous Stochastic Optimal Control of Seismically Excited Structures Based on Time Domain Explicit Method. Advanced Materials Research, 790, 215-218. https://doi.org/10.4028/www.scientific.net/AMR.790.215  Liu, J., Yiu, K.F.C., Loxton, R. and Teo, K.L. (2013) Optimal Investment and Proportional Reinsurance with Risk Constraint. Journal of Mathematical Finance, 3, 437-447. https://doi.org/10.4236/jmf.2013.34046  Abushov, Q. and Aghayeva, C. (2014) Stochastic Maximum Principle for Nonlinear Optimal Control Problem of Switching Systems. Journal of Computational and Applied Mathematics Part B, 259, 371-376. https://doi.org/10.1016/j.cam.2013.06.010  Sun, Y., Aw, G., Loxton, R. and Teo, K.L. (2014) An Optimal Machine Maintenance Problem with Probabilistic State Constraints. Information Sciences, 281, 386-398. https://doi.org/10.1016/j.ins.2014.05.051  Basimanebotlhe, O. and Xue, X. (2014) Stochastic Optimal Control to a Nonlinear Differential Game. Advances in Difference Equations, 2014, 1-14. https://doi.org/10.1186/1687-1847-2014-266  Xiong, H. and Zhu, W. (2015) Nonlinear Stochastic Optimal Control of Viscoelastic Systems. Journal of Vibration and Control, 21, 1029-1040. https://doi.org/10.1177/1077546313489589  Kek, S.L., Teo, K.L. and Ismail, A.A.M. (2010) An Integrated Optimal Control Algorithm for Discrete-Time Nonlinear Stochastic System. International Journal of Control, 83, 2536-2545. https://doi.org/10.1080/00207179.2010.531766  Kek, S.L., Teo, K.L. and Ismail, A.A.M. (2012) Filtering Solution of Nonlinear Stochastic Optimal Control Problem in Discrete-Time with Model-Reality Differences. Numerical Algebra, Control and Optimization, 2, 207-222. https://doi.org/10.3934/naco.2012.2.207  Kek, S.L., Ismail, A.A.M., Teo, K.L. and Rohanin, A. (2013) An Iterative Algorithm Based on Model-Reality Differences for Discrete-Time Nonlinear Stochastic Optimal Control Problems. Numerical Algebra, Control and Optimization, 3, 109-125. https://doi.org/10.3934/naco.2013.3.109  Kek, S.L., Teo, K.L. and Ismail, A.A.M. (2014) Efficient Output Solution for Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Mathematical Problems in Engineering, 2014, Article ID 659506.  Kek, S.L., Li, J. and Teo, K.L. (2017) Least Squares Solution for Discrete Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Applied Mathematics, 8, 1-14. https://doi.org/10.4236/am.2017.81001  Kek, S.L., Li, J., Leong, W.J. and Ismail, A.A.M. (2017), A Gauss-Newton Approach for Nonlinear Optimal Control Problem with Model-Reality Differences. Open Journal of Optimization (OJOp), 6, 85-100. https://doi.org/10.4236/ojop.2017.63007  Robbins, H. and Monro, S. (1951) A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22, 400-407. https://doi.org/10.1214/aoms/1177729586  Kiefer, J. and Wolfowitz, J. (1952) Stochastic Estimation of the Maximum of a Regression Function. The Annals of Mathematical Statistics, 23, 462-466. https://doi.org/10.1214/aoms/1177729392  Sacks, J. (1958) Asymptotic Distribution of Stochastic Approximation Procedures. The Annals of Mathematical Statistics, 29, 373-405. https://doi.org/10.1214/aoms/1177706619  Martin, R. and Masreliez, C. (1975) Robust Estimation via Stochastic Approximation. IEEE Transactions on Information Theory, 21, 263-271. https://doi.org/10.1109/TIT.1975.1055386  Nemirovski, A. and Yudin, D. (1983) Problem Complexity and Method Efficiency in Optimization. John Wiley, New York.  Polyak, B.T. and Juditsky, A.B. (1992) Acceleration of Stochastic Approximation by Averaging. SIAM Journal on Control and Optimization, 30, 838-855. https://doi.org/10.1137/0330046  Kushner, H.J. and Yin, G.G. (1997) Stochastic Approximation Algorithms and Applications. Springer, New York. https://doi.org/10.1007/978-1-4899-2696-8  Nemirovski, A., Juditsky, A., Lan, G. and Shapiro, A. (2009) Robust Stochastic Approximation Approach to Stochastic Programming. SIAM Journal on Optimization, 19, 1574-1609. https://doi.org/10.1137/070704277  Sin, K.S. and Goodwin, G.C. (1982) Stochastic Adaptive Control Using a Modified Least Squares Algorithm. Automatica, 18, 315-321. https://doi.org/10.1016/0005-1098(82)90091-7  Spall, J.C. and Cristion, J.A. (1998) Model-Free Control of Nonlinear Stochastic Systems with Discrete-Time Measurements. IEEE Transactions on Automatic Control, 43, 1198-1210. https://doi.org/10.1109/9.718605  Spall, J.C. (2000) Adaptive Stochastic Approximation by the Simultaneous Perturbation Method. IEEE Transactions on Automatic Control, 45, 1839-1853. https://doi.org/10.1109/TAC.2000.880982  Spall, J.C. (2003) Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control. John Wiley & Sons, Inc, New York. https://doi.org/10.1002/0471722138  Aksakalli, V. and Ursu, D. (2006) Control of Nonlinear Stochastic Systems: Model-Free Controllers versus Linear Quadratic Regulators. Proceedings of the 45th IEEE Conference on Decision and Control (CDC ’06), San Diego, CA, December 2006, 4145-4150. https://doi.org/10.1109/CDC.2006.377721  Lewis, F.L., Vrabie, V. and Symos, V.L. (20012) Optimal Control. 3rd Edition, John Wiley & Sons, Inc, New York.  Kirk, D.E. (2004) Optimal Control Theory: An Introduction. Dover Publications, Mineola, NY. 