Solving Riccati-Type Nonlinear Differential Equations with Novel Artificial Neural Networks
1. Introduction
We present a new perspective for obtaining solutions of initial value problems of Riccati type [1] using Artificial Neural Networks (ANN). This is an extension of the procedure developed by Okereke [2]. We find that a neural-network-based model for the solution of ordinary differential equations (ODE) provides a number of advantages over standard numerical methods. Firstly, the neural network solution is differentiable and in closed analytic form, whereas most other techniques offer a discretized solution or a solution with limited differentiability. Secondly, the neural network method provides a solution with very good generalization properties. The major advantage of our approach is that it considerably reduces the computational complexity involved in weight updating, while maintaining satisfactory accuracy.
1.1. Neural Network Structure
A neural network is an interconnection of processing elements, units or nodes, whose functionality resembles that of human neurons. The processing ability of the network is stored in the connection strengths, simply called weights, which are obtained by a process of adaptation to a set of training patterns. Neural network methods can solve both ordinary and partial differential equations. The approach relies on the function approximation property of feed forward neural networks, which yields a solution written in closed analytic form. This form employs a feed forward neural network as the basic approximation element. Training of the neural network can be done by any optimization technique, which in turn requires the computation of the gradient of the error with respect to the network parameters, by a regression-based model, or by basis function approximation.
1.2. Neural Networks are Universal Approximators
An artificial neural network can realize a nonlinear mapping from the inputs to the outputs of the corresponding system of neurons, which makes it suitable for analyzing initial/boundary value problems that have no analytical solutions or whose solutions cannot be easily computed. One of the applications of the multilayer feed forward neural network is the global approximation of a real-valued multivariable function in closed analytic form; such neural networks are universal approximators. It has been established in the literature that multilayer feed forward neural networks with one hidden layer, using arbitrary squashing functions, are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy. This is made precise in the following theorem.
1.3. Universal Approximation Theorem
The universal approximation theorem for the MLP was proved by Cybenko [3] and Hornik et al. [4] in 1989. Let $I_n = [0,1]^n$ represent the n-dimensional unit cube containing all possible input samples $x = (x_1, x_2, \ldots, x_n)$ with $x_i \in [0,1]$, $i = 1, 2, \ldots, n$. Let $C(I_n)$ be the space of continuous functions on $I_n$. Given a continuous sigmoidal function $\sigma$, the universal approximation theorem states that the finite sums of the form

$G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\left( \sum_{i=1}^{n} w_{ji} x_i + b_j \right)$ (1)

are dense in $C(I_n)$. This simply means that, given any function $f \in C(I_n)$ and $\varepsilon > 0$, there is a sum $G(x)$ of the above form that satisfies

$|G(x) - f(x)| < \varepsilon$ for all $x \in I_n$. (2)
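As a concrete illustration of the finite sums in Equation (1) (a sketch for exposition only, not part of the solution procedure developed later), the following Python fragment fixes random hidden weights and biases and fits only the output coefficients $\alpha_j$ by linear least squares, so that $G(x)$ approximates a sample continuous function on the unit interval; the target function and all parameter values are assumptions of the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Target: any continuous function on the unit interval (chosen arbitrarily here).
f = lambda x: np.sin(2 * np.pi * x)

# Finite sum G(x) = sum_j alpha_j * sigmoid(w_j * x + b_j), i.e. Equation (1) with n = 1.
m = 20                           # number of sigmoid terms
w = rng.normal(0.0, 10.0, m)     # fixed hidden weights
b = rng.uniform(-10.0, 10.0, m)  # fixed hidden biases

x = np.linspace(0.0, 1.0, 200)
Phi = sigmoid(np.outer(x, w) + b)                    # Phi[i, j] = sigmoid(w_j * x_i + b_j)
alpha, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)   # output coefficients alpha_j

G = Phi @ alpha
print("max |G(x) - f(x)| =", np.max(np.abs(G - f(x))))
```

Increasing the number of sigmoid terms m drives the maximum error down, which reflects the density property asserted by the theorem.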
1.4. Learning in Neural Networks
A neural network has to be configured such that the application of a set of inputs produces the desired set of outputs. Various methods exist for setting the strengths of the connections. One way is to set the weights explicitly, using a priori knowledge. Another way is to train the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule. The term learning is widely used in the neural network field to describe this process; it may be formally described as determining an optimized set of weights based on the statistics of the examples. Learning situations in neural networks may be classified into distinct types: supervised learning, unsupervised learning, reinforcement learning and competitive learning [5].
1.5. Gradient Computation with Respect to Network Inputs
The next step is to compute the gradient of the network output with respect to the input vector. For this purpose, consider a multilayer perceptron (MLP) neural network [6] with n input units, a hidden layer with m sigmoid units and a linear output unit. For a given input vector $x = (x_1, x_2, \ldots, x_n)$ the output of the network is written

$N(x) = \sum_{j=1}^{m} v_j \sigma(z_j)$, $z_j = \sum_{i=1}^{n} w_{ji} x_i + b_j$. (3)

Here $w_{ji}$ denotes the weight from input unit i to hidden unit j, $v_j$ denotes the weight from hidden unit j to the output unit, $b_j$ denotes the bias of hidden unit j, and $\sigma$ is the sigmoid activation function.
Now the derivative of the network output N with respect to the input $x_i$ is

$\dfrac{\partial N}{\partial x_i} = \sum_{j=1}^{m} v_j w_{ji} \, \sigma'(z_j)$, (4)

where $\sigma'$ denotes the first derivative of $\sigma$. Similarly, the kth derivative of N is computed as

$\dfrac{\partial^k N}{\partial x_i^k} = \sum_{j=1}^{m} v_j w_{ji}^{k} \, \sigma^{(k)}(z_j)$,

where $\sigma^{(k)}$ denotes the kth order derivative of the sigmoid activation function.
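A minimal Python sketch of Equations (3) and (4) may help fix ideas (it is an illustration only, not the computational procedure used later in the paper): it evaluates $N(x)$ for a small MLP with arbitrary parameters and checks the analytic input gradient of Equation (4) against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def network_output(x, W, v, b):
    """N(x) = sum_j v_j * sigmoid(z_j), z_j = sum_i W[j, i] * x_i + b_j  (Equation (3))."""
    z = W @ x + b
    return v @ sigmoid(z)

def network_input_gradient(x, W, v, b):
    """dN/dx_i = sum_j v_j * W[j, i] * sigmoid'(z_j)  (Equation (4))."""
    z = W @ x + b
    return (v * sigmoid_prime(z)) @ W

rng = np.random.default_rng(1)
n, m = 2, 5                      # n input units, m hidden sigmoid units
W = rng.normal(size=(m, n))      # weights, input -> hidden
v = rng.normal(size=m)           # weights, hidden -> output
b = rng.normal(size=m)           # hidden biases
x = np.array([0.3, 0.7])

grad = network_input_gradient(x, W, v, b)
# Finite-difference check of Equation (4)
eps = 1e-6
fd = np.array([(network_output(x + eps * e, W, v, b) -
                network_output(x - eps * e, W, v, b)) / (2 * eps) for e in np.eye(n)])
print(grad, fd)   # the two vectors should agree closely
```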
2. General Formulation for Differential Equations
Let us consider the following general differential equation, which represents both ordinary and partial differential equations (Majidzadeh [7]):

$G\left(\vec{x}, \psi(\vec{x}), \nabla \psi(\vec{x}), \nabla^{2} \psi(\vec{x})\right) = 0$, $\vec{x} \in D$, (5)

subject to some initial or boundary conditions, where $\vec{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, $D \subset \mathbb{R}^n$ denotes the domain, and $\psi(\vec{x})$ is the unknown scalar-valued solution to be computed. Here, G is the function which defines the structure of the differential equation and $\nabla$ is a differential operator. Let $\psi_t(\vec{x}, p)$ denote the trial solution with parameters (weights, biases) p. Lagaris et al. [8] gave the following general formulation for the solution of differential Equation (5) using ANN. Now, $\psi_t(\vec{x}, p)$ may be written as the sum of two terms

$\psi_t(\vec{x}, p) = A(\vec{x}) + F\left(\vec{x}, N(\vec{x}, p)\right)$, (6)

where $A(\vec{x})$ satisfies the initial or boundary conditions and contains no adjustable parameters, whereas the second term is constructed so as not to contribute to the initial or boundary conditions and involves $N(\vec{x}, p)$, the output of a feed forward neural network with parameters p and input data $\vec{x}$. The function $N(\vec{x}, p)$ is actually the operational model of the neural network. The feed forward neural network (FFNN) converts the differential equation problem into a function approximation problem. The neural network output $N(\vec{x}, p)$ is given by

$N(\vec{x}, p) = \sum_{j=1}^{m} v_j \sigma(z_j)$, $z_j = \sum_{i=1}^{n} w_{ji} x_i + b_j$, (7)

where $w_{ji}$ denotes the weight from input unit i to hidden unit j, $v_j$ denotes the weight from hidden unit j to the output unit, $b_j$ denotes the bias of hidden unit j, and $\sigma$ is the sigmoid activation function.
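To make the formulation concrete, the following short Python sketch implements Equations (6) and (7) for a single input variable; the particular choices $A(\vec{x}) = A$ and $F = x\,N(x, p)$, appropriate for a first order initial value problem with $y(0) = A$ as used in Section 2.2 below, together with the parameter values, are assumptions of the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def N(x, w, v, b):
    """Feed forward network output, Equation (7), for a scalar input x."""
    return v @ sigmoid(w * x + b)

def trial_solution(x, p, A=1.0):
    """psi_t(x, p) = A(x) + F(x, N(x, p)), Equation (6).

    Example choice for a first order IVP with y(0) = A:
        A(x) = A        -- satisfies the initial condition, no adjustable parameters
        F    = x * N    -- vanishes at x = 0, so it does not disturb the condition
    """
    w, v, b = p
    return A + x * N(x, w, v, b)

rng = np.random.default_rng(2)
m = 3                              # hidden units
p = (rng.normal(size=m), rng.normal(size=m), rng.uniform(-20, 20, m))
print(trial_solution(0.0, p))      # equals A exactly, by construction
```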
2.1. Neural Network Training
The neural network weights determine how close the predicted outcome is to the desired outcome. In our approach the weights are obtained by a systematic procedure, and if the prediction is not yet sufficiently accurate, only the biases need to be adjusted. The basis function we shall apply in this work in training the neural network is the sigmoid activation function given by

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$. (8)
2.2. Neural Network Model for Solving First Order Nonlinear ODE
Let us consider the first order ordinary differential equation

$\dfrac{dy}{dx} = f(x, y)$ (9)

with initial condition $y(0) = A$. In this case we assume the function f is nonlinear in its arguments. The ANN trial solution may be written as

$y_t(x, p) = A + x\, N(x, p)$, (10)

where $N(x, p)$ is the output of the feed forward network with one input x and parameters p. The trial solution $y_t(x, p)$ satisfies the initial condition by construction. To solve this problem using a neural network (NN), we shall employ a NN architecture with three layers: one input layer with one neuron, one hidden layer with n neurons and one output layer with one output unit, as depicted in Figure 1 below.
Each neuron is connected to the neurons of the previous layer through adaptable synaptic weights $w_j$, $v_j$ and biases $b_j$. Now,

$N(x, p) = \sum_{j=1}^{n} v_j \sigma(z_j)$ with $z_j = w_j x + b_j$. (11)
It is possible to have multilayer perceptrons with more than three layers, in which case we have more hidden layers [9] [10]. The most important application of multilayer perceptrons is their ability in function approximation. The Kolmogorov existence theorem guarantees that a three-layered perceptron with $n(2n+1)$ nodes can compute any continuous function of n variables [11] [12]. The accuracy of the approximation depends only on the number of neurons in the hidden layer and not on the number of hidden layers [13]. For the purpose of numerical computation, as mentioned previously, the sigmoidal activation function $\sigma(z)$ for the hidden units of our neural network is taken to be

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$, (12)

with the property that

$\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$. (13)
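The identity (13) is what allows the derivatives appearing in Equation (4) to be expressed in terms of $\sigma$ itself; a short numerical check (illustrative only) is given below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
lhs = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6   # numerical sigma'(z)
rhs = sigmoid(z) * (1.0 - sigmoid(z))                  # Equation (13)
print(np.max(np.abs(lhs - rhs)))                       # very small: the identity holds
```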
The trial solution $y_t(x, p) = A + x N(x, p)$ satisfies the initial condition. We differentiate the trial solution with respect to x to get

$\dfrac{dy_t}{dx} = N(x, p) + x \dfrac{dN(x, p)}{dx}$. (14)

For evaluating the derivative term on the right-hand side of Equation (14), we use Equations (4) and (11).
The error function for this case is formulated as

$E(p) = \sum_{i} \left\{ \dfrac{dy_t(x_i, p)}{dx} - f\left(x_i, y_t(x_i, p)\right) \right\}^2$. (15)

Minimization of the above error function is taken as the procedure for training the neural network, where the error corresponding to each input $x_i$ has to become zero. In computing this error value, we require the network output as well as the derivatives of the output with respect to the input. Therefore, in computing the error with respect to the network parameters, we need to compute not only the gradient of the network but also the gradient of the network derivatives with respect to its inputs [14]. This process can be quite tedious computationally, and in this work we avoid it by introducing the novel procedure outlined in this paper.
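As an illustration of how Equation (15) is evaluated over a set of training points (a sketch only, not the paper's MathCAD implementation), the following Python fragment assembles $E(p)$ for the trial solution of Equation (10), using Equations (11) and (14); the sample right-hand side $f(x, y) = -y$ and all parameter values are assumptions of the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def N(x, w, v, b):                       # Equation (11)
    return v @ sigmoid(w * x + b)

def dN_dx(x, w, v, b):                   # Equation (4) with a single input
    return (v * sigmoid_prime(w * x + b)) @ w

def error(p, f, A, xs):
    """E(p) = sum_i [ dy_t/dx(x_i) - f(x_i, y_t(x_i)) ]^2   (Equation (15))."""
    w, v, b = p
    E = 0.0
    for x in xs:
        y_t = A + x * N(x, w, v, b)                   # trial solution, Equation (10)
        dy_t = N(x, w, v, b) + x * dN_dx(x, w, v, b)  # Equation (14)
        E += (dy_t - f(x, y_t)) ** 2
    return E

# Example: dy/dx = -y, y(0) = 1 (a simple linear ODE, used purely to exercise the code).
rng = np.random.default_rng(3)
m = 3
p = (rng.normal(size=m), rng.normal(size=m), rng.uniform(-20, 20, m))
xs = np.linspace(0.0, 1.0, 11)
print(error(p, lambda x, y: -y, 1.0, xs))
```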
3. Numerical Example
The Riccati equation is a nonlinear ordinary differential equation of first order of the form

$\dfrac{dy}{dx} = P(x) + Q(x)\, y + R(x)\, y^2$, (16)

where $P(x)$, $Q(x)$ and $R(x)$ are continuous functions of x. The neural network method can also solve this type of ODE. We show how our new approach solves it by redefining the neural network according to the form the ODE takes. Specifically, we consider the initial value problem:
, (17)
which was solved by Otadi and Mosleh (2011) [15]. The exact solution is
.
The trial solution is given by
. Applying the initial conditions gives
. Therefore
. This solution obviously satisfies the given initial condition. We observe that in Equation (17), the term
is what makes the ODE nonlinear. Also this term cannot be separated from
. Therefore, we incorporate
into the neural network to take care of the nonlinearity seen in the given differential equation. Thus, the new neural network becomes,
(18)
The error to be minimized is
(19)
where the set $\{x_i\}$ consists of the discrete training points in the interval under consideration. We proceed as follows.
To compute the weights
from the input layer to the hidden layer (Figure 1), we construct a function
such that
, f and
. In particular, for
,
. Here N = 3 and the solution
is given by;
(20)
Here;
(21)
The above is the so-called Gaussian Radial Basis function (GRBF) approximation model. To obtain the weights
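The exact expressions in Equations (20) and (21) are specific to the construction above, but the general Gaussian Radial Basis Function interpolation they refer to can be sketched as follows; unit-width Gaussians centred at the training points, three training points, and a linear target $y = ax + b$ with arbitrary a, b are assumptions of this illustration.

```python
import numpy as np

def grbf_weights(x_pts, f_vals):
    """Solve  sum_j w_j * exp(-(x_i - x_j)^2) = f(x_i)  for the coefficients w_j."""
    Phi = np.exp(-(x_pts[:, None] - x_pts[None, :]) ** 2)   # Gaussian kernel matrix
    return np.linalg.solve(Phi, f_vals)

def grbf_eval(x, x_pts, w):
    """Evaluate the GRBF approximation at x."""
    return np.exp(-(x - x_pts) ** 2) @ w

# Example with N = 3 training points and a linear target function y = a*x + b
# (the kind of linear function described in the text; a, b chosen arbitrarily here).
x_pts = np.array([0.0, 0.5, 1.0])
a, b = 1.0, 1.0
w = grbf_weights(x_pts, a * x_pts + b)
print(w)                           # GRBF coefficients (interpreted in the text as weights)
print(grbf_eval(0.5, x_pts, w))    # reproduces the target at the training points
```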
from hidden layer to the output layer, we construct another function
such that
, where,
,
and
is given in Equation (20). We only need to replace the
’s by the
’s,
.
The exact form of
depends on the nature of a given differential equation. This will be made clear below. The nonlinear differential Equation (17) is rewritten as;
.
We now form a linear function based on the default sign of the differential equation, i.e.
, where a is the coefficient of the derivative of y and b is the coefficient of y (i.e.
). Thus;
, for
.
This we apply to get the weights from input layer to the hidden layer. Thus
and
(22)
Hence, the weights from the input layer to the hidden layer are
(23)
The weights from input layer to the hidden layer are:
.
In order to get the weights from the hidden layer to the output layer, we now apply the forcing function which in this case is a constant function. That is,
, which is a constant function.
(24)
being the nonhomogeneous term. With
the weights from the hidden layer to the output layer are given by
(25)
Thus the weights from the hidden layer to the output layer are:
.
The biases are fixed between −20 and 20. We now train the network with the available parameters using our MathCAD 14 [16] algorithm (computer output) as follows:
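(For illustration only, a rough Python sketch of this kind of training step is given below: the weights are held fixed while only the biases, restricted to the stated interval, are adjusted. The simple random-search scheme, the sample Riccati-type right-hand side and all numerical values are assumptions of the sketch, not the paper's MathCAD routine.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def trial(x, w, v, b, A):
    return A + x * (v @ sigmoid(w * x + b))

def residual(x, w, v, b, A, f):
    N = v @ sigmoid(w * x + b)
    dN = (v * sigmoid_prime(w * x + b)) @ w
    return (N + x * dN) - f(x, trial(x, w, v, b, A))

def error(w, v, b, A, f, xs):
    return sum(residual(x, w, v, b, A, f) ** 2 for x in xs)

# Arbitrary Riccati-type right-hand side, used only to exercise the sketch.
f = lambda x, y: 1.0 + y ** 2
A = 0.0
xs = np.linspace(0.0, 1.0, 11)

# Weights assumed already fixed by the systematic procedure; placeholders here.
w = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -0.5, 0.25])

# Fine-tune only the biases, kept within [-20, 20], by simple random search.
rng = np.random.default_rng(4)
best_b, best_E = None, np.inf
for _ in range(5000):
    b = rng.uniform(-20.0, 20.0, 3)
    E = error(w, v, b, A, f, xs)
    if E < best_E:
        best_b, best_E = b, E
print(best_E, best_b)
```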
The plots of the exact and predicted values in Table 1 are depicted in Figure 2 below.
Table 1. Comparison of the results.
Figure 2. Plot of Y Exact and Y Predicted.
Example
We consider the initial value problem:
(26)
The exact solution is easily computed as:
.
Our trial solution for the given problem is
. Applying the initial conditions gives
. Therefore,
(27)
In Equation (26), the nonlinear term
is alone in the ODE (i.e., dividing
outright by
). Therefore, our neural network for this problem takes the form:
(28)
We form an algebraic equation of degree one with the default sign of the ODE. Thus
, (
). Hence
, for
This we apply to get the weights from input layer to the hidden layer. We employ the GRBF here for the weights
. Hence;
(29)
The weights from input layer to the hidden layer are:
.
We now use the forcing function, a constant function in this case, to get the weights from the hidden layer to the output layer. That is,
for
. Hence, the weights
from the hidden layer to the output layer are;
(30)
The weights from the hidden layer to the output layer are:
.
The biases are fixed between −10 and 10. We now train the network with the available parameters using our MathCAD 14 algorithm as follows:
Table 2. Comparison of the results.
The plots of the exact and predicted values in Table 2 are depicted in Figure 3.
4. Conclusion
A novel neural network approach was developed recently by Okereke for solving first and second order linear ordinary differential equations. In this article, the procedure is extended to investigate neural network solutions of nonlinear differential equations of Riccati type. Specifically, we employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but avoid the standard back-propagation algorithm for updating the intrinsic weights. This greatly reduces the computational complexity of the given problem. To achieve the desired accuracy, our objective is to minimize an error which is a function of the network parameters, i.e., the weights and biases. Once the weights of the neural network are obtained by our systematic procedure, we need not adjust all the parameters in the network, as postulated by many researchers before us, in order to achieve convergence. We only need to fine-tune the biases, which are fixed to lie in a given interval, and convergence to a solution with an acceptable minimum error is achieved. The first example ODE of Riccati type to which the procedure is applied gave perfect agreement with the exact solution. The second example, however, provided only an acceptable approximation to the exact solution. This demonstrates quite clearly the function approximation capabilities of ANN in the solution of nonlinear differential equations of Riccati type. The method still requires some refinement so that it can be generalized to solve any type of nonlinear differential equation, including partial differential equations.