Solving Riccati-Type Nonlinear Differential Equations with Novel Artificial Neural Networks

Roseline N. Okereke, Olaniyi S. Maliki

Department of Mathematics, Michael Okpara University of Agriculture, Umudike, Nigeria.

**DOI:** 10.4236/am.2021.1210060

Abstract

In this study we investigate neural network solutions to nonlinear differential equations of Riccati type. We employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but avoid the standard back-propagation algorithm for updating the intrinsic weights. Our objective is to minimize an error, which is a function of the network parameters, i.e., the weights and biases. Once the weights of the neural network are obtained by our systematic procedure, we need not adjust all the parameters in the network, as postulated by many researchers before us, in order to achieve convergence. We only need to fine-tune our biases, which are fixed to lie in a certain given range, and convergence to a solution with an acceptably small error is achieved. This greatly reduces the computational complexity of the given problem. We provide two important ODE examples. The first is a Riccati-type differential equation to which the procedure is applied, and this gave us perfect agreement with the exact solution. The second example, however, provided only an acceptable approximation to the exact solution. Our novel artificial neural network procedure has demonstrated quite clearly the function approximation capabilities of ANN in the solution of nonlinear differential equations of Riccati type.

Share and Cite:

Okereke, R. and Maliki, O. (2021) Solving Riccati-Type Nonlinear Differential Equations with Novel Artificial Neural Networks. *Applied Mathematics*, **12**, 919-930. doi: 10.4236/am.2021.1210060.

1. Introduction

We present a new perspective for obtaining solutions of initial value problems of Riccati type [1], using Artificial Neural Networks (ANN). This is an extension of the procedure developed by Okereke [2]. We find that a neural-network-based model for the solution of ordinary differential equations (ODE) provides a number of advantages over standard numerical methods. Firstly, the neural network solution is differentiable and in closed analytic form, whereas most other techniques offer a discretized solution or a solution with limited differentiability. Secondly, the neural network method provides a solution with very good generalization properties. The major advantage is that our method considerably reduces the computational complexity involved in weight updating, while maintaining satisfactory accuracy.

1.1. Neural Network Structure

A neural network is an interconnection of processing elements, units or nodes, whose functionality resembles that of the human neuron. The processing ability of the network is stored in the connection strengths, simply called weights, which are obtained by a process of adaptation to a set of training patterns. Neural network methods can solve both ordinary and partial differential equations. They rely on the function approximation property of feed-forward neural networks, which yields a solution written in closed analytic form, with a feed-forward neural network as the basic approximation element. Training of the neural network can be done by any optimization technique, which in turn requires the computation of the gradient of the error with respect to the network parameters, by a regression-based model, or by basis function approximation.

1.2. Neural Networks are Universal Approximators

An artificial neural network can realize a nonlinear mapping from the inputs to the outputs of the corresponding system of neurons, which is suitable for analyzing problems defined by initial/boundary value problems that have no analytical solution or whose solution cannot easily be computed. One application of the multilayer feed-forward neural network is the global approximation of real-valued multivariable functions in closed analytic form; such neural networks are universal approximators. It has been shown in the literature that multilayer feed-forward neural networks with one hidden layer, using arbitrary squashing functions, are capable of approximating any Borel measurable function from one finite-dimensional space to another with any desired degree of accuracy. This is made precise in the following theorem.

1.3. Universal Approximation Theorem

The universal approximation theorem for MLP was proved by Cybenko [3] and Hornik *et al.* [4] in 1989. Let ${I}_{n}$ represent an *n*-dimensional unit cube containing all possible input samples $x=\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)$ with ${x}_{i}\in \left[0,1\right]$, $i=1,2,\cdots ,n$. Let $C\left({I}_{n}\right)$ be the space of continuous functions on ${I}_{n}$. Given a continuous sigmoid function $\phi (\cdot )$, the universal approximation theorem states that the finite sums of the form

${y}_{k}={y}_{k}\left(x,w\right)={\displaystyle \underset{i=1}{\overset{{N}_{2}}{\sum}}{w}_{ki}^{3}}\phi \left({\displaystyle \underset{j=0}{\overset{n}{\sum}}{w}_{ij}^{2}{x}_{j}}\right)\text{,}\quad k=1,2,\cdots ,m$ (1)

are dense in $C\left({I}_{n}\right)$. This simply means that given any function $f\in C\left({I}_{n}\right)$ and $\epsilon >0$, there is a sum $y\left(x,w\right)$ of the above form that satisfies

$\left|y\left(x,w\right)-f\left(x\right)\right|<\epsilon ,\text{}\forall x\in {I}_{n}$. (2)
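As an illustration of the theorem (our own sketch, not part of the paper), the code below fits a finite sigmoid sum of the form (1) to a continuous target on $[0,1]$ by solving a small linear system for the output weights. The target $\sin(x)$, the number of hidden units, and the fixed steepness are arbitrary choices for the demonstration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

f = math.sin                              # arbitrary continuous target on [0, 1]
centres = [i / 10 for i in range(1, 10)]  # 9 hidden units, one per sample point
steep = 10.0                              # shared input weight w; bias u_j = -w * c_j

# Interpolation conditions: sum_j v_j * sigmoid(steep * (x_i - c_j)) = f(x_i)
Phi = [[sigmoid(steep * (x - c)) for c in centres] for x in centres]
v = solve(Phi, [f(x) for x in centres])   # output weights of the sigmoid sum

def y(x):
    return sum(v[j] * sigmoid(steep * (x - c)) for j, c in enumerate(centres))

node_error = max(abs(y(x) - f(x)) for x in centres)
```

By construction the sum matches the target exactly at the sample points; the theorem guarantees that, with enough hidden units, the error can be made uniformly small on all of $I_n$.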

1.4. Learning in Neural Networks

A neural network has to be configured such that the application of a set of inputs produces the desired set of outputs. Various methods exist for setting the strengths of the connections. One way is to set the weights explicitly, using a priori knowledge. Another way is to train the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule. The term learning is widely used in the neural network field to describe this process; it might be formally described as determining an optimized set of weights based on the statistics of the examples. Learning situations in neural networks may be classified into distinct sorts: supervised learning, unsupervised learning, reinforcement learning and competitive learning [5].

1.5. Gradient Computation with Respect to Network Inputs

The next step is to compute the gradient with respect to the input vector. For this purpose, consider a multilayer perceptron (MLP) neural network [6] with *n* input units, a hidden layer with *m* sigmoid units and a linear output unit. For a given input vector $x=\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)$ the output of the network is written:

$N\left(x,p\right)={\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}\phi \left({z}_{j}\right)}$, ${z}_{j}={\displaystyle \underset{i=1}{\overset{n}{\sum}}{w}_{ji}{x}_{i}+{u}_{j}}$. (3)

${w}_{ji}$ denotes the weight from input unit *i* to hidden unit *j*, ${v}_{j}$ denotes the weight from hidden unit *j* to the output unit, ${u}_{j}$ denotes the biases, and $\phi \left({z}_{j}\right)$ is the sigmoid activation function.

Now the derivative of the network output *N* with respect to input ${x}_{i}$ is:

$\frac{\partial}{\partial {x}_{i}}N\left(x,p\right)=\frac{\partial}{\partial {x}_{i}}\left({\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}\phi \left({z}_{j}\right)}\right)={\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}{w}_{ji}}{\phi}^{\left(1\right)}\left({z}_{j}\right)$ (4)

where ${\phi}^{\left(1\right)}\equiv \text{d}\phi \left(z\right)/\text{d}z$. Similarly, the *k*^{th} derivative of *N* is computed as

${\partial}^{k}N/\partial {x}_{i}^{k}={\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}{w}_{ji}^{k}}{\phi}_{j}^{\left(k\right)}$

where ${\phi}_{j}\equiv \phi \left({z}_{j}\right)$ and ${\phi}^{\left(k\right)}$ denotes the *k*^{th} order derivative of the sigmoid activation function.
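The gradient formula (4) is easy to check numerically. The sketch below is our own illustration with arbitrarily chosen weights; it compares the analytic derivative $\sum_j v_j w_{ji}\,\phi'(z_j)$ against a central finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Small MLP: n = 2 inputs, m = 3 sigmoid hidden units, linear output.
W = [[0.5, -1.2], [1.0, 0.3], [-0.7, 0.8]]  # W[j][i]: input i -> hidden j
u = [0.1, -0.2, 0.05]                        # hidden biases
v = [1.5, -2.0, 0.7]                         # hidden -> output weights

def N(x):
    return sum(v[j] * sigmoid(sum(W[j][i] * x[i] for i in range(2)) + u[j])
               for j in range(3))

def dN_dxi(x, i):
    """Analytic gradient of Equation (4): sum_j v_j w_ji sigma'(z_j)."""
    total = 0.0
    for j in range(3):
        z = sum(W[j][k] * x[k] for k in range(2)) + u[j]
        s = sigmoid(z)
        total += v[j] * W[j][i] * s * (1.0 - s)  # sigma' = sigma (1 - sigma)
    return total

x = [0.4, 0.9]
h = 1e-6
fd = (N([x[0] + h, x[1]]) - N([x[0] - h, x[1]])) / (2 * h)  # central difference
gap = abs(fd - dN_dxi(x, 0))
```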

2. General Formulation for Differential Equations

Let us consider the following general differential equations which represent both ordinary and partial differential equations Majidzadeh [7]:

$G\left(x,\psi \left(x\right),\nabla \psi \left(x\right),{\nabla}^{2}\psi \left(x\right),\cdots \right)=0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall \text{\hspace{0.17em}}x\in D,$ (5)

subject to some initial or boundary conditions, where $x=\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)\in {\mathbb{R}}^{n}$, $D\subset {\mathbb{R}}^{n}$ denotes the domain, and $\psi \left(x\right)$ is the unknown scalar-valued solution to be computed. Here, *G* is the function which defines the structure of the differential equation and $\nabla $ is a differential operator. Let ${\psi}_{t}\left(x,p\right)$ denote the trial solution with parameters (weights, biases) *p*. Lagaris *et al*. [8] gave the following general formulation for the solution of differential Equation (5) using ANN. Now, ${\psi}_{t}\left(x,p\right)$ may be written as the sum of two terms
${\psi}_{t}\left(x,p\right)$ may be written as the sum of two terms

${\psi}_{t}\left(x,p\right)=A\left(x\right)+F\left(x,N\left(x,p\right)\right)$ (6)

where $A\left(x\right)$ satisfies the initial or boundary conditions and contains no adjustable parameters, whereas $N\left(x,p\right)$ is the output of a feed-forward neural network with parameters *p* and input data *x*. The function $F\left(x,N\left(x,p\right)\right)$ is the operational model of the neural network. The feed-forward neural network (FFNN) converts the differential equation problem into a function approximation problem. The neural network $N\left(x,p\right)$ is given by
$N\left(x,p\right)$ is given by

$N\left(x,p\right)={\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}\sigma \left({z}_{j}\right)}$, ${z}_{j}={\displaystyle {\sum}_{i=1}^{n}{w}_{ji}{x}_{i}+{u}_{j}}$. (7)

${w}_{ji}$ denotes the weight from input unit *i* to hidden unit *j*, ${v}_{j}$ denotes the weight from hidden unit *j* to the output unit, ${u}_{j}$ denotes the biases, and $\sigma \left({z}_{j}\right)$ is the sigmoid activation function.

2.1. Neural Network Training

The neural network weights determine how close the predicted outcome is to the desired outcome. If the weights do not yield a sufficiently accurate prediction, then only the biases need to be adjusted. The activation function we apply in this work in training the neural network is the sigmoid, given by

$\sigma \left({z}_{j}\right)={\left(1+{\text{e}}^{-{z}_{j}}\right)}^{-1}$. (8)

2.2. Neural Network Model for Solving First Order Nonlinear ODE

Let us consider the first order ordinary differential equation below

${\psi}^{\prime}\left(x\right)=f\left(x,\psi \right),\text{}x\in \left[a,b\right]$ (9)

with initial condition $\psi \left(a\right)=A$. In this case we assume the function *f* is nonlinear in its arguments. The ANN trial solution may be written as

${\psi}_{t}\left(x,p\right)=A+xN\left(x,p\right),$ (10)

where $N\left(x,p\right)$ is the output of the feed-forward network with single input *x* and parameters *p*. The trial solution ${\psi}_{t}\left(x,p\right)$ satisfies the initial condition by construction. To solve this problem using a neural network (NN), we employ an architecture with three layers: one input layer with a single neuron, one hidden layer with *n* neurons, and one output layer with one output unit, as depicted in Figure 1 below.

Each neuron is connected to other neurons of the previous layer through adaptable synaptic weights ${w}_{1j}$ and biases ${u}_{j}$. Now, ${\psi}_{t}\left({x}_{i},p\right)=A+{x}_{i}N\left({x}_{i},p\right)$ with

$N\left(x,p\right)={\displaystyle \underset{j=1}{\overset{n}{\sum}}{v}_{j}\sigma \left(x{w}_{j}+{u}_{j}\right),}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{z}_{j}=x{w}_{j}+{u}_{j}$. (11)

It is possible to have multilayer perceptrons with more than three layers, in which case there are more hidden layers [9] [10]. The most important application of multilayer perceptrons is function approximation. The Kolmogorov existence theorem guarantees that a three-layered perceptron with $n\left(2n+1\right)$ nodes can compute any continuous function of *n* variables [11] [12]. The accuracy of the approximation depends only on the number of neurons in the hidden layer and not on the number of hidden layers [13]. For the purpose of numerical computation, as mentioned previously, the sigmoidal activation function $\sigma (\cdot )$ for the hidden units of our neural network is taken to be;

$\sigma \left(z\right)={\left(1+{\text{e}}^{-z}\right)}^{-1}$ (12)

with the property that;

${\sigma}^{\prime}\left(z\right)=\sigma \left(z\right)\left(1-\sigma \left(z\right)\right)$. (13)
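Property (13) can be verified numerically; the short check below (our own, not part of the paper's procedure) compares the closed form against a finite-difference derivative of $\sigma$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h = 1e-5
max_gap = 0.0
for z in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    fd = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # numerical sigma'
    closed = sigmoid(z) * (1.0 - sigmoid(z))          # identity (13)
    max_gap = max(max_gap, abs(fd - closed))
```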

We differentiate the trial solution ${\psi}_{t}\left(x,p\right)$ to get

$\frac{\text{d}{\psi}_{t}\left(x,p\right)}{\text{d}x}=N\left(x,p\right)+x\frac{\text{d}N\left(x,p\right)}{\text{d}x}.$ (14)

We observe that

$\frac{\text{d}N\left(x,p\right)}{\text{d}x}={\displaystyle \underset{j=1}{\overset{n}{\sum}}{v}_{j}\frac{\text{d}}{\text{d}x}\sigma \left(x{w}_{j}+{u}_{j}\right)}={\displaystyle \underset{j=1}{\overset{n}{\sum}}{v}_{j}{w}_{j}{\sigma}^{\prime}\left({z}_{j}\right)}$

$\Rightarrow \text{}\frac{\text{d}N\left(x,p\right)}{\text{d}x}={\displaystyle \underset{j=1}{\overset{n}{\sum}}{v}_{j}{w}_{j}\sigma \left({z}_{j}\right)}\left(1-\sigma \left({z}_{j}\right)\right)$

Figure 1. Schematic for $N\left(x,p\right)$.

For evaluating the derivative term on the right-hand side of Equation (15), we use Equations (11)-(14).

The error function for this case is formulated as;

$E\left(p\right)={\displaystyle \underset{i=1}{\overset{n}{\sum}}{\left(\frac{\text{d}{\psi}_{t}\left({x}_{i},p\right)}{\text{d}{x}_{i}}-f\left({x}_{i},{\psi}_{t}\left({x}_{i},p\right)\right)\right)}^{2}}$. (15)

Minimization of the above error function constitutes the training of the neural network, where the residual of the differential equation at each input point ${x}_{i}$ has to be driven to zero. In computing this error value, we require the network output as well as the derivatives of the output with respect to the input. Therefore, in computing the error with respect to the network parameters, we need not only the gradient of the network but also the gradient of the network derivatives with respect to its inputs [14]. This process can be computationally tedious, and in this work we avoid it by introducing the novel procedure outlined in this paper.
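The role of the error function (15) can be sketched in code. The toy problem below is our own illustration (a hypothetical linear ODE $\psi' = -\psi$, $\psi(0)=1$, with arbitrary network parameters); it evaluates $E(p)$ for the trial solution $\psi_t = A + xN(x,p)$, with the derivative taken from Equation (14):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy problem (hypothetical, for illustration): psi' = -psi, psi(0) = 1,
# trial solution psi_t(x) = 1 + x * N(x, p), one hidden layer of 3 units.
def make_net(w, v, u):
    def N(x):
        return sum(v[j] * sigmoid(w[j] * x + u[j]) for j in range(3))
    def dN(x):
        out = 0.0
        for j in range(3):
            s = sigmoid(w[j] * x + u[j])
            out += v[j] * w[j] * s * (1.0 - s)
        return out
    return N, dN

def error(w, v, u, xs):
    N, dN = make_net(w, v, u)
    E = 0.0
    for x in xs:
        psi_t = 1.0 + x * N(x)
        dpsi_t = N(x) + x * dN(x)        # Equation (14)
        residual = dpsi_t - (-psi_t)     # f(x, psi) = -psi for this toy ODE
        E += residual ** 2
    return E

xs = [0.1, 0.3, 0.5, 0.7, 0.9]
# With all output weights zero the network vanishes: psi_t = 1, dpsi_t = 0,
# so each residual is 0 - (-1) = 1 and E equals the number of points exactly.
E_zero = error([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.1, 0.2, 0.3], xs)
E_some = error([1.0, 2.0, 3.0], [0.5, -0.4, 0.2], [0.1, 0.2, 0.3], xs)
```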

3. Numerical Example

The Riccati equation is a nonlinear ordinary differential equation of first order of the form:

${y}^{\prime}\left(x\right)=p\left(x\right)y+q\left(x\right){y}^{2}+r\left(x\right)$ (16)

where $p\left(x\right),q\left(x\right),r\left(x\right)$ are continuous functions of *x*. The neural network method can also solve this type of ODE. We show how our new approach does so by redefining the neural network according to the form the ODE takes. Specifically, we consider the initial value problem:

${y}^{\prime}\left(x\right)=2y\left(x\right)-{y}^{2}\left(x\right)+1,\text{}y\left(0\right)=0,\text{}x\in \left[0,1\right]$, (17)

which was solved by Otadi and Mosleh (2011) [15]. The exact solution is $y\left(x\right)=\sqrt{2}\,\mathrm{tanh}\left(\sqrt{2}x\right)$.

The trial solution is given by ${y}_{t}\left(x\right)=A+x\aleph \left(x,p\right)$. Applying the initial condition gives $A=0$, so that ${y}_{t}\left(x\right)=x\aleph \left(x,p\right)$, which obviously satisfies the given initial condition. We observe that in Equation (17) the term ${y}^{2}\left(x\right)$ is what makes the ODE nonlinear, and this term cannot be separated from $2y\left(x\right)$. Therefore, we incorporate $2y\left(x\right)-{y}^{2}\left(x\right)$ into the neural network to take care of the nonlinearity in the given differential equation. Thus, the new neural network becomes

$\aleph \left(x,p\right)={\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}\left[2\sigma \left({z}_{j}\right)-{\sigma}^{2}\left({z}_{j}\right)\right]}={\displaystyle \underset{j=1}{\overset{m}{\sum}}{v}_{j}\sigma \left({z}_{j}\right)\left[2-\sigma \left({z}_{j}\right)\right]}$ (18)

The error to be minimized is

$E=\frac{1}{2}{\displaystyle \underset{i=1}{\overset{n}{\sum}}{\left\{\frac{\text{d}}{\text{d}x}{y}_{t}\left({x}_{i},p\right)-\left[2{y}_{t}\left({x}_{i},p\right)-{y}_{t}^{2}\left({x}_{i},p\right)+1\right]\right\}}^{2}}$ (19)

where the set $\left\{{x}_{i},i=1,\cdots ,n\right\}$ are the discrete points in the interval $\left[0,1\right]$. We proceed as follows.

To compute the weights ${w}_{j}$, $j=1,2,3$, from the input layer to the hidden layer (Figure 1), we construct a function $\vartheta \left(x\right)$ and solve $w={\varphi}^{-1}f$, where the vector *f* and the matrix $\varphi $ are defined below. In particular, for $x=\left({x}_{1},{x}_{2},{x}_{3}\right)$, $f\left(x\right)={\left(\vartheta \left({x}_{1}\right)\text{,}\vartheta \left({x}_{2}\right)\text{,}\vartheta \left({x}_{3}\right)\right)}^{\text{T}}$. Here *N* = 3 and the solution $w={\varphi}^{-1}f$ is given by;

$\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ {w}_{3}\end{array}\right]={\left[\begin{array}{ccc}{\phi}_{1}\left({x}_{1}\right)& {\phi}_{2}\left({x}_{1}\right)& {\phi}_{3}\left({x}_{1}\right)\\ {\phi}_{1}\left({x}_{2}\right)& {\phi}_{2}\left({x}_{2}\right)& {\phi}_{3}\left({x}_{2}\right)\\ {\phi}_{1}\left({x}_{3}\right)& {\phi}_{2}\left({x}_{3}\right)& {\phi}_{3}\left({x}_{3}\right)\end{array}\right]}^{-1}\left[\begin{array}{c}{f}_{1}\\ {f}_{2}\\ {f}_{3}\end{array}\right]$ (20)

Here;

${\phi}_{i}\left(x\right)=\mathrm{exp}\left(-\frac{{\left|x-{x}_{i}\right|}^{2}}{2{\sigma}^{2}}\right),\quad {\sigma}^{2}=\frac{1}{N}{\displaystyle \underset{i=1}{\overset{N}{\sum}}{\left({x}_{i}-\bar{x}\right)}^{2}},\quad \bar{x}=\frac{1}{N}{\displaystyle \underset{i=1}{\overset{N}{\sum}}{x}_{i}}$ (21)

The above is the so-called Gaussian Radial Basis function (GRBF) approximation model. To obtain the weights ${\nu}_{j},j=1,2,3$ from hidden layer to the output layer, we construct another function $\theta \left(x\right)$ such that $\nu ={\varphi}^{-1}f$, where, $f\left(x\right)={\left(\theta \left({x}_{1}\right)\text{,}\theta \left({x}_{2}\right)\text{,}\theta \left({x}_{3}\right)\right)}^{\text{T}}$, $x=\left({x}_{1},{x}_{2},{x}_{3}\right)$ and $\varphi $ is given in Equation (20). We only need to replace the ${w}_{j}$ ’s by the ${\nu}_{j}$ ’s, $j=1,2,3$.

The exact form of $f\left(x\right)$ depends on the nature of the given differential equation, as will be made clear below. The nonlinear differential Equation (17) is rewritten as ${y}^{\prime}\left(x\right)-2y\left(x\right)+{y}^{2}\left(x\right)=1$.

We now form a linear function based on the default signs of the differential equation, *i.e.* $\vartheta \left(x\right)=ax-b$, where *a* is the coefficient of the derivative of *y* and *b* is the coefficient of *y* (here $a=1,b=-2$). Thus;

$\vartheta \left(x\right)=x+2,\quad f\left(x\right)={\left(\vartheta \left({x}_{1}\right)\text{,}\vartheta \left({x}_{2}\right)\text{,}\vartheta \left({x}_{3}\right)\right)}^{\text{T}}={\left(2.1,2.2,2.3\right)}^{\text{T}}$, for $x={\left(0.1,0.2,0.3\right)}^{\text{T}}$.

This we apply to get the weights from input layer to the hidden layer. Thus $\text{}f={\left(2.1,2.2,2.3\right)}^{\text{T}}$ and $w={\varphi}^{-1}f$

$\Rightarrow \text{}\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ {w}_{3}\end{array}\right]={\left[\begin{array}{ccc}\text{1}& \text{0}\text{.94}& \text{0}\text{.78}\\ \text{0}\text{.94}& \text{1}& \text{0}\text{.94}\\ \text{0}\text{.78}& \text{0}\text{.94}& \text{1}\end{array}\right]}^{-1}\left[\begin{array}{c}2.1\\ 2.2\\ 2.3\end{array}\right]$ (22)

Hence, the weights from the input layer to the hidden layer are

$\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ {w}_{3}\end{array}\right]=\left[\begin{array}{ccc}\text{41}\text{.335}& -\text{73}\text{.437}& \text{36}\text{.79}\\ -\text{73}\text{.437}& \text{139}\text{.062}& -\text{73}\text{.437}\\ \text{36}\text{.79}& -\text{73}\text{.437}& \text{41}\text{.335}\end{array}\right]\left[\begin{array}{c}2.1\\ 2.2\\ 2.3\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ {w}_{3}\end{array}\right]=\left[\begin{array}{c}9.858\\ -17.187\\ 10.767\end{array}\right]$ (23)

The weights from input layer to the hidden layer are: ${w}_{1}=9.858,{w}_{2}=-17.187,{w}_{3}=10.767$.

In order to get the weights from the hidden layer to the output layer, we now apply the forcing function, which in this case is the constant function $\theta \left(x\right)=1$.

$\Rightarrow \text{}\stackrel{^}{f}={\left(\theta \left({x}_{1}\right)\text{,}\theta \left({x}_{2}\right)\text{,}\theta \left({x}_{3}\right)\right)}^{\text{T}}={\left(1,\text{1,1}\right)}^{\text{T}}$ (24)

$\theta \left(x\right)$ being the nonhomogeneous term. With $v={\varphi}^{-1}\stackrel{^}{f}$ the weights from the hidden layer to the output layer are given by

$\left[\begin{array}{c}{v}_{1}\\ {v}_{2}\\ {v}_{3}\end{array}\right]={\left[\begin{array}{ccc}1& 0.94& 0.78\\ 0.94& 1& 0.94\\ 0.78& 0.94& 1\end{array}\right]}^{-1}\left[\begin{array}{c}1\\ 1\\ 1\end{array}\right]=\left[\begin{array}{ccc}41.335& -73.437& 36.79\\ -73.437& 139.062& -73.437\\ 36.79& -73.437& 41.335\end{array}\right]\left[\begin{array}{c}1\\ 1\\ 1\end{array}\right]$

$\Rightarrow \text{}\left[\begin{array}{c}{v}_{1}\\ {v}_{2}\\ {v}_{3}\end{array}\right]=\left[\begin{array}{c}4.687\\ -7.812\\ 4.687\end{array}\right]$ (25)

Thus the weights from the hidden layer to the output layer are: ${v}_{1}=4.687,{v}_{2}=-7.812,{v}_{3}=4.687$.
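The two linear solves above can be reproduced directly. The sketch below (our own check, using a small Gaussian-elimination routine in place of MathCAD's matrix inverse) recovers the weights in (23) and (25) from the matrix $\varphi$ and the vectors $f$ and $\hat{f}$:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# GRBF matrix phi evaluated at x = (0.1, 0.2, 0.3), as printed in Equation (22)
phi = [[1.00, 0.94, 0.78],
       [0.94, 1.00, 0.94],
       [0.78, 0.94, 1.00]]

w = solve(phi, [2.1, 2.2, 2.3])  # input-to-hidden weights, Equation (23)
v = solve(phi, [1.0, 1.0, 1.0])  # hidden-to-output weights, Equation (25)
```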

The biases are fixed between −20 and 20. We now train the network with the available parameters using our MathCAD 14 [16] algorithm (computer output) as follows:

$\begin{array}{l}{\text{w}}_{1}:=9.858\quad {\text{w}}_{2}:=-17.187\quad {\text{w}}_{3}:=10.767\quad \text{x}:=1\\ {\text{v}}_{1}:=4.687\quad {\text{v}}_{2}:=-7.812\quad {\text{v}}_{3}:=4.687\quad {\text{u}}_{1}:=-20\quad {\text{u}}_{2}:=10\quad {\text{u}}_{3}:=-12.534\\ {\text{z}}_{1}:={\text{w}}_{1}\cdot \text{x}+{\text{u}}_{1}=-10.142\quad {\text{z}}_{2}:={\text{w}}_{2}\cdot \text{x}+{\text{u}}_{2}=-7.187\quad {\text{z}}_{3}:={\text{w}}_{3}\cdot \text{x}+{\text{u}}_{3}=-1.767\\ \sigma\left({\text{z}}_{1}\right):={\left[1+\mathrm{exp}\left(-{\text{z}}_{1}\right)\right]}^{-1}=3.9388\times {10}^{-5},\quad \sigma\left({\text{z}}_{2}\right):={\left[1+\mathrm{exp}\left(-{\text{z}}_{2}\right)\right]}^{-1}=7.5578\times {10}^{-4},\\ \sigma\left({\text{z}}_{3}\right):={\left[1+\mathrm{exp}\left(-{\text{z}}_{3}\right)\right]}^{-1}=0.1459\\ \aleph :={\text{v}}_{1}\cdot \sigma\left({\text{z}}_{1}\right)\cdot \left(2-\sigma\left({\text{z}}_{1}\right)\right)+{\text{v}}_{2}\cdot \sigma\left({\text{z}}_{2}\right)\cdot \left(2-\sigma\left({\text{z}}_{2}\right)\right)+{\text{v}}_{3}\cdot \sigma\left({\text{z}}_{3}\right)\cdot \left(2-\sigma\left({\text{z}}_{3}\right)\right)=1.256457\\ {\text{y}}_{\text{p}}\left(\text{x}\right):=\text{x}\cdot \aleph =1.256457,\quad {\text{y}}_{\text{d}}\left(\text{x}\right):=\sqrt{2}\cdot \mathrm{tanh}\left(\sqrt{2}\cdot x\right)=1.256367\\ \text{E}:=0.5\cdot {\left({\text{y}}_{\text{d}}\left(\text{x}\right)-{\text{y}}_{\text{p}}\left(\text{x}\right)\right)}^{2}=4.05\times {10}^{-9}\end{array}$
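The forward pass above can be reproduced in a few lines. The sketch below (our own re-implementation, using the sigmoid of Equation (8)) evaluates the network (18) at $x=1$ with the listed weights and biases, and compares against the paper's target value $\sqrt{2}\tanh(\sqrt{2})$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # Equation (8)

w = [9.858, -17.187, 10.767]  # input -> hidden weights
v = [4.687, -7.812, 4.687]    # hidden -> output weights
u = [-20.0, 10.0, -12.534]    # biases
x = 1.0

z = [w[j] * x + u[j] for j in range(3)]
s = [sigmoid(zj) for zj in z]

# Riccati-adapted network of Equation (18): sum_j v_j sigma(z_j) [2 - sigma(z_j)]
N_val = sum(v[j] * s[j] * (2.0 - s[j]) for j in range(3))
y_pred = x * N_val                                     # trial solution at x = 1
y_target = math.sqrt(2) * math.tanh(math.sqrt(2) * x)  # the paper's target value
E = 0.5 * (y_target - y_pred) ** 2
```

At full floating-point precision the results agree with the printed values to roughly four decimal places (the listing rounds the intermediate $\sigma$ values).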

The plots of the exact and predicted values in Table 1 are depicted in Figure 2 below.

Example

We consider the initial value problem:

${x}^{2}{y}^{\prime}+{x}^{2}{y}^{2}=2,\text{}y\left(\frac{1}{2}\right)=0,\text{}x\in \left(0,1\right]$ (26)

The exact solution is easily computed as: $y\left(x\right)=\left(8{x}^{3}-1\right){\left(x+4{x}^{4}\right)}^{-1}$.
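That this closed form indeed solves (26) can be confirmed numerically; the check below (our own, using a central finite difference for $y'$) evaluates the residual $x^{2}y' + x^{2}y^{2} - 2$ at a few interior points:

```python
def y(x):
    """Stated exact solution of Equation (26)."""
    return (8 * x**3 - 1) / (x + 4 * x**4)

ic = y(0.5)  # initial condition: y(1/2) should be 0

# Residual of x^2 y' + x^2 y^2 = 2, with y' from a central finite difference
h = 1e-6
max_residual = 0.0
for x in [0.3, 0.5, 0.7, 0.9]:
    dy = (y(x + h) - y(x - h)) / (2 * h)
    max_residual = max(max_residual, abs(x**2 * dy + x**2 * y(x)**2 - 2.0))
```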

Our trial solution for the given problem is ${y}_{t}\left(x\right)=A+x\aleph \left(x,p\right)$. Applying the initial condition gives $A=-\frac{1}{2}\aleph \left(\frac{1}{2},p\right)$. Therefore,

${y}_{t}\left(x\right)=-\frac{1}{2}\aleph \left(\frac{1}{2},p\right)+x\aleph \left(x,p\right)$ (27)

Table 1. Comparison of the results.

Figure 2. Plot of Y Exact and Y Predicted.

In Equation (26), the nonlinear term ${y}^{2}\left(x\right)$ stands alone in the ODE (*i.e.* after dividing through by ${x}^{2}$). Therefore, our neural network for this problem takes the form:

$\aleph \left(x,p\right)={\displaystyle \underset{j=1}{\overset{3}{\sum}}{v}_{j}{\sigma}^{2}\left({z}_{j}\right)}$ (28)

We form a first-degree algebraic expression with the default signs of the ODE: $\vartheta \left(x\right)=ax+b$, with $a={x}^{2}$ and $b=0$. Hence $\vartheta \left(x\right)={x}^{3}$, so that $f\left(x\right)={\left(0.001,0.008,0.027\right)}^{\text{T}}$ for $x={\left(0.1,0.2,0.3\right)}^{\text{T}}$.

This we apply to get the weights from input layer to the hidden layer. We employ the GRBF here for the weights $w={\varphi}^{-1}f$. Hence;

$\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ {w}_{3}\end{array}\right]={\left[\begin{array}{ccc}\text{1}& \text{0}\text{.94}& \text{0}\text{.78}\\ \text{0}\text{.94}& \text{1}& \text{0}\text{.94}\\ \text{0}\text{.78}& \text{0}\text{.94}& \text{1}\end{array}\right]}^{-1}\left[\begin{array}{c}0.001\\ 0.008\\ 0.027\end{array}\right]\text{}\Rightarrow \text{}\left[\begin{array}{c}{w}_{1}\\ {w}_{2}\\ {w}_{3}\end{array}\right]=\left[\begin{array}{c}0.447\\ -0.944\\ 0.565\end{array}\right]$ (29)

The weights from input layer to the hidden layer are: ${w}_{1}=0.447,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{w}_{2}=-0.944,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{w}_{3}=0.565$.

We now use the forcing function, a constant function in this case, to get the weights from the hidden layer to the output layer. That is, $\theta \left(x\right)=2\Rightarrow \stackrel{^}{f}\left(x\right)={\left(2,2,2\right)}^{\text{T}}$ for $x={\left(0.1,0.2,0.3\right)}^{\text{T}}$. Hence, the weights $v={\varphi}^{-1}\stackrel{^}{f}$ from the hidden layer to the output layer are;

$\left[\begin{array}{c}{v}_{1}\\ {v}_{2}\\ {v}_{3}\end{array}\right]={\left[\begin{array}{ccc}\text{1}& \text{0}\text{.94}& \text{0}\text{.78}\\ \text{0}\text{.94}& \text{1}& \text{0}\text{.94}\\ \text{0}\text{.78}& \text{0}\text{.94}& \text{1}\end{array}\right]}^{-1}\left[\begin{array}{c}2\\ 2\\ 2\end{array}\right]\text{}\Rightarrow \text{}\left[\begin{array}{c}{v}_{1}\\ {v}_{2}\\ {v}_{3}\end{array}\right]=\left[\begin{array}{c}9.375\\ -15.625\\ 9.375\end{array}\right]$ (30)

The weights from the hidden layer to the output layer are: ${v}_{1}=9.375,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{v}_{2}=-15.625,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{v}_{3}=9.375$.

The biases are fixed between −10 and 10. We now train the network with the available parameters using our MathCAD 14 algorithm as follows:

$\begin{array}{l}{\text{w}}_{1}:=1.234\quad {\text{w}}_{2}:=-2.725\quad {\text{w}}_{3}:=1.716\quad \text{x}:=1\\ {\text{v}}_{1}:=9.375\quad {\text{v}}_{2}:=-15.625\quad {\text{v}}_{3}:=9.375\quad {\text{u}}_{1}:=-7\quad {\text{u}}_{2}:=-4\quad {\text{u}}_{3}:=-7\\ {\text{z}}_{1}:={\text{w}}_{1}\cdot \text{x}+{\text{u}}_{1}=-5.766\quad {\text{z}}_{2}:={\text{w}}_{2}\cdot \text{x}+{\text{u}}_{2}=-6.725\quad {\text{z}}_{3}:={\text{w}}_{3}\cdot \text{x}+{\text{u}}_{3}=-5.284\\ \sigma\left({\text{z}}_{1}\right):={\left[1+\mathrm{exp}\left({\text{z}}_{1}\right)\right]}^{-1}=0.998,\quad \sigma\left({\text{z}}_{2}\right):={\left[1+\mathrm{exp}\left({\text{z}}_{2}\right)\right]}^{-1}=0.995,\\ \sigma\left({\text{z}}_{3}\right):={\left[1+\mathrm{exp}\left({\text{z}}_{3}\right)\right]}^{-1}=0.998\\ \aleph \left(0.5\right):={\text{v}}_{1}\cdot \sigma{\left(0.5\cdot {\text{w}}_{1}+{\text{u}}_{1}\right)}^{2}+{\text{v}}_{2}\cdot \sigma{\left(0.5\cdot {\text{w}}_{2}+{\text{u}}_{2}\right)}^{2}+{\text{v}}_{3}\cdot \sigma{\left(0.5\cdot {\text{w}}_{3}+{\text{u}}_{3}\right)}^{2}=3.199\\ \aleph :={\text{v}}_{1}\cdot \sigma{\left({\text{z}}_{1}\right)}^{2}+{\text{v}}_{2}\cdot \sigma{\left({\text{z}}_{2}\right)}^{2}+{\text{v}}_{3}\cdot \sigma{\left({\text{z}}_{3}\right)}^{2}=3.01\\ {\text{y}}_{\text{p}}\left(\text{x}\right):=-0.5\cdot \aleph \left(0.5\right)+\text{x}\cdot \aleph =1.41,\quad {\text{y}}_{\text{d}}\left(\text{x}\right):=\left(8\cdot {x}^{3}-1\right){\left(x+4\cdot {x}^{4}\right)}^{-1}=1.4,\\ \text{E}:=0.5\cdot {\left({\text{y}}_{\text{d}}\left(\text{x}\right)-{\text{y}}_{\text{p}}\left(\text{x}\right)\right)}^{2}=5\times {10}^{-5}\end{array}$
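This listing can likewise be re-run independently. The sketch below (our own re-implementation, keeping the sigmoid sign convention exactly as typed in the listing) reproduces $\aleph(0.5)$, $\aleph$, and the predicted value at $x=1$:

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(z))  # sign convention exactly as typed above

w = [1.234, -2.725, 1.716]   # input -> hidden weights used in the listing
v = [9.375, -15.625, 9.375]  # hidden -> output weights
u = [-7.0, -4.0, -7.0]       # biases

def net(x):
    """Squared-sigmoid network of Equation (28)."""
    return sum(v[j] * sig(w[j] * x + u[j]) ** 2 for j in range(3))

N_half = net(0.5)                     # aleph(1/2), needed by the trial solution (27)
N_one = net(1.0)                      # aleph(1)
y_pred = -0.5 * N_half + 1.0 * N_one  # trial solution (27) evaluated at x = 1
y_exact = (8 * 1.0**3 - 1) / (1.0 + 4 * 1.0**4)
E = 0.5 * (y_exact - y_pred) ** 2
```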

Table 2. Comparison of the results.

Figure 3. Plot of Y Exact and Y Pred.

The plots of the exact and predicted values in Table 2 are depicted in Figure 3.

4. Conclusion

A novel neural network approach was developed recently by Okereke [2] for solving first and second order linear ordinary differential equations. In this article, the procedure is extended to investigate neural network solutions to nonlinear differential equations of Riccati type. Specifically, we employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but avoid the standard back-propagation algorithm for updating the intrinsic weights. This greatly reduces the computational complexity of the given problem. Our objective is to minimize an error, which is a function of the network parameters, i.e., the weights and biases. Once the weights of the neural network are obtained by our systematic procedure, we need not adjust all the parameters in the network, as postulated by many researchers before us, in order to achieve convergence. We only need to fine-tune our biases, which are fixed to lie in a certain given interval, and convergence to a solution with an acceptably small error is achieved. The first example ODE of Riccati type to which the procedure is applied gave perfect agreement with the exact solution; the second example provided only an acceptable approximation to the exact solution. This demonstrates quite clearly the function approximation capabilities of ANN in the solution of nonlinear differential equations of Riccati type. The method still requires some refinement before it can be generalized to solve any type of nonlinear differential equation, including partial differential equations.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

[1] Polyanin, A.D. and Zaitsev, V.F. (2003) Handbook of Exact Solutions for Ordinary Differential Equations. 2nd Edition, Chapman & Hall/CRC, Boca Raton.

[2] Okereke, R.N. (2019) A New Perspective to the Solution of Ordinary Differential Equations Using Artificial Neural Networks. Ph.D. Dissertation, Mathematics Department, Michael Okpara University of Agriculture, Umudike.

[3] Cybenko, G. (1989) Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2, 303-314. https://doi.org/10.1007/BF02551274

[4] Hornik, K., Stinchcombe, M. and White, H. (1989) Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, 2, 359-366. https://doi.org/10.1016/0893-6080(89)90020-8

[5] Graupe, D. (2007) Principles of Artificial Neural Networks. Vol. 6, 2nd Edition, World Scientific Publishing Co. Pte. Ltd., Singapore.

[6] Rumelhart, D.E. and McClelland, J.L. (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition I and II. MIT Press, Cambridge. https://doi.org/10.7551/mitpress/5236.001.0001

[7] Majidzadeh, K. (2011) Inverse Problem with Respect to Domain and Artificial Neural Network Algorithm for the Solution. Mathematical Problems in Engineering, 2011, Article ID: 145608. https://doi.org/10.1155/2011/145608

[8] Lagaris, I.E., Likas, A.C. and Fotiadis, D.I. (1997) Artificial Neural Networks for Solving Ordinary and Partial Differential Equations. arXiv: physics/9705023v1.

[9] Chen, R.T.Q., Rubanova, Y., Bettencourt, J. and Duvenaud, D. (2018) Neural Ordinary Differential Equations. arXiv: 1806.07366v1.

[10] Mall, S. and Chakraverty, S. (2013) Comparison of Artificial Neural Network Architecture in Solving Ordinary Differential Equations. Advances in Artificial Neural Systems, 2013, Article ID: 181895. https://doi.org/10.1155/2013/181895

[11] Gurney, K. (1997) An Introduction to Neural Networks. UCL Press, London.

[12] Samath, J.A., Kumar, P.S. and Begum, A. (2010) Solution of Linear Electrical Circuit Problem Using Neural Networks. International Journal of Computer Applications, 2, 6-13. https://doi.org/10.5120/618-869

[13] Werbos, P.J. (1974) Beyond Regression: New Tools for Prediction and Analysis in the Behavioural Sciences. Ph.D. Thesis, Harvard University, Cambridge.

[14] Kumar, M. and Yadav, N. (2011) Multilayer Perceptrons and Radial Basis Function Neural Network Methods for the Solution of Differential Equations: A Survey. Computers and Mathematics with Applications, 62, 3796-3811. https://doi.org/10.1016/j.camwa.2011.09.028

[15] Otadi, M. and Mosleh, M. (2011) Numerical Solution of Quadratic Riccati Differential Equations by Neural Network. Mathematical Sciences, 5, 249-257.

[16] PTC (Parametric Technology Corporation) (2007) Mathcad Version 14. http://communications@ptc.com


Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.