A Novel Method for Solving Ordinary Differential Equations with Artificial Neural Networks

Abstract

This research work investigates the use of Artificial Neural Network (ANN) based models for solving first and second order linear constant coefficient ordinary differential equations with initial conditions. In particular, we employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but bypass the standard back-propagation algorithm for updating the intrinsic weights. A trial solution of the differential equation is written as a sum of two parts. The first part satisfies the initial or boundary conditions and contains no adjustable parameters. The second part involves a feed-forward neural network to be trained to satisfy the differential equation. Numerous works have appeared in recent times regarding the solution of differential equations using ANN; however, the majority of these employ a single hidden layer perceptron model with a back-propagation algorithm for weight updating. For the homogeneous case, we assume a solution in exponential form and compute a polynomial approximation using statistical regression. From this we pick the unknown coefficients as the weights from the input layer to the hidden layer of the associated neural network trial solution. To get the weights from the hidden layer to the output layer, we form algebraic equations incorporating the default signs of the terms of the differential equation. We then apply the Gaussian Radial Basis Function (GRBF) approximation model to achieve our objective. The weights obtained in this manner need not be adjusted. We proceed to develop a neural network algorithm using MathCAD software, which enables us to slightly adjust the intrinsic biases. We compare the convergence and accuracy of our results with analytic solutions, as well as well-known numerical methods, and obtain satisfactory results for our example ODE problems.

Citation:

Okereke, R. , Maliki, O. and Oruh, B. (2021) A Novel Method for Solving Ordinary Differential Equations with Artificial Neural Networks. Applied Mathematics, 12, 900-918. doi: 10.4236/am.2021.1210059.

1. Introduction

The beginning of neuro-computing is often taken to be the 1943 research article of McCulloch and Pitts [1], which showed that even simple types of neural networks could, in principle, compute any arithmetic or logical function; the paper was widely read and had great influence. Other researchers followed; principally, von Neumann wrote a book [2] suggesting that research into the design of brain-like or brain-inspired computers might be of great interest and benefit to scientific and technological knowledge.

We present a new perspective for obtaining solutions of initial value problems using Artificial Neural Networks (ANN). A neural network based model for the solution of ordinary differential equations (ODE) provides a number of advantages over standard numerical methods. Firstly, the neural network based solution is differentiable and is in closed analytic form, whereas most other techniques offer a discretized solution or a solution with limited differentiability. Secondly, the neural network based method provides a solution with very good generalization properties. The major advantage of our approach is that it considerably reduces the computational complexity involved in weight updating, while maintaining satisfactory accuracy.

Neural Network Structure

A neural network is an inter-connection of processing elements, units or nodes, whose functionality resembles that of the human neuron [3]. The processing ability of the network is stored in the connection strengths, simply called weights, which are obtained by a process of adaptation to a set of training patterns. Neural network methods can solve both ordinary and partial differential equations. They rely on the function approximation property of feed-forward neural networks, which yields a solution written in a closed analytic form; this form employs a feed-forward neural network as the basic approximation element [4] [5]. The network can be trained either by an optimization technique, which in turn requires the computation of the gradient of the error with respect to the network parameters, by a regression based model, or by basis function approximation. In any of these methods, a trial solution of the differential equation is written as the sum of two parts, as proposed by Lagaris [6]. The first part satisfies the initial or boundary conditions and contains no adjustable parameters. The second part involves a feed-forward neural network with adjustable parameters and is constructed so as not to affect the initial or boundary conditions. By this construction the trial solution satisfies the initial or boundary conditions, and the network is trained to satisfy the differential equation. The general flowchart for neural network training (or learning) is given below in Figure 1.

2. Neural Networks as Universal Approximators

Artificial neural networks can realize a nonlinear mapping from the inputs to the outputs of a system of neurons, which makes them suitable for initial/boundary value problems that have no analytical solutions or whose solutions cannot be easily computed. One application of the multilayer feed-forward neural network is the global approximation of a real valued multivariable function in closed analytic form; such neural networks are universal approximators. It has been shown in the literature that multilayer feed-forward neural networks with one hidden layer, using arbitrary squashing functions, are capable of approximating any Borel measurable function from one finite dimensional space to another with any desired degree of accuracy. This is made precise in the following theorem.

Universal Approximation Theorem

The universal approximation theorem for the MLP was proved by Cybenko [7] and Hornik et al. [8] in 1989. Let $I_n$ represent the n-dimensional unit cube containing all possible input samples $x = (x_1, x_2, \ldots, x_n)$ with $x_i \in [0,1]$, $i = 1, 2, \ldots, n$. Let $C(I_n)$ be the space of continuous functions on $I_n$. Given a continuous sigmoid function $\varphi(\cdot)$, the universal approximation theorem states that the finite sums of the form

$$y_k = y_k(x, w) = \sum_{i=1}^{N_2} w_{ki}^{(3)}\, \varphi\!\left(\sum_{j=0}^{n} w_{ij}^{(2)} x_j\right), \quad k = 1, 2, \ldots, m \tag{1}$$

are dense in $C(I_n)$. This means that given any function $f \in C(I_n)$ and $\varepsilon > 0$, there is a sum $y(x,w)$ of the above form that satisfies

$$|y(x,w) - f(x)| < \varepsilon, \quad \forall x \in I_n. \tag{2}$$
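To make the finite sums of Equation (1) concrete, the following is a minimal numerical sketch (in Python with NumPy, which the paper itself does not use) of a one-hidden-layer network of exactly this form; all weight values are randomly generated placeholders, not quantities from this work.

```python
import numpy as np

def sigmoid(z):
    """A continuous squashing (sigmoid) function, as required by the theorem."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_sum(x, W, b, v):
    """Finite sum of the form (1): y(x, w) = sum_i v_i * sigmoid(sum_j W_ij x_j + b_i)."""
    return sigmoid(x @ W.T + b) @ v

rng = np.random.default_rng(0)
n, N = 2, 20                       # input dimension and number of hidden units
W = rng.normal(size=(N, n))        # input-to-hidden weights (placeholder values)
b = rng.normal(size=N)             # hidden biases
v = rng.normal(size=N)             # hidden-to-output weights
x = rng.uniform(size=(5, n))       # five sample points in the unit cube I_n
print(mlp_sum(x, W, b, v))         # network outputs y(x, w)
```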

Figure 1. Network training flowchart.

3. Gradient Computation

Minimization of an error function can also be considered as a procedure for training the neural network [9], where the error corresponding to each input vector x is the value $f(x)$, which has to become zero. Computing this error value requires the network output as well as the derivatives of the output with respect to the input vectors. Therefore, in computing the gradient of the error with respect to the network parameters, we need not only the gradient of the network but also the gradient of the network derivatives with respect to its inputs. This process can be computationally tedious, and we briefly outline it in what follows.

3.1. Gradient Computation with Respect to Network Inputs

The next step is to compute the gradient with respect to the input vectors. For this purpose, consider a multilayer perceptron (MLP) neural network with n input units, a hidden layer with m sigmoid units and a linear output unit. For a given input vector $x = (x_1, x_2, \ldots, x_n)$ the network output is written:

$$N(x,p) = \sum_{j=1}^{m} v_j \varphi(z_j), \qquad z_j = \sum_{i=1}^{n} w_{ji} x_i + u_j \tag{3}$$

where $w_{ji}$ denotes the weight from input unit i to hidden unit j, $v_j$ denotes the weight from hidden unit j to the output unit, $u_j$ denotes the biases, and $\varphi(z_j)$ is the sigmoid activation function.

Now the derivative of the network output N with respect to input $x_i$ is:

$$\frac{\partial N(x,p)}{\partial x_i} = \frac{\partial}{\partial x_i}\left(\sum_{j=1}^{m} v_j \varphi(z_j)\right) = \sum_{j=1}^{m} v_j w_{ji}\, \varphi^{(1)} \tag{4}$$

where $\varphi^{(1)} \equiv \partial \varphi(x)/\partial x$. Similarly, the kth derivative of N is computed as $\partial^k N / \partial x_i^k = \sum_{j=1}^{m} v_j w_{ji}^k \varphi_j^{(k)}$,

where $\varphi_j \equiv \varphi(z_j)$ and $\varphi^{(k)}$ denotes the kth order derivative of the sigmoid activation function.
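As an illustration of the derivative formula above, the following sketch (Python/NumPy, our own illustration rather than code from the paper) evaluates $N(x,p)$ and $\partial^k N/\partial x_i^k$ for a single-hidden-layer sigmoid network; the weights, biases and input shown are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z, k):
    """k-th derivative of the sigmoid for k = 1, 2 (enough for first and second order ODEs)."""
    s = sigmoid(z)
    if k == 1:
        return s * (1 - s)
    if k == 2:
        return s * (1 - s) * (1 - 2 * s)
    raise NotImplementedError("only k = 1, 2 implemented here")

def network_and_derivative(x, W, u, v, i=0, k=1):
    """Return N(x, p) and d^k N / d x_i^k = sum_j v_j * w_ji^k * sigma_j^(k)."""
    z = W @ x + u                                     # z_j = sum_i w_ji x_i + u_j
    N = v @ sigmoid(z)                                # network output
    dkN = v @ (W[:, i] ** k * sigmoid_deriv(z, k))    # derivative w.r.t. input x_i
    return N, dkN

# Hypothetical parameters: m = 3 hidden units, n = 2 inputs
W = np.array([[0.5, -0.2], [1.0, 0.3], [-0.7, 0.8]])
u = np.array([0.1, -0.1, 0.2])
v = np.array([0.4, -0.6, 0.9])
x = np.array([0.3, 0.7])
print(network_and_derivative(x, W, u, v, i=0, k=1))
```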

3.2. Gradient Computation with Respect to Network Parameters

The network's derivative with respect to any of its inputs is equivalent to a feed-forward neural network $N_k(x)$ with one hidden layer, having the same values for the weights $w_{ji}$ and thresholds $u_j$, and with each weight $v_j$ replaced by $v_j p_j$. Moreover, the transfer function of each hidden unit is replaced by the kth order derivative of the sigmoid function. The gradient of $N_k$ with respect to the parameters of the original network can then easily be obtained.

3.3. Network Parameter Updation

Once the derivative of the error with respect to the network parameters has been defined, the update rule for the network parameters $v_j$, $u_j$ and $w_{ji}$ is given as

$$v_j(t+1) = v_j(t) + \mu \frac{\partial N_k}{\partial v_j}, \quad u_j(t+1) = u_j(t) + \eta \frac{\partial N_k}{\partial u_j}, \quad w_{ji}(t+1) = w_{ji}(t) + \gamma \frac{\partial N_k}{\partial w_{ji}} \tag{5}$$

where $\mu$, $\eta$ and $\gamma$ are the learning rates, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.

Once a derivative of the error with respect to the network parameters has been defined, it is then straightforward to employ any optimization technique to minimize the error function.
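A minimal sketch of the update rule (5), assuming the gradients of $N_k$ with respect to each parameter group have already been computed (for instance by the formulas of the previous subsection); the learning-rate values are illustrative only.

```python
import numpy as np

def update_parameters(v, u, W, dNk_dv, dNk_du, dNk_dW, mu=0.01, eta=0.01, gamma=0.01):
    """Parameter update of Equation (5): each parameter group moves by its own
    learning rate times the corresponding gradient of N_k."""
    return v + mu * dNk_dv, u + eta * dNk_du, W + gamma * dNk_dW
```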

4. General Formulation for Differential Equations

Let us consider the following general differential equation, which represents both ordinary and partial differential equations, following Majidzadeh [10]:

$$G\big(x, \psi(x), \nabla\psi(x), \nabla^2\psi(x), \ldots\big) = 0, \quad x \in D, \tag{6}$$

subject to some initial or boundary conditions, where $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, $D \subset \mathbb{R}^n$ denotes the domain, and $\psi(x)$ is the unknown scalar-valued solution to be computed. Here, G is the function defining the structure of the differential equation and $\nabla$ is a differential operator. Let $\psi_t(x,p)$ denote the trial solution with parameters (weights, biases) p. Chen et al. [11] gave the following general formulation for the solution of differential equations (6) using ANN. Now $\psi_t(x,p)$ may be written as the sum of two terms

$$\psi_t(x,p) = A(x) + F\big(x, N(x,p)\big), \tag{7}$$

where $A(x)$ satisfies the initial or boundary conditions and contains no adjustable parameters, whereas $N(x,p)$ is the output of a feed-forward neural network with parameters p and input data x. The function $F(x, N(x,p))$ is the operational model of the neural network. The feed-forward neural network (FFNN) converts the differential equation problem into a function approximation problem. The neural network $N(x,p)$ is given by

$$N(x,p) = \sum_{j=1}^{m} v_j \sigma(z_j), \qquad z_j = \sum_{i=1}^{n} w_{ji} x_i + u_j \tag{8}$$

where $w_{ji}$ denotes the weight from input unit i to hidden unit j, $v_j$ denotes the weight from hidden unit j to the output unit, $u_j$ denotes the biases, and $\sigma(z_j)$ is the sigmoid activation function.

Neural Network Training

The neural network weights determine how close the predicted outcome is to the desired outcome. In our approach, if the weights alone do not yield the correct prediction, only the biases need to be adjusted. The basis function we apply in this work in training the neural network is the sigmoid activation function given by

$$\sigma(z_j) = \big(1 + e^{-z_j}\big)^{-1}. \tag{9}$$

5. Method for Solving First Order Ordinary Differential Equation

Let us consider the first order ordinary differential equation below

$$\psi'(x) = f(x, \psi), \quad x \in [a,b] \tag{10}$$

with initial condition $\psi(a) = A$. In this case, the ANN trial solution may be written as

$$\psi_t(x,p) = A + x N(x,p), \tag{11}$$

where $N(x,p)$ is the output of the feed-forward network with one input x and parameters p. The trial solution $\psi_t(x,p)$ satisfies the initial condition. Now let us consider a first order differential equation:

$$\psi'(x) - \psi = 0, \quad x \in [0,1], \quad \psi(0) = 1 \tag{12}$$

with trial solution:

$$\psi_t(x,p) = A + x N(x,p), \tag{13}$$

where x is the input to the neural network model and p represents the parameters (weights and biases).

$$\psi_t(0,p) = A + (0)N(0,p) = A = 1 \;\Rightarrow\; \psi_t(x,p) = 1 + x N(x,p), \tag{14}$$

To solve this problem using a neural network, we shall employ an architecture with three layers: one input layer with one neuron, one hidden layer with three neurons, and one output layer with one output unit, as depicted below in Figure 2.

Each neuron is connected to the neurons of the previous layer through adaptable synaptic weights $w_j$ and biases $u_j$. Now $\psi_t(x_i,p) = 1 + x_i N(x_i,p)$, with $N(x_i,p) = \sum_{j=1}^{3} v_j \sigma(z_j)$, and

$$\sum_{j=1}^{3} v_j \sigma(z_j) = v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3), \tag{15}$$

where $z_1 = x_1 w_{11} + u_1$, $z_2 = x_1 w_{12} + u_2$, $z_3 = x_1 w_{13} + u_3$.

Figure 2. Schematic for N ( x , p ) .
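The trial solution for this 1-3-1 architecture can be sketched as follows (Python/NumPy, not the MathCAD worksheet used later); the parameter values shown are placeholders until the weights are actually determined in Section 5.4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def trial_solution(x, w, u, v, A=1.0):
    """psi_t(x) = A + x * N(x, p) for the 1-3-1 network of Figure 2, with
    N(x, p) = v1*sigma(z1) + v2*sigma(z2) + v3*sigma(z3) and z_j = w_1j*x + u_j."""
    z = w * x + u
    N = np.dot(v, sigmoid(z))
    return A + x * N

# Placeholder parameters; Section 5.4 shows how they are actually chosen.
w = np.array([1.0, 0.4, 0.3])
u = np.array([0.0, 0.2, -0.1])
v = np.array([5.0, -9.0, 6.0])
print(trial_solution(0.5, w, u, v))
```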

Now, in solving ordinary differential equations, we assume a solution to the homogeneous part and approximate this function using an SPSS model, which estimates the regression coefficients in the multiple regression model. These coefficients are what we use as the weights from the input layer to the hidden layer. The condition placed on $y_a(x) = f(x)$, say, where $y_a(x)$ is the assumed solution, is that $f(x) \neq 0$.

Any exponential function $y(x) = e^{\alpha x}$, where $\alpha \in \mathbb{R}$, is part of the solution of a first order ordinary differential equation of this type. We regress this function using an Excel spreadsheet and the SPSS model as follows.

Suppose we let $y_a(x) = f(x) = e^{2x}$, $x \in [0,1]$, be a solution to a given first order ordinary differential equation, $y' - 2y = f(x)$, defined on the given interval. Then, dividing the interval into 10 equidistant points, we use an Excel spreadsheet to find the values of $y_a(x)$ at all the points shown in Table 1, and then use SPSS to regress and obtain the weights, which we designate the weights from the input layer to the hidden layer.

The SPSS 20 output for these data is displayed in Table 2, from which we pick the weights.

Looking at Table 2, we see that the cubic curve fits the assumed solution perfectly. We therefore pick the coefficients 2.413, 0.115 and 3.860 as the weights from the input layer to the hidden layer.
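The SPSS curve-fitting step can be mimicked with an ordinary cubic least-squares fit; the sketch below uses numpy.polyfit (an assumption on our part, not the authors' tool) on the same ten equidistant points, and the fitted coefficients of x, x² and x³ come out roughly comparable to the 2.413, 0.115 and 3.860 reported from Table 2.

```python
import numpy as np

# Ten equidistant points on (0, 1], mirroring Table 1
x = np.linspace(0.1, 1.0, 10)
ya = np.exp(2 * x)                       # assumed solution y_a(x) = e^{2x}

# Cubic least-squares fit: ya ~ b0 + b1*x + b2*x^2 + b3*x^3
b3, b2, b1, b0 = np.polyfit(x, ya, 3)

# The coefficients of x, x^2, x^3 serve as the input-to-hidden weights
w_input_to_hidden = np.array([b1, b2, b3])
print(w_input_to_hidden)
```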

The next task is to obtain the weights from the hidden layer to the output layer. We seek, for a real function $f(x)$ of a real valued vector $x = (x_1, x_2, \ldots, x_d)^{\mathrm{T}}$, a set of functions $\{\varphi_i(x)\}$, called the elementary functions, such that

$$\hat{f}(x,v) = \sum_{i=1}^{N} v_i \varphi_i(x) \tag{16}$$

is satisfied, where the $v_i$ are real valued constants such that $|f(x) - \hat{f}(x,v)| < \varepsilon$. When one can find coefficients $v_i$ that make $\varepsilon$ arbitrarily small for any function $f(\cdot)$ over the domain of interest, we say that the elementary function set $\{\varphi_i(\cdot)\}$ has the property of universal approximation over the class of functions $f(\cdot)$. There are many possible elementary functions to choose from, as we show later.

Table 1. Table of values for the relationship between $y_a$ and x.

Table 2. Model summary and parameter estimates.

Now, if the number of input vectors $x_i$ is made equal to the number of elementary functions $\varphi_i(\cdot)$, then the normal equations can be given as

$$\begin{bmatrix} \varphi_1(x_1) & \cdots & \varphi_N(x_1) \\ \vdots & \ddots & \vdots \\ \varphi_1(x_N) & \cdots & \varphi_N(x_N) \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} f(x_1) \\ \vdots \\ f(x_N) \end{bmatrix} \tag{17}$$

and the solution becomes $v = \phi^{-1} f$, where v is the vector of coefficients, f is the vector of values of the function at the N points, and $\phi$ is the matrix whose entries are the values of the elementary functions at each of the N points in the domain. An important condition placed on the elementary functions is that the inverse of $\phi$ must exist. In general, there are many sets $\{\varphi_i(\cdot)\}$ with the property of universal approximation for a class of functions. We would prefer a set $\{\varphi_i(\cdot)\}$ over another $\{\gamma_i(\cdot)\}$ if $\{\varphi_i(\cdot)\}$ provides a smaller error $\varepsilon$ for a pre-set value of N. This means that the speed of convergence of the approximation is also an important factor in the selection of the basis. As mentioned earlier, there are many possible elementary basis functions to choose from. A typical example is the Gaussian basis specified by

$$G(z) = G(x - c_i) = \exp(-z^2), \quad i = 1, 2, \ldots, N$$

However, in neuro-computing the most popular choice of elementary functions is the radial basis functions (RBFs), where $\varphi_i(x)$ is given by

$$\varphi_i(x) = \exp\!\left(-\frac{|x - x_i|^2}{2\sigma^2}\right) \tag{18}$$

where $\sigma^2$ is the variance of the input vectors x. It is this last basis function that we shall adopt in generating the weights from the hidden layer to the output layer. At this point we divide the given interval into a certain number of equidistant points and choose input vectors $x_i$ to conform with the number of elementary functions $\varphi_i(\cdot)$, so as to make the basis function approach implementable. Here $N = 3$ and the solution $v = \phi^{-1} f$ is given by

$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} \varphi_1(x_1) & \varphi_2(x_1) & \varphi_3(x_1) \\ \varphi_1(x_2) & \varphi_2(x_2) & \varphi_3(x_2) \\ \varphi_1(x_3) & \varphi_2(x_3) & \varphi_3(x_3) \end{bmatrix}^{-1} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} \tag{19}$$

where $v$, $\phi$, $f$ are as defined before, and

$$\varphi_i(x) = \exp\!\left(-\frac{|x - x_i|^2}{2\sigma^2}\right), \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{20}$$
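A sketch of the hidden-to-output weight computation of Equations (18)–(20): build the Gaussian RBF matrix from the chosen points and solve $\phi v = f$. The data values here are purely hypothetical; note also that, because the matrix is close to singular for nearby points, the resulting v is quite sensitive to how $\sigma^2$ is chosen and rounded.

```python
import numpy as np

def gaussian_rbf_weights(x, f):
    """Solve Phi v = f (Equation (19)) with phi_i(x) = exp(-|x - x_i|^2 / (2 sigma^2)),
    where sigma^2 and x_bar follow Equation (20)."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    sigma2 = np.mean((x - x.mean()) ** 2)                       # variance of the points
    Phi = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * sigma2))
    return np.linalg.solve(Phi, f)

# Hypothetical data: three chosen points and the corresponding f values
print(gaussian_rbf_weights([0.2, 0.5, 0.8], [1.2, 1.5, 1.8]))
```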

How $f(x)$ is computed depends on the nature of the given differential equation. For first, second or higher order homogeneous ordinary differential equations, we form linear, quadratic or higher order polynomial equations incorporating the default signs of the terms in the differential equation. For non-homogeneous ordinary differential equations, we use the forcing function. When the weights of the neural network are obtained in this systematic way, there is no need to adjust all the parameters in the network, as postulated by previous researchers, in order to achieve convergence. All that is required is a slight adjustment of the biases, which are fixed to lie in a given interval, and convergence to a solution with an acceptably small error is achieved. Once the problem of obtaining the parameters is settled, it becomes easy to solve any first, second or higher order ordinary differential equation using the neural network model appropriate to it. We shall restrict this study to first and second order linear ordinary differential equations. To compute the prediction error we use the squared error function defined as follows:

$$E = 0.5(\psi_d - \psi_p)^2 \tag{21}$$

where $\psi_d$ represents the desired output and $\psi_p$ the predicted output. The same procedure can be applied to second and third order ODEs. For the second order initial value problem (IVP):

$$\psi''(x) = f\big(x, \psi, \psi'(x)\big), \quad \psi(0) = A, \quad \psi'(0) = B \tag{22}$$

The trial solution is written

$$\psi_t(x) = A + Bx + x^2 N(x,p) \tag{23}$$

where $A, B \in \mathbb{R}$ and $N(x_i,p) = \sum_{j=1}^{3} v_j \sigma(z_j)$. For the two-point boundary conditions $\psi(0) = A$, $\psi(1) = B$ the trial solution is written $\psi_t(x) = A(1-x) + Bx + x(1-x)N(x,p)$.
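The three trial-solution forms used in this work can be collected in a short sketch; N is passed in as any callable representing the trained network output, and the functions below simply encode Equations (11), (23) and the two-point form (written here for an initial point at zero).

```python
def trial_first_order_ivp(x, N, A):
    """psi_t(x) = A + x*N(x) for psi(0) = A, Equation (11) with a = 0."""
    return A + x * N(x)

def trial_second_order_ivp(x, N, A, B):
    """psi_t(x) = A + B*x + x^2*N(x) for psi(0) = A, psi'(0) = B, Equation (23)."""
    return A + B * x + x ** 2 * N(x)

def trial_two_point_bvp(x, N, A, B):
    """psi_t(x) = A*(1 - x) + B*x + x*(1 - x)*N(x) for psi(0) = A, psi(1) = B."""
    return A * (1 - x) + B * x + x * (1 - x) * N(x)
```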

We first demonstrate the computational complexity involved in adjusting all the parameters in order to update the weights and obtain a close approximation to the desired result. Subsequently, we proceed to our main results and analysis, which display the ease of computation achieved by our method of adjusting only the biases. The former adjustment, or weight updating, is done using the backpropagation algorithm, so we need to train the network in order to apply it. The basis function we apply in training the neural network is the sigmoid activation function given by Equation (9). For a simple neural model as depicted in Figure 3, with no hidden layer and $x = (x_1, x_2)$, $w = (w_1, w_2)$, we assume some figures for illustration. Let the training data be $x = (x_1, x_2) = (0.1, 0.2)$, desired output $y_d = 0.02$, initial weights $w = (w_1, w_2) = (0.4, 0.1)$, bias $b = 1.78$, and predicted output $y_p$. The diagram below shows the neural network training model for this sample data. We now proceed to train the network to get the predicted output.

5.1. Network Training

First we compute $z = x_1 w_1 + x_2 w_2 + b$, the sum of the weighted inputs and the bias, i.e.

$$z = x_1 w_1 + x_2 w_2 + b = 0.1(0.4) + 0.2(0.1) + 1.78 = 1.84 \tag{24}$$

Figure 3. Schematic for N ( x , p ) .

Next we apply z, as input to the activation function, which in this case is the sigmoid activation function:

$\sigma(z) = (1 + e^{-z})^{-1} = (1 + e^{-1.84})^{-1} = 0.863$. Hence the predicted output is $y_p = 0.863$.

We see that the predicted output does not correspond to the desired output, so we have to train the network to reduce the prediction error. To compute the prediction error we use the squared error function (21). For the predicted output calculated above, the prediction error is:

$$E = 0.5(0.02 - 0.863)^2 = 0.355. \tag{25}$$

We observe that the prediction error is huge, so we must attempt to minimize it. We noted previously that the weights determine how close a predicted output is to the desired output. Therefore to minimize the error we have to adjust the weights. This can be achieved using the formula;

$$w_n = w_o + \eta (y_d - y_p)x \tag{26}$$

where w n and w o represent new and old weights respectively. We update the weights using the following:

$w_o$: current weights = (1.78, 0.4, 0.1);

$\eta$: network learning rate = 0.01;

$y_d$: desired output = 0.02;

$x$: current input vector = (+1, 0.1, 0.2).

$$w_n = [1.78, 0.4, 0.1] + 0.01(0.02 - 0.863)[+1, 0.1, 0.2] = [1.772, 0.399, 0.098].$$

With this information we adjust the model and retrain the neural network to get;

$$z = 0.1(0.399) + 0.2(0.098) + 1.772 = 1.79, \qquad \sigma(z) = (1 + e^{-1.79})^{-1} = 0.857$$

$$E = 0.5(0.02 - 0.857)^2 = 0.350.$$
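The single-neuron training step of this subsection can be reproduced with a few lines of Python (a sketch of the delta-rule walkthrough above, not code from the paper); the second pass uses full-precision weights, so its values may differ slightly from the rounded figures quoted in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sample data of Section 5.1; the leading +1 in x multiplies the bias
x = np.array([1.0, 0.1, 0.2])
w = np.array([1.78, 0.4, 0.1])            # [bias, w1, w2]
y_d, eta = 0.02, 0.01                     # desired output and learning rate

z = w @ x                                 # Equation (24): z = 1.84
y_p = sigmoid(z)                          # predicted output, 0.863
E = 0.5 * (y_d - y_p) ** 2                # Equation (25): E = 0.355
print(z, y_p, E)

w = w + eta * (y_d - y_p) * x             # delta-rule update, Equation (26)
z = w @ x
print(w, sigmoid(z), 0.5 * (y_d - sigmoid(z)) ** 2)
```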

5.2. Computation of the Gradient

The error computation involves not only the outputs but also the derivatives of the network output with respect to its inputs, so it requires computing the gradient of the network derivatives with respect to its inputs. Let us now consider a multilayer perceptron with one input node, a hidden layer with m nodes, and one output unit. For the given inputs $x = (x_1, x_2, \ldots, x_n)$, the output is given by

$$N(x,p) = \sum_{j=1}^{m} v_j \sigma(z_j) \tag{27}$$

where $z_j = \sum_{i=1}^{n} w_{ji}x_i + u_j$, $w_{ji}$ denotes the weight from input unit i to hidden unit j, $v_j$ denotes the weight from hidden unit j to the output unit, $u_j$ denotes the biases, and $\sigma(z_j)$ is the sigmoid activation function. The derivative of $N(x,p)$ with respect to input $x_i$ is

$$\frac{\partial^k N}{\partial x_i^k} = \sum_{j=1}^{m} v_j w_{ji}^k \sigma_j^{(k)}, \tag{28}$$

where $\sigma_j = \sigma(z_j)$ and $\sigma^{(k)}$ denotes the kth order derivative of the sigmoid function.

Let N θ denote the derivative of the network with respect to its inputs and then we have the following relation

$$N_\theta = D^{k} N = \sum_{j} v_j p_j \sigma_j^{(k)}; \qquad p_j = \prod_{i=1}^{n} w_{ji}^{\lambda_i}, \qquad k = \sum_{i=1}^{n} \lambda_i \tag{29}$$

The derivative of N θ with respect to other parameters may be obtained as

$$\frac{\partial N_\theta}{\partial v_j} = p_j \sigma_j^{(k)}, \qquad \frac{\partial N_\theta}{\partial u_j} = v_j p_j \sigma_j^{(k+1)},$$

$$\frac{\partial N_\theta}{\partial w_{ji}} = x_i v_j p_j \sigma_j^{(k+1)} + v_j \lambda_i w_{ji}^{\lambda_i - 1}\left(\prod_{\substack{l=1 \\ l \neq i}}^{n} w_{jl}^{\lambda_l}\right)\sigma_j^{(k)} \tag{30}$$

Having obtained all the derivatives, we can find the gradient of the error. Using a general learning method for supervised training, we can then minimize the error to the desired accuracy. We illustrate the above using the first order ordinary differential equation below

$$\psi'(x) = f(x, \psi), \quad x \in [a,b] \tag{31}$$

with initial condition $\psi(a) = A$. In this case, the ANN trial solution may be written as

$$\psi_t(x,p) = A + x N(x,p), \tag{32}$$

where $N(x,p)$ is the output of the feed-forward network with one input x and parameters p. The trial solution $\psi_t(x,p)$ satisfies the initial condition. We differentiate the trial solution $\psi_t(x,p)$ to get

$$\frac{d\psi_t(x,p)}{dx} = N(x,p) + x\frac{dN(x,p)}{dx}, \tag{33}$$

To evaluate the derivative term on the right hand side of (33), we use Equations (6) and (26)–(31). The error function for this case may be formulated as

$$E(p) = \sum_{i=1}^{n}\left(\frac{d\psi_t(x_i,p)}{dx_i} - f\big(x_i, \psi_t(x_i,p)\big)\right)^2 \tag{34}$$

The weights from the input layer to the hidden layer are modified according to the following rule:

$$w_{ji}^{r+1} = w_{ji}^{r} - \eta\left(\frac{\partial E}{\partial w_{ji}^{r}}\right) \tag{35}$$

where

$$\frac{\partial E}{\partial w_{ji}^{r}} = \frac{\partial}{\partial w_{ji}^{r}}\left(\sum_{i=1}^{n}\left(\frac{d\psi_t(x_i,p)}{dx_i} - f\big(x_i, \psi_t(x_i,p)\big)\right)^2\right) \tag{36}$$

Here, $\eta$ is the learning rate and r is the iteration step. The weights from the hidden layer to the output layer may be updated in a similar way. Now, going back to Equation (27), we recall that $z_j = \sum_{i=1}^{n} w_{ji}x_i + u_j$ and $N(x,p) = \sum_{j=1}^{m} v_j \sigma(z_j)$. This implies that

$$\frac{dN(x,p)}{dx} = \frac{d}{dx}\sum_{j=1}^{m} v_j \sigma(w_{ji}x_i + u_j) = \sum_{j=1}^{m} v_j \frac{d}{dx}\sigma(w_{ji}x_i + u_j) = \sum_{j=1}^{m} v_j \frac{d}{dx}\sigma(z_j) = \sum_{j=1}^{m} v_j \frac{d\sigma(z_j)}{dz_j}\frac{dz_j}{dx} \tag{37}$$

If the neural network model is a simple one as we saw in Section 3.3, then,

$$N(x,p) = \sigma(z) = \sigma(x_1 w_1 + x_2 w_2 + u) \;\Rightarrow\; \frac{dN}{dx_1} = \frac{\partial\sigma}{\partial z}\frac{dz}{dx_1}, \quad \frac{dN}{dx_2} = \frac{\partial\sigma}{\partial z}\frac{dz}{dx_2}$$

Now let us consider a first order differential equation:

$$\psi'(x) - \psi = 0, \quad x \in [0,1], \quad \psi(0) = 1 \tag{38}$$

with trial solution:

$$\psi_t(x,p) = A + x N(x,p) \tag{39}$$

where x is the input to the neural network model and p represents the parameters (weights and biases).

$$\psi_t(0,p) = A + (0)N(0,p) = A = 1 \;\Rightarrow\; \psi_t(x,p) = 1 + x N(x,p) \tag{40}$$

To solve this problem using a neural network (NN), we shall employ the NN architecture given in Figure 2. Now $\psi_t(x_i,p) = 1 + x_i N(x_i,p)$, with $N(x_i,p) = \sum_{j=1}^{3} v_j \sigma(z_j)$, and

$$\sum_{j=1}^{3} v_j \sigma(z_j) = v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3)$$

where $z_1 = x_1 w_{11} + u_1$, $z_2 = x_1 w_{12} + u_2$ and $z_3 = x_1 w_{13} + u_3$.

If the neural network model is not able to correctly predict the solution of the differential equation with the given initial parameters (weights and biases), we need to find the prediction error given by

$$E(p) = \sum_{i=1}^{n}\left(\frac{d\psi_t(x_i,p)}{dx_i} - f\big(x_i, \psi_t(x_i,p)\big)\right)^2. \tag{41}$$

If the prediction error does not satisfy an acceptable threshold, then the parameters need to be adjusted using the equation,

$$w_{ji}^{r+1} = w_{ji}^{r} - \eta\left(\frac{\partial E}{\partial w_{ji}^{r}}\right), \quad \text{where} \quad \frac{\partial E}{\partial w_{ji}^{r}} = \frac{\partial}{\partial w_{ji}^{r}}\sum_{i=1}^{n}\left(\frac{d\psi_t(x_i,p)}{dx_i} - f\big(x_i, \psi_t(x_i,p)\big)\right)^2 \tag{42}$$

Recall that $\psi_t(x_i,p) = 1 + x_i N(x_i,p)$, and

$$\frac{d\psi_t}{dx_i}(x_i,p) = \frac{d}{dx_i}\big(1 + x_i N(x_i,p)\big) = N(x_i,p) + x_i\frac{d}{dx_i}N(x_i,p). \tag{43}$$

$$\frac{d}{dx_1}N(x_1,p) = \frac{d}{dx_1}\sum_{j=1}^{3} v_j\sigma(z_j) = \sum_{j=1}^{3} v_j\frac{d}{dx_1}\sigma(z_j) = v_1\frac{d\sigma}{dz_1}\frac{dz_1}{dx_1} + v_2\frac{d\sigma}{dz_2}\frac{dz_2}{dx_1} + v_3\frac{d\sigma}{dz_3}\frac{dz_3}{dx_1}$$

$$\frac{d}{dx_1}N(x_1,p) = v_1\sigma'(z_1)w_{11} + v_2\sigma'(z_2)w_{12} + v_3\sigma'(z_3)w_{13} \tag{44}$$

Putting Equations (43) and (44) into Equation (41) gives

$$E(p) = \sum_{i=1}^{n}\left(\left(N(x_i,p) + x_i\frac{d}{dx_i}N(x_i,p)\right) - \big(1 + x_i N(x_i,p)\big)\right)^2$$

$$E(p) = \left(\sum_{j=1}^{3} v_j\sigma(z_j) + x_1\left(\frac{d}{dx_1}N(x_1,p)\right) - 1 - x_1\left(\sum_{j=1}^{3} v_j\sigma(z_j)\right)\right)^2 \tag{45}$$

$$E(p) = \Big(v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3) + x_1\big(v_1\sigma'(z_1)w_{11} + v_2\sigma'(z_2)w_{12} + v_3\sigma'(z_3)w_{13}\big) - 1 - x_1\big(v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3)\big)\Big)^2 \tag{46}$$

We noted earlier that when the neural network is not able to produce an acceptable solution, that is, a solution with minimum error, the weights and biases need to be adjusted. This involves complex derivatives with the multivariable chain rule [11] [12]. From the foregoing, we need to compute the derivative of Equation (46) with respect to the weights and biases, that is:

$$\frac{\partial E}{\partial w_{ji}^{r}} = \frac{\partial}{\partial w_{ji}^{r}}\Big(v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3) + x_1\big(v_1\sigma'(z_1)w_{11} + v_2\sigma'(z_2)w_{12} + v_3\sigma'(z_3)w_{13}\big) - 1 - x_1\big(v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3)\big)\Big)^2 \tag{47}$$

Similarly, to update the weights from the hidden layer to the output layer, we compute $\partial E/\partial v_j$, $j = 1, 2, 3$. Finally, we update the biases by computing $\partial E/\partial u_j$, $j = 1, 2, 3$. Equation (47), together with $\partial E/\partial v_j$ and $\partial E/\partial u_j$, is used to update the weights and biases. The superscript r denotes the rth iteration.

The foregoing is necessary for every one of the iterations required; sometimes 30 or more iterations may be needed for the solution to converge within an acceptable threshold. It is the complexity of the derivatives involved in solving ODEs with neural networks, especially those of higher order, together with the many iterations required for convergence, that motivated our search for a more efficient and accurate way of solving the given problem while avoiding this inherent computational complexity.
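To illustrate the kind of computation Equations (41)–(47) demand, the sketch below evaluates the error E(p) for $\psi' - \psi = 0$ with the 1-3-1 trial solution and estimates its gradient by central finite differences; this stands in for the analytic chain-rule derivatives described above, and all parameter values and training points are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(params, x_pts):
    """E(p) of Equation (41) for psi' - psi = 0 with psi_t = 1 + x*N(x, p);
    params = (w, u, v), each of length 3 for the 1-3-1 network."""
    w, u, v = params
    E = 0.0
    for x in x_pts:
        z = w * x + u
        s = sigmoid(z)
        N = np.dot(v, s)
        dN_dx = np.dot(v * w, s * (1 - s))      # Equation (44)
        psi_t = 1 + x * N
        dpsi_t = N + x * dN_dx                  # Equation (43)
        E += (dpsi_t - psi_t) ** 2              # residual of psi' - psi = 0
    return E

def numerical_gradient(params, x_pts, h=1e-6):
    """Central-difference gradient of E with respect to every weight and bias,
    standing in for analytic derivatives such as Equation (47)."""
    grads = []
    for a in range(3):                          # 0: w, 1: u, 2: v
        g = np.zeros(3)
        for i in range(3):
            p_plus = [p.copy() for p in params]
            p_minus = [p.copy() for p in params]
            p_plus[a][i] += h
            p_minus[a][i] -= h
            g[i] = (error(p_plus, x_pts) - error(p_minus, x_pts)) / (2 * h)
        grads.append(g)
    return grads

# Hypothetical parameters and ten training points in [0.1, 1]
params = [np.array([1.0, 0.4, 0.3]),    # w
          np.array([0.1, 0.2, -0.1]),   # u
          np.array([5.0, -9.0, 6.0])]   # v
x_pts = np.linspace(0.1, 1.0, 10)
print(error(params, x_pts))
print(numerical_gradient(params, x_pts))
```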

5.3. Results

We begin with a couple of examples on first and second order differential equations.

5.4. Example

Consider the initial value problem;

$$y' - \alpha y = 0; \quad y(0) = 1, \quad x \in [0,1], \quad \alpha = 1. \tag{48}$$

This equation has been solved by Mall and Chakraverty [13]. As discussed in the previous section, the trial solution is given by;

$$y_t(x) = A + x N(x,p)$$

Applying the initial condition gives $A = 1$, therefore $y_t(x) = 1 + x N(x,p)$.

To obtain the weights from the input layer to the hidden layer, it is natural to assume $y_a(x) = e^x$. We approximate this function using regression in SPSS. We train the network for 10 equidistant points in [0, 1], and employ an Excel spreadsheet to find the values of $y_a(x) = e^x$ at all the points. This leads to the data in Table 3.

We then perform the curve-fit polynomial regression built into IBM SPSS 23 [14] for the function $y_a(x) = e^x$. The output is displayed below in Table 4.

Table 4 displays the linear, quadratic and cubic regressions of $y_a(x) = e^x$. The quadratic and cubic curves show perfect goodness of fit, $R^2 = 1$. Using the cubic curve, we pick our weights from the input layer to the hidden layer as:

Table 3. Values of $(x, y_a(x) = e^x)$ for problem (48).

Table 4. Model summary and parameter estimates.

$w_{11} = 1.016$, $w_{12} = 0.423$, $w_{13} = 0.279$. Now, to compute the weights from the hidden layer to the output layer, we find a function $\vartheta(x)$ such that $v = \phi^{-1} f$, where $v$, $f$ and $\phi$ are as previously defined; in particular, $f(x) = \big(\vartheta(x_1), \vartheta(x_2), \vartheta(x_3)\big)^{\mathrm{T}}$. We now form a linear function based on the default signs of the differential equation, i.e. $\vartheta(x) = ax - b$, where a is the coefficient of the derivative of y and b is the coefficient of y. Thus

$$\vartheta(x) = x + 1, \quad f(x) = \big(\vartheta(x_1), \vartheta(x_2), \vartheta(x_3)\big)^{\mathrm{T}} = (1.1, 1.2, 1.3)^{\mathrm{T}}$$

The architecture of the neural network is shown in Figure 2, so we let $N = 3$. We take $x = (0.1, 0.2, 0.3)^{\mathrm{T}}$ and $f(x) = (1.1, 1.2, 1.3)^{\mathrm{T}}$. It then follows that

$$v = \phi^{-1} f, \qquad \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} \varphi_1(x_1) & \varphi_2(x_1) & \varphi_3(x_1) \\ \varphi_1(x_2) & \varphi_2(x_2) & \varphi_3(x_2) \\ \varphi_1(x_3) & \varphi_2(x_3) & \varphi_3(x_3) \end{bmatrix}^{-1} \begin{bmatrix} \vartheta_1 \\ \vartheta_2 \\ \vartheta_3 \end{bmatrix} \tag{49}$$

where

$$\varphi_i(x_j) = \exp\!\left(-\frac{|x_i - x_j|^2}{2\sigma^2}\right), \quad i = 1, 2, 3; \; j = 1, 2, 3. \tag{50}$$

Substituting the given values of the vectors x and f, we obtain the weights from the hidden layer to the output layer,

$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 1 & 0.94 & 0.78 \\ 0.94 & 1 & 0.94 \\ 0.78 & 0.94 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1.1 \\ 1.2 \\ 1.3 \end{bmatrix} = \begin{bmatrix} 5.17 \\ -9.375 \\ 6.08 \end{bmatrix}, \tag{51}$$

Therefore the weights from the hidden layer to the output layer are $v_1 = 5.17$, $v_2 = -9.375$, $v_3 = 6.08$.

The biases are fixed between –1 and 1. We now train the network with the available parameters using MathCAD 14 software [15] as follows:

$w_1 := 1.016, \quad w_2 := 0.423, \quad w_3 := 0.279, \quad x := 1$

$v_1 := 5.17, \quad v_2 := -9.375, \quad v_3 := 6.08$

$u_1 := 1, \quad u_2 := 0.2251, \quad u_3 := -0.1$

$z_1 := w_1 x + u_1 = 2.016, \quad z_2 := w_2 x + u_2 = 0.6481, \quad z_3 := w_3 x + u_3 = 0.179$

$\sigma(z_1) := [1 + \exp(-z_1)]^{-1} = 0.882467, \quad \sigma(z_2) := [1 + \exp(-z_2)]^{-1} = 0.656582, \quad \sigma(z_3) := [1 + \exp(-z_3)]^{-1} = 0.544631$

$N := v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3) = 1.718251$

$y_p(x) := 1 + xN = 2.718251, \quad y_d(x) := e^x = 2.718282$

$E := 0.5\big(y_d(x) - y_p(x)\big)^2 = 4.707964 \times 10^{-10}.$

Here $y_d$ and $y_p$ are respectively the desired output (exact solution) and the predicted output (trial solution). The indicated error value shows an acceptable accuracy. We compare our results with the neural network results obtained by Otadi and Mosleh [16] and find them to be in reasonable agreement. This is shown in Table 5, as well as in the graphical profile of Figure 4 below.

The near-perfect agreement is evident in the graphical profile depicted in Figure 4.
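The MathCAD forward pass of this example is easy to reproduce independently; the sketch below (Python/NumPy, our own check rather than part of the paper) uses the weights and biases quoted above and recovers the same predicted value and error.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters of Example 1 (Section 5.4)
w = np.array([1.016, 0.423, 0.279])     # input-to-hidden weights (cubic fit, Table 4)
v = np.array([5.17, -9.375, 6.08])      # hidden-to-output weights, Equation (51)
u = np.array([1.0, 0.2251, -0.1])       # biases, fixed in [-1, 1]

x = 1.0
z = w * x + u
N = v @ sigmoid(z)                      # N = 1.718251
y_p = 1 + x * N                         # trial solution at x = 1
y_d = np.exp(x)                         # exact solution e^x
print(y_p, y_d, 0.5 * (y_d - y_p) ** 2) # error of order 1e-10
```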

5.5. Remark

In what follows, we consider a non-homogeneous second order linear differential equation. Recall that for any second order non-homogeneous differential equation of the form $y''(x) + a(x)y' + b(x)y = f(x)$, the non-homogeneous term $f(x)$ is called the forcing function. In this section, we employ the forcing function to compute the weights from the hidden layer to the output layer. This is made clear in the following example.

5.6. Example

Consider the initial value problem;

$$y'' - 4y = 24\cos(2x); \quad y(0) = 3, \quad y'(0) = 4, \quad x \in [0,1].$$

The trial solution is $y_t(x) = A + Bx + x^2 N(x,p)$. Applying the initial conditions gives $A = 3$, $B = 4$.

Therefore $y_t(x) = 3 + 4x + x^2 N(x,p)$, and the assumed solution to the homogeneous part is $y_a(x) = e^{2x} + e^{-2x}$.

We use an Excel spreadsheet to find the values of $y_a(x)$ at all the x points, as displayed in Table 6.

Using regression in SPSS we then find the weights from the input layer to the hidden layer. From Table 7, using the cubic curve fit with $R^2 = 1$, we pick our weights from the input layer to the hidden layer as:

$w_{11} = 0.488$, $w_{12} = 1.697$, $w_{13} = 3.338$.

Now to compute the weights from hidden layer to the output layer, we use the function:

Table 5. Comparison of the results.

Figure 4. Plot of Y exact and Y predicted for Example 1.

$$\vartheta(x) = 24\cos(2x), \quad f(x) = \big(\vartheta(x_1), \vartheta(x_2), \vartheta(x_3)\big)^{\mathrm{T}} = (23.999854, 23.999415, 23.998684)^{\mathrm{T}}$$

with $x = (0.1, 0.2, 0.3)^{\mathrm{T}}$. Hence the weights from the hidden layer to the output layer, given by $v = \phi^{-1} f$, are

$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 1 & 0.94 & 0.78 \\ 0.94 & 1 & 0.94 \\ 0.78 & 0.94 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 23.999854 \\ 23.999415 \\ 23.998684 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 112.489 \\ -187.474 \\ 112.483 \end{bmatrix}$$

The weights from the hidden layer to the output layer are $v_1 = 112.489$, $v_2 = -187.474$, $v_3 = 112.483$.

The biases are fixed between −1 and 1. We now train the network with the available parameters using our MathCAD 14 algorithm as follows:

$w_1 := 0.488, \quad w_2 := 1.697, \quad w_3 := 3.338, \quad x := 1$

$v_1 := 112.489, \quad v_2 := -187.474, \quad v_3 := 112.483$

$u_1 := 1, \quad u_2 := 1, \quad u_3 := -0.1691$

$z_1 := w_1 x + u_1 = 1.488, \quad z_2 := w_2 x + u_2 = 2.697, \quad z_3 := w_3 x + u_3 = 3.1689$

$\sigma(z_1) := [1 + \exp(-z_1)]^{-1} = 0.815778, \quad \sigma(z_2) := [1 + \exp(-z_2)]^{-1} = 0.936849, \quad \sigma(z_3) := [1 + \exp(-z_3)]^{-1} = 0.959647$

$N := v_1\sigma(z_1) + v_2\sigma(z_2) + v_3\sigma(z_3) = 24.075112$

$y_p(x) := 3 + 4x + x^2 N = 31.075112, \quad y_d(x) := 4e^{2x} + 2e^{-2x} - 3\cos(2x) = 31.075335$

$E := 0.5\big(y_d(x) - y_p(x)\big)^2 = 2.502862 \times 10^{-8}.$

We compare the exact and approximate solution in Table 8. The accuracy is clearly depicted graphically in Figure 5.
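As with Example 1, the worksheet computation above can be verified with a short sketch (our own check, not the authors' code), using the quoted weights and biases:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters of Example 2 (Section 5.6)
w = np.array([0.488, 1.697, 3.338])            # input-to-hidden weights (cubic fit, Table 7)
v = np.array([112.489, -187.474, 112.483])     # hidden-to-output weights
u = np.array([1.0, 1.0, -0.1691])              # biases, fixed in [-1, 1]

x = 1.0
z = w * x + u
N = v @ sigmoid(z)                             # N = 24.075112
y_p = 3 + 4 * x + x ** 2 * N                   # trial solution at x = 1
y_d = 4 * np.exp(2 * x) + 2 * np.exp(-2 * x) - 3 * np.cos(2 * x)   # exact solution
print(y_p, y_d, 0.5 * (y_d - y_p) ** 2)        # error of order 1e-8
```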

Table 6. Values of $(x, y_a(x) = e^{2x} + e^{-2x})$.

Table 7. Model summary and parameter estimates.

Table 8. Comparison of the results.

Figure 5. Plot of Y exact and Y predicted for Example 2.

6. Conclusion

In this paper, we have presented a novel approach for solving first and second order linear ordinary differential equations with constant coefficients. Specifically, we employ a feed-forward Multilayer Perceptron Neural Network (MLPNN), but avoid the standard back-propagation algorithm for updating the intrinsic weights. This greatly reduces the computational complexity of the given problem. Our results are validated by the near-perfect approximations achieved in comparison with the exact solutions, which also demonstrate the function approximation capabilities of ANN. This confirms the efficiency of our neural network procedure. We employed an Excel spreadsheet, IBM SPSS 23, and a MathCAD 14 algorithm to achieve this task.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] McCulloch, W.S. and Pitts, W. (1943) A Logical Calculus of the Ideas Immanent in Nervous Activity. The Bulletin of Mathematical Biophysics, 5, 115-133.
https://doi.org/10.1007/BF02478259
[2] Neumann, J.V. (1951) The General and Logical Theory of Automata. Wiley, New York.
[3] Graupe, D. (2007) Principles of Artificial Neural Networks. Vol. 6, 2nd Edition, World Scientific Publishing Co. Pte. Ltd., Singapore.
[4] Rumelhart, D.E. and McClelland, J.L. (1986) Parallel Distributed Processing, Explorations in the Microstructure of Cognition I and II. MIT Press, Cambridge.
https://doi.org/10.7551/mitpress/5236.001.0001
[5] Werbos, P.J. (1974) Beyond Recognition, New Tools for Prediction and Analysis in the Behavioural Sciences. Ph.D. Thesis, Harvard University, Cambridge.
[6] Lagaris, I.E., Likas, A.C. and Fotiadis, D.I. (1997) Artificial Neural Network for Solving Ordinary and Partial Differential Equations. arXiv: physics/9705023v1.
[7] Cybenko, G. (1989) Approximation by Superposition of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2, 303-314.
https://doi.org/10.1007/BF02551274
[8] Hornik, K., Stinchcombe, M. and White, H. (1989) Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, 2, 359-366.
https://doi.org/10.1016/0893-6080(89)90020-8
[9] Lee, H. and Kang, I.S. (1990) Neural Algorithms for Solving Differential Equations. Journal of Computational Physics, 91, 110-131.
https://doi.org/10.1016/0021-9991(90)90007-N
[10] Majidzadeh, K. (2011) Inverse Problem with Respect to Domain and Artificial Neural Network Algorithm for the Solution. Mathematical Problems in Engineering, 2011, Article ID: 145608, 16 p.
https://doi.org/10.1155/2011/145608
[11] Chen, R.T.Q., Rubanova, Y., Bettencourt, J. and Duvenaud, D. (2018) Neural Ordinary Differential Equations. arXiv: 1806.07366v1.
[12] Okereke, R.N. (2019) A New Perspective to the Solution of Ordinary Differential Equations using Artificial Neural Networks. Ph.D Dissertation, Mathematics Department, Michael Okpara University of Agriculture, Umudike.
[13] Mall, S. and Chakraverty, S. (2013) Comparison of Artificial Neural Network Architecture in Solving Ordinary Differential Equations. Advances in Artificial Neural Systems, 2013, Article ID: 181895.
https://doi.org/10.1155/2013/181895
[14] IBM (2015) IBM SPSS Statistics 23
http://www.ibm.com
[15] PTC (Parametric Technology Corporation) (2007) Mathcad Version 14.
http://communications@ptc.com
[16] Otadi, M. and Mosleh, M. (2011) Numerical Solution of Quadratic Riccati Differential Equations by Neural Network. Mathematical Sciences, 5, 249-257.
