The Implementation of the Deep Neural Network in Predicting the Coronavirus 2019 (COVID-19) Based on Laboratory Findings in Children


The novel coronavirus disease (COVID-19) has spread considerably around the world. Although children infected with COVID-19 are less likely than adults to develop serious infection, they remain at risk of serious illness and complications from COVID-19. Laboratory findings play a major role in assessing risk factors, clinical symptoms, diagnosis, and treatment of COVID-19. As the number of COVID-19 cases has increased, interpreting laboratory results and providing an accurate diagnosis takes additional time. Laboratory findings in children have been described in only a few experimental studies. This study aimed to apply a deep learning approach to detecting COVID-19 in children based on laboratory findings. The dataset used in this research had 5664 patient samples (4927 negative and 737 positive for COVID-19). After SMOTE was applied to manage the severe class imbalance, an ANN model was used to classify negative and positive samples. To evaluate the predictive performance of the model, precision, recall, F1-score, AUC, and accuracy were calculated. The results show that the model identifies patients with COVID-19 with an accuracy of 93%, and both recall and precision were 76.47%. Our analysis suggests that the model could assist in the diagnosis and prediction of COVID-19.


Enughwure, A.A. and Al Mamlook, R. (2021) The Implementation of the Deep Neural Network in Predicting the Coronavirus 2019 (COVID-19) Based on Laboratory Findings in Children. Open Access Library Journal, 8, 1-16. doi: 10.4236/oalib.1107607.

1. Introduction

At the end of 2019, a novel coronavirus, SARS-CoV-2, emerged in the city of Wuhan, China [1] [2] and quickly spread around the world [3] [4] [5]. The highly infectious disease it causes, coronavirus disease 2019 (COVID-19), was declared a pandemic by the World Health Organization (WHO) [6] [7]. COVID-19 continues to spread globally, causing many deaths and a global economic crisis [8]. At the time of this study, more than 78,701,670 cases and 1,730,463 deaths from COVID-19 had been recorded as of December 23, 2020, and these figures were rising daily across the globe. Although fewer children than adults have become ill with COVID-19, children can still be infected. Most children with COVID-19 have mild symptoms, but some become severely ill and may require intensive care or a ventilator to help them breathe.

The rate of new COVID-19 cases among children continues to rise, although there have been fewer deaths among children than among adults. The true extent of COVID-19 in children is unknown because testing has not been widespread and children have been tested less often. According to a study by the American Academy of Pediatrics and the Children's Hospital Association, the proportion of COVID-19 cases occurring in children increased significantly, from 2.2% to 10%, between April and September 2020 [9]. As shown in Figure 1, 178,823 new child COVID-19 cases were recorded in a single week at the beginning of December 2020, the highest weekly increase in the US since the pandemic began [10].

Figure 1. Number of new child COVID-19 cases added in the USA.

Because the number of COVID-19 cases in children increases every day, interpreting laboratory findings takes time, which constrains both treatment and the delivery of results. There is therefore a need for a predictive model that can help detect COVID-19 from laboratory findings. Such a model could reduce the pressure on healthcare systems by identifying whether a patient is infected. In the absence of a vaccine, an accurate model for predicting COVID-19 is important for preventing and controlling the pandemic through timely medical treatment. The daily increase in COVID-19 cases worldwide and the limited number of available detection kits make it difficult to identify the presence of the disease.

Since the outbreak of COVID-19, many studies based on clinical features have examined symptoms and laboratory results. Most of these studies focused on symptoms or on chest X-ray image analysis, such as the deep CNN classification of COVID-19 pneumonia in [11], with only limited attention paid to laboratory findings, which could be helpful in diagnosing COVID-19 and in determining its severity. Laboratory features have also been associated with worse outcomes. The most common laboratory abnormalities in patients hospitalized with pneumonia include leukopenia, lymphopenia, leukocytosis, elevated liver transaminases, elevated lactate dehydrogenase, and elevated C-reactive protein. Currently, severe illness due to COVID-19 appears to be uncommon among children. Nevertheless, more data must be collected on the long-term effects of the pandemic on children, including the ways in which the virus may harm their long-term physical health, as well as its emotional and mental health effects. A model that can help identify whether children have COVID-19 from their laboratory findings is therefore needed, and alternative approaches must be explored. Among existing, widely available, and low-cost resources, artificial intelligence (AI) has recently been used widely to speed up biomedical research, and this work proposes advanced deep learning techniques for the task. Using deep learning approaches, AI has been applied to many tasks, such as image detection, data classification, and image segmentation [12]. Deep learning techniques have achieved state-of-the-art performance in computer-aided medical analysis. To the best of our knowledge, no previous study has used deep learning models to predict COVID-19 infection from laboratory findings in children. The main purpose of this research is to apply DNN approaches to detect COVID-19 in children accurately from their laboratory findings.

With datasets growing rapidly, it is very important to develop and validate machine learning (ML) models for healthcare systems [13]. ML approaches have been shown to produce highly accurate predictions in various domains, including medical applications such as cancer diagnosis [14], pneumonia mortality prediction, and mortality risk adjustment in critical care [15]. These studies help medical experts assess clinical findings better. ML and artificial intelligence (AI) approaches have been used to better understand the pattern of viral spread, improve diagnosis, and produce highly accurate predictions in biomedical studies, including classification of COVID-19, survival prediction for severe COVID-19 [16], pneumonia mortality prediction [17], and analysis of laboratory findings and X-ray images. Some existing work in this field is directly related to this study, although studies on medical prediction for COVID-19 remain limited in the literature. In this work, a classification approach is applied to predict COVID-19 in children. The authors of [18] developed ML models and assessed their performance at multiple hospitals and time points; in that study, XGBoost was applied to predict critical events, with an AUC-ROC of 0.80. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes. Researchers in [19] used ML models to forecast the number of upcoming patients affected by COVID-19 with four types of models: least absolute shrinkage and selection operator (LASSO), exponential smoothing (ES), linear regression (LR), and support vector machine (SVM). The results demonstrated that ES performed best among these models [20]. Another study applied various ML methods, including RF, NN (neural networks), LR, SVM, and XGB, and compared the performance of the classifiers; the best performance was obtained with gradient boosting trees (XGB), with an AUC score of 66%.

Artificial intelligence (AI) approaches have been employed across research fields, and AI has found important applications in identifying COVID-19 accurately and in a timely fashion. Deep neural networks (DNNs) have shown significant improvements in many AI applications and have proven to be an important tool in COVID-19 research. The study in [21] provides an overview of the applications of AI in combating COVID-19, including diagnosis of COVID-19 from different types of tests and symptoms, identification of patient severity, detection of COVID-19 in X-ray images, and epidemiology. Several studies have examined possible treatment, cure, and prevention of the spread of COVID-19. Most studies have focused on applying DNNs to X-ray or CT images, and X-ray and chest radiology imaging are also valuable in the early diagnosis and treatment of coronavirus [22]. This study instead applies DNN models to laboratory findings to predict which children are likely to have COVID-19, an approach that may be more precise than one based on X-ray or CT images. Early prediction of COVID-19 will help management and healthcare departments respond to outbreaks in a timely way. It will minimize the impact of outbreaks and ensure that resources are used in a planned manner. The main contribution of this research is a model that can assist doctors and other healthcare workers in identifying patients infected with COVID-19 based on laboratory findings.

This study may also help future investigations validate the model using other laboratory data. The remainder of this study is structured as follows. Section 2 describes the laboratory-findings dataset and the development of the deep learning model. Section 3 presents the results of the deep learning classifier and its evaluation. Finally, Section 4 presents the conclusion and potential future research.

2. Methodology

In this study, a deep neural network (DNN) is developed and applied to predict COVID-19 using the dataset described in Table 1. When developing the DNN model, the layers and their respective parameters were considered. In the first stage, the COVID-19 dataset and patient cohort were identified. Next, the parameters used to develop the DNN were selected, and the model was evaluated and validated. In the final stage, the DNN model was tested using various performance metrics to assess how well it performs (Figure 2).

2.1. Data Description and Data Correlation

The dataset used in this work was obtained from Kaggle, one of the world's largest data science communities, which connects data scientists with robust resources and tools to help them achieve their goals. The dataset comprises 5664 children aged 0 to 19 years who were receiving treatment at various medical centers. These children were tested for COVID-19, and their blood was used as the test specimen. The target variable is the SARS-CoV-2 exam result, and the dataset contains 29 features, of which only 18 were used in this work, as shown in Table 1.

Table 1. The parameters contained in the dataset.

Figure 2. The methodology of predictive modelling for binary classification.
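To make the class imbalance concrete, the short calculation below uses only the class counts reported in the abstract (4927 negative and 737 positive samples). It is illustrative arithmetic, not part of the authors' pipeline: a trivial classifier that always predicts "negative" is correct exactly as often as the negative class occurs.

```python
# Class counts as reported in the abstract
n_negative = 4927
n_positive = 737
n_total = n_negative + n_positive  # 5664 samples in total

share_negative = n_negative / n_total
share_positive = n_positive / n_total
imbalance_ratio = n_negative / n_positive  # negatives per positive

# Always predicting the majority (negative) class is right exactly as
# often as the negative class occurs, so its accuracy equals that share.
majority_baseline_accuracy = share_negative

print(f"total samples:       {n_total}")
print(f"negative share:      {share_negative:.2%}")
print(f"positive share:      {share_positive:.2%}")
print(f"negatives/positives: {imbalance_ratio:.2f}")
print(f"majority baseline:   {majority_baseline_accuracy:.2%}")
```

With roughly 6.7 negative samples for every positive one, the majority baseline is close to 87%, so an accuracy near that figure would say little about how well positive cases are detected.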

2.2. Data Preprocessing

Data preprocessing is a critical stage before developing a deep learning model. The scikit-learn package, an open-source package of simple and efficient tools for predictive analytics, was used to preprocess the data. Null values in a dataset usually cause a model to perform poorly, and this dataset contained many of them, so they had to be detected and handled. Using scikit-learn, the null values were detected, and two options were considered: dropping them or imputing values in their place. After assessing the performance of the model, the null values were dropped. Normalization is important to ensure that all feature values are on the same scale and are treated with equal weight, so Min-Max normalization was performed to transform each numerical attribute into the range [0, 1].

The class imbalance in the dataset, shown in Figure 3, also had to be addressed. The majority of the target values were negative, so a model is more likely to predict the negative class than the positive class. We applied the SMOTE-Tomek technique to manage this imbalance, which combines oversampling and undersampling. SMOTE handles the oversampling, while Tomek links clean the data. Because the target variable was stored as text (an object data type), it had to be converted to a numerical data type, since the model works with numbers rather than text. This was achieved using the label encoder in the scikit-learn package.

Figure 3. The distribution of the target classes in the dataset.
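The preprocessing steps above can be sketched end to end in NumPy. This is a minimal illustration on a small synthetic table, not the authors' code: the authors report using scikit-learn for these steps, and SMOTE-Tomek is normally taken from a library such as imbalanced-learn rather than written by hand. Each step is spelled out here only so that its logic is visible. The helper names (`drop_null_rows`, `min_max_scale`, `label_encode`, `smote`, `tomek_mask`) are hypothetical, k = 5 neighbours is an assumed default, and this sketch removes both members of each Tomek link.

```python
import numpy as np

rng = np.random.default_rng(0)


def drop_null_rows(X, y):
    """Drop every row that has at least one missing (NaN) feature value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero
    return (X - col_min) / col_range


def label_encode(labels):
    """Map text labels to integers in sorted order (negative -> 0, positive -> 1)."""
    classes, codes = np.unique(labels, return_inverse=True)
    return codes, classes


def smote(X_min, n_new, k=5):
    """Create n_new synthetic minority samples, each on the line segment between
    a random minority sample and one of its k nearest minority neighbours."""
    k = min(k, len(X_min) - 1)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def tomek_mask(X, y):
    """Boolean mask that removes both members of every Tomek link, i.e. every
    pair of opposite-class samples that are each other's nearest neighbour."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nn = dists.argmin(axis=1)
    is_link = (nn[nn] == np.arange(len(X))) & (y != y[nn])
    return ~is_link


# Synthetic stand-in for the laboratory table: 60 rows, 3 features, a few gaps
X = rng.normal(size=(60, 3))
X[[3, 17, 55], [0, 2, 1]] = np.nan
labels = np.array(["negative"] * 50 + ["positive"] * 10)

X, labels = drop_null_rows(X, labels)
X = min_max_scale(X)
y, classes = label_encode(labels)

# SMOTE: oversample the positive class (code 1) up to the size of the negative class
n_new = int((y == 0).sum() - (y == 1).sum())
X_bal = np.vstack([X, smote(X[y == 1], n_new)])
y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])

# Tomek links: clean the class boundary of the oversampled data
keep = tomek_mask(X_bal, y_bal)
X_res, y_res = X_bal[keep], y_bal[keep]
```

In practice, resampling is usually applied to the training split only, so that the test set keeps the real class distribution.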

2.2.1. Feed-Forward Neural Network

In this study, a deep neural network was adopted for the classification task. Feedforward networks are sometimes called multi-layer perceptrons (MLPs). The name is technically a misnomer, since modern networks are made up of units with smooth non-linearities such as the sigmoid rather than the threshold units of the original perceptron, but at some point the name stuck. A feedforward network is a multilayer network in which the connections between units form no cycles: the outputs from the units in each layer are passed to units in the next higher layer, and no outputs are passed back to lower layers. A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. These networks are built from the same components: neurons, weights, biases, and activation functions. Neural networks are loosely inspired by the operation of the human brain. Every neural network has at least two layers, an input layer and an output layer. The input variables are fed into the input layer, and the target variable is predicted at the output layer. As shown in Figure 4, the input variables are multiplied by weights, combined with biases, and passed through activation functions in order to predict the target value at the output layer.

An artificial neuron receives information (signals) from other neurons, processes it, and relays the filtered signal to other neurons. The receiving end of the neuron has incoming signals ($x_1, x_2, x_3, \ldots, x_n$). Each signal is assigned a weight ($w_{ji}$) that is learned from data and adjusted during the training process. The sum of all the weighted signals yields the combined input ($I_j$), which is passed to a preselected transfer function ($f$), also called an activation function. A filtered output ($y_j$) is generated at the outgoing end of artificial neuron $j$ by applying the transfer function. These relationships can be expressed as follows (Figure 5).

Figure 4. An overview architecture of deep learning network on the COVID-19 dataset.

Figure 5. A combination of several inputs passed through a function to obtain a value of Y.

$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$ (3.1)

where $w_i$ denotes the network weights, $b$ denotes a bias term, and $f$ denotes a specified activation function. A natural extension of this simple model is obtained by combining multiple neurons to form a so-called hidden layer.
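Equation (3.1) and its extension to a hidden layer can be illustrated with a few lines of NumPy. The weights, bias, and input values below are arbitrary illustrative numbers, not values from the trained model, and ReLU is used as the activation $f$ purely as an example.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


# One neuron, Eq. (3.1): y = f(sum_i w_i x_i + b)
x = np.array([0.2, 0.5, 0.9])    # input signals x_1..x_3
w = np.array([0.4, -0.3, 0.8])   # weights w_1..w_3
b = 0.1                          # bias
I = np.dot(w, x) + b             # combined input
y = relu(I)                      # filtered output

# A hidden layer stacks several neurons: each row of W holds one neuron's
# weights, so the whole layer is a single matrix-vector product.
W = np.array([[0.4, -0.3, 0.8],
              [-0.5, 0.2, 0.1],
              [0.3, 0.3, -0.6]])
b_vec = np.array([0.1, 0.0, 0.2])
h = relu(W @ x + b_vec)          # outputs of the three hidden neurons
```

The first row of `W` repeats the single neuron's weights, so the first hidden output equals `y`; the third neuron's combined input is negative, so ReLU sets its output to zero.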

2.2.2. Sigmoid or Logistic Activation Function

The sigmoid function is also called the logistic function in some literature (Turian, et al., 2009). Its curve is "S"-shaped: the function takes any real value as input and outputs a value between 0 and 1. The sigmoid is a common choice of nonlinearity. An advantage of this function is that it is continuously differentiable for all values of z and has a fixed output range. Far from the origin, however, the value of f(z) changes only very slowly. The sigmoid function is commonly used to represent a probability distribution over a binary variable. It is defined as:

$\sigma(z) = \frac{1}{1 + \exp(-z)}$ (3.2)

Figure 6 and Figure 7 show the sigmoid function and its derivative graphically. This activation function saturates to 0 when z becomes very negative and saturates to 1 when z becomes very positive. For large absolute values of z, the gradient can become too small to be useful for learning, even if the training data populate these regions abundantly.

Figure 6. Sigmoid function.

Figure 7. Derivative of the sigmoid function.
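The saturation behaviour described above follows from the derivative $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which peaks at 0.25 when z = 0 and shrinks towards zero as |z| grows. The short sketch below evaluates both functions at a few points.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, Eq. (3.2)."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.5f}  gradient={sigmoid_grad(z):.5f}")
```

At z = ±10 the gradient is roughly 4.5 × 10⁻⁵, thousands of times smaller than its peak of 0.25 at z = 0, which is the vanishing-gradient effect described above.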

2.2.3. Training Neural Nets

A feedforward neural net is an instance of supervised machine learning, in which the correct output y is known for each observation x. The goal of the training procedure is to learn parameters W[i] and b[i] for each layer i that make the prediction ŷ for each training observation as close as possible to the true y. Three ingredients are needed. First, a loss function measures the distance between ŷ and y; as in logistic regression, the cross-entropy loss is used. Second, to find the parameters that minimize this loss function, the gradient descent optimization algorithm is used. Third, gradient descent requires the gradient of the loss function: the vector containing the partial derivative of the loss function with respect to each of the parameters.
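The three ingredients listed above (a sigmoid output, the cross-entropy loss, and gradient descent on its gradient) can be seen together in the simplest case: a single sigmoid unit trained on synthetic data. This sketch uses plain batch gradient descent with an arbitrarily chosen learning rate of 0.5; the authors' network is larger and is reported to have been trained with Adam.

```python
import numpy as np

rng = np.random.default_rng(42)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy: -[y log p + (1 - y) log(1 - p)]."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))


# Synthetic data: 2 features, label 1 when a linear score is positive
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
learning_rate = 0.5
losses = []
for step in range(200):
    y_hat = sigmoid(X @ w + b)
    losses.append(cross_entropy(y, y_hat))
    # For a sigmoid output with cross-entropy loss, dL/dz = y_hat - y
    error = y_hat - y
    w -= learning_rate * (X.T @ error) / len(y)   # partial derivatives w.r.t. w
    b -= learning_rate * error.mean()             # partial derivative w.r.t. b

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

With all weights starting at zero every prediction is 0.5, so the first loss is ln 2; each gradient step then lowers the loss.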

2.2.4. Model Development and Validation

The neural network used in this study has a single hidden layer between the input and output layers. The DNN model is described below:

1) The model is based on Keras (a library that offers highly robust and abstract building blocks to construct deep learning networks) within the Kaggle IDE using Python.

2) A model is defined using the sequential construct.

3) On the input layer, we used Dense and specified an input dimensionality of 18 (the number of features in the X_train set) and an output dimensionality of 32. The output was activated with the rectified linear unit (ReLU), whose transformation is given as

$F(x) = \max(0, \omega x + b)$ (3.3)

The activation function was selected based on domain knowledge.

4) On the hidden layer, we used Dense and specified an output dimensionality of 4. This value was chosen after a number of iterations based on the model accuracy. The output of this layer was activated using ReLU.

5) On the output layer, we used Dense with an output dimensionality of 1, since this is a binary classification task, and the activation function used in this layer is the sigmoid, whose function is given as

$F(x) = \frac{1}{1 + e^{-(\omega x + b)}}$ (3.4)
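The layer sizes described above (18 inputs, a 32-unit ReLU layer, a 4-unit ReLU layer, and a single sigmoid output) can be checked with a NumPy forward pass and a parameter count. In standard terminology both the 32-unit and the 4-unit Dense layers are hidden layers, since the input features themselves carry no weights. The random weights are placeholders, not the trained model's weights, and the 0.5 decision threshold is an assumption; each `W`/`b` pair corresponds to one Keras `Dense` layer.

```python
import numpy as np

rng = np.random.default_rng(1)

# (inputs, units, activation) for each Dense layer described in the text
layer_spec = [(18, 32, "relu"), (32, 4, "relu"), (4, 1, "sigmoid")]


def init_params(spec):
    """Random placeholder weights; a trained model would supply real ones."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out), act)
            for n_in, n_out, act in spec]


def forward(X, params):
    """Propagate a batch through every layer: z = aW + b, then the activation."""
    a = X
    for W, b, act in params:
        z = a @ W + b
        a = np.maximum(0.0, z) if act == "relu" else 1.0 / (1.0 + np.exp(-z))
    return a  # shape (n_samples, 1): probability of a positive result


params = init_params(layer_spec)
n_params = sum(W.size + b.size for W, b, _ in params)

X_batch = rng.random((5, 18))        # five hypothetical, already-scaled patients
probs = forward(X_batch, params)
preds = (probs >= 0.5).astype(int)   # 1 = predicted COVID-19 positive
```

The network has 745 trainable parameters (18 × 32 + 32 = 608, 32 × 4 + 4 = 132, and 4 × 1 + 1 = 5), which is the total Keras's `model.summary()` would report for this configuration.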

2.2.5. Performance Evaluation for the Classification Method

The classification performance of the model was evaluated using the following measures: accuracy, precision, sensitivity, specificity, false positive rate, and the area under the curve (AUC) of the receiver operating characteristic (ROC). These measures are used to assess a binary classifier and can be obtained from the confusion matrix, which can be generated with the sklearn library and is an effective tool for evaluating classification models. As shown in Figure 8, a type-I error occurs when the null hypothesis is rejected even though it is actually true, and a type-II error occurs when the null hypothesis is not rejected even though the alternative hypothesis is true.
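The link between the confusion matrix and the two error types can be shown with a small counting function. Taking "the child does not have COVID-19" as the null hypothesis, a false positive is a type-I error and a false negative is a type-II error. The function name and the eight labels below are illustrative; scikit-learn's `confusion_matrix` arranges the same four counts as [[TN, FP], [FN, TP]] for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 = COVID-19 positive."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1   # type-I error: an uninfected child flagged as positive
        elif predicted == 0 and actual == 1:
            fn += 1   # type-II error: an infected child missed
        else:
            tn += 1
    return tp, fp, fn, tn


y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
```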

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds, as shown in Figure 9. TPR is the ratio between the number of children correctly predicted to have COVID-19 and the total number of children who actually have COVID-19. FPR is the ratio between the number of children incorrectly predicted to have COVID-19 and the total number of children who actually do not have COVID-19.
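The TPR and FPR definitions above translate directly into code. The sketch below sweeps a decision threshold over predicted probabilities, records one (FPR, TPR) point per threshold, and integrates the resulting curve with the trapezoidal rule to obtain the AUC. The function names and the four labels and scores are a toy example, not the study's data.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return FPR and TPR arrays for every distinct threshold (highest first)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    tpr, fpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tpr.append((predicted_pos & (y_true == 1)).sum() / n_pos)  # TP / (TP + FN)
        fpr.append((predicted_pos & (y_true == 0)).sum() / n_neg)  # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(auc(fpr, tpr))
```

An AUC of 1.0 means every positive case is scored above every negative case, while 0.5 corresponds to random guessing.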

3. Results and Discussion

The DNN was trained for 100 epochs using gradient descent with the Adam optimizer. The cross-entropy loss for binary classification was used as the loss function. The dataset was split with the train-test split method in sklearn in an 80:20 proportion, with 80% of the data points in the training set and the remaining 20% in the testing set. To validate the model, the training data was further separated into two groups, with 20% used to validate the model and 80% to train it. The purpose of validation was to avoid overfitting by monitoring the log loss. After fitting and validating the model, the following results were obtained (Table 2 & Figure 10).
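The splitting scheme can be sketched with shuffled indices: 20% of the samples are held out for testing, and the remaining training portion is split again to obtain a validation set. The function name `three_way_split`, the seed, the 0.2 validation fraction, and the use of NumPy instead of scikit-learn's `train_test_split` (which offers `test_size` and `random_state` parameters for the same job) are assumptions for illustration. The number of rows left after dropping null values is not reported; 600 is used here only because a 20% test split of about 600 rows gives the 120 test cases counted in Table 3 (13 + 4 + 4 + 99).

```python
import numpy as np


def three_way_split(n_samples, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Return shuffled index arrays (train, validation, test).

    test_fraction is taken from the whole dataset; val_fraction is taken
    from what remains after the test set has been removed.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    test_idx, rest = idx[:n_test], idx[n_test:]
    n_val = int(round(len(rest) * val_fraction))
    val_idx, train_idx = rest[:n_val], rest[n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(600)
print(len(train_idx), len(val_idx), len(test_idx))
```

The validation indices are used only to monitor the log loss during training, while the test indices are kept aside for the final confusion matrix.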

The train and validation log-losses as well as their accuracies are shown below (Figure 11 & Figure 12).

To evaluate the performance of this model with the metrics described in Section 2.2.5, we constructed the confusion matrix using the metrics module of the sklearn library, as shown in Table 3.

Table 2. Train, validation, and test log loss and accuracy.

Table 3. The confusion matrix.

Figure 8. Statistical hypothesis testing for type I and type II errors.

Figure 9. AUC-ROC curve.

Figure 10. A bar chart of the train, validation and test log loss and accuracy respectively.

Figure 11. The plot of the train loss and validation loss across the epochs.

By inspection of Table 3, the true positives are TP = 13, the false positives FP = 4, the false negatives FN = 4, and the true negatives TN = 99. The model's sensitivity, precision, specificity, and false positive rate can be deduced by substituting these values into their standard definitions: sensitivity = TP/(TP + FN), precision = TP/(TP + FP), specificity = TN/(TN + FP), and false positive rate = FP/(FP + TN) (Table 4 & Figure 13).
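Substituting the counts from Table 3 into the standard definitions gives the values below. The calculation uses only the four reported counts.

```python
# Counts from the confusion matrix (Table 3)
TP, FP, FN, TN = 13, 4, 4, 99

total = TP + FP + FN + TN                    # 120 test cases
accuracy = (TP + TN) / total
sensitivity = TP / (TP + FN)                 # recall / true positive rate
precision = TP / (TP + FP)
specificity = TN / (TN + FP)
false_positive_rate = FP / (FP + TN)         # equals 1 - specificity
f1 = 2 * precision * sensitivity / (precision + sensitivity)

for name, value in [("accuracy", accuracy), ("sensitivity", sensitivity),
                    ("precision", precision), ("specificity", specificity),
                    ("false positive rate", false_positive_rate),
                    ("F1-score", f1)]:
    print(f"{name:20s} {value:.4f}")
```

This reproduces the 93% accuracy and the 76.47% recall and precision quoted in the abstract; specificity is about 96.1% and the false positive rate about 3.9%.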

The receiver operating characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 - specificity), is shown in Figure 14; its area under the curve (AUC) is 0.967.

Figure 12. A plot of the train accuracy vs validation accuracy across the epochs.

Figure 13. Model evaluation parameters.

Figure 14. Receiver operating characteristics curve for the model.

Table 4. The model evaluation parameters and their results.

4. Conclusion

A deep neural network technique was applied to identify children infected with the COVID-19 virus from blood specimens collected at testing centers. Compared with a previous study in which machine learning was applied to the same dataset, this work can be considered an improvement, especially in accuracy and AUC score. With an accuracy of 93% and a recall of 76.47% (its ability to identify positive cases correctly), the model is promising and could be helpful in medical centers. Further investigations could examine whether the precision and recall scores can be improved.


We thank our family for their unfailing encouragement while we were working at odd hours of the day.

Ebitimi, I thank you for your support. Love you.

The authors received no financial support for the research and/or publication of this article.

Conflicts of Interest

The authors declare no conflicts of interest.


[1] Zhu, H., Wei, L. and Niu, P. (2020) The Novel Coronavirus Outbreak in Wuhan, China. Global Health Research and Policy, 5, Article No. 6.
[2] Del Rio, C. and Malani, P.N. (2020) 2019 Novel Coronavirus—Important Information for Clinicians. JAMA, 323, 1039-1040.
[3] Boulos, M.N.K. and Geraghty, E.M. (2020) Geographical Tracking and Mapping of Coronavirus Disease COVID-19/Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Epidemic and Associated Events around the World: How 21st Century GIS Technologies Are Supporting the Global Fight against Outbreaks and Epidemics. International Journal of Health Geographics, 19, Article No. 8.
[4] Lauer, S.A., Grantz, K.H., Bi, Q., Jones, F.K., Zheng, Q., Meredith, H.R., et al. (2020) The Incubation Period of Coronavirus Disease 2019 (COVID-19) from Publicly Reported Confirmed Cases: Estimation and Application. Annals of Internal Medicine, 172, 577-582.
[5] Cheng, V.C., Wong, S.C., Chen, J.H., Yip, C.C., Chuang, V.W., Tsang, O.T., et al. (2020) Escalating Infection Control Response to the Rapidly Evolving Epidemiology of the Coronavirus Disease 2019 (COVID-19) Due to SARS-CoV-2 in Hong Kong. Infection Control & Hospital Epidemiology, 41, 493-498.
[6] Rabi, F.A., Al Zoubi, M.S., Kasasbeh, G.A., Salameh, D.M. and Al-Nasser, A.D. (2020) SARS-CoV-2 and Coronavirus Disease 2019: What We Know So Far. Pathogens, 9, Article No. 231.
[7] Singhal, T. (2020) A Review of Coronavirus Disease-2019 (COVID-19). The Indian Journal of Pediatrics, 87, 281-286.
[8] Chakraborty, I. and Maity, P. (2020) COVID-19 Outbreak: Migration, Effects on Society, Global Environment and Prevention. Science of the Total Environment, 728, Article ID: 138882.
[9] American Academy of Pediatrics (2020) American Academy of Pediatrics and Children’s Health Association Find Rapid Rise of Pediatric COVID-19 Cases over 5-Month Period: Study.
[10] American Academy of Pediatrics (n.d.) Children and COVID-19: State-Level Data Report.
[11] Gao, T. (2020) Chest X-Ray Image Analysis and Classification for COVID-19 Pneumonia Detection Using Deep CNN. Research Square. (Preprint)
[12] Enughwure, A.A. and Oluwafemi, J.J. (2019) Predicting Student Performance in Engineering Drawing Using Supervised Learning Methods. International Journal of Maritime and Interdisciplinary Research (IJMIR), 1, No. 1.
[13] Alakus, T.B. and Turkoglu, I. (2020) Comparison of Deep Learning Approaches to Predict COVID-19 Infection. Chaos, Solitons & Fractals, 140, Article ID: 110120.
[14] Al Mamlook, R.E., Bzizi, H.F. and Chen, S. (2020) Evaluate Performance Risk Score in Patients Suffering from Lung Cancer Using Survival Analysis of Statistics. 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, 31 July-1 August 2020, 145-150.
[15] Delahanty, R.J., Kaufman, D. and Jones, S.S. (2018) Development and Evaluation of an Automated Machine Learning Algorithm for In-Hospital Mortality Risk Adjustment among Critical Care Patients. Critical Care Medicine, 46, e481-e488.
[16] Kuo, K., Talley, P., Huang, C. and Cheng, L. (2019) Predicting Hospital-Acquired Pneumonia among Schizophrenic Patients: A Machine Learning Approach. BMC Medical Informatics and Decision Making, 19, Article No. 42.
[17] Cooper, G.F., Aliferis, C.F., Ambrosino, R., Aronis, J., Buchanan, B.G., Caruana, R., Fine, M.J., Glymour, C., Gordon, G., Hanusa, B.H., Janosky, J.E., Meek, C., Mitchell, T., Richardson, T. and Spirtes, P. (1997) An Evaluation of Machine-Learning Methods for Predicting Pneumonia Mortality. Artificial Intelligence in Medicine, 9, 107-138.
[18] Vaid, A., Somani, S., Russak, A.J., De Freitas, J.K., Chaudhry, F.F., Paranjpe, I., Zhao, S., et al. (2020) Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients with COVID-19 in New York City: Model Development and Validation. Journal of Medical Internet Research, 22, e24018.
[19] Rustam, F., Ahmad Reshi, A., Mehmood, A., Ullah, S., On, B.-W., Aslam, W., et al. (2020) COVID-19 Future Forecasting Using Supervised Machine Learning Models. IEEE Access, 8, 101489-101499.
[20] Jiang, X., Coffee, M., Bari, A., Wang, J., Jiang, X., et al. (2020) Towards an Artificial Intelligence Framework for Data-Driven Prediction of Coronavirus Clinical Severity. Computers, Materials & Continua, 63, 537-551.
[21] Enughwure, A.A. and Febaide, I.C. (2020) Applications of Artificial Intelligence in Combating COVID-19: A Systematic Review. Open Access Library Journal, 7, 1-12.
[22] Al Mamlook, R.E., Chen, S. and Bzizi, H.F. (2020) Investigation of the Performance of Machine Learning Classifiers for Pneumonia Detection in Chest X-Ray Images. 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, 31 July-1 August 2020, 98-104.

Copyright © 2021 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.