Test of Ordered Multivariate Discrete Selection Model for Average Life Expectancy

Abstract

Average life expectancy currently differs markedly across countries and regions. The disparity in life expectancy itself is large, the gender gap takes both positive and negative values, and the distribution is polarized: “long life in rich countries and short life in poor countries”. This paper analyzes the factors affecting the life grade using the ordered multivariate discrete selection model and the 2017 average life expectancy data of countries around the world. The test results show that: 1) growth in per capita GDP, the elderly dependency ratio and the proportion of people using at least basic drinking water services can effectively raise the level of life expectancy; 2) the birth rate has an inhibitory effect on average life expectancy; 3) in a model comparison, the probit model is more suitable than the logit model for this kind of problem, and the resulting model has better properties.

Share and Cite:

Liu, J. (2022) Test of Ordered Multivariate Discrete Selection Model for Average Life Expectancy. Journal of Applied Mathematics and Physics, 10, 261-269. doi: 10.4236/jamp.2022.102020.

1. Introduction

Average life expectancy is a comprehensive index of the social and economic development level and the medical and health service level of a country or region. It refers to the average number of years that a person can expect to survive after an exact age x under a given set of age-specific mortality rates. It measures the health level of the residents of a country, nationality or region and reflects the quality of life of a society. It is an important part of the United Nations Millennium Development Goals and the basis for calculating international comparative indicators such as the physical quality of life index (PQLI) and the human development index (HDI) [1] [2] [3]. The indicator is closely related to gender, age and race, so it often needs to be calculated separately for each group [4]. The most commonly used form is life expectancy at birth, the average number of years that newborns are expected to survive, which is an important indicator of the health status of the population. The length of life is restricted by two aspects [5]. On the one hand, socio-economic conditions and the level of health care limit people’s life span, so life span varies greatly across societies and periods. On the other hand, because of personal differences such as physique, genetic factors and living conditions, individual life spans differ widely. Therefore, although it is difficult to predict the life expectancy of a specific person, scientific methods can be used to calculate the average number of years that each person is expected to survive at birth under a given mortality level. This is the average life expectancy of the population.

The current conventional method does not consider economic and social factors in the selection of indicators. Therefore, this paper selects the following indicators as independent variables to improve prediction accuracy: GDP per capita, current health expenditure per capita, age dependency ratio, crude birth rate, hospital beds, population density, and people using at least basic drinking water services. The ordered multivariate discrete selection model is used as the prediction model in order to obtain better analysis results.

2. Introduction of Ordered Multivariate Discrete Selection Model

The econometric model used in the empirical test of this paper is the ordered multivariate discrete choice model, an econometric model suitable for microanalysis [6] [7] [8]. The ordered multivariate discrete selection model differs from the binary discrete selection model in two ways: first, there are three or more discrete choices; second, the choices have a definite order. The following is a brief introduction to the ordered multivariate discrete selection model (hereinafter referred to as the ordered model). In the ordered model, the observed value $y_i$ of the explained variable represents an ordered outcome or classification, and its value is an integer. The explanatory variable $x_i$ collects the factors that may affect the ordering of the explained variable; $x_i$ can contain multiple explanatory variables, that is, it can be a vector [9].

The general form of ordered model is:

$$
y_i^* = x_i \beta + \varepsilon \tag{1}
$$

where $y_i^*$ is the latent (implicit) variable, $x_i$ is the set of explanatory variables, $\beta$ is the parameter vector to be estimated, and $\varepsilon$ is the random disturbance. The latent variable has no observed value; according to economic theory it can be regarded as a utility whose size can be measured numerically. Moreover, the latent variable in this method is explained linearly by the explanatory variables, so it has a reasonable practical meaning. The observed value $y_i$ is determined by which interval between the thresholds $\gamma$ contains $y_i^*$, and the probability of each value of $y_i$ is given by the following formula:

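$$
y_i =
\begin{cases}
0 & \text{if } y_i^* \le \gamma_1 \\
1 & \text{if } \gamma_1 < y_i^* \le \gamma_2 \\
\vdots & \\
n & \text{if } \gamma_n < y_i^*
\end{cases}
$$

$$
\begin{aligned}
P(y_i = 0 \mid x_i, \beta, \gamma) &= F(\gamma_1 - x_i \beta) \\
P(y_i = 1 \mid x_i, \beta, \gamma) &= F(\gamma_2 - x_i \beta) - F(\gamma_1 - x_i \beta) \\
P(y_i = 2 \mid x_i, \beta, \gamma) &= F(\gamma_3 - x_i \beta) - F(\gamma_2 - x_i \beta) \\
&\;\;\vdots \\
P(y_i = n \mid x_i, \beta, \gamma) &= 1 - F(\gamma_n - x_i \beta)
\end{aligned}
\tag{2}
$$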

where F is the cumulative distribution function of the random disturbance in Equation (1). If the probit model is selected, F is the standard normal distribution function [10]; if the logit model is selected, F is the logistic distribution function. Thus, what the ordered model actually estimates is the probability that each observation $y_i$ falls into each interval. The thresholds $\gamma$ are estimated together with the regression coefficients: both are obtained by maximizing the following log likelihood function.

$$
L(\beta, \gamma) = \sum_{y_i = 0} \log P(y_i = 0 \mid x_i, \beta, \gamma) + \sum_{y_i = 1} \log P(y_i = 1 \mid x_i, \beta, \gamma) + \cdots + \sum_{y_i = n} \log P(y_i = n \mid x_i, \beta, \gamma) \tag{3}
$$
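As an illustration of Equations (2) and (3), the following minimal Python sketch computes the category probabilities and the log likelihood for given parameter values. The function names (`norm_cdf`, `logistic_cdf`, `category_probs`, `log_likelihood`) and the toy data are assumptions introduced here for illustration; they are not taken from the estimation reported in this paper.

```python
import math


def norm_cdf(z):
    """Standard normal CDF: the choice of F for the probit variant."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic CDF: the choice of F for the logit variant."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(xb, gammas, cdf=norm_cdf):
    """Return [P(y=0), ..., P(y=n)] for one observation, as in Equation (2).

    xb is the linear index x_i * beta; gammas is the increasing list of
    thresholds gamma_1, ..., gamma_n.
    """
    cumulative = [cdf(g - xb) for g in gammas]  # F(gamma_k - x_i beta)
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


def log_likelihood(X, y, beta, gammas, cdf=norm_cdf):
    """Equation (3): sum over observations of log P(y_i = observed category)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        total += math.log(category_probs(xb, gammas, cdf)[y_i])
    return total


# Hypothetical toy data: one regressor, three ordered categories (0, 1, 2).
X_toy = [[0.5], [1.5], [3.0]]
y_toy = [0, 1, 2]
print(log_likelihood(X_toy, y_toy, beta=[1.0], gammas=[1.0, 2.0]))
print(log_likelihood(X_toy, y_toy, beta=[1.0], gammas=[1.0, 2.0], cdf=logistic_cdf))
```

In practice the maximization over $\beta$ and $\gamma$ would be handed to a numerical optimizer or an econometrics package; the sketch only evaluates the objective.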

3. Analysis of Regional Differences in Average Life Expectancy among Countries in the World

The data used in this paper are from World Bank WDI (World Development Indicator) Database (https://datatopics.worldbank.org/world-development-indicators/#archives). The data of 119 countries (regions) in 2017 were collected, including the following 8 variables. The variables are shown in Table 1.

Here, the life level (LD) is defined as follows:

Level 5 (high life): over 80 years old, 80 < LIFE;

Level 4 (medium and high life): 76 - 80 years old, 75 < LIFE ≤ 80;

Level 3 (medium life): 71 - 75 years old, 70 < LIFE ≤ 75;

Level 2 (middle and low life): 66 - 70 years old, 65 < LIFE ≤ 70;

Level 1 (low life): 65 years old or under, LIFE ≤ 65.
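The grading rule can be written as a small function. This is a minimal sketch that assumes the upper bound of each interval is inclusive, as the age labels (for example “76 - 80 years old”) suggest; the function name `life_grade` is hypothetical.

```python
def life_grade(life):
    """Map average life expectancy in years to the ordinal life level LD (1 to 5)."""
    if life > 80:
        return 5  # high life
    if life > 75:
        return 4  # medium and high life
    if life > 70:
        return 3  # medium life
    if life > 65:
        return 2  # middle and low life
    return 1      # low life


# Values quoted in Section 3.1: Japan 84, China 76, world average 72, Nigeria 54.
for value in (84, 76, 72, 54):
    print(value, life_grade(value))
```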

Table 1. Variables of average life expectancy of countries in the world in 2017.

Note: Since the latest data of some indicators and some regions are up to 2017, the data of 2017 is selected in this paper.

3.1. Basic Statistics of Data

According to the WDI Database, the world average life expectancy in 2017 was 72 years. Japan has the highest value, 84 years, and Nigeria the lowest, 54 years; China’s value is 76 years, ranking 52nd among the 119 countries and regions. Because the life expectancy grades are divided artificially, it is difficult to judge intuitively the significance of the indicators and the rationality of the grading, and the ordered model handles this problem easily. From the quantitative results we can obtain not only the significance level of each indicator but also the degree of influence of each indicator on the grade, which supports an objective evaluation of the grading system and points out the direction for adjusting indicators and scores. Considering the factors that affect life expectancy, a Spearman correlation coefficient test was conducted between the life grade variable LD and per capita GDP, per capita health expenditure, the elderly dependency ratio, the birth rate, the number of hospital beds, population density and people using at least basic drinking water services (% of population). The results are shown in Table 2.

According to the results in Table 2, the variable LD is strongly correlated with GDPP, HEP, ADR, BR, HB, PD and PUW, and each correlation is statistically significant. Therefore, LD can be selected as the explained variable, and GDPP, HEP, ADR, BR, HB, PD and PUW can be selected as the explanatory variables.
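For reference, the Spearman coefficient is the Pearson correlation of the ranks, with tied values sharing their mean rank. Ties matter here because LD takes only five values. The sketch below is a plain-Python illustration; the helper names and the six-country sample are hypothetical and are not the paper’s data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical sample: life level LD and per capita GDP for six countries.
ld = [1, 2, 2, 3, 4, 5]
gdpp = [800, 2500, 1900, 6000, 15000, 42000]
print(spearman(ld, gdpp))
```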

A collinearity test is then performed on the explanatory variables by checking the variance inflation factor (VIF). The results are shown in Table 3.

The results in Table 3 show that there is no obvious collinearity among the explanatory variables, which is a prerequisite for obtaining good regression results.
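The VIF of a variable is $1/(1 - R^2)$, where $R^2$ comes from regressing that variable on all the other explanatory variables. A minimal NumPy sketch follows; the function name `vif` and the simulated data are assumptions for illustration. A common rule of thumb treats values above about 10 as a sign of serious collinearity, although the threshold applied in this paper is not stated.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_obs x n_vars)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other regressors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors


# Simulated data: columns a and c are nearly identical, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
print(vif(np.column_stack([a, b, c])))
```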

3.2. Analysis of Influencing Factors of Average Life Expectancy based on Ordered Model

Table 2. Spearman correlation coefficient test.

Table 3. Collinearity test of explanatory variables.

In order to study the factors influencing regional differences in average life expectancy around the world, LD is used as the explained variable and GDPP, HEP, ADR, BR, HB, PD and PUW are selected as the explanatory variables, and a multivariate ordered discrete model (order) is established. Regression is carried out separately under the assumption that the error term follows a normal distribution (probit) and a logistic distribution (logit). The ordered multivariate discrete selection model can be estimated easily with the econometric software EViews, and the results are shown in Table 4 and Table 5.

The regression results under the two distribution assumptions show that the explanatory variables HEP, HB and PD are not significant; that is, the hypothesis that the coefficients of these three variables are zero cannot be rejected. After removing these three explanatory variables, the new estimation results are shown in Table 6 and Table 7.

Comparing the fitting results before and after removing the three insignificant variables, the coefficient estimates of the remaining variables change only slightly, which shows that HEP, HB and PD have little impact on the ordered model and can be eliminated. Because the Hannan-Quinn criterion, SC, AIC and log likelihood of the probit model are smaller, and the P value corresponding to the LR statistic and the Avg. log likelihood are larger, the probit model established in this paper is better [11].
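The three information criteria used in this comparison are simple functions of the maximized log likelihood, the number of estimated parameters k and the number of observations n. The sketch below uses the per-observation scaling (division by n); that scaling is an assumption about how the reported values are normalized, and the log likelihood values in the example are hypothetical, not those of Table 7. For every criterion, a smaller value indicates a preferred model.

```python
import math


def information_criteria(loglik, k, n):
    """Per-observation AIC, SC (Schwarz) and HQ (Hannan-Quinn) criteria."""
    aic = (-2.0 * loglik + 2.0 * k) / n
    sc = (-2.0 * loglik + k * math.log(n)) / n
    hq = (-2.0 * loglik + 2.0 * k * math.log(math.log(n))) / n
    return {"AIC": aic, "SC": sc, "HQ": hq}


# Reduced model: 4 slope coefficients + 4 thresholds = 8 parameters, 119 countries.
# The two log likelihood values below are hypothetical placeholders.
model_a = information_criteria(loglik=-95.0, k=8, n=119)
model_b = information_criteria(loglik=-97.0, k=8, n=119)
print(model_a)
print(model_b)
```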

The fitting diagram and residual diagram of the estimation results of the new ordered model are shown in Figure 1.

Table 4. Multivariate ordered discrete model under different distributions.

Table 5. Model performance test.

Table 6. Multivariate ordered discrete model under different distributions.

Table 7. Model performance test.

Figure 1. Residual diagram after HEP, HB and PD are removed.

Table 8. Effects of variables on life expectancy.

The standard equation form of the ordered model is as follows:

$$
\text{I\_LD} = 0.00006\,\text{GDPP} + 0.04662\,\text{ADR} - 0.06636\,\text{BR} + 0.05254\,\text{PUW} \tag{4}
$$

where I_LD in Equation (4) is the implicit variable of the ordered model, that is, $x_i \beta$ in theoretical models (1) and (2). It is explained linearly by the explanatory variables on the right-hand side of the equation. In the following equations, @CNORM denotes the standard normal distribution function. The constants in the equations are the boundary points that divide the normal distribution into five intervals, and the equations LD_1 to LD_5 give the probabilities of the ratings from 1 to 5:

$$
\begin{aligned}
\text{LD\_1} &= \text{@CNORM}(2.10 - \text{I\_LD}) \\
\text{LD\_2} &= \text{@CNORM}(3.28 - \text{I\_LD}) - \text{@CNORM}(2.10 - \text{I\_LD}) \\
\text{LD\_3} &= \text{@CNORM}(5.37 - \text{I\_LD}) - \text{@CNORM}(3.28 - \text{I\_LD}) \\
\text{LD\_4} &= \text{@CNORM}(7.35 - \text{I\_LD}) - \text{@CNORM}(5.37 - \text{I\_LD}) \\
\text{LD\_5} &= 1 - \text{@CNORM}(7.35 - \text{I\_LD})
\end{aligned}
\tag{5}
$$
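Equations (4) and (5) can be evaluated numerically to obtain the predicted probability of each life level for a given country profile. The sketch below uses the coefficients and boundary points reported above; the function names and the two input profiles are hypothetical, and the units of the inputs (GDPP in US dollars, ADR and PUW in percent, BR per 1,000 people) are assumed to follow the WDI indicator definitions.

```python
import math

# Coefficients from Equation (4) and boundary points from Equation (5).
COEF = {"GDPP": 0.00006, "ADR": 0.04662, "BR": -0.06636, "PUW": 0.05254}
CUTS = [2.10, 3.28, 5.37, 7.35]


def cnorm(z):
    """Standard normal CDF, the role played by @CNORM in Equation (5)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grade_probabilities(gdpp, adr, br, puw):
    """Return [P(LD=1), ..., P(LD=5)] for one country profile."""
    i_ld = (COEF["GDPP"] * gdpp + COEF["ADR"] * adr
            + COEF["BR"] * br + COEF["PUW"] * puw)
    cumulative = [cnorm(c - i_ld) for c in CUTS]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(5)]


# Two hypothetical profiles (illustrative inputs, not rows of the data set).
middle_income = grade_probabilities(gdpp=10000, adr=10, br=15, puw=95)
high_income = grade_probabilities(gdpp=50000, adr=25, br=10, puw=100)
for name, probs in (("middle", middle_income), ("high", high_income)):
    best = probs.index(max(probs)) + 1
    print(name, [round(p, 3) for p in probs], "most likely level:", best)
```

The level with the largest probability can be read as the model’s predicted life level for that profile.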

From the probit model, the regression results show that the coefficients of GDPP, ADR and PUW are positive, that is, these variables raise average life expectancy. The BR coefficient is negative, which means that a higher birth rate lowers average life expectancy. The results, sorted from high to low (from promotion to inhibition), are shown in Table 8.

The greater the PUW, ADR and GDPP, the greater the improvement of average life expectancy, and the improvement effect of PUW is the most obvious (its coefficient is 0.05254). The absolute value of each coefficient reflects the impact of the corresponding index in the grading system. The BR coefficient is negative, but its absolute value (0.06636) is the largest, so BR has the greatest impact on average life expectancy.

4. Conclusions

Based on the 2017 average life expectancy data of countries around the world, this paper analyzes the factors affecting the life grade and draws the following conclusions. The probit model performs better than the logit model in this field. Per capita health expenditure, the number of hospital beds and population density have no significant impact on average life expectancy, whereas per capita GDP, the elderly dependency ratio, the proportion of the population using at least basic drinking water services and the birth rate pass the significance test and have a significant effect on average life expectancy. Among them, per capita GDP, the elderly dependency ratio and the proportion of the population using at least basic drinking water services have a positive effect on the life expectancy of the population, while an increase in the birth rate leads to a decline in average life expectancy.

In light of the above conclusions, China should pay more attention to per capita outcomes while increasing overall economic growth. Any figure divided by 1.3 billion becomes very small, so there is a long way to go to improve per capita GDP. We need to vigorously develop the real economy, strengthen and optimize it, and build a “dual circulation” new development pattern. At the same time, on the issue of aging, we should improve the structure of the working population and the elderly care service system, give further play to the institutional advantages of the whole country in capital construction and other areas, and expand and strengthen infrastructure, represented by new infrastructure. In terms of population policy, we should adjust the direction of policy according to the different stages of the demographic dividend, in order to achieve a stable population structure and reduce the social burden.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Gou, X.T. (2013) A Quantitative Analysis of International Differences in Life Expectancy and Their Influencing Factors. Journal of Nanjing College for Population Programme Management, 29, 31-36.
[2] Xu, J. (2010) Analysis of Chinese Population Policies’ Effect. Ph.D. Thesis, Jilin University, Changchun.
[3] Wei, J., Ni, X.M. and He, A.C. (2018) Study on the Relationship between Population Policy and Economic Growth in the Context of Aging. Study on the Relationship between Population Policy and Economic Growth in the Context of Aging, 38, 337-350.
[4] Wang, X.L. (2012) Economic Influencing Factors of World Population Life Expectancy and Its Enlightenment to China. The 9th Postgraduate Academic Forum, Beijing, 21 June 2016, 191-197.
[5] Yang, J.J. and Zhang, A.X. (2013) The Effects of Age Structure of Population and the Old-Age Insurance System Transition on Residents’ Savings Rates. Social Sciences in China, 8, 49-68.
[6] Shen, L.F. and Zhang, W.W. (2021) Research on Factors Affecting the Low-Carbon Behaviors of Urban Residents Based on Discrete Choice Model. Journal of Engineering Management, 1, 95-100.
[7] Zhang, W.W. (2021) Research on Low-Carbon Behaviors of Urban Residents Based on Discrete Choice Model, Ph.D. Thesis, Central South University of Forestry and Technology, Changsha.
[8] Chen, S.Q. (2021) Comprehensive Evaluation Analysis of Accounting Firms Based on Ranking Selection Model. Economic Research Guide, 32, 136-138.
[9] Wang, H. and Sheng, L.S. (2006) Testing the Client Credit Grading System Using Econometric Model. The Journal of Quantitative & Technical Economics, 23, 138-147.
[10] Smith, T.E. and Lesage, J.P. (2004) A Bayesian Probit Model with Spatial Dependencies. Advances in Econometrics, 18, 127-160.
https://doi.org/10.1016/S0731-9053(04)18004-3
[11] Akaike, H. (1974) A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723.
https://doi.org/10.1109/TAC.1974.1100705

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.