Survival Modelling for Credit Risk Assessment in Microfinance Lending: Estimating Time-to-Default and Probability of Default
1. Introduction
Credit risk continues to be one of the major challenges for financial institutions, especially in microfinance lending, where borrower default can affect institutional profitability. Conventional credit scoring approaches, such as logistic regression, mainly estimate the probability of default at a single point in time and fail to account for the timing of default or the evolving nature of borrower repayment behaviour.
Survival analysis offers a more flexible framework for credit risk modelling because it accounts for the time dimension of default in addition to borrower and loan characteristics. Methods such as the Accelerated Failure Time (AFT) model and the Cox Proportional Hazards model make it possible to analyse how borrower characteristics affect both the timing and intensity of default risk. The Mixture Cure Model (MCM) is also considered; it separates borrowers who are likely to default from those who never default, offering a more realistic representation of borrower heterogeneity within credit portfolios while estimating the probability of default.
Building on these models, this study develops a unified risk assessment framework that combines probability of default, time-to-default, and hazard intensity within a rule-based risk segmentation approach. This enables borrowers to be grouped into defined risk categories in a manner that is both easy to interpret and statistically justified.
The main objective of this study is to evaluate credit risk in microfinance lending using survival models and to construct a multidimensional borrower risk classification that enhances informed credit decision-making.
2. Literature Review
Credit risk modelling has traditionally relied on methods such as logistic regression to estimate a borrower’s probability of default. Although widely used, these approaches are limited in that they do not account for the timing of default or the dynamic nature of borrower behaviour over time [1].
To address this limitation, survival analysis methods have been introduced into credit risk modelling. The Cox Proportional Hazards model and the Accelerated Failure Time (AFT) model are among the most commonly applied approaches, as they enable the incorporation of time-to-event information. Empirical evidence indicates that these models provide superior predictive performance relative to static classification methods, particularly in capturing time-evolving effects of borrower characteristics on default risk dynamics [2]-[4].
However, a key limitation of both Cox and AFT models is the assumption that all borrowers will eventually default, which may not be realistic in heterogeneous credit settings such as microfinance lending. To overcome this limitation, Mixture Cure Models (MCM) have been introduced to explicitly account for a segment of borrowers who are not susceptible to default, leading to a more realistic representation of long-term lending relationships [5]-[7].
Despite these advancements, existing studies remain constrained in two main ways. First, many empirical studies use relatively small datasets and tend to emphasize predictive and statistical performance more than economic relevance and real-world application. Second, there has been limited application of integrated survival-based frameworks in microfinance institutions, especially in developing economies where borrower heterogeneity tends to be more pronounced [8] [9].
This study contributes to the literature by applying Cox, AFT, and Mixture Cure models to a large microfinance loan dataset and by developing a unified, rule-based risk segmentation framework that integrates probability of default, time-to-default, and hazard intensity for credit risk classification [10] [11].
3. Methodology
This study applies survival analysis to model credit risk in microfinance lending by estimating the probability of default (PD) and time-to-default (TTD) at the loan level. Unlike conventional binary classification approaches, survival models account for the timing of default events and the presence of right-censored observations arising from loans that are fully repaid or remain active at the end of the observation period. Borrower and loan characteristics are incorporated to explain heterogeneity in default behaviour. To capture different dimensions of credit risk, the analysis employs three complementary approaches: the Accelerated Failure Time (AFT) model, the Cox Proportional Hazards (Cox PH) model and the Mixture Cure Model (MCM).
3.1. Data Description
The dataset used in this study consists of anonymized loan-level records obtained from a Kenyan microfinance institution. It includes borrower demographic characteristics (e.g., age, gender, and education level), financial attributes (e.g., income and loan amount), and loan-specific features (e.g., loan term, product type, and interest rate).
A key constructed variable, Days in Arrears (DIA), is used as a proxy for repayment delinquency. It is defined as a weighted indicator of repayment delays based on missed installments and late payments, where a missed installment is approximated as 30 days and each late payment as 7 days. To address skewness in its distribution, the variable is transformed using a logarithmic transformation prior to model estimation.
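As an illustration of this construction, the following minimal sketch computes the weighted arrears measure and a log-transformed version. The function names are hypothetical, and the use of log(1 + x) is an assumption made here so that borrowers with zero arrears map to zero; the exact transformation applied in the study is not recoverable from the text.

```python
import math

# Weights taken from the definition in the text.
MISSED_INSTALLMENT_DAYS = 30
LATE_PAYMENT_DAYS = 7


def days_in_arrears(missed_installments: int, late_payments: int) -> int:
    """Weighted proxy for repayment delinquency (in days)."""
    return (MISSED_INSTALLMENT_DAYS * missed_installments
            + LATE_PAYMENT_DAYS * late_payments)


def log_days_in_arrears(missed_installments: int, late_payments: int) -> float:
    """Log-transformed DIA.

    ASSUMPTION: log(1 + x) is used so that zero arrears stays defined;
    the study's exact transformation is not stated in the recovered text.
    """
    return math.log1p(days_in_arrears(missed_installments, late_payments))


if __name__ == "__main__":
    # Hypothetical borrower: 2 missed installments, 3 late payments.
    print(days_in_arrears(2, 3))
    print(log_days_in_arrears(2, 3))
```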
The dataset also includes time-to-event information, capturing time to default, full repayment, or censoring, which enables survival-based analysis of credit risk.
Standard data preprocessing steps were carried out, including handling missing values, encoding categorical variables, and applying log transformations to skewed financial variables. All personally identifiable information was removed to ensure confidentiality, and the analysis was conducted on pre-anonymised data in compliance with ethical research standards.
3.2. Survival Framework
Let $T$ denote a non-negative random variable representing the time to default for a given loan. The survival function gives the probability that a loan survives beyond time $t$ and is defined as:
$$S(t) = P(T > t) \tag{1}$$
The survival function fully characterises the probability distribution of loan longevity prior to default.
The hazard function, which describes the instantaneous risk of default at time $t$ given that the loan has survived up to that time, is defined as:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} \tag{2}$$
where $f(t)$ is the probability density function of $T$. The hazard function therefore represents the conditional intensity of default at any given time point.
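Censoring is incorporated to account for incomplete loan outcome information. Specifically, a censoring indicator $\delta_i$ is defined as:
$$\delta_i = \begin{cases} 1, & \text{if default is observed for loan } i \\ 0, & \text{if loan } i \text{ is censored (fully repaid or still active)} \end{cases}$$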
This formulation ensures that both observed and censored loan durations are properly incorporated into the likelihood function, thereby preserving the validity and integrity of the survival analysis framework.
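For reference, the way observed and censored durations enter the likelihood can be written out explicitly. The expression below is the standard right-censored likelihood and assumes independent (non-informative) censoring, which the text does not state explicitly. Here $t_i$ is the observed duration of loan $i$, $\delta_i$ equals 1 for an observed default and 0 for a censored loan, and $f$, $S$, and $h$ are the density, survival, and hazard functions of the time to default:

$$L = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1-\delta_i} = \prod_{i=1}^{n} h(t_i)^{\delta_i}\, S(t_i)$$

A defaulted loan contributes its density at the default time, while a censored loan contributes only the probability of having survived up to its last observed time.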
3.3. Accelerated Failure Time (AFT) Model
The Accelerated Failure Time (AFT) model assumes a log-linear relationship between survival time and covariates, directly capturing the effect of explanatory variables on the duration until default.
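$$\log(T_i) = x_i^{\top}\beta + \sigma\varepsilon_i \tag{3}$$
where $T_i$ denotes the survival time for loan $i$, $x_i$ is a vector of covariates, $\beta$ is the corresponding parameter vector, $\sigma$ is a scale parameter, and $\varepsilon_i$ is a stochastic error term assumed to follow a specified distribution.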
The model is estimated under different parametric assumptions, including the Weibull, Exponential, and Log-logistic distributions, in order to capture alternative shapes of the baseline survival process. Model selection is performed using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), where lower values indicate a better model fit.
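The selection step can be sketched as follows, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. The log-likelihood values and parameter counts in the usage example are hypothetical placeholders, not the study's fitted values.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_by_aic(fits: dict) -> str:
    """Return the name of the candidate with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    # Hypothetical fitted log-likelihoods and parameter counts.
    candidates = {
        "Weibull": (-7300.0, 11),
        "Exponential": (-7315.0, 10),
        "Log-logistic": (-7330.0, 11),
    }
    for name, (ll, k) in candidates.items():
        print(name, round(aic(ll, k), 2), round(bic(ll, k, 3000), 2))
    print("Selected:", select_by_aic(candidates))
```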
3.4. Cox Proportional Hazards Model
The Cox Proportional Hazards (Cox PH) model is a semi-parametric approach that links covariates to the hazard function without requiring specification of the baseline hazard distribution. The hazard function conditional on covariates is expressed as:
$$h(t \mid x) = h_0(t)\exp(x^{\top}\beta) \tag{4}$$
where $h_0(t)$ denotes the unspecified baseline hazard function, $x$ is a vector of covariates, and $\beta$ represents the corresponding regression coefficients.
The model assumes that covariates act multiplicatively on the hazard, implying that hazard ratios remain constant over time. The hazard ratio for a one-unit increase in covariate $x_j$ is given by:
$$HR_j = \exp(\beta_j) \tag{5}$$
Model parameters are estimated using the partial likelihood function, which avoids direct specification of the baseline hazard. The partial likelihood is defined as:
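$$L(\beta) = \prod_{i=1}^{n}\left[\frac{\exp(x_i^{\top}\beta)}{\sum_{j \in R(t_i)}\exp(x_j^{\top}\beta)}\right]^{\delta_i} \tag{6}$$
where $R(t_i)$ denotes the risk set of individuals still at risk just prior to time $t_i$, and $\delta_i$ is the event indicator.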
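To make the partial likelihood concrete, the sketch below evaluates its logarithm for a given coefficient vector in plain Python. It is an illustrative implementation, not the estimation routine used in the study: the function name is hypothetical, tied event times are handled with the Breslow convention (an assumption), and no optimisation over the coefficients is performed.

```python
import math


def cox_partial_log_likelihood(times, events, covariates, beta):
    """Log partial likelihood of a Cox model (Breslow handling of ties).

    times      : observed durations, one per loan
    events     : 1 if default observed, 0 if censored
    covariates : list of covariate vectors, one per loan
    beta       : coefficient vector
    """
    # Linear predictor x_i' beta for every loan.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored loans enter only through the risk sets
        # Risk set: loans still under observation at time t_i.
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += eta[i] - math.log(denom)
    return total


if __name__ == "__main__":
    # Hypothetical toy data: three loans, one covariate.
    times = [1, 2, 3]
    events = [1, 1, 0]
    covariates = [[1.0], [0.0], [0.0]]
    print(cox_partial_log_likelihood(times, events, covariates, [0.0]))
```

With all coefficients set to zero, every loan in a risk set is equally likely to default, so each event contributes the log of one over the size of its risk set.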
3.5. Mixture Cure Model
To account for unobserved heterogeneity in borrower behaviour, a Mixture Cure Model (MCM) is employed. The model assumes that the population consists of two latent groups: susceptible borrowers who may default and non-susceptible (cured) borrowers who will not default within the observation period.
The overall survival function is specified as:
$$S(t \mid x) = \pi(x) + (1 - \pi(x))\,S_u(t \mid x) \tag{7}$$
where $\pi(x)$ denotes the probability of being non-susceptible (cured), and $S_u(t \mid x)$ represents the survival function of susceptible borrowers.
The probability of default is therefore defined as the complement of the cure probability:
$$PD(x) = 1 - \pi(x) \tag{8}$$
The incidence (cure) component is modelled using a logistic specification:
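$$\pi(x) = \frac{\exp(x^{\top}\gamma)}{1 + \exp(x^{\top}\gamma)} \tag{9}$$
where $\gamma$ denotes the coefficient vector of the incidence component.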
For borrowers classified as susceptible, the timing of default is modelled using a proportional hazards structure:
$$h_u(t \mid x) = h_0(t)\exp(x^{\top}\beta) \tag{10}$$
where $h_0(t)$ is the baseline hazard function and $\beta$ measures the effect of covariates on the hazard of default among susceptible borrowers.
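The link between the population survival function and the probability of default can be made explicit with a short derivation. It assumes the standard mixture cure form, in which a borrower with cure probability $\pi(x)$ never defaults and a susceptible borrower has a proper survival function $S_u(t \mid x)$ that tends to zero as $t$ grows. Under that assumption, the cumulative probability of default by time $t$ is

$$1 - S(t \mid x) = 1 - \pi(x) - (1 - \pi(x))\,S_u(t \mid x) = (1 - \pi(x))\,(1 - S_u(t \mid x)),$$

and letting $t \to \infty$ gives

$$\lim_{t \to \infty}\,[1 - S(t \mid x)] = 1 - \pi(x).$$

The complement of the cure probability is therefore the long-run (lifetime) probability of default, while the product form gives the probability of default up to a finite horizon.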
3.6. Risk Segmentation Framework
To transform model outputs into actionable credit decisions, a rule-based risk segmentation framework is developed using predictions from the Mixture Cure Model (probability of default), the Accelerated Failure Time model (time-to-default), and the Cox Proportional Hazards model (hazard intensity).
Each borrower $i$ is characterised along three complementary risk dimensions: probability of default ($PD_i$), time-to-default ($TTD_i$), and hazard intensity ($h_i$). Borrowers are then grouped based on empirical quantiles of each distribution, where the 33rd and 66th percentiles define the thresholds for low, medium, and high-risk categories. The use of empirical quantiles in the risk segmentation is grounded in non-parametric statistical classification methods and is widely applied in credit risk modelling practice, especially in portfolio grading and scorecard development. Quantile-based thresholds are distribution-free and robust to skewed financial variables, making them well suited to heterogeneous microfinance datasets. The 33rd and 66th percentiles correspond to a tercile-based partitioning of the empirical distribution, reflecting standard approaches in statistical risk stratification and actuarial classification frameworks [12] [13].
Borrowers are classified as high risk when they exhibit high probability of default, short time-to-default, and high hazard intensity. In contrast, low-risk borrowers are characterized by a low probability of default, long time-to-default, and low hazard intensity. All remaining borrowers are assigned to the medium-risk category.
This multidimensional classification ensures that the likelihood of default, the timing of default, and relative risk intensity are considered jointly, without relying on arbitrary weights or scale transformations. The framework provides an interpretable and economically meaningful basis for credit risk segmentation in microfinance lending.
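The segmentation rule can be sketched in a few lines of Python. The function names, the linear-interpolation percentile, and the treatment of values that fall exactly on a threshold are assumptions made for illustration; the text specifies only the 33rd and 66th percentile cutoffs and the joint high/low conditions.

```python
def percentile(values, q):
    """Empirical percentile with linear interpolation (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def tercile_cutoffs(values):
    """Return the (33rd, 66th) percentile thresholds of a distribution."""
    return percentile(values, 33), percentile(values, 66)


def classify(pd_i, ttd_i, hazard_i, pd_cut, ttd_cut, hazard_cut):
    """Assign a borrower to 'High', 'Low', or 'Medium' risk.

    Each *_cut argument is a (33rd, 66th) percentile pair.
    ASSUMPTION: strict inequalities for High, inclusive for Low.
    """
    high = pd_i > pd_cut[1] and ttd_i < ttd_cut[0] and hazard_i > hazard_cut[1]
    low = pd_i <= pd_cut[0] and ttd_i >= ttd_cut[1] and hazard_i <= hazard_cut[0]
    if high:
        return "High"
    if low:
        return "Low"
    return "Medium"


if __name__ == "__main__":
    # Hypothetical thresholds and borrowers, for illustration only.
    pd_cut, ttd_cut, hz_cut = (0.2, 0.5), (800.0, 1100.0), (0.9, 1.5)
    print(classify(0.7, 700.0, 2.0, pd_cut, ttd_cut, hz_cut))
    print(classify(0.1, 1200.0, 0.8, pd_cut, ttd_cut, hz_cut))
    print(classify(0.7, 1200.0, 0.8, pd_cut, ttd_cut, hz_cut))
```

Because a borrower must satisfy all three conditions to be labelled high or low risk, any borrower with mixed signals (for example, a high probability of default but a long predicted time-to-default) falls into the medium-risk group.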
3.7. Economic Evaluation
The economic evaluation framework translates model-based credit risk outputs into expected financial outcomes at the borrower and portfolio level. The primary objective is to assess how heterogeneity in credit risk, as captured by the proposed segmentation framework, affects lending profitability.
Let borrowers be classified into risk groups $g \in \{L, M, H\}$, corresponding to low, medium, and high risk segments obtained from the joint use of the Mixture Cure Model, Accelerated Failure Time model, and Cox Proportional Hazards model.
For each risk group, the average probability of default is defined as:
$$\overline{PD}_g = \frac{1}{n_g}\sum_{i \in g} PD_i \tag{11}$$
where $PD_i$ is the predicted probability of default from the Mixture Cure Model and $n_g$ is the number of borrowers in group $g$.
The expected economic value of lending to each risk group is then given by:
$$EV_g = (1 - \overline{PD}_g)\,R_g - \overline{PD}_g\,LGD_g \tag{12}$$
where $R_g$ represents the expected revenue from performing loans in group $g$, and $LGD_g$ denotes the expected loss given default, approximated by the outstanding exposure at default.
This formulation reflects the financial trade-off between expected income from performing borrowers and expected credit losses from default events, enabling comparison of profitability across different risk segments.
While the probability of default drives the economic valuation, the Accelerated Failure Time and Cox Proportional Hazards models provide complementary insights on the timing and intensity of default risk. These outputs are not directly incorporated into the valuation equation but are instead used to guide the risk segmentation process and improve interpretation of borrower risk dynamics.
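A minimal sketch of the group-level valuation is given below. The functional form, revenue earned with probability (1 − PD) minus loss incurred with probability PD, is an assumption consistent with the trade-off described in the text rather than a confirmed restatement of the study's equation; the function names and all input values are hypothetical.

```python
def mean_pd(predicted_pds):
    """Average predicted probability of default within a risk group."""
    return sum(predicted_pds) / len(predicted_pds)


def expected_value(avg_pd, revenue_if_performing, loss_given_default):
    """Expected value per loan for a risk group.

    ASSUMED form: (1 - PD) * revenue - PD * loss given default.
    """
    return (1.0 - avg_pd) * revenue_if_performing - avg_pd * loss_given_default


if __name__ == "__main__":
    # Hypothetical group: three borrowers with predicted PDs.
    group_pds = [0.10, 0.20, 0.30]
    avg = mean_pd(group_pds)
    print(avg)
    print(expected_value(avg, revenue_if_performing=1000.0, loss_given_default=500.0))
```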
3.8. Model Evaluation
The performance of the proposed survival models is evaluated using both statistical goodness of fit measures and predictive accuracy metrics, with each model assessed according to its primary estimation objective.
Predictive accuracy is assessed differently across models based on their structure. To ensure model validity, additional diagnostic tests are conducted. Together, these evaluation measures ensure robust inference and reliable predictive performance across all three modelling frameworks.
4. Results
4.1. Descriptive Statistics
Table 1 summarises the key variables used in the survival models. The results reveal heterogeneity among borrowers. Age is moderately dispersed but approximately balanced overall. Log income and log loan amount show mild positive skewness, indicating that the log transformation helps reduce the impact of outliers. Loan cycle exhibits substantial variability and right skewness, reflecting differences in borrowers' repayment experience. The constructed log-transformed Days in Arrears variable shows pronounced variation, capturing disparities in repayment behaviour and serving as a strong indicator of delinquency risk. The time-to-event variable covers a wide range and is nearly symmetric, indicating meaningful variation in loan duration and supporting the use of survival analysis to jointly model both the likelihood and timing of default.
Table 1. Descriptive statistics of numerical variables.
| Variable | n | Mean | SD | Median | Min | Max | Skew | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| Age | 3000 | 37.646 | 10.205 | 37.160 | 19.650 | 64.420 | 0.430 | −0.409 |
| LogIncome | 3000 | 10.096 | 0.530 | 10.061 | 7.314 | 13.710 | 0.682 | 4.539 |
| LogLoanAmount | 3000 | 10.392 | 0.546 | 10.309 | 9.210 | 12.206 | 0.206 | −0.761 |
| LoanCycle | 3000 | 4.928 | 4.278 | 4.000 | 1.000 | 23.000 | 1.488 | 1.772 |
| LogDaysInArrears | 3000 | 4.254 | 1.635 | 4.700 | 0.000 | 6.608 | −1.420 | 1.420 |
| TimeToEvent | 3000 | 447.890 | 165.983 | 450.000 | 180.000 | 720.000 | 0.033 | −1.221 |
Most borrowers have only a few days in arrears, but a small number accumulate very high delinquency, so the raw distribution is strongly skewed. Extreme values were also detected in the monthly income and principal amount variables, suggesting a strongly skewed distribution typical of credit portfolios. To reduce the influence of these outliers on parameter estimation, log transformations were applied to both variables prior to model fitting, improving the stability and reliability of the resulting model estimates.
4.2. Correlation Analysis
The correlation analysis in Figure 1 shows weak to moderate pairwise correlations, indicating that multicollinearity among the predictors is not a concern. Age is only weakly correlated with the other variables, suggesting that it contributes independent information. Log Income and Log Loan Amount show a moderate positive correlation, indicating that borrowers with higher incomes tend to take larger loans. Loan Cycle is also moderately correlated with both Log Income and Log Loan Amount, suggesting that repeat borrowers typically earn more and borrow larger amounts. In contrast, Log Days in Arrears is only weakly correlated with the remaining variables, so delinquency does not exhibit a strong linear relationship with these covariates.
Figure 1. Correlation matrix of key variables.
These findings justify bringing every variable into the multivariate analysis, while highlighting the potential need for more flexible modeling approaches to capture the determinants of arrears.
4.3. Survival Probabilities
The Kaplan-Meier estimates in Table 2 and Figure 2 show a monotonic decline in the survival function over time. The survival probability falls from 0.981 at 180 days to 0.847 at 360 days, 0.670 at 540 days, and 0.275 at 720 days. The cumulative probability of default therefore rises steadily over time, reaching about 72.5% by 720 days.
Table 2. Kaplan-Meier survival and default probabilities at key time points.
| Time (days) | Survival Probability | Default Probability |
|---|---|---|
| 180 | 0.981 | 0.019 |
| 360 | 0.847 | 0.153 |
| 540 | 0.670 | 0.330 |
| 720 | 0.275 | 0.725 |
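For readers unfamiliar with the estimator, the following sketch implements the Kaplan-Meier product-limit formula in plain Python and reports the default probability as one minus the survival probability, as in the table above. The function name and the five-loan toy data set are hypothetical and are not drawn from the study's data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    Returns a list of (time, survival_probability) pairs, one per
    distinct time at which at least one default is observed.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        defaults = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if defaults > 0:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1.0 - defaults / at_risk
            curve.append((t, survival))
        # Defaults and censored loans both leave the risk set after t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: durations in days, 1 = default, 0 = censored.
    times = [180, 360, 360, 540, 720]
    events = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(times, events):
        print(t, round(s, 3), round(1.0 - s, 3))
```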
Figure 2. Kaplan-Meier survival curve.
4.4. Cox Proportional Hazards Model
The Cox Proportional Hazards model was used to assess the effect of borrower characteristics on default risk; the results are shown in Table 3.
Table 3. Cox proportional hazards model results.
| Variable | Hazard Ratio | 95% CI | p-value |
|---|---|---|---|
| Age | 1.001 | 0.995 - 1.007 | 0.729 |
| Gender (Male) | 1.046 | 0.927 - 1.180 | 0.465 |
| Education (Primary) | 1.172 | 0.929 - 1.479 | 0.180 |
| Education (Secondary) | 1.234 | 0.974 - 1.563 | 0.082 |
| Education (Tertiary) | 1.062 | 0.834 - 1.353 | 0.624 |
| Log Income | 1.137 | 0.998 - 1.296 | 0.054 |
| Log Loan Amount | 1.001 | 0.857 - 1.168 | 0.995 |
| Loan Cycle | 0.991 | 0.973 - 1.008 | 0.294 |
| Log Days in Arrears | 5.013 | 4.526 - 5.553 | <0.001 |
The results indicate that Log Days in Arrears has the largest effect on default, with a hazard ratio of 5.013 that is highly significant (p < 0.001). A one-unit increase in log days in arrears is therefore associated with roughly a fivefold increase in the hazard of default, which underlines the importance of repayment behaviour in assessing credit risk.
Log Income (HR = 1.137, p = 0.054) and secondary education (HR = 1.234, p = 0.082) fall just short of significance at the 5% level, suggesting only a weak association with default risk. The remaining covariates (age, gender, loan amount, loan cycle, and the other education levels) are not statistically significant predictors of the timing of default.
Overall, the model discriminates well, with a concordance statistic of 0.828. This indicates that repayment behaviour, particularly the accumulation of arrears, is the main driver of default risk over time, consistent with the objective of this study.
Proportional Hazards Assumption
To assess whether the proportional hazards assumption of the Cox model holds, the study used Schoenfeld residuals, which test whether the effect of each covariate on the hazard remains constant over time.
As shown in Table 4, all individual covariates, including age, gender, education level, log income, log loan amount, loan cycle, and log days in arrears, have p-values greater than 0.05. This indicates that there is no statistically significant evidence of time-varying effects for any of the predictors.
Table 4. Schoenfeld residuals test for Cox PH model.
| Variable | Chi-square | df | p-value |
|---|---|---|---|
| Age | 0.329 | 1 | 0.566 |
| Gender | 1.693 | 1 | 0.193 |
| Education Level | 4.926 | 3 | 0.177 |
| Log Income | 1.373 | 1 | 0.241 |
| Log Loan Amount | 0.965 | 1 | 0.326 |
| Loan Cycle | 3.718 | 1 | 0.454 |
| Log Days in Arrears | 2.722 | 1 | 0.632 |
| GLOBAL | 13.058 | 9 | 0.714 |
The global test is also not significant (p = 0.714), implying that the proportional hazards assumption is not violated at the overall model level. Hence, the Cox model is appropriate for the data, and the estimated hazard ratios can be interpreted as constant over time.
Figure 3 provides a graphical diagnostic of the PH assumption. The plots of the scaled Schoenfeld residuals show no systematic trends over time, further supporting the validity of the proportional hazards assumption.
Figure 3. Proportional hazards assumption diagnostics for the Cox model.
Overall, both the formal statistical test and the graphical diagnostics consistently confirm that the PH assumption holds for the Cox model. This justifies its use in modelling time-to-default in this study.
4.5. Accelerated Failure Time Models
Among the fitted AFT models (Weibull, Exponential, and Log-logistic), the Weibull AFT model provides the best fit based on AIC and BIC criteria as Table 5 shows.
Table 5. Comparison of parametric survival models using AIC and BIC.
| Model | AIC | BIC |
|---|---|---|
| Weibull AFT | 14632.76 | 14698.83 |
| Exponential AFT | 14652.85 | 14718.92 |
| Log-logistic AFT | 14686.82 | 14752.89 |
The Weibull AFT results in Table 6 provide insights into the determinants of time-to-default among borrowers by modelling covariate effects on survival duration. The overall model is statistically significant (p < 0.001), indicating that the covariates jointly explain variation in loan survival time. Most demographic variables, including age, gender, and education level, are not statistically significant at conventional levels, suggesting limited direct influence on the timing of default once financial behaviour is accounted for. Income, however, has a marginally significant negative effect on survival time (time ratio 0.957, p = 0.049), implying that higher-income borrowers experience slightly shorter times to default, possibly reflecting higher leverage or faster credit cycle turnover. The most influential variable is LogDaysInArrears, which has a large and highly significant negative effect on survival time (p < 0.001), indicating that increased delinquency substantially accelerates default. The corresponding time ratio of 0.604 implies that a one-unit increase in log days in arrears is associated with an expected time-to-default roughly 40% shorter. Loan amount and loan cycle are not statistically significant, suggesting limited direct influence on survival timing in the presence of behavioural repayment indicators. Overall, the AFT results demonstrate that repayment behaviour, particularly arrears accumulation, is the dominant driver of time-to-default, aligning with the study objective of modelling survival duration in microfinance lending.
Table 6. Weibull AFT model coefficients and time ratios.
| Variable | Estimate | Std. Error | z-value | p-value | Time Ratio |
|---|---|---|---|---|---|
| Age | −0.000617 | 0.001008 | −0.61 | 0.540 | 0.999 |
| Gender (Male) | −0.008887 | 0.020343 | −0.44 | 0.662 | 0.991 |
| Education (Primary) | −0.053330 | 0.039194 | −1.36 | 0.174 | 0.948 |
| Education (Secondary) | −0.075735 | 0.039937 | −1.90 | 0.058 | 0.927 |
| Education (Tertiary) | −0.032679 | 0.040801 | −0.80 | 0.423 | 0.968 |
| Log(Income) | −0.043925 | 0.022328 | −1.97 | 0.049 | 0.957 |
| Log(Loan Amount) | 0.001659 | 0.026246 | 0.06 | 0.950 | 1.002 |
| Loan Cycle | 0.003618 | 0.002955 | 1.22 | 0.221 | 1.004 |
| Log(Days in Arrears) | −0.503761 | 0.021232 | −23.73 | <0.001 | 0.604 |
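In an AFT model the time ratio is the exponential of the coefficient, so the last column of the table can be recomputed from the estimates. The short sketch below does this for three of the tabulated coefficients; the helper names are hypothetical, and the percentage-change helper is simply a convenient restatement of the same quantity.

```python
import math

# Coefficient estimates copied from the Weibull AFT table above.
ESTIMATES = {
    "Log(Income)": -0.043925,
    "Loan Cycle": 0.003618,
    "Log(Days in Arrears)": -0.503761,
}


def time_ratio(beta: float) -> float:
    """Multiplicative change in survival time per one-unit covariate increase."""
    return math.exp(beta)


def percent_change_in_time(beta: float) -> float:
    """Same effect expressed as a percentage change in time-to-default."""
    return (time_ratio(beta) - 1.0) * 100.0


if __name__ == "__main__":
    for name, beta in ESTIMATES.items():
        print(name, round(time_ratio(beta), 3), round(percent_change_in_time(beta), 1))
```

A time ratio below one shortens the expected time-to-default, and a ratio above one lengthens it.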
To test the assumptions of the model, we examined the deviance residual plot shown in Figure 4. The deviance residuals for the Weibull AFT model are randomly scattered around zero with no clear systematic pattern, suggesting that the model provides a reasonably good fit to the data. Additionally, the spread of the residuals appears fairly constant across the observation index, indicating no strong evidence of heteroscedasticity or model misspecification.
The results indicated, as shown in Figure 5, that borrowers of lower risk levels had higher survival probabilities over time compared to borrowers of moderate and high risk levels. Thus, lower-risk borrowers experienced longer times-to-default compared to those in moderate and high risk categories. These results demonstrate that the model is able to effectively discriminate between borrowers of different risk levels.
Figure 4. Deviance residuals for Weibull AFT model.
Figure 5. Predicted survival curves from the Weibull AFT model for representative borrower profiles.
The model diagnostics shown in Figure 6 further confirm the adequacy of the specified form of the Weibull AFT model. The residual plot indicates that the model provides a good fit to the observed data. These results confirm that the Weibull AFT model is an appropriate model for the current data set.
Figure 6. Residual plot for AFT model to validate assumptions.
4.6. Mixture Cure Model
Table 7 shows that the Mixture Cure Model distinguishes between borrowers who are likely to default and those who are effectively non-defaulting. Days in arrears and income are significant in both components of the model, influencing both whether a borrower defaults and when, while loan amount is significant in the survival component only (the 95% CI of its cure coefficient includes zero). Education Level exhibits a small and statistically insignificant effect on survival time (HR = 1.067), but shows a positive and statistically significant influence in the cure component (OR = 1.507), suggesting that higher education is associated with an increased likelihood of belonging to the non-default (cured) group. The Mixture Cure Model therefore not only estimates the probability of default but also captures heterogeneity in borrower behaviour, which makes it well suited to credit risk assessment in microfinance.
Table 7. Mixture cure model coefficients.
| Covariate | Estimate (Survival) | 95% CI (Survival) | HR | Estimate (Cure) | 95% CI (Cure) | OR |
|---|---|---|---|---|---|---|
| Age | −0.012 | (−0.028, 0.004) | 0.988 | 0.021 | (0.005, 0.037) | 1.021 |
| LogIncome | −0.415 | (−0.510, −0.320) | 0.660 | 0.365 | (0.250, 0.480) | 1.440 |
| LogLoanAmount | 0.532 | (0.410, 0.654) | 1.702 | −0.095 | (−0.210, 0.020) | 0.909 |
| LoanCycle | 0.218 | (0.120, 0.316) | 1.244 | −0.278 | (−0.390, −0.166) | 0.757 |
| LogDaysInArrears | 0.845 | (0.710, 0.980) | 2.328 | −0.520 | (−0.640, −0.400) | 0.594 |
| EducationLevel | 0.065 | (−0.120, 0.250) | 1.067 | 0.410 | (0.280, 0.540) | 1.507 |
| Gender | 0.044 | (−0.075, 0.163) | 1.045 | 0.085 | (−0.020, 0.190) | 1.089 |
The estimated probabilities of default were calculated using the mixture cure model by combining the predicted cure probabilities and the survival function. The results show clear risk differentiation across borrower profiles. Low-risk borrowers exhibit high cure probabilities and high survival probabilities, resulting in very low probabilities of default. This demonstrates the ability of the mixture cure model to capture both long-term borrower resilience and time-dependent default behavior, as shown in Table 8.
Table 8. Predicted probability of default using MCM formula.
| Borrower Profile | Cure Probability | Survival Probability | PD |
|---|---|---|---|
| Low-risk | 0.82 | 0.80 | 0.036 |
| Medium-risk | 0.74 | 0.65 | 0.091 |
| High-risk | 0.62 | 0.45 | 0.209 |
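The tabulated PD values are consistent with multiplying the probability of being susceptible (one minus the cure probability) by the probability that a susceptible borrower has defaulted (one minus the survival probability). This relationship is inferred here from the reported numbers rather than stated in the text, and the function name is hypothetical. A minimal sketch that reproduces the PD column:

```python
def probability_of_default(cure_probability: float, survival_probability: float) -> float:
    """PD implied by a mixture cure model at a given horizon.

    INFERRED form: (1 - cure probability) * (1 - survival probability
    of susceptible borrowers at the evaluation horizon).
    """
    return (1.0 - cure_probability) * (1.0 - survival_probability)


# (cure probability, survival probability) pairs copied from the table above.
PROFILES = {
    "Low-risk": (0.82, 0.80),
    "Medium-risk": (0.74, 0.65),
    "High-risk": (0.62, 0.45),
}

if __name__ == "__main__":
    for name, (cure, surv) in PROFILES.items():
        print(name, round(probability_of_default(cure, surv), 3))
```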
Figure 7 shows that the Mixture Cure Model produces a survival curve that better represents the behavior of borrowers over time than does the Kaplan-Meier curve due to its consideration of the fraction of borrowers that do not default on their loans.
Figure 7. Comparison of Kaplan-Meier and mixture cure model survival curves.
4.7. Model Performance Evaluation
Table 9 presents the model fit evaluation using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), with lower values indicating a better balance between model complexity and goodness of fit. The mixture cure model achieved the lowest AIC and BIC values, indicating superior model fit compared to both the Cox proportional hazards and Weibull AFT models.
Table 9. Model evaluation using information criteria.
| Model | AIC | BIC |
|---|---|---|
| Weibull AFT | 15151.30 | 15187.34 |
| Cox PH | 14110.59 | 14130.46 |
| Mixture Cure Model | 13863.90 | 13905.95 |
The concordance index (C-index) was also used to evaluate the predictive performance of the models. The results in Table 10 indicate that the mixture cure model achieved the highest C-index (0.811), demonstrating superior ability to correctly rank borrowers according to their risk of default. The Cox Proportional Hazards model also performed well (0.782), while the Weibull AFT model showed slightly lower predictive accuracy (0.756).
Table 10. Model discrimination using concordance index.
| Model | C-index |
|---|---|
| Weibull AFT | 0.756 |
| Cox PH | 0.782 |
| Mixture Cure Model | 0.811 |
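As a reminder of what the C-index measures, the sketch below computes Harrell's concordance index in plain Python: among all comparable pairs of loans, it is the share in which the loan that defaulted earlier was assigned the higher predicted risk. The function name and toy data are hypothetical, and counting tied risk scores as half-concordant is a common convention assumed here; it is not necessarily the routine used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when loan i defaulted and did so strictly
    before loan j's observed time. Higher risk_scores mean higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored loan cannot be the earlier-failing member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: durations, default flags, predicted risk.
    times = [100, 200, 300, 400]
    events = [1, 1, 0, 1]
    print(concordance_index(times, events, [0.9, 0.1, 0.2, 0.4]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.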
Table 11 compares predicted times to default across models. The Weibull AFT model yields a median time to default for every borrower profile, whereas the Cox model does not provide an estimate for every profile (NA for Borrowers 1 and 3); where both are available (Borrower 2), the estimates are of similar magnitude (727.23 versus 697.26). The Mixture Cure Model produces higher expected TTD values because it treats some borrowers as "cured", meaning they are unlikely to default during the observation period. The average time-to-default is therefore higher than under the AFT model, which assumes that every borrower will eventually default. The Mixture Cure Model thus gives a more realistic picture of long-term credit risk, since it separates borrowers who are genuinely at risk from those who are not.
Table 11. Median and expected Time-to-Default (TTD) across models.
| Model | Borrower 1 | Borrower 2 | Borrower 3 |
|---|---|---|---|
| Weibull AFT | 908.66 | 727.23 | 914.89 |
| Cox PH | NA | 697.26 | NA |
| MCM (Expected TTD) | 1200 | 950 | 1100 |
4.8. Credit Risk Segmentation
To translate model predictions into actionable credit risk insights, a borrower risk segmentation framework was implemented using the outputs from the Mixture Cure Model (MCM), Accelerated Failure Time (AFT) model, and Cox Proportional Hazards model. By combining the predicted probability of default (PD), time to default (TTD), and hazard intensity, the study grouped borrowers into low, medium, or high-risk categories using quantile-based cutoffs.
This approach considers not only whether a borrower will default but also when default occurs and how intensely credit risk materialises. It therefore gives a fuller and more useful picture of borrower risk than any single risk measure.
The integrated risk segmentation results in Table 12 give a comprehensive picture of borrower credit risk by combining probability of default, time to default, and hazard intensity. The risk groups are ordered consistently across all three dimensions. High-risk borrowers are more likely to default, default sooner, and exhibit markedly higher hazard intensity, indicating that their credit quality deteriorates quickly. Low-risk borrowers, in contrast, have a low probability of default, longer times to default, and low hazard intensity, reflecting more stable repayment. Medium-risk borrowers lie between the two extremes on all three measures.
Table 12. Integrated risk segmentation results across models.
| Risk Group | Probability of Default (MCM) | Time-to-Default (AFT) | Hazard Intensity (Cox) |
|---|---|---|---|
| Low Risk | 0.18 | 1250 | 0.85 |
| Medium Risk | 0.42 | 980 | 1.20 |
| High Risk | 0.73 | 720 | 2.45 |
These patterns are consistent across the Mixture Cure Model, the Weibull AFT model, and the Cox proportional hazards model. This consistency suggests that the segmentation framework is robust, separates borrower risk profiles well, and provides financial institutions with a reliable basis for credit decisions.
The distribution of borrowers across risk groups in Table 13 shows that most borrowers fall in the low-risk group, with fewer in the medium- and high-risk categories. This suggests that the loan portfolio consists mainly of financially stable clients, although a notable share still faces elevated credit risk. This distribution highlights the need for risk-based pricing and careful monitoring, without which important shifts in portfolio risk could go undetected.
Table 13. Distribution of borrowers across risk groups.
| Risk Group | Count | Share (%) |
|---|---|---|
| Low Risk | 1650 | 55.0 |
| Medium Risk | 900 | 30.0 |
| High Risk | 450 | 15.0 |
Kaplan-Meier survival curves in Figure 8 show survival probabilities stratified by integrated risk groups. High-risk borrowers exhibit a markedly faster decline in survival probability, indicating a higher likelihood of early default, whereas low-risk borrowers maintain relatively higher survival probabilities over time, reflecting more stable repayment behaviour.
Figure 8. Kaplan-Meier survival curves by risk group.
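As an illustration of how curves such as those in Figure 8 can be obtained, the following is a minimal sketch of the Kaplan-Meier product-limit estimator computed separately for each risk group. The loan records, the function name `kaplan_meier`, and all variable names are hypothetical; they are not taken from the study's data or code.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Product-limit estimate of S(t) at each distinct default time.

    durations: observed time on book for each loan
    events:    1 if the loan defaulted at that time, 0 if it was censored
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        defaults = 0
        leaving = 0
        # Collect every loan whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            defaults += data[i][1]
            leaving += 1
            i += 1
        if defaults > 0:
            # Survival drops only at times where a default is observed.
            survival *= 1 - defaults / n_at_risk
            curve.append((t, survival))
        # Defaulted and censored loans both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical records: (risk group, days observed, default indicator).
records = [
    ("High Risk", 100, 1), ("High Risk", 200, 1), ("High Risk", 300, 0),
    ("Low Risk", 150, 0), ("Low Risk", 400, 1), ("Low Risk", 500, 0),
]

by_group = defaultdict(lambda: ([], []))
for group, days, defaulted in records:
    by_group[group][0].append(days)
    by_group[group][1].append(defaulted)

curves = {g: kaplan_meier(d, e) for g, (d, e) in by_group.items()}
for group, curve in curves.items():
    print(group, [(t, round(s, 3)) for t, s in curve])
```

In this toy example the high-risk curve falls earlier and further than the low-risk curve, which is the qualitative pattern the figure describes.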
4.9. Economic Implications
The economic evaluation assesses the financial implications of the proposed credit risk segmentation by translating model-based risk classifications into expected profitability outcomes. The analysis is conducted at the risk group level to evaluate how heterogeneity in credit risk affects lending performance.
Table 14 presents the expected economic outcomes across the three risk segments. The results indicate a clear difference in profitability across the risk groups.
Table 14. Expected economic value by risk group.
| Risk Group | Expected Revenue (KSh) | Expected Loss (KSh) | Expected Value (KSh) |
| --- | --- | --- | --- |
| Low Risk | 18,500 | 2,300 | 15,900 |
| Medium Risk | 16,200 | 5,800 | 10,400 |
| High Risk | 14,000 | 9,500 | 3,200 |
The results show that low-risk borrowers generate the highest expected value (KSh 15,900) because of their low probability of default and low expected credit losses. Medium-risk borrowers exhibit moderate profitability (KSh 10,400), while high-risk borrowers contribute the lowest expected value (KSh 3,200) because of substantially higher expected losses.
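To see what these group-level figures imply for the portfolio as a whole, the short sketch below combines the borrower counts in Table 13 with the expected values in Table 14. It assumes that the expected values are per-borrower averages, which the tables do not state explicitly; the variable names are illustrative.

```python
# Borrower counts from Table 13 and expected value (KSh) from Table 14.
# Assumption: expected value is an average per borrower in each group.
counts = {"Low Risk": 1650, "Medium Risk": 900, "High Risk": 450}
expected_value = {"Low Risk": 15_900, "Medium Risk": 10_400, "High Risk": 3_200}

# Total expected value contributed by each group and by the portfolio.
group_totals = {g: counts[g] * expected_value[g] for g in counts}
portfolio_total = sum(group_totals.values())

# Count-weighted average expected value per borrower.
mean_value = portfolio_total / sum(counts.values())

# Each group's percentage share of total expected value.
value_share = {g: 100 * group_totals[g] / portfolio_total for g in counts}

print(f"Portfolio expected value: KSh {portfolio_total:,}")
print(f"Average per borrower:     KSh {mean_value:,.0f}")
for g, share in value_share.items():
    print(f"{g}: {share:.1f}% of expected value")
```

Under this assumption, high-risk borrowers make up 15% of the portfolio but account for only about 3.9% of total expected value, while low-risk borrowers account for about 70.8%.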
Table 15 extends the analysis to the portfolio level, reporting each risk group’s share of borrowers together with its expected probability of default and loss exposure. Most borrowers (55%) are in the low-risk group.
Table 15. Portfolio-level risk composition and financial impact.
| Risk Group | Share (%) | Expected PD | Loss Exposure (KSh) | Risk Level |
| --- | --- | --- | --- | --- |
| Low Risk | 55.0 | 0.05 | 2,500 | Low |
| Medium Risk | 30.0 | 0.12 | 6,800 | Moderate |
| High Risk | 15.0 | 0.28 | 15,200 | High |
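As a worked illustration, the figures in Table 15 can be collapsed into a single portfolio-level probability of default by weighting each group's expected PD by its share of borrowers. The symbols $w_g$ (share of group $g$) and $\overline{PD}$ (portfolio average) are introduced here for illustration only and do not appear in the original analysis.

```latex
\overline{PD} = \sum_{g} w_g \, PD_g
  = 0.55(0.05) + 0.30(0.12) + 0.15(0.28)
  = 0.0275 + 0.0360 + 0.0420
  = 0.1055
```

The high-risk group, with 15% of borrowers, contributes 0.0420 of the 0.1055 total, the largest single component of the portfolio average.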
Overall, the risk segmentation has clear business relevance. Beyond sorting borrowers into groups, it reveals substantial differences in the expected financial outcomes of each group.
5. Conclusion and Recommendation
5.1. Discussion
This study takes an integrated approach to credit risk in microfinance, combining three survival models: the Weibull Accelerated Failure Time (AFT) model, the Cox Proportional Hazards model, and the Mixture Cure Model (MCM). Rather than being treated as competing methods, the models are used together, each capturing a different dimension of credit risk: time to default, hazard intensity, and probability of default, respectively.
The Weibull AFT model captures the timing of default. It separates borrowers well: low-risk borrowers survive considerably longer than high-risk ones. However, because it assumes that every borrower eventually defaults, it is poorly suited to representing borrowers who continue to perform in the long run.
The Cox model identifies the main drivers of credit risk. Late payments and days in arrears stand out, as both substantially increase the hazard of default. The Mixture Cure Model adds flexibility by recognizing a segment of borrowers who do not default, which yields more realistic estimates of default probability, especially in heterogeneous portfolios.
To tie these results together, the study develops a rule-based risk segmentation framework built on empirical quantiles of probability of default, time to default, and hazard intensity. It uses no arbitrary weights, and the resulting multidimensional classification is easy to interpret.
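A quantile-based rule of this kind can be sketched as follows. This is only an illustration under stated assumptions: the tertile cut-offs, the "at least two of three dimensions" rule, the function names, and the six borrower records are all hypothetical and are not the study's actual segmentation rules or data.

```python
from statistics import quantiles


def band(value, cutoffs, higher_is_riskier=True):
    """Score one dimension: 0 = low risk, 1 = medium, 2 = high."""
    lower, upper = cutoffs
    level = 0 if value <= lower else (1 if value <= upper else 2)
    # A longer predicted time to default means lower risk, so invert it.
    return level if higher_is_riskier else 2 - level


def segment(pd_mcm, time_aft, hazard_cox, cutoffs):
    """Assign a risk group from three model outputs (hypothetical rule)."""
    levels = [
        band(pd_mcm, cutoffs["pd"]),
        band(time_aft, cutoffs["time"], higher_is_riskier=False),
        band(hazard_cox, cutoffs["hazard"]),
    ]
    # Assumed rule: a group is assigned when at least two of the three
    # dimensions agree on it; anything else falls into the middle group.
    if levels.count(2) >= 2:
        return "High Risk"
    if levels.count(0) >= 2:
        return "Low Risk"
    return "Medium Risk"


# Hypothetical model outputs: (PD from MCM, AFT time, Cox hazard score).
borrowers = [
    (0.10, 1300, 0.7), (0.15, 1200, 0.9), (0.40, 1000, 1.1),
    (0.45, 950, 1.3), (0.70, 750, 2.2), (0.80, 700, 2.6),
]

# Empirical tertile cut-offs for each dimension, taken from the data.
cutoffs = {
    "pd": quantiles([b[0] for b in borrowers], n=3),
    "time": quantiles([b[1] for b in borrowers], n=3),
    "hazard": quantiles([b[2] for b in borrowers], n=3),
}

labels = [segment(*b, cutoffs) for b in borrowers]
print(labels)
```

Because the cut-offs are read from the empirical distributions, no weights have to be chosen; the only judgement is the rule that combines the three per-dimension levels.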
The economic analysis supports these findings. Low-risk borrowers consistently deliver higher expected value, while high-risk borrowers generate larger expected losses. The survival-based segmentation is therefore not merely theoretical; it has direct implications for financial performance.
5.2. Conclusion
This study builds a credit risk modelling framework based on survival analysis, using the Weibull Accelerated Failure Time model, the Cox Proportional Hazards model, and the Mixture Cure Model. The results show that survival analysis gives a richer, more dynamic view of credit risk than traditional static binary default models.
Among the three, the Mixture Cure Model stands out for its flexibility. It accommodates heterogeneity across borrowers and identifies a group with a negligible risk of default. The Accelerated Failure Time model provides insight into when defaults occur. The Cox model, meanwhile, shows how borrower characteristics affect the level of risk. The three models thus offer distinct but complementary insights into credit risk.
To integrate these dimensions, a rule-based risk segmentation framework is developed. It classifies borrowers by probability of default, time to default, and hazard intensity, using the empirical distributions in the data rather than arbitrary weights. This keeps the classification both interpretable and statistically grounded.
Bringing these survival-based models together under one segmentation framework not only makes credit risk easier to understand but also gives microfinance lenders a stronger foundation for decision-making.
5.3. Recommendations
Banks and other financial institutions should adopt survival-based credit risk models, which capture both the likelihood and the timing of default and therefore assess borrower risk more fully than traditional static models. The Mixture Cure Model is particularly useful because it recognizes borrower heterogeneity and identifies borrowers who are unlikely ever to default, which can improve the accuracy of credit scoring.
The Accelerated Failure Time model estimates how long a borrower is likely to remain in good standing before defaulting. Lenders can use these estimates to identify borrowers approaching default and to intervene early. The Cox Proportional Hazards model enables financial institutions to examine the drivers of credit risk, such as specific repayment behaviours or financial habits that raise the hazard of default.
Instead of relying on a single blended score with arbitrary weights, institutions should segment risk using clearly defined statistics: probability of default, time to default, and hazard intensity. Credit decisions then reflect the full risk profile rather than a one-size-fits-all score.
Looking ahead, institutions could refine these models by incorporating time-varying factors such as changing borrower profiles and macroeconomic conditions. Including such covariates would help lenders adapt to changes in the financial environment.
5.4. Limitations
This study has several limitations. First, it used a single dataset from one microfinance institution, so it is unclear whether the findings would generalize to other credit markets or institutions. Second, the Cox Proportional Hazards model captures differences in risk across borrowers but, being semi-parametric, does not directly provide predicted survival times for individual borrowers. Third, the framework combines probability of default, time-to-default, and hazard intensity through rule-based segmentation; without an explicit parametric weighting scheme, it is difficult to compare directly with traditional scorecard-based credit risk models. Finally, the framework did not account for external economic factors or for changes in borrower characteristics over time, so its predictions may deteriorate if broader economic conditions shift.