Banks are scrambling to comply with the IFRS 9 guidelines and are setting out to implement various ECL estimation methodologies and models. However, one topic that has not received enough attention is the governance of these models and the model risk management framework that must be set up to lend credibility to the model estimates. IFRS 9 is the new accounting standard for the recognition and measurement of financial instruments and will replace IAS 39. Several banks plan to perform a parallel run by Q1 2017; however, in many cases model governance finds only a cursory mention in their implementation roadmaps. This blog touches upon the need for model validation and why model risk governance has become paramount in view of the new guidelines.
The need for a robust Model Risk Management Framework
Our earlier blogs touched upon how Basel models can be leveraged to some extent in a bank's IFRS 9 efforts, albeit with significant add-ons and enhancements. In contrast with the Basel II rules, which call for the use of through-the-cycle (TTC) probabilities of default (PDs) and downturn (DT) loss-given-default rates (LGDs) and exposures at default (EADs), IFRS 9 requires entities to use point-in-time (PIT) projections to calculate the lifetime expected credit loss (ECL). By accounting for the current state of the credit cycle, PIT measures closely track the variations in default and loss rates over time. Entities are required to recognize an allowance for either 12-month or lifetime ECLs, depending on whether there has been a significant increase in credit risk since initial recognition (Stage 2 and Stage 3 require lifetime ECL computation). In past publications, Aptivaa has explained the concepts of lifetime expected loss and its components (Demystifying PD Terminologies, Impairment Modelling).
The lifetime expected loss calculations under IFRS 9 will require a new suite of IFRS 9 models, separate from the Basel IRB models. Such a suite of models would require validation under a robust model risk management and governance framework, with associated processes in place around the application of expert judgment. The BCBS, in its consultative document 'Guidance on accounting for expected credit loss', highlights the importance of an independent validation function with clear roles and responsibilities to effectively validate model inputs, design and output.
"Banks should establish an overarching governance framework over the model validation process, including the appropriate organizational structures and control mechanisms, to ensure that the models are able to continue to generate accurate, consistent and predictive estimates." - Guidance on accounting for expected credit loss, BCBS
The current Basel framework lacks a formal model governance structure, embedded in a robust risk management framework, that addresses the basic practical challenges of model risk management. There is a need for sound practices covering model governance (including policies and controls), model development, implementation and use, as well as model validation. A typical model risk management framework covers the components discussed below.
In 2011, the US Federal Reserve led the way by issuing SR 11-7 ('Supervisory Guidance on Model Risk Management', April 4, 2011), several of whose principles are readily adaptable for IFRS 9 model risk governance. While a full treatment of model risk management is well beyond the scope of this blog, the sections below aim to provide some practical perspectives on it.
Principle 5 of the Basel 'Guidance on accounting for expected credit loss' is closely aligned with the SR 11-7 text, and the similarities in the scope of the validation exercise are readily apparent.
Basel Guidance on accounting for expected credit loss: scope of validation | SR 11-7 guidance: scope of validation |
However, SR 11-7 goes a little further and provides more practical guidance on the range of validation activities that need to be covered under a validation framework.
Objective | SR 11-7 Guidance |
Establish model inventory | "Banks should maintain a comprehensive set of information for models implemented for use, under development for implementation, or recently retired. While each line of business may maintain its own inventory, a specific party should also be charged with maintaining a firm-wide inventory of all models, which should assist a bank in evaluating its model risk in the aggregate." |
Establish model materiality and requisite validation requirements | "The rigor and sophistication of validation should be commensurate with the bank's overall use of models, the complexity and materiality of its models, and the size and complexity of the bank's operations." |
Establish governance process for all IFRS 9 models | "Developing and maintaining strong governance, policies, and controls over the model risk management framework is fundamentally important to its effectiveness." |
Identify statistical tests and key areas of emphasis for each model | "An effective validation framework comprises the following core elements ... evaluation of conceptual soundness, including developmental evidence ..." |
The following sections elaborate on how these principles can be interpreted.
Establish a model inventory
SR 11-7 states that the term 'model' refers to a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates. The definition also covers quantitative approaches whose inputs are partially or wholly qualitative or based on expert judgment, provided that the output is quantitative in nature.
This is a useful definition for identifying IFRS 9 models as well. Lifetime expected loss calculations are considerably more complex than those in the extant Basel environment; outputs from non-parametric models will also be used in the computation of ECL, and a substantial increase in the number of models can be envisaged in the IFRS 9 world. It would not be surprising to find many models that are spreadsheet based and used by just a handful of users. The important point is to first identify the models and then apply validation routines proportionate to the scope and materiality of each specific model. Depending on the model's lifecycle stage (e.g., post-development, implementation) and its materiality, the depth of the validation and review can vary. Banks and financial institutions should adopt a framework that is fully transparent, with full auditability of model definitions and the model inventory, to monitor model risk and maintain transparency.
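As an illustration, an inventory entry might capture fields such as those sketched below in Python. The field names and classification values are assumptions for illustration only, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class ModelInventoryRecord:
    """One entry in a firm-wide model inventory (illustrative fields only)."""
    model_id: str                        # unique identifier across the bank
    name: str                            # e.g. "Retail Mortgage Lifetime PD"
    owner: str                           # accountable business line / individual
    purpose: str                         # e.g. "IFRS 9 Stage 2 lifetime ECL"
    model_type: str                      # "statistical", "expert judgment", "hybrid", "vendor"
    materiality: str                     # "High" / "Medium" / "Low"
    lifecycle_status: str                # "under development", "in use", "recently retired"
    last_validation_date: Optional[date] = None
    upstream_dependencies: List[str] = field(default_factory=list)  # feeder models / data sources
```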
Establish model materiality and requisite validation requirements
The nature and scope of validation should depend on the complexity and materiality of the portfolios and the models being used. There should be a distinct and transparent materiality classification framework for models; this could be based on simple asset-size thresholds, RWA-based guidelines or model purpose. The nature and scope of validation will then depend on the materiality of the model in question.
SR 11-7 states that where "models and model output have a material impact on business decisions, including decisions related to risk management and capital and liquidity planning ... a bank's model risk management framework should be more extensive and rigorous."
An illustrative model materiality classification and the corresponding validation requirements are provided below.
Materiality | Nature and Scope of Validation |
High | In-depth validation review covering all components and tests needed to assess the quality and usability of the model. This includes performance and calibration tests, stability tests and the development of benchmark models. It also requires independent validation (possibly by a third party) and comprehensive documentation, including a qualitative validation. |
Medium | Validation of selected components; the full model development exercise need not be replicated. Validation can be based on a few performance statistics and/or a review of model development evidence. |
Low | A checklist-based validation of only a few components of the overall model build can be performed by the model owner, covering some measures related to performance and usage. Sometimes only a model documentation review is performed by a third party. |
Establish governance process for all IFRS 9 models
Shown below is a typical validation process for effective model risk management. The process and its rigor would change based on model materiality. Note, however, that while an annual independent validation is required, there should also be a validation and review before the model goes into implementation. To meet tight deadlines, it is important that the model development teams obtain sign-offs from the validation teams at intermediate stages and that a strong process is in place for such interactions between the development and validation teams. For instance, right at model development initiation, the terms of reference could state at a high level the purpose of the model and the methodology options that will be explored; for example, they could state that for PD term structure computation the binomial method will be explored, and why. Similarly, at the data preparation stage, another sign-off could be obtained from the validation team so that it can flag issues at an early stage, be it with data quality or data completeness. Shown here is a typical validation cycle that can be adopted by financial institutions.
Identify statistical tests and key areas of emphasis for each model
Based on the impact of IFRS 9 on the components of expected credit loss estimation, the table below summarizes the typical validation methods and their relevance across critical review parameters:
IFRS 9 Component | Model Type | Validation Methods | Data Validation Review | Methodology Soundness Study Review | Quantitative Validation | Qualitative Validation | Documentation Review |
(The last five columns indicate the relevance of each review component for the given model type.)
PD | Pure Expert Opinion | Validation of subjective factors through detailed qualitative review based on technical insights and expertise in the relevant portfolio under validation. | Low | Low | Low | High | High |
Statistical Models | Measures of discriminatory power (KS, AUROC, Accuracy Ratio), measures of the predictive accuracy of PDs (chi-square and binomial tests), and stability tests (population stability index) | High | High | High | Medium | High | |
Off-the-Shelf Vended Models | Emphasis should be on reviewing model documentation and model updates in response to changing market conditions. Other tests, such as sensitivity tests, can also be performed. A combination of qualitative review and quantitative validation techniques should be used to validate both statistical and subjective factors. | High | High | Medium | Medium | High | |
Hybrid (Statistical and Expert Judgment) | Medium | Medium | Medium | High | High | ||
LGD | Workout LGD (statistical) | Some common measures for validating the performance of LGD models are: 1) Scatter Plots 2) Confusion Matrix 3) Loss Capture Ratio 4) Expected Loss Shortfall 5) Mean Absolute Deviation 6) Correlation Analysis | High | High | High | High | High |
Workout LGD (Bayesian Expert Judgment Based) | Medium | High | High | High | High | ||
Regulatory LGD | Low | Low | Low | Low | Low | ||
EAD | Pure Classical Models | Some commonly used measures of testing goodness-of-fit: 1) Back-testing 2) Confusion matrices 3) Regulatory Benchmarking | High | Medium | Medium | Low | High |
Augmented Classical Models | High | High | High | High | High | ||
Macroeconomic Forecasting | Econometric Models | Macroeconomic forecasts are validated using measures of forecasting error; common measures are mean absolute deviation (MAD) and mean absolute percentage error (MAPE). Validation also covers the parameters used for deriving PiT estimates of PD from TTC estimates using a credit cycle adjustment method, together with a qualitative review of the subjective overlays used in macroeconomic forecasting (e.g., subjective judgments regarding fiscal and monetary policies in an economy). | High | High | High | High | High |
Z Score | High | High | High | High | High | ||
Multinomial regression to predict migrations | High | High | High | High | High | ||
PIT PD Calibration | Time series modeling for the PIT central tendency estimation, followed by an optimization run for the calibration | Validation of the central tendency (CT) based on shifts in the current and future macroeconomic environment, and re-adjustment of score buckets based on the updated CT. Review of calibration and performance tests post re-calibration to ensure model robustness. The frequency of calibration review will increase owing to the higher volatility inherent in a PiT approach compared with a TTC approach under Basel. | High | High | High | High | High |
Lifetime PD | Binomial | All these methodologies give a PiT PD term structure as output. Validation should focus on assessing the accuracy of the term structure through regular back-testing and on re-adjusting the term structure by minimizing the predictive error against historical data and forecasts of economic data. This can be done through measures of forecasting error and measures of the discriminatory and predictive power of PDs. | Medium | Medium | Medium | Medium | High |
Markov Chain | High | High | High | Medium | High | ||
Mapping to External Ratings | Medium | Medium | Medium | High | High | ||
Loss Rate Approach | Collective Loan Loss Allowance Method | Macroeconomic adjustments are an important aspect of the loss rate approach: a scalar factor can be computed and the losses adjusted based on that factor. Regular monitoring and frequent validation are key to ensuring such adjustments are accurate. Indicators such as 30 DPD, collateral value, changes in the expected performance and behavior of the borrower, and management judgment should be reviewed periodically. Validation of expert judgment is of paramount importance. | High | Medium | Medium | Medium | High |
Roll Rate Method | High | Medium | Medium | Medium | High | ||
Vintage Loss Method | High | Medium | Medium | Medium | High | ||
Cash Flow Approach | Discounted Cash Flow Assessment | Mostly used for Stage 3 customers, with validation performed at an individual/granular level. Back-testing of actual versus estimated cash flows should be the basis of validation under this approach. | Medium | Medium | High | High | High |
Identify the validation and performance statistics for various models
Shown below are some industry practices around the statistics used to gauge model performance.
Performance measures for Probability of Default models
Under a PIT PD approach, PDs are estimated taking all available cyclical and non-cyclical, systematic and obligor-specific information into account. Industry-specific factors and macroeconomic indicators need to be used to increase the forward-looking predictive power of the PDs and make them more PIT. This requires frequent re-rating of obligors to capture changes in their PDs due to all relevant factors, including cyclical ones. Validation in such a setting could be based on an early-warning trigger framework that is itself forward-looking in nature. Over time, the bank needs to monitor whether obligors' risk ratings are being upgraded or downgraded quickly enough to capture their PIT PDs. Such monitoring can be done using risk-rating migration rates, and there are various migration/mobility measures to quantify the degree of such migration. The more PIT the PDs, the higher the rating migration that would be observed as the portfolio moves through the business cycle. A pure PIT approach, however, would be an ambitious undertaking, and a hybrid approach between TTC and PIT PDs is more practical to implement.
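As an illustration of such a mobility measure, the sketch below computes the Shorrocks mobility index from an observed rating transition matrix; the choice of index and the example matrix are illustrative assumptions, not a prescribed metric.

```python
import numpy as np

def shorrocks_mobility_index(transition_matrix):
    """Shorrocks mobility index M = (n - trace(P)) / (n - 1) for an n x n rating
    transition matrix P (rows = rating at the start of the period, columns = rating
    at the end). M = 0 means no migration at all; values closer to 1 indicate high
    rating mobility, which a more point-in-time rating philosophy would tend to show."""
    P = np.asarray(transition_matrix, dtype=float)
    n = P.shape[0]
    return (n - np.trace(P)) / (n - 1)

# Hypothetical 3-grade annual transition matrix:
print(shorrocks_mobility_index([[0.90, 0.08, 0.02],
                                [0.10, 0.80, 0.10],
                                [0.02, 0.08, 0.90]]))   # 0.20
```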
Measures of Discriminatory Power | |
Gini Coefficient or Accuracy Ratio (AR) | The AR is the summary index of the Cumulative Accuracy Profile (CAP) and is also known as the Gini coefficient. It shows the performance of the model being evaluated by depicting the percentage of defaulted accounts captured by the model across different scores. For example, an Accuracy Ratio of 60% means that, out of 10 defaulted accounts, the model captured 6 across the different score ranges. |
Kolmogorov-Smirnov (KS) Statistic | KS is the maximum distance between two population distributions. This statistic helps discriminate default accounts from non-default accounts. It is also used to determine the best cutoff in application scoring. The best cutoff maximizes KS, which becomes the best differentiator between the two populations. The KS value can range between 0 and 1, where 1 implies that the model is perfectly accurate in predicting default accounts or separating the two populations. A higher KS denotes a better model. |
Receiver Operating Characteristic (ROC) Curve / Area Under the ROC Curve (AUC) | The area under the curve measures the ability of the rating model to correctly classify non-defaulted and defaulted accounts. An AUC of 0.5 (50%) means that non-defaulted and defaulted accounts are classified no better than randomly, while an AUC of 1 (100%) means that the scoring model classifies them perfectly. Thus, for a useful model the AUC ranges between 50% and 100%. A computation sketch follows this table. |
Pietra Index | The Pietra Index is a summary index of the Receiver Operating Characteristic (ROC) statistic: it is defined as the maximum area of a triangle that can be inscribed between the ROC curve and the diagonal of the unit square. It can take values between 0 and 0.353; the better a rating model's performance, the closer the value is to 0.353. The index can also be interpreted as the maximum difference between the cumulative frequency distributions of default and non-default accounts. |
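A minimal sketch of how the discriminatory-power measures above (AUROC, Gini/Accuracy Ratio, KS) might be computed with standard Python libraries is shown below; the input format (binary default flags and PD scores) is an assumption for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

def discriminatory_power(default_flag, pd_score):
    """AUROC, Gini/Accuracy Ratio and KS statistic for a PD model.

    default_flag : 1 = defaulted, 0 = non-defaulted
    pd_score     : estimated probability of default (higher = riskier)
    """
    default_flag = np.asarray(default_flag)
    pd_score = np.asarray(pd_score)

    auc = roc_auc_score(default_flag, pd_score)      # 0.5 = random ranking, 1.0 = perfect ranking
    gini = 2.0 * auc - 1.0                           # Gini / Accuracy Ratio as commonly approximated

    # KS: maximum distance between the score distributions of defaulters and non-defaulters
    ks = ks_2samp(pd_score[default_flag == 1], pd_score[default_flag == 0]).statistic
    return {"AUROC": auc, "Gini/AR": gini, "KS": ks}

# Hypothetical example:
# discriminatory_power([0, 0, 1, 0, 1], [0.02, 0.10, 0.65, 0.05, 0.40])
```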
Calibration Power test (PD accuracy test) | |
Binomial Test | The binomial test is a natural choice for validating the PD estimates that banks must provide for each rating category of their internal rating systems. Its construction relies on the assumption that default events within the rating category under consideration are independent. (A computation sketch follows this table.) |
Chi-Square Test | Similar to the binomial test, this test validates the accuracy of the model calibration under the null hypothesis that "the PDs are estimated correctly". |
Hosmer-Lemeshow Test (p-value) | The Hosmer-Lemeshow test is a statistical goodness-of-fit test for classification models. It assesses whether the observed event rates match the expected event rates in pools; models for which expected and observed event rates in pools are similar are well calibrated. The p-value of this test is a measure of the accuracy of the estimated default probabilities: the closer the p-value is to zero, the poorer the calibration of the model. |
Brier Skill Score (BSS) | The underlying Brier score measures the accuracy of probability assessments at the account level as the average squared deviation between predicted probabilities for a set of events and their outcomes, so a lower Brier score represents higher accuracy. The Brier Skill Score expresses this accuracy relative to a reference forecast, with higher values indicating greater skill. |
Traffic Lights Test | The Traffic Lights Test evaluates whether the PD of a pool is underestimated, but unlike the binomial test, it does not assume that cross-pool performance is statistically independent. If the number of default accounts per pool exceeds either the low limit (Traffic Lights Test at 0.95 confidence) or high limit (Traffic Lights Test at 0.99 confidence), the test suggests the model is poorly calibrated. |
Normal Test | The Normal Test compares the normalized difference of predicted and actual default rates per pool with two limits estimated over multiple observation periods. This test measures the pool stability over time. If a majority of the pools lie in the rejection region, to the right of the limits, then the pooling strategy should be revisited. |
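As a minimal sketch of how the binomial and Hosmer-Lemeshow tests above might be implemented, the Python functions below assume grade-level default counts and account-level PD estimates; the decile pooling scheme is an illustrative assumption.

```python
import numpy as np
from scipy.stats import binom, chi2

def binomial_pd_test(n_obligors, n_defaults, pd_estimate):
    """One-sided binomial test for a single rating grade.
    Returns the p-value of observing at least n_defaults defaults if the true PD
    equals pd_estimate and defaults are independent; a small p-value suggests the
    PD is underestimated."""
    return binom.sf(n_defaults - 1, n_obligors, pd_estimate)

def hosmer_lemeshow_test(default_flag, pd_score, n_groups=10):
    """Hosmer-Lemeshow goodness-of-fit test on pools formed by deciles of predicted PD.
    Returns the test statistic and its p-value (chi-square with n_groups - 2 df);
    a p-value close to zero indicates poor calibration."""
    default_flag = np.asarray(default_flag, dtype=float)
    pd_score = np.asarray(pd_score, dtype=float)

    # Assign each account to a pool based on its predicted PD
    edges = np.quantile(pd_score, np.linspace(0, 1, n_groups + 1))
    pools = np.clip(np.searchsorted(edges, pd_score, side="right") - 1, 0, n_groups - 1)

    stat = 0.0
    for g in range(n_groups):
        mask = pools == g
        n_g = mask.sum()
        if n_g == 0:
            continue
        observed = default_flag[mask].sum()
        expected = pd_score[mask].sum()
        p_bar = expected / n_g
        if p_bar <= 0 or p_bar >= 1:
            continue
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))

    p_value = chi2.sf(stat, df=n_groups - 2)
    return stat, p_value
```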
Performance measures for Loss Given Default models
Under the current Basel framework, banks are required to calculate downturn estimates of loss given default. Such downturn estimates help stabilize RWA by making it less susceptible to changes in the underlying credit cycle. Under the IFRS 9 framework, banks are instead required to calculate best-estimate measures based on current risk, which in other words implies calculating point-in-time estimates. Such PIT LGD estimates account for all relevant information, including the current state of the credit cycle as well as specified macroeconomic or credit-factor scenarios in the future. Also, under IFRS 9, historical recovery cash flows are discounted at the effective interest rate, compared with the current practice of using the contractual rate. Under such a scenario, the LGD estimates need to be validated against these revised requirements.
Among all the methods for computing LGD estimates, workout LGD is the most widely used approach for building LGD models, as pointed out in an earlier Aptivaa publication ('Cash Shortfall & LGD: Two Sides of the Same Coin'). Before a detailed validation strategy can be framed, it is important to be consistent in the definition of loss and default (depending on the portfolio and product type). Below are some methodologies on which a validation review can be based.
1. Scatter Plots: Scatter plots can be useful for examining the relationship between expected and observed losses. Such plots can reveal anomalies such as extreme values (indicating validation-base clean-up issues) and show how well the estimated and observed values move together. Greater concentration along the diagonal indicates accuracy, while deviations along the axes can be a cause for concern requiring a review of the LGD model parameters. A scatter plot is an example of a summary plot used as a 'pulse check' to recognize any inherent problem at a glance.
2. Confusion Matrix: Confusion matrices are designed to look at all the combinations of actual and expected classifications within each LGD bucket. They could be based on counts, EAD or observed loss. In practice, a common LGD scale typically ranges from 0% to 100% with no more than ten risk grades. In this blog, all expected and observed LGDs are discretized into six bucket ratings from LR1 to LR6.
Such a table gives an idea of how the observed losses are classified by the model-predicted LGD. Confusion matrices can be summarized using measures such as 'percentage match' and 'mean absolute deviation' to arrive at a single figure against which performance can be evaluated relative to internal/industry benchmarks. The advantage of a measure like mean absolute deviation is that it captures the magnitude of the deviation between the actual and predicted numbers. In mathematical terms, the mean absolute deviation can be written as MAD = (1/N) Σ |LGD_observed,i − LGD_expected,i|, averaged over the N facilities in the validation sample.
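A minimal Python sketch of such a confusion-matrix summary is shown below; the six grade boundaries for LR1 to LR6 are illustrative assumptions, and the matrix here is tabulated by count rather than by EAD or observed loss.

```python
import numpy as np
import pandas as pd

def lgd_confusion_matrix(observed_lgd, expected_lgd):
    """Cross-tabulate observed vs expected LGD grades (LR1-LR6) and summarize the
    match with 'percentage match' and the mean absolute deviation (MAD)."""
    observed_lgd = np.asarray(observed_lgd, dtype=float)
    expected_lgd = np.asarray(expected_lgd, dtype=float)

    bins = [0.0, 0.10, 0.25, 0.45, 0.65, 0.85, 1.0001]   # illustrative grade boundaries
    labels = ["LR1", "LR2", "LR3", "LR4", "LR5", "LR6"]

    obs_grade = pd.cut(observed_lgd, bins=bins, labels=labels, right=False)
    exp_grade = pd.cut(expected_lgd, bins=bins, labels=labels, right=False)

    matrix = pd.crosstab(obs_grade, exp_grade,
                         rownames=["Observed"], colnames=["Expected"], dropna=False)
    pct_match = np.mean(np.asarray(obs_grade) == np.asarray(exp_grade)) * 100
    mad = np.mean(np.abs(observed_lgd - expected_lgd))
    return matrix, pct_match, mad
```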
3. Expected Cash Shortfall: The expected cash shortfall can be defined as the difference between the total losses expected and those observed. The difference is expressed as a percentage of the total observed loss to allow comparison across portfolios.
To understand the expected cash shortfall, consider the sample confusion matrix by observed loss above. In that table, the figure of US$ 7,267,809 is derived by multiplying the observed LGDs by the EADs. If we instead use the expected LGDs, the figure becomes US$ 54,783,324, giving the LGD model a large expected cash shortfall of about -653%, which implies significant over-prediction. The expected cash shortfall method thus gives an idea of the extent of conservatism or underestimation in the LGD model and should be validated against established benchmarks.
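A quick reproduction of that arithmetic, assuming the shortfall is expressed as (observed minus expected) loss over observed loss:

```python
observed_loss = 7_267_809      # sum of EAD x observed LGD (from the confusion-matrix example above)
expected_loss = 54_783_324     # sum of EAD x expected LGD

# Expected cash shortfall as a percentage of total observed loss
shortfall_pct = (observed_loss - expected_loss) / observed_loss * 100
print(f"{shortfall_pct:.1f}%")   # approx. -653.8%, i.e. material over-prediction
```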
4. Loss Capture Ratio: The loss capture ratio measures the rank-ordering capability of LGD models on the basis of how well they capture the portfolio's final observed loss amount. It is derived from the 'loss capture curve', defined as the cumulative observed loss amount captured while traversing from the highest expected LGD to the lowest.
To plot the loss capture curve, transactions are first sorted by the LGD model's raw LGD values (between 0 and 1) from highest to lowest. The cumulative loss-captured percentage is then calculated from left to right (highest expected LGD to lowest) by accumulating the observed loss amount (EAD times observed LGD) over the portfolio's total observed loss. The loss capture ratio is defined as the ratio of the area between the model loss capture curve and the random loss capture curve (the 45-degree line representing a completely random model) to the area between the ideal loss capture curve and the random loss capture curve. Similar to the accuracy ratio, it measures how close the model is to a perfect model that estimates losses with 100% accuracy.
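A minimal Python sketch of this calculation is shown below; it assumes facility-level expected LGDs, observed LGDs and EADs, and takes the ideal curve to be the one obtained by sorting on the actual observed losses.

```python
import numpy as np

def loss_capture_ratio(expected_lgd, observed_lgd, ead):
    """Ratio of the area between the model loss-capture curve and the random (45-degree)
    curve to the area between the ideal curve and the random curve."""
    expected_lgd, observed_lgd, ead = map(lambda a: np.asarray(a, dtype=float),
                                          (expected_lgd, observed_lgd, ead))
    observed_loss = observed_lgd * ead
    total_loss = observed_loss.sum()
    x = np.arange(1, len(observed_loss) + 1) / len(observed_loss)   # fraction of portfolio traversed

    def capture_curve(sort_key):
        order = np.argsort(sort_key)[::-1]                 # traverse from highest to lowest
        return np.cumsum(observed_loss[order]) / total_loss

    model_curve = capture_curve(expected_lgd)              # sorted by expected LGD
    ideal_curve = capture_curve(observed_loss)             # sorted by actual loss (perfect model)
    random_curve = x                                       # 45-degree line

    area_model = np.trapz(model_curve - random_curve, x)
    area_ideal = np.trapz(ideal_curve - random_curve, x)
    return area_model / area_ideal
```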
5. Correlation Analysis: The model validation report for LGD should provide a correlation analysis of the estimated LGD against the actual LGD. This analysis is an important measure of a model's usefulness: correlation-based metrics quantify the degree of statistical relationship between predicted and observed values, as summarized in the table below and sketched after it.
Correlation Metric | Definition | Remarks |
R-squared | It can be defined as one minus the ratio of the sum of squared errors to the variance of the observations. Since the second term can be seen as the fraction of unexplained variance, R-squared can be interpreted as the fraction of explained variance. | Although R-squared usually lies on a scale from zero to one, it can take negative values when model performance is extremely poor. |
Pearson/Spearman/Kendall correlation coefficients | Pearson measures the degree of linear relationship between predictions and observations, while Spearman and Kendall measure the strength of the monotonic (rank-order) relationship. | All three correlation coefficients can take values between minus one (perfect negative correlation) and one (perfect positive correlation), with zero meaning no correlation at all. |
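A minimal sketch of these correlation-based metrics using standard Python libraries (an illustration, not a prescribed toolset):

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr
from sklearn.metrics import r2_score

def lgd_correlation_report(observed_lgd, predicted_lgd):
    """Correlation-based performance metrics for an LGD model."""
    observed_lgd = np.asarray(observed_lgd, dtype=float)
    predicted_lgd = np.asarray(predicted_lgd, dtype=float)
    return {
        # R-squared: 1 minus SSE over the variance of the observations; can go negative
        "R-squared": r2_score(observed_lgd, predicted_lgd),
        "Pearson":   pearsonr(observed_lgd, predicted_lgd)[0],
        "Spearman":  spearmanr(observed_lgd, predicted_lgd)[0],
        "Kendall":   kendalltau(observed_lgd, predicted_lgd)[0],
    }
```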
Performance measures for Exposure at Default models
Similar to LGD, EAD models can be validated using scatter plots and confusion matrices. Most of the back-testing for EAD is done at a product or industry level.
Scatter plots
Scatter plots can be useful for examining the relationship between expected and observed EADs. Such plots can reveal anomalies such as extreme values (indicating validation-base clean-up issues) and show how well the estimated and observed values move together. As mentioned in our earlier EAD blog, there are some peculiarities in EAD modelling, such as the treatment of outliers, which could potentially lead to negative predicted EADs or EADs above the granted limit amounts (i.e., greater than 100%).
Confusion Matrices
Similar to LGDs, confusion matrices can be used for EADs as well, by bucketing CCFs and LEQs into grades and performing a notching analysis on the basis of those grades. Some models link borrower characteristics to EADs using regression methods, in which case standard regression statistics are tested.
Performance measures for Macroeconomic Models
Conventional macroeconomic forecasting methods use estimated parameter values and intercept terms to produce first-cut forecasts of the relevant endogenous factors. These are then adjusted for subjective/exogenous factors based on available evidence and consensus judgment; such exogenous factors reflect market speculation and global uncertainty. The initial forecasts are typically based on time series techniques (ARIMA models, exponential smoothing, etc.), regression analysis or an ensemble approach. Such macroeconomic forecasts can be validated for forecast accuracy using performance measures derived from forecasting errors. Some commonly used measures are:
Measures of forecasting error for macro-economic forecasting
1. MAPE: The mean absolute percent error (MAPE) measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage errors (a short computation sketch follows this list). Because it is expressed in percentage terms, MAPE is easy to interpret; note, however, that it is scale sensitive and can take extreme values when the actual volumes are low.
2. MAD: The mean absolute deviation (MAD) measures the size of the error in the units of the underlying series. It is calculated as the average of the unsigned errors. MAD is a good statistic to use when analyzing the error for a single item; however, when aggregating MADs over multiple items, care is needed to avoid high-volume products dominating the results.
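A short computation sketch of both measures, using a hypothetical GDP-growth series purely for illustration:

```python
import numpy as np

def forecast_error_metrics(actual, forecast):
    """Mean Absolute Deviation (units of the series) and Mean Absolute Percent Error (%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = actual - forecast

    mad = np.mean(np.abs(errors))                      # in the units of the series
    mape = np.mean(np.abs(errors / actual)) * 100      # unstable when actual values are near zero
    return {"MAD": mad, "MAPE (%)": mape}

# Hypothetical quarterly GDP-growth series (%):
print(forecast_error_metrics(actual=[2.1, 1.8, 2.4, 2.0], forecast=[1.9, 2.0, 2.2, 2.3]))
```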
Validation of the macroeconomic factors may also include a review of the correlation between macroeconomic indicators and historical losses. Based on an evaluation of such correlation trends, only those macroeconomic factors that show the closest association with historical losses should be retained.
Conclusion
Model validation will play an increasingly important role under IFRS 9 in identifying model risk stemming from data, methods, assumptions, calibration, documentation, implementation, usage and governance. The estimation of lifetime expected loss is itself the output of many moving parts working together in a complex, volatile, macroeconomically driven environment, and modelling for ECL estimation will lead to a significant increase in the complexity and number of underlying models for capital and expected credit loss estimation. Effective validation must identify and highlight any model misspecification or improper use of model outputs so that timely action can be taken to avoid business impact. Validation under an effective model risk management framework will therefore be of prime importance for implementation in line with the IFRS 9 guidelines.