Assessing the Impact of Modelling on the Expected Credit Loss (ECL) of a Portfolio of Small and Medium-sized Enterprises

This paper studies the impact of internal modelling on the calculation of expected credit loss in the framework of the international standard IFRS 9. Indeed, the probability of default of a counterparty depends on the model used for the conception of the internal rating system. The multitude of probabilistic models renders uncertain and imprecise the calculation of the expected loss for the same SME portfolio of a Moroccan bank, as well as the comparison of losses over time, due to the non-permanence of the rating system used. As a result, the regulator will be unable to guarantee an equitable and transparent system of provisioning of the losses, because of the absence of standardization of the elaboration process of the rating tool. To show this risk associated with the multitude of models, this paper studies the impact of the choice of model on the expected credit loss, by calculating the probability of default for several types of modelling based respectively on pure logistic regression and logistic regression on the principal components.


Introduction
Banking regulation defines several approaches for calculating the capital required to cover credit risk. This enriches the techniques available for determining the risk profile of banks.
These approaches comprise simple approaches, based on the application of a coefficient to the credit exposure, and complex approaches, based on internal rating models (Internal Ratings-Based, IRB).
The expected loss calculation is based on the probability of default of the counterparty, the loss at the time of default and the exposure, in terms of outstanding amount of credit, in the event of default. Indeed, various studies were conducted to define an approach for predicting the probability of default. These studies concern linear discriminant analysis, intelligence techniques, Bayesian networks and probabilistic models, which we summarize as follows:

−
Multidimensional Linear Discriminant Analysis: The prediction of default by linear discriminant analysis was developed by Altman [2], by defining a linear relationship between default and financial ratios. Indeed, Altman defined a score function (Z) that distinguishes between healthy and failing companies. Altman's approach has been adopted by other research to determine ratios that predict failure, such as the studies conducted by Taffler [40], Bardos et al. [10] and Grover and Lauvin [26].

−
Intelligence Techniques: These techniques are based on different logics such as neural networks and genetic algorithms. Several studies have applied these techniques to predict the default of corporates, such as those conducted by Bel et al. [13], Back et al. [8], Liang and Wu [32], Bose and Pal [16] and Oreski et al. [35].

−
Bayesian Network: The Bayesian classifier (Friedman et al. [23]) is based on the calculation of the posterior probability of each observation belonging to a specific class. Indeed, it finds the posterior probability distribution P(C | X), where C is the random variable to be classified into k categories and X = (X1, …, Xn) is the set of explanatory variables. The Bayesian classification of failing companies was studied by a set of researchers such as Gemela [24], Das et al. [20], Dwyer et al. [21], Gössl [25] and Tasche [42].

Probabilistic Models
The probabilistic models are the Logit model, based on the logistic distribution, and the Probit model, based on the Gaussian distribution. Several studies have focused on discriminant logistic analysis to predict the default of companies, such as those conducted by Ohlson [34], Hunter et al. [31] and Hensher et al. [28], while the Probit model has been studied by other researchers such as Zmijewski [44], Grover et al. [26] and Bunn et al. [18].
Due to the multitude of models, the determination of the expected credit loss (ECL) as defined by the Basel Committee on Banking Supervision [12] becomes dependent on the models and techniques chosen for the elaboration of the rating system, which renders uncertain the calculation of the expected credit loss (ECL) by the financial institutions, as a result of the absence of standardization of the techniques used. In this case, the institutions can exploit the opportunities offered by the modelling to minimize the expected credit loss, at the detriment of transparency and stability, through arbitrage between the possible techniques and models. This situation gives rise to a model risk whose impact on the stability of the banking system will have the same importance as credit risk.
In this paper, we will show the uncertainty in the calculation of expected credit loss generated by the multitude of models and the absence of standardization, by determining the expected loss of an SME portfolio of a Moroccan bank. Indeed, we will study the impact of the modelling of the probability of default (PD) on the calculation of the expected credit loss, by using two versions of probabilistic models to predict the probability of default (PD).
The modelling in our study is based on logistic regression: we will use pure logistic regression and logistic regression on the principal components to determine the probability of default per rating class.
The rest of this paper is organized as follows. Section 2 is devoted to the calculation of the expected credit loss. We first give a definition of credit risk. We then define the approach to calculate the unexpected and expected credit loss. We finally present, in this section, the construction approach of the scoring system and the probabilistic models that will be used. The third section is reserved for the empirical study. In this section, we first analyze and describe the database used. We then present and interpret the empirical results of the probability of default for each chosen model. We finally compare the expected credit loss calculated by the chosen models.

Calculation of the Expected Credit Loss (ECL)

Definition
The credit risk of the banking portfolio is defined as the risk that a counterparty will default over a one-year period. The notion of default means that the counterparty is unable to honor its commitments towards the credit institution. The incidents causing the default are multiple, but the most recurrent are:
− The downgrading of outstanding debts into bad debts,
− The chronic overruns on lines of credit granted to clients,
− The accounts that have few transactions and the credit applications that have lapsed and have not been renewed,
− The unfavorable echoes from the market, the sectoral difficulties and a significant fall in the level of activity.
Each credit relationship is associated with an actual or potential credit risk situation, characterized by three quantities:
− The Exposure at Default (EAD): the total value to which a bank is exposed when a credit is at default.
− The Expected Loss (EL): the loss that the bank expects on its exposure over a one-year horizon, defined by:
EL = PD × LGD × EAD (1)
− The Unexpected Loss (UL): the deviation of the total loss from the expected loss (EL). It is calculated as a standard deviation from the mean at a certain confidence level (1 − α). It is also called the credit VaR.
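Formula (1) can be illustrated with a minimal sketch (the function name and all figures are illustrative assumptions, not taken from the studied portfolio):

```python
# Expected loss of an exposure: EL = PD * LGD * EAD (formula (1)).
# Portfolio-level EL is the sum of exposure-level ELs.

def expected_loss(pd_, lgd, ead):
    """Expected credit loss over a one-year horizon."""
    return pd_ * lgd * ead

portfolio = [                 # (PD, LGD, EAD), illustrative figures
    (0.02, 0.45, 1_000_000),
    (0.05, 0.45, 250_000),
]
portfolio_el = sum(expected_loss(p, l, e) for p, l, e in portfolio)
print(round(portfolio_el, 2))  # 14625.0
```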

The Quantification of Credit Risk
For the quantification of credit risk, the Basel Committee provides two categories of approaches for calculating the minimum capital requirements for credit risk.

The Standardized Approach
Under the standardized approach, the bank must calculate its risk-weighted assets. Indeed, a bank's total risk-weighted on-balance sheet assets equal the sum of the risk-weighted amounts of each asset it holds on its balance sheet.
The risk-weighted amount of an on-balance sheet asset is determined by multiplying the current book value of the credit exposure by the risk weight (RW) specified by the regulator. Exposures should be risk-weighted net of specific provisions.
The risk weight (RW) is defined by type of claims (claims on sovereigns and central banks, claims on other official entities, claims on banks and securities firms, claims on corporates, claims included in the regulatory retail portfolios, claims secured by residential property, claims secured by commercial real estate, treatment of past-due loans, higher-risk categories and other assets), and depending on the rating assigned to the counterparty by the external rating agencies.
The minimum capital requirement under the standardized approach (K) to cover a given counterparty credit risk exposure is defined as follows:
K = β × risk-weighted amount (2)
where β is the minimum solvency coefficient.

The Internal Ratings-Based Approach
If the expected loss (EL) is defined by formula (1), the unexpected loss (UL) is calculated by multiplying a coefficient (α) by the exposure at default (EAD). The coefficient (α) is a function of the PD, the LGD and the Maturity (M). As a result:
UL = α(PD, LGD, M) × EAD (3)
Furthermore, the banking regulations provide for two categories of Internal Ratings-Based approaches:
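Formula (2) can be sketched as follows (the risk weights, provisions and the solvency coefficient β are illustrative assumptions, not regulatory prescriptions for any specific portfolio):

```python
# K = beta * risk-weighted amount, with exposures weighted net of
# specific provisions. All figures are illustrative.

BETA = 0.08   # assumed minimum solvency coefficient

def risk_weighted_amount(exposures):
    """exposures: iterable of (book value, specific provision, risk weight RW)."""
    return sum((bv - prov) * rw for bv, prov, rw in exposures)

exposures = [
    (2_000_000, 50_000, 1.00),   # e.g. a claim on an unrated corporate
    (500_000, 0, 0.75),          # e.g. a regulatory retail claim
]
rwa = risk_weighted_amount(exposures)
capital = BETA * rwa
print(rwa, round(capital, 2))  # 2325000.0 186000.0
```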

−
The Internal Ratings-Based foundation approach: Under the foundation approach, as a general rule, banks provide their own estimates of the probability of default and rely on supervisory estimates for the other risk components (LGD, EAD and M).

−
The Internal Ratings-Based advanced approach: Under the advanced approach, banks provide their own estimates of PD, LGD and EAD, and their own calculation of M.

Calculation of the Expected Credit Loss
Under the IRB approach (foundation and advanced), the probability of default (PD) is an important component in calculating the expected loss (EL). Indeed, the aim is to determine the risk profile of the credit counterparty on the basis of qualitative and quantitative data through a discriminant analysis. The internal rating tool must enable credit counterparties to be classified as a function of their probability of default (PD) and must predict customer failure. For this reason, we present the process of conception of this type of classifier (rating tool):

Presentation of the Process of Conception of a Rating Tool
The following schema can summarize the process of conception of a rating tool. In the following, we will present the approach that we have adopted for the conception of the rating models; it is based respectively on logistic regression and logistic regression on principal components.

Database Treatment
The treatment of the database and the choice of explanatory variables are made according to the following schema:
 The quantitative variables: the quantitative variables (Rj), j = 1, …, 16, are divided into 6 classes.
 The qualitative variables: each qualitative variable is scored according to its modalities:
 Variables at three modalities: [0, 50, 100]. Example: the modalities relating to the sector default rate are: 1-below average, 2-equal to average, 3-above average. In this case, the scores given are respectively: 100, 50, 0.
 Variables at five modalities: [0, 25, 50, 75, 100]. Example: the modalities relating to natural risk are: 1-No risk, 2-Low risk and an adequate crisis plan, 3-High risk and an adequate crisis plan, 4-Low risk without a crisis plan, 5-High risk without a crisis plan. In this case, the scores given are respectively: 100, 75, 50, 25, 0.
The assessment of the logical relationship between the modalities of each variable and the default is determined on the basis of expert opinion.
Univariate analysis and the determination of explanatory variables
− Univariate analysis: The objective of univariate analysis is to determine the relationship between a company's default and each of its quantitative and qualitative variables. The default is modeled by a binary variable Y defined as follows:
Yi = 1 if the company (i) is healthy, Yi = 0 if it is in default (4)
The relationship between the variable to be explained Y and the explanatory variables is determined by the logistic regression model. Indeed, the definition (4) shows that the variable Y is a Bernoulli variable of parameter π which takes the values 0 and 1, with P(Y = 1) = π and P(Y = 0) = 1 − π. Therefore, the formula (4) can be written in probability as follows:
P(Y = y) = π^y (1 − π)^(1−y), y ∈ {0, 1}
The logistic regression consists in defining a relationship between Y and a logistic probability π(x) defined from the variables (Rj) and (Qk). Indeed, we need to make a logistic transformation (Logit) to bind the dependent variable Y to each variable among the quantitative variables (Rj), j = 1, …, 16 and the qualitative variables (Qk), k = 1, …, 19.
For each quantitative variable Rj or qualitative variable Qk, the relationship between the default and the variable studied is defined by the probability π0. We will only present the univariate analysis of the quantitative variables Rj, since we proceed in the same way for the analysis of the qualitative variables Qk. Indeed, for each company (i), the value of the quantitative variable Rj is rij, and let π0i be the probability that the company (i) is healthy:
π0i = P(Yi = 1 | Rj = rij) (6)
First, we define the ratio π0i/(1 − π0i), noted the odds of π0i. The logistic regression model then establishes a linear regression relationship between ln(π0i/(1 − π0i)) and rij:
ln(π0i/(1 − π0i)) = β0 + β1 rij
Therefore, the associated probability can be written:
π0i = exp(β0 + β1 rij) / (1 + exp(β0 + β1 rij))
Let Yi be the random variable that models the default of the company (i). The conditional variable (Yi | Rj = rij) can be written in probability as follows:
P(Yi = yi | Rj = rij) = π0i^yi (1 − π0i)^(1−yi)
 The likelihood function: The likelihood function of the sample of enterprises is defined by:
L = ∏i π0i^yi (1 − π0i)^(1−yi)
Let us note β = (β0, β1); the estimate of β is the determination of β̂ which maximizes the likelihood function and consequently maximizes the log-likelihood function (LL).
The maximization of LL will be done by solving the system of equations determined by the following conditions:
∂LL/∂β = 0 and ∂²LL/∂β² < 0 (13)
Three numerical algorithms are used to solve this problem: the Newton-Raphson algorithm, the score method and the Berndt-Hall-Hall-Hausman method.
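As a sketch of the first of these algorithms, the univariate model (6) can be fitted by Newton-Raphson on simulated data (the helper `fit_logit`, the simulated sample and the true coefficients are illustrative assumptions; the inverse Hessian also yields the variance used by the Wald test):

```python
import numpy as np

# Newton-Raphson fit of the univariate logistic model
# ln(pi/(1-pi)) = beta0 + beta1 * x, maximizing the log-likelihood LL.
# The data are simulated for illustration only.

def fit_logit(x, y, n_iter=25):
    X = np.column_stack([np.ones_like(x), x])      # rows (1, x_ij)
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # pi_0i at current beta
        grad = X.T @ (y - p)                       # dLL/dbeta
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        beta += np.linalg.solve(hess, grad)        # Newton-Raphson step
    return beta, np.linalg.inv(hess)               # estimate and covariance

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))).astype(float)

beta_hat, cov = fit_logit(x, y)
wald = beta_hat[1] ** 2 / cov[1, 1]   # Wald statistic for H0: beta1 = 0
print(beta_hat, wald)
```

With 500 observations the estimate should land near the true coefficients (0.5, 1.2), and the Wald statistic should far exceed the 5% chi-square threshold of 3.84.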
 Testing of the significance of the coefficients (Wald test): The testing of the significance of the coefficient β1 is established by the Wald test, obtained by testing the hypothesis H0: β1 = 0 against H1: β1 ≠ 0. The Wald test is based on the following ratio:
W = (β̂1 / σ̂(β̂1))²
where σ̂(β̂1) is the estimated standard error of β̂1. The ratio, under the hypothesis H0, follows a χ² with one degree of freedom. We reject H0 if W > χ²1(1 − α).
 Discriminatory power (power stat): The discriminatory power represents the model's ability to predict future situations. We use the ROC curve to determine the discriminatory power of each variable Rj and Qk. The ROC curve is built from the classification table of the estimation sample, which counts, for a given probability threshold, the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). One denotes by sensitivity (SE) the proportion of healthy companies classified well, SE = TP/(TP + FN), and by specificity (SP) the proportion of companies in default classified well, SP = TN/(TN + FP). If one varies the probability threshold from which a company is regarded as healthy, the sensitivity and the specificity vary. The curve of the points (1 − SP, SE) is the ROC curve.
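The construction above can be sketched directly (the scores and labels are illustrative; `roc_points` and `auc` are hypothetical helper names):

```python
import numpy as np

# ROC curve as described: sweep the probability threshold, compute
# sensitivity SE and specificity SP at each threshold, and trace the
# points (1 - SP, SE). The area is obtained by the trapezoidal rule.

def roc_points(scores, labels):
    pts = [(0.0, 0.0)]
    pos, neg = labels.sum(), (1 - labels).sum()
    for t in np.sort(np.unique(scores))[::-1]:
        pred = (scores >= t).astype(int)
        se = ((pred == 1) & (labels == 1)).sum() / pos   # sensitivity
        sp = ((pred == 0) & (labels == 0)).sum() / neg   # specificity
        pts.append((1.0 - sp, se))
    pts.append((1.0, 1.0))
    return np.array(pts)

def auc(pts):
    x, y = pts[:, 0], pts[:, 1]                 # trapezoidal area
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

labels = np.array([1, 1, 1, 0, 0, 1, 0, 0])     # 1 = healthy
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.3, 0.2])
print(auc(roc_points(scores, labels)))  # 0.875
```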

−
The determination of explanatory variables: To determine the explanatory variables to be retained for modelling, we carry out a univariate logistic regression analysis for each variable in the chosen list. The choice of variables to be retained for the modelling is based on the discriminatory power of each variable.
The discriminatory power is determined by using the area under the ROC curve (AUC) and the accuracy ratio (AR).
 Accuracy ratio (AR): The accuracy ratio is defined by the relationship AR = 2 × AUC − 1. The AR takes values between 0 and 1 for models that perform better than random.

 Determination of explanatory variables
 The decision rules
The variables that verify the following two characteristics will be retained:
− The area under the ROC curve (AUC) is superior to 60% and the accuracy ratio (AR) is superior to 20%;
− The relationship established between the factor and the default rate must be logical. This qualification is based on expert opinion.
 Strongly correlated variables: After selecting the explanatory variables on the basis of the decision rules mentioned above, we study the correlation between the selected variables. The study of correlations makes it possible to eliminate strongly correlated variables. Indeed, if two or more variables have a correlation coefficient superior to 0,5 (ρ ≥ 0,5), then the variable that presents the greatest AUC will be selected. In the following, we present the modelling of the relationship between the quantitative and qualitative variables and the default, through pure logistic regression and logistic regression on principal components, and we then present the tests for assessing the fit of the multivariate model.
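The decision rules and the correlation filter can be sketched as follows (variable names, AUC values and the simulated correlation are illustrative assumptions):

```python
import numpy as np

# Keep a variable if AUC > 0.60 and AR = 2*AUC - 1 > 0.20, then, within
# each strongly correlated pair (|rho| >= 0.5), keep the variable with
# the greater AUC. Names, AUCs and data are illustrative.

aucs = {"R1": 0.72, "R2": 0.65, "R3": 0.58, "R4": 0.70}
kept = [v for v, a in aucs.items() if a > 0.60 and 2 * a - 1 > 0.20]
# R3 is eliminated by the decision rules (AUC <= 0.60)

rng = np.random.default_rng(1)
data = {v: rng.normal(size=200) for v in kept}
data["R4"] = 0.9 * data["R1"] + rng.normal(scale=0.3, size=200)  # force rho >= 0.5

selected = list(kept)
for i, a in enumerate(kept):
    for b in kept[i + 1:]:
        rho = np.corrcoef(data[a], data[b])[0, 1]
        if abs(rho) >= 0.5 and a in selected and b in selected:
            selected.remove(b if aucs[a] >= aucs[b] else a)
print(selected)  # ['R1', 'R2']
```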

−
The pure logistic regression: Let Rj, j = 1, …, p be the quantitative variables and Tk, k = 1, …, 6 the qualitative themes retained by the univariate analysis.
The objective of the multivariate discriminant analysis by pure logistic regression is to study the relationship between the variable to be explained Y and the explanatory variables Rj, j = 1, …, p and Tk, k = 1, …, 6. In this case, we separate the modelling of the quantitative and the qualitative variables. Indeed, the models can be formulated by the following relationships:
 The quantitative model: π1 = P(Y = 1 | R1, …, Rp)
 The qualitative model: π2 = P(Y = 1 | T1, …, T6) (18)
where π1 and π2 are the logistic probabilities.
 Presentation of the model: The objective of the modelling is to express the variable Y by a logistic model. Let (ri1, …, rip, ti1, …, ti6) be the quantitative and qualitative data of the company (i) and yi the realization of Y for this company; the probability πi is then a logistic function of these data, and the Logit transformation has the form ln(πi/(1 − πi)).

Thus, π(Y) can be written as a logistic probability.
 The quantitative modelling of Y by the probability π1: The modelling of Y conditional only on the quantitative data is defined by the model π1 = P(Y = 1 | R1, …, Rp). Let (ri1, …, rip) be the quantitative data of the company (i), which represent the realizations of the variables (R1, …, Rp) for this company, and yi the realization of Y; then:
π1i = P(Yi = 1 | ri1, …, rip)
To determine the expression of π1i by the logistic regression model, we first define the odds, represented by the ratio π1i/(1 − π1i). Hence, the model can be written:
ln(π1i/(1 − π1i)) = β0 + β1 ri1 + ⋯ + βp rip = β′xi
with β′ = (β0, β1, …, βp) and xi′ = (1, ri1, …, rip), so that:
π1i = exp(β′xi) / (1 + exp(β′xi))
The conditional variable (Yi | ri1, …, rip) can then be written in probability as follows:
P(Yi = yi | ri1, …, rip) = π1i^yi (1 − π1i)^(1−yi)
 The likelihood function: The likelihood function of the sample of enterprises is defined by:
L = ∏i π1i^yi (1 − π1i)^(1−yi)
 The log-likelihood function (LL): Let us note β = (β0, β1, …, βp); the estimate of β is the determination of β̂ which maximizes the likelihood function and consequently the log-likelihood function (LL).
The maximization of LL will be done by solving the system of equations determined by the first-order conditions ∂LL/∂β = 0. Again, this problem is solved numerically by the Newton-Raphson algorithm, the score method or the Berndt-Hall-Hall-Hausman method.
 The quantitative score function: The estimation of β allows determining the quantitative score function S1, which is written S1i = β̂′xi. Hence, the probability π1i is estimated by:
π̂1i = exp(S1i) / (1 + exp(S1i))
 The qualitative modelling of Y by the probability π2: The objective of the qualitative modelling is to determine the relationship between the variable Y and the themes Tk. The value taken by a theme is a linear weighting of the qualitative variables that compose it, noted Qkl, 1 ≤ l ≤ p. Indeed, Tk is defined as follows:
Tk = Σl ωkl Qkl
where:
 ωkl = 0 if the variable Qkl is not explanatory;
 for the explanatory variables Qkl, the associated weights ωkl are determined by weight simulations that maximize the discriminatory power of the theme (max AUC) through a logistic regression between the variable Y and the theme Tk.
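The mapping from score to probability can be sketched numerically (the coefficient vector and the company's data are illustrative assumptions, not estimates from the paper's portfolio):

```python
import numpy as np

# Quantitative score S1 = beta_hat' x and its logistic probability.

beta_hat = np.array([0.8, -1.5, 0.6])   # (beta0, beta1, beta2), assumed
x_i = np.array([1.0, 0.4, 1.2])         # (1, r_i1, r_i2) for company (i)

s1 = float(beta_hat @ x_i)              # S_1i = beta_hat' x_i
pi1 = np.exp(s1) / (1.0 + np.exp(s1))   # estimated probability of being healthy
print(s1, round(float(pi1), 3))  # 0.92 0.715
```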
The modelling of Y conditional on the qualitative data only is defined by the model π2 = P(Y = 1 | T1, …, T6). Let (ti1, …, ti6) be the qualitative data of the company (i) and yi the realization of Y for this company; then:
π2i = P(Yi = 1 | ti1, …, ti6)
In this case, the odds of the probability π2i are written π2i/(1 − π2i). Therefore, we can write the model:
ln(π2i/(1 − π2i)) = γ0 + γ1 ti1 + ⋯ + γ6 ti6 = γ′zi
with γ′ = (γ0, γ1, …, γ6) and zi′ = (1, ti1, …, ti6), so that π2i = exp(γ′zi) / (1 + exp(γ′zi)). For the estimation of γ̂ = (γ̂0, …, γ̂6), the same procedure presented for the quantitative model is used.
 The qualitative score function: The estimation of γ allows determining the qualitative score function S2, which is written S2i = γ̂′zi. Hence, the probability π2i is estimated by:
π̂2i = exp(S2i) / (1 + exp(S2i))
The logistic regression on the principal components
The principal components analysis (PCA) is a technique based on the reduction of the explanatory variables, by transforming the correlated variables into uncorrelated variables which explain the maximum amount of variance, called the principal components (PC). This technique was introduced by Karl Pearson in the early 20th century and developed by Harold Hotelling in 1933.
 The principal components analysis (PCA): Let (Xj, 1 ≤ j ≤ p) be the explanatory variables (quantitative and qualitative) and let (xij)n×p be the realizations of these variables. Let X = (xij)n×p be the matrix of these realizations. The column vectors of the matrix X are X.1, X.2, …, X.p, which represent the realizations of each variable Xj.
Let us note Σ = (σjk)p×p the covariance matrix of the variables (Xj, 1 ≤ j ≤ p) and let (λ1, λ2, …, λp) and (v1, v2, …, vp) be, respectively, the eigenvalues and the eigenvectors of the matrix Σ. We designate by C the matrix whose columns Cj, j = 1, …, p are the principal components, defined by C = XV, with V = (vjk)p×p being the matrix which has as columns the eigenvectors (vj) of the matrix Σ.
The decomposition into principal components makes it possible to express the vectors X.j in a reduced number of principal components Cj, j = 1, …, p, which represent a high percentage of the cumulative variation (CV), given by the following formula:
CV(m) = (Σk=1..m λk) / (Σk=1..p λk)
The average variation is defined by λ̄ = (Σk λk)/p.
 The logistic regression on the principal components: The logistic regression on all p principal components is equivalent to the logistic regression on the explanatory variables Xj, j = 1, …, p, since each component is a linear combination of these variables. In order to improve the quality of the model in the case of collinearity, we use a reduced number (m) of principal components for modelling the variable Y. Indeed, the formula (20) becomes a logistic regression of Y on the components (C1, …, Cm).

In our study, the number (m) is defined as the number of components whose cumulative variation is superior to 70%.
For the PCA-based model, the quantitative variables and the qualitative variables are modeled simultaneously to determine the probability of default (PD).
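The component-selection step can be sketched as follows (the data-generating process is an illustrative assumption: two latent factors drive four observed variables, so two components should carry over 70% of the variation):

```python
import numpy as np

# Keep the first m principal components whose cumulative variation
# exceeds 70%; the regression of Y on C_1, ..., C_m then replaces the
# regression on the original, possibly collinear, variables.

def principal_components(X, threshold=0.70):
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigval)[::-1]             # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum_var = np.cumsum(eigval) / eigval.sum()   # cumulative variation CV(m)
    m = int(np.searchsorted(cum_var, threshold)) + 1
    return Xc @ eigvec[:, :m], m

rng = np.random.default_rng(2)
f = rng.normal(size=(300, 2))                    # two latent factors
X = np.column_stack([f[:, 0], f[:, 0] + 0.1 * rng.normal(size=300),
                     f[:, 1], f[:, 1] + 0.1 * rng.normal(size=300)])
C, m = principal_components(X)
print(m, C.shape)  # 2 (300, 2)
```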

−
Assessing the fit of the multivariate model: To assess the fit of the multivariate quantitative and qualitative models, we proceed as follows:
 Testing of the individual significance (Wald test): To assess individual significance, the hypothesis to be tested is H0: βj = 0 against H1: βj ≠ 0. The testing of the significance of the coefficient βj is established by the Wald test, based on the following ratio:
Wj = (β̂j / σ̂(β̂j))²
The ratio, under the hypothesis H0, follows a χ² with one degree of freedom. We reject H0 if Wj > χ²1(1 − α).
 Testing of the overall significance (the likelihood ratio test): The overall significance test is based on the likelihood ratio between the model without the explanatory variables (model 1) and the model with the explanatory variables (model 2). Indeed, the test can be expressed as H0: β1 = ⋯ = βp = 0 against H1: at least one βj ≠ 0. The test statistic is defined by:
ST = 2 (LL2 − LL1)
where LL1 is the maximum log-likelihood of model 1 and LL2 is the maximum log-likelihood of model 2. The statistic ST, under the hypothesis H0, follows a χ² with p degrees of freedom. We reject H0 if ST > χ²p(1 − α), and in that case model 2 is better than model 1.
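The likelihood ratio test can be sketched numerically (the two log-likelihood values and the tabulated chi-square quantile are illustrative assumptions, not the paper's estimates):

```python
# Likelihood ratio statistic ST = 2 * (LL2 - LL1) compared with the
# chi-square quantile with p degrees of freedom.

ll_1 = -310.4   # LL1: model without explanatory variables (illustrative)
ll_2 = -287.0   # LL2: model with p = 7 explanatory variables (illustrative)

st = 2.0 * (ll_2 - ll_1)
chi2_crit = 14.067          # tabulated chi-square quantile, 7 df, 5% level
reject_h0 = st > chi2_crit
print(round(st, 3), reject_h0)  # 46.8 True
```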

 The Hosmer-Lemeshow test
The objective of this test, defined in Hosmer et al. [29], is to assess the concordance between the predicted and the observed values. Indeed, the data are arranged in ascending order of the probabilities calculated by using the model, then divided into 10 groups.
The test hypotheses are H0: the model fits the data, against H1: the model does not fit the data. The test statistic is:
HL = Σg=1..10 (Og − ng π̄g)² / (ng π̄g (1 − π̄g))
where Og = Σi∈g yi is the number of observed outcomes (events) in group g, ng is the number of observations in group g, and π̄g is the average probability predicted by the model in group g.
HL, under the hypothesis H0, follows a χ² with 8 degrees of freedom. We reject H0 if HL > χ²8(1 − α).
 The performance of the model in the classification of the enterprises: The assessment of the performance of the model is necessary to determine the discriminatory power of the model and to compare several models. To measure the performance of the model, we use the area under the ROC curve (AUC) presented previously. A higher AUC means that the discriminatory power of the model is excellent. Hosmer et al. [29] define general rules for the classification of models based on the AUC. Consequently, in our study, a model will be retained if its AUC is superior or equal to 0,7, which means that the model has an acceptable discriminatory power.
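A sketch of the Hosmer-Lemeshow statistic on simulated, well-calibrated predictions (the helper name, the group construction and the data are illustrative assumptions):

```python
import numpy as np

# Hosmer-Lemeshow: sort by predicted probability, split into 10 groups,
# compare observed events O_g with expected n_g * pi_bar_g.

def hosmer_lemeshow(p_hat, y, n_groups=10):
    groups = np.array_split(np.argsort(p_hat), n_groups)
    hl = 0.0
    for g in groups:
        n_g, pi_bar = len(g), p_hat[g].mean()
        o_g = y[g].sum()                         # observed events O_g
        hl += (o_g - n_g * pi_bar) ** 2 / (n_g * pi_bar * (1.0 - pi_bar))
    return hl

rng = np.random.default_rng(3)
p_hat = rng.uniform(0.05, 0.95, size=1000)
y = (rng.random(1000) < p_hat).astype(float)     # well-calibrated by design
hl = hosmer_lemeshow(p_hat, y)
print(hl)  # under H0, compare with the chi2(8) 5% quantile 15.507
```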

−
The pure logistic regression model: Let S1i and S2i be respectively the quantitative and qualitative scores of the company (i), where π1i = exp(S1i)/(1 + exp(S1i)) and π2i = exp(S2i)/(1 + exp(S2i)). The final score of the company (i) is defined by the weighted average of the two scores S1i and S2i, and the probability πi = P(Yi = 1 | ri1, …, rip, ti1, …, ti6) is defined from it. The score function and the probability of the model can be formulated as follows:
Si = ω S1i + (1 − ω) S2i and πi = exp(Si)/(1 + exp(Si))
The weighting ω is determined according to the discriminatory power of the model. Indeed, the weighting which will be retained is the one which maximizes the area under the ROC curve among all the values 10%, 20%, 30%, …, 80% and 90%.
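The grid search over the weighting can be sketched as follows (the simulated scores and the rank-based AUC helper are illustrative assumptions; the rank AUC equals the area under the ROC curve):

```python
import numpy as np

# Retain, among w = 10%, ..., 90%, the weight maximizing the AUC of the
# combined score S = w*S1 + (1-w)*S2. S1 is simulated to be more
# discriminant than S2, so a large weight should win.

def auc_rank(score, y):
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()   # P(S_pos > S_neg)

rng = np.random.default_rng(4)
y = (rng.random(400) < 0.5).astype(int)      # 1 = healthy
s1 = 1.5 * y + rng.normal(size=400)          # quantitative score
s2 = 0.3 * y + rng.normal(size=400)          # qualitative score

weights = [round(0.1 * k, 1) for k in range(1, 10)]
best_w = max(weights, key=lambda w: auc_rank(w * s1 + (1 - w) * s2, y))
print(best_w)
```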

−
The principal components logistic regression model: In this case, we treat the quantitative and qualitative variables simultaneously. As a result, the probability πi = P(Yi = 1 | ri1, …, rip, ti1, …, ti6) of the company (i) will be a logistic probability determined directly as a function of the eigenvectors and the cumulative variation chosen. The formulation of the score function and the probability is determined directly by the multivariate logistic regression between the default and the principal components.
Determination of the Score Grid and Calculation of the Probability of Default by Class
A company (i) is considered healthy if πi ≥ 1/2. Since πi = exp(Si)/(1 + exp(Si)), this condition is equivalent to Si ≥ 0. Consequently, the company is healthy if the score is positive.

The Rating Grid
The classification of healthy companies is based on the score function. Indeed, this classification gives rise to the rating grid composed of 8 rating classes.
Each company (i) is classified into a rating class; the classes range from the best to the worst rating and are defined from the rating score of the company (i), given by:
Ni = π(Si) × 100 (40)
where Si is the score function of the company (i).

Calculation of the Probability of Default per Rating Class
The probability of default of the class (c) is defined by the probability of default of a company (i) knowing that the company (i) belongs to the class c. Consequently:
PDc = (number of companies in default in class c) / (total number of companies in class c) (41)
To define this probability of default per rating class, we distribute the sample of healthy and defaulted companies across the rating classes.
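Formula (41) reduces to an empirical default rate per class; a minimal sketch with illustrative counts (not the paper's portfolio):

```python
# Probability of default per rating class: the default rate among the
# companies assigned to each class. Counts are illustrative.

counts = {               # class: (healthy companies, defaulted companies)
    "A": (300, 2), "B": (250, 5), "C": (180, 9), "D": (90, 9),
}
pd_by_class = {c: d / (h + d) for c, (h, d) in counts.items()}
print(pd_by_class)
# the PD increases as the rating class deteriorates
```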

Determination of Expected Credit Losses
The expected loss is calculated by the formula (1). It depends on three components, which are the PD, the LGD and the EAD. The process for calculating the PD was discussed in the previous paragraphs, while the calculation of the LGD and the EAD will be presented in the following paragraphs.

Determination of the Loss Given Default ( )
The calculation of the loss given default (LGD) consists in constructing regression models which associate the LGD with relevant factors such as the seniority of the debt, the seniority of the relationship, the date of the transfer to the banking litigation, the activity sector of the company, the collection process adopted by the bank, the time required for the completion of litigation procedures in justice, the rate of coverage by the first-ranking guarantees, the economic conditions and the probability of default (PD).
These variables were analyzed and used for the calibration of the LGD model in numerous publications. Indeed, Chalupka and Kopecsni [19] present a case study on the modelling of bank loans, in which the principal factors identified are the period of issuance of the loan, the quality of the guarantees, the amount of the loan and the duration of the relationship with the debtor. The relationship between the default rate and the recovery rate was investigated in Altman et al. ([3], [4], [6], [7]). The study of the dependence between default probabilities and recovery rates by Bade et al. [9] has shown some improvement in the LGD, while Gürtler and Hibbeln [27] have studied the influence of the length of the recovery process on the estimated LGD level. Different regression models have been used by researchers to model the loss given default (LGD), such as the Tobit model found in McDonald et al. [33], the Beta regression in Huang et al. [30], the inflated Beta regression in Pereira et al. [36], the censored gamma regression in Sigrist et al. [38] and [39], and a mixture of distributions in the model used by Altman et al. [5].
In this paper, we will use the estimate provided by the Basel Accord for the loss given default (LGD) under the foundation approach, fixed at 45%. Indeed, we will ignore the guarantees associated with the credit to determine the LGD.

Determination Exposure at Default ( )
The exposure at default is defined as the sum of:
− the value accounted for in the balance sheet (B0);
− the value of the unused funding commitment, accounted for off-balance sheet (H0), multiplied by a credit conversion factor (CCF).
The credit conversion factor (CCF) is defined between 0 and 1 (CCF ∈ [0,1]) and the mathematical formulation of the EAD is given by the following relationship:
EAD = B0 + CCF × H0 (42)
The calculation of the exposure at default (EAD) can be done in two different ways. The first consists in modelling the credit conversion factor (CCF) by using regression models that associate the CCF with pertinent factors, whereas the second consists in modelling directly the exposure at default (EAD). The models used to model the credit conversion factor (CCF), such as ordinary least squares (OLS), the Tobit model, the fractional response regression and the utilization change model, are detailed in Brown [17], Bellotti et al. [14] and Bijak et al. [15], while the models used for the direct modelling of the EAD, such as the zero-adjusted gamma model and others, are detailed in Tong et al. [22].
In this paper, we will use the estimate provided by the Basel Accord for the credit conversion factor (CCF) in paragraph 366: under the foundation approach, the CCF is fixed at 75%. The exposure at default (EAD) of the line of credit is then determined as follows:
EAD = B0 + 0,75 × H0 (43)
where the unused commitment H0 is derived from the amount of the financing authorization granted by the bank to the customer.
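Combining the supervisory parameters, the EAD of a line of credit and the resulting expected loss can be sketched as follows (the reading of the unused commitment as authorization minus drawn amount, and all amounts, are illustrative assumptions):

```python
# EAD = drawn amount + CCF * unused commitment, with the supervisory
# CCF of 75%, then EL = PD * LGD * EAD with the foundation LGD of 45%.
# Amounts and the PD are illustrative.

CCF, LGD = 0.75, 0.45

def ead_line_of_credit(drawn, authorization):
    unused = max(authorization - drawn, 0.0)
    return drawn + CCF * unused

ead = ead_line_of_credit(drawn=600_000, authorization=1_000_000)
el = 0.03 * LGD * ead        # with an illustrative PD of 3%
print(ead, round(el, 2))  # 900000.0 12150.0
```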

Description of the Database
In this study, we used a database of small and medium-sized enterprises (SMEs) of a Moroccan bank composed of 1447 enterprises. For the definition of an SME, we relied on the definition of the Central Bank, which considers that an enterprise is an SME if it realizes a turnover between 10 and 175 million. In terms of default, the portfolio structure is as follows:

Choice of Quantitative Variables
The univariate analysis of the quantitative variables has permitted the determination of the quantitative variables which explain the failure. The choice of quantitative variables is based on the Wald test and the discriminatory power (power stat), determined from the AUC and the AR. Indeed, the Wald test and the discriminatory power have permitted the determination of seven explanatory variables, as follows: for the variables listed in Table 7, the Wald test shows that the selected variables are significant, because the p-value is inferior to 0,05, with a satisfactory discriminatory power, because the AUC is superior to 0,6 and the AR is superior to 0,2.
The correlation between the selected quantitative variables is presented as follows: the correlation matrix shows that the variables are not strongly correlated, because the correlation coefficients do not exceed 0,5.

Choice of Qualitative Variables
Similarly, for the qualitative variables, the choice is based on the Wald test and on the discriminatory power (power stat). The following table summarizes the results. For each theme, we found that only one variable is significant, except for theme 3, where no variable is significant. Consequently, modelling the themes is equivalent to modelling the selected variables.

As with the quantitative variables, the Wald test shows that the selected variables are significant, because the p-value is inferior to 0.05. For the discriminatory power, the variables have a satisfactory discriminatory power, because the AUC is superior to 0.6 and the Gini index is superior to 0.2. The correlation matrix of the chosen qualitative variables is presented as follows: The matrix shows that the variables are not strongly correlated, because the correlation coefficients do not exceed 0.5.

Calculation of the Expected Loss from the Rating Model based on Pure Logistic Regression.
We have previously presented the process of construction of the rating tool. In this section, we determine the rating grid and the probability of default per class using pure logistic regression.

− Parameter estimation and testing of significance
The parameter estimates and the significance tests of the coefficients are presented as follows:
 The parameter estimates and the Wald test
The maximization of the likelihood function is done by the Newton-Raphson algorithm. The results of the parameter estimation and of Wald's significance test are presented in the following table: Wald's test confirms the significance of the coefficients β, because the p-value of all variables is inferior to 0.05.
 The testing of the overall significance (the likelihood ratio test)
The likelihood ratio test shows that the value of the statistic, previously defined, is equal to 46.883. This value is superior to 14.067 (the χ² quantile with 7 degrees of freedom at the 0.05 threshold). Consequently, we reject H0: the variables are jointly significant and determine a better model.
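As a sketch of this estimation step, the snippet below fits a one-variable logistic model by Newton-Raphson and computes the likelihood-ratio statistic against the intercept-only model. The paper's model has seven variables and compares the statistic with 14.067 (χ², 7 degrees of freedom); this single-variable sketch has one degree of freedom, and all names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit_newton(x, y, iters=25):
    """Newton-Raphson maximum likelihood for P(y=1) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information (negative Hessian)
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 += w; h01 += w * xi; h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: H^{-1} g
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    return sum(yi * math.log(sigmoid(b0 + b1 * xi))
               + (1 - yi) * math.log(1 - sigmoid(b0 + b1 * xi))
               for xi, yi in zip(x, y))

def lr_statistic(x, y):
    """LR = -2 (ll_null - ll_full); compare with the chi-squared quantile."""
    b0, b1 = fit_logit_newton(x, y)
    p_bar = sum(y) / len(y)
    b0_null = math.log(p_bar / (1 - p_bar))  # intercept-only MLE
    return -2 * (log_likelihood(x, y, b0_null, 0.0) - log_likelihood(x, y, b0, b1))
```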

− Hosmer-Lemeshow test and the assessment of the performance of the model

 The Hosmer-Lemeshow test
The Hosmer-Lemeshow test shows that the model fits the sample data. Indeed, the statistic is equal to 4.42, with a p-value equal to 0.817, superior to the threshold of 0.05.
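A minimal sketch of the Hosmer-Lemeshow statistic (observed versus expected defaults over groups sorted by predicted PD); the grouping scheme and names are illustrative:

```python
def hosmer_lemeshow(probs, labels, groups=10):
    """HL statistic; compare with chi-squared, groups - 2 degrees of freedom."""
    pairs = sorted(zip(probs, labels))          # sort by predicted PD
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        m = len(chunk)
        expected = sum(p for p, _ in chunk)     # expected defaults in the group
        observed = sum(y for _, y in chunk)     # observed defaults in the group
        denom = expected * (1 - expected / m)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat
```

A small statistic (relative to the χ² quantile) means the predicted PDs match the observed default rates, which is what the test above confirms for the model.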

− Parameter estimation and testing of significance
The parameter estimates and the significance tests of the coefficients are presented as follows:
 The parameter estimates and the Wald test
The maximization of the likelihood function is done by the Newton-Raphson algorithm. The results of the parameter estimation and of Wald's test are presented as follows: Wald's test confirms the significance of the coefficients τ, because the p-value of all variables is inferior to 0.05.
 The testing of the overall significance (the likelihood ratio test)
The likelihood ratio test shows that the value of the statistic, previously defined, is equal to 178.364. This value is superior to 14.067 (the χ² quantile with 7 degrees of freedom at the 0.05 threshold). Consequently, we reject H0: the variables are jointly significant and determine a better model.
− Hosmer-Lemeshow test and the assessment of the performance of the model

 The Hosmer-Lemeshow test
The Hosmer-Lemeshow test shows that the model fits the sample data. Indeed, the statistic is equal to 12.265, with a p-value equal to 0.092, superior to the threshold of 0.05.
 The assessment of the performance of the model
The ROC curve shows that the model offers an acceptable classification of companies, because the AUC is equal to 0.784.
− Determination of the score function
For each company (i), the score of the second model, S2(i), is written as follows: As a result, the default variable can be estimated by the corresponding probability. The combination of the quantitative and the qualitative score to determine the overall score is based on the maximization of the discriminatory power of the model. Therefore, we retain the combination αS1 + (1 − α)S2 that maximizes the AUC. To achieve this, we varied α and calculated the AUC. The results are presented below: The value of α equal to 50% is the one that maximizes the AUC. Therefore, we retained the model that gives the same weight to the quantitative and the qualitative score.
The score grid, the rating classes defined per score and the distribution of the enterprises of the sample per rating class are as follows: The distribution of the portfolio is presented in the following graph: For this first model, the probability of default is determined according to the approach detailed above. Therefore, the PD per rating class is presented as follows: The amount of the expected loss per rating class is determined by formula (1). As a result, for a loss given default (LGD) equal to 45% and a CCF equal to 75%, the total amount of the expected loss (ECL) is distributed per rating class as follows:
The principal components analysis makes it possible to determine the eigenvalues as well as the cumulative variation. The results are summarized below: The vectors 1 to 7 offer a cumulative variation superior to 70%, the vectors 1 to 8 offer a cumulative variation superior to 80% and the vectors 1 to 10 offer a cumulative variation superior to 90%. Therefore, we will use them to calculate the expected loss corresponding to the cumulative variations of 70%, 80% and 90%.
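The two computational steps of this paragraph — searching for the weight α that maximizes the AUC of the combined score, and aggregating the expected loss per rating class as PD × LGD × EAD — can be sketched as follows. All data and names are illustrative, and the AUC is computed with the rank-sum identity:

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_alpha(s1, s2, labels, n_steps=20):
    """Grid search for the alpha maximizing the AUC of alpha*S1 + (1-alpha)*S2."""
    grid = [i / n_steps for i in range(n_steps + 1)]
    return max(grid, key=lambda a: auc([a * q + (1 - a) * r
                                        for q, r in zip(s1, s2)], labels))

LGD, CCF = 0.45, 0.75  # values retained in the paper

def expected_loss(pd_by_class, ead_by_class, lgd=LGD):
    """Formula (1): ECL = sum over rating classes of PD * LGD * EAD."""
    return sum(pd_by_class[c] * lgd * ead_by_class[c] for c in pd_by_class)
```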

Determination of Eigenvectors.
The eigenvectors are determined by the combination of the quantitative and qualitative explanatory variables as follows: The objective of the univariate analysis is to verify whether the correlation between the failure and the principal components is maintained, as in the case of the initial variables, or whether the transformation generates a loss of information. The results of this analysis are as follows: Table 20 shows that the correlation between the default and the eigenvectors is lost for some eigenvectors when transforming the initial variables into the principal components. However, in the modelling we will retain all components, including those whose significance is not justified.  The discriminatory power of the three previous models, determined by the AUC, is presented as follows: The three models have a discriminatory power that varies between acceptable and good. As a result, they can be used to classify companies and determine the probability of default.
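A minimal sketch of the component-selection step, assuming numpy is available: eigendecompose the covariance matrix of the centered variables and keep just enough components to reach the target cumulative variation (70%, 80% or 90% as above):

```python
import numpy as np

def principal_components(X, var_target=0.70):
    """Project X on the leading eigenvectors reaching var_target cumulative variation."""
    Xc = X - X.mean(axis=0)                    # center the variables
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1]             # eigenvalues in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    cum = np.cumsum(vals) / vals.sum()         # cumulative variation
    k = int(np.searchsorted(cum, var_target)) + 1
    return Xc @ vecs[:, :k], cum[:k]
```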

The Score Function per Level of Cumulative Variation
The score function is defined per level of cumulative variation as follows: The probability of default per rating class for the three models based on the principal components is represented as follows: The expected credit loss per model is determined on the basis of the PD and the EAD per rating class.
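The mapping from a score grid to per-class default probabilities can be sketched as follows; `bounds` holds the upper score bound of each rating class and everything here is illustrative:

```python
def pd_per_class(scores, labels, bounds):
    """Empirical PD per rating class: defaults / enterprises in each class.

    bounds: ascending upper score bounds separating the classes; a score
    above every bound falls into the last class.
    """
    n_classes = len(bounds) + 1
    counts = [0] * n_classes
    defaults = [0] * n_classes
    for s, y in zip(scores, labels):
        c = sum(s > b for b in bounds)  # class index from the score grid
        counts[c] += 1
        defaults[c] += y
    return [d / n if n else None for d, n in zip(defaults, counts)]
```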

Comparison of Expected Credit Losses per Model.
The comparison of the models, based on their performances, shows that the models resulting from the principal components analysis perform better in terms of discriminatory power. Indeed: For the expected credit loss, we noticed that it increases in parallel with the increase in cumulative variation, approaching the expected credit loss determined by the pure logistic regression.

Conclusions
Credit risk is the most important aspect of banking risk management. Indeed, banks must continuously monitor the adequacy of their capital with the risks taken, because they must cover the unexpected loss and must constitute a provision to cover expected losses.
The expected losses are deducted from the result, which directly impacts the level of core capital. Therefore, their calculation must be meticulous, to avoid regulatory penalties on the one hand, and not to generate a shortfall in earnings for the bank on the other hand.
The rating system directly impacts the calculation of expected losses, because it provides the different components, namely the probability of default (PD), the loss given default (LGD) and the exposure at default (EAD), per rating class and per counterparty, for the calculation of the loss amount. As a result, the expected loss amounts depend directly on the modelling approach used.
In this article, we have constructed four rating models with an acceptable discriminatory power that allows them to predict counterparty default. Indeed, we constructed the first model using the pure logistic regression approach and the other three models from the principal components, on the basis of the cumulative variation, which must be superior to 70%.
The passage from the initial variables to the principal components to model the failure can generate a loss of information, particularly in terms of correlation between the eigenvectors and the default.
The expected loss amounts calculated by the four models are different. Those determined from the logistic regression on the principal components increase in parallel with the increase in cumulative variation, and they are lower than the expected loss calculated by the pure logistic regression model. This study shows that there is a multitude of powerful failure prediction models, which offer several structures of probability of default per rating class and lead to different expected loss amounts. As a result, the amount of expected losses becomes a random variable that depends on the model.
In our study, the uncertainty for the second model reaches the threshold of 5.73%. It would probably be greater if one opted for other types of models, such as Bayesian modelling, genetic algorithms or fuzzy numbers. In this context, research focusing on the modelling of the probability of default by these three logics can help to determine the degree of uncertainty.
For this reason, the regulator must standardize the techniques used for the construction of the internal models, in order to reduce uncertainty and enable a comparison of the credit risk profiles of banks. Indeed, the regulator must provide a standard approach for calculating the expected loss, similar to that provided for the unexpected loss (risk-weighted assets), and require, as a result, that the expected losses calculated by the internal models not fall below a fixed floor in relation to the standard measurement.