Credit Risk Modelling is the process of estimating the probability that someone will pay back a loan - a super exciting mathematical problem!

There are 6 fundamental ML systems for credit scoring:

Transformers; the new kid on the block, rarely seen in production models due to poor explainability but seen in a lot of pipelines. An emerging architecture for sequential borrower behaviour modelling (e.g. payment sequences or app activity logs)
Neural Networks; used for large, complex, and non-tabular feature spaces (e.g. transaction sequences, behavioural embeddings)
Decision Tree Ensembles; Random Forests, Gradient Boosted Trees (XGBoost, LightGBM, CatBoost), which dominate modern tabular credit datasets
Logistic Regression; the workhorse of credit scoring, interpretable and regulatory friendly
K-Nearest Neighbours; historically educational and explorative, rarely used in production due to scalability and explainability issues.
Bayesian / Probabilistic Models; suited to hierarchical data

Credit Fundamentals

Formally, credit risk modelling is the use of data, past repayment behaviour, demographics, income, device usage, transaction patterns, and more to estimate the probability that a borrower will default. These probabilities become the foundation of lending decisions: who gets approved, at what limit, and at what price.

Lending money isn’t just about goodwill; it’s an investment. When lenders charge interest, they’re being compensated for three things:

The time value of money (money today is worth more than money tomorrow),
The cost of funding, and
The risk of default.

If you lend $10 at a 20% annual rate, you expect $12 in return. But that expectation assumes full repayment. If some borrowers don’t repay, your realised return drops sharply.
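
To make that drop concrete, here is a minimal sketch of the realised portfolio return at a few default rates. It assumes defaulters repay nothing (no partial recoveries) and that everyone else repays principal plus interest in full; the function name, the portfolio size, and the example default rates are illustrative choices, not part of any standard library.

```python
def realised_return(principal, rate, default_rate, n_borrowers):
    """Portfolio return when defaulters repay nothing (simplifying assumption)."""
    lent = principal * n_borrowers
    # Only non-defaulters repay principal plus interest.
    repaid = principal * (1 + rate) * n_borrowers * (1 - default_rate)
    return (repaid - lent) / lent


# Illustrative default rates; $10 loans at a 20% annual rate.
for p in (0.0, 0.05, 0.15):
    r = realised_return(principal=10, rate=0.20, default_rate=p, n_borrowers=10_000)
    print(f"default rate {p:.0%}: realised return {r:.1%}")
```

Under these assumptions, the nominal 20% return shrinks to roughly 2% once 15% of borrowers default.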

Let’s say you lend to 10,000 people and estimate that 15% will default. You’ll lose the $10 you lent to those 1,500 people, a total of $15,000. To break even, the interest from the remaining 8,500 borrowers must cover that loss: roughly an additional $1.76 each, or 17.6 percentage points more in interest.

<aside> 🧮

Let’s formalise that calculation

Number of borrowers, N = 10,000
Loan per borrower, L = $10
Default rate, p = 0.15
So non-default rate (1-p) = 0.85

You lose L × p × N = 10 × 0.15 × 10,000 = $15,000.

That loss must be recovered from the non-defaulters, who are N(1-p) = 8,500 borrowers.

Each of those must contribute an extra charge x to cover the total loss:

x = (L × p × N) / (N × (1 - p)) = (L × p) / (1 - p) = (10 × 0.15) / 0.85 ≈ $1.76

</aside>

So, if you initially charged 20%, you’d need to raise it to around 38% (20% + 17.6%) just to offset the expected losses. That’s unattractive for borrowers and risky for you. This is why interest calibration is such a delicate balancing act.
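
The break-even arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same simplifying assumptions as the worked example (defaulters repay nothing, and the surcharge is simply added on top of the base rate); `break_even_surcharge` is a hypothetical helper name.

```python
def break_even_surcharge(loan, default_rate):
    """Extra charge per repaying borrower that covers expected principal losses.

    Implements x = L * p / (1 - p), assuming defaulters repay nothing.
    """
    if not 0 <= default_rate < 1:
        raise ValueError("default_rate must be in [0, 1)")
    return loan * default_rate / (1 - default_rate)


loan, p, base_rate = 10.0, 0.15, 0.20

x = break_even_surcharge(loan, p)      # dollars per repaying borrower
surcharge_rate = x / loan              # the same amount as a fraction of the loan
required_rate = base_rate + surcharge_rate

print(f"surcharge per borrower: ${x:.2f}")
print(f"extra interest needed:  {surcharge_rate:.1%}")
print(f"required rate:          {required_rate:.1%}")
```

Because the surcharge grows like p / (1 - p), it climbs faster than the default rate itself: doubling p from 15% to 30% takes the surcharge on a $10 loan from about $1.76 to about $4.29.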