Building Machine Learning Products Using Lending Club Case
I. BACKGROUND AND PROBLEM STATEMENT
A. Introduction of Lending Club/Loan Data Dummy Bank
Lending Club is a prominent peer-to-peer lending company that operates an online marketplace, connecting borrowers with investors. The company faces a critical business problem related to accurately assessing creditworthiness and managing lending risk. To address this problem, Lending Club provides a loan dataset containing historical information about loans issued on their platform. This dataset serves as a valuable resource for analyzing loan performance, identifying trends, and building predictive models.
Business Model: Lending Club's business model revolves around facilitating loans between borrowers and investors while earning revenue through fees. The key steps in the business model include:
1. Borrower Application: Individuals apply for loans through Lending Club's online platform, providing personal and financial information, including employment history, income, credit score, and loan purpose.
2. Credit Assessment: Lending Club evaluates the creditworthiness of borrowers using proprietary algorithms and risk models. It assesses borrower information, credit scores, and historical data to determine the likelihood of repayment and assign an appropriate interest rate to approved loans.
3. Investor Participation: Approved loans are made available to investors on the platform. Investors review loan details and decide whether to fund each loan partially or in its entirety, diversifying their investment portfolios.
4. Loan Disbursement: Once a loan is fully funded, Lending Club disburses the funds to the borrower, who is responsible for repaying the principal plus interest at the agreed-upon rate.
5. Loan Servicing: Lending Club manages the repayment process, collecting monthly payments from borrowers and distributing funds to investors. It also provides borrower support and handles delinquencies and defaults.
6. Risk Management: Lending Club continuously monitors loan performance, assesses the risk of default, and employs strategies to mitigate risk, such as collection efforts and external partnerships.
7. Investor Returns: Investors receive principal and interest payments as borrowers repay their loans, based on the interest rates assigned to the loans they funded.
8. Platform Fees: Lending Club charges fees to borrowers, including an origination fee, and charges investors servicing fees based on loan repayments.
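To make the repayment and fee flow in steps 4, 7 and 8 concrete, here is a small worked example. It assumes a standard fully amortizing fixed-rate loan and uses hypothetical fee rates (a 5% origination fee deducted at disbursement and a 1% servicing fee on payments passed to investors). These rates are illustrative assumptions, not Lending Club's published fee schedule.

```python
def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Fixed monthly payment of a fully amortizing loan (standard annuity formula)."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / months
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def loan_cash_flows(principal, annual_rate, months,
                    origination_fee_rate=0.05, servicing_fee_rate=0.01):
    """Split one loan's cash flows between borrower, investors and the platform.

    Both fee rates are hypothetical placeholders used only for illustration.
    """
    payment = monthly_payment(principal, annual_rate, months)
    total_repaid = payment * months
    origination_fee = principal * origination_fee_rate  # assumed deducted at disbursement
    servicing_fee = total_repaid * servicing_fee_rate   # taken from repayments before investors
    return {
        "monthly_payment": round(payment, 2),
        "borrower_receives": round(principal - origination_fee, 2),
        "investor_net_receipts": round(total_repaid - servicing_fee, 2),
        "platform_fees": round(origination_fee + servicing_fee, 2),
    }


flows = loan_cash_flows(10_000, 0.12, 36)
print(flows)
```

For a $10,000, 36-month loan at 12%, the monthly payment comes out at about $332.14; changing the fee rates shows how the platform's revenue and the investors' net return trade off.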
B. Business Problem and Dataset
Lending Club faces a critical business problem related to accurately assessing creditworthiness and managing lending risk. This problem can be broken down into two key challenges:
1. Credit Risk Assessment: Accurately evaluating the creditworthiness of potential borrowers is crucial to minimize the risk of default. By analyzing the loan dataset, Lending Club aims to identify patterns and factors contributing to loan defaults or delinquencies. This allows them to make more precise assessments of credit risk, assign appropriate interest rates, and enhance the overall risk management process.
2. Default Prediction: Predicting the likelihood of loan defaults is essential for proactive risk management. Lending Club intends to leverage the historical loan data provided in the dataset to develop predictive models. These models assess the probability of default based on borrower attributes and loan characteristics, enabling the company to prioritize risk management efforts and take proactive measures to reduce default rates.
The loan dataset borrowed from Lending Club contains comprehensive information about past loans issued on their platform. It includes borrower attributes (e.g., employment length, income, credit score), loan details (e.g., amount, interest rate, purpose), and loan status (e.g., fully paid, charged off, current).
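For default prediction, the loan status column has to become a binary target. A minimal sketch, assuming status strings like those listed above: finished loans are labelled (Charged Off as 1, Fully Paid as 0), and Current loans are excluded because their final outcome is not yet known. Treating "Default" as bad and the field names `loan_status` and `loan_amnt` are assumptions; the exact values in the dataset should be checked.

```python
from typing import Optional

# Hypothetical mapping from loan status to a default label (1 = bad, 0 = good).
# Statuses not listed here (e.g. "Current") have no final outcome yet.
STATUS_TO_LABEL = {
    "Fully Paid": 0,
    "Charged Off": 1,
    "Default": 1,  # assumption: treated as a bad loan
}


def default_label(status: str) -> Optional[int]:
    """Return 1 for a bad loan, 0 for a good loan, or None if the outcome is unknown."""
    return STATUS_TO_LABEL.get(status.strip())


def build_training_rows(loans):
    """Keep only loans with a known outcome and attach the binary target `is_bad`."""
    rows = []
    for loan in loans:
        label = default_label(loan["loan_status"])
        if label is not None:
            rows.append({**loan, "is_bad": label})
    return rows


loans = [
    {"loan_amnt": 10000, "loan_status": "Fully Paid"},
    {"loan_amnt": 5000, "loan_status": "Charged Off"},
    {"loan_amnt": 8000, "loan_status": "Current"},
]
training = build_training_rows(loans)
print([row["is_bad"] for row in training])  # [0, 1]
```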
By analyzing this dataset, Lending Club can gain valuable insights and address the business problem. They can identify relationships between borrower attributes and loan defaults, evaluate the impact of loan amounts or interest rates on repayment behavior, and explore the influence of different borrower characteristics on loan outcomes.
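A simple way to check how, for example, interest rate relates to repayment behaviour is to compare observed default rates across interest-rate bands. The sketch below uses a handful of made-up records and hypothetical band edges purely to show the calculation; it is not a result from the Lending Club data.

```python
from collections import defaultdict


def rate_band(int_rate: float, edges=(10.0, 15.0, 20.0)) -> str:
    """Assign an interest rate (in percent) to a band; the band edges are hypothetical."""
    lower = 0.0
    for upper in edges:
        if int_rate < upper:
            return f"{lower:g}-{upper:g}%"
        lower = upper
    return f">={lower:g}%"


def default_rate_by_band(rows):
    """Share of bad loans (is_bad == 1) within each interest-rate band."""
    counts = defaultdict(lambda: [0, 0])  # band -> [bad loans, total loans]
    for row in rows:
        band = rate_band(row["int_rate"])
        counts[band][0] += row["is_bad"]
        counts[band][1] += 1
    return {band: bad / total for band, (bad, total) in sorted(counts.items())}


# Made-up example records (not real data).
rows = [
    {"int_rate": 7.5, "is_bad": 0},
    {"int_rate": 8.9, "is_bad": 0},
    {"int_rate": 12.0, "is_bad": 0},
    {"int_rate": 13.5, "is_bad": 1},
    {"int_rate": 22.0, "is_bad": 1},
    {"int_rate": 24.0, "is_bad": 0},
]
print(default_rate_by_band(rows))
```

The same grouping works for any borrower attribute (employment length, loan purpose, income band) once it is mapped to a band or category.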
Furthermore, Lending Club can develop risk models and predictive algorithms based on the dataset. These models can assess creditworthiness more effectively, optimize lending practices, and improve decision-making. By utilizing the dataset to build accurate risk models, Lending Club can enhance profitability, minimize default rates, and provide a more reliable lending platform for borrowers and investors.
Understanding the factors that influence loan performance allows Lending Club to optimize their lending practices. By analyzing the dataset, Lending Club can identify correlations between borrower attributes, loan terms, and repayment behavior. This insight enables them to fine-tune their underwriting criteria, improve decision-making processes, and enhance the profitability and sustainability of their lending operations.
Additionally, Lending Club can leverage the dataset to assess the impact of various factors on loan defaults and delinquencies. This analysis helps them identify potential risk indicators and develop strategies to mitigate those risks effectively.
In conclusion, Lending Club faces a critical business problem related to accurately assessing creditworthiness and managing lending risk. By leveraging the loan dataset, Lending Club can address this problem by improving credit risk assessment, predicting loan defaults, optimizing lending practices, and making data-driven lending decisions.
Through comprehensive analysis of the dataset, Lending Club can identify patterns and factors contributing to loan defaults, build predictive models, and refine their underwriting criteria. This leads to better decision-making, reduced default rates, and enhanced profitability for the company. Ultimately, the loan dataset provides invaluable insights that help Lending Club create a more efficient, reliable, and sustainable lending platform for borrowers and investors.
II. REQUIREMENTS AND LIMITATION ANALYSIS
A. Stakeholders:
1. Borrowers: Borrowers play a central role in the Lending Club ecosystem as they seek personal loans to meet their financial needs. Their objectives include obtaining loans at competitive interest rates, accessing quick and convenient loan processing, and receiving clear and transparent terms and conditions. Borrowers aim to secure funds that enable them to consolidate debt, finance home improvements, support small businesses, or fulfill other financial goals.
2. Investors: Investors form an integral part of the Lending Club platform by funding loans as an investment opportunity. Their objectives revolve around earning competitive returns on their investment, diversifying their investment portfolio, and managing risk. Investors seek attractive interest rates on loans while balancing risk exposure by carefully selecting loans based on borrower profiles, creditworthiness, and loan characteristics.
3. Lending Club: As the operator of the platform, Lending Club serves as an intermediary, connecting borrowers and investors. Lending Club has several objectives to fulfill:
· Facilitate secure and efficient lending processes to ensure a positive user experience for both borrowers and investors.
· Assess the creditworthiness of borrowers accurately to minimize default risk and protect the interests of investors.
· Optimize lending practices and decision-making through data analysis and risk modeling to improve loan performance.
· Maximize revenue through borrower fees (such as origination fees) and servicing fees charged to investors.
· Provide a trustworthy and transparent platform that safeguards the interests of all stakeholders involved.
B. Objectives of Stakeholders:
· Borrowers: Obtain affordable loans with transparent terms and conditions, accessing funds to meet their financial needs.
· Investors: Earn competitive returns on their investments, diversify their investment portfolio, and manage risk effectively.
· Lending Club: Facilitate secure and efficient lending processes, minimize default risk, optimize lending practices and decision-making, and maximize revenue generation.
C. Limitations: Data and Inventory
1. Data Limitations:
· Incomplete or missing data points can significantly impact the accuracy of credit risk assessments and predictive models. The absence of essential variables may result in biased decision-making or incomplete risk evaluations.
· Historical loan data may not capture evolving market conditions or economic fluctuations, limiting the ability to predict future loan performance accurately. New trends or unforeseen economic events might affect loan defaults differently.
· Data bias or inconsistencies in the dataset can lead to biased models, resulting in flawed risk assessments. It is essential to address and rectify any data biases to ensure fair and accurate decision-making.
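A practical first step against incomplete data is to measure how much of each column is missing before modelling. This sketch treats `None` and empty strings as missing and flags columns above a hypothetical 30% threshold; the threshold and the column names (`emp_length`, `annual_inc`, `dti`) are illustrative assumptions.

```python
def missing_share(rows, columns):
    """Fraction of rows where each column is missing (None or empty string)."""
    n = len(rows)
    shares = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        shares[col] = missing / n if n else 0.0
    return shares


def columns_to_review(rows, columns, threshold=0.30):
    """Columns whose missing share exceeds the (hypothetical) threshold."""
    return [col for col, share in missing_share(rows, columns).items() if share > threshold]


rows = [
    {"emp_length": "10+ years", "annual_inc": 55000, "dti": 18.2},
    {"emp_length": "", "annual_inc": 42000, "dti": None},
    {"emp_length": None, "annual_inc": 61000, "dti": 9.7},
]
cols = ["emp_length", "annual_inc", "dti"]
print(missing_share(rows, cols))
print(columns_to_review(rows, cols))
```

Flagged columns can then be imputed, dropped, or investigated at the source, depending on how important the variable is to the risk assessment.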
2. Inventory Limitations:
· Insufficient loan inventory can limit the availability of investment opportunities for investors, potentially affecting their ability to diversify their portfolios. A limited pool of loans may restrict investors' choices, leading to concentrated risk exposure.
· Lack of loan diversity in terms of loan purpose or borrower profiles can limit the accuracy and reliability of credit risk assessments and predictive models. A homogeneous loan portfolio may not provide a representative sample for accurate risk evaluation across various loan types or borrower segments.
D. Limitations: Human Resources and Timeframe
1. Human Resource Limitations:
· Insufficient expertise or resources in data analysis and risk modeling can limit the effectiveness of credit risk assessment and default prediction. Adequate training and skill development are crucial to ensure accurate modeling and analysis.
· Inadequate customer service or borrower assistance can impact borrower satisfaction and retention. A lack of prompt and effective customer service may result in dissatisfaction and hinder borrower loyalty and trust.
2. Timeframe Limitations:
· Limited historical data or a short timeframe for analysis can restrict the ability to identify long-term trends accurately or predict loan defaults effectively. A shorter historical period may not capture the economic cycles that affect loan performance.
· Limited time for risk assessment and loan processing may lead to rushed decisions or inadequate evaluations. Adequate time must be allocated to ensure thorough risk assessment and robust loan underwriting processes.
Addressing stakeholder objectives and these limitations is essential for the success and sustainability of Lending Club. By ensuring transparent and efficient lending processes, optimizing lending practices, and leveraging data effectively, Lending Club can meet the needs of borrowers and investors while minimizing risks. Overcoming limitations related to data quality, loan inventory, human resources, and timeframe is crucial to enhancing decision-making, improving credit risk assessment accuracy, and creating a reliable lending platform that benefits all stakeholders involved. By continuously improving these aspects, Lending Club can enhance the trust and confidence of borrowers and investors, strengthening its position as a leading peer-to-peer lending platform.
3. Analysis of Solutions
3.1 Potential Solutions
This section covers two categories of solution: (a) non-ML solutions and (b) ML solutions.
In recent years, credit scoring has been the prominent system for separating good loans from bad ones. Many companies, particularly in finance and banking, use credit scores to decide whether to offer consumer products such as credit cards or loans. The credit score itself varies depending on the scoring system used and the institution that performs the calculation. For example, in the US, credit scores are based on data from the three nationwide consumer reporting agencies (CRAs), Equifax®, TransUnion® and Experian®, and are typically three-digit numbers between 300 and 850. In Indonesia, it is conducted by OJK and Bank Indonesia. Several factors from a borrower's history are typically taken into account when calculating a credit score, including:
● Bill-paying history
● Current unpaid debt
● Number and type of loan accounts you have
● How long you have had your loan accounts open
● Available credit balance
● New applications for credit
● Debt sent to collections, foreclosures, or bankruptcies
To build a non-ML solution to our problem, we can use the credit scoring system above, judging each applicant on the history factors listed in these criteria.
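A minimal sketch of such a non-ML, rule-based scorecard. The factors mirror the list above, but every point value, field name and the approval cut-off are hypothetical assumptions chosen for illustration; a real scorecard would be calibrated by credit analysts.

```python
from dataclasses import dataclass


@dataclass
class CreditHistory:
    """Hypothetical applicant summary mirroring the credit-score factors above."""
    late_payments: int          # bill-paying history: number of late payments
    debt_to_income: float       # current unpaid debt as a share of income
    open_accounts: int          # number of loan accounts
    years_of_history: float     # how long loan accounts have been open
    credit_utilization: float   # used credit / available credit (0 to 1)
    recent_applications: int    # new applications for credit
    derogatory_records: int     # collections, foreclosures or bankruptcies


def rule_based_score(h: CreditHistory) -> int:
    """Start from a base score and add or subtract hypothetical points per rule."""
    score = 600
    score -= 25 * min(h.late_payments, 6)
    if h.debt_to_income > 0.40:
        score -= 40
    if 2 <= h.open_accounts <= 10:
        score += 20
    score += min(int(h.years_of_history) * 5, 50)
    if h.credit_utilization > 0.75:
        score -= 30
    score -= 10 * min(h.recent_applications, 5)
    score -= 100 * h.derogatory_records
    return max(300, min(score, 850))  # clamp to the familiar 300-850 range


def approve(h: CreditHistory, cutoff: int = 620) -> bool:
    """Hypothetical decision rule: approve when the score reaches the cut-off."""
    return rule_based_score(h) >= cutoff


good = CreditHistory(0, 0.20, 4, 8, 0.30, 1, 0)
risky = CreditHistory(3, 0.55, 12, 1.5, 0.90, 4, 1)
print(rule_based_score(good), approve(good))    # 650 True
print(rule_based_score(risky), approve(risky))  # 320 False
```

The rules are easy to read, which is the interpretability advantage discussed below, but every weight has to be chosen and maintained by hand.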
For the ML solution, the problem is framed as classification: predicting whether a loan will be good or bad. There are many different classification algorithms for such predictive modeling problems. There is no good theory on how to map algorithms onto problem types; instead, it is generally recommended that practitioners use controlled experiments to discover which algorithm and algorithm configuration give the best performance for a given classification task.
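The "controlled experiment" idea can be sketched without any ML library: every candidate algorithm is evaluated on exactly the same cross-validation folds with the same metric, so differences in score come from the algorithm rather than from the data split. The two candidates below (a majority-class baseline and a single-feature threshold rule) and the toy data are hypothetical stand-ins for real classifiers, and accuracy is used only to keep the sketch short.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once and split them into k folds shared by all candidates."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(X_train, y_train):
    """Always predict the most common training label."""
    label = max(set(y_train), key=y_train.count)
    return lambda X: [label] * len(X)


def threshold_rule(X_train, y_train):
    """Predict 1 when feature 0 exceeds the midpoint of the two class means."""
    pos = [x[0] for x, y in zip(X_train, y_train) if y == 1]
    neg = [x[0] for x, y in zip(X_train, y_train) if y == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda X: [int(x[0] > cut) for x in X]


def compare_on_same_folds(candidates, X, y, k=5, seed=0):
    """Mean accuracy of each candidate, evaluated on identical folds."""
    folds = k_fold_indices(len(X), k, seed)
    results = {}
    for name, fit in candidates.items():
        scores = []
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in test_set]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = model([X[i] for i in test_idx])
            scores.append(accuracy([y[i] for i in test_idx], preds))
        results[name] = mean(scores)
    return results


# Toy data: one feature that is shifted upwards for bad loans (label 1).
rng = random.Random(1)
y = [int(rng.random() < 0.3) for _ in range(200)]
X = [[2.0 * label + rng.gauss(0, 1)] for label in y]
results = compare_on_same_folds(
    {"majority_baseline": majority_baseline, "threshold_rule": threshold_rule}, X, y
)
print(results)
```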
3.2 Solution Process
3.3 Pros & Cons of the Candidate Solutions
So far, we have two candidate solutions: (1) credit scoring and (2) classification with machine learning. The pros and cons of each solution are explained below.
a. Pros of Credit Scoring
- Easy to Interpret: since credit scoring rules are set and applied manually by people, it is easy to interpret which features are assumed to affect loan performance.
b. Cons of Credit Scoring
- Hard to Identify Trends & Patterns: because the work is done manually, identifying patterns in the data takes a lot of effort. Each applicant's history has to be compared by hand against thousands of other records, which is exhausting.
c. Pros of Classification with Machine Learning
- Automation: Machine learning can cut down time and human workload; automated processing is more reliable, efficient, and quick than manual review.
- Easily Identifies Trends & Patterns: Machine learning can review large volumes of data and discover specific trends and patterns that would not be apparent to humans.
- Handling Multi-Dimensional & Multi-Variety Data: Machine learning algorithms are good at handling multi-dimensional and multi-variety data, and they can do so in dynamic or uncertain environments.
d. Cons of Classification with Machine Learning
- Data Acquisition: Machine learning is fundamentally about learning from useful data, so the outcome will be incorrect if a credible data source is not provided. Data quality also matters: if the user or institution needs more or better-quality data, they have to wait for it, which delays the output. Machine learning therefore depends heavily on the data and its quality.
- Time and Resources: Even though machine learning saves time once the model has been built, building an end-to-end machine learning system from scratch is a complex process that requires a great deal of time. Trial runs are needed to check the accuracy and reliability of the model, and setting up quality infrastructure requires large, expensive resources and high-quality expertise. These trial runs are costly in both time and money.
- Research and Innovation: Machine learning is still an evolving field and requires continuous research and innovation. A model can become obsolete if it is not maintained as the trends and factors that affect it change.
4. How the Solution Works
a. Required Data
The dataset comes from an Irish dummy bank, which borrowed the data from Lending Club, a P2P lending company. It contains 709,903 data points and 28 columns (most of the categorical features are descriptive counterparts of numerically encoded categorical columns).
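Because many categorical columns come paired with a numerically encoded counterpart, it helps to detect such pairs and keep only one column of each, so the model does not see the same information twice. A minimal sketch: two columns are treated as redundant when their values map one-to-one onto each other. The column names in the example (`grade`, `grade_cat`, `term`, `term_cat`) are hypothetical.

```python
from itertools import combinations


def is_one_to_one(rows, a, b):
    """True if every value of column a maps to exactly one value of b and vice versa."""
    a_to_b, b_to_a = {}, {}
    for row in rows:
        va, vb = row[a], row[b]
        if a_to_b.setdefault(va, vb) != vb or b_to_a.setdefault(vb, va) != va:
            return False
    return True


def redundant_pairs(rows, columns):
    """All column pairs that encode the same information."""
    return [(a, b) for a, b in combinations(columns, 2) if is_one_to_one(rows, a, b)]


rows = [
    {"grade": "A", "grade_cat": 1, "term": "36 months", "term_cat": 1},
    {"grade": "B", "grade_cat": 2, "term": "60 months", "term_cat": 2},
    {"grade": "A", "grade_cat": 1, "term": "60 months", "term_cat": 2},
]
cols = ["grade", "grade_cat", "term", "term_cat"]
print(redundant_pairs(rows, cols))
```

On a small sample, unrelated columns can look one-to-one by chance, so detected pairs should be confirmed against the data dictionary before one column of each pair is dropped.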
b. Algorithm Used
The project will use algorithms such as Logistic Regression and Random Forest. Several algorithms will be trained on the dataset, and we will run experiments on them until the best-performing model is found. That model will then be implemented as our model.
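A sketch of how the candidate comparison could look with scikit-learn (an assumption; the source does not name a library). Synthetic, imbalanced data from `make_classification` stands in for the prepared loan features and good/bad target, the hyperparameters are untuned illustrative values, and picking the model with the highest mean ROC-AUC is an assumed selection rule.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: a synthetic binary problem (about 20% positives) standing in
# for the prepared loan features X and the bad-loan target y.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# A fixed random_state makes every candidate see the same stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in candidates.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["roc_auc", "f1"])
    results[name] = {
        "roc_auc": scores["test_roc_auc"].mean(),
        "f1": scores["test_f1"].mean(),
    }

# Assumed rule: select the candidate with the highest mean ROC-AUC.
best = max(results, key=lambda name: results[name]["roc_auc"])
for name, metrics in results.items():
    print(f"{name}: ROC-AUC={metrics['roc_auc']:.3f}, F1={metrics['f1']:.3f}")
print("selected:", best)
```

Further candidates (for example gradient boosting) can be added to the `candidates` dictionary without changing the evaluation loop.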
c. Metrics
The metrics used for this project are ROC-AUC and F1 score.
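For reference, both metrics can be computed from first principles. F1 is the harmonic mean of precision and recall on hard 0/1 predictions, while ROC-AUC is the probability that a randomly chosen bad loan receives a higher predicted risk than a randomly chosen good loan (ties count as one half). A minimal pure-Python sketch; in practice a library implementation such as scikit-learn's `f1_score` and `roc_auc_score` would normally be used, and the pairwise AUC below is quadratic, so it suits only small examples.

```python
def f1(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]
risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]  # predicted probability of default
y_pred = [int(r >= 0.5) for r in risk]        # 0.5 cut-off turns risk into labels
print(f1(y_true, y_pred), roc_auc(y_true, risk))
```

The two metrics answer different questions: ROC-AUC measures how well the model ranks risky loans above safe ones regardless of cut-off, while F1 depends on the chosen cut-off and focuses on the bad-loan class, which matters when defaults are the minority.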
d. Maintenance of the Potential Solution
This project will be maintained by the IT Team and monitored by the Product Team. To prevent the model from becoming obsolete and to keep it up to date with current trends, it will be retrained on the newest data.
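One way to implement retraining on the newest data, while guarding against the concern that historical data may not reflect current conditions, is an out-of-time split: train on loans issued before a cut-off date, validate on the most recent ones, and roll the window forward at each retraining cycle. The sketch only shows the splitting logic; the `issue_d` field name, monthly granularity and window lengths are hypothetical.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def out_of_time_split(rows, cutoff: date, validation_months: int = 6):
    """Train on loans issued before `cutoff`, validate on the following months."""
    end = add_months(cutoff, validation_months)
    train = [r for r in rows if r["issue_d"] < cutoff]
    valid = [r for r in rows if cutoff <= r["issue_d"] < end]
    return train, valid


def retraining_schedule(first_cutoff: date, n_cycles: int, step_months: int = 3):
    """Cut-off dates for successive retraining cycles, rolling forward in time."""
    return [add_months(first_cutoff, i * step_months) for i in range(n_cycles)]


# One hypothetical loan per month of 2015.
rows = [{"id": i, "issue_d": date(2015, m, 1)} for i, m in enumerate(range(1, 13))]
for cutoff in retraining_schedule(date(2015, 4, 1), n_cycles=3):
    train, valid = out_of_time_split(rows, cutoff, validation_months=3)
    print(cutoff, len(train), len(valid))
```

Loans in the validation window must already have a final outcome (fully paid or charged off) for the split to be usable.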
e. Impact of the Potential Solution on the Business
If the solution works, the positive impact on our business is that we can reach or even exceed our revenue and profit targets by lending to as many creditworthy borrowers as possible, while minimizing losses by screening out individuals who are likely to become bad debtors.
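This trade-off, approving as many good borrowers as possible while screening out likely bad debtors, can be made explicit by choosing the approval cut-off on predicted default probability that maximizes profit on historical validation data. A minimal sketch under a deliberately simplified, hypothetical payoff: each approved good loan earns a fixed margin (`gain_good`) and each approved bad loan loses a fixed amount (`loss_bad`); real values would come from interest, fee and recovery data.

```python
def portfolio_profit(y_true, p_default, threshold, gain_good=1.0, loss_bad=4.0):
    """Profit if loans with predicted default probability below `threshold` are approved.

    gain_good and loss_bad are hypothetical payoffs per approved good / bad loan.
    """
    profit = 0.0
    for is_bad, p in zip(y_true, p_default):
        if p < threshold:  # approve this applicant
            profit += -loss_bad if is_bad else gain_good
    return profit


def best_threshold(y_true, p_default, candidates=None, **payoffs):
    """Try candidate cut-offs and return (threshold, profit) with the highest profit."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
    return max(
        ((t, portfolio_profit(y_true, p_default, t, **payoffs)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical validation outcomes (1 = bad loan) and model predictions.
y_true = [0, 0, 0, 1, 0, 1, 0, 1]
p_default = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.15, 0.80]
print(best_threshold(y_true, p_default))
```

A stricter cut-off rejects more bad debtors but also turns away good borrowers; the payoff values determine where the balance lies.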