
Program To Predict Loan Eligibility in Python Language Assignment Solution

July 09, 2024
Dr. Andrew Taylor
🇨🇦 Canada
Python
Dr. Andrew Taylor, a renowned figure in the realm of Computer Science, earned his PhD from McGill University in Montreal, Canada. With 7 years of experience, he has tackled over 500 Python assignments, leveraging his extensive knowledge and skills to deliver outstanding results.
Key Topics
  • Instructions
    • Objective
  • Requirements and Specifications

Instructions

Objective

Write a Python program to predict loan eligibility.

Requirements and Specifications

Additional Project: Bancassurance

Description

Background and Context

The Best insurance company and My Bank have set up a bancassurance partnership (bancassurance is a relationship between a bank and an insurance company). Using the data on My Bank's liability customers, the Best insurance company now wants to convert customers who hold both a life insurance policy and an account at My Bank into loan customers (taking a loan against a life insurance policy).

A campaign that the company ran last year for liability customers showed a healthy conversion rate of over 12.56%. You are provided with data on customers who have an account at My Bank and a life insurance policy with the Best insurance company.
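A conversion rate of roughly 12.56% means the two classes are imbalanced, so plain accuracy can look high even for a model that never predicts a purchase. If the dataset's positive rate matched last year's campaign (an assumption; the actual TARGET rate in My_Bank.csv may differ), always predicting "no loan" would already be right about 1 - 0.1256 = 87.44% of the time. The sketch below computes that majority-class baseline from any label vector; `majority_baseline_accuracy` is a hypothetical helper, not part of the original solution.

```python
import numpy as np


def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class in y.

    Hypothetical helper: a classifier's accuracy on the same labels is only
    informative if it beats this number.
    """
    y = np.asarray(y)
    # np.unique returns each distinct label together with how often it occurs
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()


# Toy labels for illustration only: 1 = bought the loan, 0 = did not
toy_labels = [1] * 13 + [0] * 87
print(majority_baseline_accuracy(toy_labels))  # 0.87

# The same arithmetic for a 12.56% positive rate
print(round(1 - 0.1256, 4))  # 0.8744
```

Comparing the accuracies printed by the source code against this baseline shows how much the models actually add.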

As a data scientist at the Best insurance company, you have to build a model that identifies the customers most likely to respond positively, that is, those with a higher probability of purchasing the loan. This will increase the success ratio and reduce the cost of the campaign.
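Finding the customers with a higher probability of purchasing maps naturally onto ranking them by a classifier's predicted probability rather than by its hard 0/1 prediction. The sketch below assumes a fitted scikit-learn classifier that exposes `predict_proba` (as `LogisticRegression` does) and uses synthetic data from `make_classification` in place of the bank data; the helper `rank_prospects` and the 10% cut-off are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def rank_prospects(model, X, top_fraction=0.1):
    """Return the row indices and probabilities of the most likely buyers.

    Hypothetical helper: assumes `model` is a fitted binary classifier whose
    predict_proba column 1 is the probability of the positive class (label 1).
    """
    proba = model.predict_proba(X)[:, 1]
    # argsort sorts ascending, so reverse it to put the highest probabilities first
    order = np.argsort(proba)[::-1]
    n_top = max(1, int(round(top_fraction * len(order))))
    return order[:n_top], proba[order[:n_top]]


# Synthetic stand-in for the bank data (about 12% positives), not the real dataset
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.88],
                           random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

top_idx, top_proba = rank_prospects(model, X, top_fraction=0.1)
print(len(top_idx))   # 100 customers to contact first
print(top_proba[:5])  # their predicted purchase probabilities
```

Targeting by probability lets the campaign size be chosen from the budget, instead of accepting whatever number of customers the default 0.5 threshold happens to flag.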

Objective

  • To predict whether a liability customer will buy a loan or not.
  • To identify which variables are most significant for making predictions.
  • To identify which segment of customers should be targeted more.
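For the second and third objectives, the chi-squared scores that `SelectKBest` already computes in the Source Code section can be read directly to rank variables, and the observed conversion rate per segment shows where to focus the campaign. This is a sketch on a small made-up DataFrame: the columns `AGE`, `BALANCE` and `OCCUPATION` are hypothetical stand-ins, since the page does not list the columns of My_Bank.csv beyond `CUST_ID`, `ACC_OP_DATE` and `TARGET`, and the two balance bands are an arbitrary segmentation. Chi-squared scores measure dependence on the target rather than effect size, so the ranking is best treated as a screening aid.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Made-up data for illustration only; not taken from My_Bank.csv
df = pd.DataFrame({
    "AGE":        [25, 32, 47, 51, 38, 29, 60, 44, 35, 52],
    "BALANCE":    [1.2, 5.0, 9.8, 12.5, 3.3, 0.8, 15.1, 7.7, 2.1, 11.0],
    "OCCUPATION": [0, 1, 2, 2, 1, 0, 2, 1, 0, 2],  # already label-encoded
    "TARGET":     [0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
})
X = df.drop(columns=["TARGET"])
y = df["TARGET"]

# chi2 requires non-negative features; k="all" keeps a score for every column
selector = SelectKBest(chi2, k="all").fit(X, y)
ranking = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(ranking)

# Segment view: observed conversion rate per balance band (median split)
df["BALANCE_BAND"] = pd.qcut(df["BALANCE"], q=2, labels=["low", "high"])
print(df.groupby("BALANCE_BAND", observed=True)["TARGET"].mean())
```

On this toy data the high-balance band converts at 0.8 and the low-balance band at 0.0; with the real data the same two steps would point to the variables and segments worth targeting.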

Source Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest, chi2

### Read data
df = pd.read_csv('My_Bank.csv')
print(df.head(10))
print(f"This dataset has {len(df)} rows")

### Show the number of NaN values in each column
print(df.isnull().sum())

### Remove non-useful columns (the customer ID carries no predictive information)
df = df.drop(columns=['CUST_ID'])

### Convert ACC_OP_DATE to a number
# YYYYMMDD keeps chronological order when compared as integers
# (MMDDYYYY would place 01012020 before 12312019)
df['ACC_OP_DATE'] = pd.to_datetime(df['ACC_OP_DATE']).dt.strftime("%Y%m%d").astype(int)
print(df.head(5))

### Encode object (string) columns as integer codes
object_columns = df.select_dtypes(include=['object']).columns
for col in object_columns:
    values = df[col].unique()
    values_dict = {value: code for code, value in enumerate(values)}
    df[col] = df[col].map(values_dict)

### Min-max normalize every column to [0, 1] (chi2 below needs non-negative features)
df_norm = (df - df.min()) / (df.max() - df.min())
print(df_norm.head())

### Separate the target column from the features
Y = df_norm['TARGET']
X = df_norm.drop(columns=['TARGET'])
print(X.head())
print(f"There are {len(X.columns)} variables and {len(X)} records")

### Display a correlation map to see the relationships between variables
f = plt.figure(figsize=(10, 10))
plt.matshow(df_norm.corr(), fignum=f.number)
plt.colorbar()
plt.xticks(range(len(df_norm.columns)), df_norm.columns, rotation=90)
plt.yticks(range(len(df_norm.columns)), df_norm.columns)
plt.show()

### Split data into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

### Build a LogisticRegression model and score it on the test set
model = LogisticRegression()
model.fit(X_train, Y_train)
print(model.score(X_test, Y_test))

### Plot the model's accuracy vs. the number K of best features
k_values = range(1, len(X.columns) + 1)  # include k = all features
scores = []
for k in k_values:
    X_new = SelectKBest(chi2, k=k).fit_transform(X, Y)
    X_train2, X_test2, Y_train2, Y_test2 = train_test_split(X_new, Y, test_size=0.3, random_state=42)
    model = LogisticRegression()
    model.fit(X_train2, Y_train2)
    scores.append(model.score(X_test2, Y_test2))

plt.plot(k_values, scores)
plt.grid(True)
plt.xlabel('Number of Features')
plt.ylabel("Model's Accuracy")
plt.show()

### Pick the optimal number of features
kopt = k_values[np.argmax(scores)]
print(f"The optimal number of features is {kopt}, giving a model accuracy of {max(scores) * 100.0:.2f}%")

### Build a new model using only the best features
Xopt_lr = SelectKBest(chi2, k=kopt).fit_transform(X, Y)
# Split the reduced feature matrix, not the full X
X_train2, X_test2, Y_train2, Y_test2 = train_test_split(Xopt_lr, Y, test_size=0.3, random_state=42)
model2 = LogisticRegression()
model2.fit(X_train2, Y_train2)
print(model2.score(X_test2, Y_test2))

### Decision tree on all features
treeClf = tree.DecisionTreeClassifier()
treeClf.fit(X_train, Y_train)
print(treeClf.score(X_test, Y_test))

### Select the K best features and run the decision tree again
scoresTree = []
for k in k_values:
    X_new = SelectKBest(chi2, k=k).fit_transform(X, Y)
    X_train3, X_test3, Y_train3, Y_test3 = train_test_split(X_new, Y, test_size=0.3, random_state=42)
    treeClf = tree.DecisionTreeClassifier()
    treeClf.fit(X_train3, Y_train3)
    scoresTree.append(treeClf.score(X_test3, Y_test3))

plt.plot(k_values, scoresTree)  # decision-tree scores
plt.grid(True)
plt.xlabel('Number of Features')
plt.ylabel("Model's Accuracy")
plt.show()

koptTree = k_values[np.argmax(scoresTree)]
print(f"The optimal number of features for Decision Tree is {koptTree}, giving a model accuracy of {max(scoresTree) * 100.0:.2f}%")
Xopt_tree = SelectKBest(chi2, k=koptTree).fit_transform(X, Y)

### Plot the scores of both LogisticRegression and DecisionTree vs. number of features
plt.plot(k_values, scores, label='LogisticRegression')
plt.plot(k_values, scoresTree, label='DecisionTree')
plt.legend()
plt.grid(True)
plt.xlabel('Number of Features')
plt.ylabel("Model's Accuracy")
plt.show()
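The feature-selection loops in the source code fit `SelectKBest` (and the min-max normalization) on the full dataset and then pick `k` by accuracy on the same test split that reports the final score, so that score is likely optimistic. A common alternative, sketched below, is to chain scaling, feature selection and the model in a scikit-learn `Pipeline` and choose `k` by cross-validation on the training data only, keeping the test split for a single final evaluation. The synthetic data, the 5 folds and the stratified split are assumptions for illustration; with the real data, `X` and `Y` from the source code (before normalization) would take their place.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the bank data (about 12% positives)
X, Y = make_classification(n_samples=1000, n_features=10, weights=[0.88],
                           random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42, stratify=Y)

pipe = Pipeline([
    ("scale", MinMaxScaler()),     # fitted on training folds only; chi2 needs >= 0
    ("select", SelectKBest(chi2)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Try every k from 1 to the number of features with 5-fold CV on the training set
search = GridSearchCV(pipe, {"select__k": list(range(1, X.shape[1] + 1))}, cv=5)
search.fit(X_train, Y_train)

print("best k:", search.best_params_["select__k"])
print("test accuracy:", search.score(X_test, Y_test))
```

The same pipeline works with `tree.DecisionTreeClassifier()` in the `"model"` step, which makes the two models' curves directly comparable without either one seeing the test data during selection.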
