Instructions
Objective
Write a Python program to predict loan eligibility.
Requirements and Specifications
Additional Project: Bancassurance
Description
Background and Context
Best insurance company and My Bank have set up a bancassurance partnership (bancassurance is a relationship between a bank and an insurance company). Using the data of My Bank's liability customers, the Best insurance company now wants to convert customers who hold both a life insurance policy with the company and an account at My Bank into loan customers (taking a loan against a life insurance policy).
A campaign that the company ran last year for liability customers showed a healthy conversion rate of over 12.56%. You are provided with data on customers who have an account at My Bank and a life insurance policy with the Best insurance company.
As a data scientist at the Best insurance company, you have to build a model that identifies the customers most likely to respond positively, i.e., those with a higher probability of purchasing the loan. This will increase the success ratio and reduce the cost of the campaign.
Objective
- To predict whether a liability customer will buy a loan or not.
- To identify which variables are most significant for making predictions.
- To determine which segment of customers should be targeted more.
Source Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing, tree
import seaborn as sns
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
### Read Data
df = pd.read_csv('My_Bank.csv')
df.head(10)
print(f"This dataset has {len(df)} rows")
### Show the number of NaN values in each column
df.isnull().sum()
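### Optional: handle missing values (a minimal sketch, assuming some columns might contain NaNs; with a clean file these lines change nothing)
df = df.fillna(df.median(numeric_only=True))  # fill numeric NaNs with each column's median
df = df.dropna()                              # drop rows still incomplete in non-numeric columns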
### Remove non-useful columns
df = df.drop(columns = ['CUST_ID'])
### Convert ACC_OP_DATE to Numeric
df['ACC_OP_DATE'] = pd.to_datetime(df['ACC_OP_DATE']).dt.strftime("%m%d%Y").astype(int)
df.head(5)
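### Note: the MMDDYYYY integer above does not order dates chronologically. A hedged alternative (shown commented out because it would replace the conversion above) is to encode the raw account-opening date as days since the earliest date in the column
# acc_dates = pd.to_datetime(df['ACC_OP_DATE'])             # applied to the raw CSV column
# df['ACC_OP_DATE'] = (acc_dates - acc_dates.min()).dt.days  # days since the earliest opening date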
### Categorize object columns
object_columns = df.select_dtypes(include=['object']).columns
for col in object_columns:
    values = df[col].unique()
    values_dict = {x[0]: x[1] for x in zip(values, range(len(values)))}
    df[col] = df[col].map(values_dict)
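### The loop above assigns each category an integer code in order of appearance; pandas' factorize gives the same result in one call (equivalent sketch, commented out)
# for col in object_columns:
#     df[col] = pd.factorize(df[col])[0]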
### Normalize data
df_norm = (df-df.min())/(df.max()-df.min())
df_norm.head()
### Extract target column
Y = df_norm['TARGET']
X = df_norm.drop(columns=['TARGET'])
X.head()
print(f"There are {len(X.columns)} variables and {len(X)} records")
### Display correlation map to see the relation between variables
f = plt.figure(figsize = (10,10))
plt.matshow(df_norm.corr(), fignum = f.number)
plt.colorbar()
plt.xticks(range(len(df_norm.columns)), df_norm.columns, rotation=90);
plt.yticks(range(len(df_norm.columns)), df_norm.columns);
plt.show()
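### seaborn is imported above but not used elsewhere; an equivalent (optional) way to draw the same correlation matrix as a heatmap
plt.figure(figsize=(10, 10))
sns.heatmap(df_norm.corr(), cmap='coolwarm')
plt.show()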
### Split data into train and test
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42)
### Build LogisticRegression Model
model = LogisticRegression()
model.fit(X_train, Y_train)
### Score
model.score(X_test, Y_test)
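### Because only about 12.56% of customers converted in last year's campaign, the classes are imbalanced and accuracy alone can be misleading. A short sketch of a fuller evaluation (assumes TARGET is a 0/1 label)
from sklearn.metrics import classification_report, confusion_matrix
Y_pred = model.predict(X_test)
print(confusion_matrix(Y_test, Y_pred))
print(classification_report(Y_test, Y_pred))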
### Create a plot of model's accuracy vs. K best features
scores = []
for k in range(1, len(X.columns)):
    X_new = SelectKBest(chi2, k=k).fit_transform(X, Y)
    X_train2, X_test2, Y_train2, Y_test2 = train_test_split(X_new, Y, test_size=0.3, random_state=42)
    model = LogisticRegression()
    model.fit(X_train2, Y_train2)
    score = model.score(X_test2, Y_test2)
    scores.append(score)
plt.plot(range(1, len(X.columns)), scores)
plt.grid(True)
plt.xlabel('Number of Features')
plt.ylabel("Model's Accuracy")
### Pick optimal number of features
kopt = range(1, len(X.columns))[np.argmax(scores)]
print(f"The optimal number of features is {kopt}, giving a model accuracy of {max(scores)*100.0}%")
Xopt_lr = SelectKBest(chi2, k = kopt).fit_transform(X, Y)
# Build a new model but only with best features
### Select best features
X_new = SelectKBest(chi2, k=kopt).fit_transform(X, Y)
### Split into Train and Test with new X values
X_train2, X_test2, Y_train2, Y_test2 = train_test_split(X_new, Y, test_size=0.3, random_state=42)
### Build Model
model2 = LogisticRegression()
model2.fit(X_train2, Y_train2)
model2.score(X_test2, Y_test2)
# Decision Tree
treeClf = tree.DecisionTreeClassifier()
treeClf.fit(X_train, Y_train)
treeClf.score(X_test, Y_test)
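### Which variables are most significant? A sketch that ranks features by the fitted tree's impurity-based importances (this ranking is specific to the decision tree above)
importances = pd.Series(treeClf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))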
### Select K best features and run again the decision tree
scoresTree = []
for k in range(1, len(X.columns)):
    X_new = SelectKBest(chi2, k=k).fit_transform(X, Y)
    X_train3, X_test3, Y_train3, Y_test3 = train_test_split(X_new, Y, test_size=0.3, random_state=42)
    treeClf = tree.DecisionTreeClassifier()
    treeClf.fit(X_train3, Y_train3)
    score = treeClf.score(X_test3, Y_test3)
    scoresTree.append(score)
plt.plot(range(1, len(X.columns)), scoresTree)
plt.grid(True)
plt.xlabel('Number of Features')
plt.ylabel("Model's Accuracy")
koptTree = range(1, len(X.columns))[np.argmax(scoresTree)]
print(f"The optimal number of features for Decision Tree is {koptTree}, giving a model accuracy of {max(scoresTree)*100.0}%")
Xopt_tree = SelectKBest(chi2, k = koptTree).fit_transform(X, Y)
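### fit_transform returns only the reduced array, so the selected column names are lost. A small sketch to list which features the chi2 test actually kept for the tree model
selector = SelectKBest(chi2, k=koptTree).fit(X, Y)
print(list(X.columns[selector.get_support()]))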
### Plot Scores of both LogisticRegression and DecisionTree vs. Number of features
plt.plot(range(1, len(X.columns)), scores, label = 'LogisticRegression')
plt.plot(range(1, len(X.columns)), scoresTree, label = 'DecisionTree')
plt.legend()
plt.grid(True)
plt.xlabel('Number of Features')
plt.ylabel("Model's Accuracy")
plt.show()
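### Which segment should be targeted? One hedged sketch: refit the logistic model on all features, score every customer with predict_proba, and look at the customers with the highest predicted probability of buying the loan (in practice this scoring would be applied to new customers, not the training data)
full_model = LogisticRegression(max_iter=1000).fit(X, Y)
scored = X.copy()
scored['purchase_prob'] = full_model.predict_proba(X)[:, 1]
print(scored.sort_values('purchase_prob', ascending=False).head(10))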