Customer Churn Prediction

Customer churn is the number of customers who stop using a company's service. A company needs to track it because churn directly affects business growth. It is often said that winning back a churned customer is harder than acquiring a new one, so it pays to know the causes of churn in advance. Customers typically churn for one of two reasons: 1) they are unhappy with the company's services (and switch to another company), or 2) their own circumstances change (e.g. relocation, death).

In this project, a telecommunications dataset is used to find the reasons for customer churn in the company. The work addresses the following questions and tasks:

  1. What are the reasons for customer churn?
  2. Which customers are most likely to churn?
  3. Suggest a customer retention strategy.
  4. Build and compare different machine learning algorithms and find the best one.
In [12]:
# import necessary libraries
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import graphviz
import pydotplus
import plotly.figure_factory as ff #for table visualization
from plotly.offline import init_notebook_mode, iplot
from sklearn import tree
from scipy.stats import zscore
from imblearn.over_sampling import SMOTE # for handling imbalanced data
import joblib # sklearn.externals.joblib is no longer available in recent scikit-learn releases
from yellowbrick.classifier import ConfusionMatrix
from sklearn.tree import DecisionTreeClassifier
from xgboost.sklearn import XGBClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedShuffleSplit
from sklearn.metrics import accuracy_score,confusion_matrix, classification_report, roc_auc_score, roc_curve, recall_score, precision_score, f1_score
In [13]:
'''=== function to load dataset ===='''
def load_daset(dataset):
    df =pd.read_csv(dataset)
    return df
In [14]:
# call load_daset()
df = load_daset('Telco-Customer-Churn.csv')
df.head()
Out[14]:
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 7795-CFOCW Male 0 No No 45 No No phone service DSL Yes ... Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 9237-HQITU Female 0 No No 2 Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes

5 rows × 21 columns

In [15]:
# drop the customerID column and show the last 5 rows
df_copy = df.copy()  # copy the dataframe
data = df_copy.drop('customerID', axis=1)
data.tail()
    
Out[15]:
gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
7038 Male 0 Yes Yes 24 Yes Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No
7039 Female 0 Yes Yes 72 Yes Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No
7040 Female 0 Yes Yes 11 No No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No
7041 Male 1 Yes No 4 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes
7042 Male 0 No No 66 Yes No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No
In [16]:
print ("Rows     : " ,data.shape[0])
print ("Columns  : " ,data.shape[1])
print ("\nFeatures : \n" ,data.columns.tolist())
Rows     :  7043
Columns  :  20

Features : 
 ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']

Note:

This dataset has 7043 rows (observations) and 20 columns: 19 features plus the last column, Churn, which is the target variable.

In [17]:
'''===  Show descriptive statistics for numerical and categorical === '''

#numerical only
print('Descriptive statistics for numerical\n')
print(data.describe())


#categorical only
data.describe(include='O') 
Descriptive statistics for numerical

       SeniorCitizen       tenure  MonthlyCharges
count    7043.000000  7043.000000     7043.000000
mean        0.162147    32.371149       64.761692
std         0.368612    24.559481       30.090047
min         0.000000     0.000000       18.250000
25%         0.000000     9.000000       35.500000
50%         0.000000    29.000000       70.350000
75%         0.000000    55.000000       89.850000
max         1.000000    72.000000      118.750000
Out[17]:
gender Partner Dependents PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod TotalCharges Churn
count 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043 7043
unique 2 2 2 2 3 3 3 3 3 3 3 3 3 2 4 6531 2
top Male No No Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 20.2 No
freq 3555 3641 4933 6361 3390 3096 3498 3088 3095 3473 2810 2785 3875 4171 2365 11 5174
In [18]:
# check datatype of each columns
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
gender              7043 non-null object
SeniorCitizen       7043 non-null int64
Partner             7043 non-null object
Dependents          7043 non-null object
tenure              7043 non-null int64
PhoneService        7043 non-null object
MultipleLines       7043 non-null object
InternetService     7043 non-null object
OnlineSecurity      7043 non-null object
OnlineBackup        7043 non-null object
DeviceProtection    7043 non-null object
TechSupport         7043 non-null object
StreamingTV         7043 non-null object
StreamingMovies     7043 non-null object
Contract            7043 non-null object
PaperlessBilling    7043 non-null object
PaymentMethod       7043 non-null object
MonthlyCharges      7043 non-null float64
TotalCharges        7043 non-null object
Churn               7043 non-null object
dtypes: float64(1), int64(2), object(17)
memory usage: 1.1+ MB

Note:

The output above shows that most variables are categorical (dtype object). These categorical variables must be converted to numerical ones before being fed to machine learning algorithms.

Note also that the column 'TotalCharges' has a non-float data type although it should be float. So let's convert all values in that column to float. With the following command, any non-numeric values become NaN.
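
As a minimal sketch of what the coercion does, the toy series below stands in for a text-typed 'TotalCharges' column. The blank entry is a hypothetical example of an unparseable value, not a row taken from the dataset.

```python
import pandas as pd

# Toy column mimicking 'TotalCharges' read as text; the blank entry is hypothetical.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)         # dtype after conversion
print(converted.isna().sum())  # number of entries that could not be parsed
```

Counting the NaN values right after the conversion shows how many rows the coercion affected.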

In [19]:
# convert column 'TotalCharges' to float data type
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
gender              7043 non-null object
SeniorCitizen       7043 non-null int64
Partner             7043 non-null object
Dependents          7043 non-null object
tenure              7043 non-null int64
PhoneService        7043 non-null object
MultipleLines       7043 non-null object
InternetService     7043 non-null object
OnlineSecurity      7043 non-null object
OnlineBackup        7043 non-null object
DeviceProtection    7043 non-null object
TechSupport         7043 non-null object
StreamingTV         7043 non-null object
StreamingMovies     7043 non-null object
Contract            7043 non-null object
PaperlessBilling    7043 non-null object
PaymentMethod       7043 non-null object
MonthlyCharges      7043 non-null float64
TotalCharges        7032 non-null float64
Churn               7043 non-null object
dtypes: float64(2), int64(2), object(16)
memory usage: 1.1+ MB

Note:

Now 'TotalCharges' has a float data type, but only 7032 of its 7043 values are non-null, so 11 values are NaN. Let's check them and remove the affected rows.

In [20]:
'''== check Null  values in all columns ===='''
# count the rows that contain NaN values
print('No. of rows with NaN: {}'.format(data[data.isnull().any(axis=1)].shape[0]))
No. of rows with NaN: 11
In [21]:
#remove all Null values 
data = data.dropna()
data.shape
Out[21]:
(7032, 20)
In [22]:
# drop duplicate rows, keeping the first occurrence
#data[data.duplicated()]
data = data.drop_duplicates( keep='first')
data.shape
Out[22]:
(7010, 20)
In [23]:
# check the category values in each categorical column
category_column = data.select_dtypes(include='object')
for x in category_column.columns:
    print(x, ':', data[x].unique())
    
    
gender : ['Female' 'Male']
Partner : ['Yes' 'No']
Dependents : ['No' 'Yes']
PhoneService : ['No' 'Yes']
MultipleLines : ['No phone service' 'No' 'Yes']
InternetService : ['DSL' 'Fiber optic' 'No']
OnlineSecurity : ['No' 'Yes' 'No internet service']
OnlineBackup : ['Yes' 'No' 'No internet service']
DeviceProtection : ['No' 'Yes' 'No internet service']
TechSupport : ['No' 'Yes' 'No internet service']
StreamingTV : ['No' 'Yes' 'No internet service']
StreamingMovies : ['No' 'Yes' 'No internet service']
Contract : ['Month-to-month' 'One year' 'Two year']
PaperlessBilling : ['Yes' 'No']
PaymentMethod : ['Electronic check' 'Mailed check' 'Bank transfer (automatic)'
 'Credit card (automatic)']
Churn : ['No' 'Yes']

Note:

In the output above, some columns use two different values with the same meaning, for example 'No phone service' and 'No', or 'No internet service' and 'No'. Let's merge them into a single value.

In [24]:
'''replace both phrases with the single value No'''

binary_for = {'No phone service': 'No', 'No internet service': 'No'}

# function to replace the redundant category values with 'No'
def convert_value_binary(col_names: list):
    for col in col_names:
        data[col] = data[col].replace(binary_for)


 
In [25]:
# columns whose category values should be changed
columns = ['MultipleLines', 'OnlineSecurity', 
        'OnlineBackup', 'DeviceProtection','TechSupport', 'StreamingTV', 'StreamingMovies']

#call function
convert_value_binary(columns)
data.head(5)
Out[25]:
gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 Female 0 Yes No 1 No No DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 Male 0 No No 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.50 No
2 Male 0 No No 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 Male 0 No No 45 No No DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 Female 0 No No 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes

2. Exploratory Data Analysis

In this section, each variable is visualized and analyzed with respect to the target class (churn or no churn). This part tries to answer questions 1, 2 and 3.

In [27]:
'''==== check class/label distribution in dataset  ===='''


class_count = data['Churn'].value_counts()
print('Class count: \n',class_count) # number of samples in each class
y_pos = [class_count['No'], class_count['Yes']] # bar heights for the y-axis
x_pos = ['No', 'Yes'] # class labels for the x-axis

# creates bar chart of class labels
plt.figure(figsize=(7,5))
plt.bar(x_pos, y_pos, width=0.50, color='g')
plt.xlabel('Class')
plt.ylabel('No. of sample in class')
plt.title('Class distribution')
plt.show()

#''' === another way to plot using frequency distribution ==='''

# creates histogram figure
#class_dist1 =list(np.array(df['Churn'])) # takes class column
#plt.figure()
#plt.hist(class_dist1)
#plt.xlabel('Class')
#plt.ylabel('No. of sample in class')
#plt.title('Class distribution')
#plt.show()
Class count: 
 No     5153
Yes    1857
Name: Churn, dtype: int64

Note:

The bar chart above clearly shows that the class labels are imbalanced: class Yes makes up about 26.5% of the dataset and class No about 73.5%. We therefore need to balance them, either by up-sampling class Yes or by under-sampling class No, so that the model predicts unseen data correctly for both classes. Otherwise the model will be biased toward the majority class. This is handled in the imbalanced data section.
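
The class shares can be checked directly from the counts printed above. The snippet below is a small sketch that rebuilds the two counts by hand rather than reading the dataset.

```python
import pandas as pd

# Class counts copied from the printed output above.
class_count = pd.Series({"No": 5153, "Yes": 1857}, name="Churn")

share = class_count / class_count.sum()                     # fraction of each class
imbalance_ratio = class_count["No"] / class_count["Yes"]    # majority / minority

print(share.round(3))
print(round(imbalance_ratio, 2))
```

On the real dataframe, `data['Churn'].value_counts(normalize=True)` would give the same shares in one call.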

In [28]:
variable_not = ['MonthlyCharges', 'TotalCharges']
# show a bar chart for every feature column except the two charge columns
for x in data.columns[:-1]:
    
    if x not in variable_not:
        print('Show  '+ x + ' in Class Distribution ')
        #a =data.groupby([x, 'Churn']).size().unstack(fill_value=0)
        if x == 'tenure':
            pd.crosstab(data[x],data['Churn']).plot.bar(figsize=(15,7))
            plt.show()
            print('\n\n')
            
            
        else:
            pd.crosstab(data[x],data['Churn']).plot.bar(figsize=(7,5))
            plt.show()
            print('\n\n')
            
       
[Output: 17 bar charts of Churn counts (No vs. Yes), one each for gender, SeniorCitizen, Partner, Dependents, tenure, PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaperlessBilling and PaymentMethod. Each chart is preceded by the printed line "Show <column> in Class Distribution".]

Let's visualize further and look at customer churn by service and contract type.

In [729]:
"""=== Lets try to anylize the services  based on the contract types ==="""

lists_ofservice = ['PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
       'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
       'StreamingMovies',]

for x in  lists_ofservice:
    print('Show  '+ x + ' with contract type ')
    # count plot of Churn, one panel per (contract type, service value) pair
    g = sns.catplot(x='Churn', col=x, row='Contract', data=data, kind='count', aspect=0.8)
    plt.show()
    print('\n\n')
[Output: 9 grids of Churn count plots, with one row per Contract type and one column per service value, one grid each for PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV and StreamingMovies. Each grid is preceded by the printed line "Show <service> with contract type".]


The plots above show that a large portion of churned customers have a short tenure, phone service (Yes), no multiple lines, no online security, no online backup, no device protection, no tech support, fiber optic internet service, a month-to-month contract, and electronic check as the payment method.

In particular, customers on the shortest contract type, month-to-month, are the most likely to churn.

Possible reasons: 1) the company does not take the necessary steps before a contract ends; 2) customers may get a better offer from another company; and 3) the phone and fiber optic internet services may fall short of competitors' offerings, since most departing customers use them, so these services should be reviewed.
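
The bar charts show raw counts, which also depend on how many customers each category has. One way to compare categories on equal footing is a row-normalized cross-tabulation. The rows below are hypothetical toy data, not values from the Telco dataset.

```python
import pandas as pd

# Hypothetical toy rows, used only to illustrate the calculation.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "Churn":    ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes"],
})

# normalize='index' makes each row sum to 1, so the 'Yes' column is the churn rate.
rates = pd.crosstab(toy["Contract"], toy["Churn"], normalize="index")
print(rates)
```

Applied to the real dataframe, the same call with `data['Contract']` and `data['Churn']` would give the churn rate per contract type.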

In [29]:
# convert the categorical values into dummy variables and drop the first column of each (to avoid the dummy variable trap)
df_dummy= pd.get_dummies(data.iloc[:,:-1], drop_first=True)
df_dummy = pd.concat([df_dummy, data['Churn']], axis=1)
print('Dimension of dataset after dummy variable applied:\n',df_dummy.shape)
df_dummy.head()
Dimension of dataset after dummy variable applied:
 (7010, 24)
Out[29]:
SeniorCitizen tenure MonthlyCharges TotalCharges gender_Male Partner_Yes Dependents_Yes PhoneService_Yes MultipleLines_Yes InternetService_Fiber optic ... TechSupport_Yes StreamingTV_Yes StreamingMovies_Yes Contract_One year Contract_Two year PaperlessBilling_Yes PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check Churn
0 0 1 29.85 29.85 0 1 0 0 0 0 ... 0 0 0 0 0 1 0 1 0 No
1 0 34 56.95 1889.50 1 0 0 1 0 0 ... 0 0 0 1 0 0 0 0 1 No
2 0 2 53.85 108.15 1 0 0 1 0 0 ... 0 0 0 0 0 1 0 0 1 Yes
3 0 45 42.30 1840.75 1 0 0 0 0 0 ... 1 0 0 1 0 0 0 0 0 No
4 0 2 70.70 151.65 0 0 0 1 0 1 ... 0 0 0 0 0 1 0 1 0 Yes

5 rows × 24 columns
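
To illustrate what `drop_first=True` does, here is a small sketch on a hypothetical one-column frame. The `dtype=int` argument is an assumption added so that the dummies print as 0/1 on recent pandas versions as well.

```python
import pandas as pd

# Hypothetical toy column with the three contract types.
toy = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year", "One year"]})

full = pd.get_dummies(toy, dtype=int)                      # one column per category
reduced = pd.get_dummies(toy, drop_first=True, dtype=int)  # first category becomes the baseline

print(list(full.columns))
print(list(reduced.columns))
```

A row with zeros in every remaining `Contract_*` column then stands for the dropped baseline category, which is why no information is lost.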

In [30]:
# apply z-score standardization to certain columns
columns = ['tenure', 'MonthlyCharges','TotalCharges']
new_data = df_dummy.copy()
new_data[columns] = new_data[columns].apply(zscore) 
new_data.tail()
Out[30]:
SeniorCitizen tenure MonthlyCharges TotalCharges gender_Male Partner_Yes Dependents_Yes PhoneService_Yes MultipleLines_Yes InternetService_Fiber optic ... TechSupport_Yes StreamingTV_Yes StreamingMovies_Yes Contract_One year Contract_Two year PaperlessBilling_Yes PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check Churn
7038 0 -0.347506 0.662329 -0.132289 1 1 1 1 1 0 ... 1 1 1 1 0 1 0 0 1 No
7039 0 1.610184 1.274384 2.237896 0 1 1 1 1 1 ... 0 1 1 1 0 1 1 0 0 No
7040 0 -0.877714 -1.173839 -0.857607 0 1 1 0 0 0 ... 0 0 0 0 0 1 0 1 0 No
7041 1 -1.163210 0.316384 -0.875188 1 1 0 1 1 1 ... 0 0 0 0 0 1 0 0 1 Yes
7042 0 1.365473 1.355881 2.009189 1 0 0 1 0 1 ... 1 1 1 0 1 1 0 0 0 No

5 rows × 24 columns
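
The z-score used above rescales a column to mean 0 and standard deviation 1 via z = (x - mean) / std. The sketch below checks this by hand on the first five tenure values shown earlier. Because only five values are used, the numbers differ from the table above.

```python
import numpy as np
from scipy.stats import zscore

# First five tenure values from the head of the dataset shown earlier.
tenure = np.array([1.0, 34.0, 2.0, 45.0, 2.0])

# np.std uses ddof=0 by default, which matches scipy's zscore default.
manual = (tenure - tenure.mean()) / tenure.std()

print(np.allclose(manual, zscore(tenure)))
```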

In [31]:
#split data into train (70%) and test (30%)
def data_split(data, test_size):
    '''
    data = data to split
    test_size = size of test-data
    '''
    train, test = train_test_split(data, stratify=data['Churn'], test_size = test_size, random_state=3)
    X_train = train.drop(['Churn'], axis=1) # drop 'Churn' column
    y_train = train['Churn'] # Churn column for train 
    X_test = test.drop(['Churn'], axis=1) # drop 'Churn' column
    y_test = test['Churn'] # Churn column for test

    return X_train, y_train, X_test, y_test
In [32]:
#call data split function
X_train, y_train, X_test, y_test = data_split(new_data, 0.3)
In [33]:
# check the class ratio in the test set
print(y_test.value_counts()['No']/(y_test.value_counts()['Yes']+y_test.value_counts()['No'])) # class 'No churn'
print(y_test.value_counts()['Yes']/(y_test.value_counts()['Yes']+y_test.value_counts()['No'])) # class 'Churn'
0.7351402757964812
0.2648597242035188

3.4 Handle Imbalanced Data

To handle this imbalanced data, an over-sampling technique (SMOTE) is used so that both classes have an equal number of samples. Up-sampling is applied only to the training dataset; the test dataset should stay as it is. The up-sampled data is then used in the model building section below.
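
SMOTE creates new minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function below is a simplified illustration of that idea in plain NumPy, not the imbalanced-learn implementation. The name `smote_like_samples` and the toy points are hypothetical.

```python
import numpy as np

def smote_like_samples(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                      # pick a random minority point
        dist = np.linalg.norm(X_min - X_min[i], axis=1)   # distances to all minority points
        neighbours = np.argsort(dist)[1:k + 1]            # skip position 0, the point itself
        j = rng.choice(neighbours)                        # pick one of its k nearest neighbours
        gap = rng.random()                                # position along the segment, in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Hypothetical minority-class points in two dimensions.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like_samples(X_minority, n_new=6)
print(new_points.shape)
```

Every synthetic point lies on a line segment between two existing minority points, so the new samples stay inside the region the minority class already occupies.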

In [34]:
'''=== up-sampling is applied to the minority class so that the classes become
  balanced. This is done only on the train data, not on the test data ==='''

# over-sampling the data
def up_sampling(X_train, y_train):
    '''
    X_train = input features of training dataset
    y_train = output/class of training dataset
    '''
    # recent imbalanced-learn releases use sampling_strategy and fit_resample
    # in place of the older ratio and fit_sample
    sm = SMOTE(random_state=42, sampling_strategy=1.0) # SMOTE is applied for up-sampling
    X_train_new, y_train_new = sm.fit_resample(X_train, y_train) # up-sampling only the train data
    return X_train_new, y_train_new
In [35]:
# call up_sampling function
X_train_new, y_train_new = up_sampling(X_train, y_train)
print('Before up-sampling:\n', y_train.value_counts())
print()
unique_elements, counts_elements = np.unique(y_train_new, return_counts=True)
#print('\n After up-sampling:', np.bincount(X_train_new)) for int values types
print('\n After up-sampling:', counts_elements)
Before up-sampling:
 No     3607
Yes    1300
Name: Churn, dtype: int64


 After up-sampling: [3607 3607]

3.4 Model

GridSearchCV with 10-fold cross-validation (CV) is used to tune the parameters (find the optimal parameters) of each algorithm. The selected optimal parameters are then used to build the model, which is evaluated on the test data. The confusion matrix of each model is displayed. AUC score and classification accuracy are used to measure model performance.
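
The workflow described here can be sketched end to end on synthetic data. Everything below is an illustrative assumption: the data comes from `make_classification` rather than the Telco set, and the parameter grid is deliberately small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the notebook uses the resampled Telco training set instead.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=3)

# 10-fold cross-validated grid search over the regularization strength C.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1, 10]},
                      cv=10)
search.fit(X_tr, y_tr)

# Evaluate the refitted best model on the held-out test split.
best_model = search.best_estimator_
acc = accuracy_score(y_te, best_model.predict(X_te))
auc = roc_auc_score(y_te, best_model.predict_proba(X_te)[:, 1])
print(search.best_params_, acc, auc)
```

`GridSearchCV` refits the best parameter combination on the whole training split by default, so `best_estimator_` can be used for the test evaluation directly.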

In [36]:
"""== return best parameter =="""

# parameter tuning with grid search
def find_optimal_para(algorithm, parameters, cv, X_train, y_train):
    """
    algorithm = instance of the algorithm
    parameters = dictionary of parameter values to search
    cv = no. of cross-validation folds
    X_train = training dataset (features)
    y_train = training dataset (labels)
    
    """
    best_clf = GridSearchCV(algorithm, parameters, cv =cv) 
    best_clf.fit(X_train,y_train)
    return best_clf.best_params_
    
In [37]:
"""== return clasification accuracy, confusion matrix and  classification report =="""

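# model evaluation
def model_evaluaton(model, X_test, y_test ):
    """
    model = final model (parameters tuned)
    X_test = testing dataset (features)
    y_test = testing dataset (labels)
    
    """
    predict = model.predict(X_test)  # prediction on test data
    # Note: scikit-learn's metrics take (y_true, y_pred). The calls below pass the
    # predictions first, so the rows of the confusion matrix are predicted labels and
    # precision and recall in the report are swapped relative to the usual reading.
    accu_score = accuracy_score(predict, y_test) # classification accuracy (unaffected by the order)
    con_matrix = confusion_matrix(predict, y_test) # confusion matrix
    cla_report = classification_report(predict, y_test) # classification report
    return accu_score, con_matrix, cla_report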
    
In [38]:
# creates confusion matrix (from yellowbrick)
def confusion_Matrix(model,  X_test, y_test, label):
    """
    model = final model (parameters tuned)
    X_test = testing dataset (features)
    y_test = testing dataset (labels)
    label = labels/outputs in a list, e.g. ['Yes', 'No']
    
    """
    # ConfusionMatrix visualizer wrapping the model
    cm = ConfusionMatrix(model, classes=label)
    
    # fit() fits the passed model; it is commented out here
    #cm.fit(X_train, y_train)
    cm.score(X_test, y_test)
    cm.finalize()
In [39]:
'''=== ROC curve plot ===='''
def creat_roc_curve(model, X_test, y_test, pos_label):
    
    """
    model = final model (parameters tuned)
    X_test = testing dataset (features)
    y_test = testing dataset (labels)
    pos_label = positive class (either integer or string), e.g. 'Yes'
    
    
    """
    y_test_prob_tune = model.predict_proba(X_test) # class probabilities for the AUC score
    prob_pstive = [p[1] for p in y_test_prob_tune] # probability of the positive class
    auc_score = roc_auc_score(y_test, prob_pstive)
    #print('Auc_score:', auc_score)


    # create figure: area under the curve
    fpr,tpr,thresholds = roc_curve(y_test, prob_pstive, pos_label=pos_label) # false positive rate, true positive rate and thresholds

    plt.figure(figsize=(7,5))
    plt.plot(fpr, tpr, marker='o',label="auc_score = " + str(auc_score))
    plt.plot([0, 1], [0, 1],'r--')
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.title('ROC Curve')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend(loc=4)
    plt.show()
In [40]:
"""=== train model with best paramter and evalute perfomance ==="""

#train model 
def train_evaluate_model(algorithm, parameters,cv, X_train, y_train, X_test, y_test, label, pos_label ):
    """
    algorithm = instance of the algorithm,
    parameters = dictionary of parameter values to search,
    cv = no. of cross-validation folds,
    X_train = train dataset (features),
    y_train = train dataset (labels),
    X_test = test dataset (features),
    y_test = test dataset (labels),
    label = labels/outputs in a list, e.g. ['Yes', 'No'],
    pos_label = positive class (either integer or string), e.g. 'Yes'
    """
    
    
    best_parmeter = find_optimal_para(algorithm, parameters, cv, X_train, y_train) # return best parameter
    print('best parameter: ', best_parmeter)
    print()
    
    alg_with_best_par =  algorithm.set_params(**best_parmeter) # set the best parameters on the algorithm instance
    alg_with_best_par.fit(X_train, y_train) # fit on the training data passed in, not on the globals
    accu_score, con_matrix, cla_report = model_evaluaton(alg_with_best_par,X_test, y_test) # call model_evaluaton
    print( 'Classification acurracy:', accu_score)
    print('\nConfusion Matrix:\n', con_matrix)
    print('\nclassification_report:\n', cla_report )
    print()
    
    print( 'Confusion Matrix from yellobrick:')
    confusion_Matrix(alg_with_best_par,  X_test, y_test, label)
    print()
  
    creat_roc_curve(alg_with_best_par, X_test, y_test, pos_label)
    
    #return model 
    return alg_with_best_par
    
In [41]:
"""== function to save model =="""
# save the model to disk
def save_model(model, file_name):
    joblib.dump(model, 'Models/'+ file_name) #save in Models folder
    return ('Model saved')
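
`joblib.dump` fails if the target folder does not exist, so the `Models` folder has to be created beforehand. The sketch below shows one possible save-and-reload round trip. `save_model_safely` is a hypothetical variant of the helper above, and the tiny classifier and temporary folder are only there to make the example self-contained.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

def save_model_safely(model, folder, file_name):
    """Hypothetical variant of save_model that creates the folder first."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, file_name)
    joblib.dump(model, path)
    return path

# Tiny stand-in model trained on made-up one-dimensional data.
clf = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = save_model_safely(clf, os.path.join(tmp, "Models"), "demo_model.joblib")
    restored = joblib.load(path)   # load the model back from disk

print(restored.predict([[0.0], [3.0]]))
```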
In [42]:
 
#parameter
parameter_lr = {
    #'penalty' : ['l1', 'l2'],
    'C' :  [0.001,0.01,0.1,1,10,100],# regularization parameter.  C = 1/λ.
    'solver' : ['newton-cg', 'lbfgs', 'liblinear'] # optimizer
}

# no. of folds for cross-validation
cv = 10

# train dataset
train_features= X_train_new
train_labels = y_train_new

# test dataset
test_features = X_test.to_numpy()
test_labels= y_test.to_numpy()

# for confusion matrix (yellowbrick)
label = ['Yes', 'No']

# positive class for roc_curve
pos_label ='Yes'
In [33]:
%%time
#call  train_evaluate_model()
alogrihtm = LogisticRegression()
model_lr =train_evaluate_model(alogrihtm, parameter_lr, cv, train_features, train_labels,test_features, test_labels, label, pos_label )
best parameter:  {'C': 0.1, 'solver': 'newton-cg'}

Classification acurracy: 0.7451260104612458

Confusion Matrix:
 [[1126  116]
 [ 420  441]]

classification_report:
               precision    recall  f1-score   support

          No       0.73      0.91      0.81      1242
         Yes       0.79      0.51      0.62       861

   micro avg       0.75      0.75      0.75      2103
   macro avg       0.76      0.71      0.71      2103
weighted avg       0.75      0.75      0.73      2103


Confusion Matrix from yellobrick:

CPU times: user 18.7 s, sys: 254 ms, total: 18.9 s
Wall time: 10.2 s
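
Because the evaluation helper calls `confusion_matrix(predict, y_test)`, the rows of the printed matrix are predicted labels and the columns are true labels. The sketch below transposes the printed numbers into scikit-learn's usual layout and recomputes the churn-class metrics by hand. It uses only the four values shown above.

```python
import numpy as np

# Matrix as printed above: rows = predicted label, columns = true label (order: No, Yes).
printed = np.array([[1126, 116],
                    [420, 441]])

cm = printed.T                 # usual layout: rows = true label, columns = predicted label
tn, fp, fn, tp = cm.ravel()    # 'Yes' (churn) is treated as the positive class

precision_yes = tp / (tp + fp)     # share of predicted churners who really churned
recall_yes = tp / (tp + fn)        # share of actual churners the model caught
accuracy = (tp + tn) / cm.sum()
print(round(precision_yes, 2), round(recall_yes, 2), round(accuracy, 3))
```

Read this way, the test set contains 441 + 116 = 557 churners, which agrees with the 26.5% churn share of the 2103 test rows, and the 0.79 and 0.51 in the 'Yes' row of the report correspond to recall and precision respectively.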
In [34]:
#save model
save_model(model_lr, 'final_model_lr')
Out[34]:
'Model saved'
In [201]:
# show feature importance (logistic regression coefficients) as a bar chart
feat_import = pd.Series(model_lr.coef_.flatten(), index=X_train.columns)
feat_import.nlargest(23).plot(kind='barh',figsize=(15,13))
plt.show()
In [40]:
#parameters
parameter_dt = {'criterion':['entropy','gini'],
              'max_depth':range(1,50),
              'min_samples_leaf': range(20,100, 10)
              #'max_features': [ 5, 10, 12],
              #'splitter': ['best','random']
              
             }
In [41]:
%%time
#train with best parameter and evaluate 
alogrihtm = DecisionTreeClassifier( random_state=42)
model_dt =train_evaluate_model(alogrihtm, parameter_dt, cv, train_features, train_labels,test_features, test_labels, label, pos_label )
best parameter:  {'criterion': 'gini', 'max_depth': 9, 'min_samples_leaf': 20}

Classification acurracy: 0.7627199239182121

Confusion Matrix:
 [[1217  170]
 [ 329  387]]

classification_report:
               precision    recall  f1-score   support

          No       0.79      0.88      0.83      1387
         Yes       0.69      0.54      0.61       716

   micro avg       0.76      0.76      0.76      2103
   macro avg       0.74      0.71      0.72      2103
weighted avg       0.76      0.76      0.75      2103


Confusion Matrix from yellobrick:

CPU times: user 5min 53s, sys: 1.85 s, total: 5min 55s
Wall time: 6min 11s
In [42]:
# save the model 
save_model(model_dt, 'final_model_dt')
Out[42]:
'Model saved'
In [111]:
# show in tree-structure 
churnTree = tree.export_graphviz(model_dt, out_file=None, 
                         feature_names = list(X_train.columns.values),  
                         class_names = [ 'No Churn', 'Churn'],
                         filled=True, 
                         rounded=True,
                         special_characters=True)  
graph = graphviz.Source(churnTree) 
#graph.render('decision_tree.gv', view=True)
pydot_graph = pydotplus.graph_from_dot_data(churnTree)
pydot_graph.write_png('original_tree.png') # save figure
#pydot_graph.set_size('"5,5!"')
#pydot_graph.write_png('resized_tree.png')
graph
Out[111]:
[Decision tree diagram (text extracted from the rendered graph). Root node: Contract_Two year ≤ 0.976, gini = 0.5, samples = 7214, value = [3607, 3607], class = No Churn. Its children split on Contract_One year ≤ 0.997 (gini = 0.483, samples = 6001, value = [2454, 3547], class = Churn) and MonthlyCharges ≤ 0.916 (gini = 0.094, samples = 1213, value = [1153, 60], class = No Churn). Deeper levels split on InternetService_Fiber optic, StreamingMovies_Yes, tenure, InternetService_No, PaymentMethod_Electronic check and other features.]
OnlineSecurity_Yes ≤ 0.995 gini = 0.385 samples = 280 value = [73, 207] class = Churn 69->70 81 TechSupport_Yes ≤ 0.99 gini = 0.202 samples = 1499 value = [171, 1328] class = Churn 69->81 71 tenure ≤ -0.889 gini = 0.346 samples = 252 value = [56, 196] class = Churn 70->71 80 gini = 0.477 samples = 28 value = [17, 11] class = No Churn 70->80 72 OnlineBackup_Yes ≤ 0.898 gini = 0.291 samples = 204 value = [36, 168] class = Churn 71->72 77 MonthlyCharges ≤ 0.506 gini = 0.486 samples = 48 value = [20, 28] class = Churn 71->77 73 gender_Male ≤ 0.991 gini = 0.246 samples = 181 value = [26, 155] class = Churn 72->73 76 gini = 0.491 samples = 23 value = [10, 13] class = Churn 72->76 74 gini = 0.169 samples = 118 value = [11, 107] class = Churn 73->74 75 gini = 0.363 samples = 63 value = [15, 48] class = Churn 73->75 78 gini = 0.5 samples = 20 value = [10, 10] class = No Churn 77->78 79 gini = 0.459 samples = 28 value = [10, 18] class = Churn 77->79 82 MonthlyCharges ≤ 0.81 gini = 0.185 samples = 1410 value = [145, 1265] class = Churn 81->82 95 TotalCharges ≤ -0.488 gini = 0.414 samples = 89 value = [26, 63] class = Churn 81->95 83 tenure ≤ -1.245 gini = 0.221 samples = 997 value = [126, 871] class = Churn 82->83 90 TotalCharges ≤ -0.323 gini = 0.088 samples = 413 value = [19, 394] class = Churn 82->90 84 MonthlyCharges ≤ 0.521 gini = 0.075 samples = 282 value = [11, 271] class = Churn 83->84 87 OnlineSecurity_Yes ≤ 0.981 gini = 0.27 samples = 715 value = [115, 600] class = Churn 83->87 85 gini = 0.056 samples = 243 value = [7, 236] class = Churn 84->85 86 gini = 0.184 samples = 39 value = [4, 35] class = Churn 84->86 88 gini = 0.25 samples = 669 value = [98, 571] class = Churn 87->88 89 gini = 0.466 samples = 46 value = [17, 29] class = Churn 87->89 91 gender_Male ≤ 0.963 gini = 0.07 samples = 385 value = [14, 371] class = Churn 90->91 94 gini = 0.293 samples = 28 value = [5, 23] class = Churn 90->94 92 gini = 0.034 samples = 233 value = [4, 229] class = Churn 91->92 93 
gini = 0.123 samples = 152 value = [10, 142] class = Churn 91->93 96 TotalCharges ≤ -0.697 gini = 0.451 samples = 67 value = [23, 44] class = Churn 95->96 101 gini = 0.236 samples = 22 value = [3, 19] class = Churn 95->101 97 TotalCharges ≤ -0.83 gini = 0.375 samples = 44 value = [11, 33] class = Churn 96->97 100 gini = 0.499 samples = 23 value = [12, 11] class = No Churn 96->100 98 gini = 0.413 samples = 24 value = [7, 17] class = Churn 97->98 99 gini = 0.32 samples = 20 value = [4, 16] class = Churn 97->99 103 TotalCharges ≤ 0.524 gini = 0.498 samples = 268 value = [142, 126] class = No Churn 102->103 116 OnlineSecurity_Yes ≤ 0.996 gini = 0.418 samples = 969 value = [288, 681] class = Churn 102->116 104 Dependents_Yes ≤ 0.019 gini = 0.498 samples = 207 value = [97, 110] class = Churn 103->104 113 PaymentMethod_Electronic check ≤ 0.141 gini = 0.387 samples = 61 value = [45, 16] class = No Churn 103->113 105 MonthlyCharges ≤ 0.977 gini = 0.496 samples = 148 value = [81, 67] class = No Churn 104->105 110 MonthlyCharges ≤ 0.792 gini = 0.395 samples = 59 value = [16, 43] class = Churn 104->110 106 MonthlyCharges ≤ 0.675 gini = 0.474 samples = 109 value = [67, 42] class = No Churn 105->106 109 gini = 0.46 samples = 39 value = [14, 25] class = Churn 105->109 107 gini = 0.497 samples = 84 value = [45, 39] class = No Churn 106->107 108 gini = 0.211 samples = 25 value = [22, 3] class = No Churn 106->108 111 gini = 0.489 samples = 33 value = [14, 19] class = Churn 110->111 112 gini = 0.142 samples = 26 value = [2, 24] class = Churn 110->112 114 gini = 0.17 samples = 32 value = [29, 3] class = No Churn 113->114 115 gini = 0.495 samples = 29 value = [16, 13] class = No Churn 113->115 117 PaperlessBilling_Yes ≤ 0.035 gini = 0.381 samples = 808 value = [207, 601] class = Churn 116->117 130 tenure ≤ 0.414 gini = 0.5 samples = 161 value = [81, 80] class = No Churn 116->130 118 TotalCharges ≤ 1.004 gini = 0.499 samples = 90 value = [47, 43] class = No Churn 117->118 123 
PaymentMethod_Electronic check ≤ 0.017 gini = 0.346 samples = 718 value = [160, 558] class = Churn 117->123 119 Partner_Yes ≤ 0.91 gini = 0.496 samples = 70 value = [32, 38] class = Churn 118->119 122 gini = 0.375 samples = 20 value = [15, 5] class = No Churn 118->122 120 gini = 0.461 samples = 36 value = [13, 23] class = Churn 119->120 121 gini = 0.493 samples = 34 value = [19, 15] class = No Churn 119->121 124 MonthlyCharges ≤ 1.188 gini = 0.44 samples = 230 value = [75, 155] class = Churn 123->124 127 tenure ≤ 0.992 gini = 0.288 samples = 488 value = [85, 403] class = Churn 123->127 125 gini = 0.484 samples = 170 value = [70, 100] class = Churn 124->125 126 gini = 0.153 samples = 60 value = [5, 55] class = Churn 124->126 128 gini = 0.256 samples = 438 value = [66, 372] class = Churn 127->128 129 gini = 0.471 samples = 50 value = [19, 31] class = Churn 127->129 131 MonthlyCharges ≤ 1.067 gini = 0.438 samples = 74 value = [24, 50] class = Churn 130->131 136 tenure ≤ 0.993 gini = 0.452 samples = 87 value = [57, 30] class = No Churn 130->136 132 gini = 0.489 samples = 33 value = [19, 14] class = No Churn 131->132 133 MonthlyCharges ≤ 1.312 gini = 0.214 samples = 41 value = [5, 36] class = Churn 131->133 134 gini = 0.095 samples = 20 value = [1, 19] class = Churn 133->134 135 gini = 0.308 samples = 21 value = [4, 17] class = Churn 133->135 137 TotalCharges ≤ 1.201 gini = 0.493 samples = 52 value = [29, 23] class = No Churn 136->137 140 gini = 0.32 samples = 35 value = [28, 7] class = No Churn 136->140 138 gini = 0.428 samples = 29 value = [20, 9] class = No Churn 137->138 139 gini = 0.476 samples = 23 value = [9, 14] class = Churn 137->139 142 TechSupport_Yes ≤ 0.079 gini = 0.129 samples = 563 value = [524, 39] class = No Churn 141->142 169 StreamingTV_Yes ≤ 0.004 gini = 0.461 samples = 585 value = [374, 211] class = No Churn 141->169 143 PaymentMethod_Electronic check ≤ 0.025 gini = 0.09 samples = 401 value = [382, 19] class = No Churn 142->143 160 TotalCharges ≤ 
0.419 gini = 0.216 samples = 162 value = [142, 20] class = No Churn 142->160 144 TotalCharges ≤ -0.945 gini = 0.056 samples = 349 value = [339, 10] class = No Churn 143->144 157 Partner_Yes ≤ 0.529 gini = 0.286 samples = 52 value = [43, 9] class = No Churn 143->157 145 gini = 0.159 samples = 23 value = [21, 2] class = No Churn 144->145 146 MonthlyCharges ≤ -1.467 gini = 0.048 samples = 326 value = [318, 8] class = No Churn 144->146 147 MonthlyCharges ≤ -1.489 gini = 0.012 samples = 166 value = [165, 1] class = No Churn 146->147 152 TotalCharges ≤ -0.549 gini = 0.084 samples = 160 value = [153, 7] class = No Churn 146->152 148 gini = 0.0 samples = 116 value = [116, 0] class = No Churn 147->148 149 TotalCharges ≤ -0.814 gini = 0.039 samples = 50 value = [49, 1] class = No Churn 147->149 150 gini = 0.095 samples = 20 value = [19, 1] class = No Churn 149->150 151 gini = 0.0 samples = 30 value = [30, 0] class = No Churn 149->151 153 gini = 0.188 samples = 38 value = [34, 4] class = No Churn 152->153 154 MultipleLines_Yes ≤ 0.5 gini = 0.048 samples = 122 value = [119, 3] class = No Churn 152->154 155 gini = 0.095 samples = 60 value = [57, 3] class = No Churn 154->155 156 gini = 0.0 samples = 62 value = [62, 0] class = No Churn 154->156 158 gini = 0.375 samples = 28 value = [21, 7] class = No Churn 157->158 159 gini = 0.153 samples = 24 value = [22, 2] class = No Churn 157->159 161 OnlineBackup_Yes ≤ 0.078 gini = 0.315 samples = 102 value = [82, 20] class = No Churn 160->161 168 gini = 0.0 samples = 60 value = [60, 0] class = No Churn 160->168 162 MonthlyCharges ≤ -0.182 gini = 0.107 samples = 53 value = [50, 3] class = No Churn 161->162 165 OnlineSecurity_Yes ≤ 0.5 gini = 0.453 samples = 49 value = [32, 17] class = No Churn 161->165 163 gini = 0.0 samples = 33 value = [33, 0] class = No Churn 162->163 164 gini = 0.255 samples = 20 value = [17, 3] class = No Churn 162->164 166 gini = 0.499 samples = 21 value = [10, 11] class = Churn 165->166 167 gini = 0.337 samples = 28 
value = [22, 6] class = No Churn 165->167 170 MonthlyCharges ≤ 0.074 gini = 0.217 samples = 113 value = [99, 14] class = No Churn 169->170 177 PaperlessBilling_Yes ≤ 0.002 gini = 0.486 samples = 472 value = [275, 197] class = No Churn 169->177 171 gini = 0.346 samples = 36 value = [28, 8] class = No Churn 170->171 172 tenure ≤ 1.141 gini = 0.144 samples = 77 value = [71, 6] class = No Churn 170->172 173 OnlineBackup_Yes ≤ 0.5 gini = 0.069 samples = 56 value = [54, 2] class = No Churn 172->173 176 gini = 0.308 samples = 21 value = [17, 4] class = No Churn 172->176 174 gini = 0.159 samples = 23 value = [21, 2] class = No Churn 173->174 175 gini = 0.0 samples = 33 value = [33, 0] class = No Churn 173->175 178 MonthlyCharges ≤ 0.5 gini = 0.375 samples = 108 value = [81, 27] class = No Churn 177->178 185 PaperlessBilling_Yes ≤ 0.983 gini = 0.498 samples = 364 value = [194, 170] class = No Churn 177->185 179 gini = 0.117 samples = 32 value = [30, 2] class = No Churn 178->179 180 TotalCharges ≤ 0.419 gini = 0.441 samples = 76 value = [51, 25] class = No Churn 178->180 181 gini = 0.496 samples = 22 value = [10, 12] class = Churn 180->181 182 OnlineBackup_Yes ≤ 0.5 gini = 0.366 samples = 54 value = [41, 13] class = No Churn 180->182 183 gini = 0.095 samples = 20 value = [19, 1] class = No Churn 182->183 184 gini = 0.457 samples = 34 value = [22, 12] class = No Churn 182->184 186 gini = 0.0 samples = 27 value = [0, 27] class = Churn 185->186 187 Partner_Yes ≤ 0.004 gini = 0.489 samples = 337 value = [194, 143] class = No Churn 185->187 188 TotalCharges ≤ 1.38 gini = 0.444 samples = 117 value = [78, 39] class = No Churn 187->188 195 Partner_Yes ≤ 0.992 gini = 0.499 samples = 220 value = [116, 104] class = No Churn 187->195 189 MonthlyCharges ≤ 0.267 gini = 0.351 samples = 75 value = [58, 17] class = No Churn 188->189 192 tenure ≤ 1.1 gini = 0.499 samples = 42 value = [20, 22] class = Churn 188->192 190 gini = 0.495 samples = 20 value = [11, 9] class = No Churn 189->190 191 
gini = 0.249 samples = 55 value = [47, 8] class = No Churn 189->191 193 gini = 0.375 samples = 20 value = [5, 15] class = Churn 192->193 194 gini = 0.434 samples = 22 value = [15, 7] class = No Churn 192->194 196 gini = 0.0 samples = 23 value = [0, 23] class = Churn 195->196 197 tenure ≤ 1.405 gini = 0.484 samples = 197 value = [116, 81] class = No Churn 195->197 198 gini = 0.495 samples = 163 value = [90, 73] class = No Churn 197->198 199 gini = 0.36 samples = 34 value = [26, 8] class = No Churn 197->199 201 PaymentMethod_Electronic check ≤ 0.051 gini = 0.034 samples = 917 value = [901, 16] class = No Churn 200->201 222 tenure ≤ 1.59 gini = 0.253 samples = 296 value = [252, 44] class = No Churn 200->222 202 SeniorCitizen ≤ 0.467 gini = 0.025 samples = 864 value = [853, 11] class = No Churn 201->202 219 tenure ≤ 1.264 gini = 0.171 samples = 53 value = [48, 5] class = No Churn 201->219 203 tenure ≤ 0.945 gini = 0.015 samples = 804 value = [798, 6] class = No Churn 202->203 216 TotalCharges ≤ 0.916 gini = 0.153 samples = 60 value = [55, 5] class = No Churn 202->216 204 tenure ≤ 0.815 gini = 0.03 samples = 328 value = [323, 5] class = No Churn 203->204 211 tenure ≤ 1.345 gini = 0.004 samples = 476 value = [475, 1] class = No Churn 203->211 205 MultipleLines_Yes ≤ 0.5 gini = 0.014 samples = 289 value = [287, 2] class = No Churn 204->205 210 gini = 0.142 samples = 39 value = [36, 3] class = No Churn 204->210 206 gini = 0.0 samples = 207 value = [207, 0] class = No Churn 205->206 207 MonthlyCharges ≤ -1.332 gini = 0.048 samples = 82 value = [80, 2] class = No Churn 205->207 208 gini = 0.172 samples = 21 value = [19, 2] class = No Churn 207->208 209 gini = 0.0 samples = 61 value = [61, 0] class = No Churn 207->209 212 tenure ≤ 1.304 gini = 0.013 samples = 150 value = [149, 1] class = No Churn 211->212 215 gini = 0.0 samples = 326 value = [326, 0] class = No Churn 211->215 213 gini = 0.0 samples = 128 value = [128, 0] class = No Churn 212->213 214 gini = 0.087 samples = 22 
value = [21, 1] class = No Churn 212->214 217 gini = 0.0 samples = 35 value = [35, 0] class = No Churn 216->217 218 gini = 0.32 samples = 25 value = [20, 5] class = No Churn 216->218 220 gini = 0.0 samples = 30 value = [30, 0] class = No Churn 219->220 221 gini = 0.34 samples = 23 value = [18, 5] class = No Churn 219->221 223 OnlineBackup_Yes ≤ 0.686 gini = 0.341 samples = 197 value = [154, 43] class = No Churn 222->223 238 TotalCharges ≤ 2.278 gini = 0.02 samples = 99 value = [98, 1] class = No Churn 222->238 224 MonthlyCharges ≤ 1.363 gini = 0.454 samples = 66 value = [43, 23] class = No Churn 223->224 229 MonthlyCharges ≤ 1.36 gini = 0.259 samples = 131 value = [111, 20] class = No Churn 223->229 225 TotalCharges ≤ 1.62 gini = 0.346 samples = 45 value = [35, 10] class = No Churn 224->225 228 gini = 0.472 samples = 21 value = [8, 13] class = Churn 224->228 226 gini = 0.245 samples = 21 value = [18, 3] class = No Churn 225->226 227 gini = 0.413 samples = 24 value = [17, 7] class = No Churn 225->227 230 TotalCharges ≤ 1.777 gini = 0.105 samples = 54 value = [51, 3] class = No Churn 229->230 233 TotalCharges ≤ 2.416 gini = 0.344 samples = 77 value = [60, 17] class = No Churn 229->233 231 gini = 0.255 samples = 20 value = [17, 3] class = No Churn 230->231 232 gini = 0.0 samples = 34 value = [34, 0] class = No Churn 230->232 234 tenure ≤ 1.386 gini = 0.408 samples = 56 value = [40, 16] class = No Churn 233->234 237 gini = 0.091 samples = 21 value = [20, 1] class = No Churn 233->237 235 gini = 0.313 samples = 36 value = [29, 7] class = No Churn 234->235 236 gini = 0.495 samples = 20 value = [11, 9] class = No Churn 234->236 239 gini = 0.095 samples = 20 value = [19, 1] class = No Churn 238->239 240 gini = 0.0 samples = 79 value = [79, 0] class = No Churn 238->240
In [215]:
'''==== plot train and test accuracy for max_depth from 1 to 29 ===='''
acc_train = []
acc_test = []

for n in range(1, 30):

    # decision tree instance
    deci_with_tune = DecisionTreeClassifier(criterion='gini', max_depth=n, min_samples_leaf=20)
    deci_with_tune.fit(train_features, train_labels)

    # prediction on test data
    predict_tuned = deci_with_tune.predict(test_features)
    accu_score = accuracy_score(test_labels, predict_tuned)  # classification accuracy
    acc_test.append(accu_score)

    # prediction on train data
    train_predict_tuned = deci_with_tune.predict(train_features)
    train_accu_score = accuracy_score(train_labels, train_predict_tuned)  # classification accuracy
    acc_train.append(train_accu_score)

# create figure
plt.figure(figsize=(10, 7))
x = np.arange(1, 30)  # max_depth values for the x-axis
plt.plot(x, acc_train, label='Training accuracy')
plt.plot(x, acc_test, label='Testing accuracy')
plt.xlabel('max_depth')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Note:

At max_depth = 1 the model accuracy is low. As the depth increases the model improves, but beyond roughly max_depth = 9 it keeps fitting the training data more closely while failing to generalize to new data. This shows that the model overfits if max_depth is increased further.
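
As a minimal sketch, the depth at which the test accuracy peaks can also be read from the two lists rather than from the plot. The helper name `best_depth` and the accuracy values below are made up for illustration and are not results from this notebook.

```python
def best_depth(depths, test_scores):
    """Return the smallest depth that reaches the highest test accuracy."""
    best = max(test_scores)
    for depth, score in zip(depths, test_scores):
        if score == best:
            return depth


# Hypothetical accuracies for illustration only; not results from this notebook.
example_depths = [1, 2, 3, 4, 5]
example_test_acc = [0.70, 0.74, 0.77, 0.77, 0.75]

chosen = best_depth(example_depths, example_test_acc)
print("depth with the best test accuracy:", chosen)
```

With the lists from the cell above, the call would be `best_depth(range(1, 30), acc_test)`. Preferring the smallest depth on ties keeps the tree as simple as possible.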

Let's look at the 12 features that contribute most to the prediction.

In [216]:
'''==== Show the top 12 feature contributions to the target labels ==='''

coeffs = model_dt.feature_importances_
list_of_coeffs = list(sorted(zip(coeffs, X_train.columns), reverse=True))
x_val = [x[0] for x in list_of_coeffs[:12]]  # importance values
y_val = [x[1] for x in list_of_coeffs[:12]]  # feature names

# create figure
plt.figure(figsize=(10, 7))
plt.barh(y_val, x_val, align='center', color='blue')
plt.title('Top 12 Most Important Features')
plt.gca().invert_yaxis()  # feature with the highest contribution first
plt.show()

# == Another way ==
#feat_importances = pd.Series(coeffs, index=X_train.columns)
#feat_importances.nlargest(12).plot(kind='barh')
In [406]:
'''=== Another way to draw the ROC curve, handy and efficient === '''
import scikitplot as skplt
y_test_prob_tune = model_dt.predict_proba(X_test)  # class probabilities for the AUC score
y_true = y_test  # ground truth labels
skplt.metrics.plot_roc(y_true, y_test_prob_tune)  # takes true labels and predicted class probabilities
plt.show()
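
For intuition about what such a ROC plot shows, here is a minimal pure-Python sketch that sweeps a threshold over positive-class scores and integrates the curve with the trapezoid rule. It is not how scikit-plot is implemented; the function names `roc_points` and `auc_trapezoid` and the toy data are made up for illustration.

```python
def roc_points(y_true, scores):
    """Sweep a threshold from high to low and collect (FPR, TPR) points."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Indices ordered by descending score: each step lowers the threshold.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last_of_ties = rank == len(order) - 1 or scores[order[rank + 1]] != scores[i]
        if is_last_of_ties:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))


# Toy labels and scores, made up for illustration (1 = churn).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(toy_labels, toy_scores)
area = auc_trapezoid(fpr, tpr)
print(fpr, tpr, area)
```

In the notebook the scores would be one column of `y_test_prob_tune`, assuming that column holds the probability of the churn class and the labels are encoded so that churn equals 1.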
In [37]:
# hyperparameter grid for the random forest
parameter_ran = {
    'n_estimators': [int(x) for x in np.linspace(start=50, stop=1000, num=10)],  # number of trees in the forest
    #'max_features': ['auto', 'sqrt'],
    'max_depth': [int(x) for x in np.linspace(5, 50, num=5)],  # maximum depth of each tree
    'min_samples_leaf': [int(x) for x in np.linspace(5, 105, num=5)]  # minimum samples required in a leaf
}
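
To get a feel for the search cost, the grid can be expanded by hand. This is a sketch under the assumption that `train_evaluate_model` (defined earlier in the notebook) runs an exhaustive grid search; `int_linspace` is a hypothetical stand-in that is intended to reproduce `int(x)` over `np.linspace` for these integer endpoints without requiring NumPy.

```python
from itertools import product


def int_linspace(start, stop, num):
    """Evenly spaced values truncated to integers (hypothetical helper, integer endpoints only)."""
    return [start + (i * (stop - start)) // (num - 1) for i in range(num)]


grid = {
    'n_estimators': int_linspace(50, 1000, 10),
    'max_depth': int_linspace(5, 50, 5),
    'min_samples_leaf': int_linspace(5, 105, 5),
}

# Every combination of the three hyperparameters is one candidate model.
combos = list(product(*grid.values()))

for name, values in grid.items():
    print(name, values)
print("candidate models:", len(combos))
```

An exhaustive search fits each of these candidates once per cross-validation split, so the total number of fits is the candidate count multiplied by the number of splits in `cv`.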
In [38]:
%%time
# random forest model
algorithm_rf = RandomForestClassifier(random_state=1)
model_rf = train_evaluate_model(algorithm_rf, parameter_ran, cv, train_features, train_labels, test_features, test_labels, label, pos_label)
best parameter:  {'max_depth': 16, 'min_samples_leaf': 5, 'n_estimators': 155}

Classification acurracy: 0.785544460294817

Confusion Matrix:
 [[1263  168]
 [ 283  389]]

classification_report:
               precision    recall  f1-score   support

          No       0.82      0.88      0.85      1431
         Yes       0.70      0.58      0.63       672

   micro avg       0.79      0.79      0.79      2103
   macro avg       0.76      0.73      0.74      2103
weighted avg       0.78      0.79      0.78      2103


Confusion Matrix from yellobrick:

CPU times: user 1h 44min 15s, sys: 41.5 s, total: 1h 44min 57s
Wall time: 1h 45min 18s
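
The figures in the report above can be re-derived from the confusion matrix alone. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes, ordered No then Yes), which agrees with the support column of the report (1263 + 168 = 1431 and 283 + 389 = 672).

```python
# Confusion matrix printed above, laid out as [[TN, FP], [FN, TP]]
# with "Yes" (churn) as the positive class.
tn, fp = 1263, 168
fn, tp = 283, 389

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision_yes = tp / (tp + fp)   # share of predicted churners that really churned
recall_yes = tp / (tp + fn)      # share of actual churners that were caught
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

print(f"accuracy  = {accuracy:.6f}")
print(f"precision = {precision_yes:.2f}")
print(f"recall    = {recall_yes:.2f}")
print(f"f1-score  = {f1_yes:.2f}")
```

The recall of 0.58 for the Yes class means that 283 of the 672 actual churners in the test set are missed, which matters more for a retention campaign than the overall accuracy.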
In [41]:
# save the model 
save_model(model_rf, 'final_model_rf')
Out[41]:
'Model saved'
In [413]:
# show feature importances in a diagram
feat_import_rn = pd.Series(model_rf.feature_importances_, index=X_train.columns)
feat_import_rn.nlargest(23).plot(kind='barh', figsize=(15, 13))
plt.show()
In [42]:
# show the first tree of the forest as a tree structure
churnTree = tree.export_graphviz(model_rf.estimators_[0], out_file=None,
                         feature_names=list(X_train.columns.values),
                         class_names=['No Churn', 'Churn'],
                         filled=True,
                         rounded=True,
                         special_characters=True)
graph = graphviz.Source(churnTree)
#graph.render('decision_tree.gv', view=True)
pydot_graph = pydotplus.graph_from_dot_data(churnTree)
pydot_graph.write_png('random_tree.png')  # save figure
graph
Out[42]:
[Figure: first tree of the random forest (model_rf.estimators_[0]) rendered with graphviz. The root node splits on MonthlyCharges ≤ -1.284 (gini = 0.5, samples = 4566); nodes are labelled No Churn or Churn.]
= No Churn 165->166 171 gini = 0.397 samples = 8 value = [3, 8] class = Churn 165->171 167 MonthlyCharges ≤ 0.514 gini = 0.426 samples = 13 value = [8, 18] class = Churn 166->167 170 gini = 0.231 samples = 9 value = [13, 2] class = No Churn 166->170 168 gini = 0.49 samples = 8 value = [8, 6] class = No Churn 167->168 169 gini = 0.0 samples = 5 value = [0, 12] class = Churn 167->169 173 TotalCharges ≤ -0.614 gini = 0.142 samples = 27 value = [3, 36] class = Churn 172->173 176 PaymentMethod_Mailed check ≤ 0.291 gini = 0.451 samples = 20 value = [11, 21] class = Churn 172->176 174 gini = 0.0 samples = 19 value = [0, 26] class = Churn 173->174 175 gini = 0.355 samples = 8 value = [3, 10] class = Churn 173->175 177 gini = 0.48 samples = 10 value = [6, 9] class = Churn 176->177 178 tenure ≤ -0.949 gini = 0.415 samples = 10 value = [5, 12] class = Churn 176->178 179 gini = 0.469 samples = 5 value = [5, 3] class = No Churn 178->179 180 gini = 0.0 samples = 5 value = [0, 9] class = Churn 178->180 183 Contract_Two year ≤ 0.756 gini = 0.284 samples = 248 value = [328, 68] class = No Churn 182->183 216 gender_Male ≤ 0.001 gini = 0.492 samples = 294 value = [255, 197] class = No Churn 182->216 184 OnlineBackup_Yes ≤ 0.05 gini = 0.363 samples = 174 value = [217, 68] class = No Churn 183->184 215 gini = 0.0 samples = 74 value = [111, 0] class = No Churn 183->215 185 StreamingMovies_Yes ≤ 0.07 gini = 0.276 samples = 158 value = [217, 43] class = No Churn 184->185 214 gini = 0.0 samples = 16 value = [0, 25] class = Churn 184->214 186 PaymentMethod_Mailed check ≤ 0.555 gini = 0.119 samples = 96 value = [147, 10] class = No Churn 185->186 201 gender_Male ≤ 0.936 gini = 0.435 samples = 62 value = [70, 33] class = No Churn 185->201 187 Contract_One year ≤ 0.5 gini = 0.134 samples = 62 value = [90, 7] class = No Churn 186->187 196 tenure ≤ -0.382 gini = 0.095 samples = 34 value = [57, 3] class = No Churn 186->196 188 MonthlyCharges ≤ -0.261 gini = 0.245 samples = 31 value = [42, 7] 
class = No Churn 187->188 195 gini = 0.0 samples = 31 value = [48, 0] class = No Churn 187->195 189 Dependents_Yes ≤ 0.5 gini = 0.056 samples = 26 value = [34, 1] class = No Churn 188->189 194 gini = 0.49 samples = 5 value = [8, 6] class = No Churn 188->194 190 tenure ≤ -0.307 gini = 0.074 samples = 19 value = [25, 1] class = No Churn 189->190 193 gini = 0.0 samples = 7 value = [9, 0] class = No Churn 189->193 191 gini = 0.32 samples = 5 value = [4, 1] class = No Churn 190->191 192 gini = 0.0 samples = 14 value = [21, 0] class = No Churn 190->192 197 MonthlyCharges ≤ -0.448 gini = 0.305 samples = 12 value = [13, 3] class = No Churn 196->197 200 gini = 0.0 samples = 22 value = [44, 0] class = No Churn 196->200 198 gini = 0.0 samples = 6 value = [8, 0] class = No Churn 197->198 199 gini = 0.469 samples = 6 value = [5, 3] class = No Churn 197->199 202 TechSupport_Yes ≤ 0.936 gini = 0.496 samples = 33 value = [31, 26] class = No Churn 201->202 211 tenure ≤ 0.325 gini = 0.258 samples = 29 value = [39, 7] class = No Churn 201->211 203 tenure ≤ -0.307 gini = 0.42 samples = 17 value = [9, 21] class = Churn 202->203 208 DeviceProtection_Yes ≤ 0.5 gini = 0.302 samples = 16 value = [22, 5] class = No Churn 202->208 204 gini = 0.0 samples = 5 value = [0, 7] class = Churn 203->204 205 OnlineSecurity_Yes ≤ 0.087 gini = 0.476 samples = 12 value = [9, 14] class = Churn 203->205 206 gini = 0.459 samples = 7 value = [9, 5] class = No Churn 205->206 207 gini = 0.0 samples = 5 value = [0, 9] class = Churn 205->207 209 gini = 0.0 samples = 6 value = [11, 0] class = No Churn 208->209 210 gini = 0.43 samples = 10 value = [11, 5] class = No Churn 208->210 212 gini = 0.375 samples = 18 value = [21, 7] class = No Churn 211->212 213 gini = 0.0 samples = 11 value = [18, 0] class = No Churn 211->213 217 Contract_Two year ≤ 0.565 gini = 0.423 samples = 121 value = [131, 57] class = No Churn 216->217 248 StreamingMovies_Yes ≤ 0.098 gini = 0.498 samples = 173 value = [124, 140] class = Churn 
216->248 218 Partner_Yes ≤ 0.004 gini = 0.439 samples = 108 value = [116, 56] class = No Churn 217->218 245 MonthlyCharges ≤ 0.913 gini = 0.117 samples = 13 value = [15, 1] class = No Churn 217->245 219 TechSupport_Yes ≤ 0.973 gini = 0.324 samples = 43 value = [51, 13] class = No Churn 218->219 224 MonthlyCharges ≤ 1.232 gini = 0.479 samples = 65 value = [65, 43] class = No Churn 218->224 220 StreamingMovies_Yes ≤ 0.5 gini = 0.405 samples = 32 value = [33, 13] class = No Churn 219->220 223 gini = 0.0 samples = 11 value = [18, 0] class = No Churn 219->223 221 gini = 0.225 samples = 21 value = [27, 4] class = No Churn 220->221 222 gini = 0.48 samples = 11 value = [6, 9] class = Churn 220->222 225 OnlineBackup_Yes ≤ 0.042 gini = 0.425 samples = 45 value = [52, 23] class = No Churn 224->225 238 OnlineSecurity_Yes ≤ 0.996 gini = 0.478 samples = 20 value = [13, 20] class = Churn 224->238 226 TechSupport_Yes ≤ 0.5 gini = 0.382 samples = 40 value = [52, 18] class = No Churn 225->226 237 gini = 0.0 samples = 5 value = [0, 5] class = Churn 225->237 227 OnlineSecurity_Yes ≤ 0.5 gini = 0.426 samples = 30 value = [36, 16] class = No Churn 226->227 234 DeviceProtection_Yes ≤ 0.5 gini = 0.198 samples = 10 value = [16, 2] class = No Churn 226->234 228 TotalCharges ≤ 0.13 gini = 0.457 samples = 20 value = [22, 12] class = No Churn 227->228 233 gini = 0.346 samples = 10 value = [14, 4] class = No Churn 227->233 229 gini = 0.142 samples = 7 value = [12, 1] class = No Churn 228->229 230 tenure ≤ 0.631 gini = 0.499 samples = 13 value = [10, 11] class = Churn 228->230 231 gini = 0.198 samples = 6 value = [1, 8] class = Churn 230->231 232 gini = 0.375 samples = 7 value = [9, 3] class = No Churn 230->232 235 gini = 0.375 samples = 5 value = [6, 2] class = No Churn 234->235 236 gini = 0.0 samples = 5 value = [10, 0] class = No Churn 234->236 239 tenure ≤ 1.141 gini = 0.305 samples = 10 value = [3, 13] class = Churn 238->239 242 TotalCharges ≤ 1.158 gini = 0.484 samples = 10 value = [10, 7] 
class = No Churn 238->242 240 gini = 0.42 samples = 5 value = [3, 7] class = Churn 239->240 241 gini = 0.0 samples = 5 value = [0, 6] class = Churn 239->241 243 gini = 0.49 samples = 5 value = [4, 3] class = No Churn 242->243 244 gini = 0.48 samples = 5 value = [6, 4] class = No Churn 242->244 246 gini = 0.0 samples = 7 value = [8, 0] class = No Churn 245->246 247 gini = 0.219 samples = 6 value = [7, 1] class = No Churn 245->247 249 tenure ≤ 0.86 gini = 0.421 samples = 63 value = [65, 28] class = No Churn 248->249 264 MultipleLines_Yes ≤ 0.039 gini = 0.452 samples = 110 value = [59, 112] class = Churn 248->264 250 OnlineSecurity_Yes ≤ 0.974 gini = 0.45 samples = 55 value = [54, 28] class = No Churn 249->250 263 gini = 0.0 samples = 8 value = [11, 0] class = No Churn 249->263 251 MonthlyCharges ≤ 0.183 gini = 0.468 samples = 41 value = [37, 22] class = No Churn 250->251 260 DeviceProtection_Yes ≤ 0.5 gini = 0.386 samples = 14 value = [17, 6] class = No Churn 250->260 252 gini = 0.198 samples = 6 value = [8, 1] class = No Churn 251->252 253 MonthlyCharges ≤ 0.71 gini = 0.487 samples = 35 value = [29, 21] class = No Churn 251->253 254 PaymentMethod_Credit card (automatic) ≤ 0.894 gini = 0.5 samples = 27 value = [19, 19] class = No Churn 253->254 259 gini = 0.278 samples = 8 value = [10, 2] class = No Churn 253->259 255 SeniorCitizen ≤ 0.187 gini = 0.499 samples = 19 value = [12, 13] class = Churn 254->255 258 gini = 0.497 samples = 8 value = [7, 6] class = No Churn 254->258 256 gini = 0.495 samples = 14 value = [11, 9] class = No Churn 255->256 257 gini = 0.32 samples = 5 value = [1, 4] class = Churn 255->257 261 gini = 0.245 samples = 8 value = [12, 2] class = No Churn 260->261 262 gini = 0.494 samples = 6 value = [5, 4] class = No Churn 260->262 265 tenure ≤ 0.448 gini = 0.5 samples = 25 value = [22, 21] class = No Churn 264->265 272 Contract_One year ≤ 0.802 gini = 0.411 samples = 85 value = [37, 91] class = Churn 264->272 266 tenure ≤ -0.409 gini = 0.444 samples = 
18 value = [10, 20] class = Churn 265->266 271 gini = 0.142 samples = 7 value = [12, 1] class = No Churn 265->271 267 SeniorCitizen ≤ 0.5 gini = 0.346 samples = 11 value = [4, 14] class = Churn 266->267 270 gini = 0.5 samples = 7 value = [6, 6] class = No Churn 266->270 268 gini = 0.463 samples = 6 value = [4, 7] class = Churn 267->268 269 gini = 0.0 samples = 5 value = [0, 7] class = Churn 267->269 273 StreamingTV_Yes ≤ 0.075 gini = 0.34 samples = 76 value = [25, 90] class = Churn 272->273 292 gini = 0.142 samples = 9 value = [12, 1] class = No Churn 272->292 274 DeviceProtection_Yes ≤ 0.5 gini = 0.401 samples = 11 value = [13, 5] class = No Churn 273->274 277 DeviceProtection_Yes ≤ 0.773 gini = 0.217 samples = 65 value = [12, 85] class = Churn 273->277 275 gini = 0.32 samples = 6 value = [8, 2] class = No Churn 274->275 276 gini = 0.469 samples = 5 value = [5, 3] class = No Churn 274->276 278 Dependents_Yes ≤ 0.595 gini = 0.194 samples = 32 value = [5, 41] class = Churn 277->278 285 TotalCharges ≤ 1.614 gini = 0.237 samples = 33 value = [7, 44] class = Churn 277->285 279 OnlineSecurity_Yes ≤ 0.376 gini = 0.176 samples = 27 value = [4, 37] class = Churn 278->279 284 gini = 0.32 samples = 5 value = [1, 4] class = Churn 278->284 280 MonthlyCharges ≤ 1.105 gini = 0.157 samples = 22 value = [3, 32] class = Churn 279->280 283 gini = 0.278 samples = 5 value = [1, 5] class = Churn 279->283 281 gini = 0.074 samples = 16 value = [1, 25] class = Churn 280->281 282 gini = 0.346 samples = 6 value = [2, 7] class = Churn 280->282 286 Dependents_Yes ≤ 0.075 gini = 0.117 samples = 20 value = [2, 30] class = Churn 285->286 291 gini = 0.388 samples = 13 value = [5, 14] class = Churn 285->291 287 MonthlyCharges ≤ 1.286 gini = 0.208 samples = 10 value = [2, 15] class = Churn 286->287 290 gini = 0.0 samples = 10 value = [0, 15] class = Churn 286->290 288 gini = 0.0 samples = 5 value = [0, 9] class = Churn 287->288 289 gini = 0.375 samples = 5 value = [2, 6] class = Churn 287->289 294 
PaperlessBilling_Yes ≤ 0.0 gini = 0.279 samples = 1316 value = [345, 1717] class = Churn 293->294 501 InternetService_Fiber optic ≤ 0.057 gini = 0.438 samples = 88 value = [96, 46] class = No Churn 293->501 295 DeviceProtection_Yes ≤ 0.989 gini = 0.418 samples = 218 value = [104, 245] class = Churn 294->295 340 MonthlyCharges ≤ 0.312 gini = 0.242 samples = 1098 value = [241, 1472] class = Churn 294->340 296 tenure ≤ -0.889 gini = 0.364 samples = 172 value = [66, 210] class = Churn 295->296 327 Contract_Two year ≤ 0.5 gini = 0.499 samples = 46 value = [38, 35] class = No Churn 295->327 297 MultipleLines_Yes ≤ 0.744 gini = 0.28 samples = 110 value = [32, 158] class = Churn 296->297 314 MonthlyCharges ≤ -0.46 gini = 0.478 samples = 62 value = [34, 52] class = Churn 296->314 298 TotalCharges ≤ -0.991 gini = 0.245 samples = 94 value = [24, 144] class = Churn 297->298 309 TotalCharges ≤ -0.844 gini = 0.463 samples = 16 value = [8, 14] class = Churn 297->309 299 gini = 0.346 samples = 6 value = [7, 2] class = No Churn 298->299 300 TotalCharges ≤ -0.98 gini = 0.191 samples = 88 value = [17, 142] class = Churn 298->300 301 gini = 0.444 samples = 9 value = [6, 12] class = Churn 300->301 302 Dependents_Yes ≤ 0.959 gini = 0.144 samples = 79 value = [11, 130] class = Churn 300->302 303 SeniorCitizen ≤ 0.105 gini = 0.121 samples = 67 value = [8, 116] class = Churn 302->303 306 TotalCharges ≤ -0.917 gini = 0.291 samples = 12 value = [3, 14] class = Churn 302->306 304 gini = 0.136 samples = 58 value = [8, 101] class = Churn 303->304 305 gini = 0.0 samples = 9 value = [0, 15] class = Churn 303->305 307 gini = 0.219 samples = 5 value = [1, 7] class = Churn 306->307 308 gini = 0.346 samples = 7 value = [2, 7] class = Churn 306->308 310 gini = 0.375 samples = 6 value = [6, 2] class = No Churn 309->310 311 TotalCharges ≤ -0.745 gini = 0.245 samples = 10 value = [2, 12] class = Churn 309->311 312 gini = 0.0 samples = 5 value = [0, 8] class = Churn 311->312 313 gini = 0.444 samples = 5 
value = [2, 4] class = Churn 311->313 315 gini = 0.245 samples = 5 value = [6, 1] class = No Churn 314->315 316 TotalCharges ≤ -0.681 gini = 0.458 samples = 57 value = [28, 51] class = Churn 314->316 317 gini = 0.0 samples = 9 value = [0, 11] class = Churn 316->317 318 SeniorCitizen ≤ 0.054 gini = 0.484 samples = 48 value = [28, 40] class = Churn 316->318 319 MonthlyCharges ≤ 0.523 gini = 0.482 samples = 30 value = [25, 17] class = No Churn 318->319 324 OnlineSecurity_Yes ≤ 0.142 gini = 0.204 samples = 18 value = [3, 23] class = Churn 318->324 320 MultipleLines_Yes ≤ 0.45 gini = 0.397 samples = 16 value = [16, 6] class = No Churn 319->320 323 gini = 0.495 samples = 14 value = [9, 11] class = Churn 319->323 321 gini = 0.245 samples = 7 value = [6, 1] class = No Churn 320->321 322 gini = 0.444 samples = 9 value = [10, 5] class = No Churn 320->322 325 gini = 0.278 samples = 9 value = [2, 10] class = Churn 324->325 326 gini = 0.133 samples = 9 value = [1, 13] class = Churn 324->326 328 PaymentMethod_Electronic check ≤ 0.843 gini = 0.494 samples = 39 value = [28, 35] class = Churn 327->328 339 gini = 0.0 samples = 7 value = [10, 0] class = No Churn 327->339 329 gini = 0.0 samples = 5 value = [0, 8] class = Churn 328->329 330 MonthlyCharges ≤ 0.165 gini = 0.5 samples = 34 value = [28, 27] class = No Churn 328->330 331 gini = 0.423 samples = 12 value = [16, 7] class = No Churn 330->331 332 Dependents_Yes ≤ 0.069 gini = 0.469 samples = 22 value = [12, 20] class = Churn 330->332 333 Partner_Yes ≤ 0.5 gini = 0.497 samples = 16 value = [11, 13] class = Churn 332->333 338 gini = 0.219 samples = 6 value = [1, 7] class = Churn 332->338 334 StreamingMovies_Yes ≤ 0.08 gini = 0.43 samples = 11 value = [5, 11] class = Churn 333->334 337 gini = 0.375 samples = 5 value = [6, 2] class = No Churn 333->337 335 gini = 0.408 samples = 5 value = [5, 2] class = No Churn 334->335 336 gini = 0.0 samples = 6 value = [0, 9] class = Churn 334->336 341 TotalCharges ≤ -0.956 gini = 0.313 samples = 
381 value = [112, 465] class = Churn 340->341 418 TechSupport_Yes ≤ 0.974 gini = 0.201 samples = 717 value = [129, 1007] class = Churn 340->418 342 Partner_Yes ≤ 0.924 gini = 0.16 samples = 145 value = [19, 197] class = Churn 341->342 363 TotalCharges ≤ -0.615 gini = 0.383 samples = 236 value = [93, 268] class = Churn 341->363 343 PhoneService_Yes ≤ 0.967 gini = 0.118 samples = 129 value = [12, 179] class = Churn 342->343 358 MonthlyCharges ≤ -0.408 gini = 0.403 samples = 16 value = [7, 18] class = Churn 342->358 344 gini = 0.0 samples = 25 value = [0, 39] class = Churn 343->344 345 MonthlyCharges ≤ -0.67 gini = 0.145 samples = 104 value = [12, 140] class = Churn 343->345 346 gini = 0.32 samples = 14 value = [5, 20] class = Churn 345->346 347 tenure ≤ -1.245 gini = 0.104 samples = 90 value = [7, 120] class = Churn 345->347 348 MonthlyCharges ≤ 0.165 gini = 0.082 samples = 81 value = [5, 112] class = Churn 347->348 357 gini = 0.32 samples = 9 value = [2, 8] class = Churn 347->357 349 gender_Male ≤ 0.051 gini = 0.046 samples = 58 value = [2, 82] class = Churn 348->349 354 MonthlyCharges ≤ 0.177 gini = 0.165 samples = 23 value = [3, 30] class = Churn 348->354 350 TechSupport_Yes ≤ 0.5 gini = 0.133 samples = 20 value = [2, 26] class = Churn 349->350 353 gini = 0.0 samples = 38 value = [0, 56] class = Churn 349->353 351 gini = 0.0 samples = 15 value = [0, 18] class = Churn 350->351 352 gini = 0.32 samples = 5 value = [2, 8] class = Churn 350->352 355 gini = 0.355 samples = 9 value = [3, 10] class = Churn 354->355 356 gini = 0.0 samples = 14 value = [0, 20] class = Churn 354->356 359 gini = 0.5 samples = 6 value = [6, 6] class = No Churn 358->359 360 MonthlyCharges ≤ 0.161 gini = 0.142 samples = 10 value = [1, 12] class = Churn 358->360 361 gini = 0.0 samples = 5 value = [0, 7] class = Churn 360->361 362 gini = 0.278 samples = 5 value = [1, 5] class = Churn 360->362 364 MonthlyCharges ≤ -0.379 gini = 0.346 samples = 166 value = [57, 199] class = Churn 363->364 395 
TotalCharges ≤ -0.197 gini = 0.451 samples = 70 value = [36, 69] class = Churn 363->395 365 OnlineSecurity_Yes ≤ 0.985 gini = 0.203 samples = 81 value = [14, 108] class = Churn 364->365 380 PaymentMethod_Electronic check ≤ 0.929 gini = 0.436 samples = 85 value = [43, 91] class = Churn 364->380 366 OnlineSecurity_Yes ≤ 0.071 gini = 0.133 samples = 74 value = [8, 104] class = Churn 365->366 379 gini = 0.48 samples = 7 value = [6, 4] class = No Churn 365->379 367 StreamingTV_Yes ≤ 0.974 gini = 0.157 samples = 64 value = [8, 85] class = Churn 366->367 378 gini = 0.0 samples = 10 value = [0, 19] class = Churn 366->378 368 OnlineBackup_Yes ≤ 0.008 gini = 0.144 samples = 48 value = [5, 59] class = Churn 367->368 375 DeviceProtection_Yes ≤ 0.048 gini = 0.185 samples = 16 value = [3, 26] class = Churn 367->375 369 TotalCharges ≤ -0.829 gini = 0.16 samples = 42 value = [5, 52] class = Churn 368->369 374 gini = 0.0 samples = 6 value = [0, 7] class = Churn 368->374 370 TotalCharges ≤ -0.917 gini = 0.081 samples = 33 value = [2, 45] class = Churn 369->370 373 gini = 0.42 samples = 9 value = [3, 7] class = Churn 369->373 371 gini = 0.18 samples = 13 value = [2, 18] class = Churn 370->371 372 gini = 0.0 samples = 20 value = [0, 27] class = Churn 370->372 376 gini = 0.305 samples = 8 value = [3, 13] class = Churn 375->376 377 gini = 0.0 samples = 8 value = [0, 13] class = Churn 375->377 381 gini = 0.0 samples = 5 value = [0, 8] class = Churn 380->381 382 MonthlyCharges ≤ 0.295 gini = 0.45 samples = 80 value = [43, 83] class = Churn 380->382 383 TotalCharges ≤ -0.741 gini = 0.43 samples = 75 value = [36, 79] class = Churn 382->383 394 gini = 0.463 samples = 5 value = [7, 4] class = No Churn 382->394 384 StreamingTV_Yes ≤ 0.906 gini = 0.481 samples = 56 value = [35, 52] class = Churn 383->384 391 Partner_Yes ≤ 0.526 gini = 0.069 samples = 19 value = [1, 27] class = Churn 383->391 385 gender_Male ≤ 0.001 gini = 0.435 samples = 47 value = [24, 51] class = Churn 384->385 390 gini = 
0.153 samples = 9 value = [11, 1] class = No Churn 384->390 386 gini = 0.491 samples = 26 value = [19, 25] class = Churn 385->386 387 Partner_Yes ≤ 0.5 gini = 0.271 samples = 21 value = [5, 26] class = Churn 385->387 388 gini = 0.34 samples = 16 value = [5, 18] class = Churn 387->388 389 gini = 0.0 samples = 5 value = [0, 8] class = Churn 387->389 392 gini = 0.0 samples = 10 value = [0, 14] class = Churn 391->392 393 gini = 0.133 samples = 9 value = [1, 13] class = Churn 391->393 396 StreamingMovies_Yes ≤ 0.804 gini = 0.498 samples = 41 value = [28, 32] class = Churn 395->396 409 OnlineBackup_Yes ≤ 0.026 gini = 0.292 samples = 29 value = [8, 37] class = Churn 395->409 397 OnlineSecurity_Yes ≤ 0.168 gini = 0.444 samples = 31 value = [15, 30] class = Churn 396->397 406 MonthlyCharges ≤ -0.45 gini = 0.231 samples = 10 value = [13, 2] class = No Churn 396->406 398 tenure ≤ -0.605 gini = 0.48 samples = 20 value = [12, 18] class = Churn 397->398 403 TotalCharges ≤ -0.457 gini = 0.32 samples = 11 value = [3, 12] class = Churn 397->403 399 gini = 0.0 samples = 6 value = [0, 8] class = Churn 398->399 400 SeniorCitizen ≤ 0.396 gini = 0.496 samples = 14 value = [12, 10] class = No Churn 398->400 401 gini = 0.408 samples = 8 value = [10, 4] class = No Churn 400->401 402 gini = 0.375 samples = 6 value = [2, 6] class = Churn 400->402 404 gini = 0.219 samples = 6 value = [1, 7] class = Churn 403->404 405 gini = 0.408 samples = 5 value = [2, 5] class = Churn 403->405 407 gini = 0.0 samples = 5 value = [10, 0] class = No Churn 406->407 408 gini = 0.48 samples = 5 value = [3, 2] class = No Churn 406->408 410 TotalCharges ≤ 0.225 gini = 0.332 samples = 23 value = [8, 30] class = Churn 409->410 417 gini = 0.0 samples = 6 value = [0, 7] class = Churn 409->417 411 SeniorCitizen ≤ 0.291 gini = 0.18 samples = 12 value = [2, 18] class = Churn 410->411 414 SeniorCitizen ≤ 0.5 gini = 0.444 samples = 11 value = [6, 12] class = Churn 410->414 412 gini = 0.142 samples = 7 value = [1, 12] class 
= Churn 411->412 413 gini = 0.245 samples = 5 value = [1, 6] class = Churn 411->413 415 gini = 0.0 samples = 5 value = [6, 0] class = No Churn 414->415 416 gini = 0.0 samples = 6 value = [0, 12] class = Churn 414->416 419 OnlineSecurity_Yes ≤ 0.981 gini = 0.166 samples = 642 value = [93, 926] class = Churn 418->419 480 MonthlyCharges ≤ 0.856 gini = 0.426 samples = 75 value = [36, 81] class = Churn 418->480 420 Dependents_Yes ≤ 0.974 gini = 0.145 samples = 589 value = [73, 854] class = Churn 419->420 467 tenure ≤ 0.958 gini = 0.34 samples = 53 value = [20, 72] class = Churn 419->467 421 tenure ≤ 0.992 gini = 0.12 samples = 561 value = [56, 819] class = Churn 420->421 460 TotalCharges ≤ 0.296 gini = 0.44 samples = 28 value = [17, 35] class = Churn 420->460 422 Dependents_Yes ≤ 0.019 gini = 0.109 samples = 556 value = [50, 815] class = Churn 421->422 459 gini = 0.48 samples = 5 value = [6, 4] class = No Churn 421->459 423 TotalCharges ≤ -0.719 gini = 0.112 samples = 539 value = [50, 786] class = Churn 422->423 458 gini = 0.0 samples = 17 value = [0, 29] class = Churn 422->458 424 StreamingMovies_Yes ≤ 0.012 gini = 0.05 samples = 271 value = [11, 419] class = Churn 423->424 443 TotalCharges ≤ -0.701 gini = 0.174 samples = 268 value = [39, 367] class = Churn 423->443 425 DeviceProtection_Yes ≤ 0.967 gini = 0.068 samples = 160 value = [9, 246] class = Churn 424->425 436 SeniorCitizen ≤ 0.067 gini = 0.023 samples = 111 value = [2, 173] class = Churn 424->436 426 gender_Male ≤ 0.099 gini = 0.05 samples = 147 value = [6, 230] class = Churn 425->426 433 tenure ≤ -1.131 gini = 0.266 samples = 13 value = [3, 16] class = Churn 425->433 427 tenure ≤ -1.001 gini = 0.073 samples = 69 value = [4, 101] class = Churn 426->427 430 TotalCharges ≤ -0.977 gini = 0.03 samples = 78 value = [2, 129] class = Churn 426->430 428 gini = 0.061 samples = 64 value = [3, 92] class = Churn 427->428 429 gini = 0.18 samples = 5 value = [1, 9] class = Churn 427->429 431 gini = 0.245 samples = 5 value = 
[1, 6] class = Churn 430->431 432 gini = 0.016 samples = 73 value = [1, 123] class = Churn 430->432 434 gini = 0.0 samples = 8 value = [0, 12] class = Churn 433->434 435 gini = 0.49 samples = 5 value = [3, 4] class = Churn 433->435 437 PaperlessBilling_Yes ≤ 0.813 gini = 0.037 samples = 65 value = [2, 103] class = Churn 436->437 442 gini = 0.0 samples = 46 value = [0, 70] class = Churn 436->442 438 gini = 0.0 samples = 6 value = [0, 11] class = Churn 437->438 439 MonthlyCharges ≤ 0.808 gini = 0.042 samples = 59 value = [2, 92] class = Churn 437->439 440 gini = 0.0 samples = 31 value = [0, 53] class = Churn 439->440 441 gini = 0.093 samples = 28 value = [2, 39] class = Churn 439->441 444 gini = 0.48 samples = 7 value = [6, 4] class = No Churn 443->444 445 MultipleLines_Yes ≤ 0.115 gini = 0.153 samples = 261 value = [33, 363] class = Churn 443->445 446 SeniorCitizen ≤ 0.847 gini = 0.308 samples = 44 value = [12, 51] class = Churn 445->446 451 TotalCharges ≤ 0.503 gini = 0.118 samples = 217 value = [21, 312] class = Churn 445->451 447 gini = 0.184 samples = 29 value = [4, 35] class = Churn 446->447 448 tenure ≤ -0.718 gini = 0.444 samples = 15 value = [8, 16] class = Churn 446->448 449 gini = 0.305 samples = 9 value = [3, 13] class = Churn 448->449 450 gini = 0.469 samples = 6 value = [5, 3] class = No Churn 448->450 452 MonthlyCharges ≤ 1.005 gini = 0.098 samples = 171 value = [14, 258] class = Churn 451->452 455 gender_Male ≤ 0.889 gini = 0.203 samples = 46 value = [7, 54] class = Churn 451->455 453 gini = 0.154 samples = 102 value = [14, 153] class = Churn 452->453 454 gini = 0.0 samples = 69 value = [0, 105] class = Churn 452->454 456 gini = 0.175 samples = 23 value = [3, 28] class = Churn 455->456 457 gini = 0.231 samples = 23 value = [4, 26] class = Churn 455->457 461 TotalCharges ≤ -0.138 gini = 0.343 samples = 22 value = [9, 32] class = Churn 460->461 466 gini = 0.397 samples = 6 value = [8, 3] class = No Churn 460->466 462 TotalCharges ≤ -0.339 gini = 0.453 
samples = 14 value = [9, 17] class = Churn 461->462 465 gini = 0.0 samples = 8 value = [0, 15] class = Churn 461->465 463 gini = 0.231 samples = 9 value = [2, 13] class = Churn 462->463 464 gini = 0.463 samples = 5 value = [7, 4] class = No Churn 462->464 468 SeniorCitizen ≤ 0.5 gini = 0.232 samples = 46 value = [11, 71] class = Churn 467->468 479 gini = 0.18 samples = 7 value = [9, 1] class = No Churn 467->479 469 tenure ≤ -0.95 gini = 0.274 samples = 35 value = [10, 51] class = Churn 468->469 476 tenure ≤ -0.429 gini = 0.091 samples = 11 value = [1, 20] class = Churn 468->476 470 MultipleLines_Yes ≤ 0.5 gini = 0.408 samples = 12 value = [6, 15] class = Churn 469->470 473 MonthlyCharges ≤ 0.988 gini = 0.18 samples = 23 value = [4, 36] class = Churn 469->473 471 gini = 0.397 samples = 6 value = [3, 8] class = Churn 470->471 472 gini = 0.42 samples = 6 value = [3, 7] class = Churn 470->472 474 gini = 0.408 samples = 9 value = [4, 10] class = Churn 473->474 475 gini = 0.0 samples = 14 value = [0, 26] class = Churn 473->475 477 gini = 0.0 samples = 5 value = [0, 11] class = Churn 476->477 478 gini = 0.18 samples = 6 value = [1, 9] class = Churn 476->478 481 TotalCharges ≤ -0.094 gini = 0.486 samples = 34 value = [22, 31] class = Churn 480->481 490 gender_Male ≤ 0.857 gini = 0.342 samples = 41 value = [14, 50] class = Churn 480->490 482 MonthlyCharges ≤ 0.594 gini = 0.414 samples = 27 value = [12, 29] class = Churn 481->482 489 gini = 0.278 samples = 7 value = [10, 2] class = No Churn 481->489 483 gender_Male ≤ 0.059 gini = 0.133 samples = 10 value = [1, 13] class = Churn 482->483 486 MonthlyCharges ≤ 0.691 gini = 0.483 samples = 17 value = [11, 16] class = Churn 482->486 484 gini = 0.245 samples = 5 value = [1, 6] class = Churn 483->484 485 gini = 0.0 samples = 5 value = [0, 7] class = Churn 483->485 487 gini = 0.42 samples = 5 value = [7, 3] class = No Churn 486->487 488 gini = 0.36 samples = 12 value = [4, 13] class = Churn 486->488 491 MultipleLines_Yes ≤ 0.196 
gini = 0.219 samples = 21 value = [5, 35] class = Churn 490->491 496 tenure ≤ 0.745 gini = 0.469 samples = 20 value = [9, 15] class = Churn 490->496 492 gini = 0.337 samples = 7 value = [3, 11] class = Churn 491->492 493 MonthlyCharges ≤ 1.331 gini = 0.142 samples = 14 value = [2, 24] class = Churn 491->493 494 gini = 0.0 samples = 9 value = [0, 18] class = Churn 493->494 495 gini = 0.375 samples = 5 value = [2, 6] class = Churn 493->495 497 Partner_Yes ≤ 0.407 gini = 0.36 samples = 14 value = [4, 13] class = Churn 496->497 500 gini = 0.408 samples = 6 value = [5, 2] class = No Churn 496->500 498 gini = 0.444 samples = 6 value = [2, 4] class = Churn 497->498 499 gini = 0.298 samples = 8 value = [2, 9] class = Churn 497->499 502 StreamingMovies_Yes ≤ 0.5 gini = 0.255 samples = 38 value = [51, 9] class = No Churn 501->502 513 OnlineSecurity_Yes ≤ 0.943 gini = 0.495 samples = 50 value = [45, 37] class = No Churn 501->513 503 MonthlyCharges ≤ -0.176 gini = 0.266 samples = 15 value = [16, 3] class = No Churn 502->503 508 tenure ≤ 0.591 gini = 0.25 samples = 23 value = [35, 6] class = No Churn 502->508 504 Partner_Yes ≤ 0.5 gini = 0.133 samples = 10 value = [13, 1] class = No Churn 503->504 507 gini = 0.48 samples = 5 value = [3, 2] class = No Churn 503->507 505 gini = 0.32 samples = 5 value = [4, 1] class = No Churn 504->505 506 gini = 0.0 samples = 5 value = [9, 0] class = No Churn 504->506 509 StreamingTV_Yes ≤ 0.5 gini = 0.175 samples = 17 value = [28, 3] class = No Churn 508->509 512 gini = 0.42 samples = 6 value = [7, 3] class = No Churn 508->512 510 gini = 0.0 samples = 6 value = [9, 0] class = No Churn 509->510 511 gini = 0.236 samples = 11 value = [19, 3] class = No Churn 509->511 514 MonthlyCharges ≤ 1.195 gini = 0.499 samples = 40 value = [32, 35] class = Churn 513->514 523 gini = 0.231 samples = 10 value = [13, 2] class = No Churn 513->523 515 Dependents_Yes ≤ 0.057 gini = 0.416 samples = 24 value = [31, 13] class = No Churn 514->515 520 tenure ≤ 0.904 gini = 
[Figure: rendered decision tree (graphviz output). Each node shows the split rule, gini impurity, sample count, per-class value counts, and predicted class (Churn / No Churn).]
In [39]:
# hyperparameter grid for GradientBoostingClassifier
parameter_grd = {
    'max_depth': [int(x) for x in np.linspace(5, 50, num=5)],
    'min_samples_leaf': [int(x) for x in np.linspace(5, 105, num=5)],
    'max_features': ['auto', 'sqrt'],  # note: newer scikit-learn releases no longer accept 'auto'
    'learning_rate': [0.001, 0.01, 0.1, 1, 10, 100]  # boosting learning rate (shrinkage)
}
In [40]:
%%time
algorithm_grad_boosting = GradientBoostingClassifier(random_state=1)

model_gbm_boosting = train_evaluate_model(algorithm_grad_boosting, parameter_grd, cv, train_features, train_labels, test_features, test_labels, label, pos_label)
best parameter:  {'learning_rate': 0.1, 'max_depth': 50, 'max_features': 'sqrt', 'min_samples_leaf': 5}

Classification acurracy: 0.7727056585829767

Confusion Matrix:
 [[1318  250]
 [ 228  307]]

classification_report:
               precision    recall  f1-score   support

          No       0.85      0.84      0.85      1568
         Yes       0.55      0.57      0.56       535

   micro avg       0.77      0.77      0.77      2103
   macro avg       0.70      0.71      0.70      2103
weighted avg       0.78      0.77      0.77      2103


Confusion Matrix from yellobrick:

CPU times: user 2h 4min 56s, sys: 19.7 s, total: 2h 5min 16s
Wall time: 2h 6min 15s
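
Much of the two-hour wall time above is probably explained by the size of the grid, assuming the train_evaluate_model helper runs an exhaustive grid search. As a rough sketch, the snippet below counts the combinations in parameter_grd using only the standard library. The list values are the integers that the np.linspace expressions evaluate to, and n_folds is a hypothetical number of cross-validation splits, since the cv object is defined elsewhere in the notebook.

```python
from itertools import product

# Values the np.linspace expressions in parameter_grd evaluate to after int()
grid = {
    "max_depth": [5, 16, 27, 38, 50],
    "min_samples_leaf": [5, 30, 55, 80, 105],
    "max_features": ["auto", "sqrt"],
    "learning_rate": [0.001, 0.01, 0.1, 1, 10, 100],
}

# Every combination an exhaustive grid search would try
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
n_combinations = len(combinations)

n_folds = 5  # hypothetical: the real value depends on the notebook's cv object
n_fits = n_combinations * n_folds

print(f"{n_combinations} combinations x {n_folds} folds = {n_fits} model fits")
```

Dropping values that are unlikely to help, such as learning rates of 10 or 100, would shrink the grid by a third before any model is fitted.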
In [43]:
# save the model 
save_model(model_gbm_boosting, 'final_gbm_boosting')
Out[43]:
'Model saved'
In [44]:
# plot the 15 most important features
feat_import_gbm = pd.Series(model_gbm_boosting.feature_importances_, index=X_train.columns)
feat_import_gbm.nlargest(15).plot(kind='barh', figsize=(10, 9)).invert_yaxis()  # most important first
plt.show()
In [183]:
# hyperparameter grid for XGBClassifier
parameter_xgboost = {
    'max_depth': [int(x) for x in np.linspace(5, 50, num=5)],
    # note: num_leaves is a LightGBM parameter name; the XGBoost counterpart is max_leaves
    'num_leaves': [int(x) for x in np.linspace(5, 105, num=5)],
    #'max_features':['auto', 'sqrt'],
    #'booster': ['gbtree'],
    'learning_rate': [0.001, 0.01, 0.1, 1, 10],
    'reg_lambda': [0, 0.1, 1, 5, 10, 20, 50]  # L2 regularization term on weights
}
In [161]:
#tune parameters and show performance on test set
algorithm_xgboost = XGBClassifier(random_state=1)

model_xgboost = train_evaluate_model(algorithm_xgboost, parameter_xgboost, cv, train_features, train_labels, test_features, test_labels, label, pos_label)
Classification acurracy: 0.776034236804565

Confusion Matrix:
 [[1295  220]
 [ 251  337]]

classification_report:
               precision    recall  f1-score   support

          No       0.84      0.85      0.85      1515
         Yes       0.61      0.57      0.59       588

   micro avg       0.78      0.78      0.78      2103
   macro avg       0.72      0.71      0.72      2103
weighted avg       0.77      0.78      0.77      2103


Confusion Matrix from yellobrick:

CPU times: user 4.73 s, sys: 35.7 ms, total: 4.76 s
Wall time: 4.91 s
In [194]:
# plot the 15 most important features
feat_import_xgb = pd.Series(model_xgboost.feature_importances_, index=X_train.columns)
feat_import_xgb.nlargest(15).plot(kind='barh', figsize=(10, 9)).invert_yaxis()  # most important first
plt.show()
In [165]:
# save the model 
save_model(model_xgboost, 'final_model_xgboost')
Out[165]:
'Model saved'
In [41]:
# hyperparameter grid for LGBMClassifier
parameter_lightgbm = {
    'n_estimators': [int(x) for x in np.linspace(start=50, stop=1000, num=5)],  # number of boosted trees to fit
    'max_depth': [int(x) for x in np.linspace(5, 50, num=5)],
    'num_leaves': [int(x) for x in np.linspace(5, 105, num=5)],
    #'max_features':['auto', 'sqrt'],
    'learning_rate': [0.001, 0.01, 0.1, 1, 10],
    #'booster': ['gbtree'],
    'reg_lambda': [0, 0.1, 1, 5, 10, 20, 50]  # L2 regularization term on weights
}
In [82]:
%%time
algorithm_lightgbm = LGBMClassifier(random_state=1)

model_lgbm = train_evaluate_model(algorithm_lightgbm, parameter_lightgbm, cv, train_features, train_labels, test_features, test_labels, label, pos_label)
best parameter:  {'n_estimators': 287, 'num_leaves': 55}

Classification acurracy: 0.7703281027104137

Confusion Matrix:
 [[1301  238]
 [ 245  319]]

classification_report:
               precision    recall  f1-score   support

          No       0.84      0.85      0.84      1539
         Yes       0.57      0.57      0.57       564

   micro avg       0.77      0.77      0.77      2103
   macro avg       0.71      0.71      0.71      2103
weighted avg       0.77      0.77      0.77      2103


Confusion Matrix from yellobrick:

CPU times: user 1h 2min 4s, sys: 5min 54s, total: 1h 7min 58s
Wall time: 25min 6s
In [92]:
#save model
save_model(model_lgbm, 'final_model_lgbm')
Out[92]:
'Model saved'
In [94]:
# plot the 15 most important features
feat_import_lgbm = pd.Series(model_lgbm.feature_importances_, index=X_train.columns)
feat_import_lgbm.nlargest(15).plot(kind='barh', figsize=(10, 9)).invert_yaxis()  # most important first
plt.show()

5. Conclusion

Compare model performance

In [43]:
#dict to store all metrics
model_metric = {"Model": [],
                "Accuracy_score": [],
                "Auc_score": [],
                "Recall_score" : [],
                "Precision" : [],
                "f1_score" : []
                }



# compare the performance of each tuned model on the test set
def compare_alg(model_name, model, test_feat, test_label):
    """
    Evaluate a fitted model on the test set and append its metrics to model_metric.

    model_name = algorithm name,
    model = fitted model (with best parameters),
    test_feat = test dataset (features),
    test_label = test labels
    """
    # predict on the test data
    predict_val = model.predict(test_feat)
    accuracy = accuracy_score(test_label, predict_val)  # accuracy score
    # AUC needs numeric labels; note it is computed from hard predictions, not probabilities
    roc_auc = roc_auc_score(np.where(test_label == 'Yes', 1, 0), np.where(predict_val == 'Yes', 1, 0))
    rec_score = recall_score(test_label, predict_val, pos_label='Yes')  # recall score
    precision = precision_score(test_label, predict_val, pos_label='Yes')  # precision score
    f1score = f1_score(test_label, predict_val, pos_label='Yes')  # f1 score

    # append values to dict
    model_metric["Model"].append(model_name)
    model_metric["Accuracy_score"].append(accuracy)
    model_metric["Auc_score"].append(roc_auc)
    model_metric["Recall_score"].append(rec_score)
    model_metric["Precision"].append(precision)
    model_metric["f1_score"].append(f1score)

    print('metric appended')
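
compare_alg feeds hard class predictions to roc_auc_score, which collapses the ROC curve to a single operating point. The sketch below, on hypothetical toy data rather than the churn dataset, shows how much that can change the score. It uses a small hand-written pairwise AUC (the helper name pairwise_auc is illustrative) so that it runs without any third-party library.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the churn dataset
y_true = [0, 0, 1, 1, 1]
churn_probability = [0.2, 0.6, 0.4, 0.7, 0.9]

# Thresholding at 0.5 mimics what model.predict() returns
hard_labels = [1 if p >= 0.5 else 0 for p in churn_probability]

auc_from_probabilities = pairwise_auc(y_true, churn_probability)
auc_from_labels = pairwise_auc(y_true, hard_labels)

print(round(auc_from_probabilities, 4))  # 0.8333
print(round(auc_from_labels, 4))         # 0.5833
```

In the notebook, the analogous change would be to score something like model.predict_proba(test_feat)[:, 1] instead of the predicted labels, assuming every model in the comparison exposes predict_proba.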
    
    
    
  
In [51]:
# call compare_alg() for every tuned model
model_list = [model_lr, model_dt, model_rf, model_gbm_boosting, model_xgboost, model_lgbm]
model_name = ['Logistic Regression', 'Decision Tree', 'Random Forest', 'GradientBoosting', 'XGBoost Classifier', 'LightGBM classifier']

for model, name in zip(model_list, model_name):
    compare_alg(name, model, test_features, test_labels)

print('All metric appended')
metric appended
metric appended
metric appended
metric appended
metric appended
metric appended
All metric appended
In [53]:
# show the metrics in a pandas DataFrame
metric_df = pd.DataFrame(model_metric) # change into pandas dataframe
metric_df
Out[53]:
                 Model  Accuracy_score  Auc_score  Recall_score  Precision  f1_score
0  Logistic Regression        0.745126   0.760036      0.791741   0.512195  0.622003
1        Decision Tree        0.762720   0.740993      0.694794   0.540503  0.608013
2        Random Forest        0.785544   0.757666      0.698384   0.578869  0.633035
3     GradientBoosting        0.772706   0.701845      0.551167   0.573832  0.562271
4   XGBoost Classifier        0.776034   0.721336      0.605027   0.573129  0.588646
5  LightGBM classifier        0.770328   0.707119      0.572711   0.565603  0.569135
In [55]:
"""== using plotly to create table== """
init_notebook_mode(connected=True)  # initiate the Plotly Notebook mode
table = ff.create_table(metric_df.round(3))  # create table
iplot(table)
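
To make the table's columns concrete, here is a minimal sketch that recomputes recall, precision, F1 and accuracy for Gradient Boosting from four raw counts. The counts come from the Gradient Boosting confusion matrix printed earlier, oriented so that they give 557 actual churners and 535 predicted churners; that orientation is an assumption, chosen because it matches the GradientBoosting row of the metrics table.

```python
# Counts for Gradient Boosting, read from its confusion matrix (orientation assumed)
tp, fn = 307, 250   # actual churners: predicted Churn / predicted No Churn
fp, tn = 228, 1318  # actual non-churners: predicted Churn / predicted No Churn

recall = tp / (tp + fn)             # share of actual churners that were caught
precision = tp / (tp + fp)          # share of predicted churners that really churned
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"recall={recall:.6f} precision={precision:.6f} f1={f1:.6f} accuracy={accuracy:.6f}")
```

Recall is the more relevant figure when the aim is to reach as many likely churners as possible, while precision matters more when each retention offer is costly.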

Conclusion:

In this section, I look at which models perform best at predicting the positive (Churn) class, so the metrics of interest are recall and precision.

Surprisingly, Logistic Regression, Decision Tree and Random Forest did very well on recall (0.79, 0.69 and 0.70), while Random Forest, Gradient Boosting and XGBoost have the highest precision (0.58, 0.57 and 0.57). Logistic Regression has the lowest precision (0.51).

In addition, Random Forest did best in overall classification, with the highest accuracy (0.786) and F1 score (0.633); its AUC score (0.758) is a close second to Logistic Regression's (0.760).
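
The rankings can be checked directly against the metrics table. The sketch below copies the table values into a plain dictionary and reports the leading model for each metric; the dictionary layout and the best_model helper are illustrative names, not part of the notebook.

```python
# Metric values copied from the comparison table
metrics = {
    "Logistic Regression": {"accuracy": 0.745126, "auc": 0.760036, "recall": 0.791741, "precision": 0.512195, "f1": 0.622003},
    "Decision Tree":       {"accuracy": 0.762720, "auc": 0.740993, "recall": 0.694794, "precision": 0.540503, "f1": 0.608013},
    "Random Forest":       {"accuracy": 0.785544, "auc": 0.757666, "recall": 0.698384, "precision": 0.578869, "f1": 0.633035},
    "GradientBoosting":    {"accuracy": 0.772706, "auc": 0.701845, "recall": 0.551167, "precision": 0.573832, "f1": 0.562271},
    "XGBoost Classifier":  {"accuracy": 0.776034, "auc": 0.721336, "recall": 0.605027, "precision": 0.573129, "f1": 0.588646},
    "LightGBM classifier": {"accuracy": 0.770328, "auc": 0.707119, "recall": 0.572711, "precision": 0.565603, "f1": 0.569135},
}


def best_model(metric_name):
    """Return the model with the highest value for the given metric."""
    return max(metrics, key=lambda name: metrics[name][metric_name])


for metric_name in ("accuracy", "auc", "recall", "precision", "f1"):
    print(f"{metric_name:>9}: {best_model(metric_name)}")
```

Which model counts as "best" depends on the metric chosen, so the retention use case should decide whether recall or precision is given more weight.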