Life Expectancy Prediction

Table Of Contents:
(1) What Is The Business Use Case?
(2) Importing Required Libraries
(3) Loading Input Data
(4) Feature Descriptions
(5) Exploratory Data Analysis
(6) Data Cleaning
(7) Data Visualization
(8) Data Preprocessing
(9) Building ANN Model

(1) What Is The Business Use Case?

The goal is to predict the life expectancy of a country in a given year from health, economic, and demographic indicators, using an artificial neural network (ANN) as a regression model.

(2) Importing Required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px  # interactive plots used throughout
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import plot_model

(3) Loading Input Data

df = pd.read_csv('Life Expectancy Data.csv')
df.head()

df.columns

(4) Feature Descriptions:

Feature	Description
Country	Country name (data collected from the WHO data repository website)
Year	Year of the observation (the dataset spans 2000 to 2015)
Status	Status of the country: Developing or Developed
Life expectancy	Life expectancy in years (the prediction target)
Adult Mortality	Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population)
infant deaths	Number of Infant Deaths per 1000 population
Alcohol	Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol)
percentage expenditure	Expenditure on health as a percentage of Gross Domestic Product per capita(%)
Hepatitis B	Hepatitis B (HepB) immunization coverage among 1-year-olds (%)
Measles	Measles – number of reported cases per 1000 population
BMI	Average Body Mass Index of entire population
under-five deaths	Number of under-five deaths per 1000 population
Polio	Polio (Pol3) immunization coverage among 1-year-olds (%)
Total expenditure	General government expenditure on health as a percentage of total government expenditure (%)
Diphtheria	Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)
HIV/AIDS	Deaths per 1 000 live births HIV/AIDS (0-4 years)
GDP	Gross Domestic Product per capita (in USD)
Population	Population of the country
thinness 1-19 years	Prevalence of thinness among children and adolescents for Age 10 to 19 (%)
thinness 5-9 years	Prevalence of thinness among children for Age 5 to 9 (%)
Income composition of resources	Human Development Index in terms of income composition of resources (index ranging from 0 to 1)
Schooling	Number of years of schooling (years)

(5) Exploratory Data Analysis:

Data Frame Shape:

print("Number of Rows:",df.shape[0])
print("Number of Features:",df.shape[1])

Data Frame Info:

df.info()

Missing Value Counts:

df.isnull().sum()
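
Raw missing counts are easier to judge next to the share of rows they represent. A minimal sketch on a small made-up frame (`toy` and its values are invented for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; the values are invented for illustration.
toy = pd.DataFrame({
    "GDP": [1.0, np.nan, 3.0, np.nan],
    "Schooling": [10.0, 11.0, np.nan, 12.0],
    "Year": [2000, 2001, 2002, 2003],
})

# Missing count and percentage per column, largest share first.
missing = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": toy.isnull().mean() * 100,
}).sort_values("missing_pct", ascending=False)

print(missing)
```

The same two expressions can be applied to `df` directly.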

Selecting Categorical and Numerical Columns:

numeric_columns = df.select_dtypes(include = ['float64', 'int64']).columns
categorical_columns = df.select_dtypes(include = ['object']).columns

print('Numeric Columns:', numeric_columns)
print('Categorical Columns:', categorical_columns)

(6) Data Cleaning:

Missing Value Treatment:

df.isnull().sum()

# Replace missing values in each listed column with that column's mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

cols_to_impute = [
    'Life expectancy ', 'Adult Mortality', 'Alcohol', 'Hepatitis B', ' BMI ',
    'Polio', 'Total expenditure', 'Diphtheria ', 'GDP', 'Population',
    ' thinness  1-19 years', ' thinness 5-9 years',
    'Income composition of resources', 'Schooling'
]

# fit_transform expects 2D input, so pass a list of columns rather than a single Series
df[cols_to_impute] = imputer.fit_transform(df[cols_to_impute])

df.isnull().sum()
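
The imputation step replaces every missing value with the mean of its column. As a rough sketch of what that does, the toy frame below (invented values) applies the same rule with pandas `fillna`, which for plain mean imputation should give the same numbers as `SimpleImputer(strategy='mean')`:

```python
import numpy as np
import pandas as pd

# Toy frame with one gap per column; values are invented for illustration.
toy = pd.DataFrame({
    "Alcohol": [2.0, np.nan, 4.0],
    "GDP": [100.0, 300.0, np.nan],
})

# toy.mean() skips NaN by default; numeric_only guards against text columns.
toy_filled = toy.fillna(toy.mean(numeric_only=True))

print(toy_filled)
```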

Detecting Outliers:

# Draw one box plot per numeric column; the categorical columns are skipped
for column in numeric_columns:
    fig = px.box(df, y=column, title=f'Box Plot for {column}')
    
    # Update layout to center the title and make it bold
    fig.update_layout(
        title=dict(text=f'<b>Box Plot for {column}</b>', x=0.5),
        boxmode='group'  
    )
    
    fig.show()

Dealing With Outliers:

# Specify the list of columns you want to handle outliers for
cols_to_handle_outliers = [
    'Adult Mortality', 'infant deaths', 'Alcohol', 'percentage expenditure',
    'Hepatitis B', 'Measles ', ' BMI ', 'under-five deaths ', 'Polio',
    'Total expenditure', 'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population',
    ' thinness  1-19 years', ' thinness 5-9 years',
    'Income composition of resources', 'Schooling'
]

# Perform outlier handling for each specified column
for col_name in cols_to_handle_outliers:
    # Calculate quartiles and IQR
    q1 = df[col_name].quantile(0.25)
    q3 = df[col_name].quantile(0.75)
    iqr = q3 - q1

    # Define the lower and upper bounds for outliers
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    # Replace outliers with the mean value of the column
    df[col_name] = np.where((df[col_name] > upper_bound) | (df[col_name] < lower_bound), np.mean(df[col_name]), df[col_name])
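
To make the IQR rule concrete, here is a small worked example on an invented series. It also shows one caveat: the column mean is computed with the outliers still included, so the replacement value can itself lie outside the bounds. Clipping to the bounds is shown as one possible alternative; it is an assumption of this sketch, not something the original steps use.

```python
import pandas as pd

# Invented values; 100 is the obvious outlier.
s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 100.0])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (s < lower) | (s > upper)

# Same rule as above: replace outliers with the column mean.
mean_replaced = s.where(~is_outlier, s.mean())

# Alternative (assumption): cap values at the bounds instead.
clipped = s.clip(lower, upper)

print(lower, upper)
print(mean_replaced.tolist())
print(clipped.tolist())
```

Here the mean of the series is pulled up by the value 100, so the replaced value still exceeds `upper`, while clipping keeps it at the bound.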

Checking Outliers After Outlier Treatment:

# Redraw the box plots for the numeric columns to confirm the effect
for column in numeric_columns:
    fig = px.box(df, y=column, title=f'Box Plot for {column}')
    
    # Update layout to center the title and make it bold
    fig.update_layout(
        title=dict(text=f'<b>Box Plot for {column}</b>', x=0.5),
        boxmode='group'  
    )
    
    fig.show()

(7) Data Visualization:

Count Plot For Year:

fig = px.histogram(df, 
                   x = 'Year', 
                   color = 'Year',
                   text_auto = '0.2s',
                   title = '<b>Count Plot For Year</b>'
                  )
fig.update_layout(
               title_x = 0.5
                 )
fig.show()

Trend Of Life Expectancy Over Years:

fig = px.line(df.sort_values(by='Year'), x='Year', y='Life expectancy ',
              markers = True, 
              animation_frame='Country',
              color='Country', 
              symbol = 'Country', 
              title='Trend of Life Expectancy Over the Years')

#update layout to center the title and make it bold
fig.update_layout(
    title=dict(text='<b>Trend of Life Expectancy Over the Years</b>', x=0.5)
)

fig.show()

Count Plot For Status Of Country:

fig = px.histogram(df, 
                   x = 'Status', 
                   color = 'Status', 
                   text_auto = '0.2s',
                   title = '<b>Count Plot For The Status Of The Country</b>')
fig.update_layout(
                 title_x = 0.5
)
fig.show()

Life Expectancy Of Developing Countries:

# Keep only the rows for developing countries
developing_df = df[df['Status'] == 'Developing']
fig = px.histogram(developing_df, 
                   x = 'Life expectancy ', 
                   text_auto='.2s',
                   title = 'Life Expectancy Of Developing Countries')
fig.update_layout(
    xaxis_title='Life Expectancy',
    yaxis_title='Count',
    title_text='<b>Life Expectancy of Developing Countries</b>',
    title_x=0.5,  # Center title
)
fig.show()

Life Expectancy Of Developed Countries:

# Keep only the rows for developed countries
developed_nation = df[df['Status'] == 'Developed']
fig = px.histogram(developed_nation, 
                   x = 'Life expectancy ', 
                   text_auto='.2s',
                   title = 'Life Expectancy Of Developed Countries')
fig.update_layout(
    xaxis_title='Life Expectancy',
    yaxis_title='Count',
    title_text='<b>Life Expectancy of Developed Countries</b>',
    title_x=0.5,  # Center title
)

fig.show()

Average Adult Mortality Of Developing & Developed Countries:

df_adult_mortality = df.groupby('Status', as_index=False).agg({'Adult Mortality':'mean'})
df_adult_mortality

fig = px.bar(df_adult_mortality,
            x = 'Status',
            y = 'Adult Mortality',
            color = 'Status',
             text_auto='.2s',
            title = 'Average Adult Mortality of Developing and Developed Countries'
            )
fig.update_layout(
    title_text = '<b>Average Adult Mortality of Developing and Developed Countries</b>',
    title_x=0.5
)
fig.show()

Average Infant Deaths Of Developing & Developed Countries:

df_avg_infant_death = df.groupby('Status', as_index=False).agg({'infant deaths':'mean'})
df_avg_infant_death

fig = px.bar(df_avg_infant_death, 
             x = 'Status', 
             y = 'infant deaths', 
             color = 'Status',
             text_auto='.2s')

fig.update_layout(
                  title_text = '<b>Average Infant deaths of Developing and Developed Countries</b>',
                  title_x = 0.5
                 )
fig.show()

Average Alcohol Consumption:

df_avg_alcohol_consumpt = df.groupby('Status', as_index = False).agg({'Alcohol': 'mean'})
df_avg_alcohol_consumpt

fig = px.bar(df_avg_alcohol_consumpt, 
             x = 'Status', 
             y = 'Alcohol',
             text_auto = '0.2s'
            )
fig.update_layout(
    title_text = '<b>Average Alcohol Consumption</b>',
    title_x = 0.5
)
fig.show()

Life Expectancy vs Adult Mortality:

fig = px.scatter(df.sort_values(by = 'Year'), 
                 x = 'Life expectancy ', 
                 y = 'Adult Mortality',  
                 color = 'Country', 
                 size = 'Year',
                animation_frame = 'Country')
fig.update_layout(
    title_text = '<b>Life Expectancy vs Adult Mortality</b>',
    title_x = 0.5)

fig.show()

Life Expectancy vs Infant Deaths For Countries Over Years:

fig = px.scatter(df.sort_values(by='Year'), 
                 x = 'Life expectancy ', 
                 y = 'infant deaths',
                color = 'Country',
                size = 'Year')
fig.update_layout(title_text='<b>Life expectancy vs Infant deaths for Countries over Years</b>', 
                  title_x=0.5
                 )
fig.show()

Correlation Matrix Of Numeric Columns:

correlation_matrix = df[numeric_columns].corr()
plt.figure(figsize = (20, 20))
sns.heatmap(correlation_matrix, annot = True)
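
A full heatmap over this many columns can be hard to read. One way to summarize it is to pull out only the correlations with the target and sort them by strength. A minimal sketch on an invented toy frame (the column names mirror the dataset, the numbers do not):

```python
import pandas as pd

# Invented values, chosen so the correlations are easy to verify by eye.
toy = pd.DataFrame({
    "Life expectancy ": [50.0, 60.0, 70.0, 80.0],
    "Schooling": [5.0, 8.0, 11.0, 14.0],
    "Adult Mortality": [400.0, 300.0, 200.0, 100.0],
    "Population": [3.0, 1.0, 4.0, 2.0],
})

target = "Life expectancy "

# Correlation of every feature with the target, strongest (by absolute value) first.
target_corr = (
    toy.corr()[target]
    .drop(target)
    .sort_values(key=abs, ascending=False)
)

print(target_corr)
```

On the real data this would be `df[numeric_columns].corr()['Life expectancy ']`.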

(8) Data Preprocessing:

Handling Categorical Features:

‘Country’
‘Status’

df['Country'].unique()

df['Status'].unique()

# Columns to apply label encoding
cols_to_encode = ['Country', 'Status']

# Apply label encoding to df; the encoder is refitted for each column
label_encoder_df = LabelEncoder()
for col in cols_to_encode:
    df[col] = label_encoder_df.fit_transform(df[col])

df['Country'].unique()

df['Status'].unique()
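
`LabelEncoder` assigns integer codes in sorted label order, so a nominal column ends up with a numeric ordering that has no real meaning. That is harmless for the two-valued `Status`, but for a column with many categories it may be worth comparing against one-hot encoding. The sketch below contrasts the two on an invented frame; using `pd.get_dummies` here is a suggestion of this example, not part of the original workflow.

```python
import pandas as pd

toy = pd.DataFrame({"Status": ["Developing", "Developed", "Developing"]})

# Label-style codes: categories are sorted, so Developed -> 0, Developing -> 1.
codes = toy["Status"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category, with no implied order.
one_hot = pd.get_dummies(toy["Status"], prefix="Status", dtype=int)

print(codes.tolist())
print(one_hot)
```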

Splitting Features From Target:

X = df.drop('Life expectancy ', axis=1)
y = df['Life expectancy ']

Data Scaling:

# Columns to scale
cols_to_scale = ['Country', 'Year', 'Adult Mortality',
       'infant deaths', 'Alcohol', 'percentage expenditure', 'Hepatitis B',
       'Measles ', ' BMI ', 'under-five deaths ', 'Polio', 'Total expenditure',
       'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population',
       ' thinness  1-19 years', ' thinness 5-9 years',
       'Income composition of resources', 'Schooling']

# Apply Min-Max scaling to the specified columns
scaler = MinMaxScaler()
X[cols_to_scale] = scaler.fit_transform(X[cols_to_scale])

Splitting Data Into Train & Test Split:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Shape of X_train is: {X_train.shape}")
print(f"Shape of Y_train is: {y_train.shape}\n")
print(f"Shape of X_test is: {X_test.shape}")
print(f"Shape of Y_test is: {y_test.shape}")
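
In the steps above the scaler is fitted on all rows before the split, so the test rows influence the minimum and maximum used for scaling. A common alternative is to split first and fit the scaler on the training rows only. A minimal sketch on random toy data (`X_toy`, `y_toy`, and the other names are invented for this example):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Random toy data standing in for X and y.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 100, size=(50, 3))
y_toy = rng.uniform(40, 90, size=50)

# Split first ...
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# ... then learn min/max from the training rows only and reuse them for the test rows.
scaler = MinMaxScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
```

With this order, scaled test values can fall slightly outside the 0 to 1 range, which is expected.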

(9) Building ANN Model.

Model Structure:

model = Sequential([
        Dense(64, activation='relu', input_dim=21),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(1, activation='linear')
])

Model Compiling:

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_absolute_error','mean_squared_error'])

Model Summary:

model.summary()
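
The parameter counts reported by `model.summary()` can be checked by hand: a Dense layer with n inputs and m units has n * m weights plus m biases. A short sketch for the architecture above (21 inputs, three hidden layers of 64 units, one output); `dense_params` is a helper name made up for this example:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Input size followed by the units of each Dense layer.
layer_sizes = [21, 64, 64, 64, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [1408, 4160, 4160, 65]
print(total)      # 9793
```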

Model Visualization:

# Plot the model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

Model Fitting:

history = model.fit(X_train, y_train, epochs=150, validation_split=0.2)

Train Vs Validation Loss:

# Define needed variables
tr_loss = history.history['loss']
val_loss = history.history['val_loss']
index_loss = np.argmin(val_loss)
val_lowest = val_loss[index_loss]

Epochs = [i+1 for i in range(len(tr_loss))]
loss_label = f'best epoch= {str(index_loss + 1)}'

# Plot training history
plt.figure(figsize= (20, 8))
plt.style.use('fivethirtyeight')

plt.plot(Epochs, tr_loss, 'r', label= 'Training loss')
plt.plot(Epochs, val_loss, 'g', label= 'Validation loss')
plt.scatter(index_loss + 1, val_lowest, s= 150, c= 'blue', label= loss_label)
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()
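
The plot marks the epoch with the lowest validation loss, but training still runs for all 150 epochs. Keras provides an `EarlyStopping` callback (with `monitor` and `patience` arguments) that can stop training once the monitored loss stops improving. The pure-Python function below is only a simplified illustration of that patience rule, not the Keras implementation; the function name and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without a new
    lowest validation loss; otherwise it runs through every epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Invented validation losses: a late improvement appears at epoch 7.
val_losses = [9.0, 6.0, 5.0, 5.2, 5.1, 5.3, 4.9, 5.0]

print(early_stopping_epoch(val_losses, patience=3))  # (6, 3)
print(early_stopping_epoch(val_losses, patience=5))  # (8, 7)
```

A small patience saves time but, as in this toy run, can stop before a later improvement arrives.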

#metrics=['mean_absolute_error','mean_squared_error']

mae = history.history['mean_absolute_error']

acc_loss_df = pd.DataFrame({"Mean Absolute error" : mae,
                            "Loss" : tr_loss,
                            "Epoch" : Epochs})

acc_loss_df.style.bar(color = '#84A9AC',
                      subset = ['Mean Absolute error','Loss'])

Prediction:

y_pred = model.predict(X_test)

R2 Score:

R2 = r2_score(y_test, y_pred)
print("R2 Score=",R2 )
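
The R2 score compares the model's squared error with that of always predicting the mean: R2 = 1 - SS_res / SS_tot. Since the model is compiled with MAE and MSE metrics, it can help to report those on the test set in the target's own units as well. A minimal NumPy sketch with invented numbers; `model.predict` returns a column-shaped array, so the helper flattens both inputs before subtracting to avoid accidental broadcasting.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return (r2, mae, rmse) for 1D or column-shaped inputs."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_true - y_pred
    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    return r2, mae, rmse

# Invented values; y_hat has shape (3, 1) like a Keras prediction.
y_true_toy = [60.0, 70.0, 80.0]
y_hat_toy = [[62.0], [69.0], [79.0]]

r2, mae, rmse = regression_report(y_true_toy, y_hat_toy)
print(r2, mae, rmse)
```

For these numbers SS_res is 6 and SS_tot is 200, giving an R2 of 0.97.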
