Life Expectancy Prediction – ANN!



Table Of Contents:

  1. What Is The Business Use Case?
  2. Importing Required Libraries
  3. Loading Input Data
  4. Feature Descriptions
  5. Exploratory Data Analysis
  6. Data Cleaning
  7. Data Visualization
  8. Data Preprocessing
  9. Building ANN Model

(1) What Is The Business Use Case?

  • This use case is about predicting the ‘Life Expectancy’ of a person in a given country with an artificial neural network (ANN) model. Because the target is a continuous value, this is a regression problem.

(2) Importing Required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import plot_model

(3) Loading Input Data

df = pd.read_csv('Life Expectancy Data.csv')
df.head()
df.columns

(4) Feature Descriptions:

Feature | Description
Country | Countries collected from the WHO data repository website
Year | Year of the observation (2000-2013)
Status | Status of the country: Developing or Developed
Life expectancy | Life expectancy in years (the target variable)
Adult Mortality | Adult mortality rate of both sexes (probability of dying between 15 and 60 years per 1000 population)
infant deaths | Number of infant deaths per 1000 population
Alcohol | Recorded per capita (15+) alcohol consumption (in litres of pure alcohol)
percentage expenditure | Expenditure on health as a percentage of Gross Domestic Product per capita (%)
Hepatitis B | Hepatitis B (HepB) immunization coverage among 1-year-olds (%)
Measles | Number of reported measles cases per 1000 population
BMI | Average Body Mass Index of the entire population
under-five deaths | Number of under-five deaths per 1000 population
Polio | Polio (Pol3) immunization coverage among 1-year-olds (%)
Total expenditure | General government expenditure on health as a percentage of total government expenditure (%)
Diphtheria | Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)
HIV/AIDS | Deaths per 1000 live births from HIV/AIDS (0-4 years)
GDP | Gross Domestic Product per capita (in USD)
Population | Population of the country
thinness 1-19 years | Prevalence of thinness among children and adolescents aged 10 to 19 (%)
thinness 5-9 years | Prevalence of thinness among children aged 5 to 9 (%)
Income composition of resources | Human Development Index in terms of income composition of resources (index ranging from 0 to 1)
Schooling | Number of years of schooling (years)

(5) Exploratory Data Analysis:

Data Frame Shape:

print("Number of Rows:",df.shape[0])
print("Number of Features:",df.shape[1])

Data Frame Info:

df.info()

Missing Value Counts:

df.isnull().sum()

Selecting Categorical and Numerical Columns:

numeric_columns = df.select_dtypes(include = ['float64', 'int64']).columns
categorical_columns = df.select_dtypes(include = ['object']).columns
print('Numeric Columns:', numeric_columns)
print('Categorical Columns:', categorical_columns)

(6) Data Cleaning:

Missing Value Treatment:

# Missing values per column before imputation
df.isnull().sum()

# Numeric columns that contain missing values
cols_to_impute = [
    'Life expectancy ', 'Adult Mortality', 'Alcohol', 'Hepatitis B', ' BMI ',
    'Polio', 'Total expenditure', 'Diphtheria ', 'GDP', 'Population',
    ' thinness  1-19 years', ' thinness 5-9 years',
    'Income composition of resources', 'Schooling'
]

# Replace each missing value with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
df[cols_to_impute] = imputer.fit_transform(df[cols_to_impute])

# Confirm that no missing values remain
df.isnull().sum()
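
To make the effect of mean imputation concrete, here is a minimal sketch on a toy frame. The values are made up, and pandas' `fillna` is used as a stand-in for the imputer; for a plain column mean the result should be the same.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for two numeric columns of the dataset (made-up values)
toy = pd.DataFrame({
    "Alcohol": [1.0, np.nan, 3.0],
    "GDP": [100.0, 300.0, np.nan],
})

# Column means are computed while skipping NaN, then used to fill the gaps
col_means = toy.mean()
imputed = toy.fillna(col_means)

print(imputed)
```

Each gap is filled with the average of the observed values in the same column, so the column mean is unchanged while its variance shrinks.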

Detecting Outliers:

# Draw one box plot per numeric column to spot outliers
for column in numeric_columns:
    fig = px.box(df, y=column)

    # Center the title and make it bold
    fig.update_layout(
        title=dict(text=f'<b>Box Plot for {column}</b>', x=0.5)
    )

    fig.show()

Dealing With Outliers:

# Specify the list of columns you want to handle outliers for
cols_to_handle_outliers = [
    'Adult Mortality', 'infant deaths', 'Alcohol', 'percentage expenditure',
    'Hepatitis B', 'Measles ', ' BMI ', 'under-five deaths ', 'Polio',
    'Total expenditure', 'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population',
    ' thinness  1-19 years', ' thinness 5-9 years',
    'Income composition of resources', 'Schooling'
]

# Perform outlier handling for each specified column
for col_name in cols_to_handle_outliers:
    # Calculate quartiles and IQR
    q1 = df[col_name].quantile(0.25)
    q3 = df[col_name].quantile(0.75)
    iqr = q3 - q1

    # Define the lower and upper bounds for outliers
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    # Replace outliers with the column mean (computed with the outliers still included)
    is_outlier = (df[col_name] > upper_bound) | (df[col_name] < lower_bound)
    df[col_name] = np.where(is_outlier, df[col_name].mean(), df[col_name])
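
As a worked example of the IQR rule used above, the sketch below applies the same bounds to a small made-up series. The `inlier_mean` variant is an assumption added for comparison; it is not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Made-up values; 100.0 is the obvious outlier
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 100.0])

# Quartiles, IQR and the 1.5 * IQR fences
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

is_outlier = (values > upper_bound) | (values < lower_bound)

# Same replacement as in the loop above: the mean still includes the outlier
replaced_with_mean = np.where(is_outlier, values.mean(), values)

# Hypothetical alternative: the mean of the non-outlying values only
inlier_mean = values[~is_outlier].mean()
replaced_with_inlier_mean = np.where(is_outlier, inlier_mean, values)

print(lower_bound, upper_bound)
print(replaced_with_mean)
print(replaced_with_inlier_mean)
```

In this toy case the fences are 9.0 and 15.0. The full-column mean (about 26.3) lies outside them, so the replaced value would still be flagged on a second pass. That may explain why some points remain in the box plots after treatment.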

Checking Outliers After Treatment:

# Redraw the box plots to confirm the effect of the outlier replacement
for column in numeric_columns:
    fig = px.box(df, y=column)

    # Center the title and make it bold
    fig.update_layout(
        title=dict(text=f'<b>Box Plot for {column}</b>', x=0.5)
    )

    fig.show()

(7) Data Visualization:

Count Plot For Year:

fig = px.histogram(df, 
                   x = 'Year', 
                   color = 'Year',
                   text_auto = '0.2s',
                   title = '<b>Count Plot For Year</b>'
                  )
fig.update_layout(
               title_x = 0.5
                 )
fig.show()

Trend Of Life Expectancy Over Years:

fig = px.line(df.sort_values(by='Year'), x='Year', y='Life expectancy ',
              markers = True, 
              animation_frame='Country',
              color='Country', 
              symbol = 'Country', 
              title='Trend of Life Expectancy Over the Years')

#update layout to center the title and make it bold
fig.update_layout(
    title=dict(text='<b>Trend of Life Expectancy Over the Years</b>', x=0.5)
)

fig.show()

Count Plot For Status Of Country:

fig = px.histogram(df, 
                   x = 'Status', 
                   color = 'Status', 
                   text_auto = '0.2s',
                   title = '<b>Count Plot For The Status Of The Country</b>')
fig.update_layout(
                 title_x = 0.5
)
fig.show()

Life Expectancy Of Developing Countries:

# Keep only the rows for developing countries
developing_df = df[df['Status'] == 'Developing']
fig = px.histogram(developing_df, 
                   x = 'Life expectancy ', 
                   text_auto='.2s',
                   title = 'Life Expectancy Of Developing Nation')
fig.update_layout(
    xaxis_title='Life Expectancy',
    yaxis_title='Count',
    title_text='<b>Life Expectancy of Developing Countries</b>',
    title_x=0.5,  # Center title
)
fig.show()

Life Expectancy Of Developed Nation:

developed_nation = df[df['Status'] == 'Developed']
fig = px.histogram(developed_nation, 
                   x = 'Life expectancy ', 
                   text_auto='.2s',
                   title = 'Life Expectancy Of Developed Nation')
fig.update_layout(
    xaxis_title='Life Expectancy',
    yaxis_title='Count',
    title_text='<b>Life Expectancy of Developed Countries</b>',
    title_x=0.5,  # Center title
)

fig.show()

Average Adult Mortality Of Developing & Developed Countries:

df_adult_mortality = df.groupby('Status', as_index=False).agg({'Adult Mortality':'mean'})
df_adult_mortality
fig = px.bar(df_adult_mortality,
            x = 'Status',
            y = 'Adult Mortality',
            color = 'Status',
             text_auto='.2s',
            title = 'Average Adult Mortality of Developing and Developed Countries'
            )
fig.update_layout(
    title_text = '<b>Average Adult Mortality of Developing and Developed Countries</b>',
    title_x=0.5
)
fig.show()

Average Infant Deaths Of Developing & Developed Countries:

df_avg_infant_death = df.groupby('Status', as_index=False).agg({'infant deaths':'mean'})
df_avg_infant_death
fig = px.bar(df_avg_infant_death, 
             x = 'Status', 
             y = 'infant deaths', 
             color = 'Status',
             text_auto='.2s')

fig.update_layout(
                  title_text = '<b>Average Infant deaths of Developing and Developed Countries</b>',
                  title_x = 0.5
                 )
fig.show()

Average Alcohol Consumption:

df_avg_alcohol_consumpt = df.groupby('Status', as_index = False).agg({'Alcohol': 'mean'})
df_avg_alcohol_consumpt
fig = px.bar(df_avg_alcohol_consumpt, 
             x = 'Status', 
             y = 'Alcohol',
             text_auto = '0.2s'
            )
fig.update_layout(
    title_text = '<b>Average Alcohol Consumption</b>',
    title_x = 0.5
)
fig.show()

Life Expectancy vs Adult Mortality:

fig = px.scatter(df.sort_values(by = 'Year'), 
                 x = 'Life expectancy ', 
                 y = 'Adult Mortality',  
                 color = 'Country', 
                 size = 'Year',
                animation_frame = 'Country')
fig.update_layout(
    title_text = '<b>Life Expectancy vs Adult Mortality</b>',
    title_x = 0.5)

fig.show()

Life Expectancy vs Infant Deaths For Countries Over Years:

fig = px.scatter(df.sort_values(by='Year'), 
                 x = 'Life expectancy ', 
                 y = 'infant deaths',
                color = 'Country',
                size = 'Year')
fig.update_layout(title_text='<b>Life expectancy vs Infant deaths for Countries over Years</b>', 
                  title_x=0.5
                 )
fig.show()

Correlation Matrix Of Numeric Columns:

# Pairwise Pearson correlation between the numeric columns
correlation_matrix = df[numeric_columns].corr()
plt.figure(figsize = (20, 20))
sns.heatmap(correlation_matrix, annot = True)
plt.show()

(8) Data Preprocessing:

Handling Categorical Features:

The dataset has two categorical columns:

  • ‘Country’
  • ‘Status’

# Inspect the distinct values before encoding
df['Country'].unique()
df['Status'].unique()

# Columns to apply label encoding to
cols_to_encode = ['Country', 'Status']

# Label-encode each categorical column of df in place
label_encoder_df = LabelEncoder()
for col in cols_to_encode:
    df[col] = label_encoder_df.fit_transform(df[col])

# Inspect the encoded values
df['Country'].unique()
df['Status'].unique()
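
Label encoding assigns integers to string labels in sorted order. The sketch below shows the resulting mapping on a toy column. It uses `pd.Categorical` as a stand-in, on the assumption that its default sorted categories give the same codes as the encoder for plain string labels.

```python
import pandas as pd

# Toy stand-in for the 'Status' column
status = pd.Series(["Developing", "Developed", "Developing"])

# Categories are sorted alphabetically, then each label gets its position
cat = pd.Categorical(status)
codes = cat.codes
mapping = dict(enumerate(cat.categories))

print(list(codes))
print(mapping)
```

Because the loop above refits a single encoder for each column, the encoder object only remembers the mapping of the last column ('Status') afterwards. Keeping one encoder per column would be needed to decode 'Country' later.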

Splitting Features From Target:

X = df.drop('Life expectancy ', axis=1)
y = df['Life expectancy ']
X

Data Scaling:

# Columns to scale
cols_to_scale = ['Country', 'Year', 'Adult Mortality',
       'infant deaths', 'Alcohol', 'percentage expenditure', 'Hepatitis B',
       'Measles ', ' BMI ', 'under-five deaths ', 'Polio', 'Total expenditure',
       'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population',
       ' thinness  1-19 years', ' thinness 5-9 years',
       'Income composition of resources', 'Schooling']

# Apply Min-Max scaling to the specified columns
scaler = MinMaxScaler()
X[cols_to_scale] = scaler.fit_transform(X[cols_to_scale])
X
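
The scaling above is fitted on all rows before the train/test split, so the test rows influence the minimum and maximum used for scaling. A common alternative, sketched here with plain NumPy on made-up numbers, is to compute the scaling statistics on the training rows only and reuse them for the test rows. This is an illustrative assumption, not what the original workflow does.

```python
import numpy as np

# Made-up single-feature data, already split into train and test rows
train = np.array([[1.0], [3.0], [5.0]])
test = np.array([[2.0], [7.0]])

# Min and max are taken from the training rows only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# Guard against constant columns to avoid division by zero
span = np.where(col_max > col_min, col_max - col_min, 1.0)

train_scaled = (train - col_min) / span
test_scaled = (test - col_min) / span

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Test values can then fall outside the range 0 to 1 (here 7.0 maps to 1.5), which is expected when the test set contains values not seen during training.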

Splitting Data Into Train & Test Split:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Shape of X_train is: {X_train.shape}")
print(f"Shape of Y_train is: {y_train.shape}\n")
print(f"Shape of X_test is: {X_test.shape}")
print(f"Shape of Y_test is: {y_test.shape}")

(9) Building ANN Model.

Model Structure:

model = Sequential([
        Dense(64, activation='relu', input_dim=21),
        Dense(64, activation='relu'),
        Dense(64, activation='relu'),
        Dense(1, activation='linear')
])

Model Compiling:

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_absolute_error','mean_squared_error'])

Model Summary:

model.summary()
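
The parameter counts in the summary can be checked by hand: a Dense layer with n inputs and m units has n × m weights plus m biases. A minimal sketch for the layer sizes defined above (the helper name `dense_params` is hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# 21 input features, three hidden layers of 64 units, one output unit
layer_sizes = [21, 64, 64, 64, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [1408, 4160, 4160, 65]
print(total)      # 9793
```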

Model Visualization:

# Plot the model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

Model Fitting:

history = model.fit(X_train, y_train, epochs=150, validation_split=0.2)

Train Vs Validation Loss:

# Define needed variables
tr_loss = history.history['loss']
val_loss = history.history['val_loss']
index_loss = np.argmin(val_loss)
val_lowest = val_loss[index_loss]

Epochs = [i+1 for i in range(len(tr_loss))]
loss_label = f'best epoch= {str(index_loss + 1)}'

# Plot training history
plt.figure(figsize= (20, 8))
plt.style.use('fivethirtyeight')

plt.plot(Epochs, tr_loss, 'r', label= 'Training loss')
plt.plot(Epochs, val_loss, 'g', label= 'Validation loss')
plt.scatter(index_loss + 1, val_lowest, s= 150, c= 'blue', label= loss_label)
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()
# Per-epoch training metrics, as requested via metrics=['mean_absolute_error','mean_squared_error'] in model.compile

mae = history.history['mean_absolute_error']

acc_loss_df = pd.DataFrame({"Mean Absolute error" : mae,
                            "Loss" : tr_loss,
                            "Epoch" : Epochs})

acc_loss_df.style.bar(color = '#84A9AC',
                      subset = ['Mean Absolute error','Loss'])

Prediction:

y_pred = model.predict(X_test)

R2 Score:

R2 = r2_score(y_test, y_pred)
print("R2 Score=",R2 )
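
For reference, the R2 score is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up numbers, together with MAE and RMSE. The predictions are given shape (n, 1) to mimic the output of `model.predict`, which is why they are flattened first. Without that step, NumPy would broadcast the subtraction to an (n, n) matrix.

```python
import numpy as np

# Made-up targets and predictions; predictions have shape (3, 1)
y_true = np.array([60.0, 70.0, 80.0])
y_hat = np.array([[62.0], [69.0], [79.0]])

# Flatten to shape (3,) so the subtraction is element-wise
y_hat_flat = y_hat.ravel()
errors = y_true - y_hat_flat

ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot

mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

print(r2, mae, rmse)
```

MAE and RMSE are expressed in years of life expectancy, which makes them easier to interpret alongside the unitless R2.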
