Linear Regression – Assumption – 5 (How To Detect & Avoid Autocorrelation In Regression?)



Table Of Contents:

  1. Methods To Detect Autocorrelation In The Error Term
  2. Methods To Avoid Autocorrelation In The Error Term

(1) Methods To Detect Autocorrelation In The Error Term

  1. Residual Plot (vs. time or observation order)
  2. Durbin-Watson Test
  3. Autocorrelation Function (ACF) Plot
  4. Ljung-Box Test (for multiple lags)

(1.1) Residual Plot To Detect Autocorrelation In The Error Term

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Simulate ordered data (e.g., time series)
np.random.seed(42)
n = 100
advertising = np.random.normal(1000, 200, n)

# Introduce autocorrelation in error terms
errors = np.zeros(n)
rho = 0.8  # autocorrelation factor
errors[0] = np.random.normal(0, 10)
for i in range(1, n):
    errors[i] = rho * errors[i - 1] + np.random.normal(0, 10)

# Generate sales (dependent variable)
sales = 50 + 0.5 * advertising + errors

# Fit a linear regression model
X = sm.add_constant(advertising)
model = sm.OLS(sales, X).fit()
residuals = model.resid

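# Plot residuals in observation order; long runs above or below zero
# (instead of random scatter) suggest positive autocorrelation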
plt.figure(figsize=(10, 5))
plt.plot(residuals, marker='o')
plt.title("Residual Plot Over Time")
plt.xlabel("Month (Observation Order)")
plt.ylabel("Residual")
plt.axhline(0, linestyle="--", color="gray")
plt.grid(True)
plt.show()
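
The plot is a visual check only. As a rough numeric companion, the sketch below correlates each residual with the one before it: a value near 0 is what independent errors give, while a value well above 0 matches the long runs seen in the plot. The helper name `lag1_correlation` is hypothetical, and the AR(1) errors are re-simulated here (same rho = 0.8 as above, but a different random generator) so the block runs on its own.

```python
import numpy as np

def lag1_correlation(residuals):
    """Correlation between each residual and its immediate predecessor."""
    r = np.asarray(residuals, dtype=float)
    return np.corrcoef(r[:-1], r[1:])[0, 1]

# Re-create AR(1) errors like the ones simulated above (rho = 0.8)
rng = np.random.default_rng(42)
n = 100
ar_errors = np.zeros(n)
for i in range(1, n):
    ar_errors[i] = 0.8 * ar_errors[i - 1] + rng.normal(0, 10)

# Independent errors for comparison
iid_errors = rng.normal(0, 10, n)

print(f"Lag-1 correlation, AR(1) errors:       {lag1_correlation(ar_errors):.2f}")
print(f"Lag-1 correlation, independent errors: {lag1_correlation(iid_errors):.2f}")
```

In practice you would pass `model.resid` from the fitted regression instead of the simulated arrays.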

(1.2) Durbin-Watson Test To Detect Autocorrelation In The Error Term

import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
import seaborn as sns

# Load an example time series: seaborn's "flights" dataset holds monthly
# airline passenger counts (load_dataset needs internet access unless cached)
df = sns.load_dataset("flights")
df = df.rename(columns={"passengers": "sales"})  # treat passengers as "sales" for this example
df["month_id"] = range(len(df))  # 0, 1, 2, ... in observation order
X = sm.add_constant(df["month_id"])
y = df["sales"]

# Fit linear regression model
model = sm.OLS(y, X).fit()

# Get residuals
residuals = model.resid

# Durbin-Watson statistic (ranges from 0 to 4; values near 2 indicate
# no first-order autocorrelation)
dw_stat = durbin_watson(residuals)
print(f"Durbin-Watson Statistic: {dw_stat:.4f}")
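
To make the number easier to read, it helps to see how it is built. The statistic is DW = sum of (e_t - e_{t-1})^2 divided by the sum of e_t^2, which is roughly 2 * (1 - r), where r is the lag-1 correlation of the residuals. So DW near 2 means little first-order autocorrelation, values toward 0 mean positive autocorrelation, and values toward 4 mean negative autocorrelation. The sketch below computes it by hand with NumPy. The function names and the 1.5 / 2.5 cutoffs are assumptions for illustration (a common rule of thumb, not the formal test with tabulated lower and upper bounds).

```python
import numpy as np

def durbin_watson_manual(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def describe_dw(dw, low=1.5, high=2.5):
    """Rule-of-thumb reading of the statistic (cutoffs are an assumption)."""
    if dw < low:
        return "likely positive autocorrelation"
    if dw > high:
        return "likely negative autocorrelation"
    return "little evidence of first-order autocorrelation"

# Slowly drifting residuals: neighbours are similar, so DW is small
drifting = [1.0, 1.2, 1.1, 0.9, -0.8, -1.0, -1.1, -0.9]
# Alternating residuals: neighbours have opposite signs, so DW is large
alternating = [1.0, -1.0, 1.0, -1.0]

for name, e in [("drifting", drifting), ("alternating", alternating)]:
    dw = durbin_watson_manual(e)
    print(f"{name}: DW = {dw:.2f} -> {describe_dw(dw)}")
```

`durbin_watson` from statsmodels uses the same formula, so `durbin_watson_manual(model.resid)` should agree with the value printed above.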

(1.3) Autocorrelation Function (ACF) Plot

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Set seed for reproducibility
np.random.seed(42)

# Create autocorrelated residuals (AR(1) process)
n = 100
residuals = [0]
for t in range(1, n):
    residuals.append(0.8 * residuals[t-1] + np.random.normal())

residuals = np.array(residuals)

# Plot residuals
plt.figure(figsize=(10, 4))
plt.plot(residuals, label='Residuals')
plt.title("Autocorrelated Residuals")
plt.xlabel("Time")
plt.ylabel("Residual")
plt.grid(True)
plt.legend()
plt.show()

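# Plot ACF; bars that extend beyond the shaded confidence band indicate
# statistically significant autocorrelation at that lag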
plot_acf(residuals, lags=20)
plt.title("ACF Plot of Residuals")
plt.show()

(2) Methods To Avoid Autocorrelation In The Error Term
