Types of Linear Regression

There are two main types of linear regression, simple and multiple, along with two additional regularized variants, Lasso and Ridge. Let's take a look at them.

1. Simple Linear Regression

Used when there is only one independent variable. For example, predicting a person's weight based on their height.

This is the simplest form of linear regression, and it involves only one independent variable and one dependent variable. The equation for simple linear regression is: $y = \beta_0 + \beta_1 x$

where:

  • $y$ is the dependent variable

  • $x$ is the independent variable

  • $\beta_0$ is the intercept

  • $\beta_1$ is the slope

Example:

Predicting house price based on house size.
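As a quick illustration, here is a minimal sketch of simple linear regression with scikit-learn; the house sizes and prices below are made-up placeholder values, not real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house size in square feet (X) and price (y)
X = np.array([[750], [900], [1100], [1400], [1800]])   # one independent variable
y = np.array([150_000, 180_000, 215_000, 270_000, 340_000])

model = LinearRegression()
model.fit(X, y)

print("Intercept (beta_0):", model.intercept_)
print("Slope (beta_1):", model.coef_[0])
print("Predicted price for 1200 sq ft:", model.predict([[1200]])[0])
```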

2. Multiple Linear Regression

Involves two or more independent variables, such as predicting house prices based on factors like area, number of rooms, and location.

This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

where:

  • $y$ is the dependent variable

  • $x_1, x_2, \dots, x_n$ are the independent variables

  • $\beta_0$ is the intercept

  • $\beta_1, \beta_2, \dots, \beta_n$ are the slopes

Example:

Predicting a car's MPG (miles per gallon) based on horsepower and weight.

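The same API extends naturally to multiple features. Here is a minimal sketch assuming a small made-up dataset of horsepower and weight values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [horsepower, weight in lbs] -> MPG
X = np.array([
    [130, 3500],
    [95,  2400],
    [165, 3700],
    [75,  2100],
    [110, 2800],
])
y = np.array([18.0, 28.5, 16.0, 33.0, 24.0])

model = LinearRegression()
model.fit(X, y)

print("Intercept (beta_0):", model.intercept_)
print("Slopes (beta_1, beta_2):", model.coef_)
print("Predicted MPG for 120 hp, 3000 lbs:", model.predict([[120, 3000]])[0])
```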

3. Regularized Linear Regression

Techniques like Lasso (Least Absolute Shrinkage and Selection Operator) and Ridge Regression add penalties for large coefficients, reducing overfitting and improving model generalization.

Regularization introduces a penalty term to the linear regression cost function, helping prevent overfitting by constraining large coefficients.

1. Lasso Regression (Least Absolute Shrinkage and Selection Operator)

In Lasso regression, the penalty added is proportional to the absolute value of each coefficient. The cost function to minimize is:

$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$ where:

  • $J(\beta)$ is the cost function,

  • $y_i$ is the actual value,

  • $\hat{y}_i$ is the predicted value,

  • $\lambda$ is the regularization parameter controlling the penalty's strength,

  • $\beta_j$ represents the coefficients of each feature.

The L1-norm penalty $\sum_{j=1}^{p} |\beta_j|$ often results in some coefficients being reduced to zero, effectively performing feature selection.
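To see this feature-selection effect in practice, here is a minimal sketch using scikit-learn's Lasso, whose alpha parameter plays the role of $\lambda$; the data is synthetic, and only a few of the generated features actually influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: 10 features, but only the first 3 affect the target
X = rng.normal(size=(200, 10))
true_coefs = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

# alpha corresponds to the regularization parameter lambda in the cost function
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

print("Estimated coefficients:", np.round(lasso.coef_, 3))
# Coefficients of the irrelevant features are driven to (or very near) zero
```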

2. Ridge Regression

Ridge regression uses an L2-norm penalty, where the sum of the squared coefficients is added to the cost function:

$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$ Here:

$\sum_{j=1}^{p} \beta_j^2$ is the L2-norm penalty, which penalizes large coefficients more smoothly than Lasso.

This penalty tends to reduce the impact of all coefficients but doesn’t force any to zero, making Ridge regression useful when all features are considered potentially valuable.
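For comparison, here is a sketch of Ridge regression on the same kind of synthetic data; unlike Lasso, the coefficients shrink toward zero but generally remain nonzero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Same synthetic setup: 10 features, only the first 3 matter
X = rng.normal(size=(200, 10))
true_coefs = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

# alpha is the L2 penalty strength (lambda); larger values shrink coefficients more
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

print("Estimated coefficients:", np.round(ridge.coef_, 3))
# Coefficients are shrunk but typically stay nonzero, unlike Lasso
```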

These formulas are essential for understanding how regularized linear regression adjusts the model's complexity and enhances its ability to generalize across diverse datasets.
