Multiple Linear Regression is a statistical technique that models the relationship between a dependent variable and multiple independent variables by fitting a linear equation to observed data. This method is commonly used for predictive modeling, where the aim is to predict the value of the dependent variable based on values of independent variables.
The general equation of multiple linear regression is:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n + \epsilon$$
where:
$Y$ is the dependent variable (target).
$X_1, X_2, \ldots, X_n$ are the independent variables (features).
$b_0, b_1, \ldots, b_n$ are the coefficients (weights) of the model.
$\epsilon$ is the error term.
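Once the coefficients are known, a prediction is just the intercept plus a weighted sum of the features. The plain-PHP sketch below illustrates that arithmetic; the function name `predictLinear` and the coefficient values are hypothetical and chosen only for illustration, not fitted to any data.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate b0 + b1*X1 + ... + bn*Xn for one sample (the error term is omitted,
 * since it is unknown at prediction time).
 */
function predictLinear(float $intercept, array $coefficients, array $features): float
{
    if (count($coefficients) !== count($features)) {
        throw new InvalidArgumentException('Need exactly one coefficient per feature.');
    }

    $prediction = $intercept;
    foreach ($coefficients as $i => $coefficient) {
        $prediction += $coefficient * $features[$i];
    }

    return $prediction;
}

// Hypothetical coefficients for [rooms, square footage, miles to city center].
$intercept = 50000.0;
$coefficients = [20000.0, 150.0, -5000.0];

// 50000 + 3*20000 + 1500*150 + 5*(-5000) = 310000
echo predictLinear($intercept, $coefficients, [3, 1500, 5]), PHP_EOL;
```

Training a regression model amounts to finding the intercept and coefficients that make such predictions as close as possible to the observed targets.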
In PHP, we can implement multiple linear regression using two popular libraries: RubixML and PHP-ML. Let's dive into each.
Implementing Multiple Linear Regression with Rubix ML
Example: Predicting House Prices
Let's say we want to predict house prices based on the following features:
Number of rooms
Square footage
Distance to the nearest city center
Step 1: Prepare the Data
    use Rubix\ML\Datasets\Labeled;
    use Rubix\ML\Datasets\Unlabeled;
    use Rubix\ML\Regressors\Ridge;

    // Sample data: [rooms, size, miles to city center] => price
    $samples = [
        [3, 1500, 5],  // 3 rooms, 1500 sqft, 5 miles to city center
        [4, 2000, 3],
        [2, 800, 10],
        [5, 2500, 1],
        [3, 1600, 4],
    ];

    // House prices
    $labels = [300000, 500000, 200000, 750000, 350000];

    // Create a labeled dataset from the samples and their prices
    $dataset = new Labeled($samples, $labels);
Step 2: Initialize the Model
RubixML offers several regression algorithms. For this example, we'll use Ridge Regression, a form of linear regression that adds an L2 penalty on the coefficients, which keeps the fit stable under multicollinearity (when features are correlated with each other).
    // Alpha parameter for regularization
    $estimator = new Ridge(1e-3);
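For reference, the objective that ridge regression minimizes can be written as below. This is the textbook formulation, not a statement about RubixML's internals; the symbols $m$ (number of training samples), $x_{ij}$ (feature $j$ of sample $i$), and $y_i$ (target of sample $i$) are notation introduced here, and leaving the intercept $b_0$ out of the penalty is the usual convention, assumed rather than verified for this library.

```latex
$$
\hat{b} = \arg\min_{b} \; \sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2 \;+\; \alpha \sum_{j=1}^{n} b_j^2
$$
```

With $\alpha = 0$ the penalty vanishes and the objective reduces to ordinary least squares; larger values of $\alpha$ shrink the coefficients toward zero.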
Step 3: Train the Model
$estimator->train($dataset);
Step 4: Make Predictions
Now, we can make predictions on new data points.
    // Create new samples for prediction
    // Important: Each sample must be its own array within the main array
    $newSamples = [
        [4, 1800, 3],  // First house
        [2, 1200, 8],  // Second house
    ];

    // Create Unlabeled dataset for prediction
    $newDataset = new Unlabeled($newSamples);

    // Make predictions
    $predictions = $estimator->predict($newDataset);

    // Print predictions
    echo "Predictions for new houses:\n";
    foreach ($predictions as $index => $prediction) {
        echo sprintf("House %d: $%s\n", $index + 1, number_format($prediction, 2));
    }
Full Code:
    <?php

    // Assumes the dependencies were installed with Composer in this directory
    require_once __DIR__ . '/vendor/autoload.php';

    use Rubix\ML\Datasets\Labeled;
    use Rubix\ML\Datasets\Unlabeled;
    use Rubix\ML\Regressors\Ridge;
    use Rubix\ML\CrossValidation\Metrics\MeanAbsoluteError;
    use Rubix\ML\CrossValidation\Metrics\MeanSquaredError;

    // Sample data: [rooms, size, miles to city center] => price
    $samples = [
        [3, 1500, 5],  // 3 rooms, 1500 sqft, 5 miles to city center
        [4, 2000, 3],
        [2, 800, 10],
        [5, 2500, 1],
        [3, 1600, 4],
    ];

    // House prices
    $labels = [300000, 500000, 200000, 750000, 350000];

    // Create a labeled dataset from the samples and their prices
    $dataset = new Labeled($samples, $labels);

    // Alpha parameter for regularization
    $estimator = new Ridge(1e-3);
    $estimator->train($dataset);

    // Create new samples for prediction
    // Important: Each sample must be its own array within the main array
    $newSamples = [
        [4, 1800, 3],  // First house
        [2, 1200, 8],  // Second house
    ];

    // Create Unlabeled dataset for prediction
    $newDataset = new Unlabeled($newSamples);

    // Make predictions
    $predictions = $estimator->predict($newDataset);

    // Print predictions
    echo "Predictions for new houses:\n";
    foreach ($predictions as $index => $prediction) {
        echo sprintf("House %d: $%s\n", $index + 1, number_format($prediction, 2));
    }

    // Calculate error metrics against the actual values
    $actualValues = [450000, 280000];

    echo "\n\nMetrics:";

    // Rubix ML reports error metrics as negative scores (higher is better),
    // so abs() is used to print them as positive errors
    $mseMetric = new MeanSquaredError();
    $score = $mseMetric->score($predictions, $actualValues);
    echo "\nMean Squared Error: $" . number_format(abs($score), 2);
    echo "\nRoot Mean Squared Error: $" . number_format(sqrt(abs($score)), 2);

    $maeMetric = new MeanAbsoluteError();
    $score = $maeMetric->score($predictions, $actualValues);
    echo "\nMean Absolute Error: $" . number_format(abs($score), 2);
Result:
Predictions for new houses:
House 1: $577,025.89
House 2: $351,275.14
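The full listing also scores the predictions with mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). As a rough sketch of what those metrics compute, the plain-PHP functions below implement the textbook definitions; the function names are hypothetical and this is not the library's own code. The inputs reuse the predictions printed above and the actual prices from the full listing.

```php
<?php
declare(strict_types=1);

/** Average of the squared differences between predictions and actual values. */
function meanSquaredError(array $predictions, array $actuals): float
{
    $sum = 0.0;
    foreach ($predictions as $i => $prediction) {
        $sum += ($prediction - $actuals[$i]) ** 2;
    }

    return $sum / count($predictions);
}

/** Average of the absolute differences between predictions and actual values. */
function meanAbsoluteError(array $predictions, array $actuals): float
{
    $sum = 0.0;
    foreach ($predictions as $i => $prediction) {
        $sum += abs($prediction - $actuals[$i]);
    }

    return $sum / count($predictions);
}

// Predictions printed in the Result above, and the actual prices from the full listing.
$predictions = [577025.89, 351275.14];
$actualValues = [450000, 280000];

$mse = meanSquaredError($predictions, $actualValues);

echo 'MSE:  ', number_format($mse, 2), PHP_EOL;
echo 'RMSE: ', number_format(sqrt($mse), 2), PHP_EOL; // back in the units of the target (dollars)
echo 'MAE:  ', number_format(meanAbsoluteError($predictions, $actualValues), 2), PHP_EOL;
```

RMSE and MAE are expressed in the same units as the target, which makes them easier to interpret than MSE. MSE and RMSE weight large errors more heavily than MAE does.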
Implementing Multiple Linear Regression with PHP-ML
Example: Predicting House Prices
Similar to our example with RubixML, we'll predict house prices based on rooms, square footage, and distance to the city center.
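Step 1: Prepare the Data
    use Phpml\Regression\LeastSquares;

    // Sample data: [rooms, size, miles to city center]
    $samples = [
        [3, 1500, 5],
        [4, 2000, 3],
        [2, 800, 10],
        [5, 2500, 1],
        [3, 1600, 4],
    ];

    // House prices
    $labels = [300000, 500000, 200000, 750000, 350000];
Step 2: Initialize the Model
PHP-ML provides a Least Squares regression algorithm, which fits a linear model by minimizing the sum of squared residuals.
    $regressor = new LeastSquares();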
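To make "minimizing the sum of squared residuals" concrete, the sketch below fits the coefficients in plain PHP by solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination. It is an illustration under simplifying assumptions, not PHP-ML's actual implementation; the function names `solveLinearSystem` and `fitLeastSquares` are hypothetical, and the toy data is generated from y = 1 + 2·x1 + 3·x2 so the result is easy to check.

```php
<?php
declare(strict_types=1);

/** Solve the square system A x = b by Gaussian elimination with partial pivoting. */
function solveLinearSystem(array $a, array $b): array
{
    $n = count($b);

    for ($col = 0; $col < $n; $col++) {
        // Pick the row with the largest entry in this column as the pivot.
        $pivot = $col;
        for ($row = $col + 1; $row < $n; $row++) {
            if (abs($a[$row][$col]) > abs($a[$pivot][$col])) {
                $pivot = $row;
            }
        }
        if (abs($a[$pivot][$col]) < 1e-12) {
            throw new RuntimeException('Singular system: features may be perfectly collinear.');
        }
        [$a[$col], $a[$pivot]] = [$a[$pivot], $a[$col]];
        [$b[$col], $b[$pivot]] = [$b[$pivot], $b[$col]];

        // Eliminate this column from all rows below the pivot.
        for ($row = $col + 1; $row < $n; $row++) {
            $factor = $a[$row][$col] / $a[$col][$col];
            for ($k = $col; $k < $n; $k++) {
                $a[$row][$k] -= $factor * $a[$col][$k];
            }
            $b[$row] -= $factor * $b[$col];
        }
    }

    // Back substitution on the resulting upper-triangular system.
    $x = array_fill(0, $n, 0.0);
    for ($row = $n - 1; $row >= 0; $row--) {
        $sum = $b[$row];
        for ($k = $row + 1; $k < $n; $k++) {
            $sum -= $a[$row][$k] * $x[$k];
        }
        $x[$row] = $sum / $a[$row][$row];
    }

    return $x;
}

/** Return [b0, b1, ..., bn] minimizing the sum of squared residuals. */
function fitLeastSquares(array $samples, array $labels): array
{
    // Prepend a constant 1 to every sample so b0 acts as the intercept.
    $design = array_map(fn (array $sample): array => array_merge([1.0], $sample), $samples);
    $p = count($design[0]);

    // Accumulate X^T X and X^T y.
    $xtx = array_fill(0, $p, array_fill(0, $p, 0.0));
    $xty = array_fill(0, $p, 0.0);
    foreach ($design as $i => $row) {
        for ($j = 0; $j < $p; $j++) {
            $xty[$j] += $row[$j] * $labels[$i];
            for ($k = 0; $k < $p; $k++) {
                $xtx[$j][$k] += $row[$j] * $row[$k];
            }
        }
    }

    return solveLinearSystem($xtx, $xty);
}

// Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
$coefficients = fitLeastSquares([[1, 2], [2, 1], [3, 4], [4, 3]], [9, 8, 19, 18]);

// Prints values close to 1, 2, 3 (intercept first).
foreach ($coefficients as $index => $value) {
    printf("b%d = %.4f\n", $index, $value);
}
```

Real feature sets like the house data mix very different scales (rooms versus square footage), which can make the normal equations numerically delicate; that is one reason to rely on a library rather than a hand-rolled solver.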
Step 3: Train the Model
$regressor->train($samples, $labels);
Step 4: Make Predictions
You can now use the trained model to predict prices for new house data.
    $newData = [
        [4, 1800, 3],
        [2, 1200, 8],
    ];

    $predictions = $regressor->predict($newData);

    // Print predictions
    echo "Predictions for new houses:\n";
    echo "--------------------------\n";
    foreach ($predictions as $index => $prediction) {
        echo sprintf("House %d: $%s\n", $index + 1, number_format($prediction, 2));
    }
Full Code:
    <?php

    // Assumes the dependencies were installed with Composer in this directory
    require_once __DIR__ . '/vendor/autoload.php';

    use Phpml\Metric\Regression;
    use Phpml\Regression\LeastSquares;

    // Sample data: [rooms, size, miles to city center]
    $samples = [
        [3, 1500, 5],
        [4, 2000, 3],
        [2, 800, 10],
        [5, 2500, 1],
        [3, 1600, 4],
    ];

    // House prices
    $labels = [300000, 500000, 200000, 750000, 350000];

    $regressor = new LeastSquares();
    $regressor->train($samples, $labels);

    $newData = [
        [4, 1800, 3],
        [2, 1200, 8],
    ];

    $predictions = $regressor->predict($newData);

    // Print predictions
    echo "Predictions for new houses:\n";
    echo "--------------------------\n";
    foreach ($predictions as $index => $prediction) {
        echo sprintf("House %d: $%s\n", $index + 1, number_format($prediction, 2));
    }

    // Calculate error metrics against the actual values
    $actualValues = [450000, 280000];

    echo "\n\nMetrics:";
    echo "\n-------";

    $mse = Regression::meanSquaredError($predictions, $actualValues);
    echo "\nMean Squared Error: $" . number_format($mse, 2);
    echo "\nRoot Mean Squared Error: $" . number_format(sqrt($mse), 2);

    $mae = Regression::meanAbsoluteError($predictions, $actualValues);
    echo "\nMean Absolute Error: $" . number_format($mae, 2);
Result:
Predictions for new houses:
House 1: $577,026.62
House 2: $351,276.24
Comparing RubixML and PHP-ML
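Both libraries fit a linear model through a similar train-and-predict workflow; they differ in the underlying algorithm and in the options available. In this example the two sets of predictions differ by about a dollar, most likely because a Ridge penalty of 1e-3 is small enough that the fit is nearly identical to ordinary least squares.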
| Feature | RubixML | PHP-ML |
|---|---|---|
| Model | Ridge Regression (L2) | Least Squares |
| Data input format | Dataset objects | Plain arrays (ArrayDataset optional) |
| Flexibility | High | Moderate |
| Model variety | Broad | Limited |
| Composer package | rubix/ml | php-ai/php-ml |
Conclusion
This chapter demonstrates how to implement multiple linear regression in PHP using RubixML and PHP-ML libraries. Each library has strengths: RubixML offers flexibility and a broader set of machine learning algorithms, while PHP-ML provides a straightforward interface for quick prototyping. By following the examples, you can start building your predictive models in PHP and apply them to various real-world scenarios like price prediction, trend analysis, and forecasting.