Multiple Linear Regression is a statistical technique that models the relationship between a dependent variable and multiple independent variables by fitting a linear equation to observed data. This method is commonly used for predictive modeling, where the aim is to predict the value of the dependent variable based on values of independent variables.
The general equation of multiple linear regression is:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n + \epsilon$$
where:
$Y$ is the dependent variable (target).
$X_1, X_2, \ldots, X_n$ are the independent variables (features).
$b_0, b_1, \ldots, b_n$ are the coefficients (weights) of the model.
$\epsilon$ is the error term.
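To make the equation concrete, here is a minimal sketch in plain PHP that evaluates it for a single sample. The helper name `predictLinear` and the coefficient values are made up for illustration and are not fitted to any data. The error term is left out because it is unknown at prediction time.

```php
<?php

/**
 * Evaluate Y = b0 + b1*X1 + ... + bn*Xn for a single sample.
 * Hypothetical helper for illustration; not part of any library.
 *
 * @param float   $intercept    b0
 * @param float[] $coefficients b1..bn
 * @param array   $features     X1..Xn
 */
function predictLinear(float $intercept, array $coefficients, array $features): float
{
    if (count($coefficients) !== count($features)) {
        throw new InvalidArgumentException('Need exactly one coefficient per feature.');
    }

    $y = $intercept;
    foreach ($features as $i => $x) {
        $y += $coefficients[$i] * $x;
    }

    return $y;
}

// Illustrative (not fitted) values: b0 = 50000, b1 = 10000, b2 = 100, b3 = -2000
// Features: 3 rooms, 1500 sqft, 5 miles to the city center
$price = predictLinear(50000.0, [10000.0, 100.0, -2000.0], [3, 1500, 5]);

echo number_format($price, 2), PHP_EOL; // prints 220,000.00
```

Training a model amounts to choosing the coefficients so that predictions like this one are as close as possible to the observed targets.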
In PHP, we can implement multiple linear regression using two popular libraries: RubixML and PHP-ML. Let's dive into each.
Implementing Multiple Linear Regression with Rubix ML
Example: Predicting House Prices
Let's say we want to predict house prices based on the following features:
Number of rooms
Square footage
Distance to the nearest city center
Step 1: Prepare the Data
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Regressors\Ridge;

// Sample data: [rooms, square footage, miles to city center] => price
$samples = [
    [3, 1500, 5], // 3 rooms, 1500 sqft, 5 miles to city center
    [4, 2000, 3],
    [2, 800, 10],
    [5, 2500, 1],
    [3, 1600, 4],
];

// House prices, one label per sample in the same order
$labels = [300000, 500000, 200000, 750000, 350000];

// Wrap the samples and their labels in a Labeled dataset for training
$dataset = new Labeled($samples, $labels);
Step 2: Initialize the Model
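RubixML offers several regression algorithms. For this example, we'll use Ridge Regression, a form of linear regression that adds an L2 penalty on the coefficients. The penalty keeps the fit stable under multicollinearity (when features are correlated with each other), which is likely here because room count and square footage tend to rise together.
// Regularization strength (alpha); larger values shrink the coefficients more
$estimator = new Ridge(1e-3);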
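For reference, ridge regression is usually written as ordinary least squares plus an L2 penalty. The formulation below is the standard textbook one, using the coefficient symbols from the general equation. Here $\alpha$ stands for the regularization strength passed to the `Ridge` constructor, $m$ (a symbol introduced for this sketch) is the number of training samples, and $x_{ij}$ is the value of feature $j$ in sample $i$. Whether the intercept $b_0$ is also penalized is an implementation detail that can vary between libraries.

```latex
\min_{b_0, \ldots, b_n} \;
\sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
\; + \; \alpha \sum_{j=1}^{n} b_j^2
```

With $\alpha = 0$ this reduces to ordinary least squares. A small value such as `1e-3` changes the fit only slightly, while larger values pull the coefficients toward zero.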
Step 3: Train the Model
$estimator->train($dataset);
Step 4: Make Predictions
Now, we can make predictions on new data points.
// Create new samples for prediction
// Important: Each sample must be its own array within the main array
$newSamples = [
    [4, 1800, 3], // First house
    [2, 1200, 8], // Second house
];

// Create Unlabeled dataset for prediction
$newDataset = new Unlabeled($newSamples);

// Make predictions
$predictions = $estimator->predict($newDataset);

// Print predictions
echo "Predictions for new houses:\n";
foreach ($predictions as $index => $prediction) {
    echo sprintf(
        "House %d: $%s\n",
        $index + 1,
        number_format($prediction, 2)
    );
}
Full Code:
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Regressors\Ridge;
use Rubix\ML\CrossValidation\Metrics\MeanAbsoluteError;
use Rubix\ML\CrossValidation\Metrics\MeanSquaredError;

// Sample data: [rooms, square footage, miles to city center] => price
$samples = [
    [3, 1500, 5], // 3 rooms, 1500 sqft, 5 miles to city center
    [4, 2000, 3],
    [2, 800, 10],
    [5, 2500, 1],
    [3, 1600, 4],
];

// House prices, one label per sample in the same order
$labels = [300000, 500000, 200000, 750000, 350000];

// Wrap the samples and their labels in a Labeled dataset for training
$dataset = new Labeled($samples, $labels);

// Regularization strength (alpha); larger values shrink the coefficients more
$estimator = new Ridge(1e-3);
$estimator->train($dataset);

// Create new samples for prediction
// Important: Each sample must be its own array within the main array
$newSamples = [
    [4, 1800, 3], // First house
    [2, 1200, 8], // Second house
];

// Create Unlabeled dataset for prediction
$newDataset = new Unlabeled($newSamples);

// Make predictions
$predictions = $estimator->predict($newDataset);

// Print predictions
echo "Predictions for new houses:\n";
foreach ($predictions as $index => $prediction) {
    echo sprintf(
        "House %d: $%s\n",
        $index + 1,
        number_format($prediction, 2)
    );
}

// Compare the predictions against the actual prices of the two new houses
$actualValues = [450000, 280000];

echo "\n\nMetrics:";

// Rubix ML reports error metrics as negative scores (higher is better),
// so abs() is used to display the magnitude
$mseMetric = new MeanSquaredError();
$score = $mseMetric->score($predictions, $actualValues);
echo "\nMean Squared Error: " . number_format(abs($score), 2); // in squared dollars
echo "\nRoot Mean Squared Error: $" . number_format(sqrt(abs($score)), 2);

$maeMetric = new MeanAbsoluteError();
$score = $maeMetric->score($predictions, $actualValues);
echo "\nMean Absolute Error: $" . number_format(abs($score), 2);
Result:
Predictions for new houses:
House 1: $577,025.89
House 2: $351,275.14
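The error metrics at the end of the full listing can also be computed by hand, which makes clear what each number means. The sketch below uses plain PHP, the two predictions shown in the result, and the `$actualValues` from the listing. The helper names `meanSquaredError` and `meanAbsoluteError` are hypothetical and not part of Rubix ML.

```php
<?php

/** Mean of the squared differences between predictions and actual values. */
function meanSquaredError(array $predictions, array $actuals): float
{
    $sum = 0.0;
    foreach ($predictions as $i => $predicted) {
        $sum += ($predicted - $actuals[$i]) ** 2;
    }

    return $sum / count($predictions);
}

/** Mean of the absolute differences between predictions and actual values. */
function meanAbsoluteError(array $predictions, array $actuals): float
{
    $sum = 0.0;
    foreach ($predictions as $i => $predicted) {
        $sum += abs($predicted - $actuals[$i]);
    }

    return $sum / count($predictions);
}

$predictions  = [577025.89, 351275.14]; // predictions from the result above
$actualValues = [450000, 280000];       // actual prices from the full listing

$mse = meanSquaredError($predictions, $actualValues);

echo 'MSE:  ' . number_format($mse, 2) . " (squared dollars)\n";
echo 'RMSE: $' . number_format(sqrt($mse), 2) . "\n";
echo 'MAE:  $' . number_format(meanAbsoluteError($predictions, $actualValues), 2) . "\n";
```

Rubix ML's error metrics return the negated value so that a higher score is always better, which is why the listing wraps each score in `abs()` before printing it. The root mean squared error is in dollars again, which makes it easier to compare with the prices themselves.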
Implementing Multiple Linear Regression with PHP-ML
Example: Predicting House Prices
Similar to our example with RubixML, we'll predict house prices based on rooms, square footage, and distance to the city center.
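A minimal sketch of the PHP-ML version follows. It assumes the `php-ai/php-ml` package is installed through Composer and uses its `LeastSquares` regressor, the model named for PHP-ML in the comparison later in this chapter. It reuses the training data and new samples from the RubixML example. Treat it as an illustration rather than the exact listing behind the output shown next.

```php
<?php

require_once __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

use Phpml\Regression\LeastSquares;

// Same training data as before: [rooms, square footage, miles to city center]
$samples = [
    [3, 1500, 5],
    [4, 2000, 3],
    [2, 800, 10],
    [5, 2500, 1],
    [3, 1600, 4],
];

// House prices, one target per sample in the same order
$targets = [300000, 500000, 200000, 750000, 350000];

// Ordinary least squares: there is no regularization parameter to set
$regression = new LeastSquares();
$regression->train($samples, $targets);

// PHP-ML takes plain arrays for prediction as well
$newSamples = [
    [4, 1800, 3], // First house
    [2, 1200, 8], // Second house
];
$predictions = $regression->predict($newSamples);

echo "Predictions for new houses:\n";
foreach ($predictions as $index => $prediction) {
    printf("House %d: $%s\n", $index + 1, number_format($prediction, 2));
}
```

After training, `getIntercept()` and `getCoefficients()` return the fitted intercept and feature weights, which is handy for checking the model against the general equation.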
Result:
Predictions for new houses:
House 1: $577,026.62
House 2: $351,276.24
Comparing RubixML and PHP-ML
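Both libraries fit a linear model to the same data and, as the near-identical predictions above show, give almost the same results here. They differ in the underlying algorithm (RubixML's Ridge adds an L2 penalty, while PHP-ML uses ordinary least squares) and in the options available.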
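| Feature           | RubixML               | PHP-ML        |
|-------------------|-----------------------|---------------|
| Model             | Ridge Regression (L2) | Least Squares |
| Data Input Format | Dataset Objects       | ArrayDataset  |
| Flexibility       | High                  | Moderate      |
| Model Variety     | Broad                 | Limited       |
| Installation      | rubix/ml              | php-ai/php-ml |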
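One way to see what the "Model" row means in practice is to solve both models by hand. The sketch below is a plain-PHP illustration, not how either library is implemented. The hypothetical `fitLinear` helper solves the normal equations (X'X + alpha*I) b = X'y with Gauss-Jordan elimination, where alpha = 0 gives ordinary least squares and alpha > 0 gives ridge regression. Leaving the intercept unpenalized is an assumption of this sketch.

```php
<?php

/**
 * Fit y ~ b0 + b1*x1 + ... + bn*xn by solving (X'X + alpha*I) b = X'y.
 * alpha = 0.0 is ordinary least squares; alpha > 0 is ridge regression.
 * Assumes the system is invertible. Returns [b0, b1, ..., bn].
 */
function fitLinear(array $samples, array $labels, float $alpha = 0.0): array
{
    $p = count($samples[0]) + 1; // +1 for the intercept column

    // Build the augmented matrix [X'X + alpha*I | X'y]
    $a = array_fill(0, $p, array_fill(0, $p + 1, 0.0));
    foreach ($samples as $i => $sample) {
        $row = array_merge([1.0], $sample);
        for ($j = 0; $j < $p; $j++) {
            for ($k = 0; $k < $p; $k++) {
                $a[$j][$k] += $row[$j] * $row[$k];
            }
            $a[$j][$p] += $row[$j] * $labels[$i];
        }
    }
    for ($j = 1; $j < $p; $j++) {
        $a[$j][$j] += $alpha; // index 0 (the intercept) is not penalized
    }

    // Gauss-Jordan elimination with partial pivoting
    for ($col = 0; $col < $p; $col++) {
        $pivot = $col;
        for ($r = $col + 1; $r < $p; $r++) {
            if (abs($a[$r][$col]) > abs($a[$pivot][$col])) {
                $pivot = $r;
            }
        }
        [$a[$col], $a[$pivot]] = [$a[$pivot], $a[$col]];

        for ($r = 0; $r < $p; $r++) {
            if ($r === $col) {
                continue;
            }
            $factor = $a[$r][$col] / $a[$col][$col];
            for ($k = $col; $k <= $p; $k++) {
                $a[$r][$k] -= $factor * $a[$col][$k];
            }
        }
    }

    // The matrix is now diagonal, so each coefficient is a single division
    $b = [];
    for ($j = 0; $j < $p; $j++) {
        $b[] = $a[$j][$p] / $a[$j][$j];
    }

    return $b;
}

/** Apply fitted coefficients [b0, b1, ..., bn] to one sample. */
function applyLinear(array $b, array $sample): float
{
    $y = $b[0];
    foreach ($sample as $j => $x) {
        $y += $b[$j + 1] * $x;
    }

    return $y;
}

$samples = [[3, 1500, 5], [4, 2000, 3], [2, 800, 10], [5, 2500, 1], [3, 1600, 4]];
$labels  = [300000, 500000, 200000, 750000, 350000];

$ols   = fitLinear($samples, $labels);       // alpha = 0: least squares
$ridge = fitLinear($samples, $labels, 1e-3); // same alpha as the Ridge example

foreach ([[4, 1800, 3], [2, 1200, 8]] as $n => $house) {
    printf(
        "House %d: OLS %s, ridge %s\n",
        $n + 1,
        number_format(applyLinear($ols, $house), 2),
        number_format(applyLinear($ridge, $house), 2)
    );
}
```

Running it prints both sets of predictions side by side, so the effect of the penalty can be checked directly. Raising the third argument of `fitLinear` shows how a stronger penalty moves the ridge predictions away from the least squares ones.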
Conclusion
This chapter demonstrated how to implement multiple linear regression in PHP using the RubixML and PHP-ML libraries. Each library has its strengths: RubixML offers flexibility and a broader set of machine learning algorithms, while PHP-ML provides a straightforward interface for quick prototyping. By following the examples, you can start building your own predictive models in PHP and apply them to real-world scenarios such as price prediction, trend analysis, and forecasting.