Conversation
|
Hi @insel-maz, Thank you for your interest in MathPHP! And thank you for your pull request. Please allow me some time to review the PR and understand the new functionality. Thanks again, |
|
Hi @markrogoyski, I've just pushed a new commit to update some code to match the new shape of $points that I introduced earlier. Thank you |
|
Hi @insel-maz, Apologies on the delay in reviewing this. It is on my to-do list. I will try to get back to you soon with feedback. Thanks again for the contribution. |
|
Sorry for the delay in reviewing. Thanks again for this contribution. The core coefficient computation (design matrix + OLS) seems correct and well-structured. I think there may be one issue with regression statistics: Here's a minimal test that demonstrates the problem: /**
* Demonstrates that regression statistics are wrong for k > 1.
* Oracle: statsmodels OLS, scikit-learn LinearRegression
*/
public function testDegreesOfFreedomAndStatistics(): void
{
// y ≈ 2x₁ + 3x₂ + 5 with noise, k=2 features, n=10 points
$points = [
[[1.0, 2.0], 13.3], [[3.0, 1.0], 13.8],
[[2.0, 4.0], 21.5], [[5.0, 2.0], 20.9],
[[4.0, 3.0], 22.4], [[1.0, 5.0], 21.7],
[[6.0, 1.0], 20.2], [[3.0, 3.0], 20.1],
[[2.0, 6.0], 26.6], [[4.0, 4.0], 25.3],
];
$regression = new Multilinear($points);
// Degrees of freedom: should be n - k - 1 = 10 - 2 - 1 = 7
// Currently returns 8 (n - 1 - 1, using p=1 instead of p=k)
$this->assertEquals(7, $regression->degreesOfFreedom());
// F-statistic: should be 705.22 (statsmodels OLS)
// Currently returns ~1612 (wrong MSR and MSE)
$this->assertEqualsWithDelta(705.2243, $regression->fStatistic(), 0.001);
}The root cause is on line 112 of LeastSquares.php: $this->p = $order; // always 1 for multilinear, but should be k
A possible fix: derive p from the actual design matrix size after construction, instead of from $X = $this->createDesignMatrix($xs);
$this->p = $X->getN() - $this->fit_constant; // actual number of predictors
$this->ν = $this->n - $this->p - $this->fit_constant;I think this would work for both polynomial and multilinear cases since the design matrix always has the correct number of columns. I also noticed that I can provide additional unit tests to help verify what the expected results should be if that would be helpful. |
This PR introduces MathPHP\Statistics\Regression\Multilinear for performing multilinear regression. The change is intended to be non-breaking.
If you would like me to make any further changes, please let me know.
Thank you very much for your comprehensive library!