Add Skew Normal distribution (SN1 from GAMLSS) by Copilot · Pull Request #167 · simon-hirsch/ondil

Copilot · 2025-10-26T09:32:52Z

Implements the Skew Normal distribution corresponding to GAMLSS SN1, using scipy.stats.skewnorm as the underlying implementation.

Changes

New distribution class: SkewNormal

Three parameters: mu (location), sigma (scale), nu (skewness)
Uses ScipyMixin for scipy integration (pdf/cdf/ppf/rvs)
Analytical derivatives verified numerically:
- dl1_dp1: ∂L/∂μ = z/σ - ν/σ · φ(νz)/Φ(νz)
- dl1_dp1: ∂L/∂σ = -1/σ + z²/σ - νz/σ · φ(νz)/Φ(νz)
- dl1_dp1: ∂L/∂ν = z · φ(νz)/Φ(νz)
- Second and cross derivatives for Fisher information

Files added:

src/ondil/distributions/skewnormal.py
tests/distributions/skewnormal/ test structure

Usage

from ondil.distributions import SkewNormal
from ondil.estimators import OnlineDistributionalRegression

estimator = OnlineDistributionalRegression(
    distribution=SkewNormal(),
    equation={0: "all", 1: "all", 2: "intercept"},
    method="ols"
)
estimator.fit(X, y)

Compatible with existing test infrastructure (test_dist.py, test_derivatives.py).

Original prompt

Please add the skew gaussian distribution. The implementation in the GAMLSS R package is the SN1 distribution. Please have a look at the other distributions in ondil to make sure you employ as much existing infrastructure in the implementation as possible.

We employ

common testing for all distributions, see tests/distributions/test_xxx.py for distributions and cdf, ppf etc and the cross-derivatives

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewnorm.html

https://github.com/gamlss-dev/gamlss.dist/blob/main/R/SN1.R

Please use the scipyMixin class to reuse scipy infrastructure in python.

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: simon-hirsch <64348015+simon-hirsch@users.noreply.github.com>

…sian-distribution

…ub.com/simon-hirsch/rolch into copilot/add-skew-gaussian-distribution

simon-hirsch · 2025-11-22T18:53:48Z

@BerriJ the derivatives match.

the fitting is not exatly the same, we end at a similar LL but with slightly different coefficients.
I see some issues with the CDF as well, but also sometimes Rs integrate throws errors in the CI/CD.

This seems to be one of the more tricky distributions to implement. I'm not sure how we want to proceed here, what do you think?

Copilot

Pull request overview

This PR adds the Skew Normal (SN1) distribution from GAMLSS to the ondil library, using scipy.stats.skewnorm as the underlying implementation via the ScipyMixin class.

Key Changes

New SkewNormal distribution class with three parameters: mu (location), sigma (scale), and nu (skewness)
Analytical first derivatives implemented for all three parameters with corresponding second derivatives using the pseudo-derivative approach
Integration with existing test infrastructure including distribution function tests and derivative verification

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/ondil/distributions/skewnormal.py`	Main implementation of SkewNormal distribution with scipy integration, derivative calculations, and initial value computation
`src/ondil/distributions/__init__.py`	Adds SkewNormal to the module exports
`tests/distributions/test_dist.py`	Adds SkewNormal to special tolerance and bounds configurations; improves error message formatting
`tests/distributions/test_derivatives.py`	Adds SkewNormal to special bounds configuration; attempts to improve error message formatting (incomplete)
`tests/distributions/skewnormal/test_coefs.py`	Basic smoke test for the distribution (does not verify coefficients against R)
`tests/distributions/skewnormal/__init__.py`	Empty init file for test module
`tests/distributions/skew_normal/test_coefs.py`	Coefficient verification test using rpy2 to compare against R GAMLSS (incomplete - missing nu coefficients)
`tests/distributions/skew_normal/__init__.py`	Empty init file for duplicate test directory

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-22T19:01:03Z

src/ondil/distributions/skewnormal.py

+            )
+
+        if param == 2:
+            return (np.exp(lpdf - lcdf)) * w / nu


Potential division by zero when nu is zero. Line 110 computes w / nu where w = nu * z. When nu = 0, this results in 0 / 0 = NaN. Consider handling this edge case explicitly, either by checking for nu == 0 or using a small epsilon value to avoid division by zero.

Suggested change

return (np.exp(lpdf - lcdf)) * w / nu

# Avoid division by zero when nu == 0

epsilon = 1e-8

safe_nu = np.where(np.abs(nu) < epsilon, epsilon, nu)

return (np.exp(lpdf - lcdf)) * w / safe_nu

Copilot · 2025-11-22T19:01:03Z

src/ondil/distributions/skewnormal.py

+        return np.vstack([
+            (y + np.mean(y)) / 2,
+            np.full_like(y, np.std(y, ddof=1) / 4),
+            np.full_like(y, 0.1),
+        ]).T


The initial values computation is inconsistent with the docstring and other distributions. The docstring (lines 132-135) states:

mu = mean of y

sigma = std of y

nu = 0 (no skewness initially)

However, the actual implementation computes:

mu = (y + np.mean(y)) / 2 (line 138) - This averages each observation with the overall mean, resulting in different initial mu for each observation

sigma = np.std(y, ddof=1) / 4 (line 139) - Constant value, but divided by 4

nu = 0.1 (line 140) - Constant value of 0.1, not 0

Consider using np.tile([np.mean(y), np.std(y, ddof=1), 0], (y.shape[0], 1)) to match the docstring and follow the pattern used in other distributions (e.g., Normal, SkewT).

Suggested change

return np.vstack([

(y + np.mean(y)) / 2,

np.full_like(y, np.std(y, ddof=1) / 4),

np.full_like(y, 0.1),

]).T

return np.tile(

[np.mean(y), np.std(y, ddof=1), 0],

(y.shape[0], 1)

)

Copilot · 2025-11-22T19:01:04Z

src/ondil/distributions/skewnormal.py

+        # lcdf <- log(0.5 * (1 + pgamma(s, shape = 1/2, scale = 1) * sign(w)))
+        # dldm <- -(exp(lpdf - lcdf)) * nu/sigma + sign(z) * (abs(z)^(2 - 1))/sigma
+
+        lpdf = (1 - (1 / 2)) * np.log(2) - s - sp.gammaln(1 / 2) - np.log(2)


[nitpick] The expression (1 - (1/2)) * np.log(2) - np.log(2) in line 93 can be simplified. This evaluates to:

(1/2) * np.log(2) - np.log(2) = -(1/2) * np.log(2) = np.log(2**(-0.5)) = np.log(1/np.sqrt(2))

Consider simplifying to improve readability and potentially performance. Alternatively, add a comment explaining why this specific form is used if it's matching the R GAMLSS implementation exactly.

Suggested change

lpdf = (1 - (1 / 2)) * np.log(2) - s - sp.gammaln(1 / 2) - np.log(2)

lpdf = np.log(1 / np.sqrt(2)) - s - sp.gammaln(1 / 2)

Copilot · 2025-11-22T19:01:04Z

src/ondil/distributions/skewnormal.py

+
+        if param == 2:
+            return (np.exp(lpdf - lcdf)) * w / nu
+


The method doesn't explicitly handle invalid param values. If param is not 0, 1, or 2, the function will silently return None. Following the pattern in other distributions (e.g., InverseGaussian), add an explicit error check after the conditional blocks:

raise ValueError("param must be 0 (mu), 1 (sigma), or 2 (nu)")

Suggested change

raise ValueError("param must be 0 (mu), 1 (sigma), or 2 (nu)")

Copilot · 2025-11-22T19:01:04Z

tests/distributions/test_derivatives.py

            assert np.allclose(derivative_mapping[key](), R_list.rx2(key)), (
                f"Derivative {key} doesn't match for {distribution.__class__.__name__}"
+                f"Got: {derivative_mapping[key]().round(3)} for Python \n"
+                f"Got: {np.asarray(R_list.rx2(key)).round(3)} for R \n"
+                f"Difference: {(derivative_mapping[key]() - R_list.rx2(key)).round(3)}"
            )


The assertion error message is incomplete. Line 108 ends with a string that doesn't include a newline, and lines 109-111 appear to be separate strings that won't be concatenated. Add a newline character \n at the end of line 108's string to properly connect it with the following lines, similar to how it's done in test_dist.py lines 97-100.

Copilot · 2025-11-22T19:01:05Z

tests/distributions/skew_normal/test_coefs.py

+        nu.formula = ~cyl + hp,
+        family=gamlss.dist::{dist.corresponding_gamlss}(),
+        data=as.data.frame(mtcars),
+        control=gamlss.control(n.cyc=100, trace=TRUE)
+    )
+    list(
+        "mu" = coef(model, "mu"),
+        "sigma" = coef(model, "sigma")
+    )


The R GAMLSS model is fitted with nu.formula = ~cyl + hp (line 22), but the returned list only includes coefficients for "mu" and "sigma" (lines 28-29). The "nu" coefficients should also be extracted and verified, or the nu.formula should be simplified to ~1 if nu coefficients are not being tested.

Initial plan

1596fcc

Copilot AI assigned Copilot and simon-hirsch Oct 26, 2025

Copilot started work on behalf of simon-hirsch October 26, 2025 09:33 View session

Add SkewNormal distribution implementation

b51d434

Co-authored-by: simon-hirsch <64348015+simon-hirsch@users.noreply.github.com>

simon-hirsch added the distributions label Oct 26, 2025

Copilot AI and others added 2 commits October 26, 2025 09:44

[AUTOMATED] Style fixes by ruff

94ecbd1

Fix derivative for mu parameter in SkewNormal distribution

82fd969

Co-authored-by: simon-hirsch <64348015+simon-hirsch@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add skew gaussian distribution implementation~~ Add Skew Normal distribution (SN1 from GAMLSS) Oct 26, 2025

Copilot AI requested a review from simon-hirsch October 26, 2025 09:54

Copilot finished work on behalf of simon-hirsch October 26, 2025 09:54

simon-hirsch and others added 12 commits October 29, 2025 17:06

Adjust tests

bbcffb1

Merge remote-tracking branch 'origin/main' into copilot/add-skew-gaus…

2cba930

…sian-distribution

Fix test syntax

4525d4a

Adjust test output

12b62dc

Better debug messages

f02c6ac

Fix derivatives for SkewNormal distribution

827fd5a

Add test for fitting skew normal distribution with gamlss in R

9dcd062

[AUTOMATED] Style fixes by ruff

8b9ce0e

Name test function for SkewNormal distribution

b943983

Merge branch 'copilot/add-skew-gaussian-distribution' of https://gith…

80516fb

…ub.com/simon-hirsch/rolch into copilot/add-skew-gaussian-distribution

Align start values to R's gamlss defaults

e86bfbd

[AUTOMATED] Style fixes by ruff

081335b

simon-hirsch approved these changes Nov 22, 2025

View reviewed changes

simon-hirsch marked this pull request as ready for review November 22, 2025 18:53

simon-hirsch requested review from BerriJ and Copilot November 22, 2025 18:55

Copilot started reviewing on behalf of simon-hirsch November 22, 2025 18:56 View session

Copilot finished reviewing on behalf of simon-hirsch November 22, 2025 19:00

Copilot AI reviewed Nov 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Skew Normal distribution (SN1 from GAMLSS)#167

Add Skew Normal distribution (SN1 from GAMLSS)#167
Copilot wants to merge 16 commits intomainfrom
copilot/add-skew-gaussian-distribution

Copilot AI commented Oct 26, 2025 •

edited

Loading

Uh oh!

simon-hirsch commented Nov 22, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 22, 2025

Uh oh!

Copilot AI Nov 22, 2025

Uh oh!

Copilot AI Nov 22, 2025

Uh oh!

Copilot AI Nov 22, 2025

Uh oh!

Copilot AI Nov 22, 2025

Uh oh!

Copilot AI Nov 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-            return (np.exp(lpdf - lcdf)) * w / nu
+            # Avoid division by zero when nu == 0
+            epsilon = 1e-8
+            safe_nu = np.where(np.abs(nu) < epsilon, epsilon, nu)
+            return (np.exp(lpdf - lcdf)) * w / safe_nu

	lpdf = (1 - (1 / 2)) * np.log(2) - s - sp.gammaln(1 / 2) - np.log(2)
	lpdf = np.log(1 / np.sqrt(2)) - s - sp.gammaln(1 / 2)


	raise ValueError("param must be 0 (mu), 1 (sigma), or 2 (nu)")

Conversation

Copilot AI commented Oct 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Usage

Uh oh!

simon-hirsch commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Oct 26, 2025 •

edited

Loading

simon-hirsch commented Nov 22, 2025 •

edited

Loading