Skip to content

Add Skew Normal distribution (SN1 from GAMLSS)#167

Open
Copilot wants to merge 16 commits intomainfrom
copilot/add-skew-gaussian-distribution
Open

Add Skew Normal distribution (SN1 from GAMLSS)#167
Copilot wants to merge 16 commits intomainfrom
copilot/add-skew-gaussian-distribution

Conversation

Copy link
Contributor

Copilot AI commented Oct 26, 2025

Implements the Skew Normal distribution corresponding to GAMLSS SN1, using scipy.stats.skewnorm as the underlying implementation.

Changes

New distribution class: SkewNormal

  • Three parameters: mu (location), sigma (scale), nu (skewness)
  • Uses ScipyMixin for scipy integration (pdf/cdf/ppf/rvs)
  • Analytical derivatives verified numerically:
    • dl1_dp1: ∂L/∂μ = z/σ - ν/σ · φ(νz)/Φ(νz)
    • dl1_dp1: ∂L/∂σ = -1/σ + z²/σ - νz/σ · φ(νz)/Φ(νz)
    • dl1_dp1: ∂L/∂ν = z · φ(νz)/Φ(νz)
    • Second and cross derivatives for Fisher information

Files added:

  • src/ondil/distributions/skewnormal.py
  • tests/distributions/skewnormal/ test structure

Usage

from ondil.distributions import SkewNormal
from ondil.estimators import OnlineDistributionalRegression

estimator = OnlineDistributionalRegression(
    distribution=SkewNormal(),
    equation={0: "all", 1: "all", 2: "intercept"},
    method="ols"
)
estimator.fit(X, y)

Compatible with existing test infrastructure (test_dist.py, test_derivatives.py).

Original prompt

Please add the skew gaussian distribution. The implementation in the GAMLSS R package is the SN1 distribution. Please have a look at the other distributions in ondil to make sure you employ as much existing infrastructure in the implementation as possible.

We employ

  • common testing for all distributions, see tests/distributions/test_xxx.py for distributions and cdf, ppf etc and the cross-derivatives

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewnorm.html

https://github.com/gamlss-dev/gamlss.dist/blob/main/R/SN1.R

Please use the scipyMixin class to reuse scipy infrastructure in python.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: simon-hirsch <64348015+simon-hirsch@users.noreply.github.com>
Copilot AI and others added 2 commits October 26, 2025 09:44
Co-authored-by: simon-hirsch <64348015+simon-hirsch@users.noreply.github.com>
Copilot AI changed the title [WIP] Add skew gaussian distribution implementation Add Skew Normal distribution (SN1 from GAMLSS) Oct 26, 2025
Copilot AI requested a review from simon-hirsch October 26, 2025 09:54
@simon-hirsch
Copy link
Owner

simon-hirsch commented Nov 22, 2025

@BerriJ the derivatives match.

  • the fitting is not exatly the same, we end at a similar LL but with slightly different coefficients.
  • I see some issues with the CDF as well, but also sometimes Rs integrate throws errors in the CI/CD.

This seems to be one of the more tricky distributions to implement. I'm not sure how we want to proceed here, what do you think?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds the Skew Normal (SN1) distribution from GAMLSS to the ondil library, using scipy.stats.skewnorm as the underlying implementation via the ScipyMixin class.

Key Changes

  • New SkewNormal distribution class with three parameters: mu (location), sigma (scale), and nu (skewness)
  • Analytical first derivatives implemented for all three parameters with corresponding second derivatives using the pseudo-derivative approach
  • Integration with existing test infrastructure including distribution function tests and derivative verification

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/ondil/distributions/skewnormal.py Main implementation of SkewNormal distribution with scipy integration, derivative calculations, and initial value computation
src/ondil/distributions/__init__.py Adds SkewNormal to the module exports
tests/distributions/test_dist.py Adds SkewNormal to special tolerance and bounds configurations; improves error message formatting
tests/distributions/test_derivatives.py Adds SkewNormal to special bounds configuration; attempts to improve error message formatting (incomplete)
tests/distributions/skewnormal/test_coefs.py Basic smoke test for the distribution (does not verify coefficients against R)
tests/distributions/skewnormal/__init__.py Empty init file for test module
tests/distributions/skew_normal/test_coefs.py Coefficient verification test using rpy2 to compare against R GAMLSS (incomplete - missing nu coefficients)
tests/distributions/skew_normal/__init__.py Empty init file for duplicate test directory

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)

if param == 2:
return (np.exp(lpdf - lcdf)) * w / nu
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential division by zero when nu is zero. Line 110 computes w / nu where w = nu * z. When nu = 0, this results in 0 / 0 = NaN. Consider handling this edge case explicitly, either by checking for nu == 0 or using a small epsilon value to avoid division by zero.

Suggested change
return (np.exp(lpdf - lcdf)) * w / nu
# Avoid division by zero when nu == 0
epsilon = 1e-8
safe_nu = np.where(np.abs(nu) < epsilon, epsilon, nu)
return (np.exp(lpdf - lcdf)) * w / safe_nu

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +141
return np.vstack([
(y + np.mean(y)) / 2,
np.full_like(y, np.std(y, ddof=1) / 4),
np.full_like(y, 0.1),
]).T
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial values computation is inconsistent with the docstring and other distributions. The docstring (lines 132-135) states:

  • mu = mean of y
  • sigma = std of y
  • nu = 0 (no skewness initially)

However, the actual implementation computes:

  • mu = (y + np.mean(y)) / 2 (line 138) - This averages each observation with the overall mean, resulting in different initial mu for each observation
  • sigma = np.std(y, ddof=1) / 4 (line 139) - Constant value, but divided by 4
  • nu = 0.1 (line 140) - Constant value of 0.1, not 0

Consider using np.tile([np.mean(y), np.std(y, ddof=1), 0], (y.shape[0], 1)) to match the docstring and follow the pattern used in other distributions (e.g., Normal, SkewT).

Suggested change
return np.vstack([
(y + np.mean(y)) / 2,
np.full_like(y, np.std(y, ddof=1) / 4),
np.full_like(y, 0.1),
]).T
return np.tile(
[np.mean(y), np.std(y, ddof=1), 0],
(y.shape[0], 1)
)

Copilot uses AI. Check for mistakes.
# lcdf <- log(0.5 * (1 + pgamma(s, shape = 1/2, scale = 1) * sign(w)))
# dldm <- -(exp(lpdf - lcdf)) * nu/sigma + sign(z) * (abs(z)^(2 - 1))/sigma

lpdf = (1 - (1 / 2)) * np.log(2) - s - sp.gammaln(1 / 2) - np.log(2)
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The expression (1 - (1/2)) * np.log(2) - np.log(2) in line 93 can be simplified. This evaluates to:

  • (1/2) * np.log(2) - np.log(2) = -(1/2) * np.log(2) = np.log(2**(-0.5)) = np.log(1/np.sqrt(2))

Consider simplifying to improve readability and potentially performance. Alternatively, add a comment explaining why this specific form is used if it's matching the R GAMLSS implementation exactly.

Suggested change
lpdf = (1 - (1 / 2)) * np.log(2) - s - sp.gammaln(1 / 2) - np.log(2)
lpdf = np.log(1 / np.sqrt(2)) - s - sp.gammaln(1 / 2)

Copilot uses AI. Check for mistakes.

if param == 2:
return (np.exp(lpdf - lcdf)) * w / nu

Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method doesn't explicitly handle invalid param values. If param is not 0, 1, or 2, the function will silently return None. Following the pattern in other distributions (e.g., InverseGaussian), add an explicit error check after the conditional blocks:

raise ValueError("param must be 0 (mu), 1 (sigma), or 2 (nu)")
Suggested change
raise ValueError("param must be 0 (mu), 1 (sigma), or 2 (nu)")

Copilot uses AI. Check for mistakes.
Comment on lines 107 to 112
assert np.allclose(derivative_mapping[key](), R_list.rx2(key)), (
f"Derivative {key} doesn't match for {distribution.__class__.__name__}"
f"Got: {derivative_mapping[key]().round(3)} for Python \n"
f"Got: {np.asarray(R_list.rx2(key)).round(3)} for R \n"
f"Difference: {(derivative_mapping[key]() - R_list.rx2(key)).round(3)}"
)
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion error message is incomplete. Line 108 ends with a string that doesn't include a newline, and lines 109-111 appear to be separate strings that won't be concatenated. Add a newline character \n at the end of line 108's string to properly connect it with the following lines, similar to how it's done in test_dist.py lines 97-100.

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +30
nu.formula = ~cyl + hp,
family=gamlss.dist::{dist.corresponding_gamlss}(),
data=as.data.frame(mtcars),
control=gamlss.control(n.cyc=100, trace=TRUE)
)
list(
"mu" = coef(model, "mu"),
"sigma" = coef(model, "sigma")
)
Copy link

Copilot AI Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The R GAMLSS model is fitted with nu.formula = ~cyl + hp (line 22), but the returned list only includes coefficients for "mu" and "sigma" (lines 28-29). The "nu" coefficients should also be extracted and verified, or the nu.formula should be simplified to ~1 if nu coefficients are not being tested.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants