Add bivariate copula implementation#137
Add bivariate copula implementation#137chrischu12 wants to merge 91 commits intosimon-hirsch:mainfrom
Conversation
simon-hirsch
left a comment
There was a problem hiding this comment.
I miss the base class for the copulas, because now all distributions need to implement the param_link parameter. This should be solved by a CopulaMixin class that implements the necessary methods.
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for bivariate copulas by introducing new link functions and multivariate copula distributions, and integrates them into the online multivariate distribution regression estimator.
- Introduce
FisherZLinkandTauToParincopulalinks.py - Extend base
Distributionto acceptparam_linksand update existing distributions - Add
BiCopNormalandMarginalCopulaclasses for Gaussian copula modeling - Integrate copula-based parameter handling into
online_mvdistreg
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ondil/link/copulalinks.py | New Fisher Z and Kendall‐tau link functions |
| src/ondil/link/init.py | Export copula link functions |
| src/ondil/base/distribution.py | Add support for param_links in base Distribution |
| src/ondil/distributions/normal.py | Update constructor signature for param_links |
| src/ondil/distributions/mv_t_low_rank.py | Pass param_links through to superclass |
| src/ondil/distributions/mv_normal_chol.py | Pass param_links through to superclass |
| src/ondil/distributions/mv_marg_cop.py | New MarginalCopula distribution |
| src/ondil/distributions/bicop_normal.py | New BiCopNormal Gaussian copula distribution |
| src/ondil/distributions/init.py | Register new copula distributions |
| src/ondil/estimators/online_mvdistreg.py | Integrate copula option and branching logic into fitting routine |
Comments suppressed due to low confidence (3)
src/ondil/distributions/normal.py:47
- The
param_linksparameter should be typed asdict[int, LinkFunction]instead ofLinkFunction, and default to an empty dict. Update the annotation toparam_links: dict[int, LinkFunction] = {}.
param_links: LinkFunction = {},
src/ondil/distributions/bicop_normal.py:33
- The constructor uses a singular
param_linkbut the base class expectsparam_links: dict[int, LinkFunction]. Consider renaming toparam_linksand accepting a mapping keyed by parameter index.
param_link: LinkFunction = TauToPar(),
src/ondil/estimators/online_mvdistreg.py:690
- The code previously scaled X and stored it in X_scaled but now passes the raw X to
_outer_fit, bypassing the scaling step. It should likely beX_scaledhere.
self._outer_fit(X=X, y=y, theta=theta)
cc8a021 to
48f3249
Compare
63c30eb to
f5a3724
Compare
|
@chrischu12 I have changed the base branch as we #134 is merged now. |
c403d59 to
ac94ff5
Compare
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
f958ffd to
b5207eb
Compare
| # If no regularization is allowed, we just return the theta | ||
| return theta | ||
|
|
||
| def _make_initial_eta(self, theta: np.ndarray): |
There was a problem hiding this comment.
Should this be on the copula class?
| theta[a] = self.distribution.set_initial_guess(y, theta[a], p) | ||
| theta = self._handle_path_regularization(theta=theta, p=p, a=a) | ||
| # if (inner_iteration == 0) and (outer_iteration == 0) & (a == 0): | ||
| # theta[a] = self.distribution.set_initial_guess(y, theta[a], p) |
There was a problem hiding this comment.
This should not be commented as it is necessary for the multivariate case. Properly switch between copula and MV case in the fitting
| dl2_link = self.distribution.cube_to_flat(dl2_link, param=p) | ||
| dl2_link = dl2_link[:, k] | ||
| else: | ||
| eta = self._make_initial_eta(theta) |
| # If the likelihood is at some point decreasing, we're breaking | ||
| # Hence we need to store previous iteration values: | ||
| if (inner_iteration > 0) | (outer_iteration > 0): | ||
| if (inner_iteration == 0) and (outer_iteration == 0): |
| eta_elem, param=p, k=k, d=self.dim_ | ||
| ) | ||
|
|
||
| if issubclass(self.distribution.__class__, CopulaMixin): |
There was a problem hiding this comment.
this block should be as eta_to_theta_element on the copula class and then we don't need to import and check for specific copulas on the main estimator class.
| param=p, | ||
| ).reshape(-1, 1) | ||
|
|
||
| if isinstance( |
There was a problem hiding this comment.
Same for this block. Should also be a function on the copula class (the same as above), eta_to_theta_element
| old_likelihood = self._current_likelihood[a] | ||
|
|
||
| # If the LL is decreasing, we're resetting to the previous iteration | ||
| if ((outer_iteration > 0) | (inner_iteration > 1)) & decreasing: |
| if issubclass(self.distribution.__class__, CopulaMixin) and p == 0: | ||
| out[a][p] = np.tanh(out[a][p] / 2) * (1 - 1e-8) | ||
| out[a][p] = self.distribution.param_link_inverse( | ||
| out[a][p] * (1 - 1e-8), param=0 | ||
| ) * (1 - 1e-8) |
There was a problem hiding this comment.
if issubclass(self.distribution.class, CopulaMixin) and p == 0:
out[a][p] = np.tanh(out[a][p] / 2) * (1 - 1e-8)
out[a][p] = self.distribution.param_link_inverse(
out[a][p] * (1 - 1e-8), param=0
) * (1 - 1e-8)
This pattern is duplicated a few times across the code. Can this be moved in the param_link_inverse for the copula?
No description provided.