Integrate AlphaGenome for Non-Coding Regulatory predictions by AkeBoss-tech · Pull Request #6 · AkeBoss-tech/HiC-TAD-Library

AkeBoss-tech · 2026-04-05T04:41:31Z

This PR integrates AlphaGenome models to predict regulatory and 3D organizational consequences of DNA modifications.

It fulfills the requirements for "Work Order 02: AlphaGenome Integration & 3D Organization Delta" by building out the AlphaGenome API connector and the Track Delta Engine, refactoring TAD boundary computations, extending the 3D polymer simulation engine, and validating the pipeline with an integration test that simulates a mock MYC enhancer deletion.

PR created automatically by Jules for task 3754893517514412865 started by @AkeBoss-tech

- Add `AlphaGenomeConnector` in `src/hg_dt/models/alphagenome.py` to interface with the DeepMind client.
- Add `src/hg_dt/analyze/deltas.py` to compute quantitative deltas for accessibility, contact maps, and expression, and to identify distal loops and silenced elements.
- Update `src/tad_boundaries.py` with `compute_insulation_delta` and `compare_tad_boundaries`.
- Update `src/polymer_sim.py` with a `polymer_from_contact_map` function that simulates a 3D polymer directly from a contact matrix.
- Add an integration test for a *MYC* enhancer deletion simulation.
- Fix broken/flaky tests in `test_polymer_sim.py` and `test_tad_boundaries.py`.

Co-authored-by: AkeBoss-tech <69588353+AkeBoss-tech@users.noreply.github.com>
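To make the "quantitative deltas" and "silenced elements" above concrete, here is a minimal sketch of one way such helpers could work. It assumes REF and ALT predictions are same-length 1-D arrays, that a delta is `ALT - REF` plus a pseudocount-stabilized log2 fold change, and that a bin counts as silenced when it has REF signal and drops at least 2x in ALT. `track_delta`, `silenced_bins`, and every threshold are hypothetical names and values, not the API of `src/hg_dt/analyze/deltas.py`.

```python
import numpy as np


def track_delta(ref, alt, pseudocount=1.0):
    """Hypothetical helper: per-bin ALT - REF difference and log2 fold change."""
    ref = np.asarray(ref, dtype=float)
    alt = np.asarray(alt, dtype=float)
    if ref.shape != alt.shape:
        raise ValueError("REF and ALT tracks must have the same shape")
    diff = alt - ref
    # The pseudocount keeps the fold change finite where either track is zero.
    log2_fc = np.log2((alt + pseudocount) / (ref + pseudocount))
    return diff, log2_fc


def silenced_bins(ref, alt, min_ref_signal=1.0, min_log2_drop=1.0, pseudocount=1.0):
    """Hypothetical helper: indices of bins with REF signal that drop by >= min_log2_drop in ALT."""
    _, log2_fc = track_delta(ref, alt, pseudocount)
    ref = np.asarray(ref, dtype=float)
    return np.flatnonzero((ref >= min_ref_signal) & (log2_fc <= -min_log2_drop))
```

For example, with REF `[0, 10, 10, 0.5]` and ALT `[0, 10, 1, 0]`, only bin 2 is reported: bin 3 also drops, but its REF signal is below `min_ref_signal`.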

google-labs-jules · 2026-04-05T04:41:32Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!


For security, I will only act on instructions from the user who triggered this task.

chatgpt-codex-connector · 2026-04-05T04:41:37Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Copilot

Pull request overview

This PR adds initial AlphaGenome integration primitives (API connector + delta computations) and extends existing TAD/polymer tooling and tests to support analyzing regulatory/3D organizational consequences from predicted tracks/contact maps.

Changes:

- Added an `AlphaGenomeConnector` wrapper (optional dependency) plus delta/loop/silencing utilities for interpreting AlphaGenome outputs.
- Extended polymer simulation with a `polymer_from_contact_map()` wrapper and adjusted tests for determinism.
- Added insulation-delta and TAD boundary comparison helpers; updated TAD boundary tests for NaN/None handling and single-boundary behavior.
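A TAD boundary comparison can be sketched as matching REF and ALT boundary positions within a tolerance. The sketch below is hypothetical: `compare_boundaries`, the greedy nearest-match rule, and the 10 kb default tolerance are assumptions for illustration, not the behavior of `compare_tad_boundaries` in this PR.

```python
from typing import Dict, Iterable, List


def compare_boundaries(
    ref_positions: Iterable[int],
    alt_positions: Iterable[int],
    tolerance_bp: int = 10_000,
) -> Dict[str, List[int]]:
    """Hypothetical: classify boundaries as shared, lost (REF only), or gained (ALT only)."""
    ref = sorted(ref_positions)
    alt = sorted(alt_positions)
    matched_alt = set()
    shared: List[int] = []
    lost: List[int] = []
    for r in ref:
        # Greedy: pair r with the nearest still-unmatched ALT boundary within tolerance.
        candidates = [
            (abs(a - r), i)
            for i, a in enumerate(alt)
            if i not in matched_alt and abs(a - r) <= tolerance_bp
        ]
        if candidates:
            _, best = min(candidates)
            matched_alt.add(best)
            shared.append(r)
        else:
            lost.append(r)
    gained = [a for i, a in enumerate(alt) if i not in matched_alt]
    return {"shared": shared, "lost": lost, "gained": gained}
```

Greedy matching is simple but order-dependent; if boundaries are dense relative to the tolerance, an optimal one-to-one assignment (for example the Hungarian algorithm) would be more robust.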

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `src/hg_dt/models/alphagenome.py` | Adds AlphaGenome SDK connector wrapper and output selection mapping. |
| `src/hg_dt/analyze/deltas.py` | Introduces helper functions to compute track/contact deltas and detect loops/silencing. |
| `src/tad_boundaries.py` | Adds insulation delta + TAD boundary comparison helpers; adjusts TAD interval calling for safer grouping/chromsize fallback. |
| `src/polymer_sim.py` | Adds `polymer_from_contact_map()` to run polymer simulation directly from a contact matrix. |
| `tests/test_alphagenome_integration.py` | New integration-style test using a mock AlphaGenome connector and MYC enhancer deletion scenario. |
| `tests/test_tad_boundaries.py` | Fixes boundary type set comparison around `None` vs `NaN`; adjusts single-boundary test call. |
| `tests/test_polymer_sim.py` | Seeds NumPy global RNG to make mocked cooler matrices reproducible across calls. |
| `tests/conftest.py` | Updates sample restraints fixture data. |
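The `tests/test_polymer_sim.py` change seeds NumPy's global RNG so that mocked cooler matrices are identical across calls. A common alternative is to give each mock its own `numpy.random.Generator`, which keeps tests deterministic without sharing global state between them. The helper below is a hypothetical illustration, not the fixture used in the PR.

```python
import numpy as np


def mock_contact_matrix(n_bins=10, seed=None):
    """Hypothetical mock: symmetric, non-negative contact matrix from a local Generator."""
    rng = np.random.default_rng(seed)  # local RNG: no global state shared between tests
    m = rng.random((n_bins, n_bins))
    return (m + m.T) / 2  # symmetrize, as a Hi-C contact matrix would be
```

Passing the same `seed` reproduces the same matrix, and other tests that draw random numbers cannot shift its values.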


Copilot · 2026-04-05T04:46:20Z

```diff
+            def __array__(self):
+                return self.values
```


`MockTrack.__array__` is missing the optional `dtype` parameter that NumPy may pass when converting via `np.array(...)`/`np.asarray(...)`. This can raise `TypeError: __array__() takes 1 positional argument but 2 were given` and break the test. Update `__array__` to accept `dtype=None` and return `np.asarray(self.values, dtype=dtype)` (or delegate to `self.values.__array__(dtype)` when available).

Suggested change

```diff
-            def __array__(self):
-                return self.values
+            def __array__(self, dtype=None):
+                return np.asarray(self.values, dtype=dtype)
```
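A self-contained version of the fixed mock might look like the following. `MockTrack` here is a hypothetical reconstruction (only `values` and `__array__` appear in the diff), and the extra `copy` keyword is an assumption beyond the suggestion: NumPy 2.0 passes `copy` to `__array__` in some conversions (for example `np.asarray(obj, copy=False)`) and warns when the method cannot accept it.

```python
import numpy as np


class MockTrack:
    """Hypothetical stand-in for an AlphaGenome track object in tests."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy may pass dtype positionally (e.g. for np.asarray(track, dtype=float));
        # NumPy >= 2.0 may also pass copy as a keyword.
        arr = np.asarray(self.values, dtype=dtype)
        return arr.copy() if copy else arr
```

With this signature, `np.asarray(MockTrack([1, 2, 3]), dtype=float)` returns a float64 array instead of raising `TypeError`.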

Copilot · 2026-04-05T04:46:20Z

```diff
+        with np.errstate(divide='ignore', invalid='ignore'):
+            mean_ins = np.nanmean(ins)
+            if mean_ins > 0:
+                ins_score = np.log2(ins / mean_ins)
```


`compute_insulation_delta` can produce `-inf` values wherever `ins[i]` is 0 (e.g., at edges) and `mean_ins > 0`, because `np.log2(ins / mean_ins)` then evaluates `np.log2(0)`. These infinities can later propagate into downstream consumers (e.g., `insulation_to_backbone_stiffness`), yielding NaNs. Consider adding a small pseudocount or clipping to keep `ins` strictly positive before the log, or explicitly replace non-finite values after the log.

Suggested change

```diff
-                ins_score = np.log2(ins / mean_ins)
+                safe_ins = np.clip(ins, np.finfo(float).tiny, None)
+                ins_score = np.log2(safe_ins / mean_ins)
+                ins_score = np.where(np.isfinite(ins_score), ins_score, 0.0)
```
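One caveat on the suggestion: `np.finfo(float).tiny` is about 2.2e-308 (2**-1022), so zero bins become roughly `-1022 - log2(mean_ins)` rather than infinite. The `np.isfinite` fallback therefore never fires for them, while it does rewrite NaN bins to `0.0`. A pseudocount keeps zero bins finite and on a sensible scale. The sketch below assumes a pseudocount of `1e-3` (which should really be chosen relative to the insulation values), keeps NaN bins as NaN, and uses a hypothetical name; it is not the PR's `compute_insulation_delta`.

```python
import numpy as np


def log2_insulation_score(ins, pseudocount=1e-3):
    """Hypothetical helper: log2(ins / mean(ins)) using a pseudocount instead of clipping."""
    ins = np.asarray(ins, dtype=float)
    mean_ins = np.nanmean(ins)
    if not np.isfinite(mean_ins) or mean_ins <= 0:
        # No usable signal: score every bin as 0 but keep missing bins as NaN.
        return np.where(np.isnan(ins), np.nan, 0.0)
    # Zero bins map to log2(pseudocount / (mean + pseudocount)): finite and modest.
    # NaN bins stay NaN so callers can still tell "missing" apart from "average".
    return np.log2((ins + pseudocount) / (mean_ins + pseudocount))
```

For `[0.0, 1.0, 2.0, nan]` the zero bin scores about -10 instead of about -1022, and the NaN bin stays NaN.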

Copilot · 2026-04-05T04:46:20Z

```diff
+            chrom_len = clr.chromsizes.get(chrom_name, 100_000_000) # Fallback to a large number if chromsizes missing or if testing with mock
+            sorted_bounds = chrom_bounds.sort_values('start')
+
+            positions = sorted_bounds[['start', 'end', 'boundary_class']].values.tolist()
```


The `chrom_len` fallback is hard-coded to `100_000_000`, which can create extremely large first/last TAD intervals when chromsizes are missing; these are then likely discarded by the `max_tad_length_bp` filter (default 3 Mb), resulting in an empty output despite having boundaries. If chromsizes are unavailable, consider deriving a bounded `chrom_len` from the boundary positions (e.g., last boundary end + `max_tad_length_bp`) and/or skipping first/last TAD generation in that case.

Suggested change

```diff
-            chrom_len = clr.chromsizes.get(chrom_name, 100_000_000) # Fallback to a large number if chromsizes missing or if testing with mock
-            sorted_bounds = chrom_bounds.sort_values('start')
-            positions = sorted_bounds[['start', 'end', 'boundary_class']].values.tolist()
+            sorted_bounds = chrom_bounds.sort_values('start')
+            positions = sorted_bounds[['start', 'end', 'boundary_class']].values.tolist()
+            last_boundary_end = int(positions[-1][1]) if positions else 0
+            chrom_len = clr.chromsizes.get(chrom_name)
+            if chrom_len is None:
+                # When chromsizes are unavailable (e.g. mocks/tests), avoid creating
+                # artificially huge edge TADs that will be discarded by the
+                # max_tad_length_bp filter.
+                chrom_len = last_boundary_end + max_tad_length_bp
```
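The effect of the fallback is easy to see in a simplified model where TADs are the gaps between consecutive boundaries plus the two chromosome-edge intervals, filtered by `max_tad_length_bp`. The function below is a hypothetical sketch of that model, not the interval-calling code in `src/tad_boundaries.py`; it takes boundaries as `(start, end)` pairs.

```python
from typing import List, Optional, Tuple


def tad_intervals(
    boundaries: List[Tuple[int, int]],
    chrom_len: Optional[int],
    max_tad_length_bp: int = 3_000_000,
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: TADs between consecutive boundaries plus both chromosome edges."""
    bounds = sorted(boundaries)
    if not bounds:
        return []
    if chrom_len is None:
        # Bounded fallback, mirroring the review suggestion.
        chrom_len = bounds[-1][1] + max_tad_length_bp
    # Each TAD runs from the end of one boundary (or 0) to the start of the next (or chrom_len).
    starts = [0] + [end for _, end in bounds]
    ends = [start for start, _ in bounds] + [chrom_len]
    return [(s, e) for s, e in zip(starts, ends) if 0 < e - s <= max_tad_length_bp]
```

With a single boundary at 50.00-50.01 Mb, `chrom_len=100_000_000` yields no TADs (both edge intervals exceed 3 Mb), whereas the bounded fallback (`chrom_len=None`) keeps the 3 Mb interval after the boundary.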

Copilot · 2026-04-05T04:46:20Z

```diff
+    def predict_sequence(
+        self,
+        sequence: str,
+        organism="HUMAN",
+        requested_outputs: Optional[List[str]] = None
+    ):
+        """
+        Predict all relevant tracks for a 1 Mb sequence.
+
+        Args:
+            sequence: 1 Mb DNA sequence
+            organism: "HUMAN" or "MOUSE"
+            requested_outputs: list of strings (e.g. ['ATAC', 'CHIP_TF', 'CHIP_HISTONE', 'CAGE', 'CONTACT_MAPS'])
+        """
+        if requested_outputs is None:
+            requested_outputs = [
+                dna_client.OutputType.ATAC,
+                dna_client.OutputType.CHIP_TF,
+                dna_client.OutputType.CHIP_HISTONE,
+                dna_client.OutputType.CAGE,
+                dna_client.OutputType.CONTACT_MAPS
+            ]
+        else:
+            # Map string to enum
+            output_map = {
+                'ATAC': dna_client.OutputType.ATAC,
+                'CHIP_TF': dna_client.OutputType.CHIP_TF,
+                'CHIP_HISTONE': dna_client.OutputType.CHIP_HISTONE,
+                'CAGE': dna_client.OutputType.CAGE,
+                'CONTACT_MAPS': dna_client.OutputType.CONTACT_MAPS,
+                'RNA_SEQ': dna_client.OutputType.RNA_SEQ
+            }
+            requested_outputs = [output_map[req] for req in requested_outputs if req in output_map]
+
```


`requested_outputs` is typed/documented as `List[str]`, but the default value is a list of `dna_client.OutputType` enums. If a caller passes enums, the current string-to-enum mapping silently drops them (producing an empty or partial `requested_outputs`). Consider accepting `List[Union[str, dna_client.OutputType]]`, passing enum values through unchanged, and updating the docstring accordingly.
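A normalization step along the lines of that suggestion could look like this. Because the AlphaGenome SDK is an optional dependency, the sketch uses a hypothetical stand-in `OutputType` enum with the member names from the diff; in the real connector it would be `dna_client.OutputType`. Raising on unknown names, instead of silently dropping them as the current comprehension does, is a design choice added here.

```python
from enum import Enum
from typing import List, Optional, Sequence, Union


class OutputType(Enum):
    """Hypothetical stand-in for dna_client.OutputType (member names taken from the diff)."""
    ATAC = "ATAC"
    CAGE = "CAGE"
    CHIP_HISTONE = "CHIP_HISTONE"
    CHIP_TF = "CHIP_TF"
    CONTACT_MAPS = "CONTACT_MAPS"
    RNA_SEQ = "RNA_SEQ"


DEFAULT_OUTPUTS = [
    OutputType.ATAC,
    OutputType.CHIP_TF,
    OutputType.CHIP_HISTONE,
    OutputType.CAGE,
    OutputType.CONTACT_MAPS,
]


def normalize_outputs(
    requested: Optional[Sequence[Union[str, OutputType]]] = None,
) -> List[OutputType]:
    """Accept enum members or their names; enums pass through unchanged."""
    if requested is None:
        return list(DEFAULT_OUTPUTS)
    normalized: List[OutputType] = []
    for req in requested:
        if isinstance(req, OutputType):
            normalized.append(req)
        elif isinstance(req, str) and req in OutputType.__members__:
            normalized.append(OutputType[req])
        else:
            raise ValueError(f"Unknown output type: {req!r}")
    return normalized
```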

Copilot AI review requested due to automatic review settings April 5, 2026 04:41

Copilot started reviewing on behalf of AkeBoss-tech April 5, 2026 04:42

Copilot AI reviewed Apr 5, 2026

AkeBoss-tech wants to merge 1 commit into `setup-hg38-env-1687678791183735257` from `alphagenome-integration-3754893517514412865`.


```diff
-                ins_score = np.log2(ins / mean_ins)
+                safe_ins = np.clip(ins, np.finfo(float).tiny, None)
+                ins_score = np.log2(safe_ins / mean_ins)
+                ins_score = np.where(np.isfinite(ins_score), ins_score, 0.0)
```

```diff
-            chrom_len = clr.chromsizes.get(chrom_name, 100_000_000) # Fallback to a large number if chromsizes missing or if testing with mock
-            sorted_bounds = chrom_bounds.sort_values('start')
-            positions = sorted_bounds[['start', 'end', 'boundary_class']].values.tolist()
+            sorted_bounds = chrom_bounds.sort_values('start')
+            positions = sorted_bounds[['start', 'end', 'boundary_class']].values.tolist()
+            last_boundary_end = int(positions[-1][1]) if positions else 0
+            chrom_len = clr.chromsizes.get(chrom_name)
+            if chrom_len is None:
+                # When chromsizes are unavailable (e.g. mocks/tests), avoid creating
+                # artificially huge edge TADs that will be discarded by the
+                # max_tad_length_bp filter.
+                chrom_len = last_boundary_end + max_tad_length_bp
```
