
Hardening pass: tracking, models, runtime, tests, and docs alignment#29

Open
dr-gareth-roberts wants to merge 1 commit into main from feat/insideLLMs-hardening-20260304

Conversation


@dr-gareth-roberts dr-gareth-roberts commented Mar 4, 2026

Description

Summary
This PR applies a focused hardening pass across experiment tracking, model typing, retry error handling, dataset optional imports, and related tests/docs updates.

Changes
Experiment tracking
Enforced non-empty tracker list in MultiTracker constructor.
Returned a concrete first run ID from MultiTracker.start_run (no empty fallback).
File: insideLLMs/experiment_tracking.py
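The two invariants above can be sketched as follows; the class and method bodies are illustrative stand-ins, not the actual insideLLMs implementation:

```python
class MultiTrackerSketch:
    """Illustrative stand-in for MultiTracker's new invariants."""

    def __init__(self, trackers):
        if not trackers:
            # Intentional fail-fast: with zero trackers, start_run could
            # never return a concrete run ID.
            raise ValueError("MultiTracker requires at least one tracker")
        self.trackers = list(trackers)

    def start_run(self, run_name=None):
        run_ids = [t.start_run(run_name) for t in self.trackers]
        # No empty-string fallback: the first child's run ID is returned.
        return run_ids[0]
```

Callers that previously constructed `MultiTracker([])` will now get a `ValueError` at construction time instead of an empty run ID later.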
Optional dependency robustness
Switched HF datasets import to importlib-based optional loading.
Set availability based on resolved load_dataset symbol.
File: insideLLMs/dataset_utils.py
Model retry safety
In ModelWrapper.generate, made last_error explicitly typed and fail-fast if unexpectedly unset before raising.
File: insideLLMs/models/base.py
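The fail-fast pattern can be sketched like this (a simplified stand-in for ModelWrapper.generate, not the actual method):

```python
from typing import Callable, Optional


def generate_with_retry(fn: Callable[[], str], max_retries: int = 3) -> str:
    """Simplified stand-in for a retrying generate call."""
    last_error: Optional[Exception] = None
    for _attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # capture for re-raise once retries exhaust
            last_error = exc
    if last_error is None:
        # Defensive fail-fast: unreachable under normal control flow, but
        # better than raising a bare or ambiguous error if the invariant
        # is ever violated (e.g. max_retries == 0).
        raise RuntimeError("retry loop exited without capturing an error")
    raise last_error
```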
OpenAI model typing cleanup
Added explicit **kwargs: Any annotations to generate/chat/stream.
Added explicit return type -> ModelInfo for info.
Kept SDK boundary ignore where OpenAI typing overloads are narrower than our ChatMessage interface.
File: insideLLMs/models/openai.py
Tests typing improvements
Added explicit typing to helper probe methods used in coverage tests.
File: tests/test_probes_models_coverage.py
Docs consistency
Bias tutorial config now sets output_dir: bias_results to match analysis step paths.
File: wiki/tutorials/Bias-Testing.md
Why
Reduce type/lint noise in strict IDE diagnostics.
Tighten runtime guarantees around run IDs and retry error paths.
Improve optional dependency behavior for environments missing extras.
Keep docs aligned with real output locations.
Risk
Low to medium:

Mostly typing/guard improvements and deterministic behavior tightening.
One behavior change: MultiTracker([]) now raises ValueError (intentional fail-fast).
Validation
Manual code-level validation and IDE diagnostics, due to intermittent command-runner issues in the IDE environment.
Branch push completed and ready for CI validation.

Summary by Sourcery

Harden experiment tracking, model retry behavior, routing/classification, and contrib utilities while tightening typing and optional dependency handling across the codebase.

New Features:

  • Add restricted arithmetic-only execution helpers and use them in calculator tools and sandboxed Python execution to avoid unsafe eval/exec.
  • Expose all exports in several core modules to provide explicit public APIs.

Bug Fixes:

  • Ensure MultiTracker requires at least one tracker and always returns a concrete run ID from start_run.
  • Ensure WandBTracker, MLflowTracker, and TensorBoardTracker use resolved modules/constructors, validate availability at init time, and always return non-empty run IDs.
  • Make LocalFileTracker artifacts metadata and artifact_type defaults consistent, and improve numeric metrics typing in the tracking decorator.
  • Handle missing or uninitialized external clients (OpenAI, Anthropic, routing model) and classification edge cases more robustly.
  • Fix HuggingFace dataset loading to use importlib-based optional imports, default split to train, and raise cleanly when unavailable.
  • Guarantee retry helpers and ModelWrapper.generate always raise a concrete exception when retries are exhausted.
  • Tighten semantics in tracing, semantic cache, rate limiting, routing, resource IO, config loading, structured extraction, and other utilities to avoid silent failures and type ambiguities.
  • Improve reproducibility and config utilities by handling TensorFlow imports, package metadata, experiment listings, and config validation types more safely.
  • Update tracing fingerprinting to validate event types and fail fast on unsupported inputs.
  • Adjust tests and coverage probes to assert latency metadata, use precise typing, and handle pydantic-availability paths correctly.
  • Align bias-testing tutorial output directory with the analysis step paths.

Enhancements:

  • Refine optional dependency detection for wandb, mlflow, TensorBoard, and HuggingFace datasets using helper loaders and module-level availability flags.
  • Strengthen debug tracing APIs to return optional events and use typed payloads, and make execution analyzers more robust to failures.
  • Improve adapter, routing, chains, ensemble, synthesis, evalbom, and HITL utility behavior by catching and logging errors more explicitly while keeping core flows running.
  • Clarify OpenAI model typing by annotating kwargs, tightening error context for chat, and specifying the ModelInfo return type of info().
  • Improve runtime result/status coercion, model info normalization, and pipeline construction with clearer types and safer fallbacks.
  • Standardize exception capture variables and add explicit typing hints across tracing, async utilities, CLI, schemas, tests, and multiple runtime modules.

Summary by CodeRabbit

  • New Features

    • Added safer arithmetic and code execution pathways for agents, replacing direct eval/exec with restricted implementations featuring explicit operator handling.
    • Implemented lazy loading for optional datasets package with graceful fallback handling.
  • Bug Fixes

    • Enhanced null-safety checks across provider clients, models, and intent routing to prevent dereferencing None values.
    • Improved error handling in retry and circuit-breaking logic with explicit exception validation.
  • API Changes

    • PromptDebugger logging methods now return Optional[DebugTraceEvent], allowing graceful handling of inactive traces.
    • Updated function return types and type hints for improved type safety.


Copilot AI review requested due to automatic review settings March 4, 2026 11:13

sourcery-ai Bot commented Mar 4, 2026

Reviewer's Guide

Hardening pass focusing on safer experiment tracking backends, stricter retry/typing behavior, safer arithmetic/code execution for contrib agents, more robust optional dependency handling, and minor doc/test/runtime cleanups.

Updated class diagram for experiment trackers and MultiTracker

classDiagram
    class ExperimentTracker {
        <<abstract>>
        - config TrackingConfig
        - _run_active bool
        - _run_id str
        - _step int
        + start_run(run_name str, run_id str, nested bool) str
        + end_run(status str) None
        + log_metrics(metrics dict~str, Any~, step int) None
        + log_params(params dict~str, Any~) None
        + log_artifact(artifact_path str, artifact_type str, artifact_name str) None
        + log_table(table_name str, data list~dict~str, Any~~, columns list~str~, step int) None
        + watch_model(model Any, log_freq int) None
    }

    class WandBTracker {
        - entity str | None
        - wandb_kwargs dict~str, Any~
        - _wandb Any
        - _run Any
        - _run_active bool
        - _run_id str
        - _step int
        + __init__(config TrackingConfig, entity str, wandb_kwargs dict~str, Any~) None
        + start_run(run_name str, run_id str, nested bool) str
        + end_run(status str) None
        + log_metrics(metrics dict~str, float~, step int) None
        + log_params(params dict~str, Any~) None
        + log_artifact(artifact_path str, artifact_type str, artifact_name str) None
        + log_table(table_name str, data list~dict~str, Any~~, columns list~str~, step int) None
        + watch_model(model Any, log_freq int) None
    }

    class MLflowTracker {
        - tracking_uri str | None
        - _mlflow Any
        - _experiment_id str | None
        - _run_active bool
        - _run_id str
        - _step int
        + __init__(config TrackingConfig, tracking_uri str, experiment_name str) None
        + start_run(run_name str, run_id str, nested bool) str
        + end_run(status str) None
        + log_metrics(metrics dict~str, float~, step int) None
        + log_params(params dict~str, Any~) None
        + log_artifact(artifact_path str) None
        + log_model(model Any, model_name str, model_flavor str) None
        + register_model(model_uri str, model_name str) None
    }

    class TensorBoardTracker {
        - log_dir str
        - _summary_writer_cls Any
        - _writer Any | None
        - _run_active bool
        - _run_id str
        - _step int
        + __init__(config TrackingConfig, log_dir str) None
        + start_run(run_name str, run_id str, nested bool) str
        + end_run(status str) None
        + log_metrics(metrics dict~str, float~, step int) None
        + log_params(params dict~str, Any~) None
        + log_artifact(artifact_path str, artifact_type str, artifact_name str) None
        + log_table(table_name str, data list~dict~str, Any~~, columns list~str~, step int) None
        + watch_model(model Any, log_freq int) None
    }

    class LocalFileTracker {
        - output_dir Path
        - _run_dir Path | None
        - _metrics list~dict~str, Any~~
        - _params dict~str, Any~
        - _artifacts list~dict~str, str~~
        + start_run(run_name str, run_id str, nested bool) str
        + end_run(status str) None
        + log_metrics(metrics dict~str, float~, step int) None
        + log_params(params dict~str, Any~) None
        + log_artifact(artifact_path str, artifact_type str, artifact_name str) None
        + snapshot() dict~str, Any~
    }

    class MultiTracker {
        - trackers list~ExperimentTracker~
        - _run_active bool
        - _run_id str
        + __init__(trackers list~ExperimentTracker~, config TrackingConfig) None
        + start_run(run_name str, run_id str, nested bool) str
        + end_run(status str) None
        + log_metrics(metrics dict~str, float~, step int) None
        + log_params(params dict~str, Any~) None
        + log_artifact(artifact_path str, artifact_type str, artifact_name str) None
    }

    ExperimentTracker <|-- WandBTracker
    ExperimentTracker <|-- MLflowTracker
    ExperimentTracker <|-- TensorBoardTracker
    ExperimentTracker <|-- LocalFileTracker
    ExperimentTracker <|-- MultiTracker

    MultiTracker "1" o-- "many" ExperimentTracker : aggregates
    WandBTracker ..> "1" Any : uses_wandb_module
    MLflowTracker ..> "1" Any : uses_mlflow_module
    TensorBoardTracker ..> "1" Any : uses_summary_writer_cls

File-Level Changes

Change Details Files
Refactor experiment trackers to use safely-loaded optional dependencies and stricter run ID handling, plus MultiTracker invariants.
  • Introduce _load_optional_module helper and centralize optional imports for wandb, mlflow, and tensorboard writers
  • Inject concrete tracker client instances into tracker classes instead of using global modules directly
  • Ensure start_run for WandBTracker and MLflowTracker validates and stringifies returned run IDs, raising RuntimeError if missing
  • Make TensorBoardTracker fail fast when writer is unavailable and return the concrete run_name as run_id
  • Make LocalFileTracker artifacts list store typed metadata dicts and default artifact_type when logging
  • Have MultiTracker.__init__ require at least one tracker and make start_run always return the first child run_id
insideLLMs/experiment_tracking.py
Harden contrib agent calculator and python execution to use safe arithmetic-only evaluators.
  • Replace eval() in tool decorator examples and BaseAgent usage with _safe_eval_arithmetic
  • Add helpers to apply binary/unary ops and evaluate arithmetic expressions in a restricted AST subset
  • Introduce _safe_exec_python_subset to execute only assignments + arithmetic expressions and wire it into execute_python
  • Update probe docs/examples to reference safe arithmetic evaluator instead of eval
insideLLMs/contrib/agents.py
insideLLMs/probes/agent_probe.py
Improve optional HF dataset import behavior and tighten dataset loader contract.
  • Switch datasets import to importlib-based dynamic loading and expose load_dataset symbol when present
  • Define HF_DATASETS_AVAILABLE based on resolved load_dataset rather than import success alone
  • Change load_hf_dataset to require HF datasets at runtime and default to split='train', returning a concrete list
  • Export public names via __all__ for better static analysis
insideLLMs/dataset_utils.py
Tighten retry/model wrapper behavior and error typing around last_error and retries.
  • Make ModelWrapper.generate track last_error as Exception | None and fail-fast with explicit RuntimeError if somehow unset
  • Update execute_with_retry (sync and async) to ensure last_exception is always set before raising RetryExhaustedError, inserting a synthetic RuntimeError when needed
  • Leave other retry wrappers unchanged except for typing/except-as variable naming consistency
  • Type-safety and API hardening across tracing, debugging, adapters, routing, runtime config, and various utilities.
    • Add or refine type annotations for payload dicts, DebugTracer methods (now Optional return types), and other helper methods
    • Make trace_fingerprint validate event element types and reject unexpected event types with TypeError
    • Harden adapters to check client initialization before generate_chat and raise RuntimeError if missing
    • Guard routing classifier against None intents and missing model, adjusting fallback logic
    • Use getattr and casts in config tests to avoid direct attribute assumptions under different pydantic versions
    • Adjust various helper functions to catch exceptions as named variables and preserve control flow semantics without swallowing unexpected types
    • Define __all__ in several modules (retry, rate_limiting, runtime, safety, config, registry, types, resources, semantic_cache, cost_tracking, etc.) for static tooling
    insideLLMs/trace/tracing.py
    insideLLMs/contrib/debugging.py
    insideLLMs/contrib/adapters.py
    insideLLMs/contrib/routing.py
    insideLLMs/runtime/_config_loader.py
    insideLLMs/runtime/_high_level.py
    insideLLMs/runtime/_result_utils.py
    insideLLMs/runtime/reproducibility.py
    insideLLMs/config.py
    insideLLMs/resources.py
    insideLLMs/semantic_cache.py
    insideLLMs/integrations/langchain.py
    insideLLMs/async_utils.py
    insideLLMs/cli/_record_utils.py
    insideLLMs/cli/commands/doctor.py
    insideLLMs/contrib/hitl.py
    insideLLMs/rate_limiting.py
    insideLLMs/contrib/chains.py
    insideLLMs/contrib/ensemble.py
    insideLLMs/contrib/evalbom.py
    insideLLMs/shadow.py
    insideLLMs/structured_extraction.py
    insideLLMs/schemas/registry.py
    insideLLMs/runtime/__init__.py
    insideLLMs/registry.py
    insideLLMs/safety.py
    insideLLMs/types.py
    insideLLMs/cost_tracking.py
    OpenAI model typing cleanup and stronger error contexts.
    • Add **kwargs: Any to generate/chat/stream signatures and declare info() return type as ModelInfo
    • In chat error handling, access first message content via typed dict access rather than .get
    • Keep behavior otherwise identical while clarifying SDK boundary typing
    insideLLMs/models/openai.py
    Test updates and docs alignment to match new behavior and typing.
    • Tighten tests around latency_ms to require non-None values and non-negative metrics
    • Use typing.cast and getattr in config tests for better mypy/pyright compatibility
    • Extend example config tests to assert dataset.data is populated
    • Adjust except clauses to bind as _ where needed to avoid unused-variable noise
    • Align Bias-Testing tutorial YAML with actual output_dir bias_results path
    tests/test_models_config_coverage.py
    tests/test_probes_models_coverage.py
    wiki/tutorials/Bias-Testing.md



    github-actions Bot commented Mar 4, 2026

    insideLLMs Behavioural Diff

    • Exit code: 0
    • Baseline run dir: /home/runner/work/_temp/insidellms-run-base-22666913806-3487
    • Candidate run dir: /home/runner/work/_temp/insidellms-run-head-22666913806-15824
    Metric                     Count
    Common keys                12
    Regressions                0
    Improvements               0
    Other changes              0
    Only baseline              0
    Only candidate             0
    Trace drifts               0
    Trace violation increases  0
    Top changed records

    Regressions (0)

    • None

    Improvements (0)

    • None

    Other changes (0)

    • None

    Only in baseline (0)

    • None

    Only in candidate (0)

    • None

    Trace drifts (0)

    • None

    Trace violation increases (0)

    • None

    Generated by insideLLMs/action.


    coderabbitai Bot commented Mar 4, 2026

    📝 Walkthrough


    This PR standardizes exception handling across 30+ files by binding caught exceptions to placeholder variables, adds explicit module-level __all__ declarations to control public API surfaces, and introduces safety improvements including null-safety checks in adapter initialization, safer arithmetic/code execution pathways in agent tools, and optional return types in debug tracing methods.

    Changes

    Cohort / File(s) Summary
    Exception handling standardization
    insideLLMs/async_utils.py, insideLLMs/cli/_record_utils.py, insideLLMs/cli/commands/doctor.py, insideLLMs/contrib/chains.py, insideLLMs/contrib/ensemble.py, insideLLMs/contrib/evalbom.py, insideLLMs/contrib/hitl.py, insideLLMs/contrib/synthesis.py, insideLLMs/integrations/langchain.py, insideLLMs/rate_limiting.py, insideLLMs/resources.py, insideLLMs/retry.py, insideLLMs/runtime/_base.py, insideLLMs/runtime/_high_level.py, insideLLMs/runtime/_result_utils.py, insideLLMs/schemas/registry.py, insideLLMs/semantic_cache.py, insideLLMs/shadow.py, insideLLMs/structured_extraction.py
    Standardized except blocks from except Exception: to except Exception as _: to explicitly bind caught exceptions as unused variables, preserving control flow with clearer intent.
    Public API surface exposure
    insideLLMs/config.py, insideLLMs/cost_tracking.py, insideLLMs/rate_limiting.py, insideLLMs/registry.py, insideLLMs/runtime/__init__.py, insideLLMs/safety.py, insideLLMs/types.py
    Added dynamic __all__ declarations that export all non-private globals, formalizing public API surfaces for star-import and introspection without changing runtime behavior.
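The dynamic pattern is `__all__ = [name for name in globals() if not name.startswith("_")]`. Its effect, including the over-export caveat Sourcery raises later, can be demonstrated against a hypothetical module namespace:

```python
def compute_public_names(namespace: dict) -> list:
    """What [name for name in globals() if not name.startswith('_')] yields."""
    return [name for name in namespace if not name.startswith("_")]


# Hypothetical module namespace: the incidental import ("math") is exported
# alongside the intended public function, which is the caveat with this pattern.
module_namespace = {
    "__name__": "example_module",  # dunder: excluded
    "_internal_helper": object(),  # private helper: excluded
    "public_api": object(),        # intended export: included
    "math": object(),              # incidental import: also included
}

exported = compute_public_names(module_namespace)
```

An explicit per-module list avoids the incidental exports at the cost of manual upkeep.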
    Safety improvements: null checks and guards
    insideLLMs/contrib/adapters.py, insideLLMs/contrib/routing.py
    Added explicit null-safety guards: adapters now validate client initialization before use; routing now requires non-null model/intent before pattern matching and fallback paths.
    Safe code evaluation pathways
    insideLLMs/contrib/agents.py, insideLLMs/probes/agent_probe.py
    Introduced _apply_binary_operator, _apply_unary_operator, _eval_exec_expression, and _safe_exec_python_subset helpers to replace direct eval/exec; updated calculator and python tools to use safer execution paths; examples updated to reference new safe evaluation approach.
    Type annotation refinement
    insideLLMs/contrib/debugging.py, insideLLMs/runtime/reproducibility.py, insideLLMs/models/openai.py, insideLLMs/models/base.py
    Updated return types to Optional[DebugTraceEvent] in debugging methods; changed list_all return to include Union[str, float] for values; added explicit typing for kwargs in OpenAI model methods and last_error in retry paths.
    Dynamic/lazy import refactoring
    insideLLMs/dataset_utils.py, insideLLMs/experiment_tracking.py
    Implemented lazy loading for optional dependencies (datasets, wandb, mlflow, tensorboard) via runtime module lookups instead of direct imports; centralized error handling and availability detection; changed load_hf_dataset default split from "test" to "train" and updated return type.
    Trace fingerprinting robustness
    insideLLMs/trace/tracing.py
    Reworked trace_fingerprint() normalization to explicitly loop over events instead of heuristic-based checking; now accepts TraceEvent or dict and raises TypeError for unsupported types; added explicit type annotation for payload dict.
    Runtime configuration handling
    insideLLMs/runtime/_config_loader.py
    Introduced local alias pipeline_base_model to isolate base_model usage in pipeline construction; standardized exception binding in snapshot building.
    Test coverage updates
    tests/test_models_config_coverage.py, tests/test_probes_models_coverage.py
    Updated import style for config module; added typing.cast() for type coercion in tests; enhanced type annotations on test probe methods; added latency_ms assertions.
    Documentation example
    wiki/tutorials/Bias-Testing.md
    Added output_dir configuration field to bias_harness.yaml example under dataset section.

    Estimated code review effort

    🎯 4 (Complex) | ⏱️ ~45 minutes


    Poem

    🐰 Exception handlers bound with care, unsafe evals vanish in air,
    __all__ declares what's truly blessed, safety guards pass the test.
    Type hints now crystal, imports so lazy, making the codebase less hazy! ✨

    Pre-merge checks: 3 passed

    Check name          Status     Explanation
    Description Check   ✅ Passed  Check skipped: CodeRabbit's high-level summary is enabled.
    Title check         ✅ Passed  The title accurately summarizes the main changes: a hardening pass affecting tracking, models, runtime, tests, and docs, reflecting the core objectives of the PR.
    Docstring Coverage  ✅ Passed  Docstring coverage is 85.03%, which is sufficient. The required threshold is 80.00%.



    @sourcery-ai sourcery-ai Bot left a comment


    Hey - I've found 3 issues and left some high-level feedback:

    • The change to load_hf_dataset's default split from 'test' to 'train' is a behavioral change that may surprise existing callers; consider whether this should remain 'test' for backward compatibility or be called out clearly as an intentional breaking change.
    • Using __all__ = [name for name in globals() if not name.startswith('_')] in multiple modules will export helper symbols, imported modules, and other internals; consider defining __all__ explicitly per module to avoid unintentionally expanding the public surface.
    ## Individual Comments
    
    ### Comment 1
    <location path="insideLLMs/contrib/agents.py" line_range="3034-3036" />
    <code_context>
         return _eval(parsed)
    
    
    +def _apply_binary_operator(
    +    op: ast.operator, left: Union[int, float], right: Union[int, float]
    +) -> Union[int, float]:
    +    if isinstance(op, ast.Add):
    +        return left + right
    +    if isinstance(op, ast.Sub):
    +        return left - right
    +    if isinstance(op, ast.Mult):
    +        return left * right
    +    if isinstance(op, ast.Div):
    +        return left / right
    +    if isinstance(op, ast.FloorDiv):
    +        return left // right
    +    if isinstance(op, ast.Mod):
    +        return left % right
    +    if isinstance(op, ast.Pow):
    +        return left**right
    +    raise ValueError("Unsupported arithmetic operator")
    +
    +
    </code_context>
    <issue_to_address>
    **suggestion:** The error message in `_apply_binary_operator` is quite generic and may obscure which operator caused the failure.
    
    Including the operator in the error (e.g. via `type(op).__name__` or rendering it directly) would make it clearer which expression failed, improving debuggability when an unsupported operator reaches this code.
    
    ```suggestion
        if isinstance(op, ast.Pow):
            return left**right
        raise ValueError(f"Unsupported arithmetic operator: {type(op).__name__}")
    ```
    </issue_to_address>
    
    ### Comment 2
    <location path="insideLLMs/trace/tracing.py" line_range="1868-1874" />
    <code_context>
         insideLLMs.trace.trace_config.TracePayloadNormaliser : Normalise payloads before fingerprinting.
         """
         # Normalize to list of dicts
    -    if events and isinstance(events[0], TraceEvent):
    -        event_dicts = [e.to_dict() for e in events]
    -    else:
    -        event_dicts = list(events)
    +    event_dicts: list[dict[str, Any]] = []
    +    for event in events:
    +        if isinstance(event, TraceEvent):
    +            event_dicts.append(event.to_dict())
    +            continue
    +        if isinstance(event, dict):
    +            event_dicts.append(event)
    +            continue
    +        raise TypeError(f"Unsupported trace event type: {type(event).__name__}")
    
         # Sort events by sequence to ensure deterministic ordering
    </code_context>
    <issue_to_address>
    **question (bug_risk):** Making `trace_fingerprint` raise on any non-dict/non-TraceEvent may be stricter than callers expect.
    
    The previous implementation accepted any iterable of dict-like objects and would silently pass through unexpected types. The new loop raises `TypeError` for anything that’s not a `TraceEvent` or `dict`, which can break callers that pass custom trace-like objects. If the stricter typing is intentional, please either (a) document this behavioral change clearly, or (b) handle unknown but dict-like types more leniently (e.g., coercion/ignoring) to preserve backward compatibility.
    </issue_to_address>
    
    ### Comment 3
    <location path="tests/test_models_config_coverage.py" line_range="992" />
    <code_context>
             assert config.dataset is not None
             assert config.runner is not None
             assert "example" in config.tags
    +        assert config.dataset.data is not None
             assert len(config.dataset.data) == 2
    
    </code_context>
    <issue_to_address>
    **suggestion (testing):** Missing tests for new HuggingFace dataset loading behaviour and ImportError conditions
    
    This PR changes `load_hf_dataset` to default `split` to `"train"` and to treat the HF datasets import as optional (raising `ImportError` if `load_dataset` is missing), but I don’t see tests for:
    
    1. Verifying that `"train"` is used as the default split (e.g., by mocking `datasets.load_dataset` and asserting the `split` argument).
    2. Verifying that an `ImportError` is raised when the `datasets` package exists but `load_dataset` is missing/`None`.
    
    Please add tests (e.g., in `tests/test_dataset_utils.py` or an existing dataset test module) to cover these behaviours and guard against regressions in the import/availability logic.
    </issue_to_address>


    Comment on lines +3034 to +3036
    if isinstance(op, ast.Pow):
    return left**right
    raise ValueError("Unsupported arithmetic operator")
    Copy link
    Copy Markdown

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    suggestion: The error message in _apply_binary_operator is quite generic and may obscure which operator caused the failure.

    Including the operator in the error (e.g. via type(op).__name__ or rendering it directly) would make it clearer which expression failed, improving debuggability when an unsupported operator reaches this code.

    Suggested change
    if isinstance(op, ast.Pow):
    return left**right
    raise ValueError("Unsupported arithmetic operator")
    if isinstance(op, ast.Pow):
    return left**right
    raise ValueError(f"Unsupported arithmetic operator: {type(op).__name__}")

    Comment on lines -1868 to 1874
    if events and isinstance(events[0], TraceEvent):
    event_dicts = [e.to_dict() for e in events]
    else:
    event_dicts = list(events)
    event_dicts: list[dict[str, Any]] = []
    for event in events:
    if isinstance(event, TraceEvent):
    event_dicts.append(event.to_dict())
    continue
    if isinstance(event, dict):
    event_dicts.append(event)
    continue
    raise TypeError(f"Unsupported trace event type: {type(event).__name__}")

    # Sort events by sequence to ensure deterministic ordering
    sorted_events = sorted(event_dicts, key=lambda e: e.get("seq", 0))

    question (bug_risk): Making trace_fingerprint raise on any non-dict/non-TraceEvent may be stricter than callers expect.

    The previous implementation accepted any iterable of dict-like objects and would silently pass through unexpected types. The new loop raises TypeError for anything that’s not a TraceEvent or dict, which can break callers that pass custom trace-like objects. If the stricter typing is intentional, please either (a) document this behavioral change clearly, or (b) handle unknown but dict-like types more leniently (e.g., coercion/ignoring) to preserve backward compatibility.
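    A lenient middle ground is to duck-type on a `to_dict` method and mapping support instead of exact `isinstance` checks, so custom trace-like objects still pass while truly unusable values fail loudly. A minimal sketch of that option (the `FakeEvent` stand-in is hypothetical, not the real `TraceEvent`):

    ```python
    from collections.abc import Mapping
    from typing import Any

    def coerce_trace_events(events) -> list[dict[str, Any]]:
        """Accept anything with a to_dict(), any mapping, and reject the rest."""
        event_dicts: list[dict[str, Any]] = []
        for event in events:
            to_dict = getattr(event, "to_dict", None)
            if callable(to_dict):
                event_dicts.append(to_dict())
            elif isinstance(event, Mapping):
                event_dicts.append(dict(event))
            else:
                raise TypeError(f"Unsupported trace event type: {type(event).__name__}")
        return event_dicts

    class FakeEvent:
        """Custom trace-like object that is neither a TraceEvent nor a dict."""
        def __init__(self, seq: int):
            self.seq = seq

        def to_dict(self) -> dict[str, Any]:
            return {"seq": self.seq}
    ```

    This keeps the stricter rejection of unusable inputs while preserving backward compatibility for callers that pass dict-like or event-like objects.
    
    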

    assert config.dataset is not None
    assert config.runner is not None
    assert "example" in config.tags
    assert config.dataset.data is not None

    suggestion (testing): Missing tests for new HuggingFace dataset loading behaviour and ImportError conditions

    This PR changes load_hf_dataset to default split to "train" and to treat the HF datasets import as optional (raising ImportError if load_dataset is missing), but I don’t see tests for:

    1. Verifying that "train" is used as the default split (e.g., by mocking datasets.load_dataset and asserting the split argument).
    2. Verifying that an ImportError is raised when the datasets package exists but load_dataset is missing/None.

    Please add tests (e.g., in tests/test_dataset_utils.py or an existing dataset test module) to cover these behaviours and guard against regressions in the import/availability logic.
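    The import/availability logic under test follows an importlib-based optional-loading pattern: resolve the symbol at import time, derive availability from the resolved symbol, and raise `ImportError` at call time when it is missing. A self-contained sketch of that pattern, with hypothetical names (`resolve_optional`, `load_hf_rows`) rather than the real insideLLMs code; the `split="train"` default mirrors the PR's contested new signature:

    ```python
    import importlib
    from typing import Any, Callable, Optional

    def resolve_optional(module_name: str, symbol: str) -> Optional[Callable[..., Any]]:
        """Return `symbol` from `module_name`, or None if either is unavailable."""
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            return None
        return getattr(module, symbol, None)

    # Availability is derived from the resolved symbol, not the bare import.
    load_dataset = resolve_optional("datasets", "load_dataset")
    HF_AVAILABLE = load_dataset is not None

    def load_hf_rows(name: str, split: str = "train", loader=load_dataset) -> list[dict[str, Any]]:
        if loader is None:
            raise ImportError("The 'datasets' package (with load_dataset) is required")
        return [dict(row) for row in loader(name, split=split)]
    ```

    Tests can then pass `loader=None` (or monkeypatch the module attribute) to exercise the `ImportError` path, and pass a stub loader to assert which `split` value is forwarded by default.
    
    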

    @dosubot

    dosubot Bot commented Mar 4, 2026

    Related Documentation

    1 document(s) may need updating based on files changed in this PR:

    insideLLMs

    Agent Probe and Trace-Aware Testing
    @@ -13,7 +13,7 @@
     5. Validates the trace against contracts defined in the `TraceConfig` using the `validate_with_config()` helper.
     6. Packages the prompt, final response, tool calls, trace events, fingerprint, violations, and metadata into an `AgentProbeResult`.
     
    -Each tool available to the agent is described by a `ToolDefinition` object, specifying its name, description, parameter schema, and optional handler. The trace configuration and validation utilities are available as public APIs for advanced customization and integration.
    +Each tool available to the agent is described by a `ToolDefinition` object, specifying its name, description, parameter schema, and optional handler. For tools that perform arithmetic evaluation, the `_safe_eval_arithmetic()` function provides a secure alternative to `eval()`, restricting execution to arithmetic operations only. The trace configuration and validation utilities are available as public APIs for advanced customization and integration.
     
     ## Trace-Aware Results and Tracing Integration
     The probe system is tightly integrated with the tracing subsystem. All agent actions, tool calls, and relevant events are recorded by the `TraceRecorder`. The trace is then validated against contracts (e.g., required tool order, forbidden actions) using the `validate_with_config()` function, which respects the toggles and schemas defined in the `TraceConfig`. Payloads can be normalized or redacted using the `TracePayloadNormaliser`, which supports both built-in and custom normalization strategies for deterministic trace fingerprints and privacy compliance.



    Copilot AI left a comment


    Pull request overview

    Hardening pass across insideLLMs’ runtime, model wrappers, integrations, and docs to improve typing clarity, optional-dependency robustness, and deterministic behavior around tracking / retries.

    Changes:

    • Tighten experiment tracking behavior (non-empty MultiTracker; deterministic run IDs) and optional dependency loading for trackers/datasets.
    • Improve model/runtime typing and failure-path handling (retry exhaustion guards, trace fingerprint input validation, minor API typing cleanups).
    • Update tests and docs to align with runtime outputs/typing expectations.

    Reviewed changes

    Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.

    Show a summary per file
    File Description
    wiki/tutorials/Bias-Testing.md Align tutorial output directory with analysis paths.
    tests/test_probes_models_coverage.py Tighten typing on probe helper methods used for coverage.
    tests/test_models_config_coverage.py Strengthen typing and assertions for model metadata paths.
    insideLLMs/types.py Add module-wide __all__ export list for static analysis/export consistency.
    insideLLMs/trace/tracing.py Improve typing and input validation for trace payloads/fingerprinting.
    insideLLMs/structured_extraction.py Minor exception-handling style/typing adjustments.
    insideLLMs/shadow.py Minor exception-handling style adjustments for request-body decoding.
    insideLLMs/semantic_cache.py Minor exception-handling style adjustments around Redis operations.
    insideLLMs/schemas/registry.py Minor exception-handling style adjustment in semver parsing.
    insideLLMs/safety.py Add module-wide __all__ export list.
    insideLLMs/runtime/reproducibility.py Typing adjustments (e.g., list_all return value typing) and optional import annotation.
    insideLLMs/runtime/_result_utils.py Minor exception-handling style adjustments around model info coercion.
    insideLLMs/runtime/_high_level.py Minor exception-handling style adjustments in status/category coercion.
    insideLLMs/runtime/_config_loader.py Minor typing refactor in pipeline construction and path snapshotting.
    insideLLMs/runtime/_base.py Minor exception-handling style adjustment for callback attribute tagging.
    insideLLMs/runtime/init.py Add module-wide __all__ export list for runtime package.
    insideLLMs/retry.py Harden retry exhaustion behavior and add module-wide __all__.
    insideLLMs/resources.py Minor exception-handling style adjustments in file/fsync helpers.
    insideLLMs/registry.py Add module-wide __all__ export list.
    insideLLMs/rate_limiting.py Minor exception-handling changes and add module-wide __all__.
    insideLLMs/probes/agent_probe.py Update docs/examples to avoid unsafe eval in tool handler samples.
    insideLLMs/models/openai.py Tighten typing (**kwargs: Any, info() -> ModelInfo) and error-path tweaks.
    insideLLMs/models/base.py Harden retry exhaustion raising logic and typing for last_error.
    insideLLMs/integrations/langchain.py Minor exception-handling style adjustments in optional imports/stream path.
    insideLLMs/experiment_tracking.py Optional dependency loading via importlib; run-id hardening; MultiTracker empty-list guard; minor typing cleanups; add __all__.
    insideLLMs/dataset_utils.py Optional HF datasets import via importlib; adjust availability logic; add __all__.
    insideLLMs/cost_tracking.py Add module-wide __all__ export list.
    insideLLMs/contrib/synthesis.py Minor exception-handling style adjustments in generation helpers.
    insideLLMs/contrib/routing.py Harden classification fallbacks and minor exception-handling style changes.
    insideLLMs/contrib/hitl.py Minor exception-handling style adjustment for callback emission.
    insideLLMs/contrib/evalbom.py Minor exception-handling style adjustments around manifest parsing.
    insideLLMs/contrib/ensemble.py Minor exception-handling style adjustments when skipping failed models.
    insideLLMs/contrib/debugging.py Adjust return typing to Optional for trace-event helpers and minor typing fixes.
    insideLLMs/contrib/chains.py Minor exception-handling style adjustment and typing for route steps dict.
    insideLLMs/contrib/agents.py Replace eval/exec usage with restricted arithmetic parsing/execution utilities.
    insideLLMs/contrib/adapters.py Fail-fast if clients are uninitialized; minor exception-handling style adjustments.
    insideLLMs/config.py Minor typing tweak in Union order; add module-wide __all__.
    insideLLMs/cli/commands/doctor.py Minor exception-handling style adjustment for entrypoint plugin discovery.
    insideLLMs/cli/_record_utils.py Minor exception-handling style adjustment in enum coercion.
    insideLLMs/async_utils.py Minor exception-handling style adjustment in task-cancellation path.
    Comments suppressed due to low confidence (1)

    insideLLMs/dataset_utils.py:243

    • The load_hf_dataset signature now defaults split to "train", but the docstring (and earlier examples in this same module) still says the default is "test". Either revert the default to "test" (to avoid a silent behavior change for callers who omit split) or update the docs/examples to match the new default.
    def load_hf_dataset(dataset_name: str, split: str = "train", **kwargs) -> list[dict[str, Any]]:
        """Load a dataset from the HuggingFace Datasets Hub into a list of dictionaries.
    
        Downloads and loads a dataset from the HuggingFace Hub, converting it to the
        same list-of-dictionaries format used by the other loaders in this module.
        This provides access to thousands of publicly available datasets for NLP,
        computer vision, and other machine learning tasks.
    
        Args:
            dataset_name: The name of the dataset on HuggingFace Hub. Can be in
                the format "dataset_name" for official datasets or "user/dataset"
                for community datasets.
            split: The dataset split to load. Common values are "train", "test",
                "validation". Defaults to "test".
            **kwargs: Additional keyword arguments passed to ``datasets.load_dataset``.
    


    Comment on lines 452 to 455
     except Exception as e:
         # Get first message content for error context
    -    first_msg = messages[0].get("content", "") if messages else ""
    +    first_msg = messages[0]["content"] if messages else ""
         raise ModelGenerationError(

    Copilot AI Mar 4, 2026


    In the generic exception handler, messages[0]["content"] can raise KeyError if a caller passes a malformed ChatMessage (missing content), which would mask the original OpenAI error and bypass ModelGenerationError. Consider using a safe access pattern (e.g., get) or guarding against missing keys here so the error wrapper is reliable.
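    A guarded access pattern along these lines keeps the error wrapper reliable even for malformed messages. The helper name is hypothetical, shown only to illustrate the safe-access idea:

    ```python
    from typing import Any

    def first_message_content(messages: list[dict[str, Any]]) -> str:
        """Safely pull the first message's content for error context."""
        if not messages:
            return ""
        first = messages[0]
        # Tolerate dicts missing "content" and non-dict message objects alike.
        if isinstance(first, dict):
            content = first.get("content", "")
        else:
            content = getattr(first, "content", "")
        return "" if content is None else str(content)
    ```

    With this, a missing `content` key degrades to an empty context string instead of a `KeyError` that masks the original OpenAI error.
    
    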

    Copilot uses AI. Check for mistakes.
     if self.circuit_breaker:
         self.circuit_breaker.record_failure()
    -raise
    +raise e

    Copilot AI Mar 4, 2026


    raise e in an except block inserts an extra traceback entry at the re-raise site (and in Python 2 discarded the original traceback entirely), which makes debugging harder. Use a bare raise here to re-raise with the traceback untouched.

    Suggested change
    -raise e
    +raise

    Comment on lines +229 to 232
    def load_hf_dataset(dataset_name: str, split: str = "train", **kwargs) -> list[dict[str, Any]]:
        """Load a dataset from the HuggingFace Datasets Hub into a list of dictionaries.

        Downloads and loads a dataset from the HuggingFace Hub, converting it to the

    Copilot AI Mar 4, 2026


    The docstring section below still states the function may return None when the dataset is empty, but the current signature/implementation always returns a list (or raises). Please update the docstring (and any callers that check for None) to match the actual contract.

    Comment on lines +3034 to +3036
    if isinstance(op, ast.Pow):
        return left**right
    raise ValueError("Unsupported arithmetic operator")

    Copilot AI Mar 4, 2026


    Even with eval/exec removed, allowing exponentiation (**) can still be used for computational/allocational DoS (e.g., 2**100000000) when allow_exec=True. Consider adding guards (max exponent / max AST depth / max operations) or disabling Pow in the restricted subset to keep the "sandbox" safer under adversarial input.
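    One shape such guards can take is a hard cap on exponent magnitude plus a node-count ceiling before evaluation. This is a minimal self-contained sketch under assumed limits (`MAX_EXPONENT`, `MAX_NODES`, and the `eval_arith` name are all illustrative, not the insideLLMs implementation):

    ```python
    import ast

    MAX_EXPONENT = 128  # guard values are illustrative
    MAX_NODES = 100

    def eval_arith(expr: str):
        """Evaluate a restricted arithmetic expression with DoS guards."""
        tree = ast.parse(expr, mode="eval")
        if sum(1 for _ in ast.walk(tree)) > MAX_NODES:
            raise ValueError("expression too large")

        def _eval(node):
            if isinstance(node, ast.Expression):
                return _eval(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp):
                left, right = _eval(node.left), _eval(node.right)
                if isinstance(node.op, ast.Add):
                    return left + right
                if isinstance(node.op, ast.Mult):
                    return left * right
                if isinstance(node.op, ast.Pow):
                    # Reject huge exponents before computing them.
                    if abs(right) > MAX_EXPONENT:
                        raise ValueError("exponent exceeds sandbox limit")
                    return left ** right
                raise ValueError(f"Unsupported arithmetic operator: {type(node.op).__name__}")
            raise ValueError("unsupported expression")

        return _eval(tree)
    ```

    An adversarial `2**100000000` is then rejected before any large computation starts; a depth cap on nested `BinOp` nodes could be layered on in the same way.
    
    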


    @devin-ai-integration devin-ai-integration Bot left a comment


    Devin Review found 2 potential issues.

    View 8 additional findings in Devin Review.


    -def load_hf_dataset(
    -    dataset_name: str, split: str = "test", **kwargs
    -) -> Optional[list[dict[str, Any]]]:
    +def load_hf_dataset(dataset_name: str, split: str = "train", **kwargs) -> list[dict[str, Any]]:


    🔴 Default split for load_hf_dataset silently changed from 'test' to 'train'

    The default value of the split parameter in load_hf_dataset was changed from "test" to "train", but the docstring on line 242 still says "Defaults to 'test'", and API_REFERENCE.md:1601 also documents the old default. Any caller relying on the default (i.e., calling load_hf_dataset("some_dataset") without explicitly passing split) will now silently load the training split instead of the test split.

    Root cause and impact

    The function signature at insideLLMs/dataset_utils.py:229 changed:

    # Old:
    def load_hf_dataset(dataset_name: str, split: str = "test", **kwargs) -> ...:
    # New:
    def load_hf_dataset(dataset_name: str, split: str = "train", **kwargs) -> ...:

    This is a silent behavioral change. The _config_loader.py:591 caller is unaffected because it uses config.get("split", "test"), but any direct caller of load_hf_dataset that omits the split argument will now receive training data instead of test data. The docstring (line 242: "Defaults to 'test'") and API_REFERENCE.md are now inconsistent with the actual behavior.

    Impact: Users running evaluations without specifying split will unknowingly evaluate against training data instead of test data, potentially producing misleading benchmark results.

    Suggested change
    -def load_hf_dataset(dataset_name: str, split: str = "train", **kwargs) -> list[dict[str, Any]]:
    +def load_hf_dataset(dataset_name: str, split: str = "test", **kwargs) -> list[dict[str, Any]]:

    Comment on lines +2899 to +2902
     except Exception as e:
         if self.circuit_breaker:
             self.circuit_breaker.record_failure()
    -    raise
    +    raise e


    🟡 raise e instead of raise alters traceback and is inconsistent with sync version

    In execute_async, the exception re-raise was changed from raise to raise e, which alters the traceback information and is inconsistent with the synchronous execute method.

    Detailed explanation

    At insideLLMs/rate_limiting.py:2899-2902, the code was changed:

    except Exception as e:
        if self.circuit_breaker:
            self.circuit_breaker.record_failure()
        raise e  # was: raise

    In Python 3, a bare raise re-raises the caught exception with its traceback untouched, while raise e re-raises the same exception object but inserts an additional traceback entry at the raise e line. The original frames inside func() are still attached, but the redundant entry adds noise and can mislead readers scanning the traceback.

    The synchronous counterpart at insideLLMs/rate_limiting.py:2827-2830 correctly uses bare raise:

    except Exception as _:
        if self.circuit_breaker:
            self.circuit_breaker.record_failure()
        raise

    Impact: When debugging failures in async code paths, the traceback gains a redundant entry at the raise e line, drawing attention away from the original failure location inside func() and making debugging harder.

    Suggested change
    -except Exception as e:
    -    if self.circuit_breaker:
    -        self.circuit_breaker.record_failure()
    -    raise e
    +except Exception as _:
    +    if self.circuit_breaker:
    +        self.circuit_breaker.record_failure()
    +    raise
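    The difference is directly observable. A small demonstration, assuming Python 3 semantics: both forms keep the frames leading to the original failure, but raise e adds an extra traceback entry at the re-raise line, which is the noise the review flags (function names here are illustrative):

    ```python
    import traceback

    def boom():
        raise ValueError("original failure")

    def reraise_bare():
        try:
            boom()
        except Exception:
            raise  # traceback left untouched

    def reraise_named():
        try:
            boom()
        except Exception as e:
            raise e  # adds a frame pointing at this line

    def captured(fn) -> str:
        """Run fn and return its formatted traceback."""
        try:
            fn()
        except Exception:
            return traceback.format_exc()
        return ""
    ```

    Comparing `captured(reraise_bare)` with `captured(reraise_named)` shows the origin inside `boom()` in both, but only the named variant's traceback includes the `raise e` source line as an extra entry.
    
    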


    @coderabbitai coderabbitai Bot left a comment


    Actionable comments posted: 7

    Caution

    Some comments are outside the diff and can’t be posted inline due to platform limitations.

    ⚠️ Outside diff range comments (4)
    insideLLMs/safety.py (1)

    735-735: ⚠️ Potential issue | 🟠 Major

    Add type parameters to Pattern annotations to fix mypy strict checking.

    Lines 735 and 1110 use bare Pattern, which violates the strict mypy configuration for insideLLMs.safety (configured with disallow_untyped_defs = true and type-arg errors enabled).

    Fix
    -    def __init__(self, patterns: Optional[dict[str, Pattern]] = None):
    +    def __init__(self, patterns: Optional[dict[str, Pattern[str]]] = None):
    
    -        self._custom_patterns: list[tuple[str, Pattern, RiskLevel]] = []
    +        self._custom_patterns: list[tuple[str, Pattern[str], RiskLevel]] = []

    Required per the coding guidelines for insideLLMs/**/*.py: Typecheck Python code using Mypy on the insideLLMs module.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/safety.py` at line 735, The bare Pattern type used in annotations
    must be parameterized; update the annotations to use Pattern[str] (e.g., change
    def __init__(self, patterns: Optional[dict[str, Pattern]] = None) to def
    __init__(self, patterns: Optional[dict[str, Pattern[str]]] = None)) and do the
    same for the other occurrence around line 1110 (replace Pattern with
    Pattern[str]); also ensure the Pattern symbol you import (from typing or re)
    supports the type parameter and adjust the import if necessary.
    
    insideLLMs/experiment_tracking.py (1)

    2504-2512: ⚠️ Potential issue | 🟠 Major

    Add rollback on partial startup failure in MultiTracker.start_run.

    If one tracker fails at Line 2506, previously started trackers stay active. This can leak runs and leave inconsistent state.

    💡 Suggested fix
         def start_run(
             self,
             run_name: Optional[str] = None,
             run_id: Optional[str] = None,
             nested: bool = False,
         ) -> str:
    @@
    -        run_ids = []
    -        for tracker in self.trackers:
    -            rid = tracker.start_run(run_name=run_name, run_id=run_id, nested=nested)
    -            run_ids.append(rid)
    +        run_ids: list[str] = []
    +        started: list[ExperimentTracker] = []
    +        try:
    +            for tracker in self.trackers:
    +                rid = tracker.start_run(run_name=run_name, run_id=run_id, nested=nested)
    +                started.append(tracker)
    +                run_ids.append(rid)
    +        except Exception:
    +            for started_tracker in reversed(started):
    +                try:
    +                    started_tracker.end_run(status="failed")
    +                except Exception:
    +                    pass
    +            raise
     
             first_run_id = run_ids[0]
             self._run_active = True
             self._run_id = first_run_id
             return first_run_id
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/experiment_tracking.py` around lines 2504 - 2512,
    MultiTracker.start_run currently loops through self.trackers calling
    tracker.start_run and if one raises, previously started trackers remain active;
    modify start_run so that if any tracker.start_run raises an exception you
    iterate over the trackers that already started (use the collected run_ids or
    tracker slice) and call the appropriate cleanup method (e.g., tracker.stop_run
    or tracker.end_run — whichever the tracker API provides) for each started
    tracker, catching and logging any cleanup exceptions but not suppressing the
    original error, then re-raise the original exception; finally, only set
    self._run_active and self._run_id after all trackers have successfully started
    and return the first_run_id as before.
    
    insideLLMs/dataset_utils.py (1)

    229-255: ⚠️ Potential issue | 🟠 Major

    load_hf_dataset default split change is a silent API break and conflicts with docs.

    Line 229 defaults to "train", but Line 242 still states default "test". This will change behavior for callers that omit split.

    💡 Suggested fix (if backward compatibility is intended)
    -def load_hf_dataset(dataset_name: str, split: str = "train", **kwargs) -> list[dict[str, Any]]:
    +def load_hf_dataset(dataset_name: str, split: str = "test", **kwargs) -> list[dict[str, Any]]:

    Also align the Returns section: it currently says “Returns None only if the dataset is empty,” but the function returns list[dict[str, Any]] and already returns [] for empty datasets.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/dataset_utils.py` around lines 229 - 255, The docstring and
    function signature for load_hf_dataset disagree about the default split
    (signature sets split="train" while the Args text says default "test") and the
    Returns section incorrectly claims the function can return None; update them to
    be consistent: decide which default split you want to keep (if preserving
    backward compatibility, change the signature to split="test"; otherwise update
    the Args text to state default "train"), and change the Returns text to state
    that an empty dataset yields an empty list ([]) rather than None; ensure the
    function definition name load_hf_dataset and its docstring are updated together
    so callers and docs match.
    
    insideLLMs/contrib/debugging.py (1)

    1681-2034: ⚠️ Potential issue | 🟡 Minor

    Update method return docs to match new Optional[DebugTraceEvent] signatures.

    These methods now correctly return Optional[...], but their Returns sections still describe non-optional DebugTraceEvent, which creates a stale API contract.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/contrib/debugging.py` around lines 1681 - 2034, Update the Returns
    docstrings to reflect the Optional[DebugTraceEvent] return type for the changed
    APIs: step_start, step_end, log_prompt, log_response, log_variable, log_error,
    and log_warning. For each of these methods, change the "Returns" section text to
    state that the method returns an Optional[DebugTraceEvent] (i.e., a
    DebugTraceEvent or None) and preserve any existing descriptive details (e.g.,
    truncation behavior) so callers understand when a None might be returned; keep
    the examples and See Also as-is. Ensure the method signatures already showing
    Optional[...] remain consistent with the updated Returns description.
    
    🧹 Nitpick comments (8)
    insideLLMs/schemas/registry.py (1)

    470-473: Narrow the caught exception types in version parsing.

    At Line 472, except Exception is broader than needed here and can hide unrelated defects.
    Catch only the parse-related failures.

    Proposed patch
    -    except Exception as _:
    +    except (ValueError, IndexError):
             return (0, 0, 0)
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/schemas/registry.py` around lines 470 - 473, The version-parsing
    try/except in insideLLMs/schemas/registry.py currently catches Exception too
    broadly; change the broad except to only catch parse-related errors (e.g.,
    except (ValueError, IndexError, TypeError):) around the block that returns
    (int(parts[0]), int(parts[1]), int(parts[2])) so only invalid integer
    conversions, missing parts, or wrong types are handled and unrelated exceptions
    still surface.
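    The narrowed handler might look like the following sketch (a stand-in for the registry's parser, not its actual code): only parse-shaped failures fall back to the zero version, while anything unexpected still propagates.

    ```python
    def parse_semver(version: str) -> tuple[int, int, int]:
        """Parse 'MAJOR.MINOR.PATCH', falling back to (0, 0, 0) on malformed input."""
        try:
            parts = version.split(".")
            return (int(parts[0]), int(parts[1]), int(parts[2]))
        except (ValueError, IndexError):
            # Only parse-shaped failures are swallowed; anything else surfaces.
            return (0, 0, 0)
    ```
    
    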
    
    insideLLMs/semantic_cache.py (1)

    1036-1038: Consider adding debug-level logging in Redis exception fallbacks.

    These handlers intentionally degrade gracefully, but fully swallowing the exception makes production diagnosis difficult when Redis connectivity/serialization issues occur.

    Also applies to: 1076-1077, 1085-1086, 1095-1096, 1104-1106, 1118-1119

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/semantic_cache.py` around lines 1036 - 1038, In each Redis
    exception fallback (the except Exception as _ blocks that increment
    self._stats.misses and return None at/around the shown sites and the other
    ranges 1076-1077, 1085-1086, 1095-1096, 1104-1106, 1118-1119), add a debug-level
    log that records the caught exception and relevant context (operation name,
    key/id, and any serialization details available) instead of fully swallowing it;
    use the existing logger on the object (e.g., self._logger or module logger) and
    include the exception info/traceback (exception instance) so the message gives
    actionable diagnostics while preserving the current graceful fallback behavior.
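    The graceful-fallback-plus-debug-log shape could look like this minimal sketch (the function and `CacheStats` stand-in are hypothetical, not the semantic_cache internals):

    ```python
    import logging

    logger = logging.getLogger("insideLLMs.semantic_cache")

    class CacheStats:
        def __init__(self):
            self.misses = 0

    def redis_get_with_fallback(client, key, stats: CacheStats):
        """Degrade gracefully on Redis errors, but leave a debug trail."""
        try:
            return client.get(key)
        except Exception as exc:
            # exc_info=True captures the traceback for production diagnosis.
            logger.debug("redis GET failed for key %r: %s", key, exc, exc_info=True)
            stats.misses += 1
            return None

    class FlakyClient:
        """Stand-in for a Redis client whose connection is down."""
        def get(self, key):
            raise ConnectionError("redis down")
    ```

    Callers still see the same miss-and-None behavior, but operators can flip the logger to DEBUG to diagnose connectivity or serialization issues.
    
    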
    
    insideLLMs/contrib/chains.py (1)

    2563-2565: Preserve error-handler failure context before breaking.

    If the custom error_handler itself fails, the exception is dropped. Capture and record it so failures remain debuggable.

    Suggested adjustment
    -                    except Exception as _:
    +                    except Exception as handler_exc:
    +                        errors.append(f"{step.name}: error_handler failed: {handler_exc}")
                             state.status = ChainStatus.FAILED
                             break
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/contrib/chains.py` around lines 2563 - 2565, The except block
    currently swallows any Exception as "_" then sets state.status =
    ChainStatus.FAILED and breaks, losing the original failure context; change it to
    capture the exception (e.g., "except Exception as err:"), record the error into
    the chain state (for example set state.error or state.error_message to err or
    str(err)) and/or log it, then set state.status = ChainStatus.FAILED and break so
    the original exception context is preserved for debugging; reference the
    error_handler invocation and the state/ChainStatus.FAILED update when making
    this change.
    
    insideLLMs/retry.py (2)

    2327-2327: Prefer explicit module exports over globals-derived __all__.

    Line 2327 exports every non-underscore binding in module globals, including incidental imports. An explicit __all__ keeps the public API stable and avoids accidental surface growth.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/retry.py` at line 2327, The module currently sets __all__
    dynamically via a globals() comprehension which exports every non-underscore
    name (risking accidental exports); replace that line with an explicit __all__
    list containing only the intended public API symbols (e.g., the public
    functions, classes, and constants defined in this module such as RetryPolicy,
    retry, backoff, etc.—use the actual names present in insideLLMs/retry.py),
    remove the globals()-based expression, and update this explicit list whenever
    you add or remove public symbols so the module export surface remains stable.
    

    1118-1119: Fail fast on invariant break instead of synthesizing a retry cause.

    On Line 1118 and Line 1266, if last_exception is still None, that indicates an internal invariant break. Converting it into a synthetic RuntimeError and then raising RetryExhaustedError can blur the real failure mode. Raise directly at the invariant boundary.

    Suggested patch
    -    if last_exception is None:
    -        last_exception = RuntimeError("Retry exhausted without captured exception")
    +    if last_exception is None:
    +        raise RuntimeError("Retry exhausted without captured exception")
    @@
    -    if last_exception is None:
    -        last_exception = RuntimeError("Retry exhausted without captured exception")
    +    if last_exception is None:
    +        raise RuntimeError("Retry exhausted without captured exception")

    Also applies to: 1266-1267

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/retry.py` around lines 1118 - 1119, The code currently synthesizes
    a fake RuntimeError when last_exception is None before raising
    RetryExhaustedError; instead, treat last_exception == None as an invariant
    violation and raise immediately (e.g., raise AssertionError or a clear
    RuntimeError with an "internal invariant broken: last_exception is None"
    message) in the same locations where the synthetic exception is created (the
    blocks that check last_exception prior to raising RetryExhaustedError). Remove
    the synthetic-wrapping logic so the function fails fast on the invariant breach
    rather than embedding a fabricated cause.
    
    insideLLMs/cost_tracking.py (1)

    1654-1654: Prefer explicit __all__ to avoid accidental public API expansion.

    Building __all__ from globals() exports imported helpers and future globals unintentionally. An explicit allowlist keeps the module boundary stable.

    ♻️ Suggested change
    -__all__ = [name for name in globals() if not name.startswith("_")]  # pyright: ignore[reportUnsupportedDunderAll]
    +__all__ = [
    +    "PricingTier",
    +    "CostCategory",
    +    "AlertLevel",
    +    "TimeGranularity",
    +    "ModelPricing",
    +    "UsageRecord",
    +    "Budget",
    +    "BudgetAlert",
    +    "CostSummary",
    +    "CostForecast",
    +    "PricingRegistry",
    +    "TokenCostCalculator",
    +    "UsageTracker",
    +    "BudgetManager",
    +    "CostForecaster",
    +    "CostReporter",
    +    "calculate_cost",
    +    "estimate_request_cost",
    +    "compare_model_costs",
    +    "get_cheapest_model",
    +    "create_usage_tracker",
    +    "create_budget_manager",
    +    "quick_cost_estimate",
    +    "CostCalculator",
    +]
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/cost_tracking.py` at line 1654, The module currently builds
    __all__ dynamically using globals(), which accidentally exports imports and
    future names; replace that line with an explicit allowlist by listing only the
    intended public names (functions, classes, constants) exported by
    insideLLMs.cost_tracking (e.g., __all__ = ["PublicClassA", "public_function_b",
    ...]) and remove the globals()-based expression (and the pyright ignore) so the
    public API is stable and only the declared symbols are exported; update the list
    whenever you intentionally add or remove public symbols.
    
    insideLLMs/contrib/routing.py (1)

    2303-2305: Add a direct regression test for model=None in _classify_with_model.

    The new early return is valuable, but there isn’t an explicit assertion for this direct-call path in the referenced branch-coverage tests. Add one test to lock behavior to ("unknown", 0.0).

    As per coding guidelines, ensure test coverage is at least 95% for the insideLLMs module (coverage gating: --cov-fail-under=95).

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/contrib/routing.py` around lines 2303 - 2305, Add a direct
    regression test that calls the _classify_with_model function with self.model set
    to None and asserts it returns the tuple ("unknown", 0.0); specifically,
    instantiate or mock the object that defines _classify_with_model (the routing
    class in insideLLMs.contrib.routing), set its model attribute to None, call
    _classify_with_model and assert the return equals ("unknown", 0.0); place the
    test in the existing test suite for insideLLMs (new test name e.g.
    test_classify_with_model_none) so module coverage stays >=95% to satisfy the
    coverage gating.
    
    insideLLMs/rate_limiting.py (1)

    3429-3429: Prefer explicit __all__ over dynamic globals export for API stability.

    The current dynamic __all__ unintentionally exports 46 names, including imported modules (asyncio, datetime, random, threading, time, inspect), type imports (Any, Optional, Callable, Enum, TypeVar), and utilities (deque, dataclass, field, wraps). These are implementation details that shouldn't be part of the public API. An explicit __all__ listing only intended exports (e.g., classes, decorators, handlers) will clarify the stable API surface and prevent drift from unrelated changes.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/rate_limiting.py` at line 3429, The module currently builds
    __all__ dynamically (the comprehension at the end) which exports many internal
    imports; replace this with an explicit __all__ list containing only the intended
    public API names (e.g., the module's public classes, decorators and handler
    functions) by auditing the module for the real public symbols (RateLimiter-like
    classes, any decorators such as rate_limit, exceptions like RateLimitExceeded,
    public factory/getter functions) and listing them explicitly in __all__ instead
    of using the globals() comprehension so implementation details (asyncio,
    datetime, type imports, utilities) are not exported.
    
    🤖 Prompt for all review comments with AI agents
    Verify each finding against the current code and only fix it if needed.
    
    Inline comments:
    In `@insideLLMs/contrib/agents.py`:
    - Around line 3322-3324: The public contract of create_python_tool no longer
    matches its implementation: when called with allow_exec=True it only runs
    arithmetic assignments/expressions via _safe_exec_python_subset but the
    function/docstring still says it will “execute Python code”; either restore
    broader execution behavior or update the public contract/docs and name to
    reflect the restricted arithmetic-only execution. Modify the create_python_tool
    docstring and any user-facing description to state “execute arithmetic-only
    Python snippets” (or similar) and ensure symbols mentioned (create_python_tool,
    allow_exec, and _safe_exec_python_subset) are updated so callers and tests are
    aware of the narrowed behavior.
    - Around line 3019-3081: The evaluator (_safe_exec_python_subset,
    _eval_exec_expression and _apply_binary_operator) currently allows expressions
    that can cause computational DoS (huge literals or exponents); add explicit
    resource bounds: enforce a max number of statements in parsed.body in
    _safe_exec_python_subset, reject ast.Constant values whose absolute value or
    bit-length exceed a MAX_NUM_BITS, and reject ast.BinOp with ast.Pow where the
    exponent or base exceed safe thresholds (e.g., max exponent magnitude or
    bit-length) before evaluation; also enforce a maximum AST recursion/depth when
    walking nodes in _eval_exec_expression and return clear ValueError messages when
    limits are hit. Ensure checks run prior to performing any arithmetic and
    reference the existing functions (_safe_exec_python_subset,
    _eval_exec_expression, _apply_binary_operator) so the evaluator refuses
    oversized constants/exponents and overly deep/long programs.
    
    In `@insideLLMs/contrib/synthesis.py`:
    - Around line 2546-2547: The except clause binds an unused variable `e` even
    though the exception context is captured via logger.debug(..., exc_info=True);
    remove the unnecessary exception binding by changing the except clause that
    currently reads "except Exception as e:" to just "except Exception:" in the file
    and ensure the logger.debug call (line shown) remains unchanged; look for the
    except block containing logger.debug("Balance augmentation generation failed",
    exc_info=True) to apply this edit.
    
    In `@insideLLMs/dataset_utils.py`:
    - Around line 69-73: The try/except around importlib.import_module("datasets")
    currently catches ImportError too broadly; change it to catch
    ModuleNotFoundError so missing optional package results in _datasets_module =
    None, but real import errors inside the package still propagate. Specifically,
    update the try/except that assigns _datasets_module (the
    importlib.import_module("datasets") call) to use "except ModuleNotFoundError:
    _datasets_module = None" (or add a separate except Exception: raise to re-raise
    non-missing-package errors) so only the absent-package case is swallowed.
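A sketch of the narrower catch (the helper name is illustrative, not the real `dataset_utils.py` code): `ModuleNotFoundError` covers the absent-package case, while other `ImportError`s raised inside the package still propagate.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_optional(name: str) -> Optional[ModuleType]:
    """Return the module if installed, None if absent; other import errors propagate."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None
```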
    
    In `@insideLLMs/experiment_tracking.py`:
    - Around line 119-124: The helper _load_optional_module currently catches
    ImportError broadly and can mask transitive import failures; change the
    exception handling to first catch ModuleNotFoundError as e and return None only
    when e.name == module_name (re-raise otherwise), and then optionally keep a
    general except ImportError: return None for other ImportError cases; reference
    the function name _load_optional_module and the variable module_name to locate
    where to adjust the try/except logic.
    
    In `@insideLLMs/models/openai.py`:
    - Line 454: When extracting the first message inside the chat exception handler,
    avoid indexing into messages directly since messages[0]["content"] can raise and
    mask the original exception; update the extraction of first_msg (the code
    referencing messages and first_msg) to safely access content (e.g., check
    messages truthiness and type of messages[0], use .get("content", "") or a guard)
    or wrap only that access in its own try/except and preserve/re-raise/log the
    original exception separately so the root error isn't hidden.
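One way to make that extraction non-raising (the helper name is hypothetical; the real handler in `models/openai.py` inlines this):

```python
def first_message_content(messages) -> str:
    """Safely pull the first message's content so an error handler cannot itself raise."""
    if messages and isinstance(messages[0], dict):
        return messages[0].get("content", "") or ""
    return ""
```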
    
    In `@insideLLMs/rate_limiting.py`:
    - Around line 2899-2902: The except block currently re-raises the caught
    exception with "raise e", which loses the original traceback; change the
    re-raise to a bare "raise" so the original traceback is preserved (update the
    block that calls self.circuit_breaker.record_failure() and replace "raise e"
    with "raise").
    
    ---
    
    Outside diff comments:
    In `@insideLLMs/contrib/debugging.py`:
    - Around line 1681-2034: Update the Returns docstrings to reflect the
    Optional[DebugTraceEvent] return type for the changed APIs: step_start,
    step_end, log_prompt, log_response, log_variable, log_error, and log_warning.
    For each of these methods, change the "Returns" section text to state that the
    method returns an Optional[DebugTraceEvent] (i.e., a DebugTraceEvent or None)
    and preserve any existing descriptive details (e.g., truncation behavior) so
    callers understand when a None might be returned; keep the examples and See Also
    as-is. Ensure the method signatures already showing Optional[...] remain
    consistent with the updated Returns description.
    
    In `@insideLLMs/dataset_utils.py`:
    - Around line 229-255: The docstring and function signature for load_hf_dataset
    disagree about the default split (signature sets split="train" while the Args
    text says default "test") and the Returns section incorrectly claims the
    function can return None; update them to be consistent: decide which default
    split you want to keep (if preserving backward compatibility, change the
    signature to split="test"; otherwise update the Args text to state default
    "train"), and change the Returns text to state that an empty dataset yields an
    empty list ([]) rather than None; ensure the function definition name
    load_hf_dataset and its docstring are updated together so callers and docs
    match.
    
    In `@insideLLMs/experiment_tracking.py`:
    - Around line 2504-2512: MultiTracker.start_run currently loops through
    self.trackers calling tracker.start_run and if one raises, previously started
    trackers remain active; modify start_run so that if any tracker.start_run raises
    an exception you iterate over the trackers that already started (use the
    collected run_ids or tracker slice) and call the appropriate cleanup method
    (e.g., tracker.stop_run or tracker.end_run — whichever the tracker API provides)
    for each started tracker, catching and logging any cleanup exceptions but not
    suppressing the original error, then re-raise the original exception; finally,
    only set self._run_active and self._run_id after all trackers have successfully
    started and return the first_run_id as before.
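A hedged sketch of the rollback shape described above. The class, logger, and tracker method names (`start_run`/`end_run`) follow the finding's assumptions, not the real `experiment_tracking.py` code; per the PR, the tracker list is already enforced non-empty.

```python
import logging

logger = logging.getLogger(__name__)


class MultiTrackerSketch:
    """Illustrative only; the real MultiTracker lives in insideLLMs/experiment_tracking.py."""

    def __init__(self, trackers):
        self.trackers = trackers
        self._run_active = False
        self._run_id = None

    def start_run(self, name):
        run_ids, started = [], []
        try:
            for tracker in self.trackers:
                run_ids.append(tracker.start_run(name))
                started.append(tracker)
        except Exception:
            for tracker in started:
                try:
                    tracker.end_run()  # best-effort cleanup of already-started trackers
                except Exception:
                    logger.warning("cleanup failed for %r", tracker, exc_info=True)
            raise  # surface the original failure, not the cleanup errors
        # Only flip state once every tracker has started successfully.
        self._run_active = True
        self._run_id = run_ids[0]
        return run_ids[0]
```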
    
    In `@insideLLMs/safety.py`:
    - Line 735: The bare Pattern type used in annotations must be parameterized;
    update the annotations to use Pattern[str] (e.g., change def __init__(self,
    patterns: Optional[dict[str, Pattern]] = None) to def __init__(self, patterns:
    Optional[dict[str, Pattern[str]]] = None)) and do the same for the other
    occurrence around line 1110 (replace Pattern with Pattern[str]); also ensure the
    Pattern symbol you import (from typing or re) supports the type parameter and
    adjust the import if necessary.
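A sketch of the parameterized annotation (the function and pattern set are illustrative, not the real safety.py API); `re.Pattern` supports subscripting at runtime on Python 3.9+:

```python
import re
from re import Pattern
from typing import Optional


def build_scanner(
    patterns: Optional[dict[str, Pattern[str]]] = None,
) -> dict[str, Pattern[str]]:
    # Fall back to a tiny default pattern table when none is supplied.
    return patterns or {"email": re.compile(r"[\w.]+@[\w.]+")}
```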
    
    ---
    
    Nitpick comments:
    In `@insideLLMs/contrib/chains.py`:
    - Around line 2563-2565: The except block currently swallows any Exception as
    "_" then sets state.status = ChainStatus.FAILED and breaks, losing the original
    failure context; change it to capture the exception (e.g., "except Exception as
    err:"), record the error into the chain state (for example set state.error or
    state.error_message to err or str(err)) and/or log it, then set state.status =
    ChainStatus.FAILED and break so the original exception context is preserved for
    debugging; reference the error_handler invocation and the
    state/ChainStatus.FAILED update when making this change.
    
    In `@insideLLMs/contrib/routing.py`:
    - Around line 2303-2305: Add a direct regression test that calls the
    _classify_with_model function with self.model set to None and asserts it returns
    the tuple ("unknown", 0.0); specifically, instantiate or mock the object that
    defines _classify_with_model (the routing class in insideLLMs.contrib.routing),
    set its model attribute to None, call _classify_with_model and assert the return
    equals ("unknown", 0.0); place the test in the existing test suite for
    insideLLMs (new test name e.g. test_classify_with_model_none) so module coverage
    stays >=95% to satisfy the coverage gating.
    
    In `@insideLLMs/cost_tracking.py`:
    - Line 1654: The module currently builds __all__ dynamically using globals(),
    which accidentally exports imports and future names; replace that line with an
    explicit allowlist by listing only the intended public names (functions,
    classes, constants) exported by insideLLMs.cost_tracking (e.g., __all__ =
    ["PublicClassA", "public_function_b", ...]) and remove the globals()-based
    expression (and the pyright ignore) so the public API is stable and only the
    declared symbols are exported; update the list whenever you intentionally add or
    remove public symbols.
    
    In `@insideLLMs/rate_limiting.py`:
    - Line 3429: The module currently builds __all__ dynamically (the comprehension
    at the end) which exports many internal imports; replace this with an explicit
    __all__ list containing only the intended public API names (e.g., the module's
    public classes, decorators and handler functions) by auditing the module for the
    real public symbols (RateLimiter-like classes, any decorators such as
    rate_limit, exceptions like RateLimitExceeded, public factory/getter functions)
    and listing them explicitly in __all__ instead of using the globals()
    comprehension so implementation details (asyncio, datetime, type imports,
    utilities) are not exported.
    
    In `@insideLLMs/retry.py`:
    - Line 2327: The module currently sets __all__ dynamically via a globals()
    comprehension which exports every non-underscore name (risking accidental
    exports); replace that line with an explicit __all__ list containing only the
    intended public API symbols (e.g., the public functions, classes, and constants
    defined in this module such as RetryPolicy, retry, backoff, etc.—use the actual
    names present in insideLLMs/retry.py), remove the globals()-based expression,
    and update this explicit list whenever you add or remove public symbols so the
    module export surface remains stable.
    - Around line 1118-1119: The code currently synthesizes a fake RuntimeError when
    last_exception is None before raising RetryExhaustedError; instead, treat
    last_exception is None as an invariant violation and raise immediately (e.g.,
    raise AssertionError or a clear RuntimeError with an "internal invariant broken:
    last_exception is None" message) in the same locations where the synthetic
    exception is created (the blocks that check last_exception prior to raising
    RetryExhaustedError). Remove the synthetic-wrapping logic so the function fails
    fast on the invariant breach rather than embedding a fabricated cause.
    
    In `@insideLLMs/schemas/registry.py`:
    - Around line 470-473: The version-parsing try/except in
    insideLLMs/schemas/registry.py currently catches Exception too broadly; change
    the broad except to only catch parse-related errors (e.g., except (ValueError,
    IndexError, TypeError):) around the block that returns (int(parts[0]),
    int(parts[1]), int(parts[2])) so only invalid integer conversions, missing
    parts, or wrong types are handled and unrelated exceptions still surface.
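A minimal sketch of the narrowed handler (function name and None fallback are illustrative; the real registry code may differ):

```python
from typing import Optional


def parse_version(value: str) -> Optional[tuple[int, int, int]]:
    parts = value.split(".")
    try:
        return (int(parts[0]), int(parts[1]), int(parts[2]))
    except (ValueError, IndexError, TypeError):
        return None  # malformed version; unrelated exceptions still surface
```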
    
    In `@insideLLMs/semantic_cache.py`:
    - Around line 1036-1038: In each Redis exception fallback (the except Exception
    as _ blocks that increment self._stats.misses and return None at/around the
    shown sites and the other ranges 1076-1077, 1085-1086, 1095-1096, 1104-1106,
    1118-1119), add a debug-level log that records the caught exception and relevant
    context (operation name, key/id, and any serialization details available)
    instead of fully swallowing it; use the existing logger on the object (e.g.,
    self._logger or module logger) and include the exception info/traceback
    (exception instance) so the message gives actionable diagnostics while
    preserving the current graceful fallback behavior.
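A sketch of one such fallback with the requested debug logging (the function, stats dict, and client are stand-ins for the real semantic_cache internals):

```python
import logging

logger = logging.getLogger("semantic_cache_sketch")


def cache_get(client, key, stats):
    try:
        return client.get(key)
    except Exception as exc:
        # Record actionable context instead of silently swallowing the failure.
        logger.debug("Redis GET failed (key=%r): %s", key, exc, exc_info=True)
        stats["misses"] += 1
        return None  # preserve the graceful-fallback behavior
```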
    

    ℹ️ Review info
    ⚙️ Run configuration

    Configuration used: defaults

    Review profile: CHILL

    Plan: Pro

    Run ID: 7c67e7c8-8537-4e32-8cd6-49f64b64d552

    📥 Commits

    Reviewing files that changed from the base of the PR and between e0d353c and b34124b.

    📒 Files selected for processing (40)
    • insideLLMs/async_utils.py
    • insideLLMs/cli/_record_utils.py
    • insideLLMs/cli/commands/doctor.py
    • insideLLMs/config.py
    • insideLLMs/contrib/adapters.py
    • insideLLMs/contrib/agents.py
    • insideLLMs/contrib/chains.py
    • insideLLMs/contrib/debugging.py
    • insideLLMs/contrib/ensemble.py
    • insideLLMs/contrib/evalbom.py
    • insideLLMs/contrib/hitl.py
    • insideLLMs/contrib/routing.py
    • insideLLMs/contrib/synthesis.py
    • insideLLMs/cost_tracking.py
    • insideLLMs/dataset_utils.py
    • insideLLMs/experiment_tracking.py
    • insideLLMs/integrations/langchain.py
    • insideLLMs/models/base.py
    • insideLLMs/models/openai.py
    • insideLLMs/probes/agent_probe.py
    • insideLLMs/rate_limiting.py
    • insideLLMs/registry.py
    • insideLLMs/resources.py
    • insideLLMs/retry.py
    • insideLLMs/runtime/__init__.py
    • insideLLMs/runtime/_base.py
    • insideLLMs/runtime/_config_loader.py
    • insideLLMs/runtime/_high_level.py
    • insideLLMs/runtime/_result_utils.py
    • insideLLMs/runtime/reproducibility.py
    • insideLLMs/safety.py
    • insideLLMs/schemas/registry.py
    • insideLLMs/semantic_cache.py
    • insideLLMs/shadow.py
    • insideLLMs/structured_extraction.py
    • insideLLMs/trace/tracing.py
    • insideLLMs/types.py
    • tests/test_models_config_coverage.py
    • tests/test_probes_models_coverage.py
    • wiki/tutorials/Bias-Testing.md

    Comment on lines +3019 to +3081
    def _apply_binary_operator(
        op: ast.operator, left: Union[int, float], right: Union[int, float]
    ) -> Union[int, float]:
        if isinstance(op, ast.Add):
            return left + right
        if isinstance(op, ast.Sub):
            return left - right
        if isinstance(op, ast.Mult):
            return left * right
        if isinstance(op, ast.Div):
            return left / right
        if isinstance(op, ast.FloorDiv):
            return left // right
        if isinstance(op, ast.Mod):
            return left % right
        if isinstance(op, ast.Pow):
            return left**right
        raise ValueError("Unsupported arithmetic operator")


    def _apply_unary_operator(op: ast.unaryop, operand: Union[int, float]) -> Union[int, float]:
        if isinstance(op, ast.UAdd):
            return +operand
        if isinstance(op, ast.USub):
            return -operand
        raise ValueError("Unsupported unary operator")


    def _eval_exec_expression(
        node: ast.AST, variables: dict[str, Union[int, float]]
    ) -> Union[int, float]:
        if isinstance(node, ast.Constant):
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError("Only numeric constants are allowed")
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"Unknown variable: {node.id}")
            return variables[node.id]
        if isinstance(node, ast.BinOp):
            left = _eval_exec_expression(node.left, variables)
            right = _eval_exec_expression(node.right, variables)
            return _apply_binary_operator(node.op, left, right)
        if isinstance(node, ast.UnaryOp):
            return _apply_unary_operator(node.op, _eval_exec_expression(node.operand, variables))
        raise ValueError("Unsupported expression")


    def _safe_exec_python_subset(code: str) -> dict[str, Union[int, float]]:
        """Execute a tightly restricted Python subset (assignments + arithmetic)."""
        parsed = ast.parse(code, mode="exec")
        variables: dict[str, Union[int, float]] = {}
        for statement in parsed.body:
            if isinstance(statement, ast.Assign):
                if len(statement.targets) != 1 or not isinstance(statement.targets[0], ast.Name):
                    raise ValueError("Only simple variable assignments are supported")
                variables[statement.targets[0].id] = _eval_exec_expression(statement.value, variables)
                continue
            if isinstance(statement, ast.Expr):
                _eval_exec_expression(statement.value, variables)
                continue
            raise ValueError("Only arithmetic expressions and assignments are supported")
        return variables

    ⚠️ Potential issue | 🟠 Major

    Add resource bounds to the restricted evaluator.

    This path is safer than raw exec, but it still allows computational DoS via huge literals/exponents (for example, very large ** operations). Add explicit complexity limits before evaluation.

    💡 Proposed hardening patch
    +MAX_CODE_CHARS = 4096
    +MAX_AST_NODES = 256
    +MAX_POWER_EXPONENT = 1000
    +
     def _apply_binary_operator(
         op: ast.operator, left: Union[int, float], right: Union[int, float]
     ) -> Union[int, float]:
    @@
         if isinstance(op, ast.Pow):
    -        return left**right
    +        if abs(right) > MAX_POWER_EXPONENT:
    +            raise ValueError("Exponent too large")
    +        return left**right
    @@
     def _safe_exec_python_subset(code: str) -> dict[str, Union[int, float]]:
         """Execute a tightly restricted Python subset (assignments + arithmetic)."""
    +    if len(code) > MAX_CODE_CHARS:
    +        raise ValueError("Code snippet too large")
         parsed = ast.parse(code, mode="exec")
    +    if sum(1 for _ in ast.walk(parsed)) > MAX_AST_NODES:
    +        raise ValueError("Expression too complex")
         variables: dict[str, Union[int, float]] = {}
    📝 Committable suggestion

    ‼️ IMPORTANT
    Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    Suggested change
    MAX_CODE_CHARS = 4096
    MAX_AST_NODES = 256
    MAX_POWER_EXPONENT = 1000


    def _apply_binary_operator(
        op: ast.operator, left: Union[int, float], right: Union[int, float]
    ) -> Union[int, float]:
        if isinstance(op, ast.Add):
            return left + right
        if isinstance(op, ast.Sub):
            return left - right
        if isinstance(op, ast.Mult):
            return left * right
        if isinstance(op, ast.Div):
            return left / right
        if isinstance(op, ast.FloorDiv):
            return left // right
        if isinstance(op, ast.Mod):
            return left % right
        if isinstance(op, ast.Pow):
            if abs(right) > MAX_POWER_EXPONENT:
                raise ValueError("Exponent too large")
            return left**right
        raise ValueError("Unsupported arithmetic operator")


    def _apply_unary_operator(op: ast.unaryop, operand: Union[int, float]) -> Union[int, float]:
        if isinstance(op, ast.UAdd):
            return +operand
        if isinstance(op, ast.USub):
            return -operand
        raise ValueError("Unsupported unary operator")


    def _eval_exec_expression(
        node: ast.AST, variables: dict[str, Union[int, float]]
    ) -> Union[int, float]:
        if isinstance(node, ast.Constant):
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError("Only numeric constants are allowed")
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"Unknown variable: {node.id}")
            return variables[node.id]
        if isinstance(node, ast.BinOp):
            left = _eval_exec_expression(node.left, variables)
            right = _eval_exec_expression(node.right, variables)
            return _apply_binary_operator(node.op, left, right)
        if isinstance(node, ast.UnaryOp):
            return _apply_unary_operator(node.op, _eval_exec_expression(node.operand, variables))
        raise ValueError("Unsupported expression")


    def _safe_exec_python_subset(code: str) -> dict[str, Union[int, float]]:
        """Execute a tightly restricted Python subset (assignments + arithmetic)."""
        if len(code) > MAX_CODE_CHARS:
            raise ValueError("Code snippet too large")
        parsed = ast.parse(code, mode="exec")
        if sum(1 for _ in ast.walk(parsed)) > MAX_AST_NODES:
            raise ValueError("Expression too complex")
        variables: dict[str, Union[int, float]] = {}
        for statement in parsed.body:
            if isinstance(statement, ast.Assign):
                if len(statement.targets) != 1 or not isinstance(statement.targets[0], ast.Name):
                    raise ValueError("Only simple variable assignments are supported")
                variables[statement.targets[0].id] = _eval_exec_expression(statement.value, variables)
                continue
            if isinstance(statement, ast.Expr):
                _eval_exec_expression(statement.value, variables)
                continue
            raise ValueError("Only arithmetic expressions and assignments are supported")
        return variables
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/contrib/agents.py` around lines 3019 - 3081, The evaluator
    (_safe_exec_python_subset, _eval_exec_expression and _apply_binary_operator)
    currently allows expressions that can cause computational DoS (huge literals or
    exponents); add explicit resource bounds: enforce a max number of statements in
    parsed.body in _safe_exec_python_subset, reject ast.Constant values whose
    absolute value or bit-length exceed a MAX_NUM_BITS, and reject ast.BinOp with
    ast.Pow where the exponent or base exceed safe thresholds (e.g., max exponent
    magnitude or bit-length) before evaluation; also enforce a maximum AST
    recursion/depth when walking nodes in _eval_exec_expression and return clear
    ValueError messages when limits are hit. Ensure checks run prior to performing
    any arithmetic and reference the existing functions (_safe_exec_python_subset,
    _eval_exec_expression, _apply_binary_operator) so the evaluator refuses
    oversized constants/exponents and overly deep/long programs.
    

    Comment on lines +3322 to 3324
    # Restricted execution for arithmetic-only snippets.
    local_vars = _safe_exec_python_subset(code)
    return str(local_vars.get("result", "Code executed successfully"))

    ⚠️ Potential issue | 🟠 Major

    Execution contract is now narrower than the public tool description.

    create_python_tool(allow_exec=True) now executes only arithmetic assignments/expressions, but the function/doc text still says “execute Python code”. That is a breaking change for existing callers expecting general Python snippets.

    💡 Proposed contract/docs alignment
     def create_python_tool(
    @@
    -    """Create a Python code execution tool.
    +    """Create a restricted Python-subset execution tool.
    @@
    -    Creates a tool that can execute Python code. By default, execution is
    -    disabled for safety. Only enable execution in trusted environments.
    +    Creates a tool that executes a restricted subset of Python
    +    (simple assignments + arithmetic expressions). By default, execution
    +    is disabled for safety.
    @@
         return Tool(
             name="python",
             func=execute_python,
    -        description="Execute Python code (use result variable for output)",
    +        description="Execute restricted arithmetic subset (assign to `result` for output)",
             parameters=[ToolParameter("code", "str", "Python code to execute")],
         )
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/contrib/agents.py` around lines 3322 - 3324, The public contract
    of create_python_tool no longer matches its implementation: when called with
    allow_exec=True it only runs arithmetic assignments/expressions via
    _safe_exec_python_subset but the function/docstring still says it will “execute
    Python code”; either restore broader execution behavior or update the public
    contract/docs and name to reflect the restricted arithmetic-only execution.
    Modify the create_python_tool docstring and any user-facing description to state
    “execute arithmetic-only Python snippets” (or similar) and ensure symbols
    mentioned (create_python_tool, allow_exec, and _safe_exec_python_subset) are
    updated so callers and tests are aware of the narrowed behavior.
    

    Comment on lines +2546 to 2547
    except Exception as e:
        logger.debug("Balance augmentation generation failed", exc_info=True)

    ⚠️ Potential issue | 🟠 Major

    🧩 Analysis chain

    🏁 Script executed:

    #!/bin/bash
    # First, verify the exact code at the target lines
    echo "=== Code at lines 2546-2547 ==="
    sed -n '2544,2548p' insideLLMs/contrib/synthesis.py
    
    echo -e "\n=== Check for unused exception variable pattern ==="
    rg -n 'except Exception as e' insideLLMs/contrib/synthesis.py
    
    echo -e "\n=== Run Ruff check on F841 errors in the file ==="
    ruff check insideLLMs/contrib/synthesis.py --select F841

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 1098


    Remove unused exception binding at line 2546.

    Local variable e is assigned to but never used in the except clause. The logger.debug() call uses exc_info=True to capture exception context, so the variable is not needed.

    ✅ Fix
    -                except Exception as e:
    +                except Exception:
                         logger.debug("Balance augmentation generation failed", exc_info=True)
    🧰 Tools
    🪛 GitHub Actions: CI

    [error] 2546-2546: F841 Local variable 'e' is assigned to but never used. This is fixable with 'ruff --fix'.

    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/contrib/synthesis.py` around lines 2546 - 2547, The except clause
    binds an unused variable `e` even though the exception context is captured via
    logger.debug(..., exc_info=True); remove the unnecessary exception binding by
    changing the except clause that currently reads "except Exception as e:" to just
    "except Exception:" in the file and ensure the logger.debug call (line shown)
    remains unchanged; look for the except block containing logger.debug("Balance
    augmentation generation failed", exc_info=True) to apply this edit.
    

    Comment on lines 69 to +73
 try:
-    from datasets import load_dataset
-
-    HF_DATASETS_AVAILABLE = True
+    _datasets_module = importlib.import_module("datasets")
 except ImportError:
-    HF_DATASETS_AVAILABLE = False
+    _datasets_module = None


    ⚠️ Potential issue | 🟠 Major

    🧩 Analysis chain

    🏁 Script executed:

    head -n 85 insideLLMs/dataset_utils.py | tail -n +60

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 818


    🌐 Web query:

    Python 3.10 ModuleNotFoundError vs ImportError best practices optional imports

    💡 Result:

    ModuleNotFoundError vs ImportError (Python 3.10)

    • ModuleNotFoundError is a subclass of ImportError that’s raised when the module can’t be located (including the special case where sys.modules[name] is None). [1]
    • ImportError is broader: it covers “module found but failed to import”, and also “from x import y where y doesn’t exist”, plus many other import-time failures. [1]

    Best practices for optional imports

    1) Prefer catching ModuleNotFoundError when “dependency not installed” is the only acceptable failure

    This avoids accidentally hiding real bugs inside the optional dependency (or inside its import-time side effects).

    try:
        import orjson
    except ModuleNotFoundError:
        orjson = None

    (Reason: missing-module is the specific condition ModuleNotFoundError represents. [1])

    2) Catch ImportError (not just ModuleNotFoundError) when you want “usable import” semantics

    Some optional deps—especially compiled extensions—may be present but still fail to import due to missing shared libraries, ABI mismatches, etc., which often surface as ImportError. If your feature should be disabled in those cases too, catch ImportError and keep the exception for diagnostics. [2]

    _orjson_exc = None
    try:
        import orjson
    except ImportError as e:
        orjson = None
        _orjson_exc = e

    3) Fail at call time with a clear message (not at import time)

    This keeps base functionality working without the extra dependency.

    def dumps(obj) -> bytes:
        if orjson is None:
            raise ImportError("dumps() requires 'orjson'") from _orjson_exc
        return orjson.dumps(obj)

    Using raise ... from None (or from the saved exception) is the standard way to control exception chaining and avoid noisy/irrelevant context. [3]

    4) If you only need to check availability, use importlib.util.find_spec()

    This lets you test whether a module is discoverable without importing it (and triggering side effects). [4]

    from importlib.util import find_spec
    
    HAS_ORJSON = find_spec("orjson") is not None

    5) Packaging: declare optional deps as “extras”

    This is the ecosystem-standard way to make optional features installable (e.g., pip install yourpkg[orjson]). (Not a Python-runtime rule, but it complements the runtime patterns above.)


    Sources

    [1] Python docs – Built-in exceptions (ModuleNotFoundError is a subclass of ImportError, raised when a module can’t be located) (docs.python.org)
    [2] Discussion of practical cases where ImportError is the right catch for optional compiled deps (scivision.dev)
    [3] PEP 409 – raise ... from None to suppress irrelevant exception context (peps.python.org)
    [4] Python docs – importlib.util.find_spec() for checking importability without importing (docs.python.org)


    Use ModuleNotFoundError instead of broad ImportError for optional import.

    The current catch-all ImportError at line 71 can mask real import failures within the datasets module itself (e.g., missing dependencies, syntax errors during import), silently falling back to disabled functionality when the actual issue should be raised.

    💡 Suggested fix
     try:
         _datasets_module = importlib.import_module("datasets")
    -except ImportError:
    -    _datasets_module = None
    +except ModuleNotFoundError:
    +    _datasets_module = None
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/dataset_utils.py` around lines 69 - 73, The try/except around
    importlib.import_module("datasets") currently catches ImportError too broadly;
    change it to catch ModuleNotFoundError so missing optional package results in
    _datasets_module = None, but real import errors inside the package still
    propagate. Specifically, update the try/except that assigns _datasets_module
    (the importlib.import_module("datasets") call) to use "except
    ModuleNotFoundError: _datasets_module = None" (or add a separate except
    Exception: raise to re-raise non-missing-package errors) so only the
    absent-package case is swallowed.
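As a complementary pattern, availability can also be probed without importing at all (a sketch, not the repository's actual code; `HF_DATASETS_AVAILABLE` mirrors the flag name used in `dataset_utils.py`):

```python
from importlib.util import find_spec

# True when the 'datasets' package is discoverable on sys.path;
# the check itself triggers no import-time side effects.
HF_DATASETS_AVAILABLE = find_spec("datasets") is not None

def require_datasets() -> None:
    """Fail at call time, not import time, when the extra is missing."""
    if not HF_DATASETS_AVAILABLE:
        raise ImportError(
            "this feature requires the 'datasets' extra: pip install datasets"
        )
```

The trade-off versus `import_module` is that `find_spec` only confirms the package is locatable; a package that is installed but broken would still surface its real error on first import rather than being silently disabled.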
    

    Comment on lines +119 to +124
    def _load_optional_module(module_name: str) -> Any | None:
        """Load optional dependency module, returning None when unavailable."""
        try:
            return importlib.import_module(module_name)
        except ImportError:
            return None

    ⚠️ Potential issue | 🟠 Major

    🧩 Analysis chain

    🏁 Script executed:

    # First, let's check the actual code in the repository
    cat -n insideLLMs/experiment_tracking.py | sed -n '119,124p'

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 353


    🏁 Script executed:

    # Check if there are any usages of _load_optional_module in the codebase
    rg "_load_optional_module" -n

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 552


    🏁 Script executed:

    # Check for any tests related to this function
    fd "test.*\.py$" | xargs rg "_load_optional_module" -l

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 56


    🏁 Script executed:

    # Let's also check what Python version is being targeted
    cat setup.py 2>/dev/null || cat pyproject.toml 2>/dev/null | head -50

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 1400


    🏁 Script executed:

    # Verify Python's ModuleNotFoundError behavior and attributes in Python 3.10+
    python3 <<'EOF'
    import sys
    print(f"Python version: {sys.version}")
    
    # Test ModuleNotFoundError attributes
    try:
        import nonexistent_module_xyz
    except ModuleNotFoundError as e:
        print(f"ModuleNotFoundError has 'name' attr: {hasattr(e, 'name')}")
        if hasattr(e, 'name'):
            print(f"  e.name = {e.name}")
        print(f"  str(e) = {str(e)}")
    
    # Test transitive import failure
    import tempfile
    import os
    
    # Create a test module that imports a missing dependency
    with tempfile.TemporaryDirectory() as tmpdir:
        test_module_path = os.path.join(tmpdir, 'test_mod.py')
        with open(test_module_path, 'w') as f:
            f.write('import missing_dependency\n')
        
        sys.path.insert(0, tmpdir)
        try:
            import test_mod
        except ModuleNotFoundError as e:
            print(f"\nTransitive ModuleNotFoundError:")
            print(f"  hasattr(e, 'name'): {hasattr(e, 'name')}")
            if hasattr(e, 'name'):
                print(f"  e.name = {e.name}")
            print(f"  str(e) = {str(e)}")
        except ImportError as e:
            print(f"\nTransitive ImportError (not ModuleNotFoundError):")
            print(f"  type(e) = {type(e).__name__}")
            print(f"  str(e) = {str(e)}")
    EOF

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 403


    🏁 Script executed:

    # Check if there are any tests in the repository for this module
    fd "test" -t f -e py | head -20

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 786


    🏁 Script executed:

    # Verify ModuleNotFoundError is a subclass of ImportError
    python3 <<'EOF'
    print(f"ModuleNotFoundError is subclass of ImportError: {issubclass(ModuleNotFoundError, ImportError)}")
    
    # Check the exception hierarchy
    import inspect
    print(f"\nModuleNotFoundError MRO: {inspect.getmro(ModuleNotFoundError)}")
    EOF

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 269


    🏁 Script executed:

    # Search for any tests related to optional module loading
    rg "optional" -i tests/ | head -20

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 1610


    🏁 Script executed:

    # Look for tests specifically for experiment_tracking or _load_optional_module
    fd "test.*experiment" -t f tests/

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 148


    🏁 Script executed:

    # Check if there are any tests in tests/ directory that test the _load_optional_module function
    rg "_load_optional_module|experiment_tracking" tests/ -l

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 286


    🏁 Script executed:

    # Look at the test file for experiment_tracking to see if there are tests for optional module loading
    cat tests/test_experiment_tracking_coverage.py | head -100

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 2852


    🏁 Script executed:

    # Let's check test_experiment_tracking.py as well
    cat tests/test_experiment_tracking.py | head -100

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 2747


    🏁 Script executed:

    # Now let's understand the actual impact by looking at how _load_optional_module is used
    # Let's examine the context around lines 128-140 where it's used
    sed -n '119,145p' insideLLMs/experiment_tracking.py

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 1103


    🏁 Script executed:

    # Search for how wandb, mlflow, and _torch_tensorboard are used later in the code
    rg "wandb\.|mlflow\.|_torch_tensorboard" insideLLMs/experiment_tracking.py | head -30

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 1414


    🏁 Script executed:

    # Check the test_experiment_tracking_coverage.py file for tests about optional imports
    rg "ImportError|optional|_load_optional" tests/test_experiment_tracking_coverage.py -B 2 -A 5

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 1061


    🏁 Script executed:

    # Let's look for any other error handling patterns in the codebase for optional imports
    rg "except.*Error" insideLLMs/experiment_tracking.py -B 2 -A 2 | head -40

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 177


    🏁 Script executed:

    # Test the actual behavior difference between the current and proposed fix
    python3 <<'EOF'
    import sys
    import tempfile
    import os
    import importlib
    
    # Test 1: Module truly not found
    print("=== Test 1: Module not found ===")
    try:
        importlib.import_module("nonexistent_module_12345")
    except ImportError as e:
        print(f"Exception type: {type(e).__name__}")
        print(f"Is ModuleNotFoundError: {isinstance(e, ModuleNotFoundError)}")
        if isinstance(e, ModuleNotFoundError):
            print(f"exc.name: {e.name}")
    
    # Test 2: Module found but has broken import inside
    print("\n=== Test 2: Module with broken internal import ===")
    with tempfile.TemporaryDirectory() as tmpdir:
        broken_module = os.path.join(tmpdir, 'broken_mod.py')
        with open(broken_module, 'w') as f:
            f.write('import nonexistent_internal_dep\n')
        
        sys.path.insert(0, tmpdir)
        try:
            importlib.import_module("broken_mod")
        except ImportError as e:
            print(f"Exception type: {type(e).__name__}")
            print(f"Is ModuleNotFoundError: {isinstance(e, ModuleNotFoundError)}")
            if isinstance(e, ModuleNotFoundError):
                print(f"exc.name: {e.name}")
                print(f"exc.name == 'broken_mod': {e.name == 'broken_mod'}")
            print(f"Exception msg: {str(e)}")
    
    # Test 3: Other ImportError (hypothetical)
    print("\n=== Test 3: Circular import scenario ===")
    with tempfile.TemporaryDirectory() as tmpdir:
        mod_a = os.path.join(tmpdir, 'mod_a.py')
        mod_b = os.path.join(tmpdir, 'mod_b.py')
        with open(mod_a, 'w') as f:
            f.write('import mod_b\nx = 1\n')
        with open(mod_b, 'w') as f:
            f.write('from mod_a import x\n')
        
        sys.path.insert(0, tmpdir)
        try:
            importlib.import_module("mod_a")
        except ImportError as e:
            print(f"Exception type: {type(e).__name__}")
            print(f"Is ModuleNotFoundError: {isinstance(e, ModuleNotFoundError)}")
            print(f"Exception msg: {str(e)[:100]}")
    EOF

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 662


    Catch ModuleNotFoundError specifically to avoid masking transitive import failures.

    Line 123 catches ImportError broadly, which silently treats installed-but-broken dependencies (with missing transitive imports) as "not installed". The exception's name attribute indicates which module failed; re-raise if it's not the module you're trying to load.

    💡 Suggested fix
     def _load_optional_module(module_name: str) -> Any | None:
         """Load optional dependency module, returning None when unavailable."""
         try:
             return importlib.import_module(module_name)
    -    except ImportError:
    -        return None
    +    except ModuleNotFoundError as exc:
    +        if exc.name == module_name:
    +            return None
    +        raise
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/experiment_tracking.py` around lines 119 - 124, The helper
    _load_optional_module currently catches ImportError broadly and can mask
    transitive import failures; change the exception handling to first catch
    ModuleNotFoundError as e and return None only when e.name == module_name
    (re-raise otherwise), and then optionally keep a general except ImportError:
    return None for other ImportError cases; reference the function name
    _load_optional_module and the variable module_name to locate where to adjust the
    try/except logic.
    

    Comment on line 454
        except Exception as e:
            # Get first message content for error context
    -       first_msg = messages[0].get("content", "") if messages else ""
    +       first_msg = messages[0]["content"] if messages else ""

    ⚠️ Potential issue | 🟡 Minor

    Avoid masking the original error in chat exception handling.

    On Line 454, messages[0]["content"] can raise while already handling an exception, which hides the real failure source.

    💡 Proposed fix
    -            first_msg = messages[0]["content"] if messages else ""
    +            first_msg = ""
    +            if messages:
    +                first_msg = str(messages[0].get("content", ""))
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/models/openai.py` at line 454, When extracting the first message
    inside the chat exception handler, avoid indexing into messages directly since
    messages[0]["content"] can raise and mask the original exception; update the
    extraction of first_msg (the code referencing messages and first_msg) to safely
    access content (e.g., check messages truthiness and type of messages[0], use
    .get("content", "") or a guard) or wrap only that access in its own try/except
    and preserve/re-raise/log the original exception separately so the root error
    isn't hidden.
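A defensive extraction helper along these lines cannot itself raise inside the handler (names hypothetical; the actual code in `openai.py` differs):

```python
from typing import Any

def first_message_content(messages: Any) -> str:
    """Best-effort first-message text for error context; never raises."""
    try:
        if messages and isinstance(messages[0], dict):
            return str(messages[0].get("content", ""))
    except Exception:
        pass  # error-context extraction must never mask the original error
    return ""
```

Because the lookup is wrapped in its own try/except, a malformed `messages` payload degrades to an empty context string instead of replacing the exception being handled.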
    

    Comment on lines +2899 to +2902
    except Exception as e:
        if self.circuit_breaker:
            self.circuit_breaker.record_failure()
        raise e

    ⚠️ Potential issue | 🟡 Minor

    🧩 Analysis chain

    🏁 Script executed:

    # First, check the structure and size of the file
    wc -l insideLLMs/rate_limiting.py

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 104


    🏁 Script executed:

    # Read the specific lines mentioned in the review (2899-2902)
    sed -n '2895,2905p' insideLLMs/rate_limiting.py

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 561


    🏁 Script executed:

    # Run the verification script from the review to find all similar patterns
    rg -nPU 'except\s+Exception\s+as\s+([A-Za-z_]\w*):(?:\n[ \t]+.*){0,8}\n[ \t]+raise\s+\1\b' insideLLMs/rate_limiting.py

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 144


    🏁 Script executed:

    # Search for re-raise patterns with exception variables using simpler regex
    rg -n 'except.*as\s+([a-z_]\w*)' insideLLMs/rate_limiting.py -A 5 | grep -E '(except|raise)' | head -50

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 704


    🏁 Script executed:

    # More direct: find all lines with "raise e" or similar patterns
    rg -n 'raise\s+[a-z_]\w+' insideLLMs/rate_limiting.py

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 54


    🏁 Script executed:

    # Check if bare raise is used elsewhere in the file for comparison
    rg -n '^\s+raise\s*$' insideLLMs/rate_limiting.py

    Repository: dr-gareth-roberts/insideLLMs

    Length of output: 194


    Use bare raise to preserve the traceback chain.

    The raise e at line 2902 re-raises with an extra frame pointing at the re-raise site, obscuring the original failure location. Use bare raise instead to leave the traceback unchanged, consistent with the pattern used throughout the rest of this file.

    💡 Proposed fix
                    except Exception as e:
                        if self.circuit_breaker:
                            self.circuit_breaker.record_failure()
    -                    raise e
    +                    raise
    🤖 Prompt for AI Agents
    Verify each finding against the current code and only fix it if needed.
    
    In `@insideLLMs/rate_limiting.py` around lines 2899 - 2902, The except block
    currently re-raises the caught exception with "raise e", which loses the
    original traceback; change the re-raise to a bare "raise" so the original
    traceback is preserved (update the block that calls
    self.circuit_breaker.record_failure() and replace "raise e" with "raise").
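The difference between the two re-raise forms is observable by counting traceback frames (a small standalone demonstration, independent of the repository code):

```python
def _boom():
    raise ValueError("boom")

def reraise_bare():
    try:
        _boom()
    except Exception:
        raise  # re-raises with the original traceback unchanged

def reraise_named():
    try:
        _boom()
    except Exception as e:
        raise e  # appends an extra frame pointing at this line

def frame_count(fn) -> int:
    """Call fn, catch its ValueError, and count traceback frames."""
    try:
        fn()
    except ValueError as err:
        n, tb = 0, err.__traceback__
        while tb is not None:
            n, tb = n + 1, tb.tb_next
        return n
    return 0
```

`frame_count(reraise_named)` is one greater than `frame_count(reraise_bare)`: the `raise e` form inserts its own line into the chain, so readers chasing the traceback land on the re-raise instead of the original failure.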
    
