Adds bead.protocol annotation-protocol layer (v0.4.0) by aaronstevenwhite · Pull Request #5 · FACTSlab/bead

aaronstevenwhite · 2026-05-07T13:00:53Z

Description

Adds the bead.protocol package — a type-theoretic stack for defining annotation protocols — and wires it into every stage of the bead pipeline. Anchors define question types, contexts are dependent indices, realization strategies (template / contextual / LM) are computational content, and drift guards type-check realized prompts. QuestionFamily and AnnotationProtocol compose these into a sequenced, conditional pipeline.

The release also eliminates several pre-existing redundancies the integration touched: three independent [[label]] regex parsers, two parallel LM caches, and string-keyed task-type → model-class dicts replicated across the CLI.

Motivation

Annotation-protocol design previously had no first-class home in bead. Anchors / drift / realization / response-encoding existed as concepts but were not expressible declaratively or composable into experimental items, deployable trials, training data, or analyses. This release makes them a first-class part of the pipeline.

Fixes #

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Refactoring (no functional changes)
Tests (adding or updating tests)

What's in the release

New top-level package — bead.protocol

SemanticAnchor / ResponseSpace / SemanticPoles (type-level question spec)
ProtocolContext / ContextItem + named-predicate registry
RealizationStrategy Protocol + TemplateRealization, ContextualTemplateRealization, LMRealization
DriftScore / DriftGuard + StructuralDriftValidator, EmbeddingDriftValidator, PerplexityDriftValidator
QuestionFamily, AnnotationProtocol (with depends_on graph validation)
ScaleType / ResponseEncoding (likelihood-agnostic) + encode_response_space
DiagnosticLevel / DiagnosticRecord / DatasetReport / ConditionalObservationValidator

Companion module — bead.evaluation.reliability

AnnotationRecord, AnnotatorReliability, annotator_reliability, low_entropy_annotators
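The commit notes for this PR describe the reliability helpers as per-annotator Shannon-entropy diagnostics. The sketch below illustrates that idea only; it does not import bead, and the function names, the `(annotator_id, response)` tuple shape, and the default thresholds are assumptions made for illustration rather than the signatures of `annotator_reliability` or `low_entropy_annotators`.

```python
import math
from collections import Counter, defaultdict


def response_entropy(responses: list[str]) -> float:
    """Shannon entropy (in bits) of one annotator's response distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    # sum of p * log2(1/p), with p = c / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def flag_low_entropy(
    records: list[tuple[str, str]],
    threshold: float = 0.5,   # hypothetical cutoff, in bits
    min_responses: int = 3,   # hypothetical minimum before judging anyone
) -> list[str]:
    """Return annotator ids whose responses are suspiciously uniform."""
    by_annotator: dict[str, list[str]] = defaultdict(list)
    for annotator_id, response in records:
        by_annotator[annotator_id].append(response)
    return sorted(
        annotator_id
        for annotator_id, responses in by_annotator.items()
        if len(responses) >= min_responses
        and response_entropy(responses) < threshold
    )
```

An annotator who always picks the same option has entropy 0 and is flagged; one who splits evenly between two options has entropy 1 bit and is not.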

Pipeline integration

bead.labels — single canonical [[label]] parser used by drift, deployment, items
bead.config.protocol.ProtocolConfig — declarative TOML/YAML form, plugged into BeadConfig.protocol
bead.protocol.items — family_to_item_template / realization_to_item / realize_protocol_to_items + canonical scale_type_to_task_type
bead.active_learning.models.registry — single MODEL_CLASSES / CONFIG_CLASSES dicts and model_class_for_encoding
bead.deployment.protocol_trials.protocol_to_jspsych_trials — protocol → jsPsych trials end-to-end
bead.data_collection.jatos_results_to_annotation_records — JATOS results → AnnotationRecord
bead.cli.protocol — bead protocol validate / realize / items
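To illustrate why one shared `[[label]]` parser is preferable to three regex copies, here is a minimal sketch of a single compiled pattern with find and replace helpers built on it. It is not bead.labels: the pattern (a bare identifier between double brackets) and the helper names are assumptions, and the real reference syntax may carry more structure.

```python
import re
from typing import Callable

# Assumed syntax: [[name]] where name is a plain identifier.
LABEL_REF = re.compile(r"\[\[([A-Za-z_][A-Za-z0-9_]*)\]\]")


def find_names(text: str) -> list[str]:
    """Return referenced label names in order of appearance."""
    return [match.group(1) for match in LABEL_REF.finditer(text)]


def replace_refs(text: str, render: Callable[[str], str]) -> str:
    """Replace each [[name]] with render(name)."""
    return LABEL_REF.sub(lambda match: render(match.group(1)), text)
```

Because every caller goes through the same compiled pattern, a change to the reference syntax happens in one place instead of three.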

Breaking changes

LMRealization now requires model_name: str and accepts cache: ModelOutputCache | None. The internal FIFO _cache, max_cache_size, cache: bool, clear_cache(), and cache_size are removed; the bead-wide ModelOutputCache is the single canonical caching surface.
bead.cli.models no longer exposes TASK_TYPE_MODELS / TASK_TYPE_CONFIGS / _import_class. Callers use bead.active_learning.models.registry directly. bead.cli.training follows the same pattern.
bead.deployment.jspsych.trials._parse_prompt_references, _SpanReference, _SPAN_REF_PATTERN, and the duplicate _SPAN_REF_PATTERN in bead.items.span_labeling are deleted; callers use bead.labels.parse_label_refs / LabelRef.

No backward-compat shims. Every call site is migrated in this PR.
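The LMRealization change amounts to replacing a private cache with one the caller passes in. The sketch below shows that pattern with stand-in classes; `SketchCache` and `SketchLMRealization` are hypothetical, and their methods are assumptions rather than the actual `ModelOutputCache` or `LMRealization` interfaces. The empty-response check follows the behavior described in this PR's commit notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchCache:
    """Stand-in for a shared output cache (not bead's ModelOutputCache API)."""

    _store: dict[tuple[str, str], str] = field(default_factory=dict)

    def get(self, model_name: str, prompt: str) -> str | None:
        return self._store.get((model_name, prompt))

    def set(self, model_name: str, prompt: str, output: str) -> None:
        self._store[(model_name, prompt)] = output


@dataclass
class SketchLMRealization:
    """Hypothetical realization whose cache is supplied by the caller."""

    generate: Callable[[str], str]      # assumed LM call: prompt -> text
    model_name: str                     # required, so cache keys are per model
    cache: SketchCache | None = None    # None disables caching

    def realize(self, prompt: str) -> str:
        if self.cache is not None:
            hit = self.cache.get(self.model_name, prompt)
            if hit is not None:
                return hit
        text = self.generate(prompt).strip()
        if not text:
            # Fail loudly rather than caching an empty string.
            raise RuntimeError(f"empty response from {self.model_name!r}")
        if self.cache is not None:
            self.cache.set(self.model_name, prompt, text)
        return text
```

Keying on the model name is what lets one cache serve several models without collisions, which is a plausible reason for making `model_name` required.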

Checklist

I have read the CONTRIBUTING guidelines
My code follows the project's style guidelines
I have run uv run ruff check . and uv run ruff format .
I have run uv run pyright with no errors
I have added tests that prove my fix/feature works
All tests pass (uv run pytest tests/)
I have updated documentation as needed

Testing

178 new tests across tests/protocol/, tests/test_labels.py, tests/config/test_protocol_config.py, tests/data_collection/test_records.py, tests/cli/test_protocol.py, plus migrated mocks in tests/cli/test_models.py and tests/cli/test_training.py.
Full suite: 3201/3201 passing on this branch (excluding pre-existing failures on main: tests/active_learning/trainers/test_lightning.py, tests/active_learning/models/* save-load tests, tests/items/test_span_labeling.py::test_default_config (spaCy/pydantic-v1 conflict), tests/behavioral/* (slopit not installed)).
Pyright strict and ruff clean on every new module.
End-to-end integration test: tests/protocol/test_end_to_end.py exercises anchor → context → realization → drift → reliability → diagnostics.
Per-bridge integration tests: items (test_items_bridge.py), deployment (test_deployment_bridge.py), JATOS records (test_records.py), CLI (test_protocol.py), config (test_protocol_config.py).

Documentation

docs/api/protocol.md, docs/api/labels.md, docs/api/evaluation.md, docs/api/config.md, docs/api/active_learning.md, docs/api/deployment.md, docs/api/data_collection.md — auto-rendered API reference for every new module.
docs/user-guide/protocols.md — narrative walkthrough including the configuration-driven workflow, the CLI, the item / deployment / JATOS bridges, and active-learning model selection.
docs/user-guide/concepts.md, docs/user-guide/index.md, docs/index.md, docs/developer-guide/architecture.md, README.md — cross-linked.
CHANGELOG.md — [0.4.0] - 2026-05-07 entry.

The new bead.protocol package provides a type-theoretic stack for defining annotation protocols: SemanticAnchor as the question type, ProtocolContext as the dependent index, RealizationStrategy (Template / Contextual / LM) as the computational content, and DriftGuard with Structural / Embedding / Perplexity validators as the type-checker. QuestionFamily packages these together; AnnotationProtocol sequences families into the iterated dependent product, threading responses through the context so later families can condition on earlier answers.

bead.evaluation gains AnnotationRecord, AnnotatorReliability, and annotator_reliability / low_entropy_annotators for per-annotator Shannon-entropy diagnostics that complement the existing InterAnnotatorMetrics.

Documentation: docs/api/protocol.md, docs/api/evaluation.md update, docs/user-guide/protocols.md, mkdocs nav, and CHANGELOG entry. 95 new tests, 0 pyright / ruff errors on the new code, 0 new mkdocs strict warnings.

Round 1: Adds n_levels/labels/uniqueness invariant to ResponseEncoding via @dx.model_validator (BINARY must have 2 levels, no duplicate labels). Adds forward-only depends_on graph validation to AnnotationProtocol (rejects self-dependencies and forward / unknown references at __post_init__ and append).

Round 2: Consolidates the duplicate always predicate (was defined in both context.py and realization.py); the canonical definition lives in context.py and is registered as the "always" entry, with realization.py importing it. Imports ContextPredicate from its canonical module in bead.protocol.__init__.

Round 3: LMRealization now raises RuntimeError on an empty LM response instead of silently caching an empty string.

Round 4: Tightens EmbeddingDriftValidator and PerplexityDriftValidator docstrings to reference the EmbeddingAdapter / PerplexityAdapter Protocols rather than naming a single adapter implementation. Cleans up DriftGuard docstring inconsistency between the (removed) Parameters section and the actual default (empty list, not None). Removes the "downstream packages" framing in the context-predicate registry doc.

Test additions (12 new tests, 105 total):

- ResponseEncoding validator boundary cases (mismatched n_levels, duplicate labels, BINARY with non-2 levels)
- AnnotationProtocol dependency-graph rejection (self-dependency, forward dependency, unknown dependency, append paths)
- LMRealization empty-response and quoted-empty-response cases
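The construction-time invariants from Round 1 can be sketched as two small checks. This is an illustration under assumptions, not the code in bead.protocol: the function names, the `(name, depends_on)` tuple representation, and the use of a plain string for the scale type are all hypothetical.

```python
from __future__ import annotations


def check_forward_only(families: list[tuple[str, list[str]]]) -> None:
    """Raise ValueError unless every dependency names an earlier family."""
    seen: set[str] = set()
    for name, depends_on in families:
        for dep in depends_on:
            if dep == name:
                raise ValueError(f"{name!r} depends on itself")
            if dep not in seen:
                # Covers both unknown names and families declared later.
                raise ValueError(
                    f"{name!r} depends on {dep!r}, "
                    "which is unknown or declared later"
                )
        seen.add(name)


def check_encoding(scale_type: str, n_levels: int, labels: list[str]) -> None:
    """Raise ValueError if the level count and labels are inconsistent."""
    if len(labels) != n_levels:
        raise ValueError(f"n_levels={n_levels} but {len(labels)} labels given")
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate labels")
    if scale_type == "binary" and n_levels != 2:
        raise ValueError("a binary scale must have exactly 2 levels")
```

Because dependencies may only point backwards in declaration order, a single pass with a set of already-seen names is enough; no cycle detection is needed.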

…e gaps

Surfaces bead.protocol from the cross-cutting overview docs that previously didn't reference it: the README feature list, docs/index.md key-features list, docs/user-guide/index.md core-concepts section, a new "Annotation Protocols" section in docs/user-guide/concepts.md, and a new "bead/protocol/" subsection in docs/developer-guide/architecture.md plus a paragraph framing it as a cross-cutting layer that feeds Stage 3 item construction.

Extends docs/user-guide/protocols.md with the pieces that were only in the auto-rendered API reference: ContextItem.attribute(), the context-predicate registry (register / get / list_context_predicates), LMClient details (caching, FIFO eviction, RuntimeError on empty / backend-failure responses), the named EmbeddingAdapter / PerplexityAdapter Protocols, the construction-time invariants on ResponseEncoding and AnnotationProtocol (forward-only depends_on, BINARY-must-have-2-levels, n_levels matches labels), the encode_response_space bridge to the modeling layer, the RecordLike Protocol consumed by ConditionalObservationValidator, and the question_name / require_min_responses refinements on low_entropy_annotators.

CHANGELOG: documents the construction-time invariants, the empty-LM-response RuntimeError, the RecordLike Protocol, and the cross-link edits to overview docs.

Eliminates the parallel implementations identified in the prior review and makes bead.protocol a fully integrated part of the bead pipeline. Every integration replaces the duplicate it touches; no shims, no deprecation aliases.

Single canonical sites
----------------------

- bead.labels: parse_label_refs / find_label_names / replace_label_refs with one compiled regex. The regex copies in bead.protocol.drift, bead.deployment.jspsych.trials, and bead.items.span_labeling are deleted; their three callers now use the shared parser.
- bead.active_learning.models.registry: MODEL_CLASSES / CONFIG_CLASSES dicts plus model_class_for_task_type / config_class_for_task_type / model_class_for_encoding / config_class_for_encoding. The string-keyed TASK_TYPE_MODELS and TASK_TYPE_CONFIGS dicts and the dynamic _import_class helper in bead.cli.models and bead.cli.training are deleted; both CLIs now call the registry directly.
- ModelOutputCache is the single canonical caching surface. LMRealization gains required model_name and ModelOutputCache parameters; the internal FIFO dict, max_cache_size, cache, clear_cache, and cache_size are deleted.

New integration modules
-----------------------

- bead.config.protocol: AnchorSpec / TemplateVariantSpec / FamilySpec / DriftConfig / ProtocolConfig give a declarative TOML/YAML form for the entire protocol. ProtocolConfig.build(lm_client=..., cache=...) materializes a live AnnotationProtocol. Plugged into BeadConfig.protocol so the same config drives Python and CLI.
- bead.protocol.items: scale_type_to_task_type (the single ScaleType -> TaskType mapping), family_to_item_template, realization_to_item, protocol_to_item_templates, realize_protocol_to_items.
- bead.deployment.protocol_trials: protocol_to_jspsych_trials packs AnnotationProtocol + contexts -> jsPsych trial dicts end-to-end.
- bead.data_collection.records: jatos_results_to_annotation_records bridges JATOS results to AnnotationRecord, the canonical input to reliability and inter-annotator-agreement metrics.
- bead.cli.protocol: bead protocol validate / realize / items drive ProtocolConfig from the shell. Registered in cli/main.py alongside the other stage subcommands.

Tests
-----

- 178 new and migrated tests across protocol, labels, config, data_collection, CLI; full suite passes (3201/3201, excluding the pre-existing pydantic-v1 spaCy and slopit and Lightning failures that exist on main).
- Pyright and ruff clean on every new file.

Dev deps
--------

- spacy / stanza added to the dev extra so tokenizer-dependent tests can run.
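The registry consolidation described above follows a common pattern: one module owns the task-type-to-class mapping, and everything else looks classes up through it. A minimal sketch of that pattern follows. It does not reproduce bead.active_learning.models.registry; the enum members other than BINARY, the task-type strings, the model classes, and the `sketch_`-prefixed names are all hypothetical.

```python
from enum import Enum


class SketchScaleType(Enum):
    """Hypothetical stand-in for ScaleType; only BINARY is named in the PR."""

    BINARY = "binary"
    ORDINAL = "ordinal"


class BinaryModel:
    """Placeholder model class."""


class OrdinalModel:
    """Placeholder model class."""


# One mapping per concern, defined once and imported everywhere else.
SKETCH_SCALE_TO_TASK: dict[SketchScaleType, str] = {
    SketchScaleType.BINARY: "binary",
    SketchScaleType.ORDINAL: "ordinal_scale",
}
SKETCH_MODEL_CLASSES: dict[str, type] = {
    "binary": BinaryModel,
    "ordinal_scale": OrdinalModel,
}


def sketch_model_class_for_task_type(task_type: str) -> type:
    """Look up a model class, failing with a readable error if unknown."""
    try:
        return SKETCH_MODEL_CLASSES[task_type]
    except KeyError:
        known = ", ".join(sorted(SKETCH_MODEL_CLASSES))
        raise ValueError(
            f"unknown task type {task_type!r}; known: {known}"
        ) from None


def sketch_model_class_for_scale(scale: SketchScaleType) -> type:
    """Resolve a scale type to a model class via the task-type mapping."""
    return sketch_model_class_for_task_type(SKETCH_SCALE_TO_TASK[scale])
```

Holding class objects directly, instead of dotted import strings resolved at call time, means a missing or renamed class fails at import rather than when a CLI command runs.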

Pipeline-wide integration of the bead.protocol layer: shared label parser, ModelOutputCache-backed LMRealization, single task-type → model-class registry, declarative ProtocolConfig wired into BeadConfig, item / deployment / JATOS-record / CLI bridges, and the bead protocol subcommand. Every duplicate replaced; no shims.

aaronstevenwhite added 6 commits May 6, 2026 19:04

Applies ruff format

b849437

aaronstevenwhite merged commit db46619 into main May 7, 2026
8 checks passed

aaronstevenwhite deleted the feat/protocol-layer branch May 7, 2026 13:33
