Adds bead.protocol annotation-protocol layer (v0.4.0) #5

Merged
aaronstevenwhite merged 6 commits into main from feat/protocol-layer
May 7, 2026

Conversation

@aaronstevenwhite
Collaborator

Description

Adds the bead.protocol package — a type-theoretic stack for defining annotation protocols — and wires it into every stage of the bead pipeline. Anchors define question types, contexts are dependent indices, realization strategies (template / contextual / LM) are computational content, and drift guards type-check realized prompts. QuestionFamily and AnnotationProtocol compose these into a sequenced, conditional pipeline.

The release also eliminates several pre-existing redundancies the integration touched: three independent [[label]] regex parsers, two parallel LM caches, and string-keyed task-type → model-class dicts replicated across the CLI.

Motivation

Annotation-protocol design previously had no first-class home in bead. Anchors / drift / realization / response-encoding existed as concepts but were not expressible declaratively or composable into experimental items, deployable trials, training data, or analyses. This release makes them a first-class part of the pipeline.

Fixes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Tests (adding or updating tests)

What's in the release

New top-level package — bead.protocol

  • SemanticAnchor / ResponseSpace / SemanticPoles (type-level question spec)
  • ProtocolContext / ContextItem + named-predicate registry
  • RealizationStrategy Protocol + TemplateRealization, ContextualTemplateRealization, LMRealization
  • DriftScore / DriftGuard + StructuralDriftValidator, EmbeddingDriftValidator, PerplexityDriftValidator
  • QuestionFamily, AnnotationProtocol (with depends_on graph validation)
  • ScaleType / ResponseEncoding (likelihood-agnostic) + encode_response_space
  • DiagnosticLevel / DiagnosticRecord / DatasetReport / ConditionalObservationValidator
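
The depends_on graph validation listed for AnnotationProtocol can be pictured with a minimal sketch. `FamilyNode` and `ProtocolGraph` are hypothetical stand-ins, not the bead.protocol classes. The rule assumed here, following the commit notes in this PR, is that a family may depend only on families declared earlier, which rules out self-dependencies, forward references, and unknown names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class FamilyNode:
    """Hypothetical stand-in for a question family: a name plus dependencies."""

    name: str
    depends_on: Tuple[str, ...] = ()


@dataclass
class ProtocolGraph:
    """Hypothetical stand-in for a protocol: an ordered list of families."""

    families: List[FamilyNode] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Re-add the initial families one at a time so construction and
        # append share the same check.
        initial, self.families = self.families, []
        for family in initial:
            self.append(family)

    def append(self, family: FamilyNode) -> None:
        earlier = {f.name for f in self.families}
        for dep in family.depends_on:
            if dep == family.name:
                raise ValueError(f"{family.name!r} depends on itself")
            if dep not in earlier:
                # Covers forward references and unknown names alike: in both
                # cases the dependency has not been declared earlier.
                raise ValueError(
                    f"{family.name!r} depends on {dep!r}, "
                    "which is not an earlier family"
                )
        self.families.append(family)
```

Running the check both at construction and on append keeps the sequence valid at every point, so downstream code can iterate the families in order without re-validating.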

Companion module — bead.evaluation.reliability

  • AnnotationRecord, AnnotatorReliability, annotator_reliability, low_entropy_annotators

Pipeline integration

  • bead.labels — single canonical [[label]] parser used by drift, deployment, items
  • bead.config.protocol.ProtocolConfig — declarative TOML/YAML form, plugged into BeadConfig.protocol
  • bead.protocol.items: family_to_item_template / realization_to_item / realize_protocol_to_items + canonical scale_type_to_task_type
  • bead.active_learning.models.registry — single MODEL_CLASSES / CONFIG_CLASSES dicts and model_class_for_encoding
  • bead.deployment.protocol_trials.protocol_to_jspsych_trials — protocol → jsPsych trials end-to-end
  • bead.data_collection.jatos_results_to_annotation_records — JATOS results → AnnotationRecord
  • bead.cli.protocol: bead protocol validate / realize / items

Breaking changes

  • LMRealization now requires model_name: str and accepts cache: ModelOutputCache | None. The internal FIFO _cache, max_cache_size, cache: bool, clear_cache(), and cache_size are removed; the bead-wide ModelOutputCache is the single canonical caching surface.
  • bead.cli.models no longer exposes TASK_TYPE_MODELS / TASK_TYPE_CONFIGS / _import_class. Callers use bead.active_learning.models.registry directly. bead.cli.training follows the same pattern.
  • bead.deployment.jspsych.trials._parse_prompt_references, _SpanReference, _SPAN_REF_PATTERN, and the duplicate _SPAN_REF_PATTERN in bead.items.span_labeling are deleted; callers use bead.labels.parse_label_refs / LabelRef.

No backward-compat shims. Every call site is migrated in this PR.
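
To illustrate what a single shared [[label]] parser buys over three regex copies, here is a minimal sketch. It is not the bead.labels implementation: the names `LabelRefSketch`, `parse_refs`, `find_names`, and `replace_refs` are hypothetical, and the grammar (a name between double square brackets, with no nested brackets) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed syntax: "[[name]]" with no brackets inside the name. The real
# grammar accepted by bead.labels may be richer than this.
_LABEL_REF = re.compile(r"\[\[([^\[\]]+)\]\]")


@dataclass(frozen=True)
class LabelRefSketch:
    """Hypothetical record of one reference and its character span."""

    name: str
    start: int
    end: int


def parse_refs(text: str) -> List[LabelRefSketch]:
    """Return every label reference in order of appearance."""
    return [
        LabelRefSketch(m.group(1), m.start(), m.end())
        for m in _LABEL_REF.finditer(text)
    ]


def find_names(text: str) -> List[str]:
    """Return just the referenced label names."""
    return [ref.name for ref in parse_refs(text)]


def replace_refs(text: str, replacements: Dict[str, str]) -> str:
    """Substitute known labels; leave unknown references untouched."""

    def _sub(match: "re.Match[str]") -> str:
        return replacements.get(match.group(1), match.group(0))

    return _LABEL_REF.sub(_sub, text)
```

With one compiled pattern behind all three helpers, a change to the reference syntax lands in one place instead of being repeated per caller.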

Checklist

  • I have read the CONTRIBUTING guidelines
  • My code follows the project's style guidelines
  • I have run uv run ruff check . and uv run ruff format .
  • I have run uv run pyright with no errors
  • I have added tests that prove my fix/feature works
  • All tests pass (uv run pytest tests/)
  • I have updated documentation as needed

Testing

  • 178 new tests across tests/protocol/, tests/test_labels.py, tests/config/test_protocol_config.py, tests/data_collection/test_records.py, tests/cli/test_protocol.py, plus migrated mocks in tests/cli/test_models.py and tests/cli/test_training.py.
  • Full suite: 3201/3201 passing on this branch (excluding pre-existing failures on main: tests/active_learning/trainers/test_lightning.py, tests/active_learning/models/* save-load tests, tests/items/test_span_labeling.py::test_default_config (spaCy/pydantic-v1 conflict), tests/behavioral/* (slopit not installed)).
  • Pyright strict and ruff clean on every new module.
  • End-to-end integration test: tests/protocol/test_end_to_end.py exercises anchor → context → realization → drift → reliability → diagnostics.
  • Per-bridge integration tests: items (test_items_bridge.py), deployment (test_deployment_bridge.py), JATOS records (test_records.py), CLI (test_protocol.py), config (test_protocol_config.py).

Documentation

  • docs/api/protocol.md, docs/api/labels.md, docs/api/evaluation.md, docs/api/config.md, docs/api/active_learning.md, docs/api/deployment.md, docs/api/data_collection.md — auto-rendered API reference for every new module.
  • docs/user-guide/protocols.md — narrative walkthrough including the configuration-driven workflow, the CLI, the item / deployment / JATOS bridges, and active-learning model selection.
  • docs/user-guide/concepts.md, docs/user-guide/index.md, docs/index.md, docs/developer-guide/architecture.md, README.md — cross-linked.
  • CHANGELOG.md: [0.4.0] - 2026-05-07 entry.

The new bead.protocol package provides a type-theoretic stack for
defining annotation protocols: SemanticAnchor as the question type,
ProtocolContext as the dependent index, RealizationStrategy
(Template / Contextual / LM) as the computational content, and
DriftGuard with Structural / Embedding / Perplexity validators as the
type-checker. QuestionFamily packages these together; AnnotationProtocol
sequences families into the iterated dependent product, threading
responses through the context so later families can condition on
earlier answers.
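
The idea of threading responses through the context can be sketched in a few lines. Everything named here (`ConditionalFamily`, `run_sequence`, the `answer` callback, and the use of `str.format` for templates) is a hypothetical simplification, not the bead.protocol API; only the `always` predicate name comes from the commit notes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Predicate = Callable[[Context], bool]
Answerer = Callable[[str, str], str]


def always(context: Context) -> bool:
    """Predicate that admits every context."""
    return True


@dataclass(frozen=True)
class ConditionalFamily:
    """Hypothetical family: a prompt template guarded by a predicate."""

    name: str
    template: str
    applies: Predicate = always


def run_sequence(
    families: List[ConditionalFamily],
    context: Context,
    answer: Answerer,
) -> Tuple[Context, List[Tuple[str, str]]]:
    """Ask each applicable family in order, storing answers in the context."""
    ctx = dict(context)  # copy so the caller's context is not mutated
    asked: List[Tuple[str, str]] = []
    for family in families:
        if not family.applies(ctx):
            continue  # skipped families contribute no response
        prompt = family.template.format(**ctx)
        asked.append((family.name, prompt))
        # Later families can read this answer under the family's name.
        ctx[family.name] = answer(family.name, prompt)
    return ctx, asked
```

A follow-up family can then be written with `applies=lambda c: c.get("occurred") == "yes"`, so it is only asked when the earlier answer makes it meaningful.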

bead.evaluation gains AnnotationRecord, AnnotatorReliability, and
annotator_reliability / low_entropy_annotators for per-annotator
Shannon-entropy diagnostics that complement the existing
InterAnnotatorMetrics.
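
As a rough illustration of a per-annotator Shannon-entropy diagnostic, the sketch below computes the entropy (in bits) of each annotator's response distribution and flags annotators at or below a threshold. The function names, the `(annotator, response)` tuple format, and the `min_responses` parameter are assumptions for illustration, not the signatures in bead.evaluation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[str, Hashable]  # (annotator id, response), an assumed shape


def shannon_entropy(responses: Iterable[Hashable]) -> float:
    """Entropy in bits of the empirical response distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    # H = sum_x p(x) * log2(1 / p(x)); an empty input yields 0.0.
    return sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )


def entropy_by_annotator(records: Iterable[Record]) -> Dict[str, float]:
    """Group responses by annotator and compute each one's entropy."""
    grouped: Dict[str, List[Hashable]] = defaultdict(list)
    for annotator, response in records:
        grouped[annotator].append(response)
    return {a: shannon_entropy(r) for a, r in grouped.items()}


def flag_low_entropy(
    records: Iterable[Record], threshold: float, min_responses: int = 1
) -> List[str]:
    """Return annotators with enough responses and entropy <= threshold."""
    records = list(records)
    n_by_annotator = Counter(a for a, _ in records)
    entropies = entropy_by_annotator(records)
    return sorted(
        a
        for a, h in entropies.items()
        if n_by_annotator[a] >= min_responses and h <= threshold
    )
```

An annotator who always gives the same answer has entropy 0, while one who splits evenly between two answers has entropy 1 bit, so a low value is a signal worth inspecting rather than proof of careless responding.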

Documentation: docs/api/protocol.md, an update to docs/api/evaluation.md,
docs/user-guide/protocols.md, the mkdocs nav, and a CHANGELOG entry.

95 new tests, 0 pyright / ruff errors on the new code, 0 new mkdocs
strict warnings.

Round 1: Adds an invariant to ResponseEncoding via @dx.model_validator:
n_levels must match the number of labels, labels must be unique, and
BINARY must have exactly 2 levels. Adds forward-only depends_on graph
validation to AnnotationProtocol, which rejects self-dependencies and
forward or unknown references at __post_init__ and on append.
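
The ResponseEncoding invariants from Round 1 can be sketched with a plain dataclass. This swaps in `__post_init__` for the `@dx.model_validator` used in the PR, and `ScaleKind` / `EncodingSketch` are hypothetical names; only the BINARY member is taken from the text, the ORDINAL member is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ScaleKind(Enum):
    """Hypothetical scale enum; only BINARY is named in the PR text."""

    BINARY = "binary"
    ORDINAL = "ordinal"  # assumed member, for illustration only


@dataclass(frozen=True)
class EncodingSketch:
    """Hypothetical encoding that validates itself at construction time."""

    scale: ScaleKind
    n_levels: int
    labels: Tuple[str, ...]

    def __post_init__(self) -> None:
        if self.n_levels != len(self.labels):
            raise ValueError(
                f"n_levels={self.n_levels} but {len(self.labels)} labels given"
            )
        if len(set(self.labels)) != len(self.labels):
            raise ValueError("labels must be unique")
        if self.scale is ScaleKind.BINARY and self.n_levels != 2:
            raise ValueError("a BINARY encoding must have exactly 2 levels")
```

Failing at construction means an inconsistent encoding never reaches the modeling layer, which is the point of checking these invariants up front.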

Round 2: Consolidates the duplicate always predicate (was defined in
both context.py and realization.py); the canonical definition lives in
context.py and is registered as the "always" entry, with realization.py
importing it. Imports ContextPredicate from its canonical module in
bead.protocol.__init__.

Round 3: LMRealization now raises RuntimeError on an empty LM response
instead of silently caching an empty string.
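
The Round 3 behaviour, refusing to cache an empty LM response, can be sketched as follows. The function name, the plain-dict cache, and the callable client are hypothetical simplifications. Treating a response that consists only of quotes as empty is an assumption about what the "quoted-empty-response" test case covers.

```python
from typing import Callable, Dict


def realize_with_cache(
    prompt: str,
    complete: Callable[[str], str],
    cache: Dict[str, str],
) -> str:
    """Return a cached realization, or call the LM and cache a non-empty one."""
    if prompt in cache:
        return cache[prompt]
    raw = complete(prompt)
    # Assumption: surrounding whitespace and quote characters are stripped
    # before the emptiness check, so '""' counts as an empty response.
    text = raw.strip().strip("\"'").strip()
    if not text:
        # Raise before writing to the cache so the failure is not replayed
        # on every later lookup of the same prompt.
        raise RuntimeError(f"LM returned an empty response for {prompt!r}")
    cache[prompt] = text
    return text
```

Raising before the cache write is the key ordering: a transient backend failure then stays transient instead of becoming a permanently cached empty string.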

Round 4: Tightens EmbeddingDriftValidator and PerplexityDriftValidator
docstrings to reference the EmbeddingAdapter / PerplexityAdapter
Protocols rather than naming a single adapter implementation. Cleans
up DriftGuard docstring inconsistency between the (removed) Parameters
section and the actual default (empty list, not None). Removes the
"downstream packages" framing in the context-predicate registry doc.

Test additions (12 new tests, 105 total):
- ResponseEncoding validator boundary cases (mismatched n_levels,
  duplicate labels, BINARY with non-2 levels)
- AnnotationProtocol dependency-graph rejection (self-dependency,
  forward dependency, unknown dependency, append paths)
- LMRealization empty-response and quoted-empty-response cases
…e gaps

Surfaces bead.protocol from the cross-cutting overview docs that
previously didn't reference it: the README feature list, docs/index.md
key-features list, docs/user-guide/index.md core-concepts section, a new
"Annotation Protocols" section in docs/user-guide/concepts.md, and a
new "bead/protocol/" subsection in docs/developer-guide/architecture.md
plus a paragraph framing it as a cross-cutting layer that feeds Stage 3
item construction.

Extends docs/user-guide/protocols.md with the pieces that were only in
the auto-rendered API reference: ContextItem.attribute(), the
context-predicate registry (register / get / list_context_predicates),
LMClient details (caching, FIFO eviction, RuntimeError on empty /
backend-failure responses), the named EmbeddingAdapter / PerplexityAdapter
Protocols, the construction-time invariants on ResponseEncoding and
AnnotationProtocol (forward-only depends_on, BINARY-must-have-2-levels,
n_levels matches labels), the encode_response_space bridge to the
modeling layer, the RecordLike Protocol consumed by
ConditionalObservationValidator, and the question_name /
require_min_responses refinements on low_entropy_annotators.

CHANGELOG: documents the construction-time invariants, the empty-LM-
response RuntimeError, the RecordLike Protocol, and the cross-link
edits to overview docs.

Eliminates the parallel implementations identified in the prior review
and makes bead.protocol a fully integrated part of the bead pipeline.
Every integration replaces the duplicate it touches; no shims, no
deprecation aliases.

Single canonical sites
----------------------
- bead.labels: parse_label_refs / find_label_names / replace_label_refs
  with one compiled regex. The regex copies in
  bead.protocol.drift, bead.deployment.jspsych.trials, and
  bead.items.span_labeling are deleted; their three callers now use
  the shared parser.
- bead.active_learning.models.registry: MODEL_CLASSES /
  CONFIG_CLASSES dicts plus model_class_for_task_type /
  config_class_for_task_type / model_class_for_encoding /
  config_class_for_encoding. The string-keyed TASK_TYPE_MODELS and
  TASK_TYPE_CONFIGS dicts and the dynamic _import_class helper in
  bead.cli.models and bead.cli.training are deleted; both CLIs now
  call the registry directly.
- ModelOutputCache is the single canonical caching surface.
  LMRealization gains required model_name and ModelOutputCache
  parameters; the internal FIFO dict, max_cache_size, cache,
  clear_cache, and cache_size are deleted.
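
A single registry of the kind described above might look like the sketch below. The names `MODEL_CLASSES` and `model_class_for_task_type` follow the helpers listed in this commit, but the enum, the placeholder model classes, and the function body are all hypothetical; the real registry's key type and contents are not shown in this PR text.

```python
from enum import Enum
from typing import Dict


class TaskKind(Enum):
    """Hypothetical task-type enum standing in for bead's own."""

    BINARY = "binary"
    ORDINAL_SCALE = "ordinal_scale"
    FREE_TEXT = "free_text"  # deliberately left unregistered below


class BinaryModelSketch:
    """Placeholder for a real model class."""


class OrdinalModelSketch:
    """Placeholder for a real model class."""


# One dict, imported by every caller, replaces per-CLI copies.
MODEL_CLASSES: Dict[TaskKind, type] = {
    TaskKind.BINARY: BinaryModelSketch,
    TaskKind.ORDINAL_SCALE: OrdinalModelSketch,
}


def model_class_for_task_type(task_type: TaskKind) -> type:
    """Look up the model class, failing loudly on unregistered task types."""
    try:
        return MODEL_CLASSES[task_type]
    except KeyError:
        raise ValueError(f"no model registered for {task_type!r}") from None
```

Holding class objects in the dict, as assumed here, lets a type checker see the mapping directly, in contrast to resolving classes from strings at call time through a dynamic import helper.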

New integration modules
-----------------------
- bead.config.protocol: AnchorSpec / TemplateVariantSpec / FamilySpec /
  DriftConfig / ProtocolConfig give a declarative TOML/YAML form for
  the entire protocol. ProtocolConfig.build(lm_client=..., cache=...)
  materializes a live AnnotationProtocol. Plugged into
  BeadConfig.protocol so the same config drives Python and CLI.
- bead.protocol.items: scale_type_to_task_type (the single
  ScaleType -> TaskType mapping), family_to_item_template,
  realization_to_item, protocol_to_item_templates,
  realize_protocol_to_items.
- bead.deployment.protocol_trials: protocol_to_jspsych_trials packs
  AnnotationProtocol + contexts -> jsPsych trial dicts end-to-end.
- bead.data_collection.records: jatos_results_to_annotation_records
  bridges JATOS results to AnnotationRecord, the canonical input to
  reliability and inter-annotator-agreement metrics.
- bead.cli.protocol: bead protocol validate / realize / items
  drive ProtocolConfig from the shell. Registered in cli/main.py
  alongside the other stage subcommands.

Tests
-----
- 178 new and migrated tests across protocol, labels, config,
  data_collection, and CLI; full suite passes (3201/3201, excluding the
  pre-existing spaCy/pydantic-v1, slopit, and Lightning failures on
  main).
- Pyright and ruff clean on every new file.

Dev deps
--------
- spacy / stanza added to the dev extra so tokenizer-dependent tests
  can run.

Pipeline-wide integration of the bead.protocol layer: shared label
parser, ModelOutputCache-backed LMRealization, single
task-type → model-class registry, declarative ProtocolConfig wired
into BeadConfig, item / deployment / JATOS-record / CLI bridges, and
the bead protocol subcommand. Every duplicate replaced; no shims.
@aaronstevenwhite aaronstevenwhite merged commit db46619 into main May 7, 2026
8 checks passed
@aaronstevenwhite aaronstevenwhite deleted the feat/protocol-layer branch May 7, 2026 13:33