Summary
The filtering stage's LLM filter prompt (`bioscancast/filtering/llm_filter.py`) does not contain the word "json" anywhere in the message text. When using OpenAI's `response_format: {"type": "json_object"}` mode, the API requires the word "json" to appear in the messages, and returns a 400 error without it:

```
openai.BadRequestError: 'messages' must contain the word 'json' in some form,
to use 'response_format' of type 'json_object'.
```
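For context, a minimal sketch of the constraint outside this codebase (assuming the openai v1 Python SDK; the model name and prompt text are illustrative):

```python
from openai import OpenAI, BadRequestError

client = OpenAI()

# Rejected with a 400: no form of the word "json" appears in any message.
try:
    client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "List three H5N1 risk factors."}],
        response_format={"type": "json_object"},
    )
except BadRequestError as err:
    print(err)  # "... 'messages' must contain the word 'json' in some form ..."

# Accepted: the message text mentions JSON, satisfying the requirement.
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Return JSON: three H5N1 risk factors."}],
    response_format={"type": "json_object"},
)
```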
This means any real OpenAI call through the LLM filter has always been broken. It was not caught earlier because:

- Offline tests use `FakeLLMClient`, which doesn't enforce this constraint (see the test sketch after this list).
- The filtering pipeline defaults to `llm_client=None` (fail-closed mode), so the LLM filter path was never exercised in integration tests.
- The search stage's `OpenAIClient` works fine because its prompts include "Return JSON: ..." in the text.
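A cheap guard against regressions would be a unit test on the prompt text itself (a sketch, assuming `build_filter_prompt` returns the prompt as a string; the test name and fixture values are hypothetical):

```python
import re
from datetime import datetime, timezone

from bioscancast.filtering.llm_filter import build_filter_prompt
from bioscancast.filtering.models import ForecastQuestion


def test_filter_prompt_mentions_json():
    # OpenAI's json_object mode rejects requests whose messages never
    # mention "json" in some form, so the prompt text must contain it.
    question = ForecastQuestion(
        id="test",
        text="Will H5N1 spread?",
        created_at=datetime.now(timezone.utc),
    )
    prompt = build_filter_prompt(
        question,
        [{"result_id": "r1", "url": "https://example.org", "title": "Test"}],
    )
    assert re.search(r"json", prompt, re.IGNORECASE)
```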
How to reproduce
```python
from datetime import datetime, timezone

from bioscancast.filtering.llm_filter import build_filter_prompt
from bioscancast.filtering.models import ForecastQuestion
from bioscancast.llm.client import OpenAIClient

question = ForecastQuestion(
    id="test",
    text="Will H5N1 spread?",
    created_at=datetime.now(timezone.utc),
)
prompt = build_filter_prompt(question, [{"result_id": "r1", "url": "...", "title": "Test"}])

client = OpenAIClient()
client.generate_json(prompt)  # Raises BadRequestError
```
Fix
Add "Return your response as JSON matching the output_schema below." to the task description in build_filter_prompt() in bioscancast/filtering/llm_filter.py (line 22).
Already fixed on the `temp/integration` branch; the change still needs to be applied to `main`.
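For concreteness, a sketch of where the sentence lands. The function body below is paraphrased, not the actual source; only the quoted sentence comes from the fix itself:

```python
def build_filter_prompt(question, results):
    # Paraphrased sketch of the prompt assembly, not the real implementation.
    task = (
        "Decide which of the search results are relevant to the forecast "
        "question. "
        # The added sentence: mentioning JSON in the message text satisfies
        # OpenAI's json_object-mode requirement.
        "Return your response as JSON matching the output_schema below."
    )
    return f"{task}\n\nQuestion: {question}\nResults: {results}"
```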
Files
- `bioscancast/filtering/llm_filter.py`: `build_filter_prompt()`, line 22