[DRAFT] FEAT: Tool Use + MCP #1811
Draft
ValbuenaVC wants to merge 7 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s for tool calling.
… into PromptTarget.send_prompt_async

C4 lands the in-tree wiring for the generic tool-use loop introduced by C2/C3:

- TargetCapabilities gains supports_tool_use: bool (default False) and CapabilityName.TOOL_USE for the corresponding enum value, matching the existing supports_X / "supports_X" naming convention used by every other capability.
- TargetConfiguration grows tool_event_policy + tool_backend kwargs, both gettable/settable properties. The setter (and constructor) validate that a non-None tool_backend requires supports_tool_use=True; otherwise they raise ValueError immediately. ToolBackend / ToolEventPolicy imports are quoted + behind TYPE_CHECKING to keep pyrit.prompt_target.common from importing pyrit.tools eagerly.
- PromptTarget.send_prompt_async picks up @tool_loop (below the existing @Final). The wrapper is a no-op when tool_event_policy is None, so every existing target keeps its current behavior. _tool_parser (property, default None) and _tool_schemas() (default []) are added on the base class as the two collaborators @tool_loop reads.
- _permissive_configuration is updated to flip supports_tool_use=True alongside the other supports_X flags so the all-flags-on probe loop in test_discover_target_capabilities still sees every CapabilityName value as supported.

tests/unit/tools/conftest.py drops the hand-decorated @tool_loop on _FakeToolTarget.send_prompt_async (which would now violate the base class's @Final) and instead wires policy + backend through TargetConfiguration. _tool_parser becomes a subclass property since the base class now defines one.

Tests:

- test_tool_event_policy.py adds U7 (capability flag wiring through the wrapper) plus dataclass field defaults and the TargetConfiguration validator.
- test_prompt_target_tool_loop.py adds U1 / U2 (DB-end) / U8 / U9 / U11 exercised against a _ProductionShapedTarget that uses the real base-class _get_normalized_conversation_async (memory round-trip via patch_central_database). Plus default-_tool_parser / -_tool_schemas assertions.

Validation: 8104 unit tests pass; pre-commit clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
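
The tool_backend validation rule described in this commit message can be pictured with a small sketch. This is an illustration only, not PyRIT's code: SketchCapabilities and SketchConfiguration are hypothetical stand-ins for TargetCapabilities and TargetConfiguration, the error message is invented, and only the rule stated above is modeled (a non-None tool_backend requires supports_tool_use=True, checked in both the constructor and the setter).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchCapabilities:
    """Hypothetical stand-in for TargetCapabilities; only the new flag is modeled."""

    supports_tool_use: bool = False


class SketchConfiguration:
    """Hypothetical stand-in for TargetConfiguration (not the real class)."""

    def __init__(
        self,
        *,
        capabilities: SketchCapabilities,
        tool_event_policy: Optional[Any] = None,
        tool_backend: Optional[Any] = None,
    ) -> None:
        self._capabilities = capabilities
        self.tool_event_policy = tool_event_policy
        self._tool_backend: Optional[Any] = None
        # Assign through the property so the constructor and the setter
        # share a single validation path.
        self.tool_backend = tool_backend

    @property
    def tool_backend(self) -> Optional[Any]:
        return self._tool_backend

    @tool_backend.setter
    def tool_backend(self, value: Optional[Any]) -> None:
        # Fail fast: a backend is only meaningful on a target that
        # declares tool-use support.
        if value is not None and not self._capabilities.supports_tool_use:
            raise ValueError("tool_backend requires supports_tool_use=True")
        self._tool_backend = value
```

Routing the constructor through the setter is one way to keep the two checks from drifting apart; whether the real TargetConfiguration does it this way is not stated in the commit message.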
Description
Today PyRIT's tool-calling story is fragmented and incomplete:

- `OpenAIChatTarget` parses `tool_calls` into `function_call` pieces and stops: no execution, no loop.
- `OpenAIResponseTarget` hand-rolls a complete agentic loop inside `_send_prompt_to_target_async` (`pyrit/prompt_target/openai/openai_response_target.py:590-626`), accepting a `custom_functions` registry of Python callables. It handles one tool call per turn.

This PR introduces a single, target-agnostic tool-use primitive:

- `pyrit/tools/` package with a `tool_loop` decorator, a `ToolCallParser` Protocol (per-target detection), and a `ToolBackend` ABC with two concrete backends in v1: `LocalToolBackend` (Python callables, preserves today's `custom_functions` behavior) and `MCPToolBackend` (stdio MCP servers via the official `mcp` SDK, composing one `MCPClient` per `MCPServerSpec` through a shared `AsyncExitStack`).
- `TargetCapabilities.tool_use` flag and `ToolEventPolicy` (`EXECUTE` / `RAISE` / `RETURN_RAW`) on `TargetConfiguration`.
- `OpenAIResponseTarget` migrated onto the decorator (behavior-preserving, except that multiple tool calls in one turn are now all dispatched, sequentially, which is the protocol-intended behavior).
- `OpenAIChatTarget` gains end-to-end tool calling.
- `custom_functions` kwarg on `OpenAIResponseTarget` deprecated (warns, `removed_in="0.16.0"`); internally rewrapped as a `LocalToolBackend`.

Future MCP transports (HTTP/SSE, Docker sandbox), additional sandbox providers, and streaming all plug in behind the existing `ToolBackend` / `MCPServerSpec` interfaces with no abstraction changes. The `MCPServerSpec` union ships with three variants in v1: `LocalMCPServerSpec` (the only one with a working transport) plus stub declarations of `RemoteMCPServerSpec` and `DockerMCPServerSpec` that raise `NotImplementedError` in `connect_async`. The follow-up sandbox PR's diff becomes "implement an already-declared variant" rather than "expand the union + add an implementation."

Closes nothing existing; tracks future work in TODOs marked `# TODO(streaming-v2)`, `# TODO(mcp-http-transport)`, `# TODO(mcp-resources)`, `# TODO(sandbox-provider)`.

Compatibility
This PR is not breaking for the standard tool-calling path. A short list of source- and behavior-compat caveats reviewers should know about:

- `PromptTarget.send_prompt_async` is `@final` (C5). External subclasses that override the public entrypoint (not just `_send_prompt_to_target_async`) will fail to import. No in-tree target overrides it today.
- `OpenAIChatTarget` `function_call` envelope rename (C6). The Chat target's `function_call` piece switches to the canonical envelope (`call_id` / `name` / `arguments` at the top level) shared with `OpenAIResponseTarget`. Callers that introspected the previous nested `{"function": {...}}` JSON shape will need to update. Today's Chat target only parses tool calls and does not dispatch them, so callers that forwarded pieces verbatim downstream are unaffected.
- `OpenAIResponseTarget(custom_functions=...)` (C7). The kwarg now emits a `DeprecationWarning` (`removed_in="0.16.0"`) and is internally rewrapped as a `LocalToolBackend`. No runtime behavior change in 0.15.x.
- `OpenAIResponseTarget._perform_async_with_tools` only dispatched the last call per turn; the new loop dispatches every call in the turn. This is strictly more dispatching, not less, so it cannot regress any working code; it matches the OpenAI protocol's actual intent.
- `_find_last_pending_tool_call`, `_execute_call_section`, and `_make_tool_piece` on `OpenAIResponseTarget` are removed; their logic moves into `pyrit/tools/openai_response_helpers.py`. Listed for changelog completeness; these were always private.

Tests and Documentation

- `tests/unit/tools/` directory covering the decorator, parsing, `LocalToolBackend`, `MCPClient` (real stdio subprocess against a deterministic `FastMCP` fixture), and `MCPToolBackend` (multi-server routing, name-collision detection, `name_prefix` disambiguation, `allowed_tools` filtering, and concurrent-dispatch serialization).
- `tests/unit/prompt_target/common/test_prompt_target_tool_loop.py` asserting decorator-order-of-execution against `_FakeToolTarget` and using `patch_central_database` to verify per-message insert ordering, per-role labeling (`assistant`, `tool`), and per-data-type labeling (`function_call`, `function_call_output`) in the actual DB schema.
- `tests/unit/prompt_target/target/test_openai_response_target_function_chaining.py` and new `tests/unit/prompt_target/target/test_openai_chat_target_tool_loop.py` covering both targets' parsers, `_tool_schemas()` outbound translation, deprecation warning on `custom_functions`, and multi-call-per-turn sequential dispatch.
- `tests/integration/tools/test_red_teaming_with_tools.py` running the real `RedTeamingAttack` against both targets with only the HTTP layer mocked. Tools served by the real `echo_mcp_server` subprocess.
- `tests/integration/scenarios/test_scenario_with_mcp_tools.py` (C10) running a lightweight scenario end-to-end against a real `echo_mcp_server` subprocess and verifying the Memory DB transcript for `function_call` / `function_call_output` pieces in declaration order.
- JupyText: not applicable (no notebook changes).