Releases · google-gemini/genai-processors
GenAI Processors v2.0
Today we are releasing GenAI Processors v2.0. With the number of features added, it deserves a major version increase.
We've overhauled function calling support, bringing client-side async function calling and MCP tool support. Agent code can now be even simpler thanks to the new ContentStream concept. Development and triage are now much easier thanks to the new tracing infrastructure and the fully-overhauled documentation microsite.
This release also marks our initial efforts to optimize for Antigravity, though we are only scratching the surface in this direction.
🔌 Async Function Calling & MCP
- Non-blocking tool execution: Allows the agent to execute tasks in the background without interrupting the conversation. As a client-side feature, it is highly customizable. The implementation has been significantly improved since its initial release in 1.1.1.
- Async Generators: If defined as generators, async tools can stream their responses over multiple turns.
- MCP Session Support: Integrate Model Context Protocol (MCP) sessions as tools within processors. For real-time processors, MCP calls are executed asynchronously in the background.
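The non-blocking pattern above can be sketched with plain asyncio (illustrative only; `search_tool` is a hypothetical tool, not part of the library's API): the tool is an async generator whose streamed chunks are collected in a background task while the conversation continues.

```python
import asyncio

# Hypothetical tool: an async generator that streams partial results,
# emulating a tool whose output arrives over multiple turns.
async def search_tool(query: str):
    for i in range(3):
        await asyncio.sleep(0)  # yield control, as real I/O would
        yield f"result {i} for {query!r}"

async def main():
    chunks: list[str] = []

    async def run_tool():
        # Collect the tool's streamed chunks in the background.
        async for chunk in search_tool("genai"):
            chunks.append(chunk)

    task = asyncio.create_task(run_tool())
    # The agent keeps talking while the tool runs in the background.
    transcript = ["agent: let me look that up..."]
    await task  # join the background call before using its results
    transcript.extend(chunks)
    return transcript

transcript = asyncio.run(main())
print(transcript)
```

The key design point is that the tool call never blocks the conversation loop; the agent only awaits the task at the point where it actually needs the results.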
✨ Core Processor & Stream Enhancements
- ContentStream Class: A big improvement to our data model: instead of plain `AsyncIterable[ProcessorPart]`, processors now work with `ContentStream` objects, which provide extra syntax sugar. For example, you can get the text output of a model with `await model('Hello world!').text()`. For multimodal output, there is `.gather()`, and for constrained decoding, we have `.get_dataclass(MyDataclass)`.
- Much easier to call: `Processor.call` used to require its input to be an `AsyncIterator`. Now any `ProcessorContentTypes` will do: processors can be invoked with `str`, `PIL.Image`, `ProcessorContent`, or lists of these.
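The idea behind the `.text()` helper can be shown with a minimal sketch (this is not the library's actual `ContentStream` implementation; `TextStream` and `fake_model` are hypothetical stand-ins): wrap an async iterable of parts and expose an awaitable that gathers it into one string.

```python
import asyncio
from typing import AsyncIterable

# Minimal sketch of the ContentStream idea: wrap an AsyncIterable of
# text parts and expose an awaitable text() helper that gathers the
# whole stream into a single string.
class TextStream:
    def __init__(self, parts: AsyncIterable[str]):
        self._parts = parts

    async def text(self) -> str:
        return "".join([p async for p in self._parts])

async def fake_model(prompt: str) -> TextStream:
    # Emulates a streaming model response, token by token.
    async def parts():
        for token in ("Hello", ", ", "world!"):
            yield token
    return TextStream(parts())

async def main():
    stream = await fake_model("Hello world!")
    return await stream.text()

result = asyncio.run(main())
print(result)
```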
🔍 Tracing
- One-line Tracing: Enable pipeline tracing with a single line of code.
- Visual Debugging: Generates an HTML summary of the pipeline execution, including cancellations and exceptions.
- Multimodal support: Replay audio or view images directly within the trace, specifically designed for debugging real-time agents.
📚 Documentation & Examples
- New Docs Site: Added comprehensive documentation on GitHub Pages covering design principles and core concepts.
- New Examples:
- Widgets: An agent utilizing async function calling to enrich its output with custom UI widgets.
- Critic-reviser: Improves the model response by iteratively refining it.
- Documentation for AI coding agents: Added specific guardrails, instructions, and extensive docstring style improvements to help AI coding agents correctly leverage the `genai_processors` library.
🛠️ New processors
- VideoExtract: Transforms a video into a sequence of audio and image Parts. Useful for emulating streaming during tests or for applying a rolling `Window` to a long input.
- Hugging Face Transformers support: Run agents on top of local transformers models.
- `GlobSource` no longer blocks the asyncio event loop.
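The usual technique for keeping a blocking filesystem scan off the event loop (shown here in general form; this is not `GlobSource`'s exact code) is to push the blocking `glob()` call onto a worker thread with `asyncio.to_thread`:

```python
import asyncio
import glob
import pathlib
import tempfile

# Run the blocking glob() call in a worker thread so the asyncio
# event loop stays responsive while the filesystem is scanned.
async def async_glob(pattern: str) -> list[str]:
    return await asyncio.to_thread(glob.glob, pattern)

async def main():
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.txt", "c.log"):
            pathlib.Path(tmp, name).write_text("x")
        return sorted(await async_glob(f"{tmp}/*.txt"))

matches = asyncio.run(main())
print(matches)
```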
🌐 Websocket Server
- Live Server Module: Moved the generic logic that turns any processor into a websocket server from the Live Commentator example into a `live_server.py` module.
- AI Studio Integration: Simplifies building real-time agent demos with custom UIs when combined with the AI Studio applet.
GenAI Processors 1.1.1
- Introduces function calling (sync/async, fully automated) that works with all GenAI processors, including realtime processors. See the associated notebook.
- Adds a URL fetcher in `web.py` and HTML content extraction in `text.py` to get content from the web.
- Adds Hugging Face transformers to the supported models.
- Supports authoring ADK agents with processors.
GenAI Processors 1.1.0
- core.window.Window: apply a processor to a rolling window across the stream.
- Models output dataclass Parts when constrained decoding is used. This greatly simplifies writing pipelines that extract data on intermediate steps. Now the whole response will be in a single Part and accessible as `part.get_dataclass(MyData)`.
- Added a text-based turn-by-turn chat example with multimodal support through downloading images or PDFs by their URLs.
- Add GenAILangChainProcessor to contrib.
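The idea behind accessing a constrained-decoding response as a dataclass can be sketched as follows (illustrative only; `get_dataclass` here is a hypothetical stand-in, not the library's implementation): the model's JSON output, produced under a schema, maps directly onto dataclass fields.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class MyData:
    name: str
    score: int

# Sketch: parse a constrained-decoding JSON response into a dataclass
# by matching the payload's keys against the dataclass fields.
def get_dataclass(raw_json: str, cls):
    payload = json.loads(raw_json)
    kwargs = {f.name: payload[f.name] for f in fields(cls)}
    return cls(**kwargs)

# Hypothetical model output produced under a JSON schema.
part_text = '{"name": "alpha", "score": 42}'
data = get_dataclass(part_text, MyData)
print(data)
```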
GenAI Processors 1.0.5
- CachedPartProcessor: adds caching support for PartProcessors: if the same part is received again, the result is taken from the cache.
- GlobProcessor: streams files from the local filesystem into processors.
- Contrib processors: OpenRouterModel adds compatibility with a variety of LLMs available through OpenRouter.
- Numerous bugfixes.
- Added links to processors from @mbeacom: mbeacom/genai-processors-pydantic and mbeacom/genai-processors-url-fetch
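The caching behavior described above can be sketched in a few lines (illustrative only, not the library's code): memoize a per-part processor so a part seen before is answered from the cache instead of being recomputed.

```python
# Sketch of the CachedPartProcessor idea: wrap a per-part processing
# function with a cache keyed by the part itself.
class CachedPartProcessor:
    def __init__(self, process_fn):
        self._process_fn = process_fn
        self._cache: dict = {}
        self.calls = 0  # counts actual (non-cached) invocations

    def __call__(self, part):
        if part not in self._cache:
            self.calls += 1
            self._cache[part] = self._process_fn(part)
        return self._cache[part]

upper = CachedPartProcessor(str.upper)
# The second call with the same part is served from the cache.
print(upper("hello"), upper("hello"), upper.calls)
```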
v1.0.3
GenAI Processors 1.0.2
Additions:
- processor sources as an easy way to add data sources (microphones, camera) to a processor chain.
- jinja template rendering of processor parts based on dataclasses.
- extra documentation.
GenAI Processors 1.0.1
Adds the Switch operation for processors, similar to a standard switch statement.
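The Switch idea can be sketched as routing each part to a handler chosen by a key function, much like a switch statement over the part's mimetype (illustrative only; `switch` and the dict-shaped parts are hypothetical, not the library's API):

```python
# Sketch: build a router that dispatches each part to the handler
# matching key_fn(part), falling back to a default pass-through.
def switch(key_fn, cases: dict, default=lambda p: p):
    def route(part):
        return cases.get(key_fn(part), default)(part)
    return route

route = switch(
    key_fn=lambda part: part["mimetype"],
    cases={
        "text/plain": lambda p: p["data"].upper(),
        "text/html": lambda p: f"<stripped:{len(p['data'])}>",
    },
)

print(route({"mimetype": "text/plain", "data": "hi"}))
print(route({"mimetype": "text/html", "data": "<b>hi</b>"}))
```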
Core processors additions:
- text_to_speech and speech_to_text processor using Google Cloud APIs
- realtime processor to create a live (i.e. realtime) processor from a turn-based LLM (audio in/out only)
- pdf processor to work with PDF files and extract tokens for LLMs
- drive processor to get documents from Google Drive (sheets, slides and docs)
- github processor to get code from GitHub
- jinja processor to create prompts from classes
Examples:
- several CLI tools to test processors from the command line (including live and realtime processors)
GenAI Processors 1.0.0
Initial Release of the GenAI Processors library.