🧠 Conversational Insights

AI-powered talking point generation for client advisors — built on Multi-Agent Systems (MAS), Agentic RAG, and intelligent information triaging.

Overview

Conversational Insights is a LangGraph / LangChain-powered backend that helps private banking client advisors prepare for client meetings by automatically surfacing relevant talking points from past interactions.

The system ingests emails and call transcripts, stores them in a vector database, and uses an agentic retrieval pipeline to answer advisor queries grounded in historical context.

Key Features

Feature	Description
📥 ETL Pipeline	Ingests email PDFs, extracts structured metadata via LLM, and stores chunked vectors in ChromaDB
🤖 Agentic RAG	A LangGraph `ReAct` agent that retrieves relevant context before answering — never hallucinates without evidence
🏛️ Multi-Agent System (MAS)	Modular agent architecture designed for extension with specialised sub-agents (e.g. triage, summarisation, compliance)
🗂️ Information Triaging	Retrieval is semantically ranked so the most relevant past interactions are surfaced first
🔒 Prompt Injection Safe	Retrieved content is treated as data only; embedded instructions are ignored

Architecture

┌─────────────────────────────────────────────┐
│                  main.py                     │
│           (advisor query entrypoint)         │
└────────────────────┬────────────────────────┘
                     │
          ┌──────────▼──────────┐
          │   Retrieval Agent   │  ← LangGraph ReAct Agent
          │  (retrieval_agent)  │
          └──────────┬──────────┘
                     │  tool call
          ┌──────────▼──────────┐
          │  retrieve_context   │  ← similarity search (k=2)
          └──────────┬──────────┘
                     │
          ┌──────────▼──────────┐
          │      ChromaDB       │  ← populated by ETL pipeline
          │   (email vectors)   │
          └─────────────────────┘

ETL Pipeline (run separately):
  data/emails/*.pdf  →  PyPDFLoader  →  LLM metadata extraction
  →  RecursiveCharacterTextSplitter  →  OpenAI Embeddings  →  ChromaDB

Project Structure

conversational-insights/
├── main.py                        # Query entrypoint
├── script/
│   └── run_etl_pipeline.py        # Run email ingestion
├── src/
│   ├── config.py                  # Central config (models, paths, keys)
│   ├── agents/
│   │   └── retrieval_agent.py     # LangGraph ReAct retrieval agent
│   ├── etl/
│   │   └── etl_pipeline.py        # Email PDF → ChromaDB pipeline
│   ├── models/
│   │   └── models.py              # Pydantic schemas (EmailMetadata)
│   ├── prompts/
│   │   └── retrieve_agent_prompt.py
│   ├── tools/
│   │   └── retrieve_content.py    # LangChain retrieval tool
│   └── utils/
│       └── vector_store.py        # Shared ChromaDB instance
└── data/
    ├── emails/                    # Raw email PDFs (input)
    ├── transcripts/               # Call transcripts
    └── vector/                    # ChromaDB persist directory

Quickstart

1. Prerequisites

Python ≥ 3.14
uv package manager
OpenAI API key

2. Install dependencies

uv sync

3. Configure environment

cp .env.example .env
# Fill in your OPENAI_API_KEY

4. Run the ETL pipeline

Place your email PDFs in data/emails/, then:

uv run python3 ./script/run_etl_pipeline.py

This will extract metadata, chunk each email, embed it, and persist to data/vector/.

5. Query the agent

uv run python3 main.py

Edit the query variable in main.py to ask any question about client interactions.

Configuration

All tunable constants live in src/config.py:

Variable	Default	Description
`CHAT_MODEL`	`gpt-4o-mini`	LLM used for metadata extraction and the agent
`EMBEDDING_MODEL`	`text-embedding-3-small`	OpenAI embedding model
`CHROMA_COLLECTION_NAME`	`email_collection`	ChromaDB collection
`CHROMA_PERSIST_DIRECTORY`	`data/vector/`	Where vectors are persisted
`CHUNK_SIZE`	`512`	Token chunk size for text splitting
`CHUNK_OVERLAP`	`100`	Overlap between chunks

🚧 Work In Progress

Transcript & Message Analysis

Call transcripts are already present in data/transcripts/ but are not yet ingested or searchable. Planned work:

Transcript ETL — a parallel ingestion pipeline for .txt call transcripts (alongside the existing email PDF pipeline), with its own metadata schema (caller, client, call type, date)
Unified retrieval — a single vector collection (or separate namespaced collections) covering both emails and transcripts, so the agent can reason across all interaction types
Interaction summarisation — an LLM-powered summarisation step that distills long transcripts into key points before embedding, improving retrieval quality

Multi-Agent System (MAS) for Query Handling

The current system is a single retrieval agent. The planned MAS will introduce a supervisor + specialist architecture:

User Query
    │
    ▼
┌─────────────────────┐
│   Supervisor Agent  │  ← routes query to the right specialist(s)
└──────┬──────┬───────┘
       │      │
  ┌────▼──┐ ┌─▼──────────────┐
  │Triage │ │  Retrieval      │
  │Agent  │ │  Agent(s)       │  ← email / transcript specialists
  └───────┘ └─────────────────┘
       │
  ┌────▼────────────────┐
  │ Synthesis Agent     │  ← aggregates results → talking points
  └─────────────────────┘

Triage Agent — classifies the query type (investment discussion, trade instruction, payment request, general enquiry) and decides which specialists to invoke
Specialist Retrieval Agents — domain-scoped agents for emails and transcripts respectively
Synthesis Agent — aggregates retrieved context and generates structured talking points for the advisor

Tech Stack

LangChain — tool definitions, prompt construction, LLM wrappers
LangGraph — stateful agent graph (ReAct loop)
ChromaDB — local vector store
OpenAI — gpt-4o-mini + text-embedding-3-small
Pydantic — structured LLM output schemas
uv — dependency and environment management

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
script		script
src		src
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 Conversational Insights

Overview

Key Features

Architecture

Project Structure

Quickstart

1. Prerequisites

2. Install dependencies

3. Configure environment

4. Run the ETL pipeline

5. Query the agent

Configuration

🚧 Work In Progress

Transcript & Message Analysis

Multi-Agent System (MAS) for Query Handling

Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠 Conversational Insights

Overview

Key Features

Architecture

Project Structure

Quickstart

1. Prerequisites

2. Install dependencies

3. Configure environment

4. Run the ETL pipeline

5. Query the agent

Configuration

🚧 Work In Progress

Transcript & Message Analysis

Multi-Agent System (MAS) for Query Handling

Tech Stack

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages