Code-With-Samuel/RAG

RAG Patent Analysis System

A comprehensive Retrieval-Augmented Generation (RAG) system for intelligent patent analysis using multi-agent AI, semantic search, and privacy-preserving local LLMs. This system combines CrewAI agents with OpenSearch and Ollama to provide powerful patent trend analysis, forecasting, and discovery capabilities.

🎯 Project Overview

This project implements an intelligent patent analysis and search system that combines cutting-edge technologies:

  • Semantic Search: Advanced vector-based patent retrieval using embeddings for meaning-based search
  • CrewAI Multi-Agent System: Specialized AI agents working together for comprehensive patent trend analysis and forecasting
  • Hybrid Search: Combines keyword and semantic search for optimal retrieval accuracy
  • OpenSearch Backend: Distributed vector and full-text search engine with dashboard visualization
  • Local LLM Privacy: Uses Ollama for on-premise, privacy-preserving language model inference
  • Lithium Battery Focus: Specialized analysis for lithium battery technology patents

Key Features

  • πŸ” Multiple Search Strategies: Semantic, keyword, hybrid, and iterative search methods
  • πŸ€– AI-Powered Analysis: CrewAI agents for automated patent analysis and trend forecasting
  • πŸ“Š Citation Analysis: Extract and analyze patent citations for technology evolution tracking
  • 🎨 Interactive Dashboard: OpenSearch Dashboards for visualization and monitoring
  • πŸ” Privacy-First: All processing done locally without external API calls for data
  • πŸ“ˆ Scalable Architecture: Designed to handle large patent datasets efficiently

πŸ“ Project Structure

RAG/
├── Core Application Files
│   ├── agentic_rag.py            # Main CLI interface with interactive menu system
│   ├── patent_crew.py            # CrewAI agent definitions and orchestration
│   └── patent_search_tools.py    # Search implementations (semantic, keyword, hybrid, iterative)
│
├── Data Processing & Integration
│   ├── ingestion.py              # Patent data loading, chunking, and indexing pipeline
│   ├── embedding.py              # Vector embedding generation using Ollama
│   ├── opensearch_client.py      # OpenSearch client, connection, and index management
│   ├── information_collector.py  # Web data collection and processing utilities
│   └── helper.py                 # General utility and helper functions
│
├── Configuration & Deployment
│   ├── docker-compose.yml        # Docker Compose configuration for OpenSearch services
│   ├── .env                      # Environment variables (API keys, model configs)
│   └── requirements.txt          # Python dependencies
│
├── Development & Analysis
│   ├── dev.ipynb                 # Jupyter notebook for experimentation and development
│   └── patent_analysis_*.txt     # Generated analysis reports with timestamps
│
├── Data Storage
│   ├── files/
│   │   ├── patents.json          # Primary patent dataset with metadata
│   │   └── patent_details.json   # Extended patent information (claims, descriptions)
│   │
│   └── results/                  # Analysis outputs and extracted data
│       ├── patent_data_*.json    # Indexed patent data by batch
│       ├── citation_*.json       # Extracted citation networks
│       └── [Analysis reports]
│
└── Documentation
    ├── README.md                 # This file
    └── [Additional documentation]

📊 Data Files

  • patents.json: Main patent dataset containing core metadata, abstracts, and search parameters
  • patent_details.json: Extended patent information including detailed descriptions, claims, and technical classifications
  • results/: Output directory containing:
    • patent_data_*.json: Chunked and processed patent records ready for indexing
    • citation_*.json: Extracted patent citation relationships and networks
    • Analysis reports with timestamps for trend analysis and forecasting results

🚀 Setup & Installation

Prerequisites

  • Python 3.10+: Core language for the application
  • Docker & Docker Compose: For running OpenSearch and dashboards
  • Ollama: Local LLM inference engine with models pre-installed
  • Git: For repository management
  • pip/pip-tools: Python package management

Installation Steps

Step 1: Clone and Setup Environment

# Clone the repository
git clone <repository-url>
cd RAG

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

Step 2: Install Python Dependencies

# Install required packages
pip install -r requirements.txt

# Optional: Install Jupyter for development notebook
pip install jupyter jupyterlab

Step 3: Setup Environment Variables

Create a .env file in the project root:

SERP_API_KEY=your_serp_api_key_here
OLLAMA_HOST=http://localhost:11434
OPENSEARCH_HOST=localhost
OPENSEARCH_PORT=9200
OPENSEARCH_USER=admin
OPENSEARCH_PASSWORD=admin

Step 4: Start Infrastructure Services

# Start OpenSearch and OpenSearch Dashboards using Docker Compose
docker-compose up -d

# Verify services are running
docker-compose ps

Service Endpoints:

  • OpenSearch API: http://localhost:9200
  • OpenSearch Dashboards: http://localhost:5601

Step 5: Setup Ollama Models

In a new terminal, start Ollama and pull required models:

# Start Ollama service
ollama serve

# In another terminal, pull models
ollama pull llama2
ollama pull nomic-embed-text

# Verify models are available
curl http://localhost:11434/api/tags

Step 6: Ingest Patent Data

# Load patents into OpenSearch and generate embeddings
python ingestion.py

What this does:

  • Loads patent data from files/patents.json and files/patent_details.json
  • Generates vector embeddings using Ollama's embedding model
  • Creates/updates OpenSearch indexes with proper vector mappings
  • Stores metadata alongside embeddings for retrieval
  • Generates processed data files in results/ directory

Expected output:

Connected to OpenSearch!
Loading patent data...
Processing 5000 patents...
Creating index 'patents' with vector mappings...
Indexing complete: 5000 documents indexed

💻 Usage Guide

Interactive Patent Analysis CLI

The main entry point is agentic_rag.py, which provides an interactive menu-driven interface:

python agentic_rag.py

Main Menu Options:

1. Run complete patent trend analysis and forecasting
   - Analyzes patent trends in the specified research area
   - Generates insights using CrewAI agents
   - Produces forecasting reports
   - Outputs results to patent_analysis_[timestamp].txt

2. Search for specific patents
   - Query by keywords or concepts
   - Returns top matching patents
   - Shows title, abstract, publication date, and patent ID
   - Displays relevance scores

3. Iterative patent exploration
   - Step-by-step patent discovery
   - Refine searches based on results
   - Explore related patents and citations
   - Build comprehensive understanding of topic

4. View system status
   - Check OpenSearch connectivity
   - Verify Ollama availability and models
   - Display system information
   - Diagnose connectivity issues

5. Exit
   - Gracefully shutdown application

Search Operations

The patent_search_tools.py module provides four search strategies:

1. Semantic Search

from patent_search_tools import semantic_search

results = semantic_search(
    query_text="solid-state battery technology advancements",
    top_k=10
)
# Returns: Semantically similar patents based on meaning
  • Uses vector embeddings for meaning-based retrieval
  • Best for: Conceptual searches, technology trend analysis
  • Performance: Slower but more accurate for complex queries

2. Keyword Search

from patent_search_tools import keyword_search

results = keyword_search(
    query_text="lithium ion battery cathode",
    top_k=10
)
# Returns: Patents matching keywords in title/abstract
  • Traditional inverted-index based search
  • Best for: Specific terms, known technologies
  • Performance: Fast, real-time results

3. Hybrid Search

from patent_search_tools import hybrid_search

results = hybrid_search(
    query_text="battery thermal management",
    top_k=10
)
# Returns: Combination of semantic and keyword results
  • Merges results from both search methods
  • Best for: Balanced retrieval, general exploration
  • Performance: Combines benefits of both methods

4. Iterative Search

from patent_search_tools import iterative_search

results = iterative_search(
    query_text="electrode material innovation",
    top_k=10,
    iterations=3
)
# Returns: Refined results through multiple iterations
  • Progressively refines search results
  • Best for: Deep topic exploration, citation networks
  • Performance: Slowest but most comprehensive
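The refinement loop above can be sketched as follows. This is an illustrative sketch only, not the project's actual `iterative_search` implementation: `refine_query`, `run_iterative_search`, the injected `search_fn`, and the naive title-word frequency heuristic are all assumptions for demonstration.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "for", "to", "in", "with"}

def refine_query(query, hits, extra_terms=2):
    """Append the most frequent unseen title words to the query.

    `hits` is a list of dicts with a 'title' field (assumed shape).
    """
    seen = set(query.lower().split())
    counts = Counter(
        w
        for hit in hits
        for w in hit.get("title", "").lower().split()
        if w not in STOPWORDS and w not in seen
    )
    extra = [w for w, _ in counts.most_common(extra_terms)]
    return " ".join([query] + extra)

def run_iterative_search(query, search_fn, top_k=10, iterations=3):
    """Call search_fn repeatedly, widening the query each round."""
    hits = []
    for _ in range(iterations):
        hits = search_fn(query, top_k)
        query = refine_query(query, hits)
    return hits
```

Any of the other search functions (e.g. `semantic_search`) could be passed in as `search_fn`, which keeps the refinement logic independent of the retrieval backend.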

Development Notebook

For experimentation and interactive analysis:

jupyter notebook dev.ipynb

The notebook provides:

  • Pre-configured search examples
  • Data exploration utilities
  • Custom analysis scripts
  • Visualization helpers

Advanced Usage: CrewAI Agent System

Access the underlying agent system directly:

from patent_crew import run_patent_analysis

result = run_patent_analysis(
    research_area="Solid-State Batteries",
    model_name="llama2"
)
print(result)

CrewAI Agents Involved:

  • Research Analyst: Gathers patent information
  • Trend Analyst: Identifies emerging technologies
  • Forecaster: Predicts future patent directions
  • Report Writer: Synthesizes findings into reports

πŸ—οΈ System Architecture

Component Overview

┌─────────────────────────────────────────────────────────────┐
│                    User Interface Layer                     │
│                    (agentic_rag.py CLI)                     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                  Application Logic Layer                    │
├─────────────────────────────────────────────────────────────┤
│  • Patent Crew (Agents)    • Search Tools     • Helpers     │
│  • Data Processing         • Information Collection         │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                Integration & Embedding Layer                │
├─────────────────────────────────────────────────────────────┤
│  • OpenSearch Client       • Embedding Generation           │
│  • Index Management        • Vector Operations              │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                  External Services Layer                    │
├─────────────────────────────────────────────────────────────┤
│  • OpenSearch (Port 9200)         • Ollama (Port 11434)     │
│  • OpenSearch Dashboards (5601)   • Docker Infrastructure   │
└─────────────────────────────────────────────────────────────┘

Core Components

1. OpenSearch Backend

  • Purpose: Distributed search and analytics engine
  • Features:
    • Vector similarity search (KNN-based)
    • Full-text keyword search
    • JSON document storage with metadata
    • Index management and mapping
  • Address: http://localhost:9200
  • Dashboard: http://localhost:5601 (OpenSearch Dashboards)

2. Ollama Local LLM

  • Purpose: Privacy-preserving language model inference
  • Features:
    • Local model execution (no external API calls)
    • Embedding generation
    • Text generation and analysis
  • Address: http://localhost:11434
  • Models: llama2 (reasoning), nomic-embed-text (embeddings)

3. CrewAI Multi-Agent System

  • Purpose: Orchestrate specialized AI agents for complex tasks
  • Agent Types:
    • Research Analyst: Queries and analyzes patent data
    • Trend Analyst: Identifies technology trends
    • Forecaster: Makes predictions about future developments
    • Report Writer: Synthesizes findings into comprehensive reports

4. Search Pipeline

User Query
    ↓
[Query Preprocessing & Normalization]
    ↓
[Search Strategy Selection]
    ├→ Semantic Path (Vector Embedding)
    ├→ Keyword Path (Inverted Index)
    └→ Hybrid Path (Combined Ranking)
    ↓
[OpenSearch Query Execution]
    ↓
[Result Ranking & Deduplication]
    ↓
[Metadata Enrichment]
    ↓
[Results Returned to User/Agent]
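The "Result Ranking & Deduplication" step can be sketched as below. The helper name `merge_hybrid`, the `alpha` blend weight, and the hit shape (`patent_id`, `_score`) are illustrative assumptions, not the project's actual code: scores from the two lists are min-max normalised so the semantic and keyword scales become comparable, then blended and deduplicated by patent ID.

```python
def merge_hybrid(semantic_hits, keyword_hits, alpha=0.5):
    """Merge two ranked hit lists into one deduplicated ranking.

    Each hit is a dict with 'patent_id' and '_score'. Scores are
    min-max normalised per list, then blended with weight `alpha`
    (alpha=1.0 is purely semantic, 0.0 purely keyword).
    """
    def normalise(hits):
        if not hits:
            return {}
        scores = [h["_score"] for h in hits]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return {h["patent_id"]: (h["_score"] - lo) / span for h in hits}

    sem = normalise(semantic_hits)
    kw = normalise(keyword_hits)
    merged = {}
    for pid in set(sem) | set(kw):
        merged[pid] = alpha * sem.get(pid, 0.0) + (1 - alpha) * kw.get(pid, 0.0)
    # Highest blended score first; duplicates collapse to one entry per ID
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```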

Data Flow

Ingestion Pipeline

JSON Patent Files
    ↓
[Load & Parse]
    ↓
[Data Validation & Cleaning]
    ↓
[Text Chunking & Preprocessing]
    ↓
[Embedding Generation (Ollama)]
    ↓
[Index Creation/Update (OpenSearch)]
    ↓
[Results Stored in results/ directory]
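The "Text Chunking & Preprocessing" stage might look like the sliding-window chunker below. The function name and the word-based `chunk_size`/`overlap` parameters are hypothetical; the project's `ingestion.py` may chunk differently (e.g. by tokens).

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split a long patent description into overlapping word chunks.

    chunk_size and overlap are in words; overlapping windows preserve
    context that would otherwise be lost at chunk boundaries.
    """
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```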

Analysis Pipeline

User Request
    ↓
[Patent Search (Multiple Strategies)]
    ↓
[Results Aggregation]
    ↓
[CrewAI Agent Processing]
    ↓
[LLM-based Analysis]
    ↓
[Report Generation]
    ↓
[File Output + Console Display]

βš™οΈ Configuration

Environment Variables (.env file)

# SERP API Configuration (for web search capabilities)
SERP_API_KEY=your_api_key_here

# Ollama Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2
EMBEDDING_MODEL=nomic-embed-text

# OpenSearch Configuration
OPENSEARCH_HOST=localhost
OPENSEARCH_PORT=9200
OPENSEARCH_USER=admin
OPENSEARCH_PASSWORD=admin

# Application Settings
PATENT_RESEARCH_AREA=Lithium Battery
DATA_DIR=./files
RESULTS_DIR=./results

Docker Services Configuration

OpenSearch Settings (docker-compose.yml):

  • Single-node cluster (development mode)
  • Security disabled for local development
  • Persistent data volume: opensearch-data
  • Memory: 2GB (adjustable)

OpenSearch Dashboards:

  • Auto-connects to OpenSearch
  • Security plugin disabled
  • Port: 5601

Customization:

# Adjust memory limits (OpenSearch images read OPENSEARCH_JAVA_OPTS)
environment:
  - "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx2g"

# Enable security (production)
- DISABLE_SECURITY_PLUGIN=false

📊 Data Management

Patent Data Format

patents.json structure:

{
  "patent_id": "US123456789",
  "title": "Advanced Lithium Battery System",
  "abstract": "A novel approach to lithium battery technology...",
  "publication_date": "2023-06-15",
  "assignee": "Company Name",
  "claims": ["Claim 1", "Claim 2"],
  "references": ["Citation 1", "Citation 2"]
}

Processing Pipeline

  1. Loading: JSON files parsed and validated
  2. Normalization: Standardized field formats
  3. Chunking: Large documents split into processable chunks
  4. Embedding: Vector representations generated via Ollama
  5. Indexing: Documents stored in OpenSearch with embeddings
  6. Export: Processed data saved to results/ directory
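The embedding step (4) can be sketched as a call to the local Ollama server's `/api/embeddings` endpoint. This is an assumption about how `embedding.py` might work, not its actual code; the endpoint and response shape are Ollama's documented REST API.

```python
import requests

OLLAMA_HOST = "http://localhost:11434"

def get_embedding(text, model="nomic-embed-text"):
    """Request a vector embedding from the local Ollama server.

    Requires a running Ollama instance with the model pulled
    (`ollama pull nomic-embed-text`).
    """
    resp = requests.post(
        f"{OLLAMA_HOST}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    # Ollama returns {"embedding": [float, ...]}
    return resp.json()["embedding"]
```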

Managing Indexes

from opensearch_client import get_opensearch_client, create_index_if_not_exists

# Connect to OpenSearch
client = get_opensearch_client("localhost", 9200)

# Create or recreate index with proper vector mappings
create_index_if_not_exists(client, "patents")

# Check index status
status = client.cat.indices(format="json")

# Delete index
client.indices.delete(index="patents")
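For reference, a vector-enabled index body might look like the sketch below. The field names and HNSW parameters are illustrative assumptions (the actual mapping lives in `opensearch_client.py`); the 768 dimension matches nomic-embed-text's output size.

```python
# Sketch of an OpenSearch index body with a knn_vector field.
PATENT_INDEX_BODY = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "patent_id": {"type": "keyword"},
            "title": {"type": "text"},
            "abstract": {"type": "text"},
            "publication_date": {"type": "date"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,  # nomic-embed-text output size
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                },
            },
        }
    },
}
```

A body like this would be passed to `client.indices.create(index="patents", body=PATENT_INDEX_BODY)`.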

🔧 Troubleshooting

Issue: Connection Refused - OpenSearch

Symptoms: Connection refused at 9200

Solutions:

# Check if Docker containers are running
docker-compose ps

# Start services
docker-compose up -d

# View logs
docker-compose logs opensearch

# Verify connectivity
curl http://localhost:9200/_cluster/health

Issue: Ollama Not Available

Symptoms: Error connecting to Ollama: Connection refused

Solutions:

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama service
ollama serve

# Check available models
ollama list

# Pull required models if missing
ollama pull llama2
ollama pull nomic-embed-text

Issue: Index Not Found

Symptoms: index_not_found_exception

Solutions:

# Re-run data ingestion
python ingestion.py

# Verify index exists
curl http://localhost:9200/_cat/indices

# Check index mapping
curl http://localhost:9200/patents/_mapping

Issue: Out of Memory

Symptoms: Container crashes, Python process killed

Solutions:

# Increase Docker memory allocation
# Edit docker-compose.yml:
  mem_limit: 4g

# Or set system limits:
# Windows: Docker Desktop β†’ Settings β†’ Resources
# Linux: docker update -m 4g opensearch

# Restart services
docker-compose restart

Issue: Slow Search Performance

Causes & Solutions:

  • Large dataset: Optimize embeddings, increase batch size
  • Undersized cluster: Add more nodes (production)
  • Poor network: Use persistent connections, connection pooling
  • Index not refreshed: Force refresh: curl -X POST http://localhost:9200/patents/_refresh

📈 Performance Optimization

Ingestion Optimization

  • Batch processing of embeddings (default: 100 documents)
  • Parallel index updates with multiple threads
  • Incremental indexing for new data
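Batch processing can be sketched as a generator that groups documents into bulk-API actions. The helper name and action shape are illustrative; each yielded batch could then be sent with opensearch-py's `helpers.bulk`.

```python
def batched_actions(docs, index_name="patents", batch_size=100):
    """Group documents into bulk-API action batches of batch_size.

    Each action carries the document under `_source` and uses the
    patent ID as the document ID, so re-ingestion overwrites rather
    than duplicates.
    """
    batch = []
    for doc in docs:
        batch.append({"_index": index_name, "_id": doc["patent_id"], "_source": doc})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```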

Search Optimization

  • Query result caching
  • Vector similarity optimization using HNSW
  • Index refresh scheduling

Configuration Tuning

# Adjust batch sizes in ingestion.py
BATCH_SIZE = 200

# Modify search parameters in patent_search_tools.py
TOP_K = 50  # Return more results
VECTOR_K = 20  # KNN neighbors

πŸ” Security Considerations

Development vs. Production

Development (Current Setup):

  • Security plugins disabled
  • Admin credentials: admin/admin
  • Single-node cluster
  • No SSL/TLS encryption

Production Recommendations:

  • Enable OpenSearch security plugins
  • Use strong passwords
  • Configure SSL/TLS
  • Multi-node cluster setup
  • Authentication & authorization
  • Data encryption at rest
  • Network isolation

Data Privacy

  • All processing occurs locally
  • No data sent to external APIs (except optional SERP API)
  • Embeddings generated on-premise
  • Models run locally on Ollama

📚 API Reference

Patent Search Functions

# Semantic Search
semantic_search(query_text: str, top_k: int = 20) -> List[Dict]

# Keyword Search
keyword_search(query_text: str, top_k: int = 20) -> List[Dict]

# Hybrid Search
hybrid_search(query_text: str, top_k: int = 20) -> List[Dict]

# Iterative Search
iterative_search(query_text: str, top_k: int = 20, iterations: int = 3) -> List[Dict]

OpenSearch Client Functions

# Get connected client
get_opensearch_client(host: str, port: int) -> OpenSearch

# Create index with vector mappings
create_index_if_not_exists(client: OpenSearch, index_name: str) -> None

# Index document
index_document(client: OpenSearch, index: str, doc_id: str, body: Dict) -> None

# Search
search(client: OpenSearch, index: str, query: Dict) -> Dict

Embedding Functions

# Generate embedding for text
get_embedding(text: str, model: str = "nomic-embed-text") -> List[float]

# Batch generate embeddings
batch_embeddings(texts: List[str]) -> List[List[float]]

🚀 Advanced Topics

Extending the CrewAI Agents

Create custom agents in patent_crew.py:

from crewai import Agent, Crew, Process

custom_agent = Agent(
    role="Custom Analyst",
    goal="Analyze specific patent aspects",
    backstory="Expert in a specific domain...",
)

# Add to crew alongside the existing agents
crew = Crew(
    agents=[researcher, trend_analyst, custom_agent],
    tasks=[...],
    process=Process.hierarchical,
)

Custom Search Strategies

Implement in patent_search_tools.py:

def custom_search(query_text, top_k=20, **kwargs):
    client = get_opensearch_client("localhost", 9200)
    # Example: keyword match on the abstract, filtered to recent patents
    query = {"size": top_k, "query": {"bool": {
        "must": [{"match": {"abstract": query_text}}],
        "filter": [{"range": {"publication_date": {"gte": "2020-01-01"}}}],
    }}}
    response = client.search(index="patents", body=query)
    return [hit["_source"] for hit in response["hits"]["hits"]]

Integration with External Systems

# Example: Connect to external patent database
from information_collector import fetch_external_patents

patents = fetch_external_patents(query="battery technology")
# Process and index

📋 Requirements & Dependencies

Core Dependencies

  • opensearch-py: OpenSearch Python client
  • crewai: Multi-agent orchestration framework
  • langchain: LLM chain building
  • langchain-ollama: Ollama integration
  • python-dotenv: Environment variable management
  • requests: HTTP client
  • tiktoken: Token counting

Development Dependencies

  • jupyter/jupyterlab: Interactive notebook
  • pytest: Testing framework
  • black: Code formatter
  • pylint: Code linter

System Requirements

  • RAM: 8GB minimum (16GB+ recommended)
  • Storage: 20GB+ for Docker images and data
  • CPU: Multi-core recommended for parallel processing
  • Network: Docker networking for container communication

License

This project is provided as-is for research and development purposes. Please include proper attribution when using this code.

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Write clean, documented code
  • Follow PEP 8 style guidelines
  • Add tests for new features
  • Update documentation
  • Use meaningful commit messages

Reporting Issues

When reporting bugs, please include:

  • System information (OS, Python version)
  • Error messages and traceback
  • Steps to reproduce
  • Expected vs. actual behavior

Support & Contact

For questions, issues, or suggestions:

  • Open an issue on GitHub
  • Check existing issues for solutions
  • Review troubleshooting section above
  • Check OpenSearch Dashboards for status

Acknowledgments

  • OpenSearch Project: Powerful search and analytics engine
  • CrewAI: Multi-agent orchestration framework
  • Ollama: Local LLM capabilities
  • LangChain: LLM integration framework
  • Patent Data Sources: USPTO and related patent databases

Additional Resources

Official Documentation

Tutorials & Guides

  • Vector Search Fundamentals
  • Patent Data Analysis Best Practices
  • Multi-Agent System Design Patterns
  • Local LLM Deployment Guide

Related Projects

  • Patent analysis systems
  • RAG implementations
  • CrewAI example projects
  • OpenSearch use cases

Roadmap

Planned Features

  • Advanced citation network analysis
  • Technology trend visualization dashboards
  • Batch processing optimization
  • Multi-language patent support
  • Real-time patent feed integration
  • Collaborative analysis features
  • Export to multiple formats (PDF, Excel, etc.)
  • REST API endpoint

Performance Improvements

  • Distributed indexing
  • Caching layer optimization
  • GPU acceleration for embeddings
  • Query optimization
  • Index compression

Integration Opportunities

  • Elasticsearch compatibility
  • Cloud deployment options
  • API gateway integration
  • Workflow automation tools

Last Updated: January 2026 | Version: 1.0.0 | Status: Active Development
