
LEO CDP framework

LEO CDP – The Open Source AI-first Customer Data Platform

LEO CDP is an Open Source AI-first Customer Data Platform (CDP) framework that empowers organizations to build and operate their own fully customizable CDP infrastructure — with machine learning and big data at its core.

Designed for developers, data scientists, marketers, and enterprises, LEO CDP enables unified data collection, real-time customer analytics, audience segmentation, and personalized marketing — all while remaining self-hosted and privacy-friendly.

🚀 Vision & Philosophy

  • The philosophy of Dataism → USPA → LEO CDP
  • Democratize AI-powered data platforms for digital transformation
  • Promote data sovereignty, on-premise intelligence, and open collaboration

🔥 LEO CDP - Key Features

1. Omnichannel Data Collection & Unification

Collect customer data from websites, mobile apps, CRM, POS, e-commerce platforms, customer service systems, social media, advertising platforms, IoT devices, and APIs. Unify all data into a single customer profile and source of truth.
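Unifying records into one profile usually starts with deterministic identity matching: records that share an identifier, such as an email address or phone number, are treated as the same person. The minimal sketch below assumes hypothetical record fields (`source`, `email`, `phone`) and uses a union-find structure so that transitive links (web to CRM to POS) collapse into a single profile. It illustrates the technique and is not LEO CDP's own identity-resolution code.

```python
from collections import defaultdict

# Hypothetical raw records from different sources; field names are illustrative only.
records = [
    {"source": "web", "email": "an@example.com", "phone": None, "name": "An"},
    {"source": "pos", "email": None, "phone": "+84901234567", "name": "An Nguyen"},
    {"source": "crm", "email": "an@example.com", "phone": "+84901234567", "city": "Hanoi"},
    {"source": "app", "email": "binh@example.com", "phone": None, "name": "Binh"},
]


def unify(records, keys=("email", "phone")):
    """Group records that share any identifier value (union-find), then merge their fields."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (identifier key, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value:
                if (key, value) in first_seen:
                    union(first_seen[(key, value)], i)
                else:
                    first_seen[(key, value)] = i

    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec)

    profiles = []
    for members in groups.values():
        profile = {"sources": sorted({m["source"] for m in members})}
        for m in members:
            for field, value in m.items():
                if field != "source" and value is not None:
                    profile.setdefault(field, value)  # first non-empty value wins
        profiles.append(profile)
    return profiles
```

A first-non-empty-wins merge is the simplest policy; a real deployment would more likely rank sources by trust or recency when fields conflict.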

2. Real-Time Customer 360 Intelligence

Build a comprehensive, real-time view of every customer by combining behavioral, transactional, demographic, and engagement data across all touchpoints.

3. AI-Based Segmentation & Customer Scoring

Leverage machine learning to automatically create intelligent audiences and predictive customer segments using:

  • RFM Analysis
  • Customer Lifetime Value (CLV)
  • Churn Prediction
  • Purchase Propensity
  • Lead Scoring
  • Dynamic Audience Generation
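RFM analysis is the simplest of the scoring methods listed above: each customer is scored on Recency (days since the last order), Frequency (number of orders), and Monetary value (total spend). A minimal sketch, assuming hypothetical `(customer_id, order_date, amount)` transaction tuples and using rank-based binning into three buckets as a simplified stand-in for quantile scoring; it is not the scoring model shipped with LEO CDP.

```python
from datetime import date

# Hypothetical transactions: (customer_id, order_date, amount). Not a LEO CDP data model.
transactions = [
    ("c1", date(2025, 1, 28), 120.0),
    ("c1", date(2025, 1, 10), 80.0),
    ("c2", date(2024, 11, 2), 40.0),
    ("c3", date(2025, 1, 30), 300.0),
    ("c3", date(2025, 1, 15), 150.0),
    ("c3", date(2024, 12, 20), 90.0),
]


def rfm_table(transactions, as_of):
    """Aggregate raw recency (days), frequency (orders), and monetary (total spend) per customer."""
    stats = {}
    for cid, day, amount in transactions:
        s = stats.setdefault(cid, {"last": day, "f": 0, "m": 0.0})
        s["last"] = max(s["last"], day)
        s["f"] += 1
        s["m"] += amount
    return {cid: ((as_of - s["last"]).days, s["f"], s["m"]) for cid, s in stats.items()}


def rank_score(values, bins, higher_is_better=True):
    """Map each value to a 1..bins score by its rank position (worst rank -> 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + rank * bins // len(values)
    return scores


def rfm_scores(transactions, as_of, bins=3):
    table = rfm_table(transactions, as_of)
    ids = sorted(table)
    r = rank_score([table[c][0] for c in ids], bins, higher_is_better=False)  # fewer days is better
    f = rank_score([table[c][1] for c in ids], bins)
    m = rank_score([table[c][2] for c in ids], bins)
    return {c: (r[k], f[k], m[k]) for k, c in enumerate(ids)}
```

With `as_of=date(2025, 2, 1)`, customer `c3` (recent, frequent, high spend) scores `(3, 3, 3)` and the lapsed customer `c2` scores `(1, 1, 1)`; such tuples can then be mapped to named segments like "champions" or "at risk".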

4. Behavioral Tracking & Journey Mapping

Capture customer interactions and behavioral events in real time. Visualize and analyze customer journeys across channels to understand engagement patterns, conversion paths, and drop-off points.
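Drop-off analysis reduces to counting how many customers reach each step of an ordered journey. A minimal sketch, assuming a hypothetical four-step funnel and an in-memory list of `(customer_id, timestamp, event_name)` tuples; a customer advances only when the next expected step occurs, so skipping a step (customer `c3` below) stops their progress.

```python
# Hypothetical funnel and event log; step names are illustrative, not LEO CDP event types.
FUNNEL = ["page-view", "add-to-cart", "checkout", "purchase"]

events = [  # (customer_id, timestamp, event_name)
    ("c1", 1, "page-view"), ("c1", 2, "add-to-cart"), ("c1", 3, "checkout"), ("c1", 4, "purchase"),
    ("c2", 1, "page-view"), ("c2", 5, "add-to-cart"),
    ("c3", 2, "page-view"), ("c3", 3, "checkout"),  # skipped add-to-cart
    ("c4", 1, "page-view"), ("c4", 2, "add-to-cart"), ("c4", 6, "checkout"),
]


def funnel_counts(events, funnel):
    """Count customers who completed each funnel step in order (other events in between are allowed)."""
    progress = {}  # customer_id -> number of funnel steps completed so far
    for cid, _ts, name in sorted(events, key=lambda e: (e[0], e[1])):
        step = progress.get(cid, 0)
        if step < len(funnel) and name == funnel[step]:
            progress[cid] = step + 1
    return [sum(1 for done in progress.values() if done > i) for i in range(len(funnel))]


def drop_off(counts):
    """Fraction of customers lost between each pair of consecutive steps."""
    return [1 - nxt / cur if cur else 0.0 for cur, nxt in zip(counts, counts[1:])]
```

For this log the step counts are `[4, 3, 2, 1]`, so the largest relative loss (50%) happens between checkout and purchase.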

5. Predictive Analytics & Customer Insights

Apply machine learning and AI models to uncover hidden patterns, predict customer behavior, identify growth opportunities, and generate actionable business insights.

6. Real-Time Audience Activation

Activate customer segments instantly across marketing, sales, customer service, and advertising platforms. Trigger actions based on customer behavior, events, attributes, and predictive scores.
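Behavior-triggered activation can be modelled as rules that pair an incoming event with a condition on the customer profile. The sketch below is a hypothetical rule engine: the event names, profile attributes (`churn_score`, `clv`, `opted_out`), thresholds, and action ids are illustrative assumptions, and dispatching an action to a real channel is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    event: str                         # event name that can fire the rule
    condition: Callable[[dict], bool]  # predicate over the customer profile
    action: str                        # hypothetical activation action id


# Hypothetical rules; thresholds and action ids are illustrative only.
RULES = [
    Rule("win-back", "app-open", lambda p: p.get("churn_score", 0) >= 0.7, "send_push_offer"),
    Rule("vip-upsell", "purchase", lambda p: p.get("clv", 0) > 1000, "notify_sales"),
    Rule("cart-reminder", "add-to-cart", lambda p: not p.get("opted_out", False), "send_email_reminder"),
]


def activate(event_name, profile, rules=RULES):
    """Return the actions triggered by one incoming event for one customer profile."""
    return [r.action for r in rules if r.event == event_name and r.condition(profile)]
```

For example, `activate("app-open", {"churn_score": 0.82})` returns `["send_push_offer"]`, while the same event for a low-risk profile triggers nothing.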

7. Event-Driven Data Processing & Orchestration

Support real-time and batch data pipelines using event-driven architecture. Integrate with Apache Airflow for data ingestion, transformation, workflow orchestration, and automation.
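An event-driven pipeline decouples producers from consumers. Each event is published once and handled immediately by its subscribers, while a copy is kept for batch processing. A minimal in-memory sketch with hypothetical topic names: it stands in for a real broker (the roadmap lists Kafka, RabbitMQ, and SQS as planned) and does not use the Airflow API, although `drain()` marks where a scheduled batch task could pick up the buffered events.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-memory publish/subscribe bus; a stand-in for a real broker, not LEO CDP code."""

    def __init__(self):
        self._handlers = defaultdict(list)  # topic -> list of handler callables
        self.batch_buffer = []              # (topic, event) pairs kept for a later batch job

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:     # real-time path: handle immediately
            handler(event)
        self.batch_buffer.append((topic, event))  # batch path: buffer for bulk processing

    def drain(self):
        """Hand all buffered events to a batch job and clear the buffer."""
        batch, self.batch_buffer = self.batch_buffer, []
        return batch


# Usage: a real-time counter of page views per customer.
view_counts = defaultdict(int)


def count_view(event):
    view_counts[event["customer_id"]] += 1


bus = EventBus()
bus.subscribe("page-view", count_view)
bus.publish("page-view", {"customer_id": "c1", "url": "/pricing"})
bus.publish("page-view", {"customer_id": "c1", "url": "/docs"})
bus.publish("purchase", {"customer_id": "c1", "amount": 49.0})
```

The real-time counter sees only the two page views, while `bus.drain()` returns all three events for the batch side.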

8. Omnichannel Personalization with Agentic AI & LLMs

Deliver personalized customer experiences across every channel using Agentic AI and Large Language Models (LLMs). Automatically generate content, recommend next-best actions, optimize campaigns, and orchestrate customer journeys in real time based on customer behavior, intent, and business goals.

Key capabilities include:

  • AI-generated personalized content
  • Next Best Action recommendations
  • Dynamic journey orchestration
  • Real-time offer personalization
  • Conversational AI assistants
  • Intelligent campaign optimization
  • Context-aware customer engagement
  • Autonomous AI-driven marketing workflows

9. Open Ecosystem & API-First Architecture

Integrate seamlessly with CRM, ERP, Marketing Automation, Data Warehouse, Business Intelligence, AI platforms, and third-party applications through APIs, webhooks, and modular services.

10. Enterprise Security, Governance & Scalability

Enterprise-grade security, governance, and scalability features include:

  • Consent Management
  • Data Privacy Controls
  • Role-Based Access Control (RBAC)
  • Audit Logs
  • Data Governance
  • On-Premise & Private Cloud Deployment
  • Multi-Tenant Architecture
  • Docker & Kubernetes Support
  • Prometheus & Grafana Monitoring
  • High Availability & Scalable Infrastructure
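Role-based access control and audit logging work together: every access decision is checked against the caller's role and then recorded. The sketch below assumes hypothetical role names and `resource:action` permission strings; LEO CDP's actual role model and log format may differ.

```python
# Hypothetical roles and permissions; LEO CDP's actual role model may differ.
ROLE_PERMISSIONS = {
    "admin": {"profile:read", "profile:write", "profile:delete", "segment:write", "audit:read"},
    "marketer": {"profile:read", "segment:write"},
    "analyst": {"profile:read"},
}

audit_log = []  # in-memory stand-in for a persistent audit trail


def authorize(user, role, permission):
    """Check a permission against the user's role and record the decision in the audit log."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())  # unknown roles get no permissions
    audit_log.append({"user": user, "role": role, "permission": permission, "allowed": allowed})
    return allowed
```

Denying by default for unknown roles, and logging denials as well as grants, keeps the audit trail useful for spotting misconfigured accounts.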

🌍 Why Open Source?

  • Break away from SaaS lock-in. Full customization and ownership of your CDP.
  • Ideal for agencies, startups, enterprises, and researchers building AI-powered marketing stacks.
  • Open source encourages transparency, innovation, and community-driven evolution.

📈 Roadmap 2025+

| Feature | Status |
|---|---|
| ✅ Core CDP Platform (Profiles, Events, Segmentation) | Complete |
| ✅ CDP SDKs (JavaScript, Python) | Complete |
| 🔄 Identity Resolution with Graph + Vector Matching | In Progress |
| 🔄 AI Assistant (Chatbot for Audience Insights & Suggestions) | In Progress |
| 🔄 Agentic AI: Personalizing the Customer Experience | In Progress |
| 🔄 Embedding Model for Customer Vector Search (via Qdrant) | In Progress |
| 🆕 CDP Mobile SDKs (Android, iOS, React Native) | Planned |
| 🆕 Open Source Campaign Management UI | Planned |
| 🆕 Integration Marketplace for Martech Tools | Planned |
| 🆕 Webhook + Event Bus Support (Kafka / RabbitMQ / SQS) | Planned |
| 🆕 Federated Identity Graph using OpenID & OAuth | Planned |
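The vector-matching and vector-search items on the roadmap both rely on similarity between profile embeddings. A minimal brute-force sketch in plain Python, assuming hypothetical 3-dimensional embeddings and a hypothetical match threshold of 0.95; it does not use the Qdrant client API, which would perform the same nearest-neighbour search with an index at scale.

```python
import math

# Hypothetical profile embeddings; real embedding models typically produce hundreds of dimensions.
index = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.9, 0.2],
    "p3": [0.85, 0.15, 0.05],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, index, k=2):
    """Brute-force nearest neighbours by cosine similarity, highest first."""
    scored = sorted(((cosine(query, vec), pid) for pid, vec in index.items()), reverse=True)
    return [(pid, score) for score, pid in scored[:k]]


def resolve(query, index, threshold=0.95):
    """Profile ids similar enough to be candidate matches for the same person."""
    return [pid for pid, score in top_k(query, index, k=len(index)) if score >= threshold]


query = [1.0, 0.1, 0.0]  # embedding of an incoming, not-yet-resolved profile
```

Vector similarity only proposes candidates; combining it with deterministic graph links (shared email, phone, or login) before merging profiles reduces false matches.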

Want to contribute? Join the community!


🧪 System Demo


📚 Documents


🛠️ Tech Stack

  • Backend: Java 11 (Amazon Corretto), Python 3.10 or Python 3.12
  • Database: ArangoDB 3.11 (Multi-model: Document + Graph + Search)
  • Monitoring: Prometheus 2 + Grafana 8
  • Data Pipeline: Apache Airflow
  • Analytics & ML: Jupyter Notebook / Google Colab
  • Messaging: Redis 8, OneSignal, Firebase
  • Deployment: Ubuntu 22 LTS, Docker, On-Prem / Cloud

☁️ Cloud Options

  • Google Cloud, AWS, VNG Cloud, Viettel Cloud or your own private infrastructure

🔧 Installation

See: Installation Guide


🧑‍💻 Author & License

Created and maintained by Trieu Nguyen (Thomas) — Founder of the LEO CDP Framework and advocate for open, data-driven innovation.

🌐 Connect: https://www.facebook.com/dataism.one

License

Released under the MIT License.

You are free to:

  • ✅ Use in personal and commercial projects
  • ✅ Modify and extend the source code
  • ✅ Build and distribute your own white-label solutions
  • ✅ Integrate into proprietary products

Attribution is appreciated and helps support the continued growth of the open-source community.

Built for organizations that want to own their customer data, AI capabilities, and digital future.

Contributors

Special thanks to all contributors who have helped improve the framework.

💬 Community & Support

⭐ If this project helps your business or development team, consider starring the repository and sharing it with the community.


📜 Historical Proof of Innovation
