
Google Drive connector #3983

Open

ambakick wants to merge 3 commits into Mintplex-Labs:master from ambakick:drive_connector

@ambakick

Pull Request Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

Relevant Issues

resolves #xxx

What is in this change?

This PR adds Google Drive integration to AnythingLLM for automatic document synchronization:

Core Features:

  • Google Drive Connector: Lets users connect Google Drive folders using service account authentication
  • Automatic Sync: Configurable sync frequencies (hourly, daily, weekly) with background job processing
  • Incremental Updates: Uses Google Drive change tokens so that each sync processes only modified files
  • Document Archival: Deleted documents are retained for 30-90 days and then cleaned up automatically
  • PDF Processing: Extracts text from Google Drive PDFs using the pdf-parse library
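The sync-frequency scheduling described above could work roughly as follows. This is a minimal sketch, not the PR's code. The `SyncQueueEntry` shape, the `lastSyncedAt` field, and the `nextSyncAt`/`dueEntries` helpers are hypothetical. Each frequency is also assumed to be a fixed-length interval measured from the last sync.

```typescript
// Sync frequencies offered by the connector (per the PR description).
type SyncFrequency = "hourly" | "daily" | "weekly";

// Assumption: each frequency is a fixed interval, in milliseconds.
const INTERVAL_MS: Record<SyncFrequency, number> = {
  hourly: 60 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
};

// Hypothetical queue entry; only the fields this sketch needs.
interface SyncQueueEntry {
  id: number;
  syncFrequency: SyncFrequency;
  lastSyncedAt: Date | null; // null means the folder has never been synced
}

// When the entry should next sync; never-synced entries are due immediately.
function nextSyncAt(entry: SyncQueueEntry): Date {
  if (entry.lastSyncedAt === null) return new Date(0);
  return new Date(entry.lastSyncedAt.getTime() + INTERVAL_MS[entry.syncFrequency]);
}

// Entries a background worker would pick up on a tick at time `now`.
function dueEntries(entries: SyncQueueEntry[], now: Date): SyncQueueEntry[] {
  return entries.filter((entry) => nextSyncAt(entry).getTime() <= now.getTime());
}
```

Consider a tick at 12:00 UTC. An hourly entry last synced at 10:30 is due, and so is a weekly entry that has never synced. A daily entry synced at 09:00 the same day is not due. Keying off one timestamp per entry keeps the worker stateless between ticks.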

Database Changes:

  • Added Google Drive support to the document_sync_queues table (new syncFrequency, driveChangeToken, and metadata columns)
  • Added archival support to workspace_documents (new archived and archivedAt columns)
  • Created a new document_archives table for retention management
  • Added database migration: 20250101000000_add_googledrive_support
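The retention cleanup implied by the archived/archivedAt columns might look like the sketch below. It assumes a single configured retention period, clamped to the 30-90 day window from the PR description. `ArchivedDocument`, `clampRetentionDays`, and `expiredArchives` are hypothetical names. The actual purge (from document_archives or the vector store) is left out.

```typescript
// Retention window from the PR description (30-90 days).
const MIN_RETENTION_DAYS = 30;
const MAX_RETENTION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical subset of a workspace_documents row, using the new archival columns.
interface ArchivedDocument {
  id: number;
  archived: boolean;
  archivedAt: Date | null;
}

// Force a configured retention period into the supported window.
function clampRetentionDays(days: number): number {
  return Math.min(MAX_RETENTION_DAYS, Math.max(MIN_RETENTION_DAYS, Math.floor(days)));
}

// Archived documents older than the retention period; a cleanup job would purge these.
function expiredArchives(
  docs: ArchivedDocument[],
  retentionDays: number,
  now: Date,
): ArchivedDocument[] {
  const cutoff = now.getTime() - clampRetentionDays(retentionDays) * DAY_MS;
  return docs.filter(
    (doc) => doc.archived && doc.archivedAt !== null && doc.archivedAt.getTime() <= cutoff,
  );
}
```

Clamping on read means an out-of-range setting (say, 10 days) behaves like the 30-day minimum instead of deleting documents early.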

New Components:

  • collector/utils/extensions/GoogleDrive/: Google Drive integration module
  • collector/utils/extensions/GoogleDrive/GoogleDriveLoader/: Document loading and processing
  • Resync support for Google Drive documents
  • Background worker integration for automatic sync jobs

Technical Implementation:

  • Service account authentication with the Google Drive API
  • Metadata normalization to prevent LanceDB schema conflicts
  • Documents are written directly to server storage, bypassing the hotdir, so they are available immediately
  • Error handling and retry logic
  • Security: Service account credentials are stored encrypted
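One way to keep LanceDB schemas consistent is to project every document's metadata onto a fixed set of string-typed keys. LanceDB typically infers column types from the data written to a table. A field that is a number in one document and a string, or missing, in another can therefore cause a schema mismatch. The sketch below makes every key present and string-typed. The PR does not say which fields it keeps, so `METADATA_KEYS` and both helpers are illustrative assumptions.

```typescript
// Illustrative key set; the PR does not list the fields it actually keeps.
const METADATA_KEYS = ["title", "url", "docSource", "chunkSource", "published"] as const;
type MetadataKey = (typeof METADATA_KEYS)[number];
type NormalizedMetadata = Record<MetadataKey, string>;

// Coerce any value to a string so each column always has the same type.
function toStringValue(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (typeof value === "string") return value;
  if (typeof value === "object") return JSON.stringify(value);
  return String(value); // numbers, booleans, bigints, etc.
}

// Keep only the known keys (in a fixed order), fill gaps with "", drop extras.
function normalizeMetadata(raw: Record<string, unknown>): NormalizedMetadata {
  const out = {} as NormalizedMetadata;
  for (const key of METADATA_KEYS) {
    out[key] = toStringValue(raw[key]);
  }
  return out;
}
```

Dropping unknown keys loses some information. In exchange, every record has the same columns regardless of which Drive file type produced it.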

User Experience:

  • Documents appear immediately in AnythingLLM after sync
  • Works with the existing workspace document management
  • Real-time status updates and sync monitoring

Additional Information

Setup Requirements:

  • Added the googleapis dependency to the collector package
  • Added setup documentation in GOOGLE_DRIVE_SETUP.md
  • A database migration is required for the new Google Drive functionality

Security Considerations:

  • Service account credentials are encrypted using AnythingLLM's encryption worker
  • Requests only the minimum Google Drive API permissions needed
  • No user authentication tokens are stored

Performance Optimizations:

  • Incremental sync prevents unnecessary re-processing of unchanged files
  • Background job system prevents UI blocking during large folder syncs
  • Efficient PDF text extraction with proper error handling
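Incremental sync with change tokens follows the Drive API v3 `changes.list` paging pattern. The client starts from the saved token and follows `nextPageToken` until a page returns `newStartPageToken`. It then persists that token for the next run. The sketch below takes the fetcher as a parameter so it runs without credentials; in practice the fetcher would wrap `drive.changes.list({ pageToken })` from the googleapis client. `collectChanges` and the result shape are hypothetical. Storing the token in `driveChangeToken` is an assumption based on the column this PR adds.

```typescript
// Fields used from a Drive API v3 changes.list response page.
interface DriveChange {
  fileId: string;
  removed: boolean;
}
interface ChangesPage {
  changes: DriveChange[];
  nextPageToken?: string; // more pages remain in this batch
  newStartPageToken?: string; // only on the last page: the token to save for next time
}

// Hypothetical fetcher; in practice it would wrap drive.changes.list({ pageToken }).
type ListChanges = (pageToken: string) => Promise<ChangesPage>;

interface IncrementalResult {
  modified: string[]; // file IDs to re-download and re-embed
  removed: string[]; // file IDs to archive
  nextToken: string; // value to persist, e.g. in driveChangeToken
}

async function collectChanges(
  listChanges: ListChanges,
  savedToken: string,
): Promise<IncrementalResult> {
  // fileId -> true if its latest change in this batch is a removal.
  const latestRemoved: Record<string, boolean> = {};
  let pageToken: string | undefined = savedToken;
  while (pageToken !== undefined) {
    const page: ChangesPage = await listChanges(pageToken);
    for (const change of page.changes) {
      latestRemoved[change.fileId] = change.removed; // later changes win
    }
    if (page.newStartPageToken !== undefined) {
      const ids = Object.keys(latestRemoved);
      return {
        modified: ids.filter((id) => !latestRemoved[id]),
        removed: ids.filter((id) => latestRemoved[id]),
        nextToken: page.newStartPageToken,
      };
    }
    pageToken = page.nextPageToken;
  }
  throw new Error("changes.list paging ended without a newStartPageToken");
}
```

Later changes overwrite earlier ones, so a file that is edited and then deleted within one batch ends up only in `removed`. Filtering the changes down to the connected folder is left out of this sketch.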

Known Issues Resolved:

  • Fixed LanceDB schema conflicts by normalizing document metadata structure

Testing Notes:

  • Tested with various file types (PDF, text, documents)
  • Verified incremental sync behavior with file modifications
  • Confirmed proper error handling for network issues and invalid credentials

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated
  • I have tested my code functionality
  • Docker build succeeds locally

@ambakick changed the title from "Drive connector" to "Google Drive connector" on Jun 11, 2025
@timothycarambat added the PR:needs review (Needs review by core team) label on Jun 27, 2025
@timothycarambat added the Integration Request (Request for support of a new LLM, Embedder, or Vector database) label on Jul 30, 2025
@rubengonlab

Hello! This change would be incredibly helpful to have integrated. Is this PR still active? Please let me know if you need any help updating it against the main branch or testing it, I'd be glad to help out.

@ambakick (Author)

ambakick commented Mar 6, 2026

I’d be happy to work on this if the maintainers confirm that this is still an open issue and aligned with the project’s current direction.

@bevman

bevman commented Mar 30, 2026

This is brilliant, great work.
It would be a great use case.
How do we get this pushed along? I'm just an end user/supporter.
