
Memory Server – Engineering Learnings (Agent Notes)

These notes capture repeatable learnings from recent fixes and improvements in this repository (memory server). Each item includes brief context so we can reuse the approach confidently next time.

  • Prefer Pydantic models over raw dicts for cross-layer contracts: When routing complex data between FastAPI routes, Temporal workflows, and activities, use Pydantic types found under models/ (e.g., MemoryMetadata). In this fix we replaced Dict[str, Any] metadata with MemoryMetadata in the workflow, route, and activities to ensure validation, schema consistency, and safer refactors.

  • Unify provider selection with an enum: Define a single enum in models/shared_types.py (e.g., PreferredProvider) and use it end-to-end (route form param, workflow arg, activities). This avoids string drift ("gemini" vs "Gemini" etc.), simplifies validation, and keeps provider_manager mappings consistent.
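A minimal sketch of this pattern; the member set and the `coerce` helper are illustrative assumptions (only the lowercase provider names appear in these notes), and the real definition lives in models/shared_types.py:

```python
from enum import Enum

# Hypothetical sketch of the shared provider enum; subclassing str lets
# FastAPI form params and Pydantic fields validate plain strings directly.
class PreferredProvider(str, Enum):
    TENSORLAKE = "tensorlake"
    REDUCTO = "reducto"
    GEMINI = "gemini"

    @classmethod
    def coerce(cls, value) -> "PreferredProvider":
        """Accept either the enum or a loosely-cased string from a request."""
        if isinstance(value, cls):
            return value
        # normalizing here is what prevents "gemini" vs "Gemini" drift
        return cls(str(value).strip().lower())
```

Because the enum is a `str` subclass, `PreferredProvider.GEMINI == "gemini"` holds, so provider_manager lookups keyed by lowercase strings keep working unchanged.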

  • Use request parameters for runtime choices, not environment flags: Feature toggles that affect a single request (e.g., hierarchical_enabled) should come from the API request, not .env. In this fix we replaced HIERARCHICAL_FLAG with a boolean form param to make behavior explicit, testable, and tenant/user specific.

  • Return strongly-typed batch payloads from activities: Activities that generate memory items should return a structure aligned with our batch ingestion (BatchMemoryRequest). We now return a batch_request alongside legacy memory_requests for compatibility, making it easier to pipe results directly into the batch processor.

  • Standardize environment access and loading: Load .env once (python-dotenv) and access variables via from os import environ as env. This avoids NameError: os is not defined pitfalls and keeps configuration lookups (env.get(...)) consistent across modules.
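A sketch of this convention, with the python-dotenv call guarded so the module stays importable without the package (the `get_setting` wrapper is an illustrative name, not an existing helper):

```python
from os import environ as env

try:
    # python-dotenv; load_dotenv() is idempotent, call it once at import time
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

def get_setting(name, default=None):
    # env.get(...) never raises NameError, unlike a stray os.environ
    # reference in a module that forgot to `import os`.
    return env.get(name, default)
```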

  • Budget tokens proactively in LLM calls: Add utility methods at the LLM client layer (e.g., estimate_message_tokens, trim_messages_to_token_budget) and call them where prompts can get large (classification, reranking). This reduces transient length-limit errors and stabilizes production behavior.
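A hedged sketch of what these utilities might look like; the real implementations presumably use a proper tokenizer, whereas this assumes the rough ~4-characters-per-token heuristic:

```python
def estimate_message_tokens(messages):
    # crude estimate: ~4 chars/token plus a small per-message overhead
    return sum(len(m.get("content", "")) // 4 + 4 for m in messages)

def trim_messages_to_token_budget(messages, budget):
    """Keep the system prompt plus the most recent messages that fit."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    kept = []
    used = estimate_message_tokens(system)
    for msg in reversed(rest):  # walk newest-first, drop the oldest overflow
        cost = estimate_message_tokens([msg])
        if used + cost > budget:
            break
        kept.insert(0, msg)  # re-insert in chronological order
        used += cost
    return system + kept
```

Calling this just before classification and reranking prompts keeps oversized histories from tripping provider length limits.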

  • Normalize model-specific chat parameters centrally: For o-series and gpt-5 models, normalize max_tokens → max_completion_tokens and strip unsupported params in a single helper. Keeping this in the chat client avoids per-call conditionals and prevents subtle API errors.

  • Prefer typed fallbacks for LLM parsing: When structured parsing fails (e.g., typed parse), fall back to JSON-object responses with conservative token caps. This preserves resiliency without sacrificing the downstream schema.

  • Apply multi-tenant scoping with typed metadata: Run metadata through apply_multi_tenant_scoping_to_metadata(MemoryMetadata, auth_context) in routes before handing off to workflows/activities. Typed metadata ensures list normalization and safe serialization (e.g., topics, ACL lists, createdAt handling).

  • Centralize provider adapters and keep names lowercase: provider_manager and provider adapters expect normalized lowercase names ("tensorlake", "reducto", "gemini"). Using the enum ensures consistent lowercasing and predictable fallbacks across providers.

  • Use Pydantic PostParseServer and objectId for Parse Posts (no mocks): When creating/fetching Posts, return and pass strongly-typed PostParseServer instead of raw dicts and prefer objectId over ad-hoc keys like post_id. For tests, avoid monkeypatching HTTP and use a real Parse Server call guarded by env vars; require PARSE_SERVER_URL, PARSE_APPLICATION_ID, PARSE_MASTER_KEY, and a valid WORKSPACE_ID, and clean up created Posts after assertions.

  • Fetch large provider results from Parse Files in activities: When extract_structured_content_from_provider receives a post_id, it should fetch the Post from Parse and check if content.provider_result_file exists (Parse File pointer). Download the file URL to get the full provider JSON, as large results (Reducto, etc.) are stored as files to avoid Parse body-parser limits. Fall back to inline content.provider_result for small/legacy payloads.

  • Decompress gzipped Parse Files when downloading provider results: Parse Server stores large provider results (uploaded via create_post_with_provider_json) as gzipped files to save space. When downloading these files in activities, always attempt gzip.decompress() on the raw bytes before JSON parsing. Fall back to direct parsing if decompression fails (for backward compatibility with non-gzipped uploads). Without decompression, JSON parsing will fail with cryptic errors and the activity will return empty results.
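The decompress-then-parse fallback can be sketched as below, applied to the raw bytes downloaded from the Parse File URL (`parse_provider_result` is an illustrative name):

```python
import gzip
import json

def parse_provider_result(raw: bytes) -> dict:
    try:
        # new-style uploads are gzipped; try decompression first
        raw = gzip.decompress(raw)
    except OSError:  # gzip.BadGzipFile is an OSError subclass
        pass  # legacy non-gzipped upload; parse the bytes as-is
    return json.loads(raw)
```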

  • Place user_id in MemoryMetadata, not AddMemoryRequest constructor: AddMemoryRequest does not accept external_user_id as a constructor parameter. Instead, set external_user_id or user_id fields within the MemoryMetadata object passed to AddMemoryRequest. This applies to all memory transformation code (Reducto transformer, hierarchical processors, etc.). Passing external_user_id directly to AddMemoryRequest causes Pydantic validation errors: "Extra inputs are not permitted".

  • Preserve user_id vs external_user_id distinction in transformers: Document processing transformers (Reducto, hierarchical, etc.) should accept MemoryMetadata objects directly rather than extracting user IDs as string parameters. This preserves the distinction between user_id (internal Papr developer ID) and external_user_id (end user ID). When creating new MemoryMetadata objects in transformers, copy both user_id and external_user_id from the base metadata to avoid incorrectly mapping one to the other. The BatchMemoryRequest pattern shows correct usage: developers pass user_id (internal) and optionally external_user_id (external), and both should be preserved through the pipeline.

  • Always pass user_id through workflow to activities for ACL: Posts created from document processing should have private ACL (user-only read/write). Ensure user_id flows from the workflow run() args down to all activities that call create_post_with_provider_json or similar Parse methods. Without user_id, Posts will have no ACL or default to public, which is a security issue. When adding new parameters to Temporal activities, place optional params at the end with defaults to maintain backward compatibility with running workflows.

  • Adjust MAX_CONTENT_LENGTH for LLM-generated memories: LLM-generated memory structures can be large (400KB+) when processing documents like Reducto outputs. Instead of chunking after generation, increase MAX_CONTENT_LENGTH in the validation layer (e.g., 600KB in models/memory_models.py) to accommodate rich, structured memories from hierarchical processing. This preserves semantic coherence better than post-hoc chunking.

  • Always return Pydantic types from service methods, not raw dicts: Service layer methods (e.g., fetch_post_with_provider_result_async) should return strongly-typed Pydantic models instead of Dict[str, Any]. This ensures validation, schema consistency, and safer refactors across Temporal activities, routes, and other consumers. Create dedicated response models (e.g., PostWithProviderResult) that wrap Parse Server models and include extracted convenience fields. Temporal activities handle Pydantic models seamlessly—they serialize/deserialize automatically across workflow boundaries.

Naming

  • Use agent.md at the repo root for agent learnings; Cursor conventions recognize agent.md more commonly than custom names. If we later need per-area notes, we can add additional files (e.g., agent-docs.md) and link them from this page.

  • Use Parse File storage for large Temporal payloads (BatchMemoryRequest pattern): When batch operations risk exceeding Temporal's gRPC limits (~2MB), create a dedicated Parse class (e.g., BatchMemoryRequest) with metadata fields and a File pointer for compressed data. Store the full payload as a gzipped Parse File and pass only the objectId (~50 bytes) to Temporal workflows. Activities fetch and decompress the data from Parse. This pattern achieved 99% payload reduction, enabled 100+ memory batches, and simplified workflow code by 77% (265 → 60 lines). Always compress with gzip for 7-10x ratio on JSON data.
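The storage side of this pattern can be sketched as follows; `upload_file` is a hypothetical stand-in for the real Parse file-upload helper:

```python
import gzip
import json

def store_batch_payload(items, upload_file):
    """Compress the batch, upload it as a Parse File, return only the id."""
    raw = json.dumps(items).encode()
    compressed = gzip.compress(raw)  # typically 7-10x smaller for JSON
    object_id = upload_file("batch.json.gz", compressed)
    # only this short id crosses the Temporal gRPC boundary, not megabytes
    return object_id
```

The workflow receives just the returned id; activities fetch the file, `gzip.decompress` it, and `json.loads` the result back into the batch.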

  • Consolidate batch processing into single activity for simplicity: Instead of per-item Temporal activities (causing 4N activity calls for N items), create one activity that fetches the batch from Parse and processes all items internally with progress tracking. This reduces network overhead, simplifies workflow logic, enables better error handling, and maintains real-time status via Parse updates. Use heartbeats every 10 items and update Parse status fields (processedCount, successCount, failCount) for observability without bloating Temporal history.

  • Always use dedicated Parse classes over generic Post for domain-specific workflows: For specialized workflows like batch processing, create purpose-built Parse classes (e.g., BatchMemoryRequest) rather than reusing generic classes like Post. This provides clear semantics, proper field types, targeted indexes, and prevents field conflicts. Follow the pattern: create Pydantic model in models/parse_server.py with model_dump() override for __type transformation, add migration script for schema setup, and implement typed storage/fetch helpers in services/memory_management.py.

  • Construct Parse headers manually for session token and API key auth: The get_parse_headers() utility function in services/user_utils.py does not accept parameters. When creating HTTP requests to Parse Server, construct headers manually with the pattern: base headers include X-Parse-Application-Id and Content-Type, then conditionally add X-Parse-Master-Key if api_key is provided, or X-Parse-Session-Token if session_token is provided. This pattern is used throughout services/memory_management.py and must be followed for all Parse Server interactions that require authentication flexibility.
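A minimal sketch of that pattern (`build_parse_headers` is a hypothetical name; the existing `get_parse_headers()` takes no parameters, which is why the headers are built inline):

```python
from typing import Optional

def build_parse_headers(app_id: str,
                        api_key: Optional[str] = None,
                        session_token: Optional[str] = None) -> dict:
    headers = {
        "X-Parse-Application-Id": app_id,
        "Content-Type": "application/json",
    }
    if api_key:
        headers["X-Parse-Master-Key"] = api_key
    elif session_token:
        headers["X-Parse-Session-Token"] = session_token
    return headers
```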

  • ParseFile model requires name field in all contexts: The ParseFile Pydantic model (defined in models/parse_server.py) requires both name and url fields, not just url and __type. When creating test mocks or Parse Server responses that include file pointers, always include the name field (e.g., "name": "batch_test_batch.json.gz"). Without the name field, Pydantic validation will fail with "Field required" errors. This applies to all Parse classes that use file pointers, including BatchMemoryRequest.batchDataFile, Post.provider_result_file, etc.

  • Start both Temporal workers when running Papr Memory locally: The memory server requires two Temporal workers to be running for document processing and batch memory workflows. Start them with: ps aux | grep "start_temporal_worker.py\|start_document_worker.py" | grep -v grep | awk '{print $2}' | xargs kill -9 2>/dev/null && sleep 2 && cd /Users/shawkatkabbara/Documents/GitHub/memory && poetry run python start_temporal_worker.py > .temporal_worker.out 2>&1 & poetry run python start_document_worker.py > .document_worker.out 2>&1 &. This command kills any existing workers, waits 2 seconds, then starts both workers in the background with output redirected to .temporal_worker.out and .document_worker.out. Without both workers running, document uploads and batch operations will queue but never process. Check worker logs with tail -f .temporal_worker.out or tail -f .document_worker.out.

  • Extract page counts from ALL document providers, not just Reducto: Different providers return page information in different formats. In process_document_with_provider_from_reference activity, add provider-specific logic to extract actual page counts: Reducto uses result.parse.result.usage.num_pages (with bbox-based fallback), TensorLake uses parsed_pages_count, Gemini/PaddleOCR/DeepSeek-OCR use len(result.pages). Always use provider_name.lower() for string comparisons and log extracted counts for observability. Without provider-specific extraction, workflows will report incorrect page counts (often defaulting to 1) which breaks UI progress indicators and billing calculations.

  • Validate provider content early with explicit errors (fail fast): For providers that should return parsed content (e.g., TensorLake), validate that provider_specific["content"] exists and is non-empty immediately after process_document() returns, before creating Parse Posts. If content is missing, raise an exception with context (parse_id, file_id, status) instead of silently proceeding with placeholder text. Add detailed logging showing provider_specific keys at each serialization boundary (Pydantic → dict → JSON) to catch where content might be lost. This prevents downstream "simple path" markdown generation from creating memories with just references instead of actual document text.

  • Use google-genai SDK (not google-generativeai) for Gemini to avoid dependency conflicts: The NEW unified google-genai SDK (v1.46+) has no protobuf version constraints and works seamlessly with TensorLake SDK (which requires protobuf 6.x). The OLD google-generativeai SDK (<0.8) conflicts with TensorLake, is deprecated (EOL Aug 2025), and should be replaced. Update provider code: import google.genai as genai, use genai.Client(api_key=...) instead of genai.configure(), and update model references to gemini-2.0-flash-exp. Pin in pyproject.toml: google-genai = "^1.46.0" and comment out the old SDK. The new SDK resolves all protobuf/grpcio-tools/PyYAML conflicts that previously blocked simultaneous use of Gemini + TensorLake.

Docker Deployment

  • Use split Docker Compose for production-like development: The docker-compose-split.yaml configuration runs web server and workers in separate containers, matching production architecture. This isolates web API (FastAPI) from Temporal workers (memory-processing + document-processing), prevents worker crashes from affecting API, and enables independent scaling. Web container runs poetry run uvicorn main:app --host 0.0.0.0 --port 5001, workers container runs poetry run python start_all_workers.py which starts both memory and document workers in a single process using asyncio.gather(). The all-in-one docker-compose.yaml (runs everything in one container via start_all_services.py) is simpler but less production-like.

  • Fix Docker healthchecks to use GET not HEAD requests: When using wget for Docker healthchecks, always use -O /dev/null (GET request) instead of --spider (HEAD request). Most FastAPI endpoints only respond to GET by default, causing 405 Method Not Allowed errors with HEAD. Correct healthcheck: test: ["CMD-SHELL", "wget --no-verbose --tries=1 -O /dev/null http://localhost:5001/health || exit 1"]. Add start_period: 60s to give app time to initialize (MongoDB, Neo4j, Qdrant connections) before first healthcheck. Without proper healthcheck, Docker marks containers as unhealthy even when they're working fine.

  • Start and monitor split services with comprehensive logging: To start: docker-compose -f docker-compose-split.yaml up -d (detached) or docker-compose -f docker-compose-split.yaml up (foreground with logs). To view logs: docker-compose -f docker-compose-split.yaml logs -f (all services mixed), docker-compose -f docker-compose-split.yaml logs -f web (only web server), docker-compose -f docker-compose-split.yaml logs -f workers (only Temporal workers). Use --tail=50 to limit history, pipe to grep for filtering (e.g., grep "Temporal" to see worker connections, grep "Successfully connected" for startup confirmations). Workers log shows: "✅ Successfully connected to Temporal", "🔧 Starting Memory Worker on task queue: memory-processing", "🔧 Starting Document Worker on task queue: document-processing", confirming both workers are polling Temporal Cloud for tasks.

  • Stop services cleanly with orphan removal: Always use docker-compose -f docker-compose-split.yaml down --remove-orphans to stop containers, remove networks, and clean up orphaned containers from previous configurations (e.g., old separate memory-worker/document-worker containers). Without --remove-orphans, switching between split and all-in-one configs leaves stale containers that cause warnings and confusion. To force clean slate: stop all memory containers (docker ps -a | grep memory | awk '{print $1}' | xargs docker stop && xargs docker rm), remove networks (docker network rm memory_default memory_network), then rebuild (docker-compose -f docker-compose-split.yaml build --no-cache && docker-compose -f docker-compose-split.yaml up -d).

  • Verify worker health with Temporal Cloud task queue pollers: After starting workers, check Temporal Cloud UI → Namespaces → Your namespace → Task Queues tab. Both memory-processing and document-processing should show "Pollers: 1" with recent activity timestamps (< 5 seconds ago). Green status indicators confirm workers are actively polling for tasks. In Docker: docker-compose -f docker-compose-split.yaml ps should show both web and workers as "Up (healthy)" after 60 seconds (healthcheck start_period). If web shows "(unhealthy)", check logs for healthcheck errors (typically 405 if using wrong HTTP method in wget).

GraphQL Integration with Neo4j

  • Use simple proxy pattern for Neo4j GraphQL: When integrating with Neo4j's hosted GraphQL endpoint, implement a simple proxy that translates existing authentication (API keys, bearer tokens, session tokens) to JWT tokens. Avoid building custom GraphQL servers with resolvers when Neo4j already provides the GraphQL layer. This keeps the architecture simple: API Key → FastAPI auth → JWT generation → Neo4j GraphQL. The proxy forwards queries with JWT in Authorization header and Neo4j provider credentials (X-Provider-ID, X-Provider-Key) for authentication.

  • JWT service for multi-tenant GraphQL authorization: Create a singleton JWTService in services/jwt_service.py that generates RS256-signed JWTs with claims for Neo4j's @authorization directive. Required claims: user_id (developer ID), workspace_id (for multi-tenancy), sub (subject), iss (issuer: "https://memory.papr.ai"), aud (audience: "neo4j-graphql"), exp (expiration), and iat (issued at). Use RSA-2048 keypair generated with openssl genrsa -out keys/jwt-private.pem 2048 && openssl rsa -in keys/jwt-private.pem -pubout -out keys/jwt-public.pem. Keep private key in .gitignore for security. This allows Neo4j to enforce row-level security using @authorization(validate: [{ where: { node: { user_id: "$jwt.user_id" } } }]) directives in the schema.

  • JWKS endpoint for public key distribution: Implement /.well-known/jwks.json endpoint (registered at root level, not under /v1) that serves the RSA public key in JWK format for Neo4j to validate JWT signatures. Load the public key from keys/jwt-public.pem, extract RSA components (modulus n and exponent e), base64url-encode them, and return as {"keys": [{"kty": "RSA", "use": "sig", "kid": "papr-memory-key-1", "alg": "RS256", "n": "...", "e": "AQAB"}]}. Add Cache-Control: public, max-age=3600 and Access-Control-Allow-Origin: * headers for proper caching and CORS. This follows OAuth2/OIDC standards and allows Neo4j to autonomously validate JWTs without API calls.
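The JWK number encoding (big-endian bytes, base64url, padding stripped, per the JWK specs) can be sketched as below; `int_to_base64url` and `build_jwk` are illustrative helper names:

```python
import base64

def int_to_base64url(value: int) -> str:
    # minimal big-endian byte representation, base64url without '=' padding
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def build_jwk(n: int, e: int, kid: str = "papr-memory-key-1") -> dict:
    return {
        "kty": "RSA",
        "use": "sig",
        "kid": kid,
        "alg": "RS256",
        "n": int_to_base64url(n),  # modulus (2048-bit for a real key)
        "e": int_to_base64url(e),  # public exponent, usually 65537
    }
```

Encoding the standard exponent 65537 (0x010001) yields exactly the "AQAB" value seen in the example response above.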

  • GraphQL Playground with GraphiQL v3: Serve an interactive GraphQL IDE at GET /v1/graphql using GraphiQL v3 (React 18) for development. Disable in production with if os.getenv("ENVIRONMENT") == "production": raise HTTPException(404). Implement with loading indicator, console logging, custom fetcher with error handling, API key prompt (saved to localStorage), introspection support, and header editor. Use ReactDOM.createRoot() for React 18 compatibility. Include example introspection query in default query to help developers discover the schema. After entering API key, the playground makes GraphQL requests to /v1/graphql POST endpoint with X-API-Key header.

  • Test GraphQL with real authentication, not mocks: When writing pytest tests for GraphQL endpoints, use real authentication from .env (TEST_X_USER_API_KEY) instead of mocking the authentication layer. Only mock the final Neo4j GraphQL response with AsyncMock. This ensures the entire authentication flow (API key → JWT generation → header construction) is tested end-to-end. Pattern: with patch('httpx.AsyncClient.post') as mock_post: mock_response = AsyncMock(); mock_response.status_code = 200; mock_response.content = json.dumps({...}).encode(); mock_post.return_value = mock_response. Verify JWT was added to Neo4j request by checking call_args.kwargs["headers"]["Authorization"].startswith("Bearer "). This caught issues with JWT generation, JWKS formatting, and provider credential forwarding that mocks would have hidden.

  • Register GraphQL routes separately from v1 router: GraphQL routes go under /v1 prefix (via v1_router.include_router(graphql_router)), but JWKS endpoint must be at root level (app.include_router(jwks_router) in app_factory.py). This is because Neo4j expects JWKS at standard /.well-known/jwks.json path, not /v1/.well-known/jwks.json. Import both routers in routers/v1/__init__.py and register them appropriately. Without this split registration, Neo4j cannot validate JWTs and all GraphQL queries will fail with authentication errors.

Temporal Worker Development & Debugging

  • Clear Python bytecode cache when adding new Temporal activities: When adding new activities to Temporal workers in local development, Python may load stale bytecode from __pycache__ directories even after restarting workers. Symptoms: "Activity function X is not registered on this worker" errors despite the activity being defined and registered in code. Solution: force kill workers (pkill -9 -f start_document_worker), clear bytecode cache (find cloud_plugins/temporal -type d -name __pycache__ -exec rm -rf {} +), then restart. In production, this shouldn't happen because proper deployment practices (Docker builds, pod restarts, etc.) always use fresh Python environments. For local development, consider adding find . -type d -name __pycache__ -exec rm -rf {} + to your worker restart scripts to prevent this class of issues.

  • Use task queue versioning to avoid stuck workflows from previous runs: When Temporal workflows fail mid-execution in development, they can remain queued in Temporal Cloud and interfere with new test runs, causing "Post does not exist" errors from workflow replay. Even clearing Python bytecode cache doesn't help because the stuck workflows are stored server-side. Best solution: Change the task queue name (e.g., document-processing → document-processing-v2) in both the workflow starter (routers/v1/document_routes_v2.py) and worker (start_all_workers.py, start_document_worker.py). This forces new workflows to use a fresh queue, bypassing all old stuck workflows. Workers listening to the old queue can be safely ignored. Pattern: Add a version suffix or timestamp to task queue names during development iterations. For production, stuck workflows should be terminated via Temporal UI or CLI rather than changing task queues.

  • Systematically kill and restart Temporal workers to resolve port conflicts and versioning mismatches: When workers fail to start with OSError: [Errno 48] Address already in use, old worker processes are still holding port 8080 (the health check server).

    • Step 1 - Find workers: ps aux | grep 'start_all_workers\|start_temporal_worker\|start_document_worker' | grep -v grep lists all worker processes.
    • Step 2 - Find port usage: lsof -ti:8080 returns the PIDs using port 8080 (may include workers and other apps like Electron).
    • Step 3 - Identify processes: ps -p <PIDs> -o pid,command shows what each process is.
    • Step 4 - Kill workers: kill -9 <worker_PID> kills specific workers, or kill -9 $(lsof -ti:8080) force-kills everything on port 8080.
    • Step 5 - Verify the port is free: lsof -ti:8080 should return nothing.
    • Step 6 - Restart: cd /path/to/memory && nohup poetry run python start_all_workers.py > /tmp/workers.log 2>&1 & starts the workers in the background.
    • Step 7 - Verify startup: tail -30 /tmp/workers.log should show "✅ Successfully connected to Temporal", "Task Queue: memory-processing", "Task Queue: document-processing-v2", and "🚀 Starting both workers...".

    Critical for versioning: ensure task queue names match across all configuration points: start_all_workers.py (memory_task_queue, document_task_queue), start_document_worker.py (task_queue), routers/v1/document_routes_v2.py (task_queue for workflow dispatch), cloud_plugins/temporal/workflows/batch_memory.py (task_queue, memory_task_queue), cloud_plugins/temporal/workflows/document_processing.py (memory_task_queue for child workflows), tests/conftest.py (task_queue for the test worker), and config/cloud.yaml (temporal.task_queue). The current standard is memory-processing (no v2) and document-processing-v2 (with v2). If workers appear in the logs but workflows aren't being picked up, check the Temporal Cloud UI to see which task queue the workflows are actually hitting; this reveals routing mismatches. For development, prefer unversioned mode (TEMPORAL_USE_VERSIONING=false) to avoid build ID configuration in Temporal Cloud.

Holographic Neural Embeddings & Metadata Selection for Frequency Tuning

  • Domain-specific metadata schemas outperform generic schemas for H-COND scoring: When using holographic embeddings with 13-frequency metadata, generic schema fields like mega_domain="Science", domain="Biology", entity_type="Gene" produce nearly identical metadata for all documents in a domain. This causes uniformly high alignment scores (0.72-0.88) for both relevant AND non-relevant documents, eliminating the discriminative power of H-COND. Instead, use domain-specific schemas with highly discriminating fields. For scientific/biomedical text, effective fields include: primary_entity (specific gene/protein: "APOE4", "TP53"), molecular_mechanism ("lipid metabolism dysregulation"), disease_condition ("Late-onset Alzheimer's"), key_finding (5-10 word summary of the claim). These produce metadata that varies across documents, enabling phase alignment to discriminate relevant from non-relevant.

  • Map metadata fields to frequencies based on semantic granularity: The 13 brain-inspired frequencies (0.1Hz-70Hz) should correspond to semantic layers from broad context to fine details:

    • LOW frequencies (0.1-2Hz): Broad stable context that varies LESS across documents (research_field, study_type)
    • MID frequencies (4-12Hz): Specific entities that vary MORE across documents (primary_entity, secondary_entity, molecular_mechanism, causal_relationship)
    • HIGH frequencies (18-40Hz): Fine-grained details that vary MOST (effect_direction, experimental_method, statistical_evidence, organism_model)
    • ULTRA-HIGH frequencies (50-70Hz): Most discriminating details (disease_condition, key_finding)

    Documents with matching fine-grained details (high frequencies) should score much higher than those matching only broad context (low frequencies).
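The band layout above can be sketched as a frequency-to-field mapping; the specific frequency values are illustrative assumptions (the notes give only band ranges), and these notes name twelve fields across the four bands, so the thirteenth slot is omitted here:

```python
# Illustrative frequency → metadata-field mapping for the biomedical schema;
# the real FREQUENCY_SCHEMA in scifact_llm_13freq_eval.py may differ.
FREQUENCY_SCHEMA = {
    0.5:  "research_field",        # LOW: broad, stable context
    2.0:  "study_type",
    4.0:  "primary_entity",        # MID: specific entities
    8.0:  "secondary_entity",
    10.0: "molecular_mechanism",
    12.0: "causal_relationship",
    18.0: "effect_direction",      # HIGH: fine-grained details
    25.0: "experimental_method",
    32.0: "statistical_evidence",
    40.0: "organism_model",
    50.0: "disease_condition",     # ULTRA-HIGH: most discriminating
    70.0: "key_finding",
}
```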
  • LLM extraction prompts need domain-specific guidance: Generic prompts like "Extract metadata from this text" produce generic outputs. Include explicit instructions with examples: "Extract the MAIN gene/protein/cell (e.g., 'APOE4', 'TP53', 'CD8+ T cells')" instead of just "entity". Add critical reminders: "Extract SPECIFIC terms that would differentiate this from similar texts. Generic terms like 'Biology', 'Science', 'cells' reduce retrieval precision."

  • Use DSPy for metadata extraction optimization: When alignment gap (avg_relevant_alignment - avg_nonrelevant_alignment) is low, the metadata extraction isn't discriminating well. DSPy's BootstrapFewShot or MIPRO optimizers can automatically improve the extraction prompt using SciFact (or similar) relevance labels as training signal. The optimization metric should maximize alignment gap, not just extraction accuracy. Install with poetry add dspy-ai (requires Python version restriction >=3.11,<3.14 due to dependency constraints).

  • Debug metadata extraction by logging actual values: When H-COND scoring doesn't improve over baseline, add verbose logging to see actual extracted metadata for queries and documents side-by-side. Pattern: log query text, then each of 13 frequency values; log doc text (truncated), then each of 13 frequency values. Look for fields that are identical across all documents (bad) vs fields that vary meaningfully (good). This was implemented in scifact_llm_13freq_eval.py with FREQUENCY_SCHEMA mapping.

  • Use ONLY embedding-based phases for FREE_TEXT fields, never hash fallback: For complex interference similarity to work correctly, semantically similar text must produce similar phases. Hash-based phase computation (SHA256 mod 2π) is completely random - "Quality of Life" and "Life Quality" get entirely different phases despite being semantically equivalent. This destroys the interference pattern intended to boost relevant documents. In PhaseComputer, remove the hash fallback (hash_weight = 0.3) and use only embedding-based phase computation. The embedding → phase mapping uses (np.tanh(np.mean(emb) * 10) + 1) / 2 to produce phases in [0, 1]. While this loses some semantic information by collapsing to a scalar, it preserves the critical property that similar text → similar phases → constructive interference. Root cause discovered: interference scores were consistently LOWER than base cosine similarity because the 30% hash component introduced random phase noise that caused destructive rather than constructive interference.

  • Complex interference requires query AND document phases to align well: The interference equation |ψ_q + ψ_d|² / (|ψ_q|² + |ψ_d|²) returns 2.0 (constructive) when cos(θ_q - θ_d) = 1 (phases match), 0.0 (destructive) when phases oppose. For this to discriminate relevant from non-relevant documents, the phase computation must satisfy: (1) same semantic content → same phase (deterministic), (2) similar semantic content → similar phase (continuous), (3) different semantic content → different phase (discriminative). Hash-based phases violate property (2) completely, making interference worse than random. Embedding-based phases satisfy all three properties but with reduced sensitivity due to scalar collapse.
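The interference equation and the embedding-to-phase mapping above can be sketched in a few lines; treating the [0, 1] phase values as fractions of a full turn is an assumption made for this illustration:

```python
import cmath
import math

def interference_score(phase_q: float, phase_d: float) -> float:
    """|psi_q + psi_d|^2 / (|psi_q|^2 + |psi_d|^2) for unit-magnitude states."""
    psi_q = cmath.exp(2j * math.pi * phase_q)
    psi_d = cmath.exp(2j * math.pi * phase_d)
    num = abs(psi_q + psi_d) ** 2
    den = abs(psi_q) ** 2 + abs(psi_d) ** 2
    return num / den  # 2.0 fully constructive, 0.0 fully destructive

def embedding_phase(emb) -> float:
    """Embedding → phase mapping from the note: (tanh(mean * 10) + 1) / 2."""
    mean = sum(emb) / len(emb)
    return (math.tanh(mean * 10) + 1) / 2  # deterministic and continuous
```

Because `embedding_phase` is continuous in the embedding, nearby embeddings land on nearby phases, satisfying properties (1) and (2); a hash-based phase would jump arbitrarily under the smallest input change.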

Next Steps (automation)

  • After each completed task, append 2–3 sentence learnings to this file and also create a Memory item via scripts/add_agent_learning.py using customMetadata.category=memory_server_eng_learnings. This ensures the learnings are queryable within the product.
  • In upload_document_v2, ensure incoming metadata is parsed into MemoryMetadata and PreferredProvider is coerced from string to enum before invoking the workflow.
  • Add unit tests for hierarchical processing: core generators (DocumentToMemoryTransformer, generate_optimized_memory_structures) and the activities: extract_structured_content_from_provider, generate_llm_optimized_memory_structures, create_hierarchical_memory_batch, and link_batch_memories_to_post.