These notes capture repeatable learnings from recent fixes and improvements in this repository (memory server). Each item includes brief context so we can reuse the approach confidently next time.
- Prefer Pydantic models over raw dicts for cross-layer contracts: When routing complex data between FastAPI routes, Temporal workflows, and activities, use the Pydantic types under `models/` (e.g., `MemoryMetadata`). In this fix we replaced `Dict[str, Any]` metadata with `MemoryMetadata` in the workflow, route, and activities to ensure validation, schema consistency, and safer refactors.
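A minimal sketch of the idea, using a hypothetical, simplified stand-in for the real `MemoryMetadata` (the actual model in `models/` has more fields):

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, ValidationError


# Hypothetical, simplified stand-in for the real models/MemoryMetadata.
class MemoryMetadata(BaseModel):
    user_id: str
    workspace_id: Optional[str] = None
    topics: List[str] = []


def run_activity(metadata: MemoryMetadata) -> Dict[str, Any]:
    # Typed input: attribute access instead of dict.get() guesswork.
    return {"user": metadata.user_id, "topics": metadata.topics}


# Validation happens at the boundary, not deep inside an activity.
meta = MemoryMetadata(user_id="dev_123", topics=["billing"])
print(run_activity(meta))

try:
    MemoryMetadata(topics=["billing"])  # missing user_id
except ValidationError as e:
    print("rejected at the boundary:", len(e.errors()), "error(s)")
```

The payoff is that a malformed payload fails loudly at the route or workflow boundary instead of surfacing as a `KeyError` deep inside an activity.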
- Unify provider selection with an enum: Define a single enum in `models/shared_types.py` (e.g., `PreferredProvider`) and use it end-to-end (route form param, workflow arg, activities). This avoids string drift ("gemini" vs "Gemini", etc.), simplifies validation, and keeps `provider_manager` mappings consistent.
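A sketch of what such an enum could look like; the member names, values, and the `from_string` helper are illustrative assumptions, not the real `models/shared_types.py` definition:

```python
from enum import Enum


# Hypothetical sketch of the PreferredProvider enum described above.
class PreferredProvider(str, Enum):
    TENSORLAKE = "tensorlake"
    REDUCTO = "reducto"
    GEMINI = "gemini"

    @classmethod
    def from_string(cls, raw: str) -> "PreferredProvider":
        # Normalize casing once, at the edge, instead of per call site.
        return cls(raw.strip().lower())


assert PreferredProvider.from_string("Gemini") is PreferredProvider.GEMINI
# str subclass: serializes cleanly in form params and workflow args.
assert PreferredProvider.GEMINI.value == "gemini"
```

Subclassing `str` keeps the enum JSON-serializable for Temporal payloads and FastAPI form params without custom encoders.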
- Use request parameters for runtime choices, not environment flags: Feature toggles that affect a single request (e.g., `hierarchical_enabled`) should come from the API request, not `.env`. In this fix we replaced `HIERARCHICAL_FLAG` with a boolean form param to make behavior explicit, testable, and tenant/user specific.
- Return strongly-typed batch payloads from activities: Activities that generate memory items should return a structure aligned with our batch ingestion (`BatchMemoryRequest`). We now return a `batch_request` alongside the legacy `memory_requests` for compatibility, making it easier to pipe results directly into the batch processor.
- Standardize environment access and loading: Load `.env` once (python-dotenv) and access variables via `from os import environ as env`. This avoids `NameError: os is not defined` pitfalls and keeps configuration lookups (`env.get(...)`) consistent across modules.
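The pattern in miniature; the `PARSE_SERVER_URL` assignment below only stands in for a `.env` entry so the snippet is self-contained, and the `load_dotenv()` call is shown commented out since python-dotenv is a third-party dependency:

```python
# Load .env once at process start, then read via a single alias everywhere.
import os
from os import environ as env

# from dotenv import load_dotenv  # python-dotenv: call load_dotenv() once at startup

os.environ["PARSE_SERVER_URL"] = "http://localhost:1337/parse"  # stand-in for a .env entry

parse_url = env.get("PARSE_SERVER_URL", "http://localhost:1337/parse")
debug = env.get("DEBUG", "false").lower() == "true"
print(parse_url, debug)
```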
- Budget tokens proactively in LLM calls: Add utility methods at the LLM client layer (e.g., `estimate_message_tokens`, `trim_messages_to_token_budget`) and call them where prompts can get large (classification, reranking). This reduces transient length-limit errors and stabilizes production behavior.
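A sketch of what these two helpers could do, assuming a crude ~4-characters-per-token estimate and a drop-oldest-first trimming policy (the real client-layer implementations may use a proper tokenizer):

```python
from typing import Dict, List


def estimate_message_tokens(messages: List[Dict[str, str]]) -> int:
    # Crude heuristic: ~4 characters per token, plus per-message overhead.
    return sum(len(m.get("content", "")) // 4 + 4 for m in messages)


def trim_messages_to_token_budget(
    messages: List[Dict[str, str]], budget: int
) -> List[Dict[str, str]]:
    # Keep the system prompt; drop the oldest non-system messages first.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and estimate_message_tokens(system + rest) > budget:
        rest.pop(0)
    return system + rest


msgs = [
    {"role": "system", "content": "You classify memories."},
    {"role": "user", "content": "old turn " * 200},
    {"role": "user", "content": "latest turn"},
]
trimmed = trim_messages_to_token_budget(msgs, budget=100)
print([m["content"][:12] for m in trimmed])
```

Calling this before every large prompt (classification, reranking) turns hard length-limit failures into graceful context truncation.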
- Normalize model-specific chat parameters centrally: For o-series and gpt-5 models, normalize `max_tokens` → `max_completion_tokens` and strip unsupported params in a single helper. Keeping this in the chat client avoids per-call conditionals and prevents subtle API errors.
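A sketch of such a helper. The model-prefix check and the set of stripped params are simplified assumptions for illustration; the real client should encode whatever rules the target API actually enforces:

```python
from typing import Any, Dict

# Assumed rule set, simplified: reasoning models take max_completion_tokens
# and reject some sampling params.
UNSUPPORTED_FOR_REASONING = {"temperature", "top_p"}


def normalize_chat_params(model: str, params: Dict[str, Any]) -> Dict[str, Any]:
    normalized = dict(params)
    if model.startswith(("o1", "o3", "gpt-5")):
        if "max_tokens" in normalized:
            normalized["max_completion_tokens"] = normalized.pop("max_tokens")
        for key in UNSUPPORTED_FOR_REASONING:
            normalized.pop(key, None)
    return normalized


out = normalize_chat_params("gpt-5", {"max_tokens": 512, "temperature": 0.2})
print(out)
```

Because every call site goes through the one helper, adding support for a new model family is a one-line change instead of a hunt through per-call conditionals.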
- Prefer typed fallbacks for LLM parsing: When structured parsing fails (e.g., typed parse), fall back to JSON-object responses with conservative token caps. This preserves resiliency without sacrificing the downstream schema.
- Apply multi-tenant scoping with typed metadata: Run metadata through `apply_multi_tenant_scoping_to_metadata(MemoryMetadata, auth_context)` in routes before handing off to workflows/activities. Typed metadata ensures list normalization and safe serialization (e.g., topics, ACL lists, `createdAt` handling).
- Centralize provider adapters and keep names lowercase: `provider_manager` and the provider adapters expect normalized lowercase names ("tensorlake", "reducto", "gemini"). Using the enum ensures consistent lowercasing and predictable fallbacks across providers.
- Use Pydantic `PostParseServer` and `objectId` for Parse Posts (no mocks): When creating/fetching Posts, return and pass strongly-typed `PostParseServer` instead of raw dicts, and prefer `objectId` over ad-hoc keys like `post_id`. For tests, avoid monkeypatching HTTP and use a real Parse Server call guarded by env vars; require `PARSE_SERVER_URL`, `PARSE_APPLICATION_ID`, `PARSE_MASTER_KEY`, and a valid `WORKSPACE_ID`, and clean up created Posts after assertions.
- Fetch large provider results from Parse Files in activities: When `extract_structured_content_from_provider` receives a `post_id`, it should fetch the Post from Parse and check whether `content.provider_result_file` exists (a Parse File pointer). Download the file URL to get the full provider JSON, since large results (Reducto, etc.) are stored as files to avoid Parse body-parser limits. Fall back to inline `content.provider_result` for small/legacy payloads.
- Decompress gzipped Parse Files when downloading provider results: Parse Server stores large provider results (uploaded via `create_post_with_provider_json`) as gzipped files to save space. When downloading these files in activities, always attempt `gzip.decompress()` on the raw bytes before JSON parsing, and fall back to direct parsing if decompression fails (for backward compatibility with non-gzipped uploads). Without decompression, JSON parsing fails with cryptic errors and the activity returns empty results.
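The decompress-then-fall-back pattern is small enough to show in full; `parse_provider_file_bytes` is a hypothetical helper name for illustration:

```python
import gzip
import json


def parse_provider_file_bytes(raw: bytes) -> dict:
    """Decompress-then-parse, falling back to plain JSON for legacy uploads."""
    try:
        raw = gzip.decompress(raw)
    except (gzip.BadGzipFile, OSError):
        pass  # not gzipped: older uploads stored plain JSON
    return json.loads(raw)


payload = {"provider": "reducto", "pages": 42}
gzipped = gzip.compress(json.dumps(payload).encode())
plain = json.dumps(payload).encode()

assert parse_provider_file_bytes(gzipped) == payload
assert parse_provider_file_bytes(plain) == payload
```

`gzip.BadGzipFile` is a subclass of `OSError`, so catching both is belt-and-suspenders; the key point is that non-gzipped bytes fall through to plain `json.loads`.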
- Place `user_id` in `MemoryMetadata`, not the `AddMemoryRequest` constructor: `AddMemoryRequest` does not accept `external_user_id` as a constructor parameter. Instead, set the `external_user_id` or `user_id` fields within the `MemoryMetadata` object passed to `AddMemoryRequest`. This applies to all memory transformation code (Reducto transformer, hierarchical processors, etc.). Passing `external_user_id` directly to `AddMemoryRequest` causes Pydantic validation errors: "Extra inputs are not permitted".
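A self-contained demonstration of the failure mode, using hypothetical, simplified stand-ins for the real models and assuming the request model uses Pydantic's `extra="forbid"` behavior (which is what produces the "Extra inputs are not permitted" message):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


# Minimal stand-ins for the real models, assuming extra="forbid".
class MemoryMetadata(BaseModel):
    user_id: Optional[str] = None
    external_user_id: Optional[str] = None


class AddMemoryRequest(BaseModel):
    model_config = ConfigDict(extra="forbid")
    content: str
    metadata: MemoryMetadata


# Correct: user IDs live on the metadata object.
req = AddMemoryRequest(
    content="doc chunk",
    metadata=MemoryMetadata(user_id="dev_1", external_user_id="end_user_9"),
)
print(req.metadata.external_user_id)

# Wrong: unknown constructor kwarg is rejected by the forbid config.
try:
    AddMemoryRequest(content="doc chunk", metadata=MemoryMetadata(), external_user_id="end_user_9")
except ValidationError as e:
    print(e.errors()[0]["msg"])
```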
- Preserve the `user_id` vs `external_user_id` distinction in transformers: Document processing transformers (Reducto, hierarchical, etc.) should accept `MemoryMetadata` objects directly rather than extracting user IDs as string parameters. This preserves the distinction between `user_id` (internal Papr developer ID) and `external_user_id` (end user ID). When creating new `MemoryMetadata` objects in transformers, copy both `user_id` and `external_user_id` from the base metadata to avoid incorrectly mapping one to the other. The `BatchMemoryRequest` pattern shows correct usage: developers pass `user_id` (internal) and optionally `external_user_id` (external), and both should be preserved through the pipeline.
- Always pass `user_id` through the workflow to activities for ACL: Posts created from document processing should have a private ACL (user-only read/write). Ensure `user_id` flows from the workflow `run()` args down to all activities that call `create_post_with_provider_json` or similar Parse methods. Without `user_id`, Posts will have no ACL or default to public, which is a security issue. When adding new parameters to Temporal activities, place optional params at the end with defaults to maintain backward compatibility with running workflows.
- Adjust `MAX_CONTENT_LENGTH` for LLM-generated memories: LLM-generated memory structures can be large (400KB+) when processing documents like Reducto outputs. Instead of chunking after generation, increase `MAX_CONTENT_LENGTH` in the validation layer (e.g., 600KB in `models/memory_models.py`) to accommodate rich, structured memories from hierarchical processing. This preserves semantic coherence better than post-hoc chunking.
- Always return Pydantic types from service methods, not raw dicts: Service layer methods (e.g., `fetch_post_with_provider_result_async`) should return strongly-typed Pydantic models instead of `Dict[str, Any]`. This ensures validation, schema consistency, and safer refactors across Temporal activities, routes, and other consumers. Create dedicated response models (e.g., `PostWithProviderResult`) that wrap Parse Server models and include extracted convenience fields. Temporal activities handle Pydantic models seamlessly; they serialize/deserialize automatically across workflow boundaries.
- Use `agent.md` at the repo root for agent learnings; Cursor conventions recognize `agent.md` more commonly than custom names. If we later need per-area notes, we can add additional files (e.g., `agent-docs.md`) and link them from this page.
- Use Parse File storage for large Temporal payloads (the `BatchMemoryRequest` pattern): When batch operations risk exceeding Temporal's gRPC limits (~2MB), create a dedicated Parse class (e.g., `BatchMemoryRequest`) with metadata fields and a File pointer for the compressed data. Store the full payload as a gzipped Parse File and pass only the `objectId` (~50 bytes) to Temporal workflows. Activities fetch and decompress the data from Parse. This pattern achieved a 99% payload reduction, enabled 100+ memory batches, and simplified workflow code by 77% (265 → 60 lines). Always compress with gzip for a 7-10x ratio on JSON data.
- Consolidate batch processing into a single activity for simplicity: Instead of per-item Temporal activities (causing 4N activity calls for N items), create one activity that fetches the batch from Parse and processes all items internally with progress tracking. This reduces network overhead, simplifies workflow logic, enables better error handling, and maintains real-time status via Parse updates. Use heartbeats every 10 items and update Parse status fields (`processedCount`, `successCount`, `failCount`) for observability without bloating Temporal history.
- Always use dedicated Parse classes over the generic Post for domain-specific workflows: For specialized workflows like batch processing, create purpose-built Parse classes (e.g., `BatchMemoryRequest`) rather than reusing generic classes like `Post`. This provides clear semantics, proper field types, and targeted indexes, and prevents field conflicts. Follow the pattern: create a Pydantic model in `models/parse_server.py` with a `model_dump()` override for the `__type` transformation, add a migration script for schema setup, and implement typed storage/fetch helpers in `services/memory_management.py`.
- Construct Parse headers manually for session token and API key auth: The `get_parse_headers()` utility function in `services/user_utils.py` does not accept parameters. When creating HTTP requests to Parse Server, construct headers manually: base headers include `X-Parse-Application-Id` and `Content-Type`, then conditionally add `X-Parse-Master-Key` if `api_key` is provided, or `X-Parse-Session-Token` if `session_token` is provided. This pattern is used throughout `services/memory_management.py` and must be followed for all Parse Server interactions that require authentication flexibility.
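The conditional-header pattern as a small helper; `build_parse_headers` is a hypothetical name for illustration (the real code inlines this logic in `services/memory_management.py`):

```python
from typing import Dict, Optional


def build_parse_headers(
    application_id: str,
    api_key: Optional[str] = None,
    session_token: Optional[str] = None,
) -> Dict[str, str]:
    # Base headers sent on every Parse Server request.
    headers = {
        "X-Parse-Application-Id": application_id,
        "Content-Type": "application/json",
    }
    # Master key takes precedence; otherwise fall back to the session token.
    if api_key:
        headers["X-Parse-Master-Key"] = api_key
    elif session_token:
        headers["X-Parse-Session-Token"] = session_token
    return headers


print(build_parse_headers("app123", session_token="r:abc"))
```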
- `ParseFile` model requires the `name` field in all contexts: The `ParseFile` Pydantic model (defined in `models/parse_server.py`) requires both `name` and `url` fields, not just `url` and `__type`. When creating test mocks or Parse Server responses that include file pointers, always include the `name` field (e.g., `"name": "batch_test_batch.json.gz"`). Without the `name` field, Pydantic validation will fail with "Field required" errors. This applies to all Parse classes that use file pointers, including `BatchMemoryRequest.batchDataFile`, `Post.provider_result_file`, etc.
- Start both Temporal workers when running Papr Memory locally: The memory server requires two Temporal workers to be running for document processing and batch memory workflows. Start them with: `ps aux | grep "start_temporal_worker.py\|start_document_worker.py" | grep -v grep | awk '{print $2}' | xargs kill -9 2>/dev/null && sleep 2 && cd /Users/shawkatkabbara/Documents/GitHub/memory && poetry run python start_temporal_worker.py > .temporal_worker.out 2>&1 & poetry run python start_document_worker.py > .document_worker.out 2>&1 &`. This kills any existing workers, waits 2 seconds, then starts both workers in the background with output redirected to `.temporal_worker.out` and `.document_worker.out`. Without both workers running, document uploads and batch operations will queue but never process. Check worker logs with `tail -f .temporal_worker.out` or `tail -f .document_worker.out`.
- Extract page counts from ALL document providers, not just Reducto: Different providers return page information in different formats. In the `process_document_with_provider_from_reference` activity, add provider-specific logic to extract actual page counts: Reducto uses `result.parse.result.usage.num_pages` (with a bbox-based fallback), TensorLake uses `parsed_pages_count`, and Gemini/PaddleOCR/DeepSeek-OCR use `len(result.pages)`. Always use `provider_name.lower()` for string comparisons and log the extracted counts for observability. Without provider-specific extraction, workflows will report incorrect page counts (often defaulting to 1), which breaks UI progress indicators and billing calculations.
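A sketch of the dispatch logic, using dict-shaped provider results for self-containment (the real activity reads typed result objects, and the Reducto bbox fallback is omitted here):

```python
from typing import Any, Dict


def extract_page_count(provider_name: str, result: Dict[str, Any]) -> int:
    provider = provider_name.lower()  # normalize before comparing
    if provider == "reducto":
        usage = result.get("parse", {}).get("result", {}).get("usage", {})
        return int(usage.get("num_pages", 1))
    if provider == "tensorlake":
        return int(result.get("parsed_pages_count", 1))
    if provider in ("gemini", "paddleocr", "deepseek-ocr"):
        return len(result.get("pages", [])) or 1
    return 1  # unknown provider: log and default


assert extract_page_count("Reducto", {"parse": {"result": {"usage": {"num_pages": 12}}}}) == 12
assert extract_page_count("TensorLake", {"parsed_pages_count": 7}) == 7
assert extract_page_count("GEMINI", {"pages": [1, 2, 3]}) == 3
```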
- Validate provider content early with explicit errors (fail fast): For providers that should return parsed content (e.g., TensorLake), validate that `provider_specific["content"]` exists and is non-empty immediately after `process_document()` returns, before creating Parse Posts. If content is missing, raise an exception with context (`parse_id`, `file_id`, status) instead of silently proceeding with placeholder text. Add detailed logging showing the `provider_specific` keys at each serialization boundary (Pydantic → dict → JSON) to catch where content might be lost. This prevents downstream "simple path" markdown generation from creating memories with just references instead of the actual document text.
- Use the google-genai SDK (not google-generativeai) for Gemini to avoid dependency conflicts: The NEW unified `google-genai` SDK (v1.46+) has no protobuf version constraints and works seamlessly with the TensorLake SDK (which requires protobuf 6.x). The OLD `google-generativeai` SDK (<0.8) conflicts with TensorLake, is deprecated (EOL Aug 2025), and should be replaced. Update provider code: `import google.genai as genai`, use `genai.Client(api_key=...)` instead of `genai.configure()`, and update model references to `gemini-2.0-flash-exp`. Pin in pyproject.toml: `google-genai = "^1.46.0"` and comment out the old SDK. The new SDK resolves all protobuf/grpcio-tools/PyYAML conflicts that previously blocked simultaneous use of Gemini + TensorLake.
- Use split Docker Compose for production-like development: The `docker-compose-split.yaml` configuration runs the web server and workers in separate containers, matching the production architecture. This isolates the web API (FastAPI) from the Temporal workers (memory-processing + document-processing), prevents worker crashes from affecting the API, and enables independent scaling. The web container runs `poetry run uvicorn main:app --host 0.0.0.0 --port 5001`; the workers container runs `poetry run python start_all_workers.py`, which starts both the memory and document workers in a single process using `asyncio.gather()`. The all-in-one `docker-compose.yaml` (runs everything in one container via `start_all_services.py`) is simpler but less production-like.
- Fix Docker healthchecks to use GET, not HEAD, requests: When using `wget` for Docker healthchecks, always use `-O /dev/null` (GET request) instead of `--spider` (HEAD request). Most FastAPI endpoints only respond to GET by default, causing 405 Method Not Allowed errors with HEAD. Correct healthcheck: `test: ["CMD-SHELL", "wget --no-verbose --tries=1 -O /dev/null http://localhost:5001/health || exit 1"]`. Add `start_period: 60s` to give the app time to initialize (MongoDB, Neo4j, Qdrant connections) before the first healthcheck. Without a proper healthcheck, Docker marks containers as unhealthy even when they are working fine.
- Start and monitor split services with comprehensive logging: To start: `docker-compose -f docker-compose-split.yaml up -d` (detached) or `docker-compose -f docker-compose-split.yaml up` (foreground with logs). To view logs: `docker-compose -f docker-compose-split.yaml logs -f` (all services mixed), `docker-compose -f docker-compose-split.yaml logs -f web` (only the web server), or `docker-compose -f docker-compose-split.yaml logs -f workers` (only the Temporal workers). Use `--tail=50` to limit history, and pipe to `grep` for filtering (e.g., `grep "Temporal"` to see worker connections, `grep "Successfully connected"` for startup confirmations). The workers log shows "✅ Successfully connected to Temporal", "🔧 Starting Memory Worker on task queue: memory-processing", and "🔧 Starting Document Worker on task queue: document-processing", confirming both workers are polling Temporal Cloud for tasks.
- Stop services cleanly with orphan removal: Always use `docker-compose -f docker-compose-split.yaml down --remove-orphans` to stop containers, remove networks, and clean up orphaned containers from previous configurations (e.g., old separate memory-worker/document-worker containers). Without `--remove-orphans`, switching between split and all-in-one configs leaves stale containers that cause warnings and confusion. To force a clean slate: stop and remove all memory containers (`docker ps -a | grep memory | awk '{print $1}' | xargs docker stop && docker ps -a | grep memory | awk '{print $1}' | xargs docker rm`), remove networks (`docker network rm memory_default memory_network`), then rebuild (`docker-compose -f docker-compose-split.yaml build --no-cache && docker-compose -f docker-compose-split.yaml up -d`).
- Verify worker health with Temporal Cloud task queue pollers: After starting the workers, check the Temporal Cloud UI → Namespaces → your namespace → Task Queues tab. Both `memory-processing` and `document-processing` should show "Pollers: 1" with recent activity timestamps (< 5 seconds ago). Green status indicators confirm the workers are actively polling for tasks. In Docker, `docker-compose -f docker-compose-split.yaml ps` should show both web and workers as "Up (healthy)" after 60 seconds (the healthcheck `start_period`). If web shows "(unhealthy)", check the logs for healthcheck errors (typically 405 if using the wrong HTTP method in wget).
- Use a simple proxy pattern for Neo4j GraphQL: When integrating with Neo4j's hosted GraphQL endpoint, implement a simple proxy that translates existing authentication (API keys, bearer tokens, session tokens) to JWT tokens. Avoid building custom GraphQL servers with resolvers when Neo4j already provides the GraphQL layer. This keeps the architecture simple: API Key → FastAPI auth → JWT generation → Neo4j GraphQL. The proxy forwards queries with the JWT in the Authorization header and Neo4j provider credentials (`X-Provider-ID`, `X-Provider-Key`) for authentication.
- JWT service for multi-tenant GraphQL authorization: Create a singleton `JWTService` in `services/jwt_service.py` that generates RS256-signed JWTs with claims for Neo4j's `@authorization` directive. Required claims: `user_id` (developer ID), `workspace_id` (for multi-tenancy), `sub` (subject), `iss` (issuer: "https://memory.papr.ai"), `aud` (audience: "neo4j-graphql"), `exp` (expiration), and `iat` (issued at). Use an RSA-2048 keypair generated with `openssl genrsa -out keys/jwt-private.pem 2048 && openssl rsa -in keys/jwt-private.pem -pubout -out keys/jwt-public.pem`. Keep the private key in `.gitignore` for security. This allows Neo4j to enforce row-level security using `@authorization(validate: [{ where: { node: { user_id: "$jwt.user_id" } } }])` directives in the schema.
- JWKS endpoint for public key distribution: Implement a `/.well-known/jwks.json` endpoint (registered at root level, not under `/v1`) that serves the RSA public key in JWK format for Neo4j to validate JWT signatures. Load the public key from `keys/jwt-public.pem`, extract the RSA components (modulus `n` and exponent `e`), base64url-encode them, and return them as `{"keys": [{"kty": "RSA", "use": "sig", "kid": "papr-memory-key-1", "alg": "RS256", "n": "...", "e": "AQAB"}]}`. Add `Cache-Control: public, max-age=3600` and `Access-Control-Allow-Origin: *` headers for proper caching and CORS. This follows OAuth2/OIDC standards and allows Neo4j to validate JWTs autonomously without API calls.
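The base64url encoding of the RSA integers is the fiddly part; here is a stdlib-only sketch (the modulus shown is a placeholder, not a real key's value):

```python
import base64


def b64url_uint(value: int) -> str:
    """Base64url-encode an unsigned integer, unpadded, as JWK requires."""
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# The standard RSA public exponent 65537 encodes to the familiar "AQAB".
assert b64url_uint(65537) == "AQAB"

jwk = {
    "kty": "RSA",
    "use": "sig",
    "kid": "papr-memory-key-1",
    "alg": "RS256",
    "n": b64url_uint(0xC0FFEE),  # placeholder; use the real modulus from the PEM
    "e": b64url_uint(65537),
}
print(jwk["e"])
```

In the real endpoint, `n` and `e` come from the public key loaded out of `keys/jwt-public.pem` (e.g., via a crypto library's public-numbers accessor).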
- GraphQL Playground with GraphiQL v3: Serve an interactive GraphQL IDE at `GET /v1/graphql` using GraphiQL v3 (React 18) for development. Disable it in production with `if os.getenv("ENVIRONMENT") == "production": raise HTTPException(404)`. Implement it with a loading indicator, console logging, a custom fetcher with error handling, an API key prompt (saved to localStorage), introspection support, and a header editor. Use `ReactDOM.createRoot()` for React 18 compatibility. Include an example introspection query in the default query to help developers discover the schema. After entering an API key, the playground makes GraphQL requests to the `POST /v1/graphql` endpoint with the `X-API-Key` header.
- Test GraphQL with real authentication, not mocks: When writing pytest tests for GraphQL endpoints, use real authentication from `.env` (`TEST_X_USER_API_KEY`) instead of mocking the authentication layer. Only mock the final Neo4j GraphQL response with `AsyncMock`. This ensures the entire authentication flow (API key → JWT generation → header construction) is tested end-to-end. Pattern: `with patch('httpx.AsyncClient.post') as mock_post: mock_response = AsyncMock(); mock_response.status_code = 200; mock_response.content = json.dumps({...}).encode(); mock_post.return_value = mock_response`. Verify the JWT was added to the Neo4j request by checking `call_args.kwargs["headers"]["Authorization"].startswith("Bearer ")`. This caught issues with JWT generation, JWKS formatting, and provider credential forwarding that mocks would have hidden.
- Register GraphQL routes separately from the v1 router: GraphQL routes go under the `/v1` prefix (via `v1_router.include_router(graphql_router)`), but the JWKS endpoint must be at root level (`app.include_router(jwks_router)` in `app_factory.py`). This is because Neo4j expects JWKS at the standard `/.well-known/jwks.json` path, not `/v1/.well-known/jwks.json`. Import both routers in `routers/v1/__init__.py` and register them appropriately. Without this split registration, Neo4j cannot validate JWTs and all GraphQL queries will fail with authentication errors.
- Clear the Python bytecode cache when adding new Temporal activities: When adding new activities to Temporal workers in local development, Python may load stale bytecode from `__pycache__` directories even after restarting the workers. Symptom: "Activity function X is not registered on this worker" errors despite the activity being defined and registered in code. Solution: force-kill the workers (`pkill -9 -f start_document_worker`), clear the bytecode cache (`find cloud_plugins/temporal -type d -name __pycache__ -exec rm -rf {} +`), then restart. In production this shouldn't happen, because proper deployment practices (Docker builds, pod restarts, etc.) always use fresh Python environments. For local development, consider adding `find . -type d -name __pycache__ -exec rm -rf {} +` to your worker restart scripts to prevent this class of issues.
- Use task queue versioning to avoid stuck workflows from previous runs: When Temporal workflows fail mid-execution in development, they can remain queued in Temporal Cloud and interfere with new test runs, causing "Post does not exist" errors from workflow replay. Even clearing the Python bytecode cache doesn't help, because the stuck workflows are stored server-side. Best solution: change the task queue name (e.g., `document-processing` → `document-processing-v2`) in both the workflow starter (`routers/v1/document_routes_v2.py`) and the workers (`start_all_workers.py`, `start_document_worker.py`). This forces new workflows onto a fresh queue, bypassing all old stuck workflows; workers listening on the old queue can be safely ignored. Pattern: add a version suffix or timestamp to task queue names during development iterations. In production, stuck workflows should be terminated via the Temporal UI or CLI rather than by changing task queues.
- Systematically kill and restart Temporal workers to resolve port conflicts and versioning mismatches: When workers fail to start with `OSError: [Errno 48] Address already in use`, old worker processes are still holding port 8080 (the health check server).
  1. Find workers: `ps aux | grep 'start_all_workers\|start_temporal_worker\|start_document_worker' | grep -v grep` lists all worker processes.
  2. Find port usage: `lsof -ti:8080` returns the PIDs using port 8080 (may include workers and other apps like Electron).
  3. Identify processes: `ps -p <PIDs> -o pid,command` shows what each process is.
  4. Kill workers: `kill -9 <worker_PID>` kills specific workers, or `kill -9 $(lsof -ti:8080)` force-kills everything on port 8080.
  5. Verify the port is free: `lsof -ti:8080` should return nothing.
  6. Restart: `cd /path/to/memory && nohup poetry run python start_all_workers.py > /tmp/workers.log 2>&1 &` starts the workers in the background.
  7. Verify startup: `tail -30 /tmp/workers.log` should show "✅ Successfully connected to Temporal", "Task Queue: memory-processing", "Task Queue: document-processing-v2", and "🚀 Starting both workers...".

  Critical for versioning: ensure task queue names match across all configuration points: `start_all_workers.py` (memory_task_queue, document_task_queue), `start_document_worker.py` (task_queue), `routers/v1/document_routes_v2.py` (task_queue for workflow dispatch), `cloud_plugins/temporal/workflows/batch_memory.py` (task_queue, memory_task_queue), `cloud_plugins/temporal/workflows/document_processing.py` (memory_task_queue for child workflows), `tests/conftest.py` (task_queue for the test worker), and `config/cloud.yaml` (temporal.task_queue). Current standard: `memory-processing` (no v2) and `document-processing-v2` (with v2). If workers show up in the logs but workflows aren't picked up, check the Temporal Cloud UI to see which task queue the workflows are actually hitting - this reveals routing mismatches. For development, prefer unversioned mode (`TEMPORAL_USE_VERSIONING=false`) to avoid build ID configuration in Temporal Cloud.
- Domain-specific metadata schemas outperform generic schemas for H-COND scoring: When using holographic embeddings with 13-frequency metadata, generic schema fields like `mega_domain="Science"`, `domain="Biology"`, `entity_type="Gene"` produce nearly identical metadata for all documents in a domain. This causes uniformly high alignment scores (0.72-0.88) for both relevant AND non-relevant documents, eliminating the discriminative power of H-COND. Instead, use domain-specific schemas with highly discriminating fields. For scientific/biomedical text, effective fields include: `primary_entity` (a specific gene/protein: "APOE4", "TP53"), `molecular_mechanism` ("lipid metabolism dysregulation"), `disease_condition` ("Late-onset Alzheimer's"), and `key_finding` (a 5-10 word summary of the claim). These produce metadata that varies across documents, enabling phase alignment to discriminate relevant from non-relevant.
- Map metadata fields to frequencies based on semantic granularity: The 13 brain-inspired frequencies (0.1Hz-70Hz) should correspond to semantic layers from broad context to fine details:
  - LOW frequencies (0.1-2Hz): broad, stable context that varies LESS across documents (`research_field`, `study_type`)
  - MID frequencies (4-12Hz): specific entities that vary MORE across documents (`primary_entity`, `secondary_entity`, `molecular_mechanism`, `causal_relationship`)
  - HIGH frequencies (18-40Hz): fine-grained details that vary MOST (`effect_direction`, `experimental_method`, `statistical_evidence`, `organism_model`)
  - ULTRA-HIGH frequencies (50-70Hz): the most discriminating details (`disease_condition`, `key_finding`)

  Documents with matching fine-grained details (high frequencies) should score much higher than those matching only broad context (low frequencies).
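An illustrative mapping built from the band structure above. The specific frequency values assigned to each field are assumptions for illustration, and only the 12 fields named above appear here; the real `FREQUENCY_SCHEMA` in `scifact_llm_13freq_eval.py` covers all 13 frequencies:

```python
# Hypothetical frequency -> field mapping following the four bands above.
FREQUENCY_SCHEMA = {
    # LOW (0.1-2 Hz): broad context, varies least
    0.1: "research_field",
    2.0: "study_type",
    # MID (4-12 Hz): specific entities
    4.0: "primary_entity",
    8.0: "secondary_entity",
    10.0: "molecular_mechanism",
    12.0: "causal_relationship",
    # HIGH (18-40 Hz): fine-grained details
    18.0: "effect_direction",
    25.0: "experimental_method",
    32.0: "statistical_evidence",
    40.0: "organism_model",
    # ULTRA-HIGH (50-70 Hz): most discriminating
    50.0: "disease_condition",
    70.0: "key_finding",
}
print(len(FREQUENCY_SCHEMA), "fields mapped")
```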
- LLM extraction prompts need domain-specific guidance: Generic prompts like "Extract metadata from this text" produce generic outputs. Include explicit instructions with examples: "Extract the MAIN gene/protein/cell (e.g., 'APOE4', 'TP53', 'CD8+ T cells')" instead of just "entity". Add critical reminders: "Extract SPECIFIC terms that would differentiate this from similar texts. Generic terms like 'Biology', 'Science', 'cells' reduce retrieval precision."
- Use DSPy for metadata extraction optimization: When the alignment gap (avg_relevant_alignment - avg_nonrelevant_alignment) is low, the metadata extraction isn't discriminating well. DSPy's BootstrapFewShot or MIPRO optimizers can automatically improve the extraction prompt using SciFact (or similar) relevance labels as the training signal. The optimization metric should maximize the alignment gap, not just extraction accuracy. Install with `poetry add dspy-ai` (requires the Python version restriction `>=3.11,<3.14` due to dependency constraints).
- Debug metadata extraction by logging actual values: When H-COND scoring doesn't improve over the baseline, add verbose logging to see the actual extracted metadata for queries and documents side by side. Pattern: log the query text, then each of the 13 frequency values; log the doc text (truncated), then each of the 13 frequency values. Look for fields that are identical across all documents (bad) vs fields that vary meaningfully (good). This was implemented in `scifact_llm_13freq_eval.py` with the `FREQUENCY_SCHEMA` mapping.
- Use ONLY embedding-based phases for FREE_TEXT fields, never the hash fallback: For complex interference similarity to work correctly, semantically similar text must produce similar phases. Hash-based phase computation (SHA256 mod 2π) is completely random - "Quality of Life" and "Life Quality" get entirely different phases despite being semantically equivalent. This destroys the interference pattern intended to boost relevant documents. In `PhaseComputer`, remove the hash fallback (`hash_weight = 0.3`) and use only embedding-based phase computation. The embedding → phase mapping uses `(np.tanh(np.mean(emb) * 10) + 1) / 2` to produce phases in [0, 1]. While this loses some semantic information by collapsing to a scalar, it preserves the critical property that similar text → similar phases → constructive interference. Root cause discovered: interference scores were consistently LOWER than the base cosine similarity because the 30% hash component introduced random phase noise that caused destructive rather than constructive interference.
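The embedding → phase formula above, in runnable stdlib form (the real `PhaseComputer` uses numpy; the tiny 3-dimensional embeddings here are stand-ins for illustration):

```python
import math
from typing import Sequence


def embedding_to_phase(emb: Sequence[float]) -> float:
    """Collapse an embedding to a deterministic phase in [0, 1].

    Same formula as the note above: (tanh(mean(emb) * 10) + 1) / 2.
    Similar embeddings -> similar means -> similar phases.
    """
    mean = sum(emb) / len(emb)
    return (math.tanh(mean * 10) + 1) / 2


a = embedding_to_phase([0.11, 0.09, 0.10])
b = embedding_to_phase([0.10, 0.11, 0.09])   # near-duplicate text
c = embedding_to_phase([-0.4, -0.5, -0.45])  # unrelated text
assert abs(a - b) < 0.01  # similar text -> similar phase
assert abs(a - c) > 0.5   # different text -> distant phase
print(a, b, c)
```

A SHA256-based phase would give `a` and `b` unrelated values, which is exactly the continuity property the hash fallback violates.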
- Complex interference requires query AND document phases to align well: The interference equation `|ψ_q + ψ_d|² / (|ψ_q|² + |ψ_d|²)` returns 2.0 (fully constructive) when `cos(θ_q - θ_d) = 1` (phases match) and 0.0 (fully destructive) when phases oppose. For this to discriminate relevant from non-relevant documents, the phase computation must satisfy: (1) same semantic content → same phase (deterministic), (2) similar semantic content → similar phase (continuous), (3) different semantic content → different phase (discriminative). Hash-based phases violate property (2) completely, making interference worse than random. Embedding-based phases satisfy all three properties, but with reduced sensitivity due to the scalar collapse.
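The interference equation can be checked numerically with unit-amplitude states; note that for `|ψ_q| = |ψ_d| = 1` the expression reduces to the closed form `1 + cos(θ_q - θ_d)`:

```python
import cmath
import math


def interference_score(theta_q: float, theta_d: float) -> float:
    """|psi_q + psi_d|^2 / (|psi_q|^2 + |psi_d|^2) for unit-amplitude states."""
    psi_q = cmath.exp(1j * theta_q)
    psi_d = cmath.exp(1j * theta_d)
    return abs(psi_q + psi_d) ** 2 / (abs(psi_q) ** 2 + abs(psi_d) ** 2)


# Matching phases: fully constructive -> 2.0
assert abs(interference_score(0.3, 0.3) - 2.0) < 1e-9
# Opposing phases: fully destructive -> 0.0
assert interference_score(0.0, math.pi) < 1e-9
# Equivalent closed form: 1 + cos(theta_q - theta_d)
assert abs(interference_score(0.2, 1.1) - (1 + math.cos(0.2 - 1.1))) < 1e-9
```

The closed form makes property (2) concrete: a small phase error costs little near alignment (cos is flat at 0), but random hash phases land anywhere on the curve, averaging out the constructive boost.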
- After each completed task, append 2-3 sentence learnings to this file and also create a Memory item via `scripts/add_agent_learning.py` using `customMetadata.category=memory_server_eng_learnings`. This ensures the learnings are queryable within the product.
- In `upload_document_v2`, ensure incoming metadata is parsed into `MemoryMetadata` and `PreferredProvider` is coerced from string to enum before invoking the workflow.
- Add unit tests for hierarchical processing: the core generators (`DocumentToMemoryTransformer`, `generate_optimized_memory_structures`) and the activities `extract_structured_content_from_provider`, `generate_llm_optimized_memory_structures`, `create_hierarchical_memory_batch`, and `link_batch_memories_to_post`.