govecDB is a lightweight, high-performance, in-memory Vector Database built from scratch in Go. It supports dense vector embedding storage, cosine similarity search, and thread-safe persistence.
Designed as a systems programming project to understand the internals of vector stores like Pinecone or Weaviate.
- Core Arithmetic: Implements Cosine Similarity and Euclidean Magnitude calculations.
-
In-Memory Storage: Uses Go Maps for
$O(1)$ Insert and Delete operations. - Search: Performs k-Nearest Neighbor (k-NN) search using linear scanning and score-based sorting.
-
Concurrency Safe: Implements
sync.RWMutexto handle concurrent Reads (Search/Save) and Writes (Insert) efficiently. -
Persistence: Background snapshotting to disk using Go's binary
encoding/gobformat. - HTTP API: REST-like JSON endpoints for inserting and querying vectors.
The core storage engine moves away from simple slices to a Hash Map structure to optimize lifecycle management:
- Storage:
map[string]*VectorPoint - Optimization: Magnitudes are pre-calculated upon insertion to avoid repeated
sqrtoperations during search queries (O(1) lookup vs O(D) calculation).
The system uses a Reader-Writer Lock (sync.RWMutex):
- Search & Save: Acquire
RLock()(Multiple readers allowed in parallel). - Insert & Delete: Acquire
Lock()(Exclusive access). - Snapshotting: A background Goroutine wakes up every 5 minutes to dump the memory state to
vector.gobwithout blocking search traffic.
-
Clone the repository
git clone https://github.com/SyedAsadK/govecDB.git cd govecDB -
Run the Server
go run main.go
Server will start on port
:7777
Adds a new vector to the database. If the ID exists, it returns an error (idempotency check).
-
Endpoint:
POST /vectors -
Body:
{ "id": "doc_1", "vector": [0.12, 0.45, 0.88, -0.23] }
cURL Example:
curl -X POST http://localhost:7777/vectors \
-H "Content-Type: application/json" \
-d '{"id": "doc_1", "vector": [1.0, 2.0, 3.0]}'Finds the top k most similar vectors to the query vector.
-
Endpoint:
POST /search -
Body:
{ "vector": [0.12, 0.45, 0.88, -0.23], "k": 3 }
cURL Example:
curl -X POST http://localhost:7777/search \
-H "Content-Type: application/json" \
-d '{"vector": [1.0, 2.0, 3.1], "k": 2}'Response:
[
{
"id": "doc_1",
"score": 0.998,
"data": [1.0, 2.0, 3.0]
}
]- Indexing: Move from Brute Force ($O(N)$) to Approximate Nearest Neighbor (ANN) using HNSW or IVF for large datasets.
- WAL (Write Ahead Logging): Implement an append-only log for durability guarantees between snapshots.
- Sharding: Distribute data across multiple nodes.
-
Quantization: Compress vectors to
int8to reduce memory usage.
