This repository shows how to use the simple-data-analysis library, maintained by Nael Shiab, to set up a free and open-source local RAG (retrieval-augmented generation) search, which lets you search your data using natural language.
The default dataset is a list of AI-generated recipes. Here's an example query and result.
🔍 Question:
I am looking for a fruity pastry for breakfast.
📝 Answer:
The only pastry in the provided data that contains fruit and would be suitable for breakfast is **Apple Pie**. It uses sliced apples (such as Granny Smith or Honeycrisp) as the main fruit ingredient.
⏱️ Query duration: 8235ms
📊 Rows searched: 10
🧠 Thinking level: minimal
🤖 Model used: gpt-oss:20b
You can also return just the data, before it's passed to the LLM, which is much faster.
- The data is loaded and converted to vectors (embeddings).
- When a question is asked, two searches happen in parallel:
  - A vector search:
    - The question is converted to vectors.
    - The question vectors are compared to the original data vectors.
    - The closest data entries are kept.
  - A BM25 search:
    - The words in the question are searched for in the original data text.
    - This search takes into account how many times the words appear and how long the data entries are.
    - The most relevant data entries are kept.
- The results of the vector and BM25 searches are then reranked to prioritize entries that rank high in both search techniques.
- The selected data entries are sent to an LLM instructed to answer the question based only on the provided data, reducing the risk of hallucinations.
On a technical note, indexes are created for the embeddings and for the BM25 search, and the data is cached locally. The reranking uses reciprocal rank fusion (RRF).
1. Fork and then clone this repository.
2. Install Deno.
3. Install Ollama.
4. Pull the nomic-embed-text model for the embeddings: `ollama pull nomic-embed-text`
5. Pull the gpt-oss:20b LLM model: `ollama pull gpt-oss:20b`
6. Create an `.env` file from the `.env.example`.
7. Install all dependencies: `deno install`
8. Run `deno task data` to load the data, create the embeddings, and write the DB to disk.
9. Run `deno task server` in one terminal to load and keep the DB in memory.
10. Run `deno task query` in another terminal to ask a default question. The first time, it will load the LLM into memory. After that, it will be faster. Ollama keeps models in memory for 5 minutes by default.
All query options can be combined. Here are the available parameters:

Basic query with a custom question:

```
deno task query "I am looking for a pastry with herbal flavours."
```

Thinking level (`-t` or `--thinking`): controls the depth of reasoning the LLM uses. Options: `minimal`, `low`, `medium`, `high`.

```
deno task query "What dessert has chocolate?" -t high
```

Number of results (`-n` or `--numDocs`): how many semantically closest data points to pass to the LLM (default: 10).

```
deno task query "What dessert has chocolate?" -n 5
```

Endpoint (`-e` or `--endpoint`): choose between `query` (default, uses the LLM to answer) or `data` (returns just the search results without the LLM, much faster).

```
# Get an LLM-generated answer
deno task query "desserts with chocolate" -e query

# Get just the matching data without LLM processing
deno task query "desserts with chocolate" -e data
```

Combined example:

```
deno task query "I am looking for a pastry with herbal flavours." -t high -n 5 -e query
```

If you want to start fresh and remove the cache, run `deno task clean`.
Reach out to Nael Shiab.