mlliarm/determinES


determinES

A grounded diagnostic framework for Elasticsearch that reduces LLM reasoning errors. It uses a Python-to-Prolog pipeline to provide a deterministic "source of truth," ensuring that AI-generated reports are anchored in verified cluster metrics and official troubleshooting logic.


This project is a small deterministic toolchain that turns Elastic support diagnostics (JSON from elastic-support-diagnostics or similar bundles) into Prolog facts, runs logical rules against them, and treats the engine’s output as the source of truth for “what looks unhealthy” in that snapshot—before layering any narrative or log correlation on top.

What it does

  1. Extract selected metrics from a single unzipped diagnostic folder (cluster_health.json, cluster_pending_tasks.json, nodes_stats.json, mapping.json) and emit facts.pl as Prolog facts (e.g. node_heap_percent('node-1', 72).).
  2. Verify those facts with SWI-Prolog and ~/rules.pl, which defines an issue/2 predicate (entity + human-readable reason) for conditions aligned with the themes of Elastic’s “Fix common cluster issues” documentation (disk pressure when the corresponding facts exist, circuit breakers, JVM heap bands, CPU usage and hot spotting, cluster colour, unassigned shards, pending tasks, thread-pool rejections, a mapping “explosion” heuristic, etc.).
  3. Interpret (optional, outside this repo): use the printed issue/2 lines to drive log grepping and remediation suggestions, without using raw JSON to contradict what Prolog already proved from facts.pl.
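The extract step can be sketched in a few lines of Python. This is not the repository's diag_to_prolog.py; it is a minimal illustration, assuming the standard Elasticsearch nodes stats layout (nodes.<id>.name and nodes.<id>.jvm.mem.heap_used_percent) and the status field of cluster health. The helper names (prolog_atom, heap_facts, status_fact, write_facts) are hypothetical.

```python
import json
from pathlib import Path


def prolog_atom(text: str) -> str:
    """Quote a string as a Prolog atom, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def heap_facts(nodes_stats: dict) -> list[str]:
    """Emit node_heap_percent/2 facts (hypothetical helper).

    Assumes the nodes stats layout:
    {"nodes": {<id>: {"name": ..., "jvm": {"mem": {"heap_used_percent": N}}}}}
    """
    facts = []
    for node in nodes_stats.get("nodes", {}).values():
        percent = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if percent is None:
            continue  # metric missing: emit no fact, so dependent rules cannot fire
        facts.append(f"node_heap_percent({prolog_atom(node['name'])}, {int(percent)}).")
    return facts


def status_fact(cluster_health: dict) -> str:
    """Emit cluster_status/1 from the health document's "status" field."""
    return f"cluster_status({prolog_atom(cluster_health['status'])})."


def write_facts(bundle: Path) -> Path:
    """Read two JSON files from the bundle and write facts.pl next to them."""
    nodes_stats = json.loads((bundle / "nodes_stats.json").read_text())
    health = json.loads((bundle / "cluster_health.json").read_text())
    lines = [status_fact(health), *heap_facts(nodes_stats)]
    out = bundle / "facts.pl"  # per-bundle output, so other bundles are untouched
    out.write_text("\n".join(lines) + "\n")
    return out
```

Skipping absent metrics, rather than emitting a default value, mirrors the design note that rules for a missing metric simply do not fire.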

Each diagnostic bundle gets its own facts.pl inside that folder, so running the pipeline on another date’s bundle does not overwrite previous outputs.


Architecture

```mermaid
flowchart LR
  subgraph inputs [Diagnostic bundle]
    CH[cluster_health.json]
    PT[cluster_pending_tasks.json]
    NS[nodes_stats.json]
    MP[mapping.json]
  end

  PY["diag_to_prolog.py\n(SWI: not used here)"]
  FP["facts.pl\n(per bundle)"]
  PL["rules.pl\n(issue/2 rules)"]
  SW["swipl\n(consult facts + rules)"]
  OUT["stdout:\nEntity: Reason"]

  CH --> PY
  PT --> PY
  NS --> PY
  MP --> PY
  PY --> FP
  FP --> SW
  PL --> SW
  SW --> OUT
```
| Layer | Role | Typical location |
|---|---|---|
| Extractor | Reads JSON paths relative to the bundle directory; writes facts.pl in that same directory | `~/diag_to_prolog.py` |
| Facts | Ground atoms Prolog can query (`cluster_status/1`, `node_heap_percent/2`, `index_field_count/2`, …) | `<bundle>/facts.pl` |
| Rules | Declares `issue(Entity, Reason)` from those facts; its `:- consult('facts.pl').` directive expects the current working directory to be the bundle directory | `~/rules.pl` |
| Engine | SWI-Prolog evaluates the rules | `swipl` on PATH |
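Because the Rules layer depends on the working directory, a wrapper that drives the Engine from Python has to set it explicitly. The following is a minimal sketch, assuming swipl is on PATH and rules.pl sits in the home directory; swipl_command and run_rules are hypothetical names, and the goal string is copied from the "How to run" section.

```python
import subprocess
from pathlib import Path

# Goal copied from the "How to run" section: print every issue(E, R) as "E: R".
PRINT_ISSUES = 'forall(issue(E, R), format("~w: ~w~n", [E, R])), halt.'


def swipl_command(rules: Path) -> list[str]:
    """Build the argv list: -q quiets the banner, -s loads rules.pl, -g runs the goal."""
    return ["swipl", "-q", "-s", str(rules), "-g", PRINT_ISSUES]


def run_rules(bundle: Path, rules: Path = Path.home() / "rules.pl") -> str:
    """Run SWI-Prolog with the bundle as working directory (like `cd "<BUNDLE>"`)."""
    result = subprocess.run(
        swipl_command(rules),
        cwd=bundle,           # where rules.pl is expected to find facts.pl
        capture_output=True,  # collect the "Entity: Reason" lines
        text=True,
        check=True,           # raise CalledProcessError if swipl exits non-zero
    )
    return result.stdout
```

No shell is involved, so `~` is not expanded; that is why the default path is built with Path.home() instead.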

The Cursor skill elasticsearch-diagnostic-prolog-verifier (~/.cursor/skills/elasticsearch-diagnostic-prolog-verifier/SKILL.md) documents the operational sequence (extract → cd to bundle → run swipl → synthesize with logs only as explanation, not as a veto of Prolog).


How to run

Replace <BUNDLE> with the absolute path to an unzipped diagnostic directory (the folder that contains nodes_stats.json).

```bash
# 1. Generate facts inside the bundle
python3 ~/diag_to_prolog.py "<BUNDLE>"

# 2. Verify from that directory (facts.pl is loaded by relative path)
cd "<BUNDLE>" && swipl -q -s ~/rules.pl \
  -g 'forall(issue(E, R), format("~w: ~w~n", [E, R])), halt.'
```
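For the optional Interpret step, the printed lines can be turned back into (entity, reason) pairs. This is a minimal sketch, assuming every issue line has the "Entity: Reason" shape produced by the format/2 goal above and that entities never contain ": "; parse_issues is a hypothetical helper. It also drops exact repeats, which the design notes say hot spotting can produce.

```python
def parse_issues(stdout: str) -> list[tuple[str, str]]:
    """Split each line on the first ": " and keep the first occurrence of each pair."""
    seen: set[tuple[str, str]] = set()
    issues: list[tuple[str, str]] = []
    for line in stdout.splitlines():
        entity, sep, reason = line.partition(": ")
        if not sep:
            continue  # not an "Entity: Reason" line (e.g. a blank line); skip it
        pair = (entity, reason)
        if pair not in seen:  # collapse duplicate lines, preserving order
            seen.add(pair)
            issues.append(pair)
    return issues
```

Deduplication only affects presentation; the set of distinct issue/2 answers Prolog proved is unchanged.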

Optional: custom output path.

```bash
python3 ~/diag_to_prolog.py "<BUNDLE>" -o /tmp/my-facts.pl
# rules.pl consults 'facts.pl', so change that consult to my-facts.pl and run swipl from the directory that contains it.
```

Default output is <BUNDLE>/facts.pl.


Design notes

  • Deterministic core: Thresholds and rule structure live in rules.pl; numeric grounding for each run lives in facts.pl from diag_to_prolog.py. If a metric is missing from the Python emitter, the corresponding rules simply do not fire (or you extend the script to emit new fact predicates and add matching rules).
  • Hot spotting can produce many duplicate lines for the same node because several peer nodes satisfy the comparison; the logic is still one relation (is_hotspot/1 → issue/2).
  • Mapping “explosion” uses a heuristic (counting occurrences of 'type': in each index's serialized mapping), so internal indices such as .internal.alerts-* may score high. Treat such a hit as a verified signal under the current rules and triage it operationally (alerting, ILM, field caps); concluding that “Prolog is wrong” requires changing the heuristic itself.
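The mapping heuristic can be approximated as follows. This is a sketch under assumptions: that mapping.json is keyed by index name, and that "serialized mapping" means Python's str() of each index's mapping dict (which renders keys as 'type':). index_field_counts and field_count_facts are hypothetical names, not the repository's code.

```python
def index_field_counts(mapping: dict) -> dict[str, int]:
    """Approximate field counts: occurrences of "'type':" in str() of each index's mapping."""
    return {index: str(body).count("'type':") for index, body in mapping.items()}


def field_count_facts(counts: dict[str, int]) -> list[str]:
    """Emit index_field_count/2 facts with the index name as a quoted atom."""
    facts = []
    for index, count in counts.items():
        atom = index.replace("\\", "\\\\").replace("'", "\\'")  # escape for a quoted atom
        facts.append(f"index_field_count('{atom}', {count}).")
    return facts
```

Being a text count, it overcounts in some cases: a field literally named type with its own type key contributes two hits. That is one reason a high score is a signal to triage rather than an exact field total.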

Related files

| File | Purpose |
|---|---|
| `~/diag_to_prolog.py` | JSON → facts.pl |
| `~/rules.pl` | Prolog rules + `consult('facts.pl')` |
| `~/.cursor/skills/elasticsearch-diagnostic-prolog-verifier/SKILL.md` | Agent workflow and guardrails |

This README is only documentation; it does not install dependencies. You need Python 3 (standard library only; it reads the bundle's JSON files) and SWI-Prolog for the verification step.

About

A Python to Prolog pipeline that bounds Elasticsearch LLM errors by grounding ES diagnostics analysis in verified metrics.
