This project is an in-house solution (for more details, see history) to provide developers visibility into PostgreSQL instances. Developers have no access to production deployments, but still need to see data (such as query statistics or pg_stat_activity/pg_locks). This catalog exposes that data.
It can handle PostgreSQL HA setups with primary and replica instances, and saves data in a timeseries database (TimescaleDB).
Originally built for VM-based deployments, it "should" do fine as a k8s sidecar (the extractor, that is). Untested, though!
- PostgreSQL 14 or higher
This all started as a hackathon project, written in Python. It was then rewritten in Rust for improved performance and safety (as well as improved development speed). I am now releasing it as an open-source variant, written in Go. For various reasons (mostly the fact that the rust version is tightly coupled to the in-house deployments of PG), a rewrite for the open-source version is in order. Since I (the author and the rust dev, but not the original dev) want to learn Go this makes sense to me. Additionally, we're also moving from a JSON response to gRPC communication.
Originally, I wrote a long text on what the PostgreSQL deploys look like where I'm at, but that doesn't matter here. The only thing you need to know is that traffic only goes one way: collectors can reach extractors, but extractors cannot reach collectors. It's pull, not push. An unfortunate side-effect of our in-house setup (multiple AZs, VLANs and etc).
Simple gRPC server that exposes the data we want from a given PostgreSQL server.
TODO: specify user access levels for the extractor.
The collector reaches out to a specific host and collects data from that host. It then inserts that data into the central database.
A simple API service, exposes the data collected for any given frontend to use/display as they see fit.
Before the extractor can connect to a PostgreSQL instance, the following must be configured on the target server:
Create the monitoring schema and cluster configuration table:
CREATE SCHEMA IF NOT EXISTS monitoring;
CREATE TABLE IF NOT EXISTS monitoring.cluster_config (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
-- Required: Set cluster name (used for cluster identification)
INSERT INTO monitoring.cluster_config (key, value)
VALUES ('cluster_name', 'your-cluster-name');
-- Required: Set pgBackRest stanza name (if using pgBackRest)
INSERT INTO monitoring.cluster_config (key, value)
VALUES ('stanza', 'your-stanza-name');Important: Replace 'your-cluster-name' and 'your-stanza-name' with appropriate values for your environment.
The PostgreSQL user that the extractor uses must have:
SELECTpermission onmonitoring.cluster_configtable- Access to system catalog tables (
pg_database,pg_class, etc.) - Recommended: Grant
pg_monitorrole for read-only access to statistics views
-- Example: Create monitoring user (certificate-based authentication)
CREATE USER pg_inventory_extractor;
-- Grant necessary permissions
GRANT USAGE ON SCHEMA monitoring TO pg_inventory_extractor;
GRANT SELECT ON monitoring.cluster_config TO pg_inventory_extractor;
GRANT pg_monitor TO pg_inventory_extractor;Authentication: Configure pg_hba.conf for certificate-based authentication:
# Allow certificate authentication for monitoring user
# TYPE DATABASE USER ADDRESS METHOD OPTIONS
hostssl all pg_inventory_extractor 0.0.0.0/0 cert clientcert=verify-full
Note: The extractor requires read-only access. It does not modify any data in the target PostgreSQL instance.
TODO!
TODO!