This directory contains a draft self-hosted data distribution portal for
data.realgoodresearch.com.
This repository is deployment-specific as written. Before reusing it for another organization or public demo, replace the example domain names, network ranges, and sample credentials references with values appropriate for that environment.
The default docker-compose.yml is the production cloud stack:
- `postgres`: Catalog and token-grant database, running on the cloud host
- `broker-api`: FastAPI service that lists datasets and returns policy-aware download decisions
- `nginx`: TLS gateway for the frontend and reverse proxy for multiple subdomains
- `certbot`: Automated Let's Encrypt renewal sidecar
The local development stack in docker-compose.local.yml keeps all services together and adds:
- `minio`: Private S3-compatible object storage
- On a fresh Ubuntu cloud host, install the system dependencies, including Docker and Quarto:
```
sudo ./scripts/install-ubuntu-dependencies.sh --configure-firewall
```
The script sets the VM hostname to `realgooddata` by default. Use `--hostname NAME` or `--skip-hostname` if you need different behavior.
- Copy `.env.example` to `.env`.
- Set Postgres credentials, admin credentials, and the cloud Postgres data path.
- Set `MINIO_ENDPOINT` to the local MinIO API origin that the cloud broker can reach, usually `https://your-minio-origin.example.com:9000`.
- Create a dedicated MinIO broker user using the policy in `minio/policies/broker-readonly.json`, then set `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` to that user's credentials.
- On the local storage server, run MinIO and allow inbound `9000` only from the cloud host's static IP address.
- Start the cloud stack:
```
docker compose up -d --build
```
- Request the first certificate for `data.realgoodresearch.com`:
```
./scripts/request-certificate.sh data.realgoodresearch.com
```
- Reload Nginx once the certificate is issued:
```
docker compose restart nginx
```
The certbot container will renew existing certificates automatically. The Nginx container also watches the certificate directory and reloads itself after renewals.
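Once Nginx has reloaded, a quick external check confirms the certificate and the gateway are live. This is an optional sanity check rather than a scripted step; adjust the hostname if you replaced the example domain:
```
# Expect a 200 (or a redirect) and a certificate issued for data.realgoodresearch.com
curl -I https://data.realgoodresearch.com
```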
Use the local compose file when you want Postgres, MinIO, broker, Nginx, and certbot on one machine:
```
cp .env.local.example .env
docker compose -f docker-compose.local.yml up -d --build
```
The local broker uses the local MinIO root credentials for convenience. Production should use the dedicated `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY` variables in `.env.example`.
The broker now serves a minimal admin panel at /admin.
The nginx config currently restricts /admin to:
- 127.0.0.1
- ::1
- 10.6.0.0/24
- 192.168.50.0/24
Anything outside that VPN/local range receives 403 Forbidden before the login
page is reached.
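One way to verify the restriction is to request the admin login page from different networks. This is an optional check based on the ranges listed above, not a documented setup step:
```
# From outside the allowed ranges this should print 403;
# from the VPN/local ranges it should print 200 for the login page.
curl -s -o /dev/null -w '%{http_code}\n' https://data.realgoodresearch.com/admin/login
```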
Set these additional env vars in `.env` before rebuilding the broker:
```
ADMIN_USERNAME=admin
ADMIN_PASSWORD=replace-with-a-long-random-password
ADMIN_SESSION_SECRET=replace-with-a-separate-long-random-secret
ADMIN_SESSION_TTL_SECONDS=43200
```
Then rebuild the broker and restart Nginx:
```
docker compose up -d --build broker-api nginx
```
For the local development stack, add `-f docker-compose.local.yml` to the command.
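One way to generate the random values is the same `openssl rand` pattern this document uses later for the MinIO broker secret:
```
# Generate two independent random values to paste into .env
printf 'ADMIN_PASSWORD=%s\n' "$(openssl rand -base64 36)"
printf 'ADMIN_SESSION_SECRET=%s\n' "$(openssl rand -base64 36)"
```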
The first admin release supports:
- env-based login at `/admin/login`
- token creation and revocation
- collection create/edit
- dataset create/edit
- bulk import from a MinIO bucket/prefix into a collection
Bulk import behavior:
- target collection is required
- imported rows use the selected `classification` and `visibility`
- `storage_bucket`, `storage_key`, and `file_size_bytes` are populated from MinIO
- title and slug are auto-generated from the object filename unless a close `storage_key` match is found, in which case title and summary are copied from the existing dataset
- existing catalog rows for the same bucket/object key are skipped
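Before running a bulk import, it can help to preview which objects the chosen bucket/prefix actually contains. A minimal sketch, assuming the `mc` alias named `local` configured in the MinIO section below; the bucket and prefix names are placeholders:
```
# List every object the import would consider under the chosen bucket/prefix
mc ls --recursive local/example-bucket/example-prefix/
```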
In production, MinIO runs on the local/internal storage server, not on the cloud
host. The broker reaches it through MINIO_ENDPOINT.
On the local storage server, MinIO stores object data on the host path defined by
MINIO_DATA_PATH. That path is mounted into the container as /data.
Example:
```
MINIO_DATA_PATH=/data/raid/minio
MINIO_BIND_ADDRESS=0.0.0.0
MINIO_API_PORT=9000
MINIO_CONSOLE_PORT=9001
```
Forward or expose only the MinIO API port, 9000, and firewall it so the only allowed source is the cloud host's static IP address. Keep the MinIO console, 9001, local-only or reachable through an SSH tunnel.
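As a sketch of that firewall rule, assuming the storage server uses ufw and using a placeholder for the cloud host's static IP:
```
# Allow the MinIO API only from the cloud host, deny it from everywhere else
sudo ufw allow from 203.0.113.10 to any port 9000 proto tcp
sudo ufw deny 9000/tcp
# Leave 9001 (the console) closed and reach it over an SSH tunnel instead
```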
If you want to run just the production MinIO origin from this repo on the local
storage server, use the local compose file and start only minio:
```
cp .env.local.example .env
docker compose -f docker-compose.local.yml up -d minio
```
For local development, keep MinIO bound to loopback:
```
MINIO_BIND_ADDRESS=127.0.0.1
MINIO_API_PORT=9000
MINIO_CONSOLE_PORT=9001
```
`MINIO_BIND_ADDRESS` controls which host interface exposes the MinIO S3 API and console in the local stack.
Production MinIO traffic from the cloud broker to the local storage server must use HTTPS. Keep normal DNS in Lightsail, and delegate only the ACME challenge name to Route 53 for automatic DNS-01 renewal.
- In Route 53, create a public hosted zone: `_acme-challenge.minio.realgoodresearch.com`
- Copy the hosted zone's four Route 53 name servers.
- In the Lightsail DNS zone for `realgoodresearch.com`, add an `NS` record:
  - Record name: `_acme-challenge.minio`
  - Type: `NS`
  - Value: the four Route 53 name servers
- Create an IAM user for certbot and attach a policy scoped to that Route 53 hosted zone:
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"route53:ListHostedZones",
"route53:GetChange"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "route53:ChangeResourceRecordSets",
"Resource": "arn:aws:route53:::hostedzone/ROUTE53_ACME_ZONE_ID",
"Condition": {
"ForAllValues:StringEquals": {
"route53:ChangeResourceRecordSetsRecordTypes": ["TXT"]
}
}
}
]
}
```
- On the local storage server, install certbot's Route 53 plugin and add the IAM access key to `/root/.aws/credentials`:
```
sudo apt install -y certbot python3-certbot-dns-route53
sudo install -m 0700 -d /root/.aws
sudo install -m 0600 /dev/null /root/.aws/credentials
sudoedit /root/.aws/credentials
```
```
[default]
aws_access_key_id = replace-with-certbot-access-key
aws_secret_access_key = replace-with-certbot-secret-key
```
- Issue the certificate and copy it into MinIO's cert directory:
```
sudo certbot certonly --dns-route53 -d minio.realgoodresearch.com
sudo install -m 0755 -d /home/realgooddata/minio-certs
sudo install -m 0644 /etc/letsencrypt/live/minio.realgoodresearch.com/fullchain.pem /home/realgooddata/minio-certs/public.crt
sudo install -m 0600 /etc/letsencrypt/live/minio.realgoodresearch.com/privkey.pem /home/realgooddata/minio-certs/private.key
```
- On the local storage server, set:
```
MINIO_CERTS_PATH=/home/realgooddata/minio-certs
MINIO_HEALTHCHECK_URL=https://127.0.0.1:9000/minio/health/live
```
- Restart local MinIO:
```
docker compose -f docker-compose.local.yml up -d --force-recreate minio
```
- Add the renewal deploy hook:
```
sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-minio.sh >/dev/null <<'SH'
#!/bin/sh
set -eu
DOMAIN=minio.realgoodresearch.com
CERT_DIR=/home/realgooddata/minio-certs
REPO_DIR=/home/realgooddata
install -m 0644 "/etc/letsencrypt/live/${DOMAIN}/fullchain.pem" "${CERT_DIR}/public.crt"
install -m 0600 "/etc/letsencrypt/live/${DOMAIN}/privkey.pem" "${CERT_DIR}/private.key"
cd "${REPO_DIR}"
docker compose -f docker-compose.local.yml restart minio
SH
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-minio.sh
sudo certbot renew --dry-run
```
- On the cloud host, set:
```
MINIO_ENDPOINT=https://minio.realgoodresearch.com:9000
MINIO_SECURE=true
```
Then restart the broker:
```
docker compose up -d --force-recreate broker-api
```
Do not use the MinIO root credentials in the cloud broker `.env`. Create a dedicated broker user with a read-only policy instead.
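From the cloud host, a quick reachability check of the HTTPS endpoint can catch firewall or certificate problems before the broker restart. This is an optional check; the path matches the `MINIO_HEALTHCHECK_URL` used on the storage server:
```
# Should print 200 and present a certificate for minio.realgoodresearch.com
curl -sS -o /dev/null -w '%{http_code}\n' https://minio.realgoodresearch.com:9000/minio/health/live
```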
On the local storage server, configure mc with the MinIO root credentials from
that server's .env:
```
mc alias set local http://127.0.0.1:9000 'your-minio-root-user' 'your-minio-root-password'
```
Create or update the broker read-only policy from this repo:
```
mc admin policy create local broker-readonly minio/policies/broker-readonly.json
```
Create the broker user, attach the policy, and print the cloud `.env` values:
```
BROKER_SECRET="$(openssl rand -base64 36)"
mc admin user add local broker-service-account "$BROKER_SECRET"
mc admin policy attach local broker-readonly --user broker-service-account
printf 'MINIO_ACCESS_KEY=%s\n' 'broker-service-account'
printf 'MINIO_SECRET_KEY=%s\n' "$BROKER_SECRET"
```
Then set those values on the cloud host:
```
MINIO_ACCESS_KEY=broker-service-account
MINIO_SECRET_KEY=replace-with-the-generated-secret
```
The default broker policy can list buckets and read objects across the MinIO deployment. If the portal should only serve specific buckets, replace the wildcard bucket resources in the policy with explicit bucket ARNs before creating or updating it.
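As a sketch of that narrowing, the snippet below writes a policy scoped to a single hypothetical bucket named `portal-data` and applies it with the same `mc admin policy create` command. The action list here is an assumption about what the broker needs (bucket listing plus object reads), not a copy of the policy shipped in the repo:
```
cat > /tmp/broker-readonly-portal-data.json <<'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::portal-data"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::portal-data/*"]
    }
  ]
}
JSON
mc admin policy create local broker-readonly /tmp/broker-readonly-portal-data.json
```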
In production, Postgres runs on the cloud host. Its data directory is controlled
by POSTGRES_DATA_PATH in .env.
Example:
```
POSTGRES_DATA_PATH=/data/raid/postgres
POSTGRES_BIND_ADDRESS=127.0.0.1
POSTGRES_PORT=5432
POSTGRES_SERVICE_PORT=5432
```
`POSTGRES_BIND_ADDRESS` controls which host interface exposes PostgreSQL on the cloud host. Keep it at `127.0.0.1` unless you have a specific reason to expose Postgres beyond the host.
Example DBeaver connection settings:
- Host: `127.0.0.1` through an SSH tunnel to the cloud host, or the host/interface that matches `POSTGRES_BIND_ADDRESS`
- Port: `POSTGRES_PORT`
- Database: `POSTGRES_DB`
- Username: `POSTGRES_USER`
- Password: `POSTGRES_PASSWORD`
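A minimal sketch of that SSH tunnel; the SSH user and hostname are placeholders for your cloud host:
```
# Forward local port 5432 to PostgreSQL on the cloud host's loopback interface,
# then point DBeaver at 127.0.0.1:5432
ssh -N -L 5432:127.0.0.1:5432 ubuntu@data.realgoodresearch.com
```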
Add one file per public hostname under nginx/conf.d/. For example, a dashboard
site can live at dashboard.realgoodresearch.com and proxy to a separate Docker
service while the data portal continues serving data.realgoodresearch.com.
See nginx/conf.d/dashboard.conf.example for a template.
Recommended order for a new subdomain:
- Point the new DNS record at this server.
- Run `./scripts/request-certificate.sh dashboard.realgoodresearch.com`.
- Copy the example config into `nginx/conf.d/` and adjust the upstream service.
- Run `docker compose restart nginx`.
Postgres now stores collections, dataset records, tags, access tokens, and token grants. The broker treats MinIO bucket layout as an implementation detail.
Classification rules:
- `public`: listed and downloadable without a token
- `restricted`: listed for everyone, downloadable only with a token grant
- `confidential`: listed for everyone, never downloadable via the public API
Token grant rules:
- Each `token_grants` row is evaluated conjunctively.
- If a row specifies both `bucket` and `classification`, both must match.
- If a row specifies `bucket`, `classification`, and `key_prefix`, all three must match.
- `dataset_id` can still be used for one-off exact dataset grants.
Collections are editorial containers only. Classification stays on each dataset, so a single collection can mix public, restricted, and confidential files.
The database bootstrap files live in `postgres/initdb` and already contain the current schema, including collection tags and dataset roles (data, documentation, visuals, GIS). On a fresh Postgres data directory, a clean initialization creates:
- `collections`
- `collection_tags`
- `datasets`
- `dataset_tags`
- `access_tokens`
- `token_grants`
Dataset timestamps:
- `created_at`: auto-filled on insert
- `updated_at`: auto-updated on each row change
- `published_at`: now defaults to insert time unless you set it explicitly
Schema policy:
- `postgres/initdb` is the canonical baseline for fresh databases.
- Before go-live, it is acceptable to rebuild the Postgres data directory and rely on `initdb/`.
- After go-live, every schema change should ship in two places:
  - a new forward-only SQL file under `postgres/migrations/`
  - the updated canonical schema in `postgres/initdb/001_schema.sql`
- Fresh installs should initialize from `initdb/`. Existing live databases should apply only the migration files created after they were initialized.
If postgres/migrations/ is currently empty, that is fine. Add new migration files there only for future post-launch schema changes.
The seed file inserts one example collection, four example datasets, and two example tokens for local testing only. These plaintext token values are public demo fixtures and must never be used for any real deployment:
- `partner-alpha-2026-rotate-me`
- `partner-beta-2026-rotate-me`
If you have not loaded real data yet, the cleanest way to pick up the latest schema is to rebuild the Postgres data directory from scratch.
- Stop the Postgres service:
```
docker compose stop postgres
```
- Remove the existing Postgres container:
```
docker compose rm -f postgres
```
- Delete the contents of the host directory referenced by `POSTGRES_DATA_PATH` in your `.env`. Example pattern:
```
rm -rf /path/from/POSTGRES_DATA_PATH/*
```
Only do this if you are sure the database contains no real data you need to keep.
- Start Postgres again:
```
docker compose up -d postgres
```
- Confirm initialization succeeded (a psql check follows these steps):
```
docker compose logs postgres
```
On first startup, Postgres should run the SQL files in `postgres/initdb/`.
- Rebuild the broker after the database is back:
```
docker compose up -d --build broker-api
```
For the local development stack, add `-f docker-compose.local.yml` to each `docker compose` command in this section.
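To confirm the schema actually landed, one optional check is to list the tables from inside the Postgres container. This assumes `POSTGRES_USER` and `POSTGRES_DB` are set in the container environment, as is typical for the official Postgres image:
```
# Should list collections, collection_tags, datasets, dataset_tags, access_tokens, token_grants
docker compose exec postgres sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "\dt"'
```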
GET /api/v1/collections
- No token required
- Optional `X-Access-Token` header
- Returns one list of collection summaries with counts for public, restricted, and confidential files
Example:
```
curl https://data.realgoodresearch.com/api/v1/collections \
  -k
```
GET /api/v1/collections/{slug}
- Returns one collection with its README URL and the current access state of each dataset inside it
GET /api/v1/datasets/{slug}
- Returns one dataset with its current access decision for the caller
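Minimal example requests for the two detail endpoints; the collection and dataset slugs below are placeholders, not entries from the seed data:
```
curl https://data.realgoodresearch.com/api/v1/collections/example-collection \
  -k
curl https://data.realgoodresearch.com/api/v1/datasets/example-dataset \
  -k
```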
POST /api/v1/download-url
- Accepts a `dataset_id`
- Returns `allowed: false` for restricted items without a matching token and for all confidential items
Example:
```
curl -X POST https://data.realgoodresearch.com/api/v1/download-url \
  -H 'Content-Type: application/json' \
  -H 'X-Access-Token: partner-alpha-2026-rotate-me' \
  -d '{
    "dataset_id": "22222222-2222-2222-2222-222222222222",
    "download_filename": "briefing-apr-2026.xlsx"
  }'
```
JSON Schemas for the collection, catalog, and download endpoints live in `broker-api/schemas`.
The generated `frontend/` directory is now treated as a build artifact and does not need to be committed. The Quarto source of truth lives under `site/`; the files served from `frontend/` are produced by rendering the Quarto project defined there.
The Quarto source mirrors the typography and navigation style used in the main
Real Good Research docs site, while the browser-side collection logic stays in:
- `site/assets/catalog.js` for the collection search page
- `site/assets/collection-detail.js` for the collection detail page
The Quarto project is configured to render directly into frontend/:
```
cd data-portal/site
quarto render
```
Render directly in place. Do not rename or replace the `frontend/` directory while Nginx is running, or the bind mount can point at a stale empty directory and return 403.
Safe workflow:
```
cd data-portal/site
quarto render
docker compose -f ../docker-compose.yml up -d --force-recreate nginx
```
On a production Ubuntu host bootstrapped with `scripts/install-ubuntu-dependencies.sh`, Quarto is installed by default.