Skip to content

Comments

No-fuss DB connection configuration via libpq PG* environment variables#1748

Open
brontolosone wants to merge 7 commits intogetodk:masterfrom
brontolosone:db-unconfig
Open

No-fuss DB connection configuration via libpq PG* environment variables#1748
brontolosone wants to merge 7 commits intogetodk:masterfrom
brontolosone:db-unconfig

Conversation

@brontolosone
Copy link
Contributor

@brontolosone brontolosone commented Feb 11, 2026

Related

Motivation, background

Our code / config / knex (+deptree) / slonik (+deptree), they're all miming to eachother about DB connection configuration. It's easy to forget that *eventually*, that information just needs to be conveyed to `libpq` — the *actual* bedrock database driver. Not one of these middlemen inherently cares about the specifics, it's like a bucket brigade of passing & mangling information; but in the end we just want a DB connection.

That whole wrapper mille-feuille is rife with misconceptions (eg, how SSL works / should be conveyed), incomplete exposure of connection features (eg, TCP connection timeouts), and incompatibilities (again, SSL, and also Unix domain sockets).

But it just so happens that libpq (the thing that actually does the thing)
is designed so that one can talk directly to it to configure its connections. This works with a set of environment variables that libpq reads from the process environment. It will use that information unless in the connection setup call to libpq something else is explicitly configured. In the documentation it says about these env vars:

These are useful to avoid hard-coding database connection information into simple client applications, for example.

And indeed one can imagine that it is nice to be able to write a client application without having to re-expose the complete postgres connection parameter interface.

So if we adopt that configuration mechanism (which means: we get out of the way!), we can stop worrying about what to support, what configuration interface to present to users, and how to convey things to knex/slonik in their idiosyncratic way (if even supported). 🥳

Advantages would be:

  1. The complete configuration interface can be understood from the documentation straight from the people who make the thing that does the actual thing; no more digging through issues in half-abandoned NPM projects to understand why something doesn't work or how to coax one lib (but then coax the other lib slightly differently). 🥳 Less ambiguity, simpler support.
  2. It's easy for people to test connection setups, without having to fiddle with ODK. One can just set the appropriate environment variables and start psql. If you get a connection, you did it right, if you don't, you did it wrong (and probably there'll be some output from the tool hinting at what's wrong). psql, libpq, postgresql are all from the same vendor; they're not miming to eachother, there's just so much less ambiguity around why something isn't working. It also makes it easy for a sysadmin to communicate connection details and support their users, because of that uniform configuration interface.

Approach

  1. Get out of the way. Don't supply any connection information directly.
  2. Preserve the legacy interface for setting DB connection info. Our legacy DB config interface is through the use of the config module, and its cascading inheritance. Initially I toyed a bit with sourcing different files with different environment variables (eg, through the Makefile) depending on whether we're running tests or not, but I didn't quite get good DX this way, plus I think that getting used to a new config interface would be an impediment to getting this PR in. Thus, I wrapped (oh no! wrapping! interpretation) that interface, one can still set the same limited set of db config variables that way, but they'll go into env vars rather than into connection setup call parameters. So everything still works! Yet, one can also override things throug environment variables, eg if one runs PGDATABASE=someotherdatabase make dev, then that's what you'll get, it won't be overridden with what's in config. Example:
    PGDATABASE=staging_odkcentral make dev
    node lib/bin/enforce-node-version.js
    node lib/bin/run-migrations.js
    Warning: Environment variable "PGDATABASE" already set, with value: "staging_odkcentral".
    Not applying DB configuration of "database: odkcentral"
    
    Here we're warned that the value from the config (odkcentral in this case) is not being applied, because we've explicitly set PGDATABASE. So you'll be made aware of what's going on. No surprises. You're going to connect to the staging_odkcentral database.
    As an example of the flexibility, my own default.database config in config/local.json is, with this PR:
    "database": {
      "host": "/run/postgresql",
      "database": "odkcentral",
      "user": "",
      "password": "",
    }
    which gives me a peer-authenticated Unix socket connection, with my DB user the same as my Unix user. I need to blank the user and password field, otherwise I'd get the values from the higher-up config precedence waterfall (default.json etc).
    The "ssl": true config fragment is also supported, but I thought it appropriate to print a warning, as I don't feel doing SSL without authenticating the other party is on par with the warm feeling of safety that "ssl": true instills. So you'll see this helpful warning:
    Warning: Database SSL mode set to "require", which means that while the connection will be *encrypted*, it will not be *authenticated*.
    However, you can set up strong authentication through the use of libpq environment variables:
    https://www.postgresql.org/docs/current/libpq-envars.html#LIBPQ-ENVARS                                                                                                        
    to configure server certificate verification:                                     
    https://www.postgresql.org/docs/current/libpq-ssl.html#LIBQ-SSL-CERTIFICATES.
    
    For this config-sourcing mechanism, for which we have to do some interpretation, the guiding principle is: in case of ambiguity, crash or warn.
  3. Now for the Docker setup (central repo), a PR is here. Via that Central branch, this code is also running on dev.getodk.cloud at the moment.

What has been done to verify that this works as intended?

Tests still pass, but see note at the top re #1746 .

Why is this the best possible solution? Were any other approaches considered?

Don't get me started 😆 . See background/rationale at the top of this PR.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

For all quotidian setups, there should be no regression, legacy config will continue to work. But, if people have been patching code to do workarounds to get certain connection parameters in that we didn't support, there may be surprises, hard to predict 🤷‍♂️

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Not to the API documentation.
But the central install howto, and the .env.example file, should be streamlined with this code. It could also be useful to zombiepost into the various forum posts where people have been asking about DB configuration, so that newcomers searching for info will find the uptodate New Way of Configuring the Database Any Way You Like.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@brontolosone brontolosone changed the title Db unconfig No-fuss DB connection configuration via libpq PG* environment variablesconfig Feb 11, 2026
@brontolosone brontolosone changed the title No-fuss DB connection configuration via libpq PG* environment variablesconfig No-fuss DB connection configuration via libpq PG* environment variables Feb 11, 2026
@brontolosone brontolosone force-pushed the db-unconfig branch 2 times, most recently from 9a7bb60 to 088afaf Compare February 11, 2026 18:03
@brontolosone brontolosone moved this to ✏️ in progress in ODK Central Feb 11, 2026
@brontolosone brontolosone marked this pull request as ready for review February 11, 2026 18:48
@brontolosone brontolosone force-pushed the db-unconfig branch 2 times, most recently from 007711a to 3d75b9b Compare February 11, 2026 19:22
@brontolosone brontolosone force-pushed the db-unconfig branch 4 times, most recently from 6885630 to c0be247 Compare February 12, 2026 09:11
@brontolosone brontolosone marked this pull request as draft February 13, 2026 07:16
@brontolosone brontolosone marked this pull request as ready for review February 14, 2026 12:36
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious whether you considered adding the function here to the existing lib/util/db.js? That file has been getting kind of big, so I'm open to splitting it up like this, but I'd be curious to learn more about your thinking in this area.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like files as an organizational unit, and they're pretty much free, well at least for me they are, df -i tells me I'm not nearly out of inodes yet ;-)

  • Yes this functionality is related to the DB, in a sense, but lib/util/db.js by and large has to do with SQL templating and pseudo-ORM functions, while this new configuration functionality isn't.
  • This new load-db-env module often needs to be loaded early on, and for that my instinct is that it should be small, simple, and have minimal dependencies (also to avoid cyclic dependencies)
  • everything in ODK Central backend is related to ODK, maybe we should put everything in one file odk.js, it'd be perfectly organized 🤡
  • one thing you may have noticed as well is that in this codebase there are quite a few large (to my taste) modules that are informally sectioned using quite decorative header comments, eg in lib/util/db.js we have:
    ////////////////////////////////////////////////////////////////////////////////
    // SLONIK UTIL
    
    which to me is a faint code smell, I feel one should use the language's formal modularization features for modularization, not comments, which impose a cognitive load on human readers. Since I don't like that smell I err on the side of making a module rather than following the (anti-?) pattern of cordoning off a "module section" with elaborate decorative comments as section headers (it even has its own typographical mini-style) within a module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Additional SSL configuration for database

3 participants