
Refactor loaders module#2719

Open
evansd wants to merge 15 commits into main from evansd/refactor-loaders

Conversation


@evansd evansd commented Mar 17, 2026

This refactors some of the code for loading user-supplied modules to use a single ModuleDetails type that represents all the values we might care about in one of these modules, including potential errors. The aim is more consistent handling of these modules, so that more features can work with either datasets or measures (see #2313).

We recently added measures support to create-dummy-tables, but this was made unnecessarily difficult by the way loaders were implemented. This refactoring aims to change that.

We also treat running user-supplied modules in "debug" mode as a separate task from serializing their contents, which removes some incidental complexity.
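As a rough illustration of the shape the PR describes, a single type can collect everything we might want from a user-supplied module, with errors captured as values rather than raised. This is a hypothetical sketch; the field names are invented for the example, not taken from the actual codebase:

```python
from dataclasses import dataclass


@dataclass
class ModuleDetails:
    # Everything we might care about from a user-supplied module. Errors are
    # stored rather than raised so the caller can decide which ones matter.
    dataset: object = None
    measures: object = None
    error: Exception = None
```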

@evansd evansd changed the title Evansd/refactor loaders Refactor loaders module Mar 17, 2026
@evansd evansd force-pushed the evansd/refactor-loaders branch from 3ab2394 to 41cde8b Compare March 17, 2026 17:56
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 17, 2026

Deploying databuilder-docs with Cloudflare Pages

Latest commit: 0fd99bc
Status: ✅  Deploy successful!
Preview URL: https://7476cbf1.databuilder.pages.dev
Branch Preview URL: https://evansd-refactor-loaders.databuilder.pages.dev


This pulls a generic `run_ehrql_command_in_subprocess()` function out of
the more specific `load_definition_in_subprocess()` function.
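The extraction described above might look something like the following sketch. The two function names come from the commit message; the bodies and the `serialize-definition` argument are assumptions made for illustration:

```python
import subprocess
import sys


def run_ehrql_command_in_subprocess(args):
    # Generic: run any ehrql command in a fresh interpreter, capturing output.
    return subprocess.run(
        [sys.executable, "-m", "ehrql", *args],
        capture_output=True,
        text=True,
    )


def load_definition_in_subprocess(definition_file):
    # The specific loader now just delegates to the generic runner.
    return run_ehrql_command_in_subprocess(
        ["serialize-definition", str(definition_file)]
    )
```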
evansd added 13 commits March 24, 2026 11:07
We're not trying to retrieve a value when running `debug`: we just want
to execute the Python and collect any output. Making this a separate
function removes special-casing elsewhere.
We can also simplify the argument handling here by using `choices`.
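The `choices` simplification refers to argparse's built-in validation: instead of checking the value by hand, argparse rejects anything outside the allowed set. A minimal sketch, with an invented option name:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--display-format",
    # argparse validates the value for us and rejects anything else,
    # so no special-casing is needed downstream.
    choices=["dataset", "measures"],
    default="dataset",
)
args = parser.parse_args(["--display-format", "measures"])
```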
This is designed to represent all the various values we may be
interested in from a user-supplied definition module.

We need to support serializing errors because the pre-serialization code
will no longer know which kinds of error the calling code cares about.
Instead of having specialised loaders for each definition type we now
have a single function which loads the module, calls various
`populate_*_details` functions to grab potentially relevant attributes
from the module, and returns a `ModuleDetails` object which collects all
these attributes.

This `ModuleDetails` object can then be serialized and passed back to the
parent process, which can then decide which details it cares about.
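The single-loader shape described above can be sketched as follows. The `populate_*_details` names come from the commit message; the `ModuleDetails` fields and function bodies are invented for illustration:

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class ModuleDetails:
    dataset: object = None
    measures: object = None


def populate_dataset_details(details, module):
    details.dataset = getattr(module, "dataset", None)


def populate_measure_details(details, module):
    details.measures = getattr(module, "measures", None)


def load_module_details(module):
    # One generic loader: grab every potentially relevant attribute and
    # let the caller decide which ones it cares about.
    details = ModuleDetails()
    populate_dataset_details(details, module)
    populate_measure_details(details, module)
    return details
```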

Note that we now return exceptions rather than raising them immediately.
For example, if we are loading measure definitions we don't care if
`dataset` doesn't have a population defined yet. But we can't just go
ahead and serialize it because it's not possible to construct a
serialized dataset without a population definition. So we return the
error and let the caller decide whether to raise it or not.
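The return-rather-than-raise pattern described above might look like this sketch, where all names are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    population: object = None


def serialize_dataset(dataset):
    # Whether a missing population is fatal depends on the caller (loading
    # measure definitions doesn't mind), so return the error rather than
    # raising it and let the caller decide.
    if dataset.population is None:
        return None, ValueError("dataset has no population defined")
    return {"population": dataset.population}, None
```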

To make the change easier to read, we continue passing around the now-useless
`definition_type` arguments. We'll remove these in a later commit.
The serializer, and functions upstream of it, no longer care what type
of definition module they're working with.
The old name made it sound like it was loading a particular "debug" type
of definition, whereas it's not really loading anything (it doesn't
return a value); it's running an arbitrary definition in a particular
mode.