Description
Imagine using Kedro to run pipelines both locally and remotely, with mock and warehouse data.
Development takes place on mock data, which is shared between the local and remote environments, but the computational configuration differs (e.g. Spark local vs. cluster mode). For the warehouse data, the data definitions are again shared, but some properties vary, e.g. database names.
Because of these two sets of shared configuration, the configuration could be far more concise if an environment could inherit from a chain of environments rather than from a single base:
graph TD;
base["Base (base)"];
dev["Dev"];
local["Local (default)"];
remote["Remote"];
cluster["Data Warehouse"];
acc["Acceptance"];
prd["Production"];
base --> dev --> local;
dev --> remote;
base --> cluster --> acc;
cluster --> prd;
The bottom four environments are used by the Kedro user.
Alternatively, we could reduce the number of environments:
graph TD;
base["Base (base)"];
dev["Dev + Remote"];
local["Local (default)"];
cluster["Data Warehouse + Acceptance"];
prd["Production"];
base --> dev --> local;
base --> cluster --> prd;
This relies on the local environment overriding the remote-specific configuration set in the combined dev + remote environment.
Is this something that could be supported?
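To illustrate the override order the reduced layout relies on, here is a minimal sketch in plain Python. The config keys (`spark`, `master`, `app_name`) and the recursive merge are assumptions made for illustration only; Kedro's config loaders apply their own merge rules, which may differ from this.

```python
from functools import reduce


def merge(lower, higher):
    """Return a new dict where values from `higher` override those in `lower`.

    Nested dicts are merged recursively (an assumption for this sketch).
    """
    merged = dict(lower)
    for key, value in higher.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical config fragments, ordered from least to most specific.
base = {"spark": {"app_name": "demo"}}
dev_remote = {"spark": {"master": "yarn"}}    # combined "Dev + Remote" env
local = {"spark": {"master": "local[*]"}}     # overrides the remote setting

effective_remote = reduce(merge, [base, dev_remote])
effective_local = reduce(merge, [base, dev_remote, local])
```

With this ordering, a remote run would use the combined environment as-is, while a local run only needs to declare the keys that differ.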
Context
As mentioned above, this would simplify the config and avoid duplication of entries.
Possible Implementation
The base environment is configured in settings.py. The base_env argument could accept a dict as well as a string, with the dict specifying the base for each environment.
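A rough sketch of what the dict form could look like, and how a loader might turn it into a load order. This is not an existing Kedro feature: the `BASE_ENV` mapping, the short environment names, and the `resolve_chain` helper are all hypothetical.

```python
# Hypothetical dict form of base_env: each environment maps to its parent.
BASE_ENV = {
    "dev": "base",
    "local": "dev",
    "remote": "dev",
    "cluster": "base",
    "acc": "cluster",
    "prd": "cluster",
}


def resolve_chain(env, parents, root="base"):
    """Return the environments to load, from the root down to `env`."""
    chain = [env]
    while chain[-1] != root:
        parent = parents.get(chain[-1])
        if parent is None:
            raise KeyError(f"no base defined for environment {chain[-1]!r}")
        if parent in chain:
            raise ValueError(f"cyclic inheritance involving {parent!r}")
        chain.append(parent)
    return list(reversed(chain))


load_order_local = resolve_chain("local", BASE_ENV)
load_order_prd = resolve_chain("prd", BASE_ENV)
```

A plain string would keep its current meaning (a single base for every environment), so existing projects would not need to change.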
Possible Alternatives
Something might be achieved through the advanced features of the OmegaConfigLoader that I am not aware of.