Description
Currently the extension schemata are a mix of self-contained files (like file and label) and schemata that require arbitrary URI resolution (like tiled assets and card4l). If we use remote references in the published schemata, we expose ourselves to two kinds of risk:
- Reading a URI can fail for many more reasons than reading a self-contained file. The URI could be behind an authenticated endpoint. The server could be down. Someone could have replaced the content at the URI with something else by accident or maliciously. These risks multiply with each URI we have to read. In the authenticated case, open source servers like Franklin and stac-fastapi would need some way to authenticate. Figuring out how to provide that is hard. (This is still a problem with self-contained extensions, but less so, because there are fewer links.)
- Deep / wide trees of refs increase the latency for validating an item. For example, `tiled-assets` references the item schema, which references remote schemata for GeoJSON features (by URL), basics, datetime, instrument, licensing, and provider (by relative path), and the catalog schema, which references the catalog-core schema. So to take one JSON item and validate it against the `tiled-assets` extension (the first time -- obviously these things can be cached), I have to make ten HTTP requests.
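To make the request count concrete, here is a rough Python sketch that walks a ref tree and collects every distinct document a validator would have to fetch. The `example.test` URIs, the file names, and the in-memory `store` are all made up for illustration; they only mirror the shape of the tree described above and are not the real published schemata.

```python
from urllib.parse import urldefrag, urljoin


def iter_refs(node):
    """Yield every string-valued "$ref" found anywhere inside a schema."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def documents_needed(root_uri, load):
    """Return the set of document URIs reachable from root_uri via $ref."""
    seen = set()
    pending = [root_uri]
    while pending:
        uri = pending.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for ref in iter_refs(load(uri)):
            # Resolve relative refs against the referring document and
            # drop the fragment: one fetch covers the whole document.
            doc_uri, _fragment = urldefrag(urljoin(uri, ref))
            if doc_uri not in seen:
                pending.append(doc_uri)
    return seen


# Hypothetical stand-in for the network: URI -> schema document.
BASE = "https://example.test/"
leaf = {"type": "object"}
store = {
    BASE + "tiled-assets/schema.json": {"allOf": [{"$ref": "../item/item.json"}]},
    BASE + "item/item.json": {
        "allOf": [
            {"$ref": "https://geojson.example.test/Feature.json"},
            {"$ref": "basics.json"},
            {"$ref": "datetime.json"},
            {"$ref": "instrument.json"},
            {"$ref": "licensing.json"},
            {"$ref": "provider.json"},
            {"$ref": "../catalog/catalog.json"},
        ]
    },
    "https://geojson.example.test/Feature.json": leaf,
    BASE + "item/basics.json": leaf,
    BASE + "item/datetime.json": leaf,
    BASE + "item/instrument.json": leaf,
    BASE + "item/licensing.json": leaf,
    BASE + "item/provider.json": leaf,
    BASE + "catalog/catalog.json": {"allOf": [{"$ref": "catalog-core.json"}]},
    BASE + "catalog/catalog-core.json": leaf,
}

needed = documents_needed(BASE + "tiled-assets/schema.json", store.__getitem__)
print(len(needed))  # one fetch per distinct document, extension schema included
```

With a real `load` that does an HTTP GET, each entry in `needed` is one round trip on a cold cache.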
Additionally, there are varying degrees of JSON schema remote $ref support in common languages used for STAC:
- Everit (Java), which backs `circe-json-schema` (Scala), wants to read refs as file paths
- Ajv (JS) allows providing an arbitrary loading function but the link explaining the option 404s. This shifts the complexity onto the user, who is responsible for correctly interpreting each ref.
- JSON Schema in Python makes some guesses about what kind of ref you have and attempts to resolve it accordingly
- I don't know anything about C#, PHP, or R support, notes welcome.
The cost of doing away with remote refs everywhere is duplication and no more inheritance. That's a pretty hefty cost, which is why I'm only proposing that published schemata be self-contained. In particular:
- the repository versions of the schema can still refer to whatever they want, but
- the template should have node scripts for inlining all referenced schemata
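As a rough illustration of what the inlining step could do, here is a minimal sketch. It is written in Python only to keep it short; the actual template script would presumably be a node script, and nothing here is an existing tool. The `inline_refs` and `resolve_pointer` names, the `example.test` URIs, and the `store` contents are all hypothetical. It assumes draft-07 style semantics where siblings of `$ref` are ignored, does not honour `$id` when working out base URIs, and gives up on recursive refs rather than trying to rewrite them.

```python
from urllib.parse import urldefrag, urljoin


def resolve_pointer(document, fragment):
    """Follow a JSON Pointer fragment such as "/definitions/links"."""
    node = document
    for token in filter(None, fragment.split("/")):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_refs(node, base_uri, load, _stack=()):
    """Return a copy of node with every $ref replaced by its target."""
    if isinstance(node, list):
        return [inline_refs(item, base_uri, load, _stack) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str):
        target_uri = urljoin(base_uri, ref)
        if target_uri in _stack:
            # A self-referencing schema cannot be expanded in place.
            raise ValueError(f"recursive $ref cannot be inlined: {target_uri}")
        doc_uri, fragment = urldefrag(target_uri)
        target = resolve_pointer(load(doc_uri), fragment)
        # Refs inside the target resolve against the target's own document.
        return inline_refs(target, doc_uri, load, _stack + (target_uri,))
    return {
        key: inline_refs(value, base_uri, load, _stack)
        for key, value in node.items()
    }


# Hypothetical repository layout: URI -> schema document.
ROOT = "https://example.test/ext/schema.json"
store = {
    ROOT: {
        "type": "object",
        "allOf": [{"$ref": "../core/item.json"}],
        "properties": {"tiles:x": {"$ref": "#/definitions/count"}},
        "definitions": {"count": {"type": "integer"}},
    },
    "https://example.test/core/item.json": {
        "type": "object",
        "properties": {"links": {"$ref": "common.json#/definitions/links"}},
    },
    "https://example.test/core/common.json": {
        "definitions": {"links": {"type": "array"}},
    },
}

bundled = inline_refs(store[ROOT], ROOT, store.__getitem__)
```

Here `bundled` contains no `$ref` at all, so it could be published as a single file that any validator can load from plain JSON. The duplication cost mentioned above shows up directly: every use of a shared definition becomes its own copy.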
The benefits of inlining will be that any language with a tool that can load a JSON schema from JSON will be equally supported for STAC tooling work, and servers won't have to do as much work the first time they see a schema URL.