Proposal: published extension schema should be self-contained

Currently the extension schemata are a mix of self-contained files (like [file](https://github.com/stac-extensions/file/blob/main/json-schema/schema.json) and [label](https://github.com/stac-extensions/label/blob/main/json-schema/schema.json)) and schemata that require arbitrary URI resolution (like [tiled assets](https://github.com/stac-extensions/tiled-assets/blob/main/json-schema/schema.json) and [card4l](https://github.com/stac-extensions/card4l/blob/main/sar/json-schema/source.json)). If we use remote references in the published schemata, we expose ourselves to two kinds of risk:

1. There are many more ways for reading a remote URI to fail. The URI could be behind an authenticated endpoint. The server could be down. Someone could have replaced the content at the URI with something else, by accident or maliciously. These risks multiply with each URI we have to read. In the authenticated case, open source servers like Franklin and stac-fastapi would need some way to authenticate, and figuring out how to provide that is _hard_. (This is still a problem with self-contained extensions, but less so, because there are fewer links.)
2. Deep or wide trees of refs increase the latency of validating an item. For example, `tiled-assets` references the item schema, which references remote schemata for GeoJSON features (by URL); basics, datetime, instrument, licensing, and provider (by relative path); and the catalog schema, which in turn references the catalog-core schema. So to take one JSON item and validate it against the `tiled-assets` extension for the first time (obviously these things can be cached afterwards), I have to make **ten** HTTP requests.
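To make the fan-out concrete, here is a minimal sketch of how one could count the distinct documents a validator has to read for a given schema. Everything in it is an assumption for illustration: the `example.test` URIs, the `FAKE_STORE` contents, and the `documents_needed` helper are made up, and the `fetch` argument stands in for whatever HTTP client a real tool would use.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical in-memory stand-in for remote schema files (not real STAC URLs).
FAKE_STORE = {
    "https://example.test/ext/schema.json": {
        "allOf": [{"$ref": "../item/item.json"}, {"$ref": "#/definitions/tiles"}],
        "definitions": {"tiles": {"type": "object"}},
    },
    "https://example.test/item/item.json": {
        "allOf": [
            {"$ref": "basics.json"},
            {"$ref": "datetime.json#/definitions/datetime"},
        ]
    },
    "https://example.test/item/basics.json": {"type": "object"},
    "https://example.test/item/datetime.json": {
        "definitions": {"datetime": {"type": "object"}}
    },
}


def iter_refs(node):
    """Yield every "$ref" string found anywhere in a JSON-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def documents_needed(root_uri, fetch):
    """Return the set of distinct document URIs read while resolving root_uri."""
    seen = set()
    queue = [root_uri]
    while queue:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        schema = fetch(uri)  # one read (one HTTP request in real life)
        for ref in iter_refs(schema):
            # Resolve relative refs against the current document, drop the fragment.
            target, _fragment = urldefrag(urljoin(uri, ref))
            if target not in seen:
                queue.append(target)
    return seen


docs = documents_needed("https://example.test/ext/schema.json", FAKE_STORE.__getitem__)
print(len(docs), "documents to read before validation can start")
```

Each entry in the returned set corresponds to one read that can fail or add latency; a self-contained schema always yields a set of size one.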

Additionally, support for remote `$ref`s in JSON Schema libraries varies across the languages commonly used for STAC:

- Everit (Java), which backs `circe-json-schema` (Scala), wants to read refs as file paths.
- Ajv (JS) allows providing an [arbitrary loading function](https://ajv.js.org/guide/managing-schemas.html#asynchronous-schema-loading), but the link explaining the option [404s](https://ajv.js.org/guide/api.html#options). This shifts the complexity onto the user, who is responsible for correctly interpreting each ref.
- `jsonschema` (Python) [makes some guesses about what kind of ref you have](https://github.com/Julian/jsonschema/blob/833b6a9562487b4c46cd5640de5e2ae074c08796/jsonschema/validators.py#L806-L855) and attempts to resolve it.
- I don't know anything about C#, PHP, or R support; notes welcome.

The cost of doing away with remote refs everywhere is _duplication_ and no more inheritance. That's a pretty hefty cost, which is why I'm only proposing that _published schemata_ be self-contained. In particular:

- the repository versions of the schema can still refer to whatever they want, but
- the template should have node scripts for inlining all referenced schemata
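As a rough illustration of what such an inlining step might do, here is a minimal sketch. It is written in Python rather than node purely to show the idea, and it is not an existing script from the template. The `inline_refs` helper, the `inlined_N` key names, the `example.test` URIs, and the `FAKE_STORE` contents are all hypothetical. The sketch assumes draft-07 style `definitions`, assumes every fragment is a JSON Pointer, and does not guard against key collisions with existing definitions.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical in-memory stand-in for remote schema files (not real STAC URLs).
FAKE_STORE = {
    "https://example.test/ext/schema.json": {
        "allOf": [{"$ref": "../item/item.json"}, {"$ref": "#/definitions/tiles"}],
        "definitions": {"tiles": {"type": "object"}},
    },
    "https://example.test/item/item.json": {
        "$id": "https://example.test/item/item.json",
        "allOf": [
            {"$ref": "basics.json"},
            {"$ref": "datetime.json#/definitions/datetime"},
        ],
    },
    "https://example.test/item/basics.json": {"type": "object"},
    "https://example.test/item/datetime.json": {
        "definitions": {"datetime": {"type": "object"}}
    },
}


def inline_refs(root_uri, fetch):
    """Return the schema at root_uri with every external $ref bundled into it."""
    keys = {}     # document URI -> key under "definitions"
    bundled = {}  # key -> rewritten copy of that document

    def key_for(doc_uri):
        if doc_uri not in keys:
            key = f"inlined_{len(keys)}"
            keys[doc_uri] = key  # register first so cyclic refs terminate
            doc = rewrite(fetch(doc_uri), doc_uri)
            if isinstance(doc, dict):
                # Drop "$id" so local pointers keep resolving against the root.
                doc.pop("$id", None)
            bundled[key] = doc
        return keys[doc_uri]

    def rewrite(node, base_uri):
        """Copy node, turning each $ref into a pointer inside the root document."""
        if isinstance(node, dict):
            out = {}
            for name, value in node.items():
                if name == "$ref" and isinstance(value, str):
                    target, fragment = urldefrag(urljoin(base_uri, value))
                    if target == root_uri:
                        prefix = "#"
                    else:
                        prefix = "#/definitions/" + key_for(target)
                    out[name] = prefix + fragment
                else:
                    out[name] = rewrite(value, base_uri)
            return out
        if isinstance(node, list):
            return [rewrite(item, base_uri) for item in node]
        return node

    result = rewrite(fetch(root_uri), root_uri)
    if bundled:
        result.setdefault("definitions", {}).update(bundled)
    return result


published = inline_refs("https://example.test/ext/schema.json", FAKE_STORE.__getitem__)
print(sorted(published["definitions"]))
```

The repository copy keeps its relative and remote refs; only the output of a step like this would be published, so validators never need to resolve anything beyond `#/...` pointers.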

The benefits of inlining are that any language with a tool that can load a JSON schema from JSON will be equally supported for STAC tooling work, and that servers won't have to do as much work the first time they see a schema URL.
