Standardise method for referencing specific resource inside datapackage #373

@akariv

The goal here is to have a URI that could point to a specific resource inside a datapackage.
We already have a standard way for identifying a datapackage: http://specs.frictionlessdata.io/data-package-identifier/.

This proposal suggests adding a way to reference a data file contained inside the data package.

Implementation options (not mutually exclusive):

  • Using a JSON pointer notation:
    <datapackage-identifier>#/resources/<resource-index>/data
    Examples:
    • http://mywebsite.com/mydatapackage/datapackage.json#/resources/0/data
    • http://mywebsite.com/mydatapackage/#/resources/1/data
    • http://github.com/datasets/gold-prices#/resources/2/data
    • gold-prices#/resources/3/data
  • Using the resource name:
    <datapackage-identifier>#<resource-name>
    Since resources is an array, a JSON pointer can only address a resource by its index, not by its name, unless we start using stronger pointing mechanisms such as XPath (and we shouldn't...). A separate name-based form covers that case.
    Examples:
    • http://mywebsite.com/mydatapackage/datapackage.json#my-lovely-resource
    • http://mywebsite.com/mydatapackage/#finances-2012-q3
    • http://github.com/datasets/gold-prices#all-data
    • gold-prices#all-data

These two options can be supported side by side without ambiguity: a resource name cannot start with a / (it's a slug), so a fragment that starts with / is always a JSON pointer.
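
Because a slug never starts with /, the first character of the fragment is enough to tell the two forms apart. Here is a minimal Python sketch of that check. The function name `parse_resource_reference` and the returned tuple shape are hypothetical, chosen only for illustration, and the sketch assumes the pointer form is limited to `/resources/<resource-index>/data` as proposed above.

```python
import re

# Only the pointer shape proposed in this issue is accepted (an assumption).
POINTER_RE = re.compile(r"^/resources/(\d+)/data$")


def parse_resource_reference(reference):
    """Split a reference into (package identifier, kind, selector).

    kind is "index" for the JSON pointer form and "name" for the
    resource-name form. Hypothetical helper, not part of any library.
    """
    package_id, sep, fragment = reference.partition("#")
    if not sep or not fragment:
        raise ValueError("reference has no fragment: %r" % reference)
    if fragment.startswith("/"):
        # A resource name is a slug and cannot start with "/",
        # so this must be the JSON pointer form.
        match = POINTER_RE.match(fragment)
        if match is None:
            raise ValueError("unsupported pointer: %r" % fragment)
        return package_id, "index", int(match.group(1))
    return package_id, "name", fragment


print(parse_resource_reference("gold-prices#/resources/3/data"))
print(parse_resource_reference("http://mywebsite.com/mydatapackage/#finances-2012-q3"))
```

The part before the `#` is returned untouched, so it can still be resolved with whatever rules the data package identifier spec defines.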

Implementors might use this notation in the following ways:

  • For basic datapackages, this URI might redirect to the URL of the actual data file of the resource. If the data is inline, it should resolve to an application/json download of that part of the datapackage descriptor.
  • For tabular datapackages, a supporting library might resolve this URI to an iterator over all the rows in the data.
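
The basic-datapackage behaviour could be sketched roughly as follows in Python. This is an illustration only: `locate_resource_data` is a hypothetical helper, the descriptor is made up, and the sketch assumes each resource carries either inline `data` or a single string `path` that is relative to the descriptor's URL.

```python
from urllib.parse import urljoin


def locate_resource_data(descriptor, descriptor_url, selector):
    """Return ("inline", data) or ("redirect", url) for one resource.

    selector is an int (position in `resources`) or a str (resource name).
    Hypothetical helper, not an API of any existing library.
    """
    resources = descriptor["resources"]
    if isinstance(selector, int):
        resource = resources[selector]
    else:
        matches = [r for r in resources if r.get("name") == selector]
        if not matches:
            raise LookupError("no resource named %r" % selector)
        resource = matches[0]

    if "data" in resource:
        # Inline data: serve this part of the descriptor as application/json.
        return ("inline", resource["data"])
    # Otherwise redirect to the data file, resolved against the descriptor URL.
    return ("redirect", urljoin(descriptor_url, resource["path"]))


# Made-up descriptor, for illustration only.
descriptor = {
    "name": "gold-prices",
    "resources": [
        {"name": "all-data", "path": "data/all.csv"},
        {"name": "summary", "data": [{"year": 2016, "price": 1250}]},
    ],
}
base = "http://mywebsite.com/mydatapackage/datapackage.json"
print(locate_resource_data(descriptor, base, "all-data"))
print(locate_resource_data(descriptor, base, 1))
```

A web service would turn the "redirect" case into an HTTP redirect and the "inline" case into a JSON response. A tabular library could instead open the located file and yield its rows.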

Thoughts:
This is important because right now there's no way to provide a stable link to a specific data file inside a datapackage. It also led me to wonder whether we want a way to provide a stable link to a specific row inside a tabular datapackage, or perhaps even to a specific field.
That would be important too: if, for example, you wanted to substantiate a specific claim ('The budget for NHS was £350M in 2016'), you could have a single URI pointing to that specific number.
