Skip to content

Parameterized Queries #15

Description

@bryonjacob

Parameterized Queries

I'd like to propose a spec for how parameterized queries should be processed by the query endpoints - there's two parts to that.

  • How Parameter Values and Types are specified .
  • How Parameter Names and Values are provided on the URL.

Parameter Values and Types

for both SQL and SPARQL, I think we should support "simple", "safe", and "RDF" parameter values.

  • RDF parameters allow for complete precision in the same language the underlying query engine understands, at the cost of a pretty verbose and esoteric syntax. If you want total precision, use this syntax.
  • "safe" parameters allow you to be specific about String/URI parameters by wrapping values in "" or <> - definitely a necessity for user-entered content. If you're building an SDK or integration, this helps make sure that you can be precise about types without the loss of readability in RDF types.
  • "simple" parameters mean that we'll do the right thing with most values, defaulting to String where we can't make a better guess. This maximizes the chance that ad-hoc queries will return results when the user's meaning is clear.

to parse values, here is the algorithm:

try "RDF" parameters:

if value matches /^"(.*)"\^\^<([^<>]*)>$/ :

  "abcdef"^^<http://www.w3.org/2001/XMLSchema#string>
  "3"^^<http://www.w3.org/2001/XMLSchema#integer>
  "4.2"^^<http://www.w3.org/2001/XMLSchema#decimal>
  "true"^^<http://www.w3.org/2001/XMLSchema#boolean>

(matches two "groups" - the string type value and the URI of the type)

"Safe" parameters:

if value matches /^"(.*)"$/ :

  "abcdef"                                 <- String
  "3"                                      <- String
  "4.2"                                    <- String
  "true"                                   <- String
  "https://data.world/"                    <- String

(matches one group - the string value)

if value matches /^<(.*)>$/ :

  <https://data.world/>                    <- URI
  <abcdef>                                 <- URI
  <3>                                      <- URI

(matches one group - the URI)

"Simple" parameters:

if value matches /^([0-9]+)$/ :

  3                                        <- Integer

if value matches /^([0-9]*[.][0-9]+)$/ :

  4.2                                      <- Decimal

if value matches /^(true|false)$/ :

  true                                     <- Boolean

if value matches /^([a-z]+:\/\/.*)$/ :

  https://data.world/                      <- URI

(all of the above match one group - the value to interpret as Integer/Decimal/Boolean/URI)

otherwise :

  abcdef                                   <- String

(just treat the whole value as a String if nothing else matches)

Parameter Names and Values

For SPARQL:

SPARQL supports named parameters, and parameters in queries can be specified either as ?var or $var - it's a very common convention to use ?var for variables that are meant to be matched and $var for variables that are bound to the query execution. Because of that, using the $ syntax as query string parameters is a common way to pass bound variables on a HTTP URL. No reason we shouldn't use that syntax here:

  .../sparql/user/dataset?query=<QUERY>&$var1=<VALUE1>&$var2=<VALUE2>

where and are values according to the spec above

For SQL:

SQL only supports positional parameters. Luckily, HTTP query parameters have a straightforward way to specify an arbitrary length sequence of values for a query parameter - simply repeat the same query parameter name, and multiple instances of that will be treated as a sequence of those values. I'm proposing that we use p for the name of our parameter variable (to keep the URLs nice and short), but could do param or parameter too:

  .../sql/user/dataset?query=<QUERY>&p=<VALUE1>&p=<VALUE2>

where, again, and are values according to the spec above

In both cases (SPARQL and SQL) the way we interpret values is identical. Clearly the values will need to be URL-encoded when actually sent on a URL (as with any value)...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions