Parameterized Queries
I'd like to propose a spec for how parameterized queries should be processed by the query endpoints. There are two parts to that:
- How Parameter Values and Types are specified.
- How Parameter Names and Values are provided on the URL.
Parameter Values and Types
For both SQL and SPARQL, I think we should support "simple", "safe", and "RDF" parameter values.
- "RDF" parameters allow for complete precision in the same language the underlying query engine understands, at the cost of a pretty verbose and esoteric syntax. If you want total precision, use this syntax.
- "safe" parameters allow you to be specific about String/URI parameters by wrapping values in "" or <> - definitely a necessity for user-entered content. If you're building an SDK or integration, this lets you be precise about types without the readability cost of the full RDF syntax.
- "simple" parameters mean that we'll do the right thing with most values, defaulting to String where we can't make a better guess. This maximizes the chance that ad-hoc queries will return results when the user's meaning is clear.
To parse values, here is the algorithm (the checks are tried in the order listed):
try "RDF" parameters:
if value matches /^"(.*)"\^\^<([^<>]*)>$/ :
"abcdef"^^<http://www.w3.org/2001/XMLSchema#string>
"3"^^<http://www.w3.org/2001/XMLSchema#integer>
"4.2"^^<http://www.w3.org/2001/XMLSchema#decimal>
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
(matches two "groups" - the value as a string and the URI of its type)
"Safe" parameters:
if value matches /^"(.*)"$/ :
"abcdef" <- String
"3" <- String
"4.2" <- String
"true" <- String
"https://data.world/" <- String
(matches one group - the string value)
if value matches /^<(.*)>$/ :
<https://data.world/> <- URI
<abcdef> <- URI
<3> <- URI
(matches one group - the URI)
"Simple" parameters:
if value matches /^([0-9]+)$/ :
if value matches /^([0-9]*[.][0-9]+)$/ :
if value matches /^(true|false)$/ :
if value matches /^([a-z]+:\/\/.*)$/ :
https://data.world/ <- URI
(all of the above match one group - the value to interpret as Integer/Decimal/Boolean/URI)
otherwise :
(just treat the whole value as a String if nothing else matches)
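The algorithm above can be sketched in Python roughly as follows. This is only an illustration, not a reference implementation: the names `ParsedParam` and `parse_param` are hypothetical, and mapping "String", "Integer", "Decimal" and "Boolean" to the corresponding XSD datatype URIs is an assumption on my part (the regexes themselves are copied from the spec above).

```python
import re
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


class ParsedParam(NamedTuple):
    """Hypothetical result type: the matched group plus how to interpret it."""
    value: str               # the captured text, not converted further
    datatype: Optional[str]  # datatype URI for literals, None for URIs
    is_uri: bool = False


# Patterns copied from the proposal, checked in the order it lists them.
RDF = re.compile(r'^"(.*)"\^\^<([^<>]*)>$')
QUOTED = re.compile(r'^"(.*)"$')
ANGLE = re.compile(r'^<(.*)>$')
SIMPLE = [
    (re.compile(r'^([0-9]+)$'), XSD + "integer"),
    (re.compile(r'^([0-9]*[.][0-9]+)$'), XSD + "decimal"),
    (re.compile(r'^(true|false)$'), XSD + "boolean"),
]
SIMPLE_URI = re.compile(r'^([a-z]+://.*)$')


def parse_param(raw: str) -> ParsedParam:
    # "RDF" parameters: two groups, the value and the type URI.
    m = RDF.match(raw)
    if m:
        return ParsedParam(m.group(1), m.group(2))
    # "Safe" parameters: "..." is always a String, <...> is always a URI.
    m = QUOTED.match(raw)
    if m:
        return ParsedParam(m.group(1), XSD + "string")
    m = ANGLE.match(raw)
    if m:
        return ParsedParam(m.group(1), None, is_uri=True)
    # "Simple" parameters: best guess from the shape of the value.
    for pattern, datatype in SIMPLE:
        m = pattern.match(raw)
        if m:
            return ParsedParam(m.group(1), datatype)
    m = SIMPLE_URI.match(raw)
    if m:
        return ParsedParam(m.group(1), None, is_uri=True)
    # Otherwise: treat the whole value as a String.
    return ParsedParam(raw, XSD + "string")
```

With this sketch, `parse_param('3')` is typed as an integer while `parse_param('"3"')` stays a String, which is the distinction the "safe" syntax is meant to give SDK authors.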
Parameter Names and Values
For SPARQL:
SPARQL supports named parameters, and variables in queries can be written either as ?var or $var. It's a very common convention to use ?var for variables that are meant to be matched and $var for variables that are bound when the query is executed. Because of that, using the $ syntax in query string parameters is a common way to pass bound variables on an HTTP URL. No reason we shouldn't use that syntax here:
.../sparql/user/dataset?query=<QUERY>&$var1=<VALUE1>&$var2=<VALUE2>
where <VALUE1> and <VALUE2> are values according to the spec above
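As a rough sketch of what a client and an endpoint might do with this convention (the function names and the example host are made up for illustration, and nothing here is an existing API):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def build_sparql_url(base: str, query: str, bindings: dict) -> str:
    """Client side: append each binding as a $name=value query parameter."""
    pairs = [("query", query)]
    pairs += [("$" + name, value) for name, value in bindings.items()]
    # urlencode percent-encodes keys and values, so "$" goes out as %24.
    return base + "?" + urlencode(pairs)


def read_bindings(url: str) -> dict:
    """Server side: collect every query parameter whose name starts with $."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {name[1:]: value for name, value in pairs if name.startswith("$")}


url = build_sparql_url(
    "https://example.invalid/sparql/user/dataset",  # hypothetical host
    "SELECT ?s WHERE { ?s ?p $o }",
    {"o": '"abc"'},
)
print(read_bindings(url))
```

Each value recovered by `read_bindings` would then be interpreted according to the value spec above.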
For SQL:
SQL only supports positional parameters. Luckily, HTTP query parameters have a straightforward way to specify an arbitrary-length sequence of values: simply repeat the same query parameter name, and the repeated instances are treated as a sequence of values in the order they appear. I'm proposing that we use p as the name of our parameter (to keep the URLs nice and short), but we could use param or parameter too:
.../sql/user/dataset?query=<QUERY>&p=<VALUE1>&p=<VALUE2>
where, again, <VALUE1> and <VALUE2> are values according to the spec above
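A small sketch of the repeated-parameter idea, again with hypothetical function names and host. The `?` placeholder in the example query is an assumption too, since the proposal doesn't say how positional parameters are written inside the SQL text:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sql_url(base: str, query: str, values: list) -> str:
    """Client side: emit one p=<value> pair per positional parameter."""
    # doseq=True turns the list into repeated p=... pairs, keeping list order.
    return base + "?" + urlencode({"query": query, "p": list(values)}, doseq=True)


def read_positional(url: str) -> list:
    """Server side: repeated p parameters come back as an ordered list."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True).get("p", [])


url = build_sql_url(
    "https://example.invalid/sql/user/dataset",  # hypothetical host
    "SELECT * FROM t WHERE a = ? AND b = ?",     # ? placeholders are assumed
    ["3", '"x y"'],
)
print(read_positional(url))
```

The order of the list returned by `read_positional` is the order the p parameters appear on the URL, which is what makes them usable as positional parameters.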
In both cases (SPARQL and SQL) the way we interpret values is identical. Clearly the values will need to be URL-encoded when actually sent on a URL (as with any value)...