Skip to content

Add canonical parsing of lexical forms #8

@wouterbeek

Description

@wouterbeek

A datatype defines its own functions, mapping the lexical form to the value space, and mapping values from the value space to the lexical space. Oftentimes, there is a canonical variant of the latter, which allows values to be serialized in one, canonical way. It would be great if Serd would allow literals to be parsed in a way that guarantees that the lexical forms of some common datatypes are written down in a canonical way.

Here are some examples of non-canonical lexical forms and their canonical variants:

"01"^^xsd:int ⇒ "1"^^xsd:int
"0.0001"^^xsd:double ⇒ "0.1E-3"^^xsd:double
"1"^^xsd:boolean ⇒ "true"^^xsd:boolean

Canonical lexical forms are preferred over non-canonical ones, because (1) they allow value equality to be determined simply by comparing lexical form equality, and (2) they allow the relative ordering of many values (i.e., < and >) to be determined based on a simple comparison of their lexical forms.

There are many use cases in which these two properties have a profound impact. Two use cases of only the former property are (1) SPARQL filter expressions that check for literal value equality, and (2) graph navigation where the nodes "1"^^xsd:boolean and "true"^^xsd:boolean should be treated as the same node.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions