Skip to content

Feat: support a stable representation for geometry datatypes #7392

@NickCrews

Description

@NickCrews

Is your feature request related to a problem? Please describe.

sqlglot currently parses the type parameters of GEOMETRY datatypes opaquely, leaving them just as found, not converting to a uniform format. This means that when we work with the datatype expression, we aren't able to know which DataTypeParam represents what in the expression:

import sqlglot
import sqlglot.expressions as sge

duckdb = "GEOMETRY('EPSG:4269')"
e = sqlglot.parse_one(duckdb, read="duckdb", into=sge.DataType)
print(repr(e))
print("duckdb")
print(e.sql(dialect="duckdb"))
print("postgres")
print(e.sql(dialect="postgres"))

print()
print("---")
print()

postgres = "geography(POINT,4269)"
e = sqlglot.parse_one(postgres, read="postgres", into=sge.DataType)
print(repr(e))
print("duckdb")
print(e.sql(dialect="duckdb"))
print("postgres")
print(e.sql(dialect="postgres"))

Gives:

DataType(
  this=DType.GEOMETRY,
  expressions=[
    DataTypeParam(
      this=Literal(this='EPSG:4269', is_string=True))],
  nested=False)
duckdb
GEOMETRY('EPSG:4269')
postgres
GEOMETRY('EPSG:4269')

---

DataType(
  this=DType.GEOGRAPHY,
  expressions=[
    DataTypeParam(
      this=Var(this=POINT)),
    DataTypeParam(
      this=Literal(this=4269, is_string=False))],
  nested=False)
duckdb
GEOGRAPHY(POINT, 4269)
postgres
GEOGRAPHY(POINT, 4269)

This appeared in ibis-project/ibis#11978. In particular, when ibis tries to convert a sqlglot datatype to an ibis datatype, we would need to know the dialect it is coming from. This is POSSIBLE to do on the ibis side, but if it was solved here in sqlglot, then the above transpilation would work in vanilla sqlglot (which would be good for your users) and it would be much simpler to implement in ibis since we wouldn't need to branch based on the dialect.

Describe the solution you'd like
Some stable contract where, when handed a sge.DataType with a typecode of GEOMETRY, I know how to extract out the shapecode (eg POINT, LINE, ...) and the SRID (eg "EPSG:4269") WITHOUT needing to know which dialect it came from.

Similarly, when CONSTRUCTING a sge.DataType, there is a stable contract of where I should put the shapecode and SRID such that sqlglot will compile it to the right thing for each dialect.

Describe alternatives you've considered
Do nothing. We solve it on the ibis side by delegating based on the dialect. This means that your pure-sqlglot users wouldn't be able to get the above transpilation though.

Additional context
I have NOT done a survey of other SQL dialects to see what sort of parameterization they support in their geography types. But that probably would be a pre-requisite if we wanted to design one IR here that worked for all dialects.

dialect params example link
bigquery GEOGRAPHY https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types#geography_type
postgres shapecode, numeric srid GEOGRAPHY(POINT, 4269)
postgres namespaced srid GEOMETRY('EPSG:4269')

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions