-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Feat: support a stable representation for geometry datatypes #7392
Description
Is your feature request related to a problem? Please describe.
sqlglot currently parses the type parameters of GEOMETRY datatypes opaquely, leaving them just as found, not converting to a uniform format. This means that when we work with the datatype expression, we aren't able to know which DataTypeParam represents what in the expression:
import sqlglot
import sqlglot.expressions as sge
duckdb = "GEOMETRY('EPSG:4269')"
e = sqlglot.parse_one(duckdb, read="duckdb", into=sge.DataType)
print(repr(e))
print("duckdb")
print(e.sql(dialect="duckdb"))
print("postgres")
print(e.sql(dialect="postgres"))
print()
print("---")
print()
postgres = "geography(POINT,4269)"
e = sqlglot.parse_one(postgres, read="postgres", into=sge.DataType)
print(repr(e))
print("duckdb")
print(e.sql(dialect="duckdb"))
print("postgres")
print(e.sql(dialect="postgres"))Gives:
DataType(
this=DType.GEOMETRY,
expressions=[
DataTypeParam(
this=Literal(this='EPSG:4269', is_string=True))],
nested=False)
duckdb
GEOMETRY('EPSG:4269')
postgres
GEOMETRY('EPSG:4269')
---
DataType(
this=DType.GEOGRAPHY,
expressions=[
DataTypeParam(
this=Var(this=POINT)),
DataTypeParam(
this=Literal(this=4269, is_string=False))],
nested=False)
duckdb
GEOGRAPHY(POINT, 4269)
postgres
GEOGRAPHY(POINT, 4269)
This appeared in ibis-project/ibis#11978. In particular, when ibis tries to convert a sqlglot datatype to an ibis datatype, we would need to know the dialect it is coming from. This is POSSIBLE to do on the ibis side, but if it was solved here in sqlglot, then the above transpilation would work in vanilla sqlglot (which would be good for your users) and it would be much simpler to implement in ibis since we wouldn't need to branch based on the dialect.
Describe the solution you'd like
Some stable contract where, when handed a sge.DataType with a typecode of GEOMETRY, I know how to extract out the shapecode (eg POINT, LINE, ...) and the SRID (eg "EPSG:4269") WITHOUT needing to know which dialect it came from.
Similarly, when CONSTRUCTING a sge.DataType, there is a stable contract of where I should put the shapecode and SRID such that sqlglot will compile it to the right thing for each dialect.
Describe alternatives you've considered
Do nothing. We solve it on the ibis side by delegating based on the dialect. This means that your pure-sqlglot users wouldn't be able to get the above transpilation though.
Additional context
I have NOT done a survey of other SQL dialects to see what sort of parameterization they support in their geography types. But that probably would be a pre-requisite if we wanted to design one IR here that worked for all dialects.
| dialect | params | example | link |
|---|---|---|---|
| bigquery | GEOGRAPHY | https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types#geography_type | |
| postgres | shapecode, numeric srid | GEOGRAPHY(POINT, 4269) | |
| postgres | namespaced srid | GEOMETRY('EPSG:4269') |