richarah/libsqlglot

libsqlglot

What this is

sqlglot, in C++. 45 dialects, 126× faster on benchmark SQL, 252× on the kind your ORM generates when nobody's looking. The performance gap scales with query complexity; see Benchmarks.

Supports stored procedures (PL/pgSQL, T-SQL, MySQL, PL/SQL): where sqlglot falls back to passthrough, libsqlglot parses them into the AST.

What it's for

Anywhere SQL parsing meets the hot path: proxies, sidecars, migration tools, linters, Git pre-commit hooks, CI/CD pipelines. Also a replacement for every regex that's pretending to be a parser.

Why this exists

Because life is too short to parse SQL in Python.

Inspired by the original sqlglot, which did the decade-long work of mapping 31+ SQL dialects into an elegant, universal AST. libsqlglot does the comparatively trivial work of compiling it: the algorithm was already O(n), the runtime wasn't.

Functionality

Transpiles SQL dialects via sqlglot AST. Full stored procedure support. Fail-fast errors with exact line and column. Python bindings available at 95-98% of C++ speed (import libsqlglot as sqlglot and go).

Handles the full SQL surface: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE, MERGE, plus stored procedures (CALL, RETURN, DECLARE, IF/ELSEIF/ELSE, WHILE, FOR loops). Also handles CTEs, window functions, subqueries, and various JOIN types.

Compatible with 45 dialects (see Supported SQL dialects for the full list).

Quickstart

C++

#include <libsqlglot/transpiler.h>

using namespace libsqlglot;

// Transpile between dialects (MySQL → PostgreSQL)
std::string output = Transpiler::transpile(
    "SELECT `id`, `name` FROM `users` LIMIT 10",
    Dialect::MySQL,
    Dialect::PostgreSQL
);
// Returns: SELECT "users"."id", "users"."name" FROM "users" LIMIT 10

// Parse SQL into AST
Arena arena;
auto stmt = Transpiler::parse(arena, "SELECT name FROM users WHERE age > 18");
// Returns: AST with SelectStmt node

// Optimise AST (column qualification, predicate pushdown, constant folding)
Transpiler::optimize(arena, stmt);
// Modifies AST: name → users.name, age → users.age

// Generate SQL from AST
std::string sql = Transpiler::generate(stmt, Dialect::PostgreSQL);
// Returns: SELECT users.name FROM users WHERE users.age > 18

See Supported SQL dialects for all available Dialect:: values.

Python

Available on PyPI: pip install libsqlglot

System requirements: Linux with glibc 2.35+ (Ubuntu 22.04+, Debian 12+, RHEL 9+)

import libsqlglot as sqlglot

# Transpile
sql = "SELECT `id`, `name` FROM `users` LIMIT 10"
result = sqlglot.transpile(sql, read="mysql", write="postgres")
result = sqlglot.transpile(sql, "mysql", "postgres") # Both styles work

# Parse
stmt = sqlglot.parse_one(sql) # single statement
stmts = sqlglot.parse("SELECT 1; SELECT 2") # multiple statements

# Generate
sql = stmt.sql() # default ANSI
sql = stmt.sql(dialect="postgres", pretty=True) # pretty-print
sql = stmt.sql(dialect="sqlserver") # TRUE → 1

# AST traverse
columns = stmt.find_all(sqlglot.ExprType.COLUMN)
tables = stmt.find_all(sqlglot.ExprType.TABLE_REF)
stmt.walk(lambda n: print(n.type))

# Optimise
optimized = sqlglot.optimize(stmt)

# Builder
stmt = (sqlglot.select(["id", "name"])
        .from_("users")
        .where("active = TRUE")
        .order_by("name")
        .limit(10))

# Diff
diff = sqlglot.diff("SELECT id FROM users", "SELECT user_id FROM users")

Unix Pipes

libsqlglot can be used in Unix pipelines via the Python CLI:

# Basic transpilation
echo "SELECT \`id\` FROM \`users\`" | python3 -m libsqlglot -r mysql -w postgres
# Output: SELECT "id" FROM "users"

# Pipeline with grep
cat queries.sql | python3 -m libsqlglot -r mysql -w bigquery | grep "SELECT"

# Process multiple files
cat *.sql | python3 -m libsqlglot -r sqlserver -w postgres > output.sql

# Combine with other tools
find . -name "*.sql" -exec cat {} \; | python3 -m libsqlglot -r mysql -w postgres | wc -l

The CLI reads SQL from stdin and writes transpiled SQL to stdout, making it composable with standard Unix tools.

See Supported SQL dialects for all available dialect names.

Python API: parse(), parse_one(), generate(), transpile(), optimize(), diff(), .sql(), .find_all(), .walk(), select() builder.

Performance: 95-98% of C++ speed. Overhead is ~125ns per call (10ns function call + 100ns string marshal + 15ns misc). On typical 2.5μs parse, this is 5% overhead. On complex 25μs parse, 0.5% overhead.

Memory: Thread-local arenas, no runtime dependencies.

Differences from original sqlglot

| | libsqlglot | Python sqlglot |
| --- | --- | --- |
| Performance | 32-242× faster (126× avg) | Baseline |
| Stored procedures | Support for PL/pgSQL, T-SQL, MySQL, PL/SQL | Limited (exp.Command passthrough) |
| Error handling | Fail-fast with precise errors (line, column, context) | Error recovery (IDE-friendly, slower) |
| Memory | Arena allocation (O(1) cleanup) | Garbage collection |
| Optimiser | Column qualification, predicate pushdown, constant folding, subquery elimination | Same + additional passes + full execution engine |
| Codebase | Header-only C++26 library | 50,000+ lines of Python |
| Keywords | C++26 reflection: auto-generated from enum (300+ keywords, zero maintenance) | Manually maintained dictionaries |
| Binary | C++ library 14KB, Python wheel 1.5MB | N/A |
| Dialects | 45 SQL dialects (14 unique to libsqlglot, including ANSI) | 32 dialects (including PRQL, which libsqlglot doesn't support) |
| SQL coverage | Same as Python sqlglot | Full SQL support |
| Dependencies | None (no runtime deps) | None |

Building

Requires C++26 (GCC 14+ with -freflection) and CMake 3.21+.

C++26 features used:

  • Keyword reflection (std::meta): Auto-generates 300+ keyword mappings from TokenType enum at compile time. Zero maintenance, impossible to desync.
  • Dialect reflection: Build-time code generation parses the Dialect enum and generates compile-time mappings (CMake → Python script → generated header). When GCC fixes the reflection bug, will switch to pure C++26 reflection.
  • ExprType bindings: Python bindings auto-generated from the C++ ExprType enum (124 values). Run python3 scripts/generate_expr_type_bindings.py > src/python/expr_type_bindings_generated.h after modifying expression.h.
  • Advanced constexpr: Perfect hash tables, compile-time string processing.

Docker (Recommended)

The easiest way to build with GCC trunk + reflection support:

# Build the project (first build takes 30-45 min to compile GCC trunk)
docker compose -f docker/docker-compose.yml run --rm build

# Run tests
docker compose -f docker/docker-compose.yml run --rm test

# Build Python wheel
docker compose -f docker/docker-compose.yml run --rm wheel

# Development shell
docker compose -f docker/docker-compose.yml run --rm dev

See docker/README.md for full documentation.

C++ library (native)

Requires GCC trunk built from source with -freflection support:

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
sudo cmake --install build

Python package

pip install -e .

Building with tests and benchmarks

cmake -B build -DCMAKE_BUILD_TYPE=Release \
               -DLIBSQLGLOT_BUILD_TESTS=ON \
               -DLIBSQLGLOT_BUILD_BENCHMARKS=ON
cmake --build build -j$(nproc)
ctest --test-dir build

Compiled sizes (stripped, -O3): C++ library 14KB, Python extension 2.0MB.

Code quality: Compiles with -Wall -Wextra -Wpedantic -Werror. No runtime dependencies. No RTTI. Passes 27,040 assertions across 361 test cases. Fuzz-tested with libFuzzer + AddressSanitizer.

Advanced optimisations

Profile-Guided Optimisation (PGO): For production deployments requiring maximum performance, enable PGO in 3 steps:

# Step 1: Build with profiling instrumentation
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLIBSQLGLOT_PGO_GENERATE=ON
cmake --build build

# Step 2: Run with representative workload to collect profile data
./build/benchmarks/bench_transpiler  # or your own queries

# Step 3: Rebuild using profile data for optimisation (10-30% faster)
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLIBSQLGLOT_PGO_USE=ON
cmake --build build

Compiler optimisations enabled (Release builds): Link-Time Optimisation (LTO), constant merging, symbol visibility optimisation.

Architecture

Header-only design: you only pay for what you use. 19 header files, no .cpp. See include/libsqlglot/ for the full layout. Core files: parser.h (4191 lines), generator.h (2149), expression.h (1385, 115 expression types). Entry point is transpiler.h (86 lines).

Memory management

Arena allocation: all AST nodes allocated in contiguous chunks, freed together in O(1) time. String interning deduplicates identifiers. Tokenisation is zero-copy via string_view. Everything uses RAII, no manual delete calls.

Grammar pipeline

Grammar definitions from multiple sources (ANTLR and normalised JSON specs) are unified into a canonical IR, then compiled into a cache-efficient LUT used by the runtime parser.

SQL support

Statements: SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE TABLE, ALTER TABLE, DROP TABLE, DROP INDEX, TRUNCATE, CREATE VIEW, DROP VIEW, CREATE SCHEMA, DROP SCHEMA

Stored procedures: CALL, RETURN, DECLARE, IF/ELSEIF/ELSE/END IF, WHILE/DO/END WHILE, FOR/IN/LOOP/END LOOP

Clauses: WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET, QUALIFY

JOINs: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, ASOF

Advanced: CTEs (WITH RECURSIVE), window functions (PARTITION BY, ORDER BY, ROWS/RANGE, named windows), subqueries, PIVOT/UNPIVOT

Operators: Arithmetic, comparison, logical, LIKE, ILIKE, IS NULL, BETWEEN, IN, EXISTS

Expressions: CASE WHEN, arrays, array indexing, set operations (UNION, INTERSECT, EXCEPT)

Transactions: BEGIN, COMMIT, ROLLBACK, SAVEPOINT

Utilities: SET, DESCRIBE, SHOW, EXPLAIN

Testing

361 test cases, 27,040 assertions, all passing.

cd build
ctest --output-on-failure

# Run specific test suites
./tests/libsqlglot_tests "[parser]"
./tests/libsqlglot_tests "[security]"
./tests/libsqlglot_tests "[stored_procedures]"
./tests/libsqlglot_tests "[performance]"

# Run benchmarks
./benchmarks/bench_tokenizer
./benchmarks/bench_parser
./benchmarks/bench_transpiler

Security

27,040 assertions covering SQL injection, buffer overflow, stack overflow (recursion depth at 256, adjustable via Parser::kMaxRecursionDepth in parser.h), memory corruption (arena prevents use-after-free and double-free), integer overflow, and encoding attacks (UTF-8 identifiers rejected, UTF-8 string literals accepted). All pass.

Fuzzing

There is a libFuzzer target in fuzzing/fuzz_parser.cpp:

cd fuzzing
clang++ -fsanitize=fuzzer,address -std=c++26 -freflection -I../include fuzz_parser.cpp -o fuzz_parser
./fuzz_parser -max_len=10000 -timeout=10

Compatibility testing

Test compatibility with Python sqlglot's test fixtures:

cd compat
python sqlglot_compat.py --sqlglot-path /path/to/sqlglot --limit 1000

Errors

libsqlglot provides precise, actionable error messages with exact locations:

Arena arena;
Parser parser(arena, "SELECT FROM users");  // Missing column list

// Throws: Line 1, column 8: Expected column list after SELECT (found: 'FROM')

All parse errors include line number, column number, the actual token that caused the error, and a human-readable explanation of what was expected. Fail-fast, no cascading secondary errors.

Examples

Dialect conversion

// MySQL -> PostgreSQL
std::string mysql_sql = "SELECT `user_id`, `name` FROM `users` LIMIT 10";
std::string pg_sql = Transpiler::transpile(
    mysql_sql, Dialect::MySQL, Dialect::PostgreSQL
);
// Output: SELECT "users"."user_id", "users"."name" FROM "users" LIMIT 10

Building AST by hand

Arena arena;
auto stmt = arena.create<SelectStmt>();

// SELECT *
stmt->columns.push_back(arena.create<Star>());

// FROM users
stmt->from = arena.create<TableRef>("users");

// WHERE age > 18
auto age_col = arena.create<Column>("age");
auto threshold = arena.create<Literal>("18");
stmt->where = arena.create<BinaryOp>(ExprType::GT, age_col, threshold);

std::string sql = Generator::generate(stmt);
// "SELECT * FROM users WHERE age > 18"

Optimisation

Arena arena;
auto stmt = Transpiler::parse(arena,
    "SELECT name FROM users WHERE age > 18");

// Apply optimisation passes
Transpiler::optimize(arena, stmt);

// Optimisations include:
// - Column qualification (name -> users.name)
// - Predicate pushdown
// - Constant folding
// - Subquery elimination

std::string sql = Transpiler::generate(stmt);
// "SELECT users.name FROM users WHERE users.age > 18"

Stored procedures (PL/pgSQL, T-SQL, MySQL, Oracle)

Arena arena;
std::string plpgsql = R"(
    FOR i IN 1..100 LOOP
        IF i > 50 THEN
            RETURN i;
        END IF;
    END LOOP;
)";

auto stmt = Transpiler::parse(arena, plpgsql);
std::string output = Transpiler::generate(stmt, Dialect::PostgreSQL);

Transpiling between dialects

// Real-world analytics query: CTEs, window functions, multiple JOINs
std::string mysql_query = R"(
    WITH regional_sales AS (
        SELECT
            region,
            product_id,
            SUM(amount) as total_sales,
            COUNT(order_id) as order_count,
            ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) as sales_rank
        FROM orders
        WHERE order_date >= '2024-01-01'
        GROUP BY region, product_id
        HAVING SUM(amount) > 10000
    ),
    top_regions AS (
        SELECT region
        FROM regional_sales
        WHERE sales_rank <= 5
        GROUP BY region
        HAVING COUNT(*) > 3
    )
    SELECT
        r.region,
        r.product_id,
        p.product_name,
        r.total_sales,
        r.order_count,
        RANK() OVER (ORDER BY r.total_sales DESC) as overall_rank,
        LAG(r.total_sales) OVER (ORDER BY r.total_sales DESC) as prev_sales
    FROM regional_sales r
    INNER JOIN products p ON r.product_id = p.id
    INNER JOIN top_regions t ON r.region = t.region
    WHERE r.sales_rank <= 10
    ORDER BY r.total_sales DESC
    LIMIT 100
)";

// MySQL → BigQuery
std::string bigquery = Transpiler::transpile(
    mysql_query, Dialect::MySQL, Dialect::BigQuery
);
// Full semantic preservation: CTEs, window functions, JOINs all intact

// Round-trip: MySQL → BigQuery → PostgreSQL → Snowflake → MySQL
std::string step1 = Transpiler::transpile(mysql_query, Dialect::MySQL, Dialect::BigQuery);
std::string step2 = Transpiler::transpile(step1, Dialect::BigQuery, Dialect::PostgreSQL);
std::string step3 = Transpiler::transpile(step2, Dialect::PostgreSQL, Dialect::Snowflake);
std::string back_to_mysql = Transpiler::transpile(step3, Dialect::Snowflake, Dialect::MySQL);
// Query semantics preserved across 4 dialect conversions

// Simple transformations: PostgreSQL → SQL Server
std::string pg_query = "SELECT * FROM users WHERE active = TRUE LIMIT 10";
Arena arena;
Parser parser(arena, pg_query);
auto stmt = parser.parse_select();
std::string sql_server = Generator::generate(stmt, Dialect::SQLServer);
// Result: SELECT TOP 10 * FROM users WHERE active = 1
// LIMIT → TOP, TRUE → 1

// Multi-dialect pipeline: Parse once, generate for multiple targets
Arena shared_arena;
auto ast = Transpiler::parse(shared_arena, "SELECT name FROM users WHERE age > 18");

std::string postgres_sql = Generator::generate(ast, Dialect::PostgreSQL);
std::string mysql_sql = Generator::generate(ast, Dialect::MySQL);
std::string duckdb_sql = Generator::generate(ast, Dialect::DuckDB);
std::string snowflake_sql = Generator::generate(ast, Dialect::Snowflake);
// Single parse, multiple outputs - efficient for multi-target scenarios

Benchmarks

Benchmarks run on x86-64 Linux with -O3 optimisation. libsqlglot compared against pure Python sqlglot 30.0.1. Python bindings add ~125ns overhead (95-98% of C++ performance).

What we measure: Full parse + generate round-trip (SQL → AST → SQL). No optimisation applied in either implementation. Both produce identical output, proving identical work done. Apples-to-apples comparison of parser and generator performance.

Measurement: std::chrono::high_resolution_clock with 1000 iterations per query, averaged.

The 16 standard queries are sqlglot's benchmark. The 8 stress tests are ours, excluded from the average. They're here to show the scaling doesn't stop: 126.1× average on the standard set, 192.7× on these, peaking at 251.6×. What happens past 252× is left as an exercise for the reader.

Standard benchmarks (16 queries)

| Query | sqlglot (μs) | libsqlglot (μs) | Speedup |
| --- | ---: | ---: | ---: |
| many_ctes | 1,097.84 | 4.53 | 242.4× |
| many_joins | 1,038.03 | 5.81 | 178.7× |
| nested_functions | 892.91 | 6.27 | 142.4× |
| nested_subqueries | 495.97 | 2.79 | 177.8× |
| many_unions | 2,585.04 | 17.30 | 149.4× |
| tpch | 1,110.16 | 7.74 | 143.4× |
| complex_where | 575.69 | 4.27 | 134.8× |
| many_windows | 933.16 | 13.00 | 71.8× |
| deep_arithmetic | 388.10 | 2.86 | 135.7× |
| many_columns | 1,643.41 | 14.81 | 110.9× |
| values | 15,951.28 | 162.90 | 97.9× |
| large_case | 4,480.21 | 35.30 | 126.9× |
| large_in | 11,756.63 | 116.38 | 101.0× |
| many_numbers | 6,463.57 | 67.71 | 95.5× |
| short | 118.61 | 1.54 | 77.0× |
| large_strings | 146.92 | 4.54 | 32.4× |

Average: 126.1× faster (range: 32.4× to 242.4×). A million queries: 29 seconds vs 52 minutes.

libsqlglot achieves this through:

  • Memory: Arena allocation (O(1) cleanup), string interning (pointer equality), zero-copy tokenisation (string_view)
  • Algorithms: Perfect hash keyword lookup (O(1)), branchless uppercase conversion (no branch misprediction)
  • Compiler: LTO (whole-program optimisation), aggressive inlining, constant folding, extensive constexpr evaluation
  • Cache: Contiguous memory layout, spatial locality, no per-node malloc/new fragmentation

Stress tests (8 queries, supplementary)

Designed to break parsers: 15-level nested CTEs, 35-level CASE expressions, 100+ WHERE conditions, 20-way joins. 1,186 lines of SQL, 51KB total.

| Query | sqlglot (μs) | libsqlglot (μs) | Speedup |
| --- | ---: | ---: | ---: |
| Correlated subqueries | 14,400.86 | 60.60 | 237.6× |
| Complex subqueries | 18,414.37 | 73.18 | 251.6× |
| Complex string functions | 15,609.81 | 71.42 | 218.6× |
| Deep CTE nesting | 17,214.85 | 82.36 | 209.0× |
| Union chains | 24,075.57 | 115.77 | 208.0× |
| Complex WHERE clause | 14,499.12 | 81.99 | 176.8× |
| Deep CASE nesting | 14,651.62 | 103.10 | 142.1× |
| Multi-table joins | 13,862.49 | 141.16 | 98.2× |

Average: 192.7× faster (range: 98.2× to 251.6×). A million queries: 86 seconds vs 4.6 hours.

Validation

# Python sqlglot: 3,917 μs
parsed = sqlglot.parse_one("SELECT * FROM users WHERE age > 18")
output = parsed.sql()  # "SELECT * FROM users WHERE age > 18"

// libsqlglot: 45 μs (87× faster)
Arena arena;
auto ast = Transpiler::parse(arena, "SELECT * FROM users WHERE age > 18");
auto output = Transpiler::generate(ast);  // "SELECT * FROM users WHERE age > 18"

See benchmarks/bench_complete_comparison.py to reproduce.

Supported SQL dialects

45 dialects with full parse and generation support. Use Dialect::Name in C++ or "name" strings in Python (e.g., read="mysql", write="postgres").

Note: Dialect names are case-sensitive (e.g., Dialect::PostgreSQL, not Dialect::postgresql).

Each dialect includes proper identifier quoting, keyword handling, function name translation, and syntax transformations (e.g. LIMIT vs TOP vs FETCH FIRST).

If somehow a dialect is missing from the map, get_features() returns ANSI SQL defaults.

Core dialects (explicit configurations)

| Database | Dialect | Supported features |
| --- | --- | --- |
| ANSI SQL | ANSI | Standard SQL compliance: ANSI joins, standard aggregations, CTEs |
| BigQuery | BigQuery | STRUCT types, ARRAY literals [1,2,3], SAFE_CAST, INT64/STRING types, nested field access |
| Calcite | Calcite | TABLESAMPLE BERNOULLI, Apache Calcite optimiser hints |
| ClickHouse | ClickHouse | Column-oriented syntax, MergeTree engines, SAMPLE BY, ARRAY JOIN |
| DB2 | DB2 | FETCH FIRST n ROWS ONLY, DB2 stored procedures, OLAP functions |
| Drill | Drill | Schema-free JSON queries, nested data access, FLATTEN |
| DuckDB | DuckDB | QUALIFY clause, ASOF joins, PIVOT/UNPIVOT, macro functions, LIST type |
| Hive | Hive | PARTITIONED BY, CLUSTERED BY, SerDe formats, Hive UDFs |
| MySQL | MySQL | Backtick identifiers, MySQL-specific functions, storage engines |
| Oracle | Oracle | CONNECT BY hierarchical queries, PRIOR, START WITH, DUAL table, PL/SQL blocks |
| Phoenix | Phoenix | HBase integration: SALT_BUCKETS, ARRAY_APPEND, UPSERT VALUES |
| Pinot | Pinot | Real-time OLAP: segment pruning, star-tree indexes, broker queries |
| PostgreSQL | PostgreSQL | RETURNING, ON CONFLICT, LATERAL joins, window functions, array types, JSONB |
| Presto | Presto | APPROX_DISTINCT, UNNEST, ROW types, lambda functions |
| Snowflake | Snowflake | FLATTEN for JSON, VARIANT type, TIME_TRAVEL, CLUSTER BY, RESULT_SCAN |
| Solr | Solr | score() relevance function, faceted search, Lucene query syntax |
| Spark | Spark | NULL-safe equality <=>, Hive metastore, broadcast hints, cache table |
| SQL Server | SQLServer | T-SQL syntax: TOP, IDENTITY, OUTPUT clause, EXEC, GO batches, # temp tables |
| SQLite | SQLite | Minimal SQL: no RIGHT JOIN, PRAGMA commands, autoincrement |
| Teradata | Teradata | MULTISET tables, BTEQ syntax, FastLoad/MultiLoad hints |

Inherited dialect configurations

These dialects inherit features from a compatible base dialect and add specific extensions.

| Database | Dialect | Inherits from | Supported features |
| --- | --- | --- | --- |
| Athena | Athena | Presto | AWS Athena Presto syntax, S3 partitioning |
| CockroachDB | CockroachDB | PostgreSQL | UPSERT statement, distributed transactions, RETURNING clause |
| Databricks | Databricks | Spark | OPTIMIZE tables, ZORDER BY clustering, Delta Lake operations |
| Doris | Doris | MySQL | DUPLICATE KEY model, BUCKETS distribution, DISTRIBUTED BY HASH |
| Dremio | Dremio | Presto | CREATE REFLECTION for materialisation, data lakehouse queries |
| Druid | Druid | MySQL | TIME_FLOOR for time bucketing, approximate aggregations, roll-up |
| Dune | Dune | PostgreSQL | Analytics: bytearray_to_uint256, ETH address functions |
| Exasol | Exasol | PostgreSQL | DISTRIBUTE BY for parallel execution, Lua scripting UDFs |
| Fabric | Fabric | SQL Server | Three-part lakehouse.schema.table naming, OneLake integration |
| Greenplum | Greenplum | PostgreSQL | DISTRIBUTED BY/RANDOMLY, column/append-optimised tables, GPORCA optimiser |
| Impala | Impala | Hive | COMPUTE STATS, Kudu integration, CACHED IN pools |
| MariaDB | MariaDB | MySQL | MySQL-compatible with RETURNING, window functions, JSON functions |
| Materialize | Materialize | PostgreSQL | TAIL for streaming results, materialised views, temporal filters |
| Netezza | Netezza | PostgreSQL | DISTRIBUTE ON distribution keys, zone maps, statistics |
| Redshift | Redshift | PostgreSQL | DISTKEY distribution, SORTKEY ordering, SUPER type (JSON) |
| RisingWave | RisingWave | PostgreSQL | EMIT CHANGES for streaming, temporal joins, watermarks |
| SingleStore | SingleStore | MySQL | VECTOR type for embeddings, DOT_PRODUCT, columnstore/rowstore |
| Spark2 | Spark2 | Spark | Legacy Spark 2.x: CACHE TABLE, broadcast joins, RDD compatibility |
| StarRocks | StarRocks | MySQL | Vectorised execution, primary key model, materialised views, bitmap indexes |
| Tableau | Tableau | PostgreSQL | ZN() null-to-zero, Tableau calculation functions, RAWSQL passthrough |
| TiDB | TiDB | MySQL | AUTO_RANDOM for distributed primary keys, MySQL compatibility, TiKV storage |
| TimescaleDB | TimescaleDB | PostgreSQL | time_bucket() for time-series, hypertables, continuous aggregates |
| Trino | Trino | Presto | Presto-compatible: UNNEST, lambda expressions, ROW types |
| Vertica | Vertica | PostgreSQL | CREATE PROJECTION for physical design, SEGMENTED BY HASH, columnar storage |
| YugabyteDB | YugabyteDB | PostgreSQL | SPLIT INTO n TABLETS, distributed SQL, PostgreSQL compatibility |

Contributing

libsqlglot is currently a solo project. Bug reports, test cases, and dialect edge cases are welcome via GitHub issues. If you have a dialect you wish to see added, please open an issue or PR.

If a query parses incorrectly, or a dialect transformation produces wrong output, please open an issue with the input SQL, source dialect, target dialect, expected output, and any other pertinent details.

Pull requests are considered but there is no guarantee of merge. The codebase is intentionally small and opinionated.

How about a Rust rewrite?

No.

Licence

Apache 2.0
