This file provides guidance to Agents when working with code in this repository.
sqlparse is a non-validating SQL parser for Python that provides support for parsing, splitting, and formatting SQL statements. It's compatible with Python 3.8+ and supports multiple SQL dialects (Oracle, MySQL, PostgreSQL/PL/pgSQL, HQL, MS Access, Snowflake, BigQuery).
This project uses pixi for dependency and environment management. Common commands:
- Run all tests across Python versions: `pixi run test-all`
- Run tests for a specific Python version: `pixi run -e py311 pytest tests/`
- Run a single test file: `pixi run -e py311 pytest tests/test_format.py`
- Run a specific test: `pixi run -e py311 pytest tests/test_format.py::test_name`
- Run tests via the Makefile: `make test`
- Lint: `pixi run lint` or `make lint`
- Coverage: `make coverage` (runs tests with coverage and shows a report), `make coverage-xml` (generates an XML coverage report)
- Build: `python -m build` (builds distribution packages)
The parsing and formatting workflow follows this sequence:
- Lexing (`sqlparse/lexer.py`): Tokenizes SQL text into `(token_type, value)` pairs using regex-based pattern matching
- Filtering (`sqlparse/engine/filter_stack.py`): Processes the token stream through a `FilterStack` with three stages: `preprocess` (token-level filters), `stmtprocess` (statement-level filters), and `postprocess` (final output filters)
- Statement Splitting (`sqlparse/engine/statement_splitter.py`): Splits the token stream into individual SQL statements
- Grouping (`sqlparse/engine/grouping.py`): Groups tokens into higher-level syntactic structures (parentheses, functions, identifiers, etc.)
- Formatting (`sqlparse/formatter.py` + `sqlparse/filters/`): Applies formatting filters based on the given options
The token system is defined in `sqlparse/sql.py`:

- `Token`: Base class with `value`, `ttype` (token type), and `parent` attributes
- `TokenList`: A group of tokens; base class for all syntactic structures
- `Statement`: Top-level SQL statement
- `Identifier`: Table/column names, possibly with aliases
- `IdentifierList`: Comma-separated identifiers
- `Function`: Function calls with parameters
- `Parenthesis`, `SquareBrackets`: Bracketed expressions
- `Case`, `If`, `For`, `Begin`: Control structures
- `Where`, `Having`, `Over`: SQL clauses
- `Comparison`, `Operation`: Expressions
All tokens maintain parent-child relationships for tree traversal.
Token types are defined in `sqlparse/tokens.py` and used for classification during lexing (e.g., `T.Keyword.DML`, `T.Name`, `T.Punctuation`).
`sqlparse/keywords.py` contains:

- `SQL_REGEX`: List of regex patterns for tokenization
- Multiple `KEYWORDS_*` dictionaries for different SQL dialects

The `Lexer` class (in `sqlparse/lexer.py`) uses a singleton pattern (`Lexer.get_default_instance()`) and can be configured with different keyword sets.
`sqlparse/engine/grouping.py` contains the grouping logic that transforms flat token lists into nested tree structures. Key functions:

- `_group_matching()`: Groups tokens with matching open/close markers (parentheses, CASE/END, etc.)
- Various `group_*()` functions for specific constructs (identifiers, functions, comparisons, etc.)
- DoS protection via `MAX_GROUPING_DEPTH` and `MAX_GROUPING_TOKENS` limits
`sqlparse/filters/` contains various formatting filters:

- `reindent.py`: Indentation logic
- `aligned_indent.py`: Aligned indentation style
- `right_margin.py`: Line wrapping
- `tokens.py`: Token-level transformations (keyword case, etc.)
- `output.py`: Output format serialization (SQL, Python, PHP)
- `others.py`: Miscellaneous filters (strip comments, strip whitespace, etc.)
The main entry points in `sqlparse/__init__.py`:

- `parse(sql, encoding=None)`: Parse SQL into a tuple of `Statement` objects
- `format(sql, encoding=None, **options)`: Format SQL with options (reindent, keyword_case, etc.)
- `split(sql, encoding=None, strip_semicolon=False)`: Split SQL into individual statement strings
- `parsestream(stream, encoding=None)`: Generator version of `parse` for file-like objects
Useful helpers for navigating a parsed token tree:

- `token.flatten()`: Recursively yields all leaf tokens (ungrouped)
- `token_first()`, `token_next()`, `token_prev()`: Navigate token lists
- `token_next_by(i=, m=, t=)`: Find the next token by instance type, match criteria, or token type
- `token.match(ttype, values, regex=False)`: Check whether a token matches the given criteria
Use `Lexer.add_keywords()` to extend the parser with new keywords for different SQL dialects.
Be aware of recursion limits and token count limits in grouping operations when handling untrusted SQL input.
- Tests are in the `tests/` directory
- Test files follow the pattern `test_*.py`
- Uses the pytest framework
- Test data often includes SQL strings with expected parsing/formatting results