
mraba/reflection-optimisation: skip schema resolution for single table#656

Draft
sfc-gh-mraba wants to merge 5 commits into main from mraba/reflection-optimisation

Conversation


sfc-gh-mraba (Collaborator) commented Feb 25, 2026

Please answer these questions before submitting your pull request. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #SNOW-689531

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

Optimise reflection performance (SNOW-689531)

Summary

Adds SQLAlchemy 2.x get_multi_* bulk reflection hooks and per-table SHOW … IN TABLE / DESC TABLE reflection paths. Schema-wide queries run once per MetaData.reflect() pass instead of once per table.
Single-table Inspector calls use fast per-table queries that work correctly for all table types including temporary and dynamic tables.

Changes

Performance

• Add get_multi_columns, get_multi_pk_constraint, get_multi_unique_constraints, get_multi_foreign_keys — SA 2.x bulk hooks that each issue one schema-wide query per reflection pass instead of one
query per table.
• cache_column_metadata=True opt-in enables per-table SHOW … IN TABLE queries for PK, UK, FK, and indexes on both SA versions.
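The bulk-hook shape can be sketched as follows — a minimal, hypothetical illustration (function name, column names, and row shapes are illustrative, not the PR's actual code) of how one schema-wide query result is regrouped into the per-table mapping that SQLAlchemy's get_multi_* hooks return:

```python
from collections import defaultdict

def group_columns_by_table(rows, schema):
    """Group (table_name, column_name, data_type) rows from a single
    schema-wide query into the {(schema, table): [columns]} shape that
    SA 2.x bulk reflection hooks are expected to produce."""
    grouped = defaultdict(list)
    for table_name, column_name, data_type in rows:
        grouped[(schema, table_name)].append(
            {"name": column_name, "type": data_type}
        )
    return dict(grouped)

# One schema-wide query yields rows for every table at once.
rows = [
    ("users", "id", "NUMBER"),
    ("users", "email", "VARCHAR"),
    ("orders", "id", "NUMBER"),
]
result = group_columns_by_table(rows, "public")
assert len(result[("public", "users")]) == 2
assert len(result[("public", "orders")]) == 1
```

The point of the regrouping is that MetaData.reflect() can hand SQLAlchemy metadata for N tables from one round trip, instead of N per-table queries.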

Correctness

• Fix FK referred_schema resolution: both per-table and schema-wide paths now include the explicitly reflected schema in the same-schema set, preventing incorrect cross-schema references when
reflecting a non-default schema.
• Fix unique constraint keying: constraints are now keyed by (table_name, constraint_name) instead of constraint_name alone, preventing silent overwrites when two tables have identically-named
constraints.
• Add _always_quote_join helper that always quotes denormalised identifiers, ensuring correct SQL for case-sensitive table and schema names.
• Replace SHOW TABLES LIKE with SHOW INDEXES IN TABLE for single-table index reflection, eliminating SQL LIKE wildcard false-positives and case-sensitivity bugs.
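The unique-constraint keying fix can be illustrated with plain dictionaries (the row shape here is hypothetical, not the PR's actual code): keying by constraint name alone silently drops one constraint when two tables reuse the same name, while the composite key keeps both.

```python
# Two tables that happen to reuse the same constraint name.
rows = [
    ("users", "uq_name", "username"),
    ("accounts", "uq_name", "account_name"),
]

# Buggy keying: the second row clobbers the first.
by_name = {}
for table, constraint, column in rows:
    by_name[constraint] = {"table": table, "columns": [column]}
assert len(by_name) == 1  # one constraint silently lost

# Fixed keying: (table_name, constraint_name) keeps both.
by_table_and_name = {}
for table, constraint, column in rows:
    by_table_and_name[(table, constraint)] = {"columns": [column]}
assert len(by_table_and_name) == 2
```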

Robustness

• Narrow exception handling in all per-table reflection methods (_get_table_primary_keys, _get_table_unique_constraints, _get_table_foreign_keys, _get_table_indexes) from sa_exc.DBAPIError to
sa_exc.ProgrammingError — connection and operational errors now propagate instead of being silently swallowed.
• Narrow _execute_desc from bare except Exception to sa_exc.ProgrammingError — only SQL-level errors (e.g. table dropped by another session) are swallowed; network/permission errors propagate with
actionable diagnostics.
• Add logger.debug to all swallowed exceptions for diagnosability.

Code quality

• Add shared row-parsing helpers (_parse_pk_rows, _parse_uk_rows, _parse_fk_rows, _parse_index_rows) so correctness fixes propagate to both per-table and schema-wide paths.
• Hoist _StructuredTypeInfoManager allocation outside the fallback loop in get_multi_columns.
• Fix misleading docstrings on get_table* methods that claimed "results are not cached" when the calling methods apply @reflection.cache.

Documentation

• README: document SA 2.x vs 1.4 reflection routing, cache_column_metadata opt-in, performance implications, and best practices for different schema sizes.
• README: note that Inspector.get_columns() on SA 2.x uses DESC TABLE, which includes Snowflake's resolved default sizes in type str() representations (e.g. BINARY(8388608)). Type objects are
functionally identical; use isinstance() for type checks.
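The README guidance can be shown with stand-in type classes (not SQLAlchemy's real ones): two instances of the same type can render differently once the resolved default length is included, so comparing str() output breaks while isinstance() still holds.

```python
class BINARY:
    """Illustrative stand-in for a dialect type with an optional length."""
    def __init__(self, length=None):
        self.length = length
    def __str__(self):
        return f"BINARY({self.length})" if self.length else "BINARY"

declared = BINARY()            # what the user's model declares
reflected = BINARY(8388608)    # what DESC TABLE reports back

assert str(declared) != str(reflected)  # string comparison is brittle
assert isinstance(reflected, BINARY)    # type check remains correct
```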

Behavioural notes

• SA 2.x Inspector.get_columns() now uses DESC TABLE instead of information_schema. Reflected type objects are functionally identical, but str() output may include explicit default sizes (e.g.
BINARY(8388608) vs BINARY). MetaData.reflect() and Alembic are unaffected — they use get_multi_columns which still queries information_schema.
• SA 2.x get_columns() for missing tables returns [] instead of raising NoSuchTableError, aligning with the SA 2.0 convention of returning falsy values. reflect_table handles the empty-columns
case separately.
• Per-table reflection methods now let non-ProgrammingError exceptions (network failures, permission errors) propagate instead of returning empty metadata silently.
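The falsy-return contract above implies callers should check for an empty result rather than catching NoSuchTableError. A minimal sketch with a stub inspector (illustrative only, not the real SQLAlchemy Inspector):

```python
class StubInspector:
    """Stand-in mimicking the SA 2.x behaviour described above:
    get_columns() on a missing table returns [] instead of raising."""
    def __init__(self, tables):
        self._tables = tables
    def get_columns(self, table_name):
        return self._tables.get(table_name, [])

insp = StubInspector({"users": [{"name": "id"}]})

assert insp.get_columns("users") == [{"name": "id"}]
assert insp.get_columns("missing") == []  # falsy, no NoSuchTableError

cols = insp.get_columns("missing")
if not cols:
    # handle the empty-columns case explicitly, as reflect_table does
    handled = True
assert handled
```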

sfc-gh-mraba self-assigned this Feb 25, 2026
sfc-gh-mraba force-pushed the mraba/reflection-optimisation branch from 737c2aa to ca3d80e on March 25, 2026 15:35
Get primary key constraint for a specific table using table-specific query.
Used when cache_column_metadata=False for optimized single-table reflection.
"""
full_name = self._denormalize_quote_join(schema, table_name)

Will it always quote? For the newly added functions we could enforce an always-quote rule.

for row in result:
    if constraint_name is None:
        constraint_name = self.normalize_name(
            row._mapping["constraint_name"]

This (and the other newly added helper functions for the single-table key queries) will somewhat collide with my change in #669.

Used when cache_column_metadata=False or in single-table reflection.
"""
full_name = self._denormalize_quote_join(schema, table_name)
_, current_schema = self._current_database_schema(connection, **kw)

Can't / shouldn't we use the schema passed into the function rather than the one coming from _current_database_schema?

)
foreign_key_map = {}
for row in result:
    name = self.normalize_name(row._mapping["fk_name"])

This whole mapping should be extracted and reused in both functions (the all-tables and single-table variants); same for the other newly added functions. That keeps them consistent and minimizes errors (e.g. modifying one function and forgetting about the other).


# Use smart detection to decide between table-specific vs schema-wide query
if self._should_use_table_specific_query(schema, **kw):
    return self._get_table_foreign_keys(connection, table_name, schema, **kw)

Question: does it even make sense to keep the old behavior? AFAIK, get_foreign_keys and the other key-querying methods expect keys for a given table anyway; they are never interested in all the keys in a schema. Given that, maybe we should make this the new default? The output should be the same either way, so it's not a breaking change. If you want to play it safe, keep the feature parameter, migrate gradually, and make it the default in the next major.

Check if a table name is safe to use with DESC TABLE command.

DESC TABLE is more restrictive than information_schema queries.
Avoid optimization for table names with problematic characters.

Is this Snowflake-specific or dialect-specific (i.e. tied to the identifier rules within the dialect)?

"""
return any(isinstance(key, tuple) and key[0] == fn_name for key in info_cache)

def _should_use_table_specific_query(self, schema, **kw):

I don't think we would need this with the new approach of making the new functions the default. This feels a bit hacky, and the caching that was on those key-querying functions also feels hacky: any DDL between queries won't be reflected, so they return stale data.


Additionally, I think we could differentiate between the single-table and multi-table column queries by implementing get_multi_column and calling IN SCHEMA there, while get_column uses IN TABLE.

except sa_exc.DBAPIError:
    return []

n2i = self._map_name_to_idx(result)

nit: let's use more descriptive names (abbreviations like this one, or tbl, hurt readability).


# Optimization: For single-table reflection, use table-specific query
# Skip optimization for edge cases (empty strings, special characters in DESC TABLE)
# DESC TABLE is more restrictive than information_schema queries for certain characters

Since DESC TABLE is restrictive, can't we use information_schema with a WHERE clause that would effectively return the same data?

)
n2i = self._map_name_to_idx(result)
for row in result.cursor.fetchall():
    prefixes = self.get_prefixes_from_data(n2i, row)

Can't we just use the kind column returned by SHOW TABLES? This way of recognizing the table kind is very confusing.

… optimise reflection

- Add get_multi_columns, get_multi_pk_constraint, get_multi_unique_constraints,
  get_multi_foreign_keys for SA 2.x bulk reflection (one schema-wide query per
  reflection pass instead of one query per table).
- SA 2.x get_columns uses DESC TABLE directly (IS_VERSION_20 guard); get_multi_columns
  handles all bulk reflection including temp/dynamic tables via DESC TABLE fallback.
- Fix schema key mismatch in all get_multi_* methods: use effective_schema for SQL
  queries, preserve original schema (possibly None) as the return dict key so SA's
  _reflect_info lookup succeeds.
- Remove _should_use_table_specific_query; get_columns uses _is_single_table_reflection.
- Fix SHOW INDEXES IN TABLE replacing SHOW TABLES LIKE for single-table index reflection.
- Add _always_quote_join helper for correctly-quoted per-table SQL identifiers.
- Add shared row-parsing helpers (_parse_pk_rows, _parse_uk_rows, _parse_fk_rows).
- Fix foreign key referred_schema resolution (same_schemas includes explicit schema).
- Update README Cache Column Metadata section with accurate SA 2.x dispatch model.
- Update DESCRIPTION.md unreleased notes.
sfc-gh-mraba force-pushed the mraba/reflection-optimisation branch from 7216ee1 to b706b59 on March 26, 2026 15:22