Create Storage Access #910

Open
zubednarova wants to merge 13 commits into main from storage-access-patch-1

@zubednarova (Contributor) commented Apr 13, 2026

Jira issue(s): PROOF-XXX

https://linear.app/keboola/issue/AJDA-2521/dokumentace-k-direct-access

Changes:

  • Add Storage Access documentation page for Data Apps (data-apps/storage-access/index.md)
  • Add page to site navigation in _data/navigation.yml
  • Small update to data-apps/index.md — Data Integration bullet now lists the real Data App env vars and links to the canonical list in keboola/data-app-python-js

The page covers:

  • Overview, architecture (Query Service API), and workspace lifecycle
  • Setup steps (enable feature, select tables, deploy)
  • Reading data (Query Service client, custom queries)
  • Writing data (INSERT, UPDATE, DELETE, TRUNCATE with metadata refresh)
  • Environment variables (BRANCH_ID, WORKSPACE_ID, QUERY_SERVICE_URL, KBC_TOKEN)
  • Comparison: Input Mapping vs Direct Storage Access
  • Full Flask example app with input validation
  • Best practices (error handling, SQL-injection prevention, keyset pagination, caching, logging)
  • Limitations (Snowflake-only, no column-level permissions)
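
For orientation, here is a minimal sketch of reading the four environment variables listed above and failing fast when one is missing. The helper name `load_storage_access_env` is hypothetical and not part of any Keboola SDK, and the exact variable names are still subject to the review checklist in this description.

```python
import os

# Names as listed in this PR description; confirmation is pending in review.
REQUIRED_VARS = ("BRANCH_ID", "WORKSPACE_ID", "QUERY_SERVICE_URL", "KBC_TOKEN")


def load_storage_access_env(environ=os.environ):
    """Return the required settings as a dict, or raise if any are missing.

    Hypothetical helper for illustration only.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling this once at startup surfaces a misconfigured deployment immediately, rather than on the first query.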

Updates since the initial review (addressing @odinuv's comments):

  • Fixed SQL-injection section: clarified that Query Service accepts raw SQL, so we show input validation (type casting, allowlists) instead of claiming "parameterized queries"
  • Replaced OFFSET-based pagination with keyset (cursor-based) pagination to avoid duplicates/gaps on live data
  • Added generic Python in-memory cache example alongside Streamlit-specific st.cache_data
  • Clarified logging: stdout goes to Terminal Log tab; write ops are auto-tracked for billing
  • Removed duplicate "column-level permissions" note (kept only in Limitations)
  • Flask example: client initialization moved to module level (once at startup, not per-request), added input validation with allowlist
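
The keyset pagination change can be sketched as follows. This is an illustration under stated assumptions, not the code from the page: the table name `orders`, the cursor column `id` (assumed unique and sortable), and the function name are all hypothetical.

```python
def build_keyset_page_query(last_seen_id=None, page_size=100):
    """Build a SELECT for the page that follows `last_seen_id`.

    Hypothetical names: table "orders", cursor column "id".
    """
    # int() coercion matters here because the statement is sent as raw SQL.
    page_size = int(page_size)
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    where = ""
    if last_seen_id is not None:
        where = f' WHERE "id" > {int(last_seen_id)}'
    return f'SELECT * FROM "orders"{where} ORDER BY "id" LIMIT {page_size}'
```

The caller passes the `id` of the last row from the previous page. Rows inserted or deleted before the cursor do not shift later pages, which is the duplicate/gap problem that OFFSET has on live data.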

Latest updates (2026-04-22 – 2026-04-23):

  • Corrected SDK name, API, and result shape. The examples previously referenced a fictional package (keboola.query-service-client), class (QueryServiceClient), and execute_query(workspace_id=, query=) signature. Replaced with the real keboola-query-service package (PyPI package name uses dashes; imported as keboola_query_service per standard Python naming), Client class, and execute_query(branch_id=, workspace_id=, statements=[...]) returning list[QueryResult] with .columns (Column objects) / .data / .rows_affected attributes.
  • Replaced the workspace-manifest-file flow with the platform env vars actually set by the Data App runtime — BRANCH_ID, WORKSPACE_ID, QUERY_SERVICE_URL, KBC_TOKEN. The earlier "manifest.json / workspaceId" approach that appeared in previous drafts is no longer in the file. Updated the Data Integration summary on data-apps/index.md to match, with a link to the canonical env-vars list in keboola/data-app-python-js.
  • Added an SQL-injection warning callout at the top of "Writing Data Back" that names the limitation honestly (Query Service accepts raw SQL, no parameter binding — apps must validate every untrusted value).
  • Added a forward-looking tip in Best Practices pointing at the upcoming SQL.literal() / sql.format() helpers being developed in keboola/query-service-api-python-sdk#8 and keboola/query-service-api-js-sdk#3. The current allowlist / type-coercion pattern is positioned as the recommended interim approach until those SDKs ship.
  • Flask example hardened. The inline comment next to the UPDATE f-string now explicitly names int() coercion and the ALLOWED_STATUSES allowlist as the only reason the string interpolation is safe, with a warning not to add new form fields without analogous validation.
  • Review nits swept (2026-04-23): removed trailing space on Architecture Overview heading, normalized Step 1 list spacing, replaced the stubby single-bullet "Stick with Input Mapping when" list with three concrete cases, removed the unused from functools import lru_cache in the cache example.
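
A minimal sketch of the interim allowlist and `int()` coercion pattern described above, assuming a hypothetical `tickets` table and hypothetical status values. It only builds the statement string; per this description, the result would then be passed in `statements=[...]` to `execute_query`.

```python
# Hypothetical allowlist; a real app would list its own valid values.
ALLOWED_STATUSES = {"open", "in_progress", "closed"}


def build_status_update(record_id, new_status):
    """Return an UPDATE statement built only from validated values."""
    record_id = int(record_id)  # raises ValueError on non-numeric input
    if not isinstance(new_status, str) or new_status not in ALLOWED_STATUSES:
        raise ValueError(f"Unsupported status: {new_status!r}")
    # Interpolation is acceptable here only because record_id is an int and
    # new_status is one of the fixed literals in ALLOWED_STATUSES.
    return f"UPDATE tickets SET status = '{new_status}' WHERE id = {record_id}"
```

Any additional form field would need its own equivalent check before it is interpolated into a statement.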

Human review checklist:

  • Replace PROOF-XXX above with the real PROOF ticket number
  • Verify the listed permissions (INSERT, SELECT, UPDATE, TRUNCATE, DELETE) match the actual implementation
  • Confirm QUERY_SERVICE_URL / WORKSPACE_ID / BRANCH_ID are the exact env var names set by the Data App runtime when Storage Access is enabled
  • Confirm KBC_TOKEN is the right token for Query Service calls from a Data App with Storage Access (the data-app-python-js README lists KBC_TOKEN as "Only with Data Loader" — clarify whether Storage Access apps receive it through the same env var)
  • Verify the Query Service API docs link is live and correct
  • Decide whether to merge this PR before the SDK helper release (keboola/query-service-api-python-sdk#8, "feat(sql): add dialect-aware SQL escape helper") lands, or to wait so the docs can swap to the new helpers in the same cycle

Link to Devin session: https://app.devin.ai/sessions/736de691ac2745488c7bcae1df0a850c
Requested by: @zubednarova

@zubednarova zubednarova marked this pull request as draft April 13, 2026 05:59
@zubednarova zubednarova marked this pull request as ready for review April 15, 2026 05:41
Comment thread data-apps/storage-access/index.md
```python
import logging

# Assumes current_user, record_id, and new_status are set by the surrounding request handler.
logging.info(f"User {current_user} updated record {record_id} to status {new_status}")
```
Member

And does that log end up anywhere, or do we throw it away? :)
Is there nothing at all in our events?

Contributor

Fixed. Renamed to "Track write operations" and clarified that: (1) write operations are automatically tracked by the Query Service for billing, and (2) logging.info() output goes to stdout, which is visible in the Terminal Log tab of the Data App.
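
For reference, a minimal sketch of routing `logging.info()` output to stdout, assuming the Data App container uses the standard-library logging defaults (the runtime may configure logging differently). The logger name and helper function are hypothetical.

```python
import logging
import sys

# Without explicit configuration the root logger's level is WARNING, so
# logging.info() records are dropped, and the default handler writes to
# stderr rather than stdout.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,  # replace handlers installed earlier (Python 3.8+)
)

logger = logging.getLogger("data_app.audit")  # hypothetical logger name


def log_status_change(user, record_id, new_status):
    """Write one audit line per write operation."""
    logger.info("User %s updated record %s to status %s", user, record_id, new_status)
```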

Member

(aside) It would be best to put an example here from the library that Soustruh created/updated.
And overall we should add "best practices" here that use those libraries.
I'll do it.

Member

(aside) Let's leave this as is for now.

devin-ai-integration Bot and others added 3 commits April 16, 2026 05:49
- Add links to Query Service API docs and recommended client library
- Remove duplicate column-level permissions note (kept in Limitations)
- Add error handling for workspace manifest reading
- Fix Flask example: initialize client once at startup, add input validation
- Rename 'parameterized queries' to 'validate and sanitize input' with allowlist examples
- Replace OFFSET pagination with keyset (cursor-based) pagination
- Add generic Python cache example alongside Streamlit-specific one
- Clarify logging: mention stdout destination and Terminal Log tab
- Make workspace ID reading consistent (always from manifest with error handling)
- Remove WORKSPACE_ID env var (use manifest file consistently)
- Add Storage Access page to site navigation

Co-Authored-By: Zuzana Bednářová <zuzana.bednarova@keboola.com>
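
The generic in-memory cache mentioned in this commit could look roughly like the sketch below. It is an illustration with hypothetical names, not the snippet from the page, and it is not thread-safe.

```python
import time

_CACHE = {}  # key -> (expires_at, value)


def cached_call(key, loader, ttl_seconds=300, now=time.monotonic):
    """Return the cached value for `key`, calling `loader()` when missing or expired.

    `now` is injectable so the expiry logic can be exercised without sleeping.
    """
    current = now()
    entry = _CACHE.get(key)
    if entry is not None and entry[0] > current:
        return entry[1]
    value = loader()
    _CACHE[key] = (current + ttl_seconds, value)
    return value
```

A query wrapper would pass the SQL text (or another stable key) as `key` and a zero-argument function that runs the query as `loader`.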
…ission list

- Rename Step 2 heading to "Configure Writable Tables" and renumber list (was 1,2,3,4,2,3)
- Use consistent permission ordering (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) across the page
- Drop SELECT from the "Write capability" cell in the comparison table
- Remove unused kbcstorage dependency from both pyproject.toml snippets
- Remove unused pandas and jsonify imports from the Flask example
- Clarify that code examples are Python; same concepts apply to JavaScript
- Convert the truncation warning to {% include warning.html %} and soften the undo claim

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MiroCillik and others added 3 commits April 22, 2026 10:32
…nv vars

Previously the docs referenced a fictional package (keboola.query-service-client),
a nonexistent class (QueryServiceClient), and a non-matching execute_query()
signature (`workspace_id=`, `query=` returning a dict). The real SDK is
keboola-query-service on PyPI with a Client class and
execute_query(branch_id=, workspace_id=, statements=[...]) returning
list[QueryResult] with .columns (Column objects) and .data attributes.

Also replaces the KBC_WORKSPACE_MANIFEST_PATH manifest-file flow with the
direct env vars the Data App runtime actually sets: BRANCH_ID, WORKSPACE_ID,
QUERY_SERVICE_URL, KBC_TOKEN. The Data Integration summary on the Data Apps
overview page is updated to match, with a pointer to the canonical env vars
list in keboola/data-app-python-js.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d note

Adds a prominent warning callout at the top of the Writing Data Back
section stating plainly that the Query Service does not support
parameterized queries and the app is responsible for validating
untrusted values. Pairs with a tip callout in Best Practices that
points at the upcoming SQL escape helpers in the Python and JS SDKs.

Addresses review feedback from PR #910 that the existing allowlist /
type-coercion pattern is insufficient guidance on its own, especially
for arbitrary string input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Expand "Stick with Input Mapping when" from one bullet to three
- Remove trailing space on Architecture Overview heading
- Fix Step 1 numbering (double-space before list items)
- Flask example: expand validation comment to explicitly name the
  allowlist + int() coercion as the reason the f-string is safe, and
  warn against adding new form fields without the same guard
- Remove unused `from functools import lru_cache` in cache example

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MiroCillik MiroCillik requested a review from odinuv April 24, 2026 08:57
Switches all code examples to read the workspace ID from the manifest
file at KBC_WORKSPACE_MANIFEST_PATH (which contains workspaceId plus
other workspace metadata) instead of the WORKSPACE_ID env var. This
matches the recommended pattern from the Data App runtime — the env
var is still set, but the manifest is the canonical source.

- Adds KBC_WORKSPACE_MANIFEST_PATH row to the env vars table and notes
  WORKSPACE_ID is still available but manifest is preferred
- Updates all four example snippets (Using the Client, env vars section
  example, Flask app, and Best Practices #1) to read the manifest with
  proper (KeyError, FileNotFoundError) error handling
- Mentions KBC_WORKSPACE_MANIFEST_PATH in the Data Apps overview page
  env vars summary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
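
A minimal sketch of the manifest read this commit describes, assuming the manifest is a JSON object with a top-level `workspaceId` key (the file format is an assumption; the commit only names the key). The function name is hypothetical.

```python
import json
import os


def read_workspace_id(environ=os.environ):
    """Return workspaceId from the file at KBC_WORKSPACE_MANIFEST_PATH.

    Assumes a JSON manifest with a top-level "workspaceId" key.
    """
    try:
        path = environ["KBC_WORKSPACE_MANIFEST_PATH"]
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)["workspaceId"]
    except (KeyError, FileNotFoundError) as error:
        # KeyError covers both a missing env var and a missing manifest key.
        raise RuntimeError(f"Cannot read workspace ID from manifest: {error!r}") from error
```

Malformed JSON is not caught here and would propagate as `json.JSONDecodeError`.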
3 participants