feat(waterdata): add get_waterdata for generalized CQL2 queries #284
Draft
thodson-usgs wants to merge 2 commits into
Python analogue of R ``dataRetrieval::read_waterdata``. The typed
``get_*`` wrappers (``get_daily``, ``get_continuous``, …) only support
exact-equality predicates on whitelisted parameters. Some users need
more — top-level ``or``, ``like`` with ``%`` wildcards, comparison
operators, nested boolean trees — and today have no surface for it.
``get_waterdata(service, cql, ...)`` accepts a raw CQL2 query
(``dict`` or pre-serialized JSON string) and POSTs it against any
recognized collection, then walks pages and post-processes the
result with the same pipeline the typed wrappers use.
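The page-walking step can be illustrated offline. The sketch below is not the package's ``_walk_pages``. It assumes (hypothetically) that each page is a GeoJSON-like dict with a ``features`` list and a ``links`` list whose ``rel: next`` entry points at the following page, and it replaces the HTTP call with an in-memory ``fetch`` function. All helper names here are invented for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def next_link(page: dict) -> Optional[str]:
    """Return the href of the rel="next" link, or None on the last page."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def walk_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every feature, following rel="next" links until none remain."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get("features", [])
        url = next_link(page)


def make_fake_server(n_rows: int, limit: int) -> Tuple[Callable[[str], dict], List[str]]:
    """Hypothetical stand-in for the HTTP layer; URLs look like 'offset=<k>'."""
    calls: List[str] = []

    def fetch(url: str) -> dict:
        calls.append(url)  # record each simulated request
        offset = int(url.split("=")[1])
        rows = [{"id": i} for i in range(offset, min(offset + limit, n_rows))]
        links = []
        if offset + limit < n_rows:
            links.append({"rel": "next", "href": f"offset={offset + limit}"})
        return {"features": rows, "links": links}

    return fetch, calls


fetch, calls = make_fake_server(n_rows=140, limit=5)
features = list(walk_pages("offset=0", fetch))
print(len(features), "rows from", len(calls), "pages")
```

With 140 rows and a page size of 5, the loop issues 28 simulated requests before the ``next`` link disappears.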
Reuses existing infrastructure: ``_walk_pages``, ``_deal_with_empty``,
``_arrange_cols``, ``_type_cols``, ``_sort_rows``, and
``_switch_properties_id``. The new pieces are:
- ``_OUTPUT_ID_BY_SERVICE`` (utils.py) — a single mapping from
service name to the renamed-``id`` column the rest of the package
exposes, hoisted from the typed wrappers so the generalized entry
point can pick the right one.
- ``_construct_cql_request`` (utils.py) — focused POST/CQL2 request
builder; distinct from ``_construct_api_requests`` because the
body comes in verbatim rather than being derived from typed
kwargs.
- ``get_waterdata`` (api.py) — public entry point.
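A rough, standard-library-only sketch of how the first two pieces might fit together follows. It is not the code in utils.py: the base URL, the ``Content-Type`` value, the second mapping entry, and the function names ``construct_cql_request`` and ``rename_id`` are assumptions made for illustration. Only the ``daily`` → ``daily_id`` pairing and the dict-or-string handling of the body come from the PR text.

```python
import json
from typing import Union

# Assumed base URL; the real constant lives in the package.
BASE_URL = "https://api.waterdata.usgs.gov/ogcapi/v0"

# Service name -> renamed "id" column.
OUTPUT_ID_BY_SERVICE = {
    "daily": "daily_id",  # stated in the PR description
    "monitoring-locations": "monitoring_location_id",  # assumed for illustration
}


def construct_cql_request(service: str, cql: Union[dict, str]) -> dict:
    """Describe a POST request whose body is the CQL2 query, taken verbatim."""
    if service not in OUTPUT_ID_BY_SERVICE:
        raise ValueError(f"Unrecognized service: {service!r}")
    # A dict is serialized; a string is assumed to be JSON already.
    body = cql if isinstance(cql, str) else json.dumps(cql)
    return {
        "method": "POST",
        "url": f"{BASE_URL}/collections/{service}/items",
        "headers": {"Content-Type": "application/query-cql-json"},  # assumed
        "body": body,
    }


def rename_id(rows: list, service: str) -> list:
    """Rename the generic "id" key to the service-specific column name."""
    output_id = OUTPUT_ID_BY_SERVICE[service]
    return [
        {(output_id if key == "id" else key): value for key, value in row.items()}
        for row in rows
    ]


cql = {"op": "=", "args": [{"property": "parameter_code"}, "00060"]}
req = construct_cql_request("daily", cql)
print(req["url"])
print(rename_id([{"id": "abc", "value": 1.0}], "daily"))
```

Keeping the mapping in one place means both the validity check and the column rename read from the same source.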
CQL2 grammar reference:
https://api.waterdata.usgs.gov/docs/ogcapi/complex-queries/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-review pass on PR DOI-USGS#284.
- Lift ``WATERDATA_SERVICES`` Literal into ``types.py``. Use it as the ``service`` arg type of ``get_waterdata`` so editors offer completion and type-checkers catch typos. The runtime source of truth (``_OUTPUT_ID_BY_SERVICE`` in utils.py) is unchanged; the Literal is kept in sync by hand and a comment notes that.
- Extract ``_ogc_query_params(properties, bbox, limit, skip_geometry)`` in utils.py. The same ``skipGeometry``/``limit``/``bbox``/``properties`` block previously appeared twice (once in ``_construct_api_requests`` and once in the new ``_construct_cql_request``) and is now built in one place.
- Extract ``_finalize_ogc_frame(df, response, properties, service, output_id, convert_type)`` for the post-processing tail (``_deal_with_empty`` -> ``_type_cols`` -> ``_arrange_cols`` -> ``_sort_rows`` -> ``BaseMetadata``). Both ``get_ogc_data`` and ``get_waterdata`` route through it now, so the typed-kwargs and raw-CQL2 paths produce identically-shaped DataFrames by construction rather than by parallel maintenance.
- Drop the ``client`` kwarg from ``get_waterdata``. None of the other public ``get_*`` getters expose it, and the rationale (HTTP session reuse) applies to all of them or none. If we want to expose session reuse, that's a separate PR that touches the whole family.
- Collapse the ``properties`` normalization block to None-first ordering so the common case (no properties) reads first.
- Drop the docstring breadcrumb to ``utils._OUTPUT_ID_BY_SERVICE``; point readers at ``types.WATERDATA_SERVICES`` (the user-facing Literal) instead.
All 148 unit tests pass; ``_construct_api_requests`` and ``_construct_cql_request`` produce byte-identical requests to before.
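To make the refactor concrete, here is a hedged sketch of a shared query-parameter helper, a None-first ``properties`` normalizer, and a check that a hand-maintained Literal has not drifted from the runtime mapping. The Literal members, the mapping values, the value formatting, and the helper names are assumptions for illustration and do not reproduce the code in ``types.py`` or utils.py. Only the four query-parameter names come from the commit message.

```python
from typing import Dict, List, Literal, Optional, Sequence, Union, get_args

# Hypothetical subset of the service Literal and the runtime mapping.
WATERDATA_SERVICES = Literal["daily", "continuous", "peaks"]
OUTPUT_ID_BY_SERVICE = {
    "daily": "daily_id",
    "continuous": "continuous_id",  # assumed
    "peaks": "peak_id",  # assumed
}


def literal_in_sync() -> bool:
    """True when the Literal and the mapping list the same services."""
    return set(get_args(WATERDATA_SERVICES)) == set(OUTPUT_ID_BY_SERVICE)


def normalize_properties(
    properties: Optional[Union[str, Sequence[str]]],
) -> Optional[List[str]]:
    """None-first ordering: the common case (no properties) reads first."""
    if properties is None:
        return None
    if isinstance(properties, str):
        return [properties]
    return list(properties)


def ogc_query_params(
    properties: Optional[Sequence[str]] = None,
    bbox: Optional[Sequence[float]] = None,
    limit: Optional[int] = None,
    skip_geometry: bool = False,
) -> Dict[str, str]:
    """Build the query-string block shared by both request builders."""
    params: Dict[str, str] = {}
    if skip_geometry:
        params["skipGeometry"] = "true"
    if limit is not None:
        params["limit"] = str(limit)
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if properties:
        params["properties"] = ",".join(properties)
    return params


print(literal_in_sync())
print(ogc_query_params(properties=["time", "value"], limit=5, skip_geometry=True))
```

A check like ``literal_in_sync`` could run in the test suite, so the "kept in sync by hand" contract fails loudly rather than silently.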
do we need to shield this against string comparisons as we do in filters.py?
Summary
Adds ``get_waterdata(service, cql, ...)``, a Python analogue of R ``dataRetrieval::read_waterdata``. The typed wrappers (``get_daily``, ``get_continuous``, ``get_peaks``, …) only support exact-equality predicates on whitelisted parameters. Some users need more expressive queries:
- ``or`` instead of just ``and``
- ``like`` with ``%`` wildcards (e.g. HUC prefix match)
- comparison operators (``<``, ``>``, ``between``)
This function gives them a single entry point that accepts a raw CQL2 query (either a Python ``dict`` or a pre-serialized JSON string), POSTs it against any recognized OGC collection, walks pages, and runs the same post-processing pipeline (``_deal_with_empty`` → ``_type_cols`` → ``_arrange_cols`` → ``_sort_rows``) the typed wrappers use.
CQL2 grammar reference: https://api.waterdata.usgs.gov/docs/ogcapi/complex-queries/
Examples
Both examples mirror the R reference.
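For orientation, a CQL2 JSON body for the kinds of query listed in the summary might be built as follows. This is a sketch under stated assumptions, not the PR's own example: the property names (``monitoring_location_id``, ``parameter_code``, ``hydrologic_unit_code``), the site and HUC values, and the ``prop`` helper are illustrative. The ``op``/``args``/``property`` structure follows the CQL2 JSON encoding.

```python
import json


def prop(name: str) -> dict:
    """CQL2 JSON reference to a property (hypothetical convenience helper)."""
    return {"property": name}


# Top-level `or` across two sites (illustrative site IDs).
either_site = {
    "op": "or",
    "args": [
        {"op": "=", "args": [prop("monitoring_location_id"), "USGS-01646500"]},
        {"op": "=", "args": [prop("monitoring_location_id"), "USGS-02238500"]},
    ],
}

# `like` with a `%` wildcard, e.g. a HUC prefix match (assumed property name).
huc_prefix = {"op": "like", "args": [prop("hydrologic_unit_code"), "0207%"]}

# Nested boolean tree: (site A or site B) and parameter_code = "00060".
query = {
    "op": "and",
    "args": [
        either_site,
        {"op": "=", "args": [prop("parameter_code"), "00060"]},
    ],
}

# Either form is described as acceptable: the dict, or its serialized string.
body = json.dumps(query)
print(body)

# Hypothetical calls; they need the package and network access, so they are
# left commented out here.
# get_waterdata("daily", query)
# get_waterdata("daily", body)
```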
API
- ``service`` is validated against the recognized OGC collections (``daily``, ``continuous``, ``latest-*``, ``peaks``, ``field-measurements*``, ``channel-measurements``, ``monitoring-locations``, ``time-series-metadata``, ``combined-metadata``).
- ``cql`` accepts ``dict`` (JSON-serialized internally) or ``str`` (passed through verbatim).
- ``properties`` honors the same ``"id"`` → ``output_id`` rewrite the typed wrappers do.
- ``client`` lets callers reuse an HTTP session.
What's reused vs. new
Reused unchanged: ``_walk_pages``, ``_deal_with_empty``, ``_arrange_cols``, ``_type_cols``, ``_sort_rows``, ``_switch_properties_id``, ``_default_headers``, ``BaseMetadata``. The whole post-processing pipeline drops in.
New:
- ``_OUTPUT_ID_BY_SERVICE`` (utils.py): single mapping from service to the renamed-``id`` column. Hoisted from the typed wrappers so the generalized entry point picks the right one.
- ``_construct_cql_request`` (utils.py): focused POST/CQL2 request builder. Kept separate from ``_construct_api_requests`` because that function derives the CQL body from typed kwargs; here the body comes in verbatim.
- ``get_waterdata`` (api.py): the public entry point.
Smoke test
The POST goes out, the OGC server filters by the CQL body, pagination handles the multi-page response, and post-processing renames ``id`` → ``daily_id``, types columns, and sorts rows. (140 rows came back from a single site with a small ``limit=5`` page size; pagination cycled through ~28 pages.)
Out of scope
- […] ``tests/waterdata_test.py`` patterns; can follow up).
- […] ``include_hash_ids`` parameter would apply uniformly to ``get_waterdata`` once it lands (no extra plumbing needed).
Test plan
- ``_construct_cql_request`` builds the right URL, headers, and body offline
- […] ``in`` predicate
- […] ``like`` example (rate-limited at time of submission)
- […] ``requests_mock``
🤖 Generated with Claude Code