feat(waterdata): add get_waterdata for generalized CQL2 queries by thodson-usgs · Pull Request #284 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-05-19T00:23:46Z

Summary

Adds get_waterdata(service, cql, ...), a Python analogue of R's dataRetrieval::read_waterdata. The typed wrappers (get_daily, get_continuous, get_peaks, …) support only exact-equality predicates on whitelisted parameters. Some users need more expressive queries:

a top-level or (not just and)
like with % wildcards (e.g., a HUC prefix match)
comparison operators (<, >, between)
nested boolean trees
geometry predicates beyond a bounding box
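All of the operators listed above share one CQL2-JSON shape, {"op": ..., "args": [...]}. As a minimal sketch, two tiny helpers (hypothetical, not part of dataretrieval) can assemble nested trees as plain dicts that could then be passed as cql=... to get_waterdata:

```python
import json
from typing import Any


def prop(name: str) -> dict[str, str]:
    """Reference a queryable property (hypothetical helper)."""
    return {"property": name}


def op(name: str, *args: Any) -> dict[str, Any]:
    """Build any CQL2-JSON operator node, e.g. and, or, like, =, <, between."""
    return {"op": name, "args": list(args)}


# Sites whose ID starts with "USGS-0736" AND whose parameter code is either
# discharge (00060) or gage height (00065): LIKE plus a nested OR.
query = op(
    "and",
    op("like", prop("monitoring_location_id"), "USGS-0736%"),
    op(
        "or",
        op("=", prop("parameter_code"), "00060"),
        op("=", prop("parameter_code"), "00065"),
    ),
)

print(json.dumps(query))
```

Comparison nodes such as op("<", prop(...), value) use the same shape; which operators and property types the server actually accepts should be checked against the complex-queries page.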

This function gives them a single entry point that accepts a raw CQL2 query (either a Python dict or a pre-serialized JSON string), POSTs it against any recognized OGC collection, walks pages, and runs the same post-processing pipeline (_deal_with_empty → _type_cols → _arrange_cols → _sort_rows) the typed wrappers use.

CQL2 grammar reference: https://api.waterdata.usgs.gov/docs/ogcapi/complex-queries/

Examples

from dataretrieval import waterdata

# 1. Daily values for two parameter codes at two sites — compound AND-of-INs.
cql = {
    "op": "and",
    "args": [
        {"op": "in", "args": [
            {"property": "parameter_code"},
            ["00060", "00065"],
        ]},
        {"op": "in", "args": [
            {"property": "monitoring_location_id"},
            ["USGS-07367300", "USGS-03277200"],
        ]},
    ],
}
df, md = waterdata.get_waterdata(service="daily", cql=cql)

# 2. Monitoring locations whose HUC starts with "02070010" — LIKE with %.
df, md = waterdata.get_waterdata(
    service="monitoring-locations",
    cql='{"op": "like", "args": ['
        '{"property": "hydrologic_unit_code"}, "02070010%"]}',
)

Both examples mirror those in the R dataRetrieval::read_waterdata reference.
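Because cql accepts either a dict or a pre-serialized string, the two forms are interchangeable. A quick offline check (no API call) that the concatenated string in example 2 parses to the same tree as an equivalent dict:

```python
import json

# Example 2's cql, written as adjacent string literals (implicitly concatenated).
cql_str = (
    '{"op": "like", "args": ['
    '{"property": "hydrologic_unit_code"}, "02070010%"]}'
)

# The same query expressed as a dict.
cql_dict = {
    "op": "like",
    "args": [{"property": "hydrologic_unit_code"}, "02070010%"],
}

# Parsing the string yields the dict; serializing the dict yields valid JSON.
assert json.loads(cql_str) == cql_dict
print(json.dumps(cql_dict))
```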

API

def get_waterdata(
    service: str,
    cql: str | dict,
    *,
    properties: str | Iterable[str] | None = None,
    bbox: list[float] | None = None,
    limit: int | None = None,
    skip_geometry: bool | None = None,
    convert_type: bool = True,
    client: requests.Session | None = None,
) -> tuple[pd.DataFrame, BaseMetadata]:

service: validated against the recognized OGC collections (daily, continuous, latest-*, peaks, field-measurements*, channel-measurements, monitoring-locations, time-series-metadata, combined-metadata).
cql: a dict (JSON-serialized internally) or a str (passed through verbatim).
properties: honors the same "id" → output_id rewrite that the typed wrappers apply.
client: lets callers reuse an HTTP session.
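The cql and properties handling above can be sketched offline. The names normalize_cql, normalize_properties, and OUTPUT_ID_BY_SERVICE are hypothetical stand-ins rather than the package's private helpers; only the daily → daily_id mapping comes from this PR, and the sketch assumes the rewrite means a caller asking for the renamed column gets "id" sent to the API:

```python
from __future__ import annotations

import json
from collections.abc import Iterable

# Hypothetical, partial mapping for illustration; "daily" -> "daily_id"
# matches the smoke test, other services are omitted here.
OUTPUT_ID_BY_SERVICE = {"daily": "daily_id"}


def normalize_cql(cql: str | dict) -> str:
    """dict -> JSON text; str -> passed through verbatim."""
    if isinstance(cql, dict):
        return json.dumps(cql)
    if isinstance(cql, str):
        return cql
    raise TypeError(f"cql must be a str or dict, got {type(cql).__name__}")


def normalize_properties(
    service: str, properties: str | Iterable[str] | None
) -> list[str] | None:
    """Accept one name or many; map the renamed ID column back to the API's "id"."""
    if properties is None:
        return None
    names = [properties] if isinstance(properties, str) else list(properties)
    output_id = OUTPUT_ID_BY_SERVICE.get(service)
    return ["id" if name == output_id else name for name in names]
```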

What's reused vs. new

Reused unchanged: _walk_pages, _deal_with_empty, _arrange_cols, _type_cols, _sort_rows, _switch_properties_id, _default_headers, BaseMetadata. The entire post-processing pipeline is reused as-is.

New:

_OUTPUT_ID_BY_SERVICE (utils.py): single mapping from service to the renamed-id column. Hoisted from the typed wrappers so the generalized entry point picks the right one.
_construct_cql_request (utils.py): focused POST/CQL2 request builder. Kept separate from _construct_api_requests because that function derives the CQL body from typed kwargs; here the body comes in verbatim.
get_waterdata (api.py): the public entry point.
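A minimal sketch of what a POST/CQL2 request builder has to produce, written with the standard library so it can be checked offline. The function name build_cql_request, the Content-Type value, and the bbox/properties encoding are assumptions for illustration; only the base URL and the skipGeometry/limit query string come from this PR's smoke test:

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlencode

# Base URL as it appears in the smoke-test output in this PR.
BASE_URL = "https://api.waterdata.usgs.gov/ogcapi/v0/collections"


def build_cql_request(
    service: str,
    cql: str | dict,
    *,
    limit: int | None = None,
    skip_geometry: bool = False,
    bbox: list[float] | None = None,
    properties: list[str] | None = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a CQL2-JSON body (hypothetical)."""
    params: dict[str, object] = {"skipGeometry": skip_geometry}
    if limit is not None:
        params["limit"] = limit
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if properties:
        params["properties"] = ",".join(properties)
    # dict -> JSON text; a str body is sent verbatim.
    body = json.dumps(cql) if isinstance(cql, dict) else cql
    return urllib.request.Request(
        f"{BASE_URL}/{service}/items?{urlencode(params, safe=',')}",
        data=body.encode("utf-8"),
        # Assumed media type for a CQL2-JSON query body.
        headers={"Content-Type": "application/query-cql-json"},
        method="POST",
    )
```

Keeping the query-string assembly separate from the body means the same parameters can be shared with a typed-kwargs builder, while the CQL body stays opaque.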

Smoke test

>>> df, md = get_waterdata(service="daily",
...     cql='{"op":"in","args":[{"property":"monitoring_location_id"},["USGS-02238500"]]}',
...     limit=5)
>>> md.url
'https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily/items?skipGeometry=False&limit=5'
>>> df.shape
(140, 12)

The POST is sent, the OGC server filters by the CQL body, pagination handles the multi-page response, and post-processing renames id → daily_id, types the columns, and sorts the rows. (A single site returned 140 rows; with the small page size of limit=5, pagination cycled through about 28 pages.)
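The page count follows from simple arithmetic: 140 rows at 5 rows per page is 28 requests if every page is full. A minimal sketch of a next-link page walker, run against a fake in-memory server, reproduces that count. Both walk_pages and fake_fetch are hypothetical; the package's _walk_pages may differ in details such as how it re-sends the POST body:

```python
from __future__ import annotations

from collections.abc import Callable


def walk_pages(first_url: str, fetch: Callable[[str], dict]) -> tuple[list[dict], int]:
    """Follow rel="next" links until none is left; return features and page count."""
    features: list[dict] = []
    url: str | None = first_url
    pages = 0
    while url is not None:
        page = fetch(url)
        pages += 1
        features.extend(page.get("features", []))
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
    return features, pages


# Fake server: 140 rows served 5 at a time, mirroring the smoke-test numbers.
ROWS, LIMIT = 140, 5


def fake_fetch(url: str) -> dict:
    offset = int(url.rsplit("offset=", 1)[1])
    chunk = [{"id": i} for i in range(offset, min(offset + LIMIT, ROWS))]
    links = []
    if offset + LIMIT < ROWS:
        links.append({"rel": "next", "href": f"fake://items?offset={offset + LIMIT}"})
    return {"features": chunk, "links": links}


features, pages = walk_pages("fake://items?offset=0", fake_fetch)
print(len(features), pages)  # prints: 140 28
```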

Out of scope

Unit tests for the new function (would mirror existing tests/waterdata_test.py patterns; can follow up).
Dropping hash-valued ID columns by default, from #281 (feat(waterdata): drop hash-valued ID columns by default): once that PR lands, its include_hash_ids parameter would apply uniformly to get_waterdata with no extra plumbing.

Test plan

Module import + signature
_construct_cql_request builds the right URL, headers, and body offline
Live smoke test against the USGS OGC API with a simple CQL in predicate
Live test with the wildcard like example (rate-limited at time of submission)
Unit tests via requests_mock

🤖 Generated with Claude Code

Python analogue of R ``dataRetrieval::read_waterdata``.

The typed ``get_*`` wrappers (``get_daily``, ``get_continuous``, …) only support exact-equality predicates on whitelisted parameters. Some users need more (top-level ``or``, ``like`` with ``%`` wildcards, comparison operators, nested boolean trees) and today have no surface for it. ``get_waterdata(service, cql, ...)`` accepts a raw CQL2 query (``dict`` or pre-serialized JSON string), POSTs it against any recognized collection, then walks pages and post-processes the result with the same pipeline the typed wrappers use.

Reuses existing infrastructure: ``_walk_pages``, ``_deal_with_empty``, ``_arrange_cols``, ``_type_cols``, ``_sort_rows``, and ``_switch_properties_id``. The new pieces are:

- ``_OUTPUT_ID_BY_SERVICE`` (utils.py): a single mapping from service name to the renamed-``id`` column the rest of the package exposes, hoisted from the typed wrappers so the generalized entry point can pick the right one.
- ``_construct_cql_request`` (utils.py): focused POST/CQL2 request builder; distinct from ``_construct_api_requests`` because the body comes in verbatim rather than being derived from typed kwargs.
- ``get_waterdata`` (api.py): public entry point.

CQL2 grammar reference: https://api.waterdata.usgs.gov/docs/ogcapi/complex-queries/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Code-review pass on PR DOI-USGS#284.

- Lift ``WATERDATA_SERVICES`` Literal into ``types.py``. Use it as the ``service`` arg type of ``get_waterdata`` so editors offer completion and type-checkers catch typos. The runtime source of truth (``_OUTPUT_ID_BY_SERVICE`` in utils.py) is unchanged; the Literal is kept in sync by hand and a comment notes that.
- Extract ``_ogc_query_params(properties, bbox, limit, skip_geometry)`` in utils.py. The same ``skipGeometry``/``limit``/``bbox``/``properties`` block previously appeared twice, once in ``_construct_api_requests`` and once in the new ``_construct_cql_request``, and is now built in one place.
- Extract ``_finalize_ogc_frame(df, response, properties, service, output_id, convert_type)`` for the post-processing tail (``_deal_with_empty`` -> ``_type_cols`` -> ``_arrange_cols`` -> ``_sort_rows`` -> ``BaseMetadata``). Both ``get_ogc_data`` and ``get_waterdata`` now route through it, so the typed-kwargs and raw-CQL2 paths produce identically shaped DataFrames by construction rather than by parallel maintenance.
- Drop the ``client`` kwarg from ``get_waterdata``. None of the other public ``get_*`` getters expose it, and the rationale (HTTP session reuse) applies to all of them or none. If we want to expose session reuse, that's a separate PR that touches the whole family.
- Collapse the ``properties`` normalization block to None-first ordering so the common case (no properties) reads first.
- Drop the docstring breadcrumb to ``utils._OUTPUT_ID_BY_SERVICE``; point readers at ``types.WATERDATA_SERVICES`` (the user-facing Literal) instead.

All 148 unit tests pass; ``_construct_api_requests`` and ``_construct_cql_request`` produce byte-identical requests to before.

thodson-usgs · 2026-05-19T00:52:00Z

do we need to shield this against string comparisons as we do in filters.py?

thodson-usgs and others added 2 commits May 18, 2026 19:23

thodson-usgs wants to merge 2 commits into DOI-USGS:main from thodson-usgs:worktree-get-waterdata-cql