v0.3.0 #6
svdC1 announced in Announcements
Added
- scrape_do.async_api sub-package: ScrapeDoAsyncAPIClient (backed by httpx.Client) and AsyncScrapeDoAsyncAPIClient (backed by httpx.AsyncClient) covering the full q.scrape.do surface: create_job, get_job, list_jobs, get_task, cancel_job, get_user_info, plus polling helpers wait_for_job and submit_and_wait. Typed status-code error routing with automatic retries on transient gateway errors (429/502/503/504) and per-request r_timeout/extensions escape hatches.
- Polling configuration: PollingStrategy (configurable exponential backoff with jitter, attempt count, and wall-clock budgets) and the PollingFunction type alias for fully custom cadences. Both share the same (attempt, elapsed, job) -> float signature, so wait_for_job accepts either interchangeably.
- SDK-native event hooks for the Async API: AsyncAPIEventHooks (sync) and AsyncAPIAsyncEventHooks (async). Lifecycle covers request/response/retry/poll; the poll hook receives a parsed JobDetails snapshot on every non-terminal polling iteration.
- scrape_do.plugins sub-package: typed *Parameters models for the Amazon and Google plugin gateways with cross-field validation. Companion *AsyncPlugin adapters under scrape_do.async_api.models.plugins plug into JobCreationRequest.plugin via a discriminated union. Every adapter (and the AsyncPlugin union itself) is also re-exported from scrape_do.async_api, so the typical import pattern is two lines: from scrape_do.async_api import AsyncScrapeDoAsyncAPIClient, AmazonPdpAsyncPlugin and from scrape_do.plugins import AmazonPdpParameters. Also adds public Google localization constants.
- Typed Async-API exception hierarchy: AsyncAPIError (base) and per-status-code subclasses, AsyncAPIUnparsableResponseError for 2xx bodies the SDK can't parse, JobFailedError/JobCanceledError/TaskFailedError/TaskCanceledError for terminal lifecycle states, and JobTimeoutError for exhausted polling budgets. AsyncScrapeDoErrorMessage parses the gateway's {Error, Code} envelope.
- ScrapeDoJSONErrorMessage: pydantic model for the structured JSON error envelope returned by the synchronous gateway. Exposes status_code/messages/url/possible_causes/error_type/error_code/contact, plus an is_auth_throttle property for detecting the auth-throttle case.
- ScrapeDoResponse ergonomics: __repr__/__str__ for REPL inspection, to_dict() and to_json(**kwargs) for serialization, and a fixed json(raw_response=False) that extracts the content key from the Scrape.do JSON envelope when present.
- scrape_do.models.validators: public helpers for parameter cross-validation (check_geo_code, check_postal_code, check_geo_exclusion, screenshot / return-json / play-with-browser dependency rules, etc.) usable standalone without instantiating a parameters model.

Changed
- APIResponseError now uses ScrapeDoJSONErrorMessage.try_from_response for body parsing instead of the legacy key-list lookup (detail, Error, errorMessage, message, Message). Error messages are richer, and the "Unknown API Error" fallback prints status and body on separate lines.
- Added typing_extensions>=4.0 as a direct runtime dependency.

Fixed
- ScrapeDoFrame.url/ScrapeDoNetworkRequest.url relaxed from HttpUrl to str. Real-world iframes and network requests produce technically valid but quirky URLs (e.g., ?feature=oembed?wmode=transparent) that pydantic-core's URL parser rejected, which blew up the whole response parse.
- ScrapeDoResponse.cookies regex no longer captures structural whitespace after ; separators. Second-and-later cookie names previously came back with a phantom leading space.
- ScrapeDoResponse constructor no longer crashes with JSONDecodeError when Scrape.do returns HTML instead of JSON under returnJSON=true; the failure is now properly routed through is_proxy_error.
- RequestParameters.to_proxy_url now double-encodes the param string so values with URL-reserved characters (notably the JSON-string playWithBrowser payload) survive httpx's transparent decode of the proxy password during Basic auth header construction.
- Python 3.9/3.10 compatibility restored. Source files importing Self/Unpack/TypeAlias from typing (only available in 3.11+/3.10+) now use typing_extensions. Previously the package raised ImportError at import time on 3.9/3.10 despite the trove classifiers claiming support.

Internal
- New scrape_do.async_api and scrape_do.plugins sub-package layout. Async-API helpers (_raise_for_status, _parse_response, _build_job_creation_request) live as module-level functions in scrape_do.async_api.client and are shared by both client classes.
- New unit tests for scrape_do.async_api and models/response.py.
- Integration coverage expanded from 22 → ~120 tests across the Sync API, Proxy Mode, and Async API surfaces. The new tests/integration/async_api/ suite exercises every endpoint, both client classes, polling helpers, event hooks, the render envelope, a live PlayWithBrowser action sequence, the typed-exception hierarchy, and 12 of the 15 *AsyncPlugin variants. The remaining three (google/trends, walmart/store, lowes/store) are unit-only; they hit upstream- or engine-side failures regardless of input.
- Integration logging pipeline formalized around pytest.hookimpl-decorated setup / makereport / teardown hooks with per-test tokens stashed on item.stash; _validate_and_log_error_state consolidated into a response_trace fixture.
- Unit test fixtures consolidated; new shared tests/unit/async_api/conftest.py for the Async-API unit suite plus tests/integration/async_api/conftest.py exposing live client fixtures, a tight fast_polling_strategy, best-effort cancel helpers, and a type-dispatched async_api_response_trace.
- CI matrix expanded to Python 3.9/3.10/3.11/3.12/3.13 (fail-fast: false); lint job (ruff + mypy) split out and pinned to 3.13.

Full Changelog: v0.2.0...v0.3.0
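
The polling configuration listed under Added describes a shared (attempt, elapsed, job) -> float signature. As a rough illustration of what a custom cadence with that shape could look like, here is a standalone sketch of exponential backoff with jitter. It does not import the SDK: make_backoff, its parameters, the PollingFn alias, and the assumption that attempt is 0-based are all hypothetical, and the real PollingFunction contract may differ.

```python
import random
from typing import Any, Callable, Optional

# Hypothetical alias mirroring the (attempt, elapsed, job) -> float shape
# from the release notes; the SDK's own PollingFunction may differ.
PollingFn = Callable[[int, float, Any], float]


def make_backoff(
    base: float = 0.5,
    factor: float = 2.0,
    cap: float = 10.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> PollingFn:
    """Build a delay function: exponential growth, capped, with +/- jitter."""
    rng = rng or random.Random()

    def next_delay(attempt: int, elapsed: float, job: Any) -> float:
        # Assumes attempt is 0-based; elapsed and job are accepted only to
        # match the signature and are ignored by this simple cadence.
        raw = min(cap, base * factor ** attempt)
        spread = raw * jitter
        return max(0.0, raw + rng.uniform(-spread, spread))

    return next_delay


if __name__ == "__main__":
    delay = make_backoff(rng=random.Random(0))
    for attempt in range(5):
        print(attempt, round(delay(attempt, 0.0, None), 3))
```

If the SDK's contract matches this shape, a function built this way could presumably be handed to wait_for_job in place of a PollingStrategy; check the package documentation for the exact parameter name before relying on that.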
This discussion was created from the release v0.3.0.
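
The to_proxy_url fix listed under Fixed relies on a general property of percent-encoding: a string encoded twice still has no reserved characters after a single decode. The sketch below shows that property using only the standard library. It is not the SDK's implementation; double_encode and the sample payload are hypothetical, chosen only to resemble a JSON-string value.

```python
import json
from urllib.parse import quote, unquote


def double_encode(param_string: str) -> str:
    """Percent-encode twice so one transparent decode leaves it encoded."""
    once = quote(param_string, safe="")
    return quote(once, safe="")


# Illustrative payload only; the real playWithBrowser schema may differ.
payload = json.dumps([{"Action": "Click", "Selector": "#submit"}])
param_string = "playWithBrowser=" + payload

encoded = double_encode(param_string)

# A single decode (as an HTTP client might apply to a proxy password)
# yields the singly encoded form, which is still free of reserved characters.
after_one_decode = unquote(encoded)
assert after_one_decode == quote(param_string, safe="")
assert not any(ch in after_one_decode for ch in '{}[]":#= ')

# A second decode on the receiving side recovers the original string.
assert unquote(after_one_decode) == param_string
```

With a single round of encoding, the same transparent decode would expose the raw braces, quotes, and `#`, which is the failure mode the changelog entry describes.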