feat: add retry/resilience middleware with exponential backoff #1171
Shiven0504 wants to merge 1 commit into qdrant:master
Conversation
Add configurable automatic retries for transient failures on both REST and gRPC interfaces. Disabled by default (retry_config=None).
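The exponential backoff with jitter described here can be sketched roughly as follows. The function and parameter names below are illustrative, not the PR's actual API:

```python
import random

def compute_backoff(attempt: int, initial: float = 0.1, multiplier: float = 2.0,
                    max_backoff: float = 30.0, jitter: float = 0.1) -> float:
    """Exponential backoff with proportional jitter, capped at max_backoff."""
    base = min(initial * (multiplier ** attempt), max_backoff)
    # Add up to jitter * base of random extra delay to spread out retries.
    return base + base * jitter * random.random()
```

With these defaults, attempts 0 through 3 wait roughly 0.1 s, 0.2 s, 0.4 s, and 0.8 s plus jitter; with `retry_config=None` no retries happen at all.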
📝 Walkthrough: This pull request introduces a comprehensive retry mechanism for the qdrant-client library. Estimated code review effort: 🎯 4 (Complex), ⏱️ ~60 minutes.
🚥 Pre-merge checks: 2 passed, 1 failed (warning).
Actionable comments posted: 7
🧹 Nitpick comments (1)
qdrant_client/qdrant_remote.py (1)
66-70: `retry_config` doesn't cover the upload helpers yet.
`upload_points()` and `upload_collection()` later delegate to `_upload_collection()`, which still uses the legacy `max_retries` path and never reads `self._retry_config`. That gives callers different retry semantics depending on which public API they use. Either thread the new config into the uploader classes or document that uploads remain on separate retry knobs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qdrant_client/qdrant_remote.py` around lines 66 - 70, The retry_config passed into the constructor is not applied to upload helpers: update upload_points and upload_collection to pass self._retry_config into the uploader flow (or make _upload_collection read self._retry_config) so they no longer fall back to the legacy max_retries path; locate the public methods upload_points and upload_collection and the internal _upload_collection and ensure either the uploader classes/constructors receive the RetryConfig instance or change _upload_collection to prefer self._retry_config over max_retries when creating/using the uploader.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@qdrant_client/async_qdrant_fastembed.py`:
- Around line 412-414: The assertion uses membership/indexing on
collection_info.config.params.sparse_vectors without guarding for None; update
the checks in the code around the assertion that references
sparse_vector_field_name and collection_info.config.params.vectors to first
ensure collection_info.config.params.sparse_vectors is not None (e.g., raise the
same validation error if it is None), then perform the membership test and
indexing; apply the same None-guard pattern to the other occurrence that checks
sparse_vectors (the block referencing sparse_vector_field_name in the nearby
lines) so a TypeError cannot be raised when sparse_vectors is None.
- Around line 397-409: Replace the runtime asserts in the compatibility checks
with explicit exceptions: validate that collection_info.config.params.vectors is
a dict and that vector_field_name exists there (raise ValueError/TypeError with
clear messages referencing collection_info.config.params.vectors and
vector_field_name), then fetch vector_params and validate embeddings_size ==
vector_params.size and distance == vector_params.distance raising explicit
exceptions on mismatch; additionally, make the sparse vector membership check
safe by guarding against None (check that
collection_info.config.params.sparse_vectors is not None before testing
membership of sparse_vector_field_name) and raise a clear exception if
sparse_vector_field_name is provided but sparse_vectors is missing.
In `@qdrant_client/common/retry.py`:
- Around line 48-67: Add a __post_init__ on the frozen dataclass RetryConfig
that validates and normalizes inputs: ensure max_retries >= 0, initial_backoff
>= 0, max_backoff >= initial_backoff, backoff_multiplier > 0, and 0.0 <= jitter
<= 1.0, raising ValueError with clear messages for violations; also coerce
retryable_status_codes and retryable_grpc_codes to frozenset (use
object.__setattr__ to mutate fields in the frozen dataclass) so the runtime
object is truly immutable and callers get early, explicit errors instead of
downstream failures in functions that rely on these fields.
- Around line 101-107: The _parse_retry_after helper only handles numeric
delta-seconds; update it to also accept HTTP-date per RFC7231 by: first
attempting the existing float conversion on value, and if that fails, parse the
HTTP-date string (use email.utils.parsedate_to_datetime or an RFC-1123 parser)
to a datetime, compute the seconds until that date relative to now (handling
timezone-aware vs naive datetimes), and return that seconds as a float (or None
if parsing fails or date is in the past). Keep the function name
_parse_retry_after and its return types unchanged, and ensure exceptions from
date parsing are caught and result in None.
- Around line 147-158: The retry path currently abandons the httpx.Response
(variable response) when is_retryable_status(...) triggers a retry, which can
exhaust the connection pool; before each continue in the sync retry branch
(where compute_backoff(...) is used) call response.close(), and in the async
middleware equivalent call await response.aclose() before continue; update both
retry blocks (the shown block and the similar block around lines 198–209) to
ensure responses are closed/aclosed prior to sleeping/continuing.
In `@qdrant_client/qdrant_remote.py`:
- Around line 241-242: Currently install_retry_middleware(self.openapi_client,
self._retry_config) is enabling automatic retries for every REST call made via
self.openapi_client (and the same retry config is reused for gRPC), which risks
replaying non-idempotent mutating operations; change this by removing the
blanket middleware installation and instead apply retries only to safe
endpoints: either add a whitelist/flag mechanism to call
install_retry_middleware per-operation (e.g., when executing
read-only/idempotent methods) or add logic in the middleware installer to accept
an allowed_methods/allowed_paths set and only wrap requests for read/idempotent
operations; update code paths that construct the client (where
self.openapi_client is initialized) and any place that currently relies on
global install_retry_middleware so that mutating methods like snapshot
creation/recovery, alias updates, collection lifecycle, and cluster mutations
are excluded unless explicitly opted-in.
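One shape such an allow-list guard could take, sketched with hypothetical names (`should_retry_request` and `allowed_paths` are not from the PR):

```python
# HTTP methods that are safe or idempotent per RFC 9110 and thus
# generally safe to replay on transient failures.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})

def should_retry_request(method: str, path: str,
                         allowed_paths: frozenset = frozenset()) -> bool:
    """Retry only idempotent methods, or paths explicitly opted in
    (e.g. read-only POST endpoints such as search)."""
    return method.upper() in IDEMPOTENT_METHODS or path in allowed_paths
```

The middleware installer would consult such a predicate before wrapping a request, so mutating POSTs (snapshot creation, alias updates) are never replayed unless explicitly opted in.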
In `@tests/test_retry.py`:
- Around line 360-377: The test test_retries_on_resource_exhausted_exception
currently only checks call_count and doesn't verify the zero-delay Retry-After
behavior; update the test to either (A) assert the interceptor uses the
retry-after value by patching/mocking the sleep function used inside
retry_interceptor (or whatever backoff/sleep helper it calls) and verify it was
called with 0 when ResourceExhaustedResponse(retry_after_s=0) is raised,
referencing the RetryConfig, retry_interceptor, ResourceExhaustedResponse and
intercept_unary_unary symbols, or (B) more simply change the raised
ResourceExhaustedResponse to use a positive retry_after_s (e.g. 0.5) and assert
the mocked sleep was called with that positive value so the retry-after branch
is exercised (also ensure the test patches the same sleep/backoff helper the
interceptor uses).
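Option (B) can be illustrated with a self-contained stand-in. This is not the PR's actual interceptor; the exception class and retry loop below are simplified stand-ins that sleep via `time.sleep` so a mock can observe the retry-after value:

```python
import time
from unittest.mock import patch

class ResourceExhaustedResponse(Exception):
    """Stand-in for a resource-exhausted error carrying a server cooldown hint."""
    def __init__(self, retry_after_s: float):
        self.retry_after_s = retry_after_s

def call_with_retry(fn, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ResourceExhaustedResponse as exc:
            if attempt == max_retries:
                raise
            time.sleep(exc.retry_after_s)  # honor the server-provided cooldown

calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ResourceExhaustedResponse(retry_after_s=0.5)
    return "ok"

with patch("time.sleep") as mock_sleep:
    result = call_with_retry(flaky)
```

Asserting that the mocked sleep was called twice with 0.5 (not merely checking the call count) is what actually exercises the retry-after branch.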
📒 Files selected for processing (10)
- qdrant_client/__init__.py
- qdrant_client/async_qdrant_client.py
- qdrant_client/async_qdrant_fastembed.py
- qdrant_client/async_qdrant_remote.py
- qdrant_client/common/retry.py
- qdrant_client/connection.py
- qdrant_client/local/async_qdrant_local.py
- qdrant_client/qdrant_client.py
- qdrant_client/qdrant_remote.py
- tests/test_retry.py
```python
assert isinstance(collection_info.config.params.vectors, dict), (
    f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
)
assert vector_field_name in collection_info.config.params.vectors, (
    f"Collection have incompatible vector params: {collection_info.config.params.vectors}, expected {vector_field_name}"
)
vector_params = collection_info.config.params.vectors[vector_field_name]
assert embeddings_size == vector_params.size, (
    f"Embedding size mismatch: {embeddings_size} != {vector_params.size}"
)
assert distance == vector_params.distance, (
    f"Distance mismatch: {distance} != {vector_params.distance}"
)
```
🧩 Analysis chain
🏁 Scripts executed against qdrant/qdrant-client (outputs collapsed):

```bash
head -50 qdrant_client/async_qdrant_fastembed.py
sed -n '370,430p' qdrant_client/async_qdrant_fastembed.py
rg "autogenerated|auto-generated|generated by" qdrant_client/async_qdrant_fastembed.py
cat -n tools/async_client_generator/fastembed_generator.py
cat -n qdrant_client/qdrant_fastembed.py | sed -n '390,445p'
grep -A 50 "_validate_collection_info" qdrant_client/qdrant_fastembed.py | head -60
grep -B 5 -A 5 "sparse_vectors\[" qdrant_client/qdrant_fastembed.py
rg "sparse_vectors.*None" qdrant_client/ --type py -B 3 -A 3
grep -r "sparse_vectors:" qdrant_client/http/models/ --include="*.py" | head -10
grep -n "class CollectionParams" qdrant_client/http/models/models.py
sed -n '1090,1130p' qdrant_client/http/models/models.py
cat tools/async_client_generator/transformers/fastembed/function_def_transformer.py
grep -A 15 "sparse_vector_field_name = self.get_sparse_vector_field_name()" qdrant_client/qdrant_fastembed.py
```
Replace assert with explicit exceptions and add a None-safety check for sparse vectors.
These are runtime compatibility checks that should not rely on assert statements, since asserts are stripped entirely under Python's -O optimization flag, silently skipping validation and producing less actionable errors later. Additionally, the membership check for sparse_vectors at line 431 lacks a None guard: if sparse_vector_field_name is not None but sparse_vectors is None, the check raises TypeError instead of the intended validation error.
Since this file is autogenerated, apply the fix to the source in qdrant_client/qdrant_fastembed.py:
Proposed fix

```diff
-        assert isinstance(
-            collection_info.config.params.vectors, dict
-        ), f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
+        if not isinstance(collection_info.config.params.vectors, dict):
+            raise ValueError(
+                f"Collection has incompatible vector params: {collection_info.config.params.vectors}"
+            )
-        assert (
-            vector_field_name in collection_info.config.params.vectors
-        ), f"Collection have incompatible vector params: {collection_info.config.params.vectors}, expected {vector_field_name}"
+        if vector_field_name not in collection_info.config.params.vectors:
+            raise ValueError(
+                f"Collection has incompatible vector params: {collection_info.config.params.vectors}, expected {vector_field_name}"
+            )
         vector_params = collection_info.config.params.vectors[vector_field_name]
-        assert (
-            embeddings_size == vector_params.size
-        ), f"Embedding size mismatch: {embeddings_size} != {vector_params.size}"
+        if embeddings_size != vector_params.size:
+            raise ValueError(f"Embedding size mismatch: {embeddings_size} != {vector_params.size}")
-        assert (
-            distance == vector_params.distance
-        ), f"Distance mismatch: {distance} != {vector_params.distance}"
+        if distance != vector_params.distance:
+            raise ValueError(f"Distance mismatch: {distance} != {vector_params.distance}")
         sparse_vector_field_name = self.get_sparse_vector_field_name()
         if sparse_vector_field_name is not None:
+            if collection_info.config.params.sparse_vectors is None:
+                raise ValueError(
+                    f"Collection has sparse vector field '{sparse_vector_field_name}' but sparse_vectors config is not set"
+                )
-            assert (
-                sparse_vector_field_name in collection_info.config.params.sparse_vectors
-            ), f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
+            if sparse_vector_field_name not in collection_info.config.params.sparse_vectors:
+                raise ValueError(
+                    f"Collection has incompatible sparse vector params, expected '{sparse_vector_field_name}'"
+                )
```
```python
assert sparse_vector_field_name in collection_info.config.params.sparse_vectors, (
    f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
)
```
Guard sparse_vectors against None before membership/indexing.
If collection_info.config.params.sparse_vectors is None, the membership test and subsequent indexing can raise TypeError instead of the intended validation error path.
Proposed fix

```diff
     sparse_vector_field_name = self.get_sparse_vector_field_name()
     if sparse_vector_field_name is not None:
-        assert sparse_vector_field_name in collection_info.config.params.sparse_vectors, (
-            f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
-        )
+        sparse_vectors = collection_info.config.params.sparse_vectors
+        if not isinstance(sparse_vectors, dict) or sparse_vector_field_name not in sparse_vectors:
+            raise ValueError(
+                f"Collection has incompatible sparse vector params: {sparse_vectors}, expected {sparse_vector_field_name}"
+            )
         if self.sparse_embedding_model_name in IDF_EMBEDDING_MODELS:
-            modifier = collection_info.config.params.sparse_vectors[
-                sparse_vector_field_name
-            ].modifier
-            assert modifier == models.Modifier.IDF, (
-                f"{self.sparse_embedding_model_name} requires modifier IDF, current modifier is {modifier}"
-            )
+            modifier = sparse_vectors[sparse_vector_field_name].modifier
+            if modifier != models.Modifier.IDF:
+                raise ValueError(
+                    f"{self.sparse_embedding_model_name} requires modifier IDF, current modifier is {modifier}"
+                )
```

Also applies to: 419-421
```python
@dataclass(frozen=True)
class RetryConfig:
    """User-facing retry configuration.

    Args:
        max_retries: Maximum number of retry attempts (0 = no retries). Default: 3.
        initial_backoff: Base backoff duration in seconds. Default: 0.1.
        max_backoff: Upper bound on backoff duration in seconds. Default: 30.
        backoff_multiplier: Multiplier applied per attempt. Default: 2.
        jitter: Jitter factor (0–1) added to backoff to avoid thundering herd. Default: 0.1.
        retryable_status_codes: HTTP status codes eligible for retry.
    """

    max_retries: int = 3
    initial_backoff: float = 0.1
    max_backoff: float = 30.0
    backoff_multiplier: float = 2.0
    jitter: float = 0.1
    retryable_status_codes: frozenset[int] = _DEFAULT_RETRYABLE_STATUS_CODES
    retryable_grpc_codes: frozenset[grpc.StatusCode] = _DEFAULT_RETRYABLE_GRPC_CODES
```
Reject invalid retry configs up front.
RetryConfig is public, but nothing prevents negative retries/backoffs or a jitter value outside the documented 0..1 range. With max_retries < 0, the loops at Lines 129, 180, 280, and 357 never execute and these functions fall through to an undefined response; negative backoffs will also fail in sleep(). It would also help to coerce the code collections to frozenset here so the frozen dataclass stays actually immutable at runtime.
Suggested fix

```diff
 @dataclass(frozen=True)
 class RetryConfig:
 @@
     retryable_status_codes: frozenset[int] = _DEFAULT_RETRYABLE_STATUS_CODES
     retryable_grpc_codes: frozenset[grpc.StatusCode] = _DEFAULT_RETRYABLE_GRPC_CODES
+
+    def __post_init__(self) -> None:
+        object.__setattr__(
+            self, "retryable_status_codes", frozenset(self.retryable_status_codes)
+        )
+        object.__setattr__(
+            self, "retryable_grpc_codes", frozenset(self.retryable_grpc_codes)
+        )
+        if self.max_retries < 0:
+            raise ValueError("max_retries must be >= 0")
+        if self.initial_backoff < 0 or self.max_backoff < 0:
+            raise ValueError("backoff values must be >= 0")
+        if self.max_backoff < self.initial_backoff:
+            raise ValueError("max_backoff must be >= initial_backoff")
+        if self.backoff_multiplier <= 0:
+            raise ValueError("backoff_multiplier must be > 0")
+        if not 0 <= self.jitter <= 1:
+            raise ValueError("jitter must be between 0 and 1")
```
🧰 Tools
🪛 Ruff (0.15.5)
[warning] 57-57: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF002)
```python
def _parse_retry_after(value: str | None) -> float | None:
    if value is None:
        return None
    try:
        return float(value)
    except (ValueError, TypeError):
        return None
```
Parse HTTP-date Retry-After values too.
Retry-After can be sent either as delta-seconds or as an HTTP-date. This helper only accepts the numeric form, so for 429/503 responses from proxies or load balancers that send a date header, the client ignores the server cooldown and falls back to client-side backoff.
Suggested fix
```diff
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
 @@
 def _parse_retry_after(value: str | None) -> float | None:
     if value is None:
         return None
     try:
-        return float(value)
+        return max(0.0, float(value))
     except (ValueError, TypeError):
-        return None
+        try:
+            retry_at = parsedate_to_datetime(value)
+        except (TypeError, ValueError, IndexError):
+            return None
+        if retry_at.tzinfo is None:
+            retry_at = retry_at.replace(tzinfo=timezone.utc)
+        return max(0.0, (retry_at - datetime.now(timezone.utc)).total_seconds())
```
```python
if is_retryable_status(response.status_code, config) and attempt < config.max_retries:
    retry_after = _parse_retry_after(response.headers.get("Retry-After"))
    backoff = compute_backoff(attempt, config, retry_after=retry_after)
    logger.debug(
        "Retry %d/%d after HTTP %d (backoff=%.2fs)",
        attempt + 1,
        config.max_retries,
        response.status_code,
        backoff,
    )
    time.sleep(backoff)
    continue
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,260p' qdrant_client/http/api_client.py
rg -n -C3 'send\(|stream=' qdrant_client/http/api_client.pyRepository: qdrant/qdrant-client
Length of output: 12779
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find and view retry.py
fd -n 'retry.py' qdrant_client/common/
wc -l qdrant_client/common/retry.py
sed -n '130,220p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 294
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find and view retry.py
fd 'retry.py' qdrant_client/common/
wc -l qdrant_client/common/retry.py
sed -n '130,220p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 3455
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# View the full retry middleware implementation
sed -n '1,200p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 7138
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check for any response.close calls in retry.py
rg 'response\.close|response\.aclose' qdrant_client/common/retry.py
# Check for context managers or with statements
rg 'with.*response|async with.*response' qdrant_client/common/retry.py
# Look for complete retry.py to see if there's any cleanup
sed -n '200,250p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 46
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the rest of the async retry middleware
sed -n '200,250p' qdrant_client/common/retry.py
# Check how response is used elsewhere in the codebase
rg -B3 -A3 'is_retryable_status' qdrant_client/

Repository: qdrant/qdrant-client
Length of output: 3363
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check how httpx is used elsewhere in the codebase for response handling
rg -B2 -A2 'response\.close|aclose' qdrant_client/ --type py | head -50
# Check imports and httpx usage patterns
rg 'from httpx|import httpx' qdrant_client/ --type py | head -20
# Look for any context manager usage with responses
rg 'with.*Client|async with.*Client' qdrant_client/ --type py | head -20

Repository: qdrant/qdrant-client
Length of output: 50376
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if there's any documentation or test showing the issue
rg -l 'response.*close|Response.*close' qdrant_client/ --type py
# Look for how responses are used in the API calls
sed -n '111,130p' qdrant_client/http/api_client.py

Repository: qdrant/qdrant-client
Length of output: 1001
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check how qdrant_client uses httpx in context of streaming/responses
rg 'stream.*=' qdrant_client/http/ -A 2 -B 2
# Look at the actual HTTP client send implementation
sed -n '130,140p' qdrant_client/http/api_client.py

Repository: qdrant/qdrant-client
Length of output: 46
Close discarded retry responses to avoid connection pool exhaustion.
When is_retryable_status() returns true and the retry path executes continue, the unconsumed httpx.Response object is abandoned without being closed. This can exhaust the connection pool across retries. Add response.close() before the continue statement in the sync middleware and await response.aclose() in the async middleware.
Also applies to: lines 198–209
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@qdrant_client/common/retry.py` around lines 147 - 158, The retry path
currently abandons the httpx.Response (variable response) when
is_retryable_status(...) triggers a retry, which can exhaust the connection
pool; before each continue in the sync retry branch (where compute_backoff(...)
is used) call response.close(), and in the async middleware equivalent call
await response.aclose() before continue; update both retry blocks (the shown
block and the similar block around lines 198–209) to ensure responses are
closed/aclosed prior to sleeping/continuing.
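The shape of the requested fix can be sketched with stand-in helpers (send, is_retryable, and compute_backoff here are hypothetical parameters, not the middleware's real interface): the only essential change is closing the abandoned response before sleeping and retrying.

```python
import time

def send_with_retries(send, request, config, is_retryable, compute_backoff):
    """Retry wrapper that closes an abandoned response before each retry."""
    for attempt in range(config.max_retries + 1):
        response = send(request)
        if is_retryable(response.status_code) and attempt < config.max_retries:
            response.close()  # release the pooled connection before retrying
            time.sleep(compute_backoff(attempt))
            continue
        return response

# Tiny fakes to exercise the wrapper without a real HTTP client.
class FakeResponse:
    def __init__(self, status):
        self.status_code = status
        self.closed = False

    def close(self):
        self.closed = True

class FakeConfig:
    max_retries = 2

responses = [FakeResponse(503), FakeResponse(200)]
queue = iter(responses)
result = send_with_retries(
    send=lambda _: next(queue),
    request=None,
    config=FakeConfig,
    is_retryable=lambda status: status == 503,
    compute_backoff=lambda attempt: 0,
)
```

In the async middleware the equivalent call is `await response.aclose()` before the `continue`.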
if self._retry_config is not None:
    install_retry_middleware(self.openapi_client, self._retry_config)
Don't apply automatic retries to every mutating endpoint.
Installing the middleware here makes every REST call below this class retriable, and the same config is later passed to gRPC on Line 370. That includes non-idempotent operations like snapshot creation/recovery, alias updates, collection lifecycle, and cluster mutations. If the first attempt commits server-side but the client only sees a transient transport failure, the replay can duplicate side effects or turn a successful write into a client-visible failure. Please scope retries to read-only / explicitly idempotent calls, or require per-operation opt-in for writes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@qdrant_client/qdrant_remote.py` around lines 241 - 242, Currently
install_retry_middleware(self.openapi_client, self._retry_config) is enabling
automatic retries for every REST call made via self.openapi_client (and the same
retry config is reused for gRPC), which risks replaying non-idempotent mutating
operations; change this by removing the blanket middleware installation and
instead apply retries only to safe endpoints: either add a whitelist/flag
mechanism to call install_retry_middleware per-operation (e.g., when executing
read-only/idempotent methods) or add logic in the middleware installer to accept
an allowed_methods/allowed_paths set and only wrap requests for read/idempotent
operations; update code paths that construct the client (where
self.openapi_client is initialized) and any place that currently relies on
global install_retry_middleware so that mutating methods like snapshot
creation/recovery, alias updates, collection lifecycle, and cluster mutations
are excluded unless explicitly opted-in.
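One way to express the suggested gating, sketched with illustrative names (the real safe set would need to be audited per endpoint):

```python
# Retries default-on for idempotent reads, explicit opt-in for writes.
IDEMPOTENT_METHODS = {"GET", "HEAD"}

def is_retry_allowed(method: str, opt_in: bool = False) -> bool:
    """Allow automatic retries only for idempotent reads unless explicitly opted in."""
    if opt_in:
        return True
    return method.upper() in IDEMPOTENT_METHODS
```

The middleware installer could then accept such a predicate (or an allowed_methods set) and skip the retry loop for any request that fails the check.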
def test_retries_on_resource_exhausted_exception(self):
    config = RetryConfig(max_retries=1, initial_backoff=0.001, jitter=0.0)
    interceptor = retry_interceptor(config)

    call_count = 0
    mock_success = MagicMock()
    mock_success.code.return_value = grpc.StatusCode.OK

    def continuation(details, request):
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            raise ResourceExhaustedResponse(message="Rate limited", retry_after_s=0)
        return mock_success

    result = interceptor.intercept_unary_unary(continuation, MagicMock(), MagicMock())
    assert result is mock_success
    assert call_count == 2
This won't verify the zero-delay Retry-After path.
retry_after_s=0 is the interesting edge case here, but this test only checks call_count. If the interceptor treats 0 as falsy and falls back to exponential backoff, this still passes. Please either assert the sleep duration or use a positive retry_after_s so the retry-after branch is actually exercised.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_retry.py` around lines 360 - 377, The test
test_retries_on_resource_exhausted_exception currently only checks call_count
and doesn't verify the zero-delay Retry-After behavior; update the test to
either (A) assert the interceptor uses the retry-after value by patching/mocking
the sleep function used inside retry_interceptor (or whatever backoff/sleep
helper it calls) and verify it was called with 0 when
ResourceExhaustedResponse(retry_after_s=0) is raised, referencing the
RetryConfig, retry_interceptor, ResourceExhaustedResponse and
intercept_unary_unary symbols, or (B) more simply change the raised
ResourceExhaustedResponse to use a positive retry_after_s (e.g. 0.5) and assert
the mocked sleep was called with that positive value so the retry-after branch
is exercised (also ensure the test patches the same sleep/backoff helper the
interceptor uses).
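Option (B) can be demonstrated with a toy stand-in for the interceptor's retry path; the key is patching the same sleep helper the code under test actually calls, then asserting on the argument it received:

```python
import time
from unittest.mock import patch

def retry_once(call, retry_after_s):
    """Toy stand-in for the interceptor's retry path (not the PR's real code)."""
    try:
        return call()
    except RuntimeError:
        time.sleep(retry_after_s)  # the branch the test must exercise
        return call()

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("Rate limited")
    return "ok"

with patch("time.sleep") as mock_sleep:
    result = retry_once(flaky, retry_after_s=0.5)
```

In the real test the patch target must match the module path the interceptor imports sleep from, otherwise the assertion silently passes against an unused mock.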
Body:
What this does -
Adds built-in automatic retries for transient failures (connection resets, 503s, rate limits, etc.) to both REST and gRPC interfaces. It's off by default so existing behavior doesn't change unless you opt in.
Usage -
from qdrant_client import QdrantClient, RetryConfig

client = QdrantClient(
    url="http://localhost:6333",
    retry_config=RetryConfig(max_retries=3),
)

That's it: the client will now retry up to 3 times on transient errors with exponential backoff before giving up.
RetryConfig lets you tune initial_backoff, max_backoff, backoff_multiplier, jitter, and which HTTP status codes or gRPC codes count as retryable. The defaults should be sensible for most cases.
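A fuller configuration exercising those knobs might look like the following sketch (the retryable_status_codes keyword is an assumption about the exact signature; only the four backoff parameters are named above):

```python
from qdrant_client import QdrantClient, RetryConfig

client = QdrantClient(
    url="http://localhost:6333",
    retry_config=RetryConfig(
        max_retries=5,
        initial_backoff=0.5,       # first delay, in seconds
        max_backoff=30.0,          # cap on any single delay
        backoff_multiplier=2.0,    # exponential growth factor
        jitter=0.1,                # randomize delays to spread out retries
        # Assumed keyword; the PR only says status codes are configurable.
        retryable_status_codes={429, 502, 503, 504},
    ),
)
```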
How it works -
On the REST side, a retryable response triggers a backoff wait, then the client tries again. It also respects Retry-After headers.
On gRPC, only unary (single request/response) calls are retried; streaming calls pass through since you can't replay a consumed iterator.
Backoff is exponential with jitter to avoid a thundering herd.
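The streaming restriction comes down to iterator semantics: once a request iterator has been consumed by the first attempt, a retry has nothing left to send. A minimal illustration:

```python
def send_stream(request_iter):
    """Stand-in for a streaming call: consumes the request iterator."""
    return list(request_iter)

gen = iter([1, 2, 3])
first = send_stream(gen)   # consumes all three requests
replay = send_stream(gen)  # the iterator is already exhausted
```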
Why this is safe -
Most Qdrant writes (upsert, delete by ID, set payload) are idempotent, so retrying them won't cause duplicates or corruption. The feature is completely disabled by default, so nothing changes for existing users.
Changes I've done -