feat: add retry/resilience middleware with exponential backoff #1171
Shiven0504 wants to merge 1 commit into qdrant:master
Conversation
Add configurable automatic retries for transient failures on both REST and gRPC interfaces. Disabled by default (retry_config=None).
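The exponential backoff with jitter described here can be sketched roughly as follows. The function and parameter names below are illustrative, not the PR's actual API:

```python
import random

def compute_backoff(attempt: int, initial: float = 0.1, multiplier: float = 2.0,
                    max_backoff: float = 30.0, jitter: float = 0.1) -> float:
    """Exponential backoff with proportional jitter, capped at max_backoff."""
    base = min(initial * (multiplier ** attempt), max_backoff)
    # Add up to jitter * base of random extra delay to spread out retries.
    return base + base * jitter * random.random()
```

With these defaults, attempts 0 through 3 wait roughly 0.1 s, 0.2 s, 0.4 s, and 0.8 s plus jitter; with `retry_config=None` no retries happen at all.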
📝 Walkthrough: This pull request introduces a comprehensive retry mechanism for the qdrant-client library. Estimated code review effort: 🎯 4 (Complex), ⏱️ ~60 minutes.
🚥 Pre-merge checks: 2 passed, 1 failed (warning).
Actionable comments posted: 7
🧹 Nitpick comments (1)
qdrant_client/qdrant_remote.py (1)
66-70: `retry_config` doesn't cover the upload helpers yet.
`upload_points()` and `upload_collection()` later delegate to `_upload_collection()`, which still uses the legacy `max_retries` path and never reads `self._retry_config`. That gives callers different retry semantics depending on which public API they use. Either thread the new config into the uploader classes or document that uploads remain on separate retry knobs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@qdrant_client/qdrant_remote.py` around lines 66 - 70, The retry_config passed into the constructor is not applied to upload helpers: update upload_points and upload_collection to pass self._retry_config into the uploader flow (or make _upload_collection read self._retry_config) so they no longer fall back to the legacy max_retries path; locate the public methods upload_points and upload_collection and the internal _upload_collection and ensure either the uploader classes/constructors receive the RetryConfig instance or change _upload_collection to prefer self._retry_config over max_retries when creating/using the uploader.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@qdrant_client/async_qdrant_fastembed.py`:
- Around line 412-414: The assertion uses membership/indexing on
collection_info.config.params.sparse_vectors without guarding for None; update
the checks in the code around the assertion that references
sparse_vector_field_name and collection_info.config.params.vectors to first
ensure collection_info.config.params.sparse_vectors is not None (e.g., raise the
same validation error if it is None), then perform the membership test and
indexing; apply the same None-guard pattern to the other occurrence that checks
sparse_vectors (the block referencing sparse_vector_field_name in the nearby
lines) so a TypeError cannot be raised when sparse_vectors is None.
- Around line 397-409: Replace the runtime asserts in the compatibility checks
with explicit exceptions: validate that collection_info.config.params.vectors is
a dict and that vector_field_name exists there (raise ValueError/TypeError with
clear messages referencing collection_info.config.params.vectors and
vector_field_name), then fetch vector_params and validate embeddings_size ==
vector_params.size and distance == vector_params.distance raising explicit
exceptions on mismatch; additionally, make the sparse vector membership check
safe by guarding against None (check that
collection_info.config.params.sparse_vectors is not None before testing
membership of sparse_vector_field_name) and raise a clear exception if
sparse_vector_field_name is provided but sparse_vectors is missing.
In `@qdrant_client/common/retry.py`:
- Around line 48-67: Add a __post_init__ on the frozen dataclass RetryConfig
that validates and normalizes inputs: ensure max_retries >= 0, initial_backoff
>= 0, max_backoff >= initial_backoff, backoff_multiplier > 0, and 0.0 <= jitter
<= 1.0, raising ValueError with clear messages for violations; also coerce
retryable_status_codes and retryable_grpc_codes to frozenset (use
object.__setattr__ to mutate fields in the frozen dataclass) so the runtime
object is truly immutable and callers get early, explicit errors instead of
downstream failures in functions that rely on these fields.
- Around line 101-107: The _parse_retry_after helper only handles numeric
delta-seconds; update it to also accept HTTP-date per RFC7231 by: first
attempting the existing float conversion on value, and if that fails, parse the
HTTP-date string (use email.utils.parsedate_to_datetime or an RFC-1123 parser)
to a datetime, compute the seconds until that date relative to now (handling
timezone-aware vs naive datetimes), and return that seconds as a float (or None
if parsing fails or date is in the past). Keep the function name
_parse_retry_after and its return types unchanged, and ensure exceptions from
date parsing are caught and result in None.
- Around line 147-158: The retry path currently abandons the httpx.Response
(variable response) when is_retryable_status(...) triggers a retry, which can
exhaust the connection pool; before each continue in the sync retry branch
(where compute_backoff(...) is used) call response.close(), and in the async
middleware equivalent call await response.aclose() before continue; update both
retry blocks (the shown block and the similar block around lines 198–209) to
ensure responses are closed/aclosed prior to sleeping/continuing.
In `@qdrant_client/qdrant_remote.py`:
- Around line 241-242: Currently install_retry_middleware(self.openapi_client,
self._retry_config) is enabling automatic retries for every REST call made via
self.openapi_client (and the same retry config is reused for gRPC), which risks
replaying non-idempotent mutating operations; change this by removing the
blanket middleware installation and instead apply retries only to safe
endpoints: either add a whitelist/flag mechanism to call
install_retry_middleware per-operation (e.g., when executing
read-only/idempotent methods) or add logic in the middleware installer to accept
an allowed_methods/allowed_paths set and only wrap requests for read/idempotent
operations; update code paths that construct the client (where
self.openapi_client is initialized) and any place that currently relies on
global install_retry_middleware so that mutating methods like snapshot
creation/recovery, alias updates, collection lifecycle, and cluster mutations
are excluded unless explicitly opted-in.
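One shape such an allow-list guard could take, sketched with hypothetical names (`should_retry_request` and `allowed_paths` are not from the PR):

```python
# HTTP methods that are safe or idempotent per RFC 9110 and thus
# generally safe to replay on transient failures.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})

def should_retry_request(method: str, path: str,
                         allowed_paths: frozenset = frozenset()) -> bool:
    """Retry only idempotent methods, or paths explicitly opted in
    (e.g. read-only POST endpoints such as search)."""
    return method.upper() in IDEMPOTENT_METHODS or path in allowed_paths
```

The middleware installer would consult such a predicate before wrapping a request, so mutating POSTs (snapshot creation, alias updates) are never replayed unless explicitly opted in.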
In `@tests/test_retry.py`:
- Around line 360-377: The test test_retries_on_resource_exhausted_exception
currently only checks call_count and doesn't verify the zero-delay Retry-After
behavior; update the test to either (A) assert the interceptor uses the
retry-after value by patching/mocking the sleep function used inside
retry_interceptor (or whatever backoff/sleep helper it calls) and verify it was
called with 0 when ResourceExhaustedResponse(retry_after_s=0) is raised,
referencing the RetryConfig, retry_interceptor, ResourceExhaustedResponse and
intercept_unary_unary symbols, or (B) more simply change the raised
ResourceExhaustedResponse to use a positive retry_after_s (e.g. 0.5) and assert
the mocked sleep was called with that positive value so the retry-after branch
is exercised (also ensure the test patches the same sleep/backoff helper the
interceptor uses).
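Option (B) can be illustrated with a self-contained stand-in. This is not the PR's actual interceptor; the exception class and retry loop below are simplified stand-ins that sleep via `time.sleep` so a mock can observe the retry-after value:

```python
import time
from unittest.mock import patch

class ResourceExhaustedResponse(Exception):
    """Stand-in for a resource-exhausted error carrying a server cooldown hint."""
    def __init__(self, retry_after_s: float):
        self.retry_after_s = retry_after_s

def call_with_retry(fn, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ResourceExhaustedResponse as exc:
            if attempt == max_retries:
                raise
            time.sleep(exc.retry_after_s)  # honor the server-provided cooldown

calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ResourceExhaustedResponse(retry_after_s=0.5)
    return "ok"

with patch("time.sleep") as mock_sleep:
    result = call_with_retry(flaky)
```

Asserting that the mocked sleep was called twice with 0.5 (not merely checking the call count) is what actually exercises the retry-after branch.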
📒 Files selected for processing (10)
- qdrant_client/__init__.py
- qdrant_client/async_qdrant_client.py
- qdrant_client/async_qdrant_fastembed.py
- qdrant_client/async_qdrant_remote.py
- qdrant_client/common/retry.py
- qdrant_client/connection.py
- qdrant_client/local/async_qdrant_local.py
- qdrant_client/qdrant_client.py
- qdrant_client/qdrant_remote.py
- tests/test_retry.py
```python
assert isinstance(collection_info.config.params.vectors, dict), (
    f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
)
assert vector_field_name in collection_info.config.params.vectors, (
    f"Collection have incompatible vector params: {collection_info.config.params.vectors}, expected {vector_field_name}"
)
vector_params = collection_info.config.params.vectors[vector_field_name]
assert embeddings_size == vector_params.size, (
    f"Embedding size mismatch: {embeddings_size} != {vector_params.size}"
)
assert distance == vector_params.distance, (
    f"Distance mismatch: {distance} != {vector_params.distance}"
)
```
🧩 Analysis chain
🏁 Scripts executed against qdrant/qdrant-client (outputs collapsed):

```bash
head -50 qdrant_client/async_qdrant_fastembed.py
sed -n '370,430p' qdrant_client/async_qdrant_fastembed.py
rg "autogenerated|auto-generated|generated by" qdrant_client/async_qdrant_fastembed.py
cat -n tools/async_client_generator/fastembed_generator.py
cat -n qdrant_client/qdrant_fastembed.py | sed -n '390,445p'
grep -A 50 "_validate_collection_info" qdrant_client/qdrant_fastembed.py | head -60
grep -B 5 -A 5 "sparse_vectors\[" qdrant_client/qdrant_fastembed.py
rg "sparse_vectors.*None" qdrant_client/ --type py -B 3 -A 3
grep -r "sparse_vectors:" qdrant_client/http/models/ --include="*.py" | head -10
grep -n "class CollectionParams" qdrant_client/http/models/models.py
sed -n '1090,1130p' qdrant_client/http/models/models.py
cat tools/async_client_generator/transformers/fastembed/function_def_transformer.py
grep -A 15 "sparse_vector_field_name = self.get_sparse_vector_field_name()" qdrant_client/qdrant_fastembed.py
```
Replace assert with explicit exceptions and add a None-safety check for sparse vectors.
These are runtime compatibility checks that should not rely on assert statements, since asserts are stripped entirely under Python's -O optimization flag, silently skipping validation and producing less actionable errors later. Additionally, the membership check for sparse_vectors at line 431 lacks a None guard: if sparse_vector_field_name is not None but sparse_vectors is None, the check raises TypeError instead of the intended validation error.
Since this file is autogenerated, apply the fix to the source in qdrant_client/qdrant_fastembed.py:
Proposed fix

```diff
-        assert isinstance(
-            collection_info.config.params.vectors, dict
-        ), f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
+        if not isinstance(collection_info.config.params.vectors, dict):
+            raise ValueError(
+                f"Collection has incompatible vector params: {collection_info.config.params.vectors}"
+            )
-        assert (
-            vector_field_name in collection_info.config.params.vectors
-        ), f"Collection have incompatible vector params: {collection_info.config.params.vectors}, expected {vector_field_name}"
+        if vector_field_name not in collection_info.config.params.vectors:
+            raise ValueError(
+                f"Collection has incompatible vector params: {collection_info.config.params.vectors}, expected {vector_field_name}"
+            )
         vector_params = collection_info.config.params.vectors[vector_field_name]
-        assert (
-            embeddings_size == vector_params.size
-        ), f"Embedding size mismatch: {embeddings_size} != {vector_params.size}"
+        if embeddings_size != vector_params.size:
+            raise ValueError(f"Embedding size mismatch: {embeddings_size} != {vector_params.size}")
-        assert (
-            distance == vector_params.distance
-        ), f"Distance mismatch: {distance} != {vector_params.distance}"
+        if distance != vector_params.distance:
+            raise ValueError(f"Distance mismatch: {distance} != {vector_params.distance}")
         sparse_vector_field_name = self.get_sparse_vector_field_name()
         if sparse_vector_field_name is not None:
+            if collection_info.config.params.sparse_vectors is None:
+                raise ValueError(
+                    f"Collection has sparse vector field '{sparse_vector_field_name}' but sparse_vectors config is not set"
+                )
-            assert (
-                sparse_vector_field_name in collection_info.config.params.sparse_vectors
-            ), f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
+            if sparse_vector_field_name not in collection_info.config.params.sparse_vectors:
+                raise ValueError(
+                    f"Collection has incompatible sparse vector params, expected '{sparse_vector_field_name}'"
+                )
```
```python
assert sparse_vector_field_name in collection_info.config.params.sparse_vectors, (
    f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
)
```
Guard sparse_vectors against None before membership/indexing.
If collection_info.config.params.sparse_vectors is None, the membership test and subsequent indexing can raise TypeError instead of the intended validation error path.
Proposed fix

```diff
     sparse_vector_field_name = self.get_sparse_vector_field_name()
     if sparse_vector_field_name is not None:
-        assert sparse_vector_field_name in collection_info.config.params.sparse_vectors, (
-            f"Collection have incompatible vector params: {collection_info.config.params.vectors}"
-        )
+        sparse_vectors = collection_info.config.params.sparse_vectors
+        if not isinstance(sparse_vectors, dict) or sparse_vector_field_name not in sparse_vectors:
+            raise ValueError(
+                f"Collection has incompatible sparse vector params: {sparse_vectors}, expected {sparse_vector_field_name}"
+            )
         if self.sparse_embedding_model_name in IDF_EMBEDDING_MODELS:
-            modifier = collection_info.config.params.sparse_vectors[
-                sparse_vector_field_name
-            ].modifier
-            assert modifier == models.Modifier.IDF, (
-                f"{self.sparse_embedding_model_name} requires modifier IDF, current modifier is {modifier}"
-            )
+            modifier = sparse_vectors[sparse_vector_field_name].modifier
+            if modifier != models.Modifier.IDF:
+                raise ValueError(
+                    f"{self.sparse_embedding_model_name} requires modifier IDF, current modifier is {modifier}"
+                )
```

Also applies to: 419-421
```python
@dataclass(frozen=True)
class RetryConfig:
    """User-facing retry configuration.

    Args:
        max_retries: Maximum number of retry attempts (0 = no retries). Default: 3.
        initial_backoff: Base backoff duration in seconds. Default: 0.1.
        max_backoff: Upper bound on backoff duration in seconds. Default: 30.
        backoff_multiplier: Multiplier applied per attempt. Default: 2.
        jitter: Jitter factor (0–1) added to backoff to avoid thundering herd. Default: 0.1.
        retryable_status_codes: HTTP status codes eligible for retry.
    """

    max_retries: int = 3
    initial_backoff: float = 0.1
    max_backoff: float = 30.0
    backoff_multiplier: float = 2.0
    jitter: float = 0.1
    retryable_status_codes: frozenset[int] = _DEFAULT_RETRYABLE_STATUS_CODES
    retryable_grpc_codes: frozenset[grpc.StatusCode] = _DEFAULT_RETRYABLE_GRPC_CODES
```
Reject invalid retry configs up front.
RetryConfig is public, but nothing prevents negative retries/backoffs or a jitter value outside the documented 0..1 range. With max_retries < 0, the loops at Lines 129, 180, 280, and 357 never execute and these functions fall through to an undefined response; negative backoffs will also fail in sleep(). It would also help to coerce the code collections to frozenset here so the frozen dataclass stays actually immutable at runtime.
Suggested fix

```diff
 @dataclass(frozen=True)
 class RetryConfig:
 @@
     retryable_status_codes: frozenset[int] = _DEFAULT_RETRYABLE_STATUS_CODES
     retryable_grpc_codes: frozenset[grpc.StatusCode] = _DEFAULT_RETRYABLE_GRPC_CODES
+
+    def __post_init__(self) -> None:
+        object.__setattr__(
+            self, "retryable_status_codes", frozenset(self.retryable_status_codes)
+        )
+        object.__setattr__(
+            self, "retryable_grpc_codes", frozenset(self.retryable_grpc_codes)
+        )
+        if self.max_retries < 0:
+            raise ValueError("max_retries must be >= 0")
+        if self.initial_backoff < 0 or self.max_backoff < 0:
+            raise ValueError("backoff values must be >= 0")
+        if self.max_backoff < self.initial_backoff:
+            raise ValueError("max_backoff must be >= initial_backoff")
+        if self.backoff_multiplier <= 0:
+            raise ValueError("backoff_multiplier must be > 0")
+        if not 0 <= self.jitter <= 1:
+            raise ValueError("jitter must be between 0 and 1")
```
🧰 Tools
🪛 Ruff (0.15.5)
[warning] 57-57: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF002)
```python
def _parse_retry_after(value: str | None) -> float | None:
    if value is None:
        return None
    try:
        return float(value)
    except (ValueError, TypeError):
        return None
```
Parse HTTP-date Retry-After values too.
Retry-After can be sent either as delta-seconds or as an HTTP-date. This helper only accepts the numeric form, so for 429/503 responses from proxies or load balancers that send a date header, the client ignores the server cooldown and falls back to client-side backoff.
Suggested fix
```diff
+from datetime import datetime, timezone
+from email.utils import parsedate_to_datetime
 @@
 def _parse_retry_after(value: str | None) -> float | None:
     if value is None:
         return None
     try:
-        return float(value)
+        return max(0.0, float(value))
     except (ValueError, TypeError):
-        return None
+        try:
+            retry_at = parsedate_to_datetime(value)
+        except (TypeError, ValueError, IndexError):
+            return None
+        if retry_at.tzinfo is None:
+            retry_at = retry_at.replace(tzinfo=timezone.utc)
+        return max(0.0, (retry_at - datetime.now(timezone.utc)).total_seconds())
```
```python
if is_retryable_status(response.status_code, config) and attempt < config.max_retries:
    retry_after = _parse_retry_after(response.headers.get("Retry-After"))
    backoff = compute_backoff(attempt, config, retry_after=retry_after)
    logger.debug(
        "Retry %d/%d after HTTP %d (backoff=%.2fs)",
        attempt + 1,
        config.max_retries,
        response.status_code,
        backoff,
    )
    time.sleep(backoff)
    continue
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,260p' qdrant_client/http/api_client.py
rg -n -C3 'send\(|stream=' qdrant_client/http/api_client.pyRepository: qdrant/qdrant-client
Length of output: 12779
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find and view retry.py
fd -n 'retry.py' qdrant_client/common/
wc -l qdrant_client/common/retry.py
sed -n '130,220p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 294
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find and view retry.py
fd 'retry.py' qdrant_client/common/
wc -l qdrant_client/common/retry.py
sed -n '130,220p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 3455
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# View the full retry middleware implementation
sed -n '1,200p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 7138
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check for any response.close calls in retry.py
rg 'response\.close|response\.aclose' qdrant_client/common/retry.py
# Check for context managers or with statements
rg 'with.*response|async with.*response' qdrant_client/common/retry.py
# Look for complete retry.py to see if there's any cleanup
sed -n '200,250p' qdrant_client/common/retry.pyRepository: qdrant/qdrant-client
Length of output: 46
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the rest of the async retry middleware
sed -n '200,250p' qdrant_client/common/retry.py
# Check how response is used elsewhere in the codebase
rg -B3 -A3 'is_retryable_status' qdrant_client/

Repository: qdrant/qdrant-client
Length of output: 3363
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check how httpx is used elsewhere in the codebase for response handling
rg -B2 -A2 'response\.close|aclose' qdrant_client/ --type py | head -50
# Check imports and httpx usage patterns
rg 'from httpx|import httpx' qdrant_client/ --type py | head -20
# Look for any context manager usage with responses
rg 'with.*Client|async with.*Client' qdrant_client/ --type py | head -20

Repository: qdrant/qdrant-client
Length of output: 50376
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if there's any documentation or test showing the issue
rg -l 'response.*close|Response.*close' qdrant_client/ --type py
# Look for how responses are used in the API calls
sed -n '111,130p' qdrant_client/http/api_client.py

Repository: qdrant/qdrant-client
Length of output: 1001
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check how qdrant_client uses httpx in context of streaming/responses
rg 'stream.*=' qdrant_client/http/ -A 2 -B 2
# Look at the actual HTTP client send implementation
sed -n '130,140p' qdrant_client/http/api_client.py

Repository: qdrant/qdrant-client
Length of output: 46
Close discarded retry responses to avoid connection pool exhaustion.
When is_retryable_status() returns true and the retry path executes continue, the unconsumed httpx.Response object is abandoned without being closed. This can exhaust the connection pool across retries. Add response.close() before the continue statement in the sync middleware and await response.aclose() in the async middleware.
Also applies to: lines 198–209
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@qdrant_client/common/retry.py` around lines 147 - 158, The retry path
currently abandons the httpx.Response (variable response) when
is_retryable_status(...) triggers a retry, which can exhaust the connection
pool; before each continue in the sync retry branch (where compute_backoff(...)
is used) call response.close(), and in the async middleware equivalent call
await response.aclose() before continue; update both retry blocks (the shown
block and the similar block around lines 198–209) to ensure responses are
closed/aclosed prior to sleeping/continuing.
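The shape of the requested fix can be sketched with stand-in helpers (send, is_retryable, and compute_backoff here are hypothetical parameters, not the middleware's real interface): the only essential change is closing the abandoned response before sleeping and retrying.

```python
import time

def send_with_retries(send, request, config, is_retryable, compute_backoff):
    """Retry wrapper that closes an abandoned response before each retry."""
    for attempt in range(config.max_retries + 1):
        response = send(request)
        if is_retryable(response.status_code) and attempt < config.max_retries:
            response.close()  # release the pooled connection before retrying
            time.sleep(compute_backoff(attempt))
            continue
        return response

# Tiny fakes to exercise the wrapper without a real HTTP client.
class FakeResponse:
    def __init__(self, status):
        self.status_code = status
        self.closed = False

    def close(self):
        self.closed = True

class FakeConfig:
    max_retries = 2

responses = [FakeResponse(503), FakeResponse(200)]
queue = iter(responses)
result = send_with_retries(
    send=lambda _: next(queue),
    request=None,
    config=FakeConfig,
    is_retryable=lambda status: status == 503,
    compute_backoff=lambda attempt: 0,
)
```

In the async middleware the equivalent call is `await response.aclose()` before the `continue`.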
if self._retry_config is not None:
    install_retry_middleware(self.openapi_client, self._retry_config)
Don't apply automatic retries to every mutating endpoint.
Installing the middleware here makes every REST call below this class retriable, and the same config is later passed to gRPC on Line 370. That includes non-idempotent operations like snapshot creation/recovery, alias updates, collection lifecycle, and cluster mutations. If the first attempt commits server-side but the client only sees a transient transport failure, the replay can duplicate side effects or turn a successful write into a client-visible failure. Please scope retries to read-only / explicitly idempotent calls, or require per-operation opt-in for writes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@qdrant_client/qdrant_remote.py` around lines 241 - 242, Currently
install_retry_middleware(self.openapi_client, self._retry_config) is enabling
automatic retries for every REST call made via self.openapi_client (and the same
retry config is reused for gRPC), which risks replaying non-idempotent mutating
operations; change this by removing the blanket middleware installation and
instead apply retries only to safe endpoints: either add a whitelist/flag
mechanism to call install_retry_middleware per-operation (e.g., when executing
read-only/idempotent methods) or add logic in the middleware installer to accept
an allowed_methods/allowed_paths set and only wrap requests for read/idempotent
operations; update code paths that construct the client (where
self.openapi_client is initialized) and any place that currently relies on
global install_retry_middleware so that mutating methods like snapshot
creation/recovery, alias updates, collection lifecycle, and cluster mutations
are excluded unless explicitly opted-in.
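One way to express the suggested gating, sketched with illustrative names (the real safe set would need to be audited per endpoint):

```python
# Retries default-on for idempotent reads, explicit opt-in for writes.
IDEMPOTENT_METHODS = {"GET", "HEAD"}

def is_retry_allowed(method: str, opt_in: bool = False) -> bool:
    """Allow automatic retries only for idempotent reads unless explicitly opted in."""
    if opt_in:
        return True
    return method.upper() in IDEMPOTENT_METHODS
```

The middleware installer could then accept such a predicate (or an allowed_methods set) and skip the retry loop for any request that fails the check.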
def test_retries_on_resource_exhausted_exception(self):
    config = RetryConfig(max_retries=1, initial_backoff=0.001, jitter=0.0)
    interceptor = retry_interceptor(config)

    call_count = 0
    mock_success = MagicMock()
    mock_success.code.return_value = grpc.StatusCode.OK

    def continuation(details, request):
        nonlocal call_count
        call_count += 1
        if call_count == 1:
            raise ResourceExhaustedResponse(message="Rate limited", retry_after_s=0)
        return mock_success

    result = interceptor.intercept_unary_unary(continuation, MagicMock(), MagicMock())
    assert result is mock_success
    assert call_count == 2
This won't verify the zero-delay Retry-After path.
retry_after_s=0 is the interesting edge case here, but this test only checks call_count. If the interceptor treats 0 as falsy and falls back to exponential backoff, this still passes. Please either assert the sleep duration or use a positive retry_after_s so the retry-after branch is actually exercised.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_retry.py` around lines 360 - 377, The test
test_retries_on_resource_exhausted_exception currently only checks call_count
and doesn't verify the zero-delay Retry-After behavior; update the test to
either (A) assert the interceptor uses the retry-after value by patching/mocking
the sleep function used inside retry_interceptor (or whatever backoff/sleep
helper it calls) and verify it was called with 0 when
ResourceExhaustedResponse(retry_after_s=0) is raised, referencing the
RetryConfig, retry_interceptor, ResourceExhaustedResponse and
intercept_unary_unary symbols, or (B) more simply change the raised
ResourceExhaustedResponse to use a positive retry_after_s (e.g. 0.5) and assert
the mocked sleep was called with that positive value so the retry-after branch
is exercised (also ensure the test patches the same sleep/backoff helper the
interceptor uses).
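Option (B) can be demonstrated with a toy stand-in for the interceptor's retry path; the key is patching the same sleep helper the code under test actually calls, then asserting on the argument it received:

```python
import time
from unittest.mock import patch

def retry_once(call, retry_after_s):
    """Toy stand-in for the interceptor's retry path (not the PR's real code)."""
    try:
        return call()
    except RuntimeError:
        time.sleep(retry_after_s)  # the branch the test must exercise
        return call()

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("Rate limited")
    return "ok"

with patch("time.sleep") as mock_sleep:
    result = retry_once(flaky, retry_after_s=0.5)
```

In the real test the patch target must match the module path the interceptor imports sleep from, otherwise the assertion silently passes against an unused mock.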
Body:
What this does -
Adds built-in automatic retries for transient failures (connection resets, 503s, rate limits, etc.) to both REST and gRPC interfaces. It's off by default so existing behavior doesn't change unless you opt in.
Usage -
from qdrant_client import QdrantClient, RetryConfig

client = QdrantClient(
    url="http://localhost:6333",
    retry_config=RetryConfig(max_retries=3),
)

That's it: the client will now retry up to 3 times on transient errors with exponential backoff before giving up.
RetryConfig lets you tune initial_backoff, max_backoff, backoff_multiplier, jitter, and which HTTP status codes or gRPC codes count as retryable. The defaults should be sensible for most cases.
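A fuller configuration exercising those knobs might look like the following sketch (the retryable_status_codes keyword is an assumption about the exact signature; only the four backoff parameters are named above):

```python
from qdrant_client import QdrantClient, RetryConfig

client = QdrantClient(
    url="http://localhost:6333",
    retry_config=RetryConfig(
        max_retries=5,
        initial_backoff=0.5,       # first delay, in seconds
        max_backoff=30.0,          # cap on any single delay
        backoff_multiplier=2.0,    # exponential growth factor
        jitter=0.1,                # randomize delays to spread out retries
        # Assumed keyword; the PR only says status codes are configurable.
        retryable_status_codes={429, 502, 503, 504},
    ),
)
```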
How it works -
On the REST side, a retryable response triggers a backoff wait, then the client tries again. It also respects Retry-After headers.
On gRPC, only unary (single request/response) calls are retried; streaming calls pass through since you can't replay a consumed iterator.
Backoff is exponential with jitter to avoid a thundering herd.
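The streaming restriction comes down to iterator semantics: once a request iterator has been consumed by the first attempt, a retry has nothing left to send. A minimal illustration:

```python
def send_stream(request_iter):
    """Stand-in for a streaming call: consumes the request iterator."""
    return list(request_iter)

gen = iter([1, 2, 3])
first = send_stream(gen)   # consumes all three requests
replay = send_stream(gen)  # the iterator is already exhausted
```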
Why this is safe -
Most Qdrant writes (upsert, delete by ID, set payload) are idempotent, so retrying them won't cause duplicates or corruption. The feature is completely disabled by default, so nothing changes for existing users.
Changes I've done -