Question: Using a stretched etcd cluster for active–passive application failover across regions #21053
Unanswered
prateekkohli21 asked this question in Q&A
Replies: 0 comments
Overview
I’m designing an active–passive deployment across multiple geographic regions and would like to validate whether my approach using a single stretched etcd cluster is correct.
Architecture
Regions
The application runs only in two regions (active + passive).
The third region exists only to provide Raft quorum.
Goal
Leadership should move automatically
The passive site should become active
Proposed Approach
Acquire lease → become active
Continuously renew lease
If lease renewal fails → stop immediately
Another site may acquire the lease
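To make the steps above concrete, here is a minimal Python sketch of the acquire/renew/stop loop. It is a simulation only: `FakeLeaseStore`, `Site`, and `simulate` are hypothetical names, and the in-memory store stands in for an etcd lease rather than calling the etcd client API. The `margin` guard (stepping down a little before the TTL could expire) is an added assumption, not one of the steps listed above.

```python
class FakeLeaseStore:
    """In-memory stand-in for one lease-backed key (NOT the etcd API)."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0
        self.partitioned = set()  # sites that currently cannot reach the store

    def _check_reachable(self, site):
        if site in self.partitioned:
            raise ConnectionError(f"{site} cannot reach the store")

    def acquire(self, site, ttl, now):
        self._check_reachable(site)
        held_by_other = self.holder not in (None, site) and now < self.expires_at
        if held_by_other:
            return False
        self.holder, self.expires_at = site, now + ttl
        return True

    def renew(self, site, ttl, now):
        self._check_reachable(site)
        if self.holder != site or now >= self.expires_at:
            return False
        self.expires_at = now + ttl
        return True


class Site:
    """One application site following: acquire -> renew -> stop on failure."""

    def __init__(self, name, store, ttl, margin):
        self.name, self.store, self.ttl, self.margin = name, store, ttl, margin
        self.active = False
        self.local_deadline = 0.0

    def tick(self, now):
        # Assumed extra guard: step down on the local clock before the
        # store-side TTL could expire, even if no renewal was attempted.
        if self.active and now >= self.local_deadline:
            self.active = False
            return
        try:
            if self.active:
                ok = self.store.renew(self.name, self.ttl, now)
            else:
                ok = self.store.acquire(self.name, self.ttl, now)
        except ConnectionError:
            ok = False
        if ok:
            self.active = True
            # Deadline is measured from when the request was sent.
            self.local_deadline = now + self.ttl - self.margin
        else:
            self.active = False  # renewal failed: stop immediately


def simulate(ttl=10.0, margin=2.0, step=3.0, partition_at=9.0, end=21.0):
    """Run two sites against the fake store; region-a is cut off at partition_at."""
    store = FakeLeaseStore()
    a = Site("region-a", store, ttl, margin)
    b = Site("region-b", store, ttl, margin)
    history = []
    t = 0.0
    while t <= end:
        if t >= partition_at:
            store.partitioned.add("region-a")
        a.tick(t)
        b.tick(t)
        history.append((t, a.active, b.active))
        t += step
    return history


if __name__ == "__main__":
    for t, a_active, b_active in simulate():
        print(f"t={t:>4}: region-a active={a_active}, region-b active={b_active}")
```

In this run, region-a stops as soon as its renewal fails, and region-b can only take over after the old lease has expired in the store, so the two sites are never active at the same time. Failover is delayed by up to one TTL, which matches the goal of safety over speed.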
Assumptions
Questions
2. Are there known pitfalls or operational concerns with this design (e.g., WAN latency, quorum stability)?
3. Are there recommended best practices for:
Election timeouts
Lease TTLs
Network latency thresholds
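For reference, a rough sketch of how a starting point for the Raft timing values might be derived from a measured inter-region round-trip time. It assumes the rule of thumb from the etcd tuning guide as I understand it: a heartbeat interval around the RTT between members, an election timeout of at least 10 times the RTT, and an upper limit of 50000 ms on the election timeout. The function name `suggest_raft_timeouts` and the example RTT are hypothetical, and the sketch does not cover lease TTLs.

```python
MAX_ELECTION_TIMEOUT_MS = 50_000  # upper limit stated in the etcd tuning guide


def suggest_raft_timeouts(max_rtt_ms):
    """Return rough lower-bound values for etcd's Raft timing flags.

    Assumption: heartbeat ~= 1x the max RTT between members, and
    election timeout = 10x that RTT, capped at the documented maximum.
    """
    heartbeat_ms = max_rtt_ms
    election_ms = min(10 * max_rtt_ms, MAX_ELECTION_TIMEOUT_MS)
    return {
        "heartbeat_interval_ms": heartbeat_ms,
        "election_timeout_ms": election_ms,
    }


if __name__ == "__main__":
    values = suggest_raft_timeouts(150)  # hypothetical 150 ms cross-region RTT
    print(
        f"--heartbeat-interval={values['heartbeat_interval_ms']} "
        f"--election-timeout={values['election_timeout_ms']}"
    )
```

These are only minimums derived from latency; real values would still need to be validated against observed RTT variance between the three regions.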
Goal
To achieve strong safety guarantees (no split-brain) rather than fast failover, while keeping the system operationally simple.