Open
Conversation
Contributor
|
The current PR does not fix the root cause as described in the original issue. Please modify the technical solution. |
Contributor
|
Please revise your technical solution and reopen the PR. Thank you for your contribution. |
Author
|
sorry for late. Technical Solution: Fix ai-proxy-multi Health Check MechanismRoot CauseThe
SolutionI. Changes to
|
| State | Handling Method |
|---|---|
| Checker created | fetch_node_status to get status |
| Checker being created | get_target_status_by_path from SHM |
| No checker configured | Include by default |
2. Modified get_checkers_status_ver Function (Lines 223-232)
local function get_checkers_status_ver(checkers)
local status_ver_total = 0
for _, checker in pairs(checkers) do
-- Only count status_ver from real checkers (proxy checkers don't have this attribute)
if checker.status_ver then
status_ver_total = status_ver_total + checker.status_ver
end
end
return status_ver_total
endReason: Proxy checker doesn't have status_ver attribute, needs type check
3. Modified pick_target Function (Lines 356-359)
instances[i]._dns_value = instance._dns_value
instances[i]._nodes_ver = instance._nodes_ver
instances[i]._index = i - 1 -- Store index for later useKey points:
_dns_value: Syncs node info computed in request context to config object_index: Stores instance array index for building correct resource_path
Core Mechanism: Why _dns_value Can Be Correctly Accessed in Timer Context
Request Context:
resolve_endpoint(instance) → Sets instance._dns_value
↓
instances[i]._dns_value = instance._dns_value [Syncs to etcd copy]
↓
res_conf.value.plugins['ai-proxy-multi'].instances[i]._dns_value is set
Timer Context:
res_conf.value fetched from etcd
↓
jp.value(res_conf.value, json_path) extracts instance config
↓
upstream_constructor_config._dns_value exists ← because it was synced
↓
construct_upstream works correctly
Key Understanding:
- On first request,
_dns_valueis computed byresolve_endpointand set - Immediately afterward,
instances[i]._dns_value = instance._dns_valuewrites the value to the config object - When timer triggers, the config read from etcd already contains
_dns_value - The entire flow is "lazy initialization + state persistence", not recalculation each time
Summary
| Problem | Solution |
|---|---|
_dns_value missing in timer context |
Sync computed value to config object in request context, timer reads the synced value |
| Health checker lost after creation failure | Add SHM proxy, can read status even if checker not created |
| Caller blocks waiting for checker | Returns proxy checker immediately usable |
status_ver attribute missing |
Add type check to skip proxy checker |
Author
I don't have permission to reopen the PR. What should I do next? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[##] Description
hi, @Baoyuantop . Based on the 3.15.0-ubuntu Docker image, I have verified the changes in this PR and confirmed that the issue has been resolved.
Which issue(s) this PR fixes:
Fixes # #13101
Checklist