Handle socket timeouts as non error to prevent leds to turn off/on#141
Handle socket timeouts as non error to prevent leds to turn off/on#141marcoh242 wants to merge 5 commits intowebosbrew:mainfrom
Conversation
Treat socket timeouts (EAGAIN/EWOULDBLOCK) as a non-error in hyperion_read(), returning 0 instead of -1. This prevents unnecessary disconnects when no data is available yet. This fix was done by Claude 4.6
|
@marcoh242 Thanks for the PR! Unfortunately I remember that we had an issue where hyperion-webos needeed a full restart after a short network disconnect or longer idle period. I think this was the supposed fix to this problem. While your finding is still a bug, it might need a different approach. If you want to continue working on this I created a Claude planning prompt (/plan) for you with ChatGPT: |
|
I followed your suggestions and let Claude implemented the generated plan which resulted in the same initially error i got. Plan: Robust hyperion_read() for stream socketsTL;DRMake 1. Bug AnalysisMain bug (confirmed): Short reads (confirmed real problem): TCP EINTR (confirmed real problem): EAGAIN/EWOULDBLOCK (confirmed real problem, partially fixed): The current patch handles this but is incomplete — returning EOF handling (needs fix):
2. Proposed DesignIntroduce a persistent receive state struct and a helper Return values (unchanged contract):
New static state:enum rx_phase { RX_HEADER, RX_BODY };
static struct {
enum rx_phase phase;
uint8_t header[4];
uint32_t body_len;
uint32_t received; // bytes received so far in current phase
} rx;Helper _read_exact():Returns:
Handles EINTR internally by retrying. hyperion_read() logic:
Reset points:
3. RationaleWhy a state machine instead of just looping inside read()? Why not change service.c? Why not use non-blocking sockets + poll()? 4. Critical Assessment of Requirements
StepsPhase 1: hyperion_client.c changes
Phase 2: No changes to service.c
Phase 3: No changes to hyperion_client.h
2nd Iteration:
|
|
I think this looks better, but the 2nd iteratrion still has brought back the missing reconnect issue. |
1. Add `#include <time.h>` to system includes 2. Add `#define RX_STALE_SECS 120` after `recvBuff` 3. Add `static time_t rx_last_data;` after `rx` struct 4. Add `rx_last_data = time(NULL);` to `_rx_reset()` 5. Add `_check_stale()` helper after `_read_exact()`, before `hyperion_client()` 6. Modify `hyperion_read()`: - After `_read_exact()` returns > 0: `rx_last_data = time(NULL);` - Replace `return 0` at idle exits with `return _check_stale();`
|
Branch is updated. Here is the output: Plan: Robust hyperion_read() for stream socketsTL;DRMake 1. Bug AnalysisMain bug (confirmed): Short reads (confirmed real problem): TCP EINTR (confirmed real problem): EAGAIN/EWOULDBLOCK (confirmed real problem, partially fixed): The current patch handles this but is incomplete — returning EOF handling (needs fix):
2. Proposed DesignIntroduce a persistent receive state struct and a helper Return values (unchanged contract):
New static state:enum rx_phase { RX_HEADER, RX_BODY };
static struct {
enum rx_phase phase;
uint8_t header[4];
uint32_t body_len;
uint32_t received; // bytes received so far in current phase
} rx;Helper _read_exact():Returns:
Handles EINTR internally by retrying. hyperion_read() logic:
Reset points:
3. RationaleWhy a state machine instead of just looping inside read()? Why not change service.c? Why not use non-blocking sockets + poll()? 4. Critical Assessment of Requirements
Completed: State machine + partial-read handling (already implemented)The rx state machine, Remaining fix: stale-connection detection via time-based idle checkProblemCurrent code treats SolutionAdd Steps
No changes to:
Verification
|
|
@marcoh242 Nice, thank you! We might just need to lower the timeout value a bit. It looks good now for me, but maybe one of the experienced devs @sundermann @tuxuser @Informatic @mariotaku can have a quick eye on this |
|
@marcoh242 and a few linting fixes are needed |
fix linting
run clang-format
|
@TBSniller Should be fine now |
|
@codex review |
1 similar comment
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e92d0f6d4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@marcoh242 Can you set the timeout to like 5 or less seconds and test for your fix again? 120s are quite long and maybe not needed. In worst case this would mean the user needs to wait 120 seconds till his LEDs are working again in case of a short connection loss. If it still works for you we are done I guess |
|
Ok gonna test with 5 seconds and see what happens when i change WiFi Channel or restart hyperion |
|
I tested it a while and my issue with the leds turning on/off is gone but If I force a network disconnect by shortly unplugging the network I get the issue u mentioned at first. For me personally its still an improvement but I dont have enough knowledge to work on that further. |
Treat socket timeouts (EAGAIN/EWOULDBLOCK) as a non-error in hyperion_read(), returning 0 instead of -1. This prevents unnecessary disconnects when no data is available yet.
This fix was done by Claude 4.6
Instead of a timeout error where the leds go off shortly I only saw a small lag in Hyperion.NG live preview. Due to smoothing it wasnt even visible for me.
In this log you can see acquire timing gone up to 1026 when i saw a short lag. I guess this created a error before this fix.
Instead of this change a Option to increase the timeout limit should also work for users with this problem.