Refine fatal error configuration. #423

Open
windelbouwman wants to merge 1 commit into ros-industrial:melodic-devel from windelbouwman:error-handling

@windelbouwman

This refines the selection of fatal and non-fatal errors.

There was no way to select BUSOFF as non-fatal. My use case would be to select BUSOFF as non-fatal and configure the CAN device with restart-ms.
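For context on this use case: with `restart-ms` set (for example `ip link set can0 type can restart-ms 100`), the Linux SocketCAN driver restarts a bus-off controller on its own and then delivers an error frame with `CAN_ERR_RESTARTED` set. A minimal sketch of treating BUSOFF as a transient state instead of a fatal one, assuming the driver only needs to know whether the bus is currently off. `nextBusOffState` is a hypothetical helper, not part of ros_canopen; the bit values are copied from `<linux/can/error.h>` so the sketch compiles on any platform.

```cpp
#include <cstdint>

// Error-class bits copied from <linux/can/error.h>.
constexpr std::uint32_t kErrBusOff = 0x00000040U;     // CAN_ERR_BUSOFF
constexpr std::uint32_t kErrRestarted = 0x00000100U;  // CAN_ERR_RESTARTED

// Hypothetical state transition: given the previous bus-off state and the
// error class of a received error frame (can_id & CAN_ERR_MASK), return
// whether the controller is bus-off afterwards. BUSOFF is treated as a
// recoverable condition that ends when the kernel reports a restart.
constexpr bool nextBusOffState(bool bus_off, std::uint32_t error_class) {
    if (error_class & kErrRestarted) {
        return false;  // restart-ms elapsed and the controller is back
    }
    if (error_class & kErrBusOff) {
        return true;   // pause transmission, but keep the driver alive
    }
    return bus_off;    // other error classes leave the state unchanged
}
```

A driver built this way would feed it the error class of every error frame and hold back outgoing frames while it returns true, instead of calling `setNotReady()`.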

@mathias-luedtke
Member

> There was no way to select BUSOFF as non-fatal.

It was written this way on purpose ;)

> My use case would be to select BUSOFF as non-fatal and configure the CAN device with restart-ms

Does this really work with this fix?
(I did not have a chance to test it)

input_.is_error = 1;

-if (frame_.can_id & fatal_error_mask_) {
+if (frame_.can_id & error_mask_) {
Member

Not sure about this change.
This triggers the state callback on every error message and might be a little bit noisy in general.
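A rough illustration of this concern, assuming the diff above replaces the `fatal_error_mask_` check with the `error_mask_` check and assuming masks where arbitration loss and ACK errors are reported but only BUSOFF is fatal. The sketch counts how many frames in a made-up burst would enter the branch that guards the state callback under each condition; the mask values and the sample are illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Error-class bits as defined in <linux/can/error.h>.
constexpr std::uint32_t kErrLostArb = 0x00000002U;  // CAN_ERR_LOSTARB
constexpr std::uint32_t kErrAck = 0x00000020U;      // CAN_ERR_ACK
constexpr std::uint32_t kErrBusOff = 0x00000040U;   // CAN_ERR_BUSOFF

// Assumed configuration: these errors are reported, only BUSOFF is fatal.
constexpr std::uint32_t kErrorMask = kErrLostArb | kErrAck | kErrBusOff;
constexpr std::uint32_t kFatalErrorMask = kErrBusOff;

// Count the frames whose error-class bits intersect `condition_mask`,
// i.e. the frames that would reach setInternalError() under that check.
template <std::size_t N>
constexpr int countCallbacks(const std::array<std::uint32_t, N>& frames,
                             std::uint32_t condition_mask) {
    int count = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (frames[i] & condition_mask) {
            ++count;
        }
    }
    return count;
}

// A bursty but otherwise working bus: two arbitration losses,
// three ACK errors and a single bus-off.
constexpr std::array<std::uint32_t, 6> kSample = {
    kErrLostArb, kErrAck, kErrAck, kErrLostArb, kErrAck, kErrBusOff};
```

With these assumptions the `fatal_error_mask_` check lets one frame through, while the `error_mask_` check lets all six through, which is the extra callback traffic the reviewer is worried about.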

Author

My idea was that error_mask_ is used to test for errors which must be logged. The fatal_error_mask_ is intended for errors which must put the driver into not-ready mode (using setNotReady). Is this correct?

I like this idea: apart from the recovery strategy, users can configure which errors should be fatal. Another idea would be to simply log all errors and only allow configuring which errors are fatal.

Here, "fatal" means that the driver is put into not-ready mode.
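The split described in this comment can be sketched as a three-way decision per error frame. `ErrorAction` and `classify` are hypothetical names; only the roles of `error_mask_` (log) and `fatal_error_mask_` (set not ready) come from the comment. Checking the fatal mask first means a fatal error is still handled even if it is missing from the logging mask.

```cpp
#include <cstdint>

// Hypothetical outcome of handling one error frame under the proposed split.
enum class ErrorAction {
    Ignore,       // not in error_mask: dropped silently
    Log,          // in error_mask only: logged, driver stays ready
    SetNotReady,  // in fatal_error_mask: driver is put into not-ready mode
};

// `error_class` is can_id & CAN_ERR_MASK of a received error frame.
constexpr ErrorAction classify(std::uint32_t error_class,
                               std::uint32_t error_mask,
                               std::uint32_t fatal_error_mask) {
    if (error_class & fatal_error_mask) {
        return ErrorAction::SetNotReady;
    }
    if (error_class & error_mask) {
        return ErrorAction::Log;
    }
    return ErrorAction::Ignore;
}
```

The second idea, logging every error and configuring only the fatal set, corresponds to always passing all error-class bits (`CAN_ERR_MASK`, `0x1FFFFFFF`) as `error_mask`.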

Author

> This triggers the state callback on every error message and might be a little bit noisy in general

Errors should only happen every now and then (arbitration errors, ACK errors). If there are many errors on the bus, it's probably okay to log them? Another strategy could be to rate-limit the number of errors logged and summarize them, but this would be more work.
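The rate-limiting alternative could look roughly like this: a hypothetical `ErrorLogLimiter` (not part of ros_canopen) that lets at most N error messages through per time window and counts the rest, so the caller can emit one summary line instead. The caller passes in `std::chrono::steady_clock` time points, which keeps the sketch testable without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical limiter for error log messages.
class ErrorLogLimiter {
public:
    using Clock = std::chrono::steady_clock;

    ErrorLogLimiter(std::size_t max_per_window, Clock::duration window)
        : max_per_window_(max_per_window), window_(window) {
        assert(max_per_window > 0);
    }

    // Returns true if the error seen at `now` should be logged in full;
    // otherwise it is only counted as suppressed.
    bool allow(Clock::time_point now) {
        if (!started_ || now - window_start_ >= window_) {
            started_ = true;
            window_start_ = now;  // open a new window
            logged_in_window_ = 0;
        }
        if (logged_in_window_ < max_per_window_) {
            ++logged_in_window_;
            return true;
        }
        ++suppressed_;
        return false;
    }

    // Number of errors suppressed since the last call; the caller can log
    // this as a single summary line and the counter starts again at zero.
    std::size_t takeSuppressed() {
        const std::size_t n = suppressed_;
        suppressed_ = 0;
        return n;
    }

private:
    std::size_t max_per_window_;
    Clock::duration window_;
    Clock::time_point window_start_{};
    std::size_t logged_in_window_ = 0;
    std::size_t suppressed_ = 0;
    bool started_ = false;
};
```

For example, with `ErrorLogLimiter limiter(2, std::chrono::seconds(1))`, four errors within the same second produce two full log lines and a suppressed count of 2.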

Author

Btw, this change is important because, every now and then after an error, the driver hangs on "failed to send message".

Now that I think about it, in the current implementation all errors except BUS_OFF can be configured to be ignored, right? Setting the whole bunch of parameters to false will do the trick!
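The observation about the current implementation can be illustrated with a hypothetical helper that builds a mask from per-error boolean parameters. The `ErrorFlags` struct and its fields are invented for this sketch and only cover a few error classes; the `force_bus_off` switch models the behaviour described in this thread, where BUSOFF stays fatal no matter how the parameters are set. How the real driver combines its parameters is an assumption here.

```cpp
#include <cstdint>

// Error-class bits as defined in <linux/can/error.h>.
constexpr std::uint32_t kErrTxTimeout = 0x00000001U;  // CAN_ERR_TX_TIMEOUT
constexpr std::uint32_t kErrLostArb = 0x00000002U;    // CAN_ERR_LOSTARB
constexpr std::uint32_t kErrAck = 0x00000020U;        // CAN_ERR_ACK
constexpr std::uint32_t kErrBusOff = 0x00000040U;     // CAN_ERR_BUSOFF

// Hypothetical per-error switches standing in for the driver parameters.
struct ErrorFlags {
    bool tx_timeout;
    bool lost_arbitration;
    bool ack;
    bool bus_off;
};

// Build a mask from the switches. With `force_bus_off` set, BUSOFF is
// included regardless of flags.bus_off, so it can never be turned off.
constexpr std::uint32_t buildMask(ErrorFlags flags, bool force_bus_off) {
    std::uint32_t mask = 0;
    if (flags.tx_timeout) mask |= kErrTxTimeout;
    if (flags.lost_arbitration) mask |= kErrLostArb;
    if (flags.ack) mask |= kErrAck;
    if (flags.bus_off || force_bus_off) mask |= kErrBusOff;
    return mask;
}
```

Setting every flag to false yields a mask of just `kErrBusOff` when `force_bus_off` is true (the current behaviour) and an empty mask when it is false (what this PR asks for).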

Member

> My idea was that error_mask_ is used to test for errors which must be logged.

Every error_mask entry will be reported to the error frame callback!

We could add a logging_mask for this purpose, which could default to error_mask.
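The `logging_mask` suggestion amounts to an optional parameter whose fallback is the existing mask. A minimal sketch using C++17 `std::optional`; reading the value from the driver settings is left out, and `resolveLoggingMask` / `shouldLog` are hypothetical helpers. Only the name `logging_mask` comes from this comment.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical resolution of the proposed logging_mask parameter: when the
// user does not set it, log exactly what is reported via error_mask.
constexpr std::uint32_t resolveLoggingMask(std::optional<std::uint32_t> logging_mask,
                                           std::uint32_t error_mask) {
    return logging_mask.value_or(error_mask);
}

// Log an error frame only if its error-class bits intersect the logging mask.
constexpr bool shouldLog(std::uint32_t error_class, std::uint32_t logging_mask) {
    return (error_class & logging_mask) != 0;
}
```

Defaulting to `error_mask` keeps today's behaviour for users who never set the new parameter.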

Author

I'm not in favor of adding extra configuration. How about moving the logging inside the fatal error clause?

I guess the error frame callback is called via setInternalError?

@windelbouwman
Author

> It was written this way on purpose ;)

Why would the BUS_OFF error be a special case? I think it's fair to treat it like the other errors. What will happen with BUS_OFF is that the output queue will fill up, and eventually the asio send call will fail or block.
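A toy model of the failure mode described above: while the controller is bus-off nothing drains the transmit queue, so a bounded queue fills up and further sends fail. `TxQueue` is a hypothetical stand-in, not the asio socket or the kernel's queue; with a blocking send, the same situation would stall the caller instead of returning false.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical fixed-capacity transmit queue.
class TxQueue {
public:
    explicit TxQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Non-blocking enqueue: fails once the queue is full.
    bool tryPush(std::uint32_t can_id) {
        if (frames_.size() >= capacity_) {
            return false;
        }
        frames_.push_back(can_id);
        return true;
    }

    // Called when the controller has transmitted a frame. While the bus is
    // off this never happens, so the queue only grows.
    void onTransmitted() {
        if (!frames_.empty()) {
            frames_.pop_front();
        }
    }

    std::size_t size() const { return frames_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::uint32_t> frames_;
};
```

Once the controller recovers and frames are transmitted again, space frees up and `tryPush` succeeds, which is why an automatic restart could make BUSOFF survivable.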

@mathias-luedtke
Member

> What will happen with BUS_OFF is that the output queue will fill up, and eventually the asio send call will fail or block.

In our use case, the output queue would fill up immediately and then the kernel driver would close the socket.

> Why would the BUS_OFF error be a special case? I think it's fair to treat it like the other errors

Agreed.
Again, I did not test this special case.

2 participants