
Improve substrate-relay connection problem handling #3028

@bkontur

Description


Investigate/check

  • Do we restart loops correctly for all kinds of connection errors? E.g., when `RestartNeeded` occurs, does the loop stop or restart? Or is restarting substrate-relay the only solution?
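To make the first question concrete, here is a minimal, synchronous sketch of a supervised loop that treats `RestartNeeded` as "rebuild the client and continue" rather than "exit". `RelayError`, `LoopExit`, `run_supervised` and the reconnect limit are hypothetical names for illustration, not the relay's actual types; the real relay loops are async, so `thread::sleep` only stands in for an async backoff.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error type: only distinguishes a dropped connection
/// from any other failure, as seen in the logs.
#[derive(Debug, Clone, PartialEq)]
enum RelayError {
    /// Transport is gone; the client has to be rebuilt before retrying.
    RestartNeeded(String),
    /// Any other error; treated as fatal in this sketch.
    Other(String),
}

/// How the supervised loop ended.
#[derive(Debug, PartialEq)]
enum LoopExit {
    Finished,
    GaveUp { reconnects: u32 },
    Fatal(RelayError),
}

/// Runs `iteration` until it returns `Ok(false)`. On `RestartNeeded` it calls
/// `reconnect` and keeps looping instead of exiting, giving up after
/// `max_reconnects` consecutive failures. A successful iteration resets the count.
fn run_supervised<I, R>(
    mut iteration: I,
    mut reconnect: R,
    max_reconnects: u32,
    backoff: Duration,
) -> LoopExit
where
    I: FnMut() -> Result<bool, RelayError>,
    R: FnMut(),
{
    let mut reconnects = 0;
    loop {
        match iteration() {
            // Iteration succeeded, so the connection is healthy again.
            Ok(true) => reconnects = 0,
            Ok(false) => return LoopExit::Finished,
            Err(RelayError::RestartNeeded(_)) if reconnects < max_reconnects => {
                reconnects += 1;
                thread::sleep(backoff);
                reconnect();
            }
            Err(RelayError::RestartNeeded(_)) => return LoopExit::GaveUp { reconnects },
            Err(other) => return LoopExit::Fatal(other),
        }
    }
}
```

Resetting the counter after a successful iteration means only consecutive failures count toward giving up, so a relay that loses its connection occasionally keeps running instead of exiting.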

Possible improvement 1:

Currently we connect to exactly one node URI each for source and target, e.g.:

  --source-uri wss://rococo-rpc.polkadot.io \
  --target-uri wss://bridge-hub-westend-rpc.dwellir.com \

If that node is down or has some problem, we could instead configure a list of URIs, so that on `RestartNeeded` we rotate to another URI, e.g.:

  --source-uri wss://rococo-rpc.polkadot.io \
  --source-uri wss://rococo-xyz1-rpc.polkadot.io \
  --source-uri wss://rococo-xyz2-rpc.polkadot.io \
  --target-uri wss://bridge-hub-westend-rpc.dwellir.com \
  --target-uri wss://bridge-hub-westend-xyz2-rpc.dwellir.com \
  --target-uri wss://bridge-hub-westend-xyz2-rpc.luckyfriday.com \

So, if one node is overloaded, we just try another one.
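A minimal sketch of the rotation idea, assuming the repeated `--source-uri`/`--target-uri` values have already been collected into a list (repeatable flags are the proposal here, not an existing option). `UriRotation` and `reconnect` are hypothetical names; the `connect` closure stands in for whatever builds an RPC client for a single URI.

```rust
/// Hypothetical pool of endpoint URIs for one chain, e.g. collected from
/// repeated `--source-uri` flags (proposed, not an existing option).
#[derive(Debug)]
struct UriRotation {
    uris: Vec<String>,
    current: usize,
}

impl UriRotation {
    /// Returns None for an empty list: at least one URI is required.
    fn new(uris: Vec<String>) -> Option<Self> {
        if uris.is_empty() {
            None
        } else {
            Some(Self { uris, current: 0 })
        }
    }

    /// URI the client is currently using.
    fn current(&self) -> &str {
        &self.uris[self.current]
    }

    /// On `RestartNeeded`: try every URI once, starting with the next one
    /// (the failed URI is retried last), and keep the first that connects.
    /// Returns None if all of them fail.
    fn reconnect<C, E>(&mut self, mut connect: impl FnMut(&str) -> Result<C, E>) -> Option<C> {
        for _ in 0..self.uris.len() {
            self.current = (self.current + 1) % self.uris.len();
            if let Ok(client) = connect(&self.uris[self.current]) {
                return Some(client);
            }
        }
        None
    }
}
```

Plain round-robin keeps the logic trivial; it does not know whether the next node is overloaded, only that it accepts a connection.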

Possible improvement 2 - connect substrate-relay to some "load balancer"

This "load balancer" would route requests to a live, non-overloaded node, so we would not have to handle this in our code.
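For comparison, the routing rule such a balancer would apply fits in a few lines. `is_syncing` and `peers` mirror fields of the result of Substrate's `system_health` RPC, which the `FailedToGetSystemHealth` errors in the logs below suggest the relay already queries; `alive`, `in_flight` and the `max_in_flight` threshold are assumed metrics, and `Backend`/`pick_backend` are hypothetical names. In practice an off-the-shelf proxy would more likely be used than custom code; this only illustrates the selection criterion.

```rust
/// Snapshot of one backend node as a hypothetical balancer would track it.
/// `is_syncing` and `peers` mirror `system_health` fields; `alive` and
/// `in_flight` (open requests) are assumed metrics.
#[derive(Debug, Clone)]
struct Backend {
    uri: String,
    alive: bool,
    is_syncing: bool,
    peers: usize,
    in_flight: usize,
}

/// Routes to the least-loaded backend that is alive, not syncing, has peers,
/// and is below `max_in_flight`. Returns None if no node qualifies.
fn pick_backend(backends: &[Backend], max_in_flight: usize) -> Option<&str> {
    backends
        .iter()
        .filter(|b| b.alive && !b.is_syncing && b.peers > 0 && b.in_flight < max_in_flight)
        .min_by_key(|b| b.in_flight)
        .map(|b| b.uri.as_str())
}
```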

Some logs from 2024-07-12 to 2024-07-15

https://matrix.to/#/!FqmgUhjOliBGoncGwm:parity.io/$OjKXcX4aO9lkzM46fRLKXTMi-mf9vcpdJN_RDMgIn6o?via=parity.io

e.g.:

Polkadot client has failed to return its sync status: FailedToGetSystemHealth { chain: "Polkadot", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-15T08:05:06Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-15 08:05:06 +00 WARN bridge Failed to read best Polkadot block: ChannelError("Background task of BridgeHubKusama client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for BridgeHubKusama has finished\"))")
2024-07-15T03:17:36Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-15 03:17:36 +00 WARN bridge Failed to read head of Polkadot parachain ParaId(1002) at BridgeHubKusama: FailedToReadStorageValue { chain: "BridgeHubKusama", hash: "0x181d…2a58", key: StorageKey([243, 240, 56, 234, 7, 239, 168, 105, 144, 9, 71, 27, 60, 48, 159, 184, 100, 28, 243, 91, 238, 116, 177, 147, 83, 37, 172, 214, 89, 235, 25, 203, 127, 32, 114, 84, 61, 57, 196, 82, 229, 51, 84, 40, 99, 135, 86, 81, 234, 3, 0, 0]), error: RpcError(RestartNeeded(Transport(connection closed
2024-07-15T00:47:50Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-15 00:47:50 +00 WARN bridge Polkadot client has failed to return its sync status: FailedToGetSystemHealth { chain: "Polkadot", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-14T23:18:10Z {} 2024-07-14 23:18:10 +00 ERROR bridge [BridgeHubKusama-to-BridgeHubPolkadot-on-demand-parachain] Failed to read relay data from BridgeHubPolkadot client: ChannelError("Background task of BridgeHubPolkadot client has exited with result: Err(ChannelError(\"Finalized headers subscription for BridgeHubPolkadot has finished\"))")
2024-07-14T23:04:57Z {} [Polkadot_to_BridgeHubKusama_Sync] 2024-07-14 23:04:57 +00 INFO bridge Call of PolkadotFinalityApi_free_headers_interval at BridgeHubKusama has failed with an error: FailedStateCall { chain: "BridgeHubKusama", hash: "0x8551…5ec9", method: "PolkadotFinalityApi_free_headers_interval", arguments: Bytes([]), error: RpcError(Call(ErrorObject { code: ServerError(4003), message: "Client error: Execution failed: Other: Exported method PolkadotFinalityApi_free_headers_interval is not found", data: None })) }. Treating as `None`
2024-07-14T23:04:57Z {} [Polkadot_to_BridgeHubKusama_Sync] 2024-07-14 23:04:57 +00 ERROR bridge Finality sync loop iteration has failed with error: Target(FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-14T23:04:57Z {} 2024-07-14 23:04:57 +00 ERROR bridge [Polkadot-to-BridgeHubKusama-on-demand-headers] Failed to read best finalized source header from target: FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-14T20:27:45Z {} 2024-07-14 20:27:45 +00 WARN bridge [Polkadot-to-BridgeHubKusama-on-demand-headers] Failed to scan mandatory Polkadot headers range ((21644741, 21647633)): FailedToReadHeaderHashByNumber { chain: "Polkadot", number: "21647633", error: RpcError(RestartNeeded(Transport(i/o error: Connection reset by peer (os error 104)
2024-07-14T00:35:31Z {} [Kusama_to_BridgeHubPolkadot_Parachains_1002] 2024-07-14 00:35:31 +00 WARN bridge Kusama client has failed to return its sync status: FailedToGetSystemHealth { chain: "Kusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T22:50:53Z {} [BridgeHubPolkadot_to_BridgeHubKusama_MessageLane_00000001] 2024-07-12 22:50:53 +00 ERROR bridge Error retrieving state from BridgeHubKusama node: FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T22:50:53Z {} [BridgeHubKusama_to_BridgeHubPolkadot_MessageLane_00000001] 2024-07-12 22:50:53 +00 ERROR bridge Error retrieving state from BridgeHubPolkadot node: FailedToGetSystemHealth { chain: "BridgeHubKusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T22:42:56Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-12 22:42:56 +00 WARN bridge Polkadot client has failed to return its sync status: FailedToGetSystemHealth { chain: "Polkadot", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T21:03:49Z {} [Kusama_to_BridgeHubPolkadot_Parachains_1002] 2024-07-12 21:03:49 +00 WARN bridge Kusama client has failed to return its sync status: FailedToGetSystemHealth { chain: "Kusama", error: RpcError(RestartNeeded(Transport(connection closed
2024-07-12T20:37:38Z {} [Kusama_to_BridgeHubPolkadot_Parachains_1002] 2024-07-12 20:37:38 +00 WARN bridge Failed to read best Kusama block: ChannelError("Background task of BridgeHubPolkadot client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for BridgeHubPolkadot has finished\"))")
2024-07-12T20:13:04Z {} [Polkadot_to_BridgeHubKusama_Parachains_1002] 2024-07-12 20:13:04 +00 WARN bridge Failed to read best Polkadot block: ChannelError("Background task of BridgeHubKusama client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for BridgeHubKusama has finished\"))")
2024-07-12T19:58:39Z {} 2024-07-12 19:58:39 +00 ERROR bridge [Polkadot-to-BridgeHubKusama-on-demand-headers] Failed to read best finalized source header from source: ChannelError("Background task of Polkadot client has exited with result: Err(ChannelError(\"Mandatory best headers subscription for Polkadot has finished\"))")
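The logs show two kinds of connection-level failure (`RestartNeeded(Transport(...))`, and a client background task exiting because a subscription "has finished") next to a request-level error (`ServerError(4003)`, a missing runtime API that the relay already treats as `None`). A minimal sketch of how these could be triaged; `ClientError`, `Action` and `classify` are hypothetical and much simpler than the relay's real error types.

```rust
/// Hypothetical, simplified model of the error shapes visible in the logs.
#[derive(Debug)]
enum ClientError {
    /// e.g. `RpcError(RestartNeeded(Transport(connection closed ...)))`
    RestartNeeded(String),
    /// e.g. `ChannelError("... subscription for BridgeHubKusama has finished")`
    BackgroundTaskExited(String),
    /// e.g. `RpcError(Call(ErrorObject { code: ServerError(4003), .. }))`
    Call { code: i32, message: String },
}

#[derive(Debug, PartialEq)]
enum Action {
    /// Connection-level failure: rebuild the client (possibly on another URI).
    Reconnect,
    /// Request-level failure: the connection works, handle the error locally.
    HandleLocally,
}

fn classify(err: &ClientError) -> Action {
    match err {
        // In this sketch both are assumed to mean the connection or its
        // subscriptions are gone, so retrying on the same client won't help.
        ClientError::RestartNeeded(_) | ClientError::BackgroundTaskExited(_) => Action::Reconnect,
        // The node did answer; e.g. a missing runtime API is not a transport issue.
        ClientError::Call { .. } => Action::HandleLocally,
    }
}
```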
