netvsc: high rx_comp_busy and tx_send_full + traffic loss #42

@anagesh-a10

Hi Team,
One of our customers is using our solution, a custom image based on Linux kernel 4.14.51 and CentOS 7.8. The customer is seeing random traffic loss in production on netvsc interfaces (non-accelerated). P.S. The deployment size is ~600 VM instances.

  1. On problematic instances, 'rx_comp_busy' is always non-zero (=1) and 'tx_send_full' is high, as shown below:
    -bash-4.2# ethtool -S eth2
    NIC statistics:
    tx_scattered: 0
    tx_no_memory: 0
    tx_no_space: 0
    tx_too_big: 0
    tx_busy: 0
    tx_send_full: 75776 <<<<<<<<<<<<<
    rx_comp_busy: 1 <<<<<<<<<<<<<<<<
    vf_rx_packets: 0
    vf_rx_bytes: 0
    vf_tx_packets: 0
    vf_tx_bytes: 0
    vf_tx_dropped: 0
    tx_queue_0_packets: 48323650
    tx_queue_0_bytes: 9856533412
    rx_queue_0_packets: 70704892
    rx_queue_0_bytes: 6523868834
    tx_queue_1_packets: 44242587
    tx_queue_1_bytes: 9561505139
    rx_queue_1_packets: 67683390
    rx_queue_1_bytes: 6248204528
    tx_queue_2_packets: 45780035
    tx_queue_2_bytes: 10119440310
    rx_queue_2_packets: 69738233
    rx_queue_2_bytes: 6443619208
    tx_queue_3_packets: 44413637
    tx_queue_3_bytes: 9640385380
    rx_queue_3_packets: 69258427
    rx_queue_3_bytes: 6396199857
    tx_queue_4_packets: 96161043
    tx_queue_4_bytes: 43152567515
    rx_queue_4_packets: 68506662
    rx_queue_4_bytes: 6329763902
    tx_queue_5_packets: 42685859
    tx_queue_5_bytes: 9232930840
    rx_queue_5_packets: 68869195
    rx_queue_5_bytes: 6360734718
    tx_queue_6_packets: 44105935
    tx_queue_6_bytes: 9641517238
    rx_queue_6_packets: 71297219
    rx_queue_6_bytes: 6568436535
    tx_queue_7_packets: 44680296
    tx_queue_7_bytes: 9764630663
    rx_queue_7_packets: 70747471
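With ~600 instances, it may help to pull the two suspicious counters out of `ethtool -S` output programmatically instead of reading each VM by eye. A minimal sketch, assuming Python 3 is available on the guest; the helper names (`parse_ethtool_stats`, `read_stats`, `flag_suspect`) are hypothetical, and only the counter names and sample values come from the output above.

```python
import re
import subprocess

# Counters highlighted in this report, as named by `ethtool -S` on netvsc.
WATCHED = ("rx_comp_busy", "tx_send_full")

# Matches "name: value" lines; trailing annotations such as "<<<<" are ignored.
STAT_LINE = re.compile(r"\s*([A-Za-z0-9_]+):\s*(\d+)")


def parse_ethtool_stats(text):
    """Turn `ethtool -S <iface>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def read_stats(iface):
    """Run `ethtool -S` on the given interface (e.g. "eth2") and parse it."""
    out = subprocess.check_output(["ethtool", "-S", iface], universal_newlines=True)
    return parse_ethtool_stats(out)


def flag_suspect(stats):
    """Return only the watched counters that are non-zero."""
    return {name: stats[name] for name in WATCHED if stats.get(name, 0) > 0}


if __name__ == "__main__":
    # Excerpt of the first eth2 output above.
    sample = """NIC statistics:
    tx_busy: 0
    tx_send_full: 75776
    rx_comp_busy: 1
"""
    print(flag_suspect(parse_ethtool_stats(sample)))
    # prints {'rx_comp_busy': 1, 'tx_send_full': 75776}
```

On a live VM, `flag_suspect(read_stats("eth2"))` would return an empty dict when both counters are still zero.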

  2. We rebuilt the kernel with the two patches below, because this symptom (NAPI gets disabled when the ring is temporarily busy) is similar to the issue reported in #36:

     - hv_netvsc: Fix napi reschedule while receive completion is busy
     - hv_netvsc: fix race that may miss tx queue wakeup

  3. With these patches there is some improvement, in the sense that fewer instances hit this problem, but the issue still persists (~5 out of ~200 instances). On these bad instances, the ethtool stats show very high 'rx_comp_busy' and 'tx_send_full' values, as shown below. I think a very high 'rx_comp_busy' is expected after these patches.

    -bash-4.2# ethtool -S eth2
    NIC statistics:
    tx_scattered: 0
    tx_no_memory: 0
    tx_no_space: 0
    tx_too_big: 0
    tx_busy: 0
    tx_send_full: 417979 <<<<<<<<<<<<<
    rx_comp_busy: 36978379935 <<<<<<<<<<<<< increasing rapidly
    vf_rx_packets: 0
    vf_rx_bytes: 0
    vf_tx_packets: 0
    vf_tx_bytes: 0
    vf_tx_dropped: 0
    tx_queue_0_packets: 22487545
    tx_queue_0_bytes: 4594218563
    rx_queue_0_packets: 33816104
    rx_queue_0_bytes: 3148800004
    tx_queue_1_packets: 23095847
    tx_queue_1_bytes: 4629433827
    rx_queue_1_packets: 34169457
    rx_queue_1_bytes: 3198473995
    tx_queue_2_packets: 22235899
    tx_queue_2_bytes: 4554101089
    rx_queue_2_packets: 35447873
    rx_queue_2_bytes: 3306351633
    tx_queue_3_packets: 22655564
    tx_queue_3_bytes: 4658776077
    rx_queue_3_packets: 34320559
    rx_queue_3_bytes: 3200636386
    tx_queue_4_packets: 43152346
    tx_queue_4_bytes: 17461777045
    rx_queue_4_packets: 34941411
    rx_queue_4_bytes: 3240195702
    tx_queue_5_packets: 22992696
    tx_queue_5_bytes: 4613837166
    rx_queue_5_packets: 32975505
    rx_queue_5_bytes: 3079512739
    tx_queue_6_packets: 22535083
    tx_queue_6_bytes: 4672503110
    rx_queue_6_packets: 33796904
    rx_queue_6_bytes: 3159691807
    tx_queue_7_packets: 22452840
    tx_queue_7_bytes: 4584966389
    rx_queue_7_packets: 33860772
    rx_queue_7_bytes: 3155304090
    rx_queue_7_bytes: 6525418289
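Since the concern on the remaining bad instances is how fast 'rx_comp_busy' keeps climbing, a per-second rate over a fixed interval may say more than a single snapshot. A minimal sketch, assuming Python 3 on the guest; `counter_rates` and `sample_rates` are hypothetical helpers, and `read_stats` stands for any function that returns the `ethtool -S` counters of an interface as a dict. The snapshot values in the example are made up for illustration, not measured.

```python
import time


def counter_rates(before, after, interval_s):
    """Per-second increase of each counter present in both snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (after[name] - before[name]) / interval_s
        for name in after
        if name in before
    }


def sample_rates(read_stats, iface, names, interval_s=5.0):
    """Take two snapshots interval_s apart and return rates for the given counters.

    read_stats is any callable returning a {counter_name: int} dict for iface,
    e.g. a wrapper around `ethtool -S`.
    """
    before = read_stats(iface)
    time.sleep(interval_s)
    after = read_stats(iface)
    rates = counter_rates(before, after, interval_s)
    # Counters missing from either snapshot are reported as 0.0.
    return {name: rates.get(name, 0.0) for name in names}


if __name__ == "__main__":
    # Hypothetical snapshots, for illustration only (not measured data).
    before = {"rx_comp_busy": 1000, "tx_send_full": 50}
    after = {"rx_comp_busy": 6000, "tx_send_full": 50}
    print(counter_rates(before, after, 5.0))
    # prints {'rx_comp_busy': 1000.0, 'tx_send_full': 0.0}
```

Comparing these rates between a healthy and an affected instance could show whether a counter is merely non-zero or actively climbing.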

I would like to ask the Azure team to provide a list of patches we can try with the 4.14.51 kernel, as the LIS option is not applicable to us.
Please let me know if I can provide any additional details.
