An Introduction to mlx RDMA NIC Counters and Parameters
Author: unknown (reposted from the web)
Overview
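The mlx5 driver exposes a set of mlx NIC parameters and counters in Linux sysfs. They live under two directories: /sys/class/infiniband/mlx5_x/ports/1/counters and /sys/class/infiniband/mlx5_x/ports/1/hw_counters. Each counter records how many times a particular kind of event has occurred, such as a specific type of error or a received packet. Understanding these counters gives a clearer picture of the NIC's running state. Monitoring them makes it faster to locate the root cause of RDMA errors.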
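Monitoring starts with taking a snapshot of a counters directory. The sketch below shows one way to do that in Python (3.9+). It assumes that every regular file in the directory holds a single integer, which is how the counter files above are laid out. The function name `read_counters` and the device name `mlx5_0` are hypothetical; substitute the `mlx5_x` device present on your host.

```python
from pathlib import Path


def read_counters(directory: str) -> dict[str, int]:
    """Read every counter file in a sysfs counters directory into a dict.

    Assumes each regular file contains one integer. Files that cannot be
    read or parsed are skipped instead of aborting the whole snapshot.
    """
    counters: dict[str, int] = {}
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or non-numeric file: leave it out of the snapshot.
            continue
    return counters


if __name__ == "__main__":
    # Hypothetical device name; replace mlx5_0 with the device on your host.
    base = Path("/sys/class/infiniband/mlx5_0/ports/1")
    for sub in ("counters", "hw_counters"):
        if (base / sub).is_dir():
            for name, value in read_counters(str(base / sub)).items():
                print(f"{sub}/{name} = {value}")
```

Counters only ever accumulate, so a single snapshot says little. Comparing two snapshots taken some seconds apart shows what is happening right now.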
hw_counters
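- rnr_nak_retry_err: With this host as the sender, the number of RNR NAK (Receiver Not Ready) packets received from the peer. This counter increases when the receiving QP's SRQ has no free receive buffers left.
- out_of_buffer: With this host as the receiver, the number of times a packet arrived but no receive buffer was available. This counter increases when the local QP's SRQ has run out of posted receive buffers.
- out_of_sequence: The number of out-of-order packets received.
- local_ack_timeout_err: The number of RDMA requests sent by this host that timed out waiting for an ACK.
- packet_seq_err: The number of NAK packets reporting a packet sequence error received by this host.
- req_cqe_error: The number of CQEs (completion queue entries) completed with an error on this host as the requester.
- duplicate_request: The number of duplicate request packets received by this host.
- np_ecn_marked_roce_packets: The number of ECN-marked RoCE packets received by this host.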
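To turn the hw_counters above into an alert, compare two snapshots and report which of them grew. The Python sketch below does this. It assumes the snapshots are plain name-to-value dicts, for example as read from the hw_counters directory. Watching exactly these eight counters is an illustrative assumption, and `counter_deltas` and `WATCHED_HW_COUNTERS` are hypothetical names.

```python
# The hw_counters described in the list above. Which ones to watch is an
# assumption; adjust the tuple to your own monitoring needs.
WATCHED_HW_COUNTERS = (
    "rnr_nak_retry_err",
    "out_of_buffer",
    "out_of_sequence",
    "local_ack_timeout_err",
    "packet_seq_err",
    "req_cqe_error",
    "duplicate_request",
    "np_ecn_marked_roce_packets",
)


def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return how much each watched counter grew between two snapshots.

    Counters missing from either snapshot are skipped, and only counters
    that actually increased are included in the result.
    """
    deltas: dict[str, int] = {}
    for name in WATCHED_HW_COUNTERS:
        if name in before and name in after:
            diff = after[name] - before[name]
            if diff > 0:
                deltas[name] = diff
    return deltas


if __name__ == "__main__":
    before = {"out_of_buffer": 10, "local_ack_timeout_err": 0, "port_rcv_packets": 5}
    after = {"out_of_buffer": 25, "local_ack_timeout_err": 0, "port_rcv_packets": 9}
    print(counter_deltas(before, after))  # {'out_of_buffer': 15}
```

A growing out_of_buffer on the receiver, seen together with a growing rnr_nak_retry_err on the sender, matches the "receiver ran out of buffers" situation described above.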
counters
- port_rcv_data: Total number of data octets, divided by 4 (lanes), received on all VLs. This is a 64-bit counter.
- port_rcv_packets: Total number of packets received (this may include packets containing errors). This is a 64-bit counter.
- port_xmit_data: Total number of data octets, divided by 4 (lanes), transmitted on all VLs. This is a 64-bit counter.
- port_xmit_packets: Total number of packets transmitted on all VLs from this port. This may include packets with errors.
- unicast_rcv_packets: Total number of unicast packets received, including unicast packets containing errors.
- unicast_xmit_packets: Total number of unicast packets transmitted on all VLs from the port. This may include unicast packets with errors.
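Because port_xmit_data and port_rcv_data count octets divided by 4, you multiply a reading by 4 to recover bytes. Two readings taken a known interval apart then give a data rate. The Python sketch below makes that arithmetic explicit. The names `counter_diff` and `data_rate_gbps` are hypothetical. The modulo wraparound handling is a defensive assumption, since a 64-bit counter is very unlikely to wrap in practice.

```python
COUNTER_BITS = 64    # the descriptions above call these 64-bit counters
DATA_UNIT_BYTES = 4  # port_*_data counts data octets divided by 4


def counter_diff(before: int, after: int, bits: int = COUNTER_BITS) -> int:
    """Difference between two readings, tolerating a single wraparound."""
    return (after - before) % (1 << bits)


def data_rate_gbps(before: int, after: int, interval_s: float) -> float:
    """Convert two port_xmit_data (or port_rcv_data) readings to Gbit/s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    byte_count = counter_diff(before, after) * DATA_UNIT_BYTES
    return byte_count * 8 / interval_s / 1e9


if __name__ == "__main__":
    # Two hypothetical port_xmit_data readings taken 1 second apart.
    print(data_rate_gbps(1_000, 250_001_000, 1.0))  # 8.0
```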
Source: https://blog.csdn.net/zxpoiu/article/details/115792911