UDP Server Performance Optimization: Comparing Perf and GCP
Author: Internet
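An RTC server runs over UDP, which poses several challenges:
- There are a great many UDP packets, and most of them are small. For example, a single video keyframe may be split into dozens of UDP packets, and each Opus packet ranges from a few dozen to a little over a hundred bytes.
- Different protocols have to multiplex one port (a requirement for cloud-native platforms such as K8S), so every packet must be matched to its Session, and the client address may change along the way.
- Real-time requirements are strict. Each Session must send and receive data immediately and cannot deliberately accumulate packets first. Within a short window a Session handles only one or two packets, so there is little to process in batches.
- The kernel optimizes UDP less thoroughly than TCP, and fewer optimization techniques are available for it.
- Packets must be encrypted and decrypted, which costs CPU and also causes extra memory copies.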
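The port-multiplexing point above can be sketched as follows. This is a minimal illustration, not SRS's implementation: the first-byte ranges follow the demultiplexing scheme of RFC 7983 and RFC 5761, while `classify`, `Session`, and `SessionTable` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (ip, port) of the remote peer


def classify(packet: bytes) -> str:
    """Guess which protocol a datagram on the shared UDP port belongs to."""
    if not packet:
        return "unknown"
    first = packet[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        # RTP and RTCP share the version bits; RTCP packet types occupy
        # 192..223 in the second byte (RFC 5761).
        if len(packet) >= 2 and 192 <= packet[1] <= 223:
            return "rtcp"
        return "rtp"
    return "unknown"


@dataclass
class Session:
    """Hypothetical per-client state; here it only counts packets by kind."""
    name: str
    counts: Dict[str, int] = field(default_factory=dict)

    def on_packet(self, kind: str) -> None:
        self.counts[kind] = self.counts.get(kind, 0) + 1


class SessionTable:
    """Maps a remote address to its Session, one lookup per packet."""

    def __init__(self) -> None:
        self._by_addr: Dict[Addr, Session] = {}

    def bind(self, addr: Addr, session: Session) -> None:
        # Called again with a new address when the client address changes.
        self._by_addr[addr] = session

    def dispatch(self, addr: Addr, packet: bytes) -> Optional[Session]:
        session = self._by_addr.get(addr)
        if session is not None:
            session.on_packet(classify(packet))
        return session
```

When `dispatch` returns `None` the address is unknown. A real server would typically re-identify the Session some other way (for instance from the username in a STUN binding request) and then call `bind` with the new address.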
Even so, there is still a lot that can be done. See the entries below for details:
- v4.0, 2021-02-28, RTC: Support high performance Zero Copy NACK. 4.0.76
- v4.0, 2021-02-27, RTC: Support Object Cache Pool for performance. 4.0.75
- v4.0, 2021-02-12, RTC: Support High Resolution(about 25ms) Timer. 4.0.72
- v4.0, 2021-02-10, RTC: Improve performance about 700+ streams. 4.0.71
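The idea behind an object cache pool, as named in the 4.0.75 entry, can be illustrated roughly like this. It is a generic sketch of the technique under assumed names (`PacketPool`, `allocate`, `recycle`, a 1500-byte buffer) and does not reproduce SRS's actual cache manager.

```python
class PacketPool:
    """Keeps released buffers for reuse instead of freeing them."""

    def __init__(self, capacity: int, buffer_size: int = 1500) -> None:
        self._free = []            # buffers waiting to be reused
        self._capacity = capacity  # upper bound on cached buffers
        self._buffer_size = buffer_size
        self.allocated = 0         # buffers created from scratch
        self.reused = 0            # buffers served from the cache

    def allocate(self) -> bytearray:
        if self._free:
            self.reused += 1
            return self._free.pop()
        self.allocated += 1
        return bytearray(self._buffer_size)

    def recycle(self, buf: bytearray) -> None:
        # Drop the buffer when the cache is full so memory stays bounded.
        if len(self._free) < self._capacity:
            self._free.append(buf)
```

With many small packets per second, reusing buffers this way trades a bounded amount of memory for fewer allocations and frees on the hot path.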
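During the optimization work, the most important tools were the load-testing tool srs-bench and the two profilers, Perf and GCP (the gperftools CPU profiler).
The Perf and GCP results turned out to differ somewhat. For example, at roughly 67% CPU usage: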
top - 14:58:57 up 25 days, 1:58, 4 users, load average: 0.66, 0.76, 0.73
Tasks: 92 total, 2 running, 90 sleeping, 0 stopped, 0 zombie
%Cpu(s): 30.1 us, 5.1 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.0 hi, 3.1 si, 0.0 st
KiB Mem : 8008964 total, 460028 free, 1390824 used, 6158112 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6311680 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8375 root 0 -20 1120556 992436 4192 R 68.1 12.4 24:14.17 srs
8462 root 20 0 312104 36364 3800 S 1.0 0.5 0:25.25 perf
6745 root 20 0 150332 6664 2380 S 0.7 0.1 0:15.11 dstat
6 root 20 0 0 0 0 S 0.3 0.0 49:03.07 ksoftirqd/0
SRS statistics:
Hybrid cpu=70.00%,969MB, cid=47984,8, timer=24421,4394,19973, clock=0,45,4,0,0,0,0,0,0,
objs=(pkt:0,raw:0,fua:0,msg:0,oth:401,buf:0,drop:0),
cache=(pkt:20-31w,raw:109113-69w,fua:32227-41w,msg:1-41w,buf:19-34w)
RTC: Server conns=401, rpkts=(47734,rtp:47726,stun:1,rtcp:7),
spkts=(1710,rtp:117,stun:1,rtcp:1592), rtcp=(pli:0,twcc:3982,rr:398),
snk=(39826,a:19913,v:19913,h:0), rnk=(2,2,h:2,m:0),
fid=(id:0,fid:5272,ffid:42461,addr:1,faddr:47734)
For comparison, the top 37 functions reported by Perf:
Overhead Shared Object Symbol
10.13% srs.4.0.77 [.] sha1_block_data_order_avx2
4.37% srs.4.0.77 [.] bitvector_left_shift
2.96% libpthread-2.17.so [.] __recvfrom_nocancel
2.51% libc-2.17.so [.] __memcpy_ssse3
2.51% srs.4.0.77 [.] heap_delete
2.49% srs.4.0.77 [.] SrsHourGlass::cycle
2.39% srs.4.0.77 [.] SrsRtpPacket2::decode
2.19% srs.4.0.77 [.] SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle
2.16% srs.4.0.77 [.] SrsRtpPacket2::recycle_shared_buffer
1.79% [kernel] [k] finish_task_switch
1.71% srs.4.0.77 [.] SrsRtcPublishStream::on_rtp
1.56% [kernel] [k] system_call_after_swapgs
1.56% [kernel] [k] free_hot_cold_page
1.52% srs.4.0.77 [.] srtp_get_stream
1.47% [kernel] [k] copy_user_enhanced_fast_string
1.39% srs.4.0.77 [.] aesni_ctr32_encrypt_blocks
1.33% srs.4.0.77 [.] operator delete[]
1.32% [kernel] [k] _raw_spin_unlock_irqrestore
1.19% srs.4.0.77 [.] SrsRtcRecvTrack::do_check_send_nacks
0.99% srs.4.0.77 [.] OPENSSL_cleanse
0.94% srs.4.0.77 [.] SrsRtpRingBuffer::set
0.93% srs.4.0.77 [.] std::less<unsigned int>::operator()
0.89% srs.4.0.77 [.] srtp_unprotect
0.88% srs.4.0.77 [.] heap_insert
0.85% srs.4.0.77 [.] SrsRtcPublishStream::check_send_nacks
0.85% srs.4.0.77 [.] SrsRtpNackForReceiver::get_nack_seqs
0.83% srs.4.0.77 [.] SrsRtcPublishStream::get_audio_track
0.81% srs.4.0.77 [.] SrsRtcTrackDescription::has_ssrc
0.72% srs.4.0.77 [.] SrsResourceManager::find_by_fast_id
0.69% srs.4.0.77 [.] SrsSharedPtrMessage::count
0.68% srs.4.0.77 [.] EVP_MD_CTX_cleanup
0.67% srs.4.0.77 [.] SrsRtcPublishStream::do_on_rtp_plaintext
0.64% srs.4.0.77 [.] SrsBuffer::require
0.63% libc-2.17.so [.] epoll_ctl
0.61% [kernel] [k] udp_recvmsg
0.60% srs.4.0.77 [.] operator new[]
0.58% srs.4.0.77 [.] SrsUdpMuxListener::cycle
And the top 37 functions reported by GCP:
[root@iZbp12af7ajnkuducj2u8rZ ~]# ./objs/pprof objs/srs gperf.srs.gcp
(pprof) top37
Total: 17795 samples
2397 13.5% 13.5% 2397 13.5% __recvfrom_nocancel
1894 10.6% 24.1% 1894 10.6% sha1_block_data_order_avx2
746 4.2% 28.3% 746 4.2% bitvector_left_shift
501 2.8% 31.1% 511 2.9% heap_delete
485 2.7% 33.8% 2315 13.0% SrsHourGlass::cycle
440 2.5% 36.3% 440 2.5% __GI_epoll_wait
429 2.4% 38.7% 1136 6.4% SrsRtpObjectCacheManager::recycle
424 2.4% 41.1% 424 2.4% __memcpy_ssse3
417 2.3% 43.5% 516 2.9% SrsRtpPacket2::recycle_shared_buffer
373 2.1% 45.6% 1146 6.4% SrsRtpPacket2::decode
321 1.8% 47.4% 321 1.8% __GI_epoll_ctl
287 1.6% 49.0% 4914 27.6% SrsRtcPublishStream::on_rtp
270 1.5% 50.5% 270 1.5% aesni_ctr32_encrypt_blocks
245 1.4% 51.9% 698 3.9% SrsRtcRecvTrack::do_check_send_nacks
218 1.2% 53.1% 218 1.2% srtp_get_stream
200 1.1% 54.2% 1338 7.5% SrsRtpRingBuffer::set
199 1.1% 55.3% 199 1.1% std::less::operator
185 1.0% 56.4% 923 5.2% SrsRtcPublishStream::check_send_nacks
180 1.0% 57.4% 180 1.0% heap_insert
179 1.0% 58.4% 206 1.2% SrsRtpNackForReceiver::get_nack_seqs
175 1.0% 59.4% 175 1.0% __sendto_nocancel
150 0.8% 60.2% 237 1.3% SrsResourceManager::find_by_fast_id
149 0.8% 61.1% 149 0.8% OPENSSL_cleanse
143 0.8% 61.9% 143 0.8% srtp_unprotect
141 0.8% 62.6% 141 0.8% std::vector::size
130 0.7% 63.4% 130 0.7% EVP_MD_CTX_cleanup
127 0.7% 64.1% 264 1.5% SrsRtcPublishStream::get_audio_track
118 0.7% 64.8% 118 0.7% SrsFastCoroutine::pull
118 0.7% 65.4% 118 0.7% SrsRtcTrackDescription::has_ssrc
114 0.6% 66.1% 114 0.6% SrsBuffer::require
113 0.6% 66.7% 3272 18.4% SrsRtcPublishStream::do_on_rtp_plaintext
110 0.6% 67.3% 377 2.1% SrsRtpObjectCacheManager::allocate
106 0.6% 67.9% 8985 50.5% SrsUdpMuxListener::cycle
96 0.5% 68.4% 634 3.6% _st_vp_check_clock
94 0.5% 69.0% 1151 6.5% SrsRtcConnection::notify
84 0.5% 69.4% 84 0.5% PackedCache::KeyMatch (inline)
84 0.5% 69.9% 84 0.5% std::_Rb_tree::_M_begin
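To make the gap concrete, the sketch below lines up the self-time percentages of a few functions that appear in both listings. The numbers are copied from the two outputs above; the selection of functions is arbitrary. One plausible reading, which is an assumption and not something the original states, is that Perf reports kernel time under separate `[kernel]` symbols, whereas GCP samples user-space stacks and therefore charges time spent inside a system call to the wrapper such as `__recvfrom_nocancel`.

```python
# Self-time percentages copied from the Perf listing.
perf = {
    "__recvfrom_nocancel": 2.96,
    "sha1_block_data_order_avx2": 10.13,
    "bitvector_left_shift": 4.37,
    "heap_delete": 2.51,
    "__memcpy_ssse3": 2.51,
    "srtp_get_stream": 1.52,
    "aesni_ctr32_encrypt_blocks": 1.39,
}

# Self-time percentages (second column) copied from the GCP listing.
gcp = {
    "__recvfrom_nocancel": 13.5,
    "sha1_block_data_order_avx2": 10.6,
    "bitvector_left_shift": 4.2,
    "heap_delete": 2.8,
    "__memcpy_ssse3": 2.4,
    "srtp_get_stream": 1.2,
    "aesni_ctr32_encrypt_blocks": 1.5,
}

# [kernel] symbols that appear in Perf's top 37.
perf_kernel = {
    "finish_task_switch": 1.79,
    "system_call_after_swapgs": 1.56,
    "free_hot_cold_page": 1.56,
    "copy_user_enhanced_fast_string": 1.47,
    "_raw_spin_unlock_irqrestore": 1.32,
    "udp_recvmsg": 0.61,
}

# GCP minus Perf, in percentage points, largest absolute gap first.
gaps = {name: round(gcp[name] - perf[name], 2) for name in perf}
ranked = sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)

kernel_total = round(sum(perf_kernel.values()), 2)

for name, gap in ranked:
    print(f"{name:32s} perf={perf[name]:6.2f}% gcp={gcp[name]:6.2f}% gap={gap:+.2f}")
print(f"kernel symbols in Perf top 37: {kernel_total:.2f}%")
```

For the user-space functions the two profilers agree to within about half a percentage point. The outlier is `__recvfrom_nocancel`, and the kernel symbols visible in Perf's list add up to an amount of the same order as that gap, which is consistent with the attribution difference assumed above but does not prove it.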
Source: https://blog.csdn.net/winlinvip/article/details/114263371