Linux Performance Optimization in Practice: Load Average

2019-04-23 21:52:24 Author: Internet

Check the load average:

$ uptime
 20:32:31 up 33 min,  1 user,  load average: 0.72, 0.63, 0.70

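Reading the output:

20:32:31 // current time
up 33 min  // how long the system has been running
1 user  // number of users currently logged in
0.72, 0.63, 0.70  // load average over the past 1, 5, and 15 minutes, in that order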
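
As a minimal sketch, the three load figures can be pulled out of an uptime line with sed. The function name load_from_uptime is made up for this illustration, and the pattern assumes the "load average: a, b, c" format shown above with a dot as the decimal separator.

```shell
# Hypothetical helper: read one uptime line on stdin and print the
# 1-, 5- and 15-minute load averages separated by spaces.
load_from_uptime() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# Sample line copied from the uptime output above.
echo ' 20:32:31 up 33 min,  1 user,  load average: 0.72, 0.63, 0.70' | load_from_uptime
# prints: 0.72 0.63 0.70
```

On Linux the same three numbers are also the first three fields of /proc/loadavg, so `cut -d ' ' -f 1-3 /proc/loadavg` avoids parsing uptime at all.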

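Load average: the average number of processes in the runnable or uninterruptible state over the given period, that is, the average number of active processes.
Runnable processes: processes that are using a CPU or waiting for one. They appear in state R in ps output.
Uninterruptible processes: processes in a critical kernel-mode path that must not be interrupted, for example while waiting for a hardware device to answer an I/O request. They appear in state D in ps output. This state is a mechanism by which the system protects the process and the hardware device.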
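
The definition can be checked roughly by counting processes in the R and D states. The sketch below assumes the procps version of ps, where `ps -eo stat=` prints one state string per process with the state letter first; count_active is a hypothetical helper name. It gives a single snapshot (and ps itself shows up as R), so it will not match the averaged figure exactly.

```shell
# Hypothetical helper: read one process state string per line on stdin
# (for example "R+", "Ss", "D") and count the R and D entries.
count_active() {
    awk '{ s = substr($1, 1, 1) }      # first letter is the state
         s == "R" { r++ }
         s == "D" { d++ }
         END { printf "R=%d D=%d active=%d\n", r, d, r + d }'
}

# Made-up sample states, only to show the counting.
printf '%s\n' Ss R+ D S Rl I | count_active
# prints: R=2 D=1 active=3

# On a live system (assumes procps ps):
#   ps -eo stat= | count_active
```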

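Ideally the load average equals the number of CPUs: each CPU then has exactly one active process. A load average above the CPU count means the system is overloaded.

Count the logical CPUs: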

$ grep 'model name' /proc/cpuinfo | wc -l
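
Because the load average only means something relative to the CPU count, it can help to divide one by the other. The following is a sketch only; load_per_cpu is a hypothetical name, and the CPU count of 2 is taken from the "(2 CPU)" header in the mpstat output later in this article.

```shell
# Hypothetical helper: load_per_cpu LOAD NCPU
# Prints the load average divided by the number of CPUs.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

load_per_cpu 0.72 2   # idle system from the first uptime output, prints: 0.36
load_per_cpu 8.00 2   # eight busy processes on two CPUs, prints: 4.00

# On a live Linux system (nproc is part of GNU coreutils):
#   load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

A result near 1.00 corresponds to the ideal case described above; values well above 1.00 mean more active processes than CPUs.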

Load average case studies

Preparation

$ sudo apt install stress
$ sudo apt install sysstat

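stress is a stress-testing tool for Linux systems.
sysstat bundles common Linux performance tools for monitoring and analyzing system performance. The cases below use mpstat and pidstat.
mpstat is a common multi-core CPU analysis tool that reports per-CPU metrics, and the average across all CPUs, in real time.
pidstat is a common per-process analysis tool that reports CPU, memory, I/O, and context-switch metrics for each process in real time.

Load average before the tests: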

$ uptime
 20:53:18 up 54 min,  1 user,  load average: 0.47, 0.46, 0.58

Case 1: CPU-intensive process

In the first terminal, run stress to simulate one CPU at 100% usage:

$ stress --cpu 1 --timeout 600

In the second terminal, run uptime to watch the load average change:

# -d highlights the parts of the output that change between updates
$ watch -d uptime
... load average: 1.43, 1.00, 0.78

The 1-minute load average has risen to 1.43.
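
Comparing the 1-minute figure with the 15-minute figure gives a rough sense of direction: here 1.43 against 0.78 shows the load is climbing. A minimal sketch of that comparison, with load_trend as a hypothetical helper name:

```shell
# Hypothetical helper: load_trend ONE_MIN FIFTEEN_MIN
# Prints whether the load is rising, falling or steady.
load_trend() {
    awk -v one="$1" -v fifteen="$2" 'BEGIN {
        if (one > fifteen)      print "rising"
        else if (one < fifteen) print "falling"
        else                    print "steady"
    }'
}

load_trend 1.43 0.78   # values from the watch output above, prints: rising
```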

In the third terminal, run mpstat to watch CPU usage:

# -P ALL monitors all CPUs; the trailing 5 prints one report every 5 seconds
$ mpstat -P ALL 5
Linux 4.13.0-45-generic (yjp-VirtualBox) 	2019年04月23日 	_x86_64_	(2 CPU)

21时05分22秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
21时05分27秒  all   33.19    0.00    1.46   11.84    0.00    0.58    0.00    0.00    0.00   52.92
21时05分27秒    0   98.55    0.00    1.45    0.00    0.00    0.00    0.00    0.00    0.00    0.00
21时05分27秒    1    4.83    0.00    1.47   17.02    0.00    0.63    0.00    0.00    0.00   76.05

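CPU 0 is at 98.55% %usr while its %iowait is 0.00, so the rise in load average comes from high CPU usage.

Use pidstat to find the process responsible for the CPU usage:

# -u reports CPU usage; 5 1 prints a single report covering 5 seconds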
$ pidstat -u 5 1
Linux 4.13.0-45-generic (yjp-VirtualBox) 	2019年04月23日 	_x86_64_	(2 CPU)

21时11分15秒   UID       PID    %usr %system  %guest    %CPU   CPU  Command
21时11分20秒  1000      6697   98.01    1.00    0.00   99.00     0  stress

Case 2: I/O-intensive process

Simulate I/O pressure:

$ stress -i 1 --timeout 600

In the second terminal, watch the load average change:

$ watch -d uptime
...load average: 1.93, 1.46, 1.21

The 1-minute load average slowly climbs to 1.93.

In the third terminal, check CPU usage:

$ mpstat -P ALL 5 1
Linux 4.13.0-45-generic (yjp-VirtualBox) 	2019年04月23日 	_x86_64_	(2 CPU)

21时20分51秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
21时20分56秒  all    2.60    0.00    1.52   41.28    0.00    1.84    0.00    0.00    0.00   52.76
21时20分56秒    0    0.60    0.00    0.40    0.60    0.00    0.40    0.00    0.00    0.00   97.99
21时20分56秒    1    4.92    0.00    2.58   88.76    0.00    3.51    0.00    0.00    0.00    0.23

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    2.60    0.00    1.52   41.28    0.00    1.84    0.00    0.00    0.00   52.76
Average:       0    0.60    0.00    0.40    0.60    0.00    0.40    0.00    0.00    0.00   97.99
Average:       1    4.92    0.00    2.58   88.76    0.00    3.51    0.00    0.00    0.00    0.23

On CPU 1, %sys is only 2.58% while %iowait reaches 88.76%, so the rise in load average is caused by I/O.
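
The reasoning applied to the two mpstat outputs so far (high %usr with no %iowait versus low CPU time with high %iowait) can be written down as a small filter. This is a sketch only: classify_mpstat and the 50% threshold are assumptions made for illustration, and the column positions assume the layout shown above, where the timestamp is a single field.

```shell
# Hypothetical helper: classify_mpstat [LIMIT]
# Reads per-CPU mpstat lines on stdin and labels each CPU.
# Fields: $2 = CPU id, $3 = %usr, $5 = %sys, $6 = %iowait.
classify_mpstat() {
    awk -v limit="${1:-50}" '
        $2 ~ /^[0-9]+$/ {                 # skip header and "all" lines
            busy = $3 + $5
            if ($6 >= limit)        verdict = "io-bound"
            else if (busy >= limit) verdict = "cpu-bound"
            else                    verdict = "ok"
            printf "cpu%s %s\n", $2, verdict
        }'
}

# Case 1, per-CPU lines copied from the first mpstat output.
classify_mpstat <<'EOF'
21时05分27秒    0   98.55    0.00    1.45    0.00    0.00    0.00    0.00    0.00    0.00    0.00
21时05分27秒    1    4.83    0.00    1.47   17.02    0.00    0.63    0.00    0.00    0.00   76.05
EOF
# prints: cpu0 cpu-bound
#         cpu1 ok

# Case 2, per-CPU lines copied from the second mpstat output.
classify_mpstat <<'EOF'
21时20分56秒    0    0.60    0.00    0.40    0.60    0.00    0.40    0.00    0.00    0.00   97.99
21时20分56秒    1    4.92    0.00    2.58   88.76    0.00    3.51    0.00    0.00    0.00    0.23
EOF
# prints: cpu0 ok
#         cpu1 io-bound
```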

Use pidstat to look for the process generating the I/O:

$ pidstat -u 5 1
Linux 4.13.0-45-generic (yjp-VirtualBox) 	2019年04月23日 	_x86_64_	(2 CPU)

21时21分22秒   UID       PID    %usr %system  %guest    %CPU   CPU  Command
21时21分27秒     0         8    0.00    0.20    0.00    0.20     1  rcu_sched
21时21分27秒     0        16    0.00    0.20    0.00    0.20     1  ksoftirqd/1
21时21分27秒     0       176    0.00    1.20    0.00    1.20     1  kworker/1:1H
21时21分27秒     0       177    0.00    0.40    0.00    0.40     0  kworker/0:1H
21时21分27秒     0       199    0.00    0.40    0.00    0.40     0  jbd2/sda1-8
21时21分27秒     0       866    5.78    1.39    0.00    7.17     1  snapd
21时21分27秒     0      1425    1.20    0.80    0.00    1.99     1  Xorg
21时21分27秒     0      1815    0.00    0.20    0.00    0.20     0  dockerd
21时21分27秒  1000      3350    0.80    1.59    0.00    2.39     0  compiz
21时21分27秒     0      6754    0.00    0.20    0.00    0.20     1  kworker/u4:1
21时21分27秒  1000      6987    0.20    4.98    0.00    5.18     0  stress
21时21分27秒  1000      6988    0.40    0.20    0.00    0.60     1  watch
21时21分27秒  1000      7087    0.20    0.20    0.00    0.40     1  pidstat

Case 3: a large number of processes

Simulate 8 processes:

$ stress -c 8 --timeout 600

The 1-minute load average slowly climbs to 8:

$ watch -d uptime
...load average: 8.00, 4.17, 2.43

Check the processes: 8 processes are competing for 2 CPUs, so each gets only about 25% of one CPU:

$ pidstat -u 5 1
Linux 4.13.0-45-generic (yjp-VirtualBox) 	2019年04月23日 	_x86_64_	(2 CPU)

21时34分56秒   UID       PID    %usr %system  %guest    %CPU   CPU  Command
21时35分02秒     0       176    0.00    0.20    0.00    0.20     1  kworker/1:1H
21时35分02秒     0       199    0.00    0.20    0.00    0.20     1  jbd2/sda1-8
21时35分02秒     0       763    0.20    0.00    0.00    0.20     0  accounts-daemon
21时35分02秒     0       866    3.15    0.79    0.00    3.94     1  snapd
21时35分02秒     0      1425    1.77    1.77    0.00    3.54     1  Xorg
21时35分02秒     0      2029    0.20    0.20    0.00    0.39     1  docker-containe
21时35分02秒  1000      3350    0.59    0.79    0.00    1.38     0  compiz
21时35分02秒  1000      4433    0.20    0.00    0.00    0.20     1  gnome-terminal-
21时35分02秒  1000      7534   24.80    0.20    0.00   25.00     0  stress
21时35分02秒  1000      7535   21.46    1.77    0.00   23.23     1  stress
21时35分02秒  1000      7536   23.62    0.39    0.00   24.02     0  stress
21时35分02秒  1000      7537   22.83    0.59    0.00   23.43     1  stress
21时35分02秒  1000      7538   22.64    0.20    0.00   22.83     0  stress
21时35分02秒  1000      7539   21.85    1.57    0.00   23.43     1  stress
21时35分02秒  1000      7540   22.24    1.57    0.00   23.82     1  stress
21时35分02秒  1000      7541   23.82    0.20    0.00   24.02     0  stress
21时35分02秒  1000      7544    0.00    0.20    0.00    0.20     0  pidstat
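
With 2 CPUs there is at most 200% of CPU time to share, so the eight stress processes should add up to a little under that. A minimal sketch that sums the %CPU column per command; sum_cpu_by_command is a hypothetical helper, and the field numbers assume the pidstat column layout shown above (other sysstat versions may print additional columns).

```shell
# Hypothetical helper: sum_cpu_by_command NAME
# Reads pidstat -u lines on stdin; $7 = %CPU, $9 = Command.
sum_cpu_by_command() {
    awk -v name="$1" '
        $9 == name { n++; total += $7 }
        END { printf "%s: %d processes, %.2f %%CPU in total\n", name, n, total }'
}

# The stress and pidstat lines copied from the output above.
sum_cpu_by_command stress <<'EOF'
21时35分02秒  1000      7534   24.80    0.20    0.00   25.00     0  stress
21时35分02秒  1000      7535   21.46    1.77    0.00   23.23     1  stress
21时35分02秒  1000      7536   23.62    0.39    0.00   24.02     0  stress
21时35分02秒  1000      7537   22.83    0.59    0.00   23.43     1  stress
21时35分02秒  1000      7538   22.64    0.20    0.00   22.83     0  stress
21时35分02秒  1000      7539   21.85    1.57    0.00   23.43     1  stress
21时35分02秒  1000      7540   22.24    1.57    0.00   23.82     1  stress
21时35分02秒  1000      7541   23.82    0.20    0.00   24.02     0  stress
21时35分02秒  1000      7544    0.00    0.20    0.00    0.20     0  pidstat
EOF
# prints: stress: 8 processes, 189.78 %CPU in total
```

The total of 189.78% is close to the 200% that two CPUs can provide, which matches the picture of eight processes queuing for two CPUs.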

References
Ni Pengfei. Linux 性能优化实战 (Linux Performance Optimization in Practice)

Source: https://blog.csdn.net/u012319493/article/details/89481927