DBA如何利用strace/pstack/gdb来定位问题
作者:互联网
参考:https://database.51cto.com/art/202006/617923.htm
strace简介
strace是Linux环境下的一款程序调试工具,用来监察一个应用程序所使用的系统调用。
Strace是一个简单的跟踪系统调用执行的工具。在其最简单的形式中,它可以从开始到结束跟踪二进制的执行,并在进程的生命周期中输出一行具有系统调用名称,每个系统调用的参数和返回值的文本行。
常用选项
- -T:strace输出显示时间
- -t:可以在每行的输出之前添加时间戳,-tt可以精确到微妙级别,-ttt也可以精确到微妙级,但是它并不是打印当前时间,而是显示自从epoch以来的所经过的秒数
- -s:指定输出的字符串的最大长度,默认为32,如果输出到文件的话会全部输出
- -o:指定将strace输出到文件
- -ff:如果提供-o filename,则所有进程的跟踪结果输出到相应的filename.pid中,pid是各进程的进程号.
- -e:指定跟踪某个行为,例如-e trace=open指定只跟踪open行为
- -y:将文件句柄用文件路径代替显示
正确姿势
简单使用
- strace -T -tt -o /tmp/strace.log CMD
- strace -T -tt CMD 2>&1 |tee /tmp/strace.log
- strace -T -tt -s 100 -o /tmp/strace.log CMD
- strace -T -tt -s 100 -ff -o /tmp/strace.log CMD
- strace -T -tt -s 100 -e trace=XXXX -o /tmp/strace.log CMD
使用案例
利用strace观察客户端client执行SQL
通过sys.processlist表中pid可以知道客户端连接pid
mysql> select thd_id,conn_id,user,pid,command,current_statement,program_name from sys.processlist where conn_id>0 and pid >0;
+--------+---------+----------------+--------+---------+-------------------------------------------------------------------+--------------+
| thd_id | conn_id | user | pid | command | current_statement | program_name |
+--------+---------+----------------+--------+---------+-------------------------------------------------------------------+--------------+
| 218265 | 218238 | root@localhost | 103440 | Query | select thd_id,conn_id,user,pi ... t where conn_id>0 and pid >0 | mysql |
+--------+---------+----------------+--------+---------+-------------------------------------------------------------------+--------------+
1 row in set (0.33 sec)
#session1
[root@localhost scripts]# strace -T -tt -s 100 -o /tmp/strace.log -p103440
strace: Process 103440 attached
#session2
mysql> use test
Database changed
mysql> select *from t1 limit 2;
+----+--------------------------------------+
| id | name |
+----+--------------------------------------+
| 1 | 137b97ba-6de1-11eb-91cf-000c29a5b3f5 |
| 2 | 13d6586d-6de1-11eb-91cf-000c29a5b3f5 |
+----+--------------------------------------+
2 rows in set (0.00 sec)
查看strace文件
[root@localhost tmp]# vi /tmp/strace.log
11:37:35.114324 write(1, "\r\33[15G*from t1 limit 2;\33[K", 26) = 26 <0.000846>
11:37:35.115354 read(0, "\n", 1) = 1 <0.624789>
11:37:35.740309 write(1, "\n", 1) = 1 <0.000850>
11:37:35.741471 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0 <0.000025>
11:37:35.741701 rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM CONT TSTP WINCH], [], 8) = 0 <0.000022>
11:37:35.741814 rt_sigaction(SIGINT, {0x409933, [INT], SA_RESTORER|SA_RESTART, 0x7f5d55d0c5f0}, NULL, 8) = 0 <0.000019>
11:37:35.741886 rt_sigaction(SIGTSTP, {SIG_DFL, [], SA_RESTORER, 0x7f5d55d0c5f0}, NULL, 8) = 0 <0.000059>
11:37:35.742227 rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f5d55d0c5f0}, NULL, 8) = 0 <0.000017>
11:37:35.742423 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000018>
11:37:35.742500 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0 <0.000022>
11:37:35.742888 times({tms_utime=29, tms_stime=17, tms_cutime=0, tms_cstime=0}) = 436033838 <0.000022>
11:37:35.742987 ioctl(3, FIONREAD, [0]) = 0 <0.000018>
11:37:35.743055 sendto(3, "\30\0\0\0\3select *from t1 limit 2", 28, 0, NULL, 0) = 28 <0.001870>
11:37:35.745629 recvfrom(3, "\1\0\0\1\2\"\0\0\2\3def\4test\2t1\2t1\2id\2id\f?\0\v\0\0\0\3\3B\0\0\0&\0\0\3\3def\4test\2t1\2t1\4name\4name\f!\0\375\377\2\0\374\20\0\0\0\0'\0\0\4\0011$137b97ba"..., 16384, 0, NULL, NULL) = 182 <0.000031>
11:37:35.745963 times({tms_utime=29, tms_stime=17, tms_cutime=0, tms_cstime=0}) = 436033838 <0.000014>
11:37:35.746031 write(1, "+----+--------------------------------------+\n", 46) = 46 <0.000447>
11:37:35.746524 write(1, "| id | name |\n", 46) = 46 <0.000205>
11:37:35.746796 write(1, "+----+--------------------------------------+\n", 46) = 46 <0.000205>
11:37:35.747051 write(1, "| 1 | 137b97ba-6de1-11eb-91cf-000c29a5b3f5 |\n", 46) = 46 <0.000153>
11:37:35.747251 write(1, "| 2 | 13d6586d-6de1-11eb-91cf-000c29a5b3f5 |\n", 46) = 46 <0.000186>
11:37:35.747480 write(1, "+----+--------------------------------------+\n", 46) = 46 <0.000158>
11:37:35.747692 write(1, "2 rows in set (0.00 sec)\n", 25) = 25 <0.000210>
11:37:35.747943 write(1, "\n", 1) = 1 <0.000146>
11:37:35.748142 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1207, ...}) = 0 <0.000022>
11:37:35.748238 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000018>
11:37:35.748295 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000013>
11:37:35.748348 rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM CONT TSTP WINCH], [], 8) = 0 <0.000017>
11:37:35.748413 rt_sigaction(SIGINT, {0x4b8097, [], SA_RESTORER, 0x7f5d55d0c5f0}, {0x409933, [INT], SA_RESTORER|SA_RESTART, 0x7f5d55d0c5f0}, 8) = 0 <0.000017>
利用strace观察server端执行
#命令
[root@localhost tmp]# strace -o /tmp/strace.log -T -tt -f -p `pidof mysqld`
strace: Process 2075 attached with 40 threads
strace: Process 105607 attached
strace: Process 105612 attached
strace: Process 105619 attached
strace: Process 105624 attached
strace: Process 105631 attached
^Cstrace: Process 2075 detached
#假设你想看跟IO相关的行为,可以用-e trace来指定
# mysql相关的IO操作:
# mysql: read, write, open
# innodb: pread64, pwrite64
strace -o /tmp/strace.log -T -tt -f -e trace=read,open,write,pwrite64,pread64 -p `pidof mysqld`
strace会在/tmp文件夹下生成strace.log.xxx,xxx对应的就是THREAD_OS_ID,我们直接看strace.log.8208文件
pstack简介
pstack用来跟踪进程栈,这个命令在排查进程问题时非常有用,比如我们发现一个服务一直处于work状态(如假死状态,好似死循环),使用这个命令就能轻松定位问题所在;
可以在一段时间内,多执行几次pstack,若发现代码栈总是停在同一个位置,那个位置就需要重点关注,很可能就是出问题的地方;
正确使用姿势
pstack使用起来非常简单,直接pstack -pid即可
这里分析下mysql夯住,登录不了的案例
pstack 2075 >1.log
举例如下:
#0 0x00007fc7c2b57a4d in recv () from /lib64/libpthread.so.0
#1 0x000000000047a3ae in inline_mysql_socket_recv (src_file=0x782550 "/dbfiles/soft/mysql-5.7.31/vio/viosocket.c", src_line=134, mysql_socket=..., buf=0xc0f460, n=16384, flags=0) at /dbfiles/soft/mysql-5.7.31/include/mysql/psi/mysql_socket.h:829
#2 0x000000000047a896 in vio_read (vio=0xc0aad0, buf=0xc0f460 "T", size=16384) at /dbfiles/soft/mysql-5.7.31/vio/viosocket.c:134
#3 0x000000000047aa0c in vio_read_buff (vio=0xc0aad0, buf=0xc13490 "\270", size=4) at /dbfiles/soft/mysql-5.7.31/vio/viosocket.c:177
#4 0x0000000000480103 in net_read_raw_loop (net=0xa2a200 <mysql>, count=4) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:694
#5 0x000000000048020b in net_read_packet_header (net=0xa2a200 <mysql>) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:784
#6 0x00000000004802c1 in net_read_packet (net=0xa2a200 <mysql>, complen=0x7ffdfaa4c550) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:844
#7 0x0000000000480451 in my_net_read (net=0xa2a200 <mysql>) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:921
#8 0x000000000045b0c2 in cli_safe_read_with_ok (mysql=0xa2a200 <mysql>, parse_ok=0 '\000', is_data_packet=0x0) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:1050
#9 0x000000000045b558 in cli_safe_read (mysql=0xa2a200 <mysql>, is_data_packet=0x0) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:1169
#10 0x000000000045f98f in cli_read_change_user_result (mysql=0xa2a200 <mysql>) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:2732
#11 0x0000000000462080 in run_plugin_auth (mysql=0xa2a200 <mysql>, data=0xc134b9 "\024f\022\320e~\f\023~ \213\026~\233\027\231\210p\264\201\066mysql_native_password", data_len=21, data_plugin=0xc134ce "mysql_native_password", db=0x0) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:3934
#12 0x000000000046494b in mysql_real_connect (mysql=0xa2a200 <mysql>, host=0x77e9d4 "localhost", user=0xbe71f0 "root", passwd=0xbe7220 "123456", db=0x0, port=0, unix_socket=0xbe71a0 "/dbfiles/mysql_home/data/mysql.sock", client_flag=66560) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:4620
#13 0x0000000000411884 in sql_real_connect (host=0x0, database=0x0, user=0xbe71f0 "root", password=0xbe7220 "123456", silent=0) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:5044
#14 0x0000000000411c4e in sql_connect (host=0x0, database=0x0, user=0xbe71f0 "root", password=0xbe7220 "123456", silent=0) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:5183
#15 0x000000000040904c in main (argc=5, argv=0xbe7408) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:1330
#15 0x000000000040904c in main (argc=5, argv=0xbe7408) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:1330
我们打开源码mysql.cc 对应的1330行:
glob_buffer.mem_realloc(512);
completion_hash_init(&ht, 128);
init_alloc_root(PSI_NOT_INSTRUMENTED, &hash_mem_root, 16384, 0);
memset(&mysql, 0, sizeof(mysql));
if (sql_connect(current_host,current_db,current_user,opt_password,
opt_silent))
{
quick= 1; // Avoid history
status.exit_status= 1;
mysql_end(-1);
}
if (!status.batch)
ignore_errors=1; // Don't abort monitor
1330行: if (sql_connect(current_host,current_db,current_user,opt_password,
opt_silent))
接着查看sql_connect函数
sql_connect(char *host,char *database,char *user,char *password,uint silent)
{
bool message=0;
uint count=0;
int error;
for (;;)
{
if ((error=sql_real_connect(host,database,user,password,wait_flag)) >= 0)
{
if (count)
{
tee_fputs("\n", stderr);
(void) fflush(stderr);
}
return error;
}
if (!wait_flag)
return ignore_errors ? -1 : 1;
if (!message && !silent)
{
message=1;
tee_fputs("Waiting",stderr); (void) fflush(stderr);
}
(void) sleep(wait_time);
if (!silent)
{
putc('.',stderr); (void) fflush(stderr);
count++;
}
}
}
可发现目前mysql 客户端正在进行登录验证。
正常情况下,mysql登录如下:
#0 0x00007efe3b0b4700 in __read_nocancel () from /lib64/libpthread.so.0
#1 0x00000000004b645e in read_char (el=0x212ef40, cp=0x7ffc117afc00 L"t\x9000000") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:288
#2 0x00000000004b6703 in el_wgetc (el=0x212ef40, cp=0x7ffc117afc00 L"t\x9000000") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:410
#3 0x00000000004b6279 in read_getcmd (el=0x212ef40, cmdnum=0x7ffc117afc07 "\t", ch=0x7ffc117afc00 L"t\x9000000") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:234
#4 0x00000000004b6a4f in el_wgets (el=0x212ef40, nread=0x7ffc117afc74) at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:523
#5 0x00000000004aaa45 in el_gets (el=0x212ef40, nread=0x7ffc117afc74) at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/eln.c:75
#6 0x0000000000499fef in readline (p=0x2154090 "mysql> ") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/readline.c:449
#7 0x000000000040a973 in read_and_execute (interactive=true) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:2322
#8 0x0000000000409617 in main (argc=5, argv=0x210b408) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:1454
if (!mysql_get_option(&mysql, MYSQL_OPT_PROTOCOL, &protocol) &&
!mysql_get_option(&mysql, MYSQL_OPT_SSL_MODE, &ssl_mode))
{
if (protocol == MYSQL_PROTOCOL_SOCKET &&
ssl_mode >= SSL_MODE_REQUIRED)
put_info("You are enforcing ssl conection via unix socket. Please consider\n"
"switching ssl off as it does not make connection via unix socket\n"
"any more secure.", INFO_INFO);
}
status.exit_status= read_and_execute(!status.batch);
if (opt_outfile)
end_tee();
mysql_end(0);
DBUG_RETURN(0); // Keep compiler happy
}
status.exit_status= read_and_execute(!status.batch);
等待输入命令。
gdb简介
GDB 是 linux 环境下的一般功能强大的调试器,用来调试 C 或 C++ 写的程序。它可以做这些事情
- Start your program, specifying anything that might affect its behavior.
- Make your program stop on specified conditions.
- Examine what has happened, when your program has stopped.
- Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.
常用命令
- info threads:查看全部线程
- thread n:指定某个线程
- b:在某处打断点
- c:继续往下走
- s:执行一行代码,如果代码函数调用,则进入函数
- n:执行一行代码,函数调用不进入
- p:打印某个变量值
- list:打印代码的文本信息
- bt:查看某个线程的栈帧
- info b:查看当前所有断点信息
实际案例
进入gdb界面
[root@localhost mysql_home]# cat /dbfiles/source/debug.file
break main
run --defaults-file=/etc/my.cnf --user=mysql --gdb
gdb -x /dbfiles/source/debug.file /usr/local/bin/mysqld
(gdb) info thread
Id Target Id Frame
13 Thread 0x7fffb0ff9700 (LWP 1818) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7fffb17fa700 (LWP 1817) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7fffb1ffb700 (LWP 1816) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7fffb27fc700 (LWP 1815) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7fffb2ffd700 (LWP 1814) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7fffb37fe700 (LWP 1813) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7fffb3fff700 (LWP 1812) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7fffc0e85700 (LWP 1811) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7fffc1686700 (LWP 1810) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7fffc1e87700 (LWP 1809) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7fffc2688700 (LWP 1808) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7fffece5d700 (LWP 1807) "mysqld" 0x00007ffff62f153a in sigwaitinfo () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fe3880 (LWP 1803) "mysqld" 0x00007ffff7bcde5d in nanosleep () from /lib64/libpthread.so.0
标签:11,DBA,定位问题,31,37,pstack,strace,mysql,dbfiles 来源: https://blog.csdn.net/u011962202/article/details/113843917