系统相关
首页 > 系统相关> > linux ftrace

linux ftrace

作者:互联网

1.What is tracing?

  Tracing is the specialized use of logging to record information about a program’s flow of execution.

  You know when you add print messages in your code to debug it? You are tracing the application with your own “tracing system”. The debug messages added to the code are static tracepoints, and your “trace system” might send the debug messages to the program’s standard output or a log file.

  Well, that works. But does not scale. Every time you want to trace some part of the code, you will have to add new debug messages and rebuild it. And you do not have much control over where the messages go. What if you want to save them to analyze later? What if you want to filter the messages? And the code will be a mess, with lots of print statements (you will have to be very careful to not commit any code with debug messages, right?).

  The fact is that most of the time we don’t need to add debug messages to the code because there are already tracing tools and frameworks that can do that for us.

2.What is ftrace?

  Ftrace is a tracing framework for the Linux kernel. It was added to the kernel back in 2008 and has evolved a lot since then.

  Ftrace stands for function tracer and basically lets you watch and record the execution flow of kernel functions. It was created by Steven Rostedt, derived from two other tools called latency tracer from Ingo Molnar and Steven’s logdev utility.

  With ftrace you can really see what the kernel is doing. You can trace function calls and learn a lot about how the kernel works. You can find out which kernel functions are called when you run a user space application. You can profile functions, measure execution time and find out bottlenecks and performance issues. You can identify hangs in kernel space. You can measure the time it takes to run a real-time task and find out latency issues. You can measure stack usage in kernel space and find out possible stack overflows. You can really do a lot of things to monitor and find bugs in the kernel!

2.1.How does ftrace work?

  There are two main types of tracing: static tracing and dynamic tracing.

  Ftrace uses a combination of both static probes (function tracing, event tracing, etc) and dynamic probes (kprobes, uprobes, etc).

  To trace function calls, ftrace will build the kernel with GCC’s -pg option.

  When the kernel is built with -pg, GCC will prepend machine instructions to the prologue of every non-inlined function that will redirect the execution to the ftrace’s trampolines and tracer plugins that perform the actual tracing.

2.2.Ftrace framework

  ftrace 是内建于 Linux 内核的跟踪工具,从 2.6.27 开始加入主流内核。使用 ftrace 可以调试或者分析内核中发生的事情。ftrace 提供了不同的跟踪器,以用于不同的场合,比如跟踪内核函数调用、对上下文切换进行跟踪、查看中断被关闭的时长、跟踪内核态中的延迟以及性能问题等。系统开发人员可以使用 ftrace 对内核进行跟踪调试,以找到内核中出现的问题的根源,方便对其进行修复。

  Ftrace 有两大组成部分,一是 framework,另外就是一系列的 tracer 。每个 tracer 完成不同的功能,它们统一由 framework 管理。 ftrace 的 trace 信息保存在 ring buffer 中,由 framework 负责管理。 Framework 利用 debugfs 系统在 /debugfs 下建立 tracing 目录,并提供了一系列的控制文件。
在这里插入图片描述

2.3.内核支持ftrace

  内核源码目录下的 kernel/trace/Makefile 文件给出了 ftrace 相关的编译选项。ftrace 相关的配置选项比较多,针对不同的跟踪器有各自对应的配置选项。不同的选项有不同的依赖关系,内核源码目录下的kernel/trace/Kconfig 文件描述了这些依赖关系。可以参考 Makefile 文件和 Konfig 文件,然后选中自己所需要的跟踪器。

CONFIG_FUNCTION_TRACER 
CONFIG_FUNCTION_GRAPH_TRACER 
CONFIG_CONTEXT_SWITCH_TRACER 
CONFIG_NOP_TRACER 
CONFIG_SCHED_TRACER 
CONFIG_DYNAMIC_FTRACE //enable/disable ftrace tracepoints dynamically

2.3.1.内核配置
在这里插入图片描述

  ftrace 通过 debugfs 向用户态提供了访问接口,所以还需要将 debugfs 编译进内核。激活对 debugfs 的支持,可以直接编辑内核配置文件 .config ,设置 CONFIG_DEBUG_FS=y;配置内核时激活debugfs挂载到/sys/kernel/debug,如下所示:

>//Mount the kernel to /sys/kernel/debug and instruct it to use the debugfs file system: 
># mount -t debugfs nodev /sys/kernel/debug

  配置完成后,编译安装新内核,然后启动到新内核。 注意,激活 ftrace 支持后,编译内核时会使用编译器的 -pg 选项,这是在 kernel/trace/Makefile 文件中定义的,使用 -pg 选项会在编译得到的内核映像中加入大量的调试信息。一般情况下,只是在开发测试阶段激活 ftrace 支持,以调试内核,修复 bug 。最终用于发行版的内核则会关闭 -pg 选项,也就无法使用 ftrace。(refer to If the kernel is compiled with function tracer support the -pg compile option.patch)

kernel/trace/Makefile:
ifdef CONFIG_FUNCTION_TRACER
ORIG_CFLAGS := $(KBUILD_CFLAGS)
KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))

Kernel/Makefile:
ifdef CONFIG_FUNCTION_TRACER
ifndef CC_FLAGS_FTRACE  
CC_FLAGS_FTRACE := -pg
endif

  激活内核对 ftrace 的支持后会在 debugfs 下创建一个 tracing 目录 /sys/kernel/debug/tracing 。该目录下包含了 ftrace 的控制和输出文件,根据编译内核时针对 ftrace 的设定不同,该目录下实际显示的文件和目录与这里也会不同。

  tracing目录(/sys/kernel/debug/tracing)中的文件控制着跟踪的能力。根据你在内核配置时的选项的不同,这里列的文件可能稍有差异。你可以在内核源代码目录下/Documentation/trace目录中找到这些文件的信息。

osborn:/d/tracing # ls -l
-r--r--r--  1 root root  0 1970-01-01 08:00 README
-r--r--r--  1 root root  0 1970-01-01 08:00 available_events
-r--r--r--  1 root root  0 1970-01-01 08:00 available_filter_functions
-r--r--r--  1 root root  0 1970-01-01 08:00 available_tracers
-rw-rw-r--  1 root shell 0 1970-01-01 08:00 buffer_size_kb
-r--r--r--  1 root root  0 1970-01-01 08:00 buffer_total_size_kb
-rw-r--r--  1 root root  0 1970-01-01 08:00 cpu_freq_switch_profile_enabled
-rw-r--r--  1 root root  0 2018-01-01 08:29 current_tracer
...

2.3.2.ftrace 跟踪器

   ftrace 当前包含多个跟踪器,用于跟踪不同类型的信息,比如进程调度、中断关闭等。可以查看文件 available_tracers 获取内核当前支持的跟踪器列表。在编译内核时,也可以看到内核支持的跟踪器对应的选项,如之前图所示。

2.3.3.available_tracers(被编译进系统的跟踪器)

   This file contains all the available tracers for the installed version of ftrace. To see the list of available tracers, use the cat command to view the contents of the file:

#cat /sys/kernel/debug/tracing/available_tracers
events wakeup preemptirqsoff preemptoff irqsoff ftrace sched_switch none

2.3.4.current_tracer(当前启用的跟踪器)

   To manually start a tracing session, first select the tracer you wish to use from the list in available_tracers and then use the echo to insert the name of the tracer into /sys/kernel/debug/tracing/current_tracer:

#echo events > /sys/kernel/debug/tracing/current_tracer

2.3.5.ftrace_enabled

   The following tracers (listed below) give different output depending on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled, one can either use the sysctl function or set it via the proc file system interface.

  sysctl kernel.ftrace_enabled=1
 or
  echo 1 > /proc/sys/kernel/ftrace_enabled

  To disable ftrace_enabled simply replace the ‘1’ with ‘0’ in the above commands.
  When ftrace_enabled is set the tracers will also record the functions that are within the trace.

2.3.6.tracing_on

   Obviously there are times where you only want to trace a particular code path. Perhaps you only want to trace what is happening when you run a specific test. The file tracing_on is used to disable the ring buffer from recording data:

[tracing]# echo 0 > /sys/kernel/debug/tracing/tracing_on

   This will disable the Ftrace ring buffer from recording. Everything else still happens with the tracers and they will still incur most of their overhead. They do notice that the ring buffer is not recording and will not attempt to write any data, but the calls that the tracers make are still performed.

//使能trace
echo 1 > /sys/kernel/debug/tracing/tracing_on

A common run might be:

# echo 0 > tracing_on
# echo function_graph > current_tracer
# echo 1 > tracing_on; run_test; echo 0 > tracing_on

   The first line disables the ring buffer from recording any data. The next enables the function graph tracer. The overhead of the function graph tracer is still present but nothing will be recorded into the trace buffer. The last line enables the ring buffer, runs the test program, then disables the ring buffer. This narrows the data stored by the function graph tracer to include mostly just the data accumulated by the run_test program.

   To enable the function tracer, just echo “function” into the current_tracer file.

    [tracing]# echo function > current_tracer
    [tracing]# echo 1 > tracing_on
    [tracing]# cat trace | head -10
    # tracer: function
    #
    #          TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |      |          |        |
                bash-16939 [000]  6075.461561: mutex_unlock <-tracing_set_tracer
              <idle>-0    [001]  6075.461561: _spin_unlock_irqrestore <-hrtimer_get_next_event
              <idle>-0    [001]  6075.461562: rcu_needs_cpu <-tick_nohz_stop_sched_tick
                bash-16939 [000]  6075.461563: inotify_inode_queue_event <-vfs_write
              <idle>-0    [001]  6075.461563: mwait_idle <-cpu_idle
                bash-16939 [000]  6075.461563: __fsnotify_parent <-vfs_write

   The header explains the format of the output pretty well. The first two items are the traced task name and PID. The CPU that the trace was executed on is within the brackets. The timestamp is the time since boot, followed by the function name. The function in this case is the function being traced with its parent following the “<-” symbol.

2.3.7.trace_options 文件用来控制trace的输出格式。

   The trace_options file is used to control what gets printed in the trace output. To see what is available, simply cat the file:

cat /sys/kernel/debug/tracing/trace_options

echo noprint-parent > trace_options

echo sym-offset > trace_options

2.4.获取trace

   This file contains human-readable trace output.

   This file contains the same output as /sys/kernel/debug/tracing/trace, but is meant to be piped into a command. Unlike /sys/kernel/debug/tracing/trace, reading from this file consumes its output.

2.5.set_event

//为了简便,我们只需使能所有的事件
echo *:* > /sys/kernel/debug/tracing/set_event

2.6.Using trace_printk()

   printk() is the king of all debuggers, but it has a problem. If you are debugging a high volume area such as the timer interrupt, the scheduler, or the network, printk() can lead to bogging down the system or can even create a live lock. It is also quite common to see a bug “disappear” when adding a few printk()s. This is due to the sheer overhead that printk() introduces.

   Ftrace introduces a new form of printk() called trace_printk(). It can be used just like printk(), and can also be used in any context (interrupt code, NMI code, and scheduler code). What is nice about trace_printk() is that it does not output to the console. Instead it writes to the Ftrace ring buffer and can be read via the trace file.

   Writing into the ring buffer with trace_printk() only takes around a tenth of a microsecond or so. But using printk(), especially when writing to the serial console, may take several milliseconds per write. The performance advantage of trace_printk() lets you record the most sensitive areas of the kernel with very little impact.

   For example you can add something like this to the kernel or module:

trace_printk("read foo %d out of bar %p\n", bar->foo, bar);
Then by looking at the trace file, you can see your output.
    [tracing]# cat trace
    # tracer: nop
    #
    #          TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |      |          |        |
              <...>-10690 [003] 17279.332920: : read foo 10 out of bar ffff880013a5bef8

   The above example was done by adding a module that actually had a foo and bar construct.
trace_printk() output will appear in any tracer, even the function and function graph tracers.

   [tracing]# echo function_graph > current_tracer
    [tracing]# insmod ~/modules/foo.ko
    [tracing]# cat trace
    # tracer: function_graph
    #
    # CPU  DURATION                  FUNCTION CALLS
    # |    |  |                    |  |  |  |
    3) + 16.283 us  |      }
    3) + 17.364 us  |    }
    3)              |    do_one_initcall() {
    3)              |      /* read foo 10 out of bar ffff88001191bef8 */
    3)  4.221 us    |    }
    3)              |    __wake_up() {
    3)  0.633 us    |      _spin_lock_irqsave();
    3)  0.538 us    |      __wake_up_common();
    3)  0.563 us    |      _spin_unlock_irqrestore();

Yes, the trace_printk() output looks like a comment in the function graph tracer.

2.7.ftrace的一般使用流程:

   设置跟踪器 ——》开启跟踪器 ——》关闭跟踪器

echo function > current_tracer
echo funtion_graph > current_tracer
echo ‘:mod:[module_name]’ > set_ftrace_filter
例如,要跟踪模块 ipv6 中的函数
echo ':mod:ipv6' > set_ftrace_filter
echo 1 > tracing_on
echo 0 > tracing_on

ftrace 脚本示例:

#!/bin/sh
#ftrace.sh
 
dir="/sys/kernel/debug/tracing/"
save="/root/Desktop/"

echo 1 > /proc/sys/kernel/ftrace_enabled
echo 0 > ${dir}tracing_on
echo function_graph > ${dir}current_tracer
echo ':mod:openvswitch' > ${dir}set_ftrace_filter
echo 1 > ${dir}tracing_on
sleep 2
echo 0 > ${dir}tracing_on
cat ${dir}trace > ${save}trace_records

3.trace-cmd/kernelshark

   Ftrace在debugfs中的接口很简单,但在实际操作时,还是显得不太灵活。如果能让一个应用程序代替我们与Ftrace的接口交互,将会大大简化其使用,而这个程序就是trace-cmd。此外,为了能够更加直观的分析过滤Ftrace的log,将会介绍一个基于GUI的工具kernelshark。

sudo apt install trace-cmd kernelshark

   Here’s some examples of trace-cmd:

#trace-cmd record -e sched myprogram

refer to https://lwn.net/Articles/410200/

3.1.kernelshark

   kernelshark是一个用来查看trace.dat的GUI程序

3.Code:

5.Debug

echo 0 > /sys/kernel/debug/tracing/tracing_on && 
echo > /sys/kernel/debug/tracing/trace && 
echo 51200 > /sys/kernel/debug/tracing/buffer_size_kb &&
echo *:* > /sys/kernel/debug/tracing/set_event && 
sleep 2 && echo 1 > /sys/kernel/debug/tracing/tracing_on &&
sleep 10 && echo 1 > /sys/kernel/debug/tracing/tracing_on &

refer to

标签:kernel,ftrace,trace,tracing,echo,linux,tracer
来源: https://blog.csdn.net/weixin_41028621/article/details/115175633