by YoungTimes

C++ Multithreaded Programming: Tracking Down the Thread with High CPU Usage

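In engineering practice, many scenarios have strict performance requirements: CPU usage must stay below a given threshold so that the system as a whole remains responsive in real time. This post records how to track down the thread that is consuming a lot of CPU.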

Suppose we have the following business code:

#include <iostream>
#include <thread>
#include <chrono>

void fun_1() {
    int a = 1;
    while (1) {
        a = a + 1;
    }
}

void fun_2() {
    int b = 1;
    while (1) {
        b = b + 1;
        std::this_thread::sleep_for(std::chrono::seconds(100));
    }
}

int main() {
    std::thread t1(fun_1);
    std::thread t2(fun_2);

    t1.join();
    t2.join();

    return 0;
}

Compile the code:

g++ cpu.cpp -o cpu -pthread -g

1. Locating the thread

Use the top command to find the process with high CPU usage.

$ top
2988 xxx   20   0   32564   1860   1708 S  99.3  0.0  24:27.84 cpu         
20178 xxx   20   0 5067244 195804 118856 S   1.7  2.5   5:09.94 chrome      
10485 xxx   20   0 5091816 227952 123736 S   1.0  2.9   5:03.26 chrome 

The process with PID 2988 is using 99.3% of a CPU; this is the high-CPU process we are looking for.

$ top -Hp 2988
2989 xxx   20   0   32564   1860   1708 R 99.9  0.0  22:49.20 cpu          
2988 xxx   20   0   32564   1860   1708 S  0.0  0.0   0:00.00 cpu          
2990 xxx   20   0   32564   1860   1708 S  0.0  0.0   0:00.00 cpu 

top -Hp shows that process 2988 contains three threads, and that thread 2989 is the one consuming nearly all of the CPU.
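The per-thread numbers that top -H displays come from the kernel's /proc interface. As a rough sketch (Linux only), the code below reads the utime and stime fields from /proc/<pid>/task/<tid>/stat for every thread of a process. The names ThreadCpu, parse_stat and list_thread_cpu are hypothetical helpers introduced for illustration; they are not part of the original program.

```cpp
// Linux-only sketch: per-thread CPU ticks read from /proc/<pid>/task/<tid>/stat.
#include <dirent.h>

#include <cstdlib>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct ThreadCpu {                    // hypothetical helper type
    int tid;                          // kernel thread ID (PID column of top -H)
    unsigned long long user_ticks;    // utime, field 14 of the stat file
    unsigned long long system_ticks;  // stime, field 15 of the stat file
};

// Extracts utime and stime from the contents of one stat file.
// Returns false if the text does not look like a stat line.
bool parse_stat(const std::string& stat,
                unsigned long long& utime, unsigned long long& stime) {
    // Field 2 (comm) is wrapped in parentheses and may contain spaces,
    // so parsing resumes after the last ')'.
    const std::size_t pos = stat.rfind(')');
    if (pos == std::string::npos) return false;
    std::istringstream rest(stat.substr(pos + 1));
    std::string skipped;
    for (int field = 3; field <= 13; ++field) {  // skip fields 3..13
        if (!(rest >> skipped)) return false;
    }
    return static_cast<bool>(rest >> utime >> stime);
}

// Lists the CPU ticks of every thread of `pid`; empty if the process is gone.
std::vector<ThreadCpu> list_thread_cpu(int pid) {
    std::vector<ThreadCpu> result;
    const std::string task_dir = "/proc/" + std::to_string(pid) + "/task";
    DIR* dir = opendir(task_dir.c_str());
    if (dir == nullptr) return result;
    while (dirent* entry = readdir(dir)) {
        const std::string name = entry->d_name;
        if (name.empty() || name[0] == '.') continue;  // skip "." and ".."
        std::ifstream in(task_dir + "/" + name + "/stat");
        const std::string stat((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
        ThreadCpu t{std::atoi(name.c_str()), 0, 0};
        if (parse_stat(stat, t.user_ticks, t.system_ticks)) result.push_back(t);
    }
    closedir(dir);
    return result;
}
```

Calling list_thread_cpu(2988) twice, about a second apart, and comparing the tick counts should single out the same thread that top -Hp reports; dividing a tick difference by sysconf(_SC_CLK_TCK) converts it to seconds.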

Alternatively, use pstree to view the process tree:

$ pstree -p xxx | grep cpu
bash(32251)---cpu(2988)-+-{cpu}(2989)
                        `-{cpu}(2990)

Or use ps -Lf pid to list the threads of the process:

UID        PID  PPID   LWP  C NLWP STIME TTY      STAT   TIME CMD
xxxxxxx   2988 32251  2988  0    3 09:22 pts/1    Sl+    0:00 ./cpu
xxxxxxx   2988 32251  2989 99    3 09:22 pts/1    Rl+   42:07 ./cpu
xxxxxxx   2988 32251  2990  0    3 09:22 pts/1    Sl+    0:00 ./cpu
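In the listings above every thread appears under the same name, cpu, so the busy thread can only be told apart by its ID. One way to make such listings easier to read, assuming Linux with glibc, is to name each thread from inside the program with pthread_setname_np. The helpers name_current_thread and current_thread_name below are hypothetical names used for this sketch.

```cpp
// Sketch assuming Linux/glibc: pthread_setname_np and pthread_getname_np are
// non-portable GNU extensions (g++ defines _GNU_SOURCE by default).
#include <pthread.h>

#include <string>

// Names the calling thread. The kernel limits thread names to 15 characters
// plus the terminating NUL, so longer names are truncated here.
bool name_current_thread(const std::string& name) {
    const std::string truncated = name.substr(0, 15);
    return pthread_setname_np(pthread_self(), truncated.c_str()) == 0;
}

// Reads the calling thread's name back; returns an empty string on failure.
std::string current_thread_name() {
    char buffer[16] = {0};
    if (pthread_getname_np(pthread_self(), buffer, sizeof(buffer)) != 0) {
        return "";
    }
    return buffer;
}
```

If fun_1 and fun_2 each called name_current_thread with a distinct name as their first statement, top -H should show those names in its COMMAND column, and ps -L -o lwp,pcpu,comm -p pid should list them as well.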

2. Locating the code

2.1 The strace -T -r -c -p pid command

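Use strace -T -r -c -p pid to see which system calls the process makes and how much time they take. The busy loop in fun_1 makes no system calls, so strace does not help in this example.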
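A thread that spins without making system calls accumulates almost all of its CPU time in user mode, which is why strace has nothing to report for it. The sketch below, assuming Linux, uses getrusage with RUSAGE_THREAD to split the calling thread's CPU time into user and system parts. CpuSplit, current_thread_cpu and burn_cpu are hypothetical names; burn_cpu imitates the loop in fun_1 with a bounded iteration count.

```cpp
// Linux-only sketch: RUSAGE_THREAD is a GNU extension
// (g++ defines _GNU_SOURCE by default).
#include <sys/resource.h>
#include <sys/time.h>

struct CpuSplit {           // hypothetical helper type
    double user_seconds;    // CPU time spent in user mode
    double system_seconds;  // CPU time spent inside system calls
};

// Returns the CPU time consumed so far by the calling thread.
CpuSplit current_thread_cpu() {
    rusage usage{};
    getrusage(RUSAGE_THREAD, &usage);
    auto to_seconds = [](const timeval& tv) {
        return static_cast<double>(tv.tv_sec) + tv.tv_usec / 1e6;
    };
    return {to_seconds(usage.ru_utime), to_seconds(usage.ru_stime)};
}

// Spins in user space like fun_1, but for a bounded number of iterations.
// volatile keeps the compiler from optimizing the loop away.
void burn_cpu(unsigned long iterations) {
    volatile unsigned long counter = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        counter = counter + 1;
    }
}
```

Comparing current_thread_cpu() before and after a call such as burn_cpu(200000000UL) should show the increase landing almost entirely in user_seconds. A thread whose time is mostly system_seconds is the kind of case where strace is the more useful tool.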

2.2 pstack

pstack prints a stack trace for each thread of a process. The pstack installed on the system failed with the following error:

2988: ./cpu
pstack: Input/output error
failed to read target.

Use a custom pstack script instead (the result is not as good as using GDB directly):

#!/bin/sh

if test $# -ne 1; then
    echo "Usage: `basename $0 .sh` <process-id>" 1>&2
    exit 1
fi

if test ! -r /proc/$1; then
    echo "Process $1 not found." 1>&2
    exit 1
fi

# GDB doesn't allow "thread apply all bt" when the process isn't
# threaded; need to peek at the process to determine if that or the
# simpler "bt" should be used.

backtrace="bt"
if test -d /proc/$1/task ; then
    # Newer kernel; has a task/ directory.
    if test `/bin/ls /proc/$1/task | /usr/bin/wc -l` -gt 1 2>/dev/null ; then
        backtrace="thread apply all bt"
    fi
elif test -f /proc/$1/maps ; then
    # Older kernel; go by it loading libpthread.
    if /bin/grep -e libpthread /proc/$1/maps > /dev/null 2>&1 ; then
        backtrace="thread apply all bt"
    fi
fi

GDB=${GDB:-/usr/bin/gdb}

if $GDB -nx --quiet --batch --readnever > /dev/null 2>&1; then
    readnever=--readnever
else
    readnever=
fi

# Run GDB, strip out unwanted noise.
$GDB --quiet $readnever -nx /proc/$1/exe $1 <<EOF 2>&1 | 
set width 0
set height 0
set pagination no
$backtrace
EOF
/bin/sed -n \
    -e 's/^\((gdb) \)*//' \
    -e '/^#/p' \
    -e '/^Thread/p'
#end

Run it: sudo bash pstack.sh 2988

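The output shows the stack of every thread, which is enough to locate the code behind the high CPU usage. In the trace below, Thread 2 is executing fun_1(), while the other two threads are blocked in nanosleep and __pthread_timedjoin_ex.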

Thread 3 (Thread 0x7efea5eae700 (LWP 19157)):
#0  0x00007efea6e50c60 in nanosleep () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007efea5eadd60 in ?? ()
#2  0x000055bc354b3ff0 in std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1l> > >::value, std::chrono::duration<long, std::ratio<1l, 1l> > >::type std::chrono::duration_cast<std::chrono::duration<long, std::ratio<1l, 1l> >, long, std::ratio<1l, 1l> >(std::chrono::duration<long, std::ratio<1l, 1l> > const&) ()
#3  0x000000002efffa9b in ?? ()
#4  0x00007efea5eaddc8 in ?? ()
#5  0x7ab7c2f9478dc700 in ?? ()
#6  0x00007efea5eadde0 in ?? ()
#7  0x000055bc354b3df3 in fun_2() ()
Thread 2 (Thread 0x7efea66af700 (LWP 19156)):
#0  0x000055bc354b3da9 in fun_1() ()
#1  0x000055bc354b45c7 in void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) ()
#2  0x000055bc354b43d5 in std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) ()
#3  0x000055bc354b4aa2 in decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) ()
#4  0x000055bc354b4a5e in std::thread::_Invoker<std::tuple<void (*)()> >::operator()() ()
#5  0x000055bc354b4a2e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() ()
#6  0x00007efea73336df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007efea66af700 in ?? ()
#8  0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7efea77fc740 (LWP 19155)):
#0  0x00007efea6e47d2d in __pthread_timedjoin_ex () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000000000 in ?? ()

2.3 Attaching with GDB

Attach GDB to the high-CPU process and inspect the thread stacks; this also leads to the suspicious code.

$ sudo gdb -p 2988
(gdb) thread 2
[Switching to thread 2 (Thread 0x7efea66af700 (LWP 19156))]
#0  0x000055bc354b3da9 in fun_1 () at cpu.cpp:8
8            a = a + 1;

Tracking down high CPU usage in Java:

https://blog.csdn.net/coldcp/article/details/101192481?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.channel_param

References:

https://www.cnblogs.com/fengtai/p/12181907.html
