When operating Linux systems, you sometimes run into a user-mode process stuck in an infinite loop: the system becomes slow, the process hangs, and so on. How do you track it down? This article walks through diagnosing an infinite loop in a user-mode process.
1. Symptoms
The business process (a multithreaded user-mode program) hung, the OS became unresponsive, and the system log showed nothing abnormal. Judging from the kernel-mode stacks of the process, all threads appear to be stuck at the same point in the kernel:
[root@vmc116 ~]# cat /proc/27007/task/11825/stack
[<ffffffff8100baf6>] retint_careful+0x14/0x32
[<ffffffffffffffff>] 0xffffffffffffffff
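Checking whether every thread is parked at the same point means reading /proc/<pid>/task/<tid>/stack for each thread in turn. The Python sketch below does that and also decodes the symbol+offset/size notation of a frame; the helper names `parse_frame` and `dump_thread_stacks` are hypothetical, and reading these files normally requires root.

```python
import os
import re

# One frame of /proc/<pid>/task/<tid>/stack, e.g.
# "[<ffffffff8100baf6>] retint_careful+0x14/0x32": address, then symbol+offset/size.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Return (addr, symbol, offset, size, symbol_start), or None for unresolved frames."""
    m = FRAME.search(line)
    if m is None:
        return None  # e.g. the "[<ffffffffffffffff>] 0xffffffffffffffff" end marker
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    size = int(m.group("size"), 16)
    # addr == symbol_start + offset; the address may be hidden as 0 (kptr_restrict).
    start = addr - off if addr else None
    return addr, m.group("sym"), off, size, start


def dump_thread_stacks(pid):
    """Print the kernel stack of every thread of `pid` (needs root)."""
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stack") as f:
                frames = f.read().splitlines()
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
        print(f"--- tid {tid}")
        for line in frames:
            print("   ", line)
```

For the frame shown above, `parse_frame` reports that retint_careful starts at 0xffffffff8100bae2 and is 0x32 (50) bytes long, and that the thread is parked 0x14 (20) bytes into it.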
2. Problem analysis
1) Kernel stack analysis
The kernel stacks show that all threads are blocked in retint_careful, which is part of the interrupt-return path. The relevant assembly, from entry_64.S, is as follows:
ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    decl PER_CPU_VAR(irq_count)
    /* Restore saved previous stack */
    popq %rsi
    CFI_DEF_CFA rsi,SS+8-RBP /* reg/off reset after def_cfa_expr */
    leaq ARGOFFSET-RBP(%rsi), %rsp
    CFI_DEF_CFA_REGISTER rsp
    CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET
. . .
retint_careful:
    CFI_RESTORE_STATE
    bt $TIF_NEED_RESCHED,%edx    /* copy the need-resched bit of the pending work flags into CF */
    jnc retint_signal            /* bit clear: no reschedule needed, go check signals */
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)
    pushq_cfi %rdi
    SCHEDULE_USER                /* call into the scheduler; the thread waits here while switched out */
    popq_cfi %rdi
    GET_THREAD_INFO(%rcx)
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    jmp retint_check             /* re-check the work flags before returning to user space */
This is the path a user-mode thread takes when it returns from an interrupt that preempted it in user mode. Disassembling the code at retint_careful+0x14/0x32 confirms that the blocking point is
SCHEDULE_USER
which calls schedule(). In other words, TIF_NEED_RESCHED was set when the thread reached the interrupt-return path, so the thread was rescheduled there.
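The `bt`/`jnc` pair at the top of retint_careful decides whether the thread is rescheduled. The Python model below is only an illustration of that branch, not kernel code. It assumes TIF_NEED_RESCHED is bit 3 of the thread flags, as in x86 kernels of this era (check arch/x86/include/asm/thread_info.h for your kernel); the function name is hypothetical.

```python
# Assumption: TIF_NEED_RESCHED is bit 3 of the thread_info flags (x86, 3.x-era kernels).
TIF_NEED_RESCHED = 3


def retint_careful_branch(ti_flags):
    """Model `bt $TIF_NEED_RESCHED,%edx; jnc retint_signal`.

    bt copies the selected bit of %edx into the carry flag (CF);
    jnc jumps when CF == 0, i.e. when no reschedule is pending.
    """
    carry = (ti_flags >> TIF_NEED_RESCHED) & 1
    if not carry:
        return "retint_signal"  # no reschedule: go handle signals / other work
    return "schedule"           # fall through to SCHEDULE_USER


print(retint_careful_branch(1 << TIF_NEED_RESCHED))
print(retint_careful_branch(0))
```

A thread spinning in user mode gets its need-resched bit set by the scheduler tick once its share of CPU time is used up, so on the next interrupt return it takes the "schedule" branch.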
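A question arises: why doesn't the stack show a frame for schedule()?
The call made by SCHEDULE_USER does push a return address, which is exactly the retint_careful+0x14 entry seen above. The frames of schedule() itself are missing because scheduler functions are marked __sched, and the x86 stack-trace code behind /proc/<pid>/stack skips return addresses that fall inside those functions.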
2) Process state analysis
The top output shows that the affected threads are in state R and that the CPUs are almost fully busy, mostly in user mode:
[root@vmc116 ~]# top
top - 09:42:23 up 16 days, 2:21, 23 users, load average: 84.08, 84.30, 83.62
Tasks: 1037 total, 85 running, 952 sleeping, 0 stopped, 0 zombie
Cpu(s): 97.6%us, 2.2%sy, 0.2%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32878852k total, 32315464k used, 563388k free, 374152k buffers
Swap: 35110904k total, 38644k used, 35072260k free, 28852536k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27074 root 20 0 5316m 163m 14m R 10.2 0.5 321:06.17 z_itask_templat
27084 root 20 0 5316m 163m 14m R 10.2 0.5 296:23.37 z_itask_templat
27085 root 20 0 5316m 163m 14m R 10.2 0.5 337:57.26 z_itask_templat
27095 root 20 0 5316m 163m 14m R 10.2 0.5 327:31.93 z_itask_templat
27102 root 20 0 5316m 163m 14m R 10.2 0.5 306:49.44 z_itask_templat
27113 root 20 0 5316m 163m 14m R 10.2 0.5 310:47.41 z_itask_templat
25730 root 20 0 5316m 163m 14m R 10.2 0.5 283:03.37 z_itask_templat
30069 root 20 0 5316m 163m 14m R 10.2 0.5 283:49.67 z_itask_templat
13938 root 20 0 5316m 163m 14m R 10.2 0.5 261:24.46 z_itask_templat
16326 root 20 0 5316m 163m 14m R 10.2 0.5 150:24.53 z_itask_templat
6795 root 20 0 5316m 163m 14m R 10.2 0.5 100:26.77 z_itask_templat
27063 root 20 0 5316m 163m 14m R 9.9 0.5 337:18.77 z_itask_templat
27065 root 20 0 5316m 163m 14m R 9.9 0.5 314:24.17 z_itask_templat
27068 root 20 0 5316m 163m 14m R 9.9 0.5 336:32.78 z_itask_templat
27069 root 20 0 5316m 163m 14m R 9.9 0.5 338:55.08 z_itask_templat
27072 root 20 0 5316m 163m 14m R 9.9 0.5 306:46.08 z_itask_templat
27075 root 20 0 5316m 163m 14m R 9.9 0.5 316:49.51 z_itask_templat
. . .
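top lists many processes, so it can help to count the states of only this process's threads. The sketch below reads the state field from each /proc/<pid>/task/<tid>/stat file; the helper names are hypothetical. The command name is wrapped in parentheses and may itself contain spaces or ')', so the line is split on the last ')'.

```python
import os


def task_state(stat_line):
    """Return the one-letter state (R, S, D, ...) from a /proc/.../stat line."""
    after_comm = stat_line.rsplit(")", 1)[1]  # comm may contain spaces or ')'
    return after_comm.split()[0]


def count_states(pid):
    """Return a dict mapping state letter to the number of threads of `pid` in it."""
    counts = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                state = task_state(f.read())
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
        counts[state] = counts.get(state, 0) + 1
    return counts
```

Calling `count_states(27007)` on the affected machine should show almost all threads in R if they are spinning rather than sleeping.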
3) Process scheduling information
Check the scheduling statistics of the affected thread:
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15681811525768 129628804592612 3557465
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15682016493013 129630684625241 3557509
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15682843570331 129638127548315 3557686
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15683323640217 129642447477861 3557793
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15683698477621 129645817640726 3557875
The scheduling statistics keep increasing, which means the thread keeps being scheduled and run, and its state stays R. This suggests an infinite loop (or a deadlock that never sleeps, such as spinning) in user mode. The three schedstat fields are the time spent running on a CPU, the time spent waiting on a run queue (both in nanoseconds), and the number of timeslices run.
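The schedstat samples can be turned into rates. The sketch below takes the first and last samples from the transcript above and computes how much CPU time the thread received in between. The ratio run / (run + wait) is meaningful only for a thread that never sleeps (always R), which is the case here; the helper names are hypothetical.

```python
def parse_schedstat(text):
    """Split a schedstat line into (run_ns, wait_ns, timeslices)."""
    run_ns, wait_ns, slices = (int(x) for x in text.split()[:3])
    return run_ns, wait_ns, slices


def schedstat_delta(before, after):
    """Return the deltas between two samples plus the on-CPU share of that interval."""
    r0, w0, s0 = parse_schedstat(before)
    r1, w1, s1 = parse_schedstat(after)
    d_run, d_wait, d_slices = r1 - r0, w1 - w0, s1 - s0
    # For an always-runnable thread, run + wait covers the whole interval.
    share = d_run / (d_run + d_wait) if d_run + d_wait else 0.0
    return d_run, d_wait, d_slices, share


first = "15681811525768 129628804592612 3557465"  # first sample in the transcript
last = "15683698477621 129645817640726 3557875"   # last sample in the transcript
d_run, d_wait, d_slices, share = schedstat_delta(first, last)
print(f"ran {d_run / 1e9:.2f}s, waited {d_wait / 1e9:.2f}s, "
      f"{d_slices} timeslices, on-CPU share {share:.1%}")
# prints: ran 1.89s, waited 17.01s, 410 timeslices, on-CPU share 10.0%
```

The thread ran for about 1.89 s out of roughly 18.9 s between samples, about 10%, which is consistent with the %CPU that top reports for each thread.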
Another question: why does top show each thread using only about 10% CPU, rather than the 100% usually seen with a process stuck in an infinite loop?
Because there are many threads at the same priority, CFS divides CPU time evenly among them and does not let any single thread monopolize a CPU. The threads are scheduled in turn, and together they consume all the CPUs.
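As a rough rule, with equal priorities and T always-runnable threads on C CPUs (T larger than C), CFS gives each thread about C/T of one CPU. The sketch below encodes that approximation; it ignores other load on the machine, and the function name is hypothetical.

```python
def fair_share_percent(n_cpus, n_runnable):
    """Approximate %CPU (of one CPU, as top reports it) per equal-priority,
    always-runnable thread under CFS."""
    if n_runnable <= n_cpus:
        return 100.0  # enough CPUs: each thread can run all the time
    return 100.0 * n_cpus / n_runnable


print(fair_share_percent(8, 80))
```

With a hypothetical 8 CPUs and 80 spinning threads, each thread gets about 10%. The article does not state the CPU count of this machine, so these numbers are only illustrative.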
Another question: why doesn't the kernel report a soft lockup in this case?
A soft lockup is reported only when the per-CPU watchdog kernel thread (a real-time thread at the highest priority) cannot run for too long. The business threads have ordinary priority and run in preemptible user mode, so they never keep the watchdog thread off the CPU, and no soft lockup is reported.
Another question: why is the thread always blocked in retint_careful whenever its stack is viewed, and never anywhere else?
Because, for a thread spinning in user mode, returning from an interrupt is essentially the only point at which it can be scheduled out (other cases aside). Viewing the stack also depends on scheduling: the thread's stack is seen after it has been switched out so that the viewing process (the cat command) can run, and that switch happens on interrupt return. So the blocking point seen is always retint_careful.
4) User-mode analysis
The analysis above suggests an infinite loop (or a non-sleeping deadlock) in user mode.
How to confirm it in user mode:
Install the debug symbols, attach gdb to the process, inspect the thread stacks, and analyze them together with the code logic.
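The gdb step can be run non-interactively and repeated, so that snapshots can be compared. The sketch below builds a batch gdb command (`-p`, `-batch` and `-ex` are standard gdb options; `thread apply all bt` prints a backtrace for every thread). Attaching stops the process briefly and requires ptrace permission (usually root). Treating frames that recur across snapshots for the busy threads as loop candidates is a heuristic, and the helper names are hypothetical.

```python
import subprocess


def gdb_all_thread_backtraces(pid):
    """Build a gdb command that attaches to `pid`, prints every thread's
    backtrace, and detaches when the batch run ends."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_backtraces(pid, rounds=3):
    """Take several backtrace snapshots of `pid` and return their text."""
    outputs = []
    for _ in range(rounds):
        res = subprocess.run(gdb_all_thread_backtraces(pid),
                             capture_output=True, text=True, check=False)
        outputs.append(res.stdout)
    return outputs
```

For the process in this article, `sample_backtraces(27007)` would be the starting point; without the debug symbols, the backtraces show only addresses or bare function names.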
This confirmed that the user-mode process was indeed stuck in an infinite loop.
That is how to handle an infinite loop in a user-mode process on a Linux system: first analyze the cause of the problem, then deal with it accordingly.