Crashes and Hangs in Linux and How to Deal with Them

  

The stability of Linux has long been one of the strongest arguments that critics of the ever more widespread Windows can make. Crashes and hangs are relatively rare on Linux, but when one strikes unexpectedly it can easily leave people in serious trouble. Learning the common ways to keep these failures from happening helps Linux system administrators stay out of that predicament.

In an interview with this site, Mark Wilding and Dan Behman, co-authors of the new book "Self-Service Linux: Mastering the Art of Problem Determination", describe some relatively simple and straightforward ways to prevent and fix crashes and hangs on Linux systems.

It is generally believed that Linux servers never crash. Yet crashes and hangs do happen from time to time. How does a crash or hang at the application level differ from one at the kernel level?

Mark Wilding: A crash or hang at the application level is confined to a specific thread or process. It does not cause other threads or processes running on the same system to crash or hang. A crash or hang at the kernel level, however, affects every process running on the system.

What is the difference between a crash and a hang?

Dan Behman: At either level, crashes and hangs have essentially the same characteristics. A hang occurs when a process or thread is blocked, usually because it is waiting for a lock or for a hardware resource that is busy. Waiting for locks and resources is perfectly normal; it becomes a hang only when the lock or resource being waited for can never be obtained.
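Waiting for a lock is normal; the trouble is a wait that never ends. As a minimal Python sketch (the helper name `acquire_or_report` is hypothetical, not from the book), an application can bound its lock waits with the standard `threading.Lock.acquire(timeout=...)`, so that an endless wait turns into a reported event rather than a silent hang:

```python
import threading


# Hypothetical helper (not from the interview): wait for a lock, but give up
# after `timeout` seconds so an endless wait becomes a visible event.
def acquire_or_report(lock: threading.Lock, timeout: float, name: str) -> bool:
    if lock.acquire(timeout=timeout):
        return True  # caller now holds the lock and must release it
    # The deadline passed while someone else still held the lock.
    print(f"still waiting for {name} after {timeout:.1f}s: possible hang")
    return False


if __name__ == "__main__":
    resource = threading.Lock()
    resource.acquire()  # simulate a holder that never releases the lock
    if not acquire_or_report(resource, timeout=0.2, name="resource"):
        print("gave up instead of blocking forever")
```

Whether to retry, log, or fail after the timeout is an application decision; the point is only that the wait no longer stays invisible.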

It is also important to note that a hang is sometimes misdiagnosed. For example, a resource may be extremely busy at a particular moment, so a process or thread that needs it has to wait a very long time before the resource becomes free. Users usually cannot see how busy the resource is; they only see that the process is waiting, so they conclude that the system has hung. In fact the system is still working through its normal workflow, only more slowly.
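One way to tell "slow" from "hung", sketched in Python under the assumption that the application can expose a progress counter (the `Progress` class and `classify` function below are hypothetical), is to check whether the counter advances at all during an observation window:

```python
import threading
import time


class Progress:
    """Hypothetical progress counter: a worker calls tick() after each unit of work."""

    def __init__(self) -> None:
        self._count = 0
        self._lock = threading.Lock()

    def tick(self) -> None:
        with self._lock:
            self._count += 1

    def value(self) -> int:
        with self._lock:
            return self._count


def classify(progress: Progress, window: float) -> str:
    """'progressing' if the counter moved during `window` seconds, else 'stalled'."""
    before = progress.value()
    time.sleep(window)
    return "progressing" if progress.value() > before else "stalled"


if __name__ == "__main__":
    p = Progress()
    stop = threading.Event()

    def slow_worker() -> None:
        # Every unit of work is slow (a busy resource), but each one finishes.
        while not stop.is_set():
            time.sleep(0.05)
            p.tick()

    t = threading.Thread(target=slow_worker)
    t.start()
    print("while the worker runs:", classify(p, window=0.3))
    stop.set()
    t.join()
    print("after the worker stops:", classify(p, window=0.3))
```

A single quiet window is not proof of a hang; the window should be much longer than one unit of work normally takes.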

A crash is different from a hang. It is usually caused by an unexpected hardware or software error. When such an error occurs, an exception handler will most likely invoke diagnostic and reporting routines, which gives a good chance of tracing the cause of the error.
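In Python, the standard `faulthandler` module is one example of such a handler: it catches fatal signals and writes a traceback before the process dies. The sketch below runs the crash in a child interpreter so that the parent survives to show the report; the deliberate `ctypes.string_at(0)` read is the same trick the `faulthandler` documentation uses to provoke a segmentation fault.

```python
import signal
import subprocess
import sys

# Child program: enable faulthandler, then deliberately read address 0.
# faulthandler.enable() installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
# and SIGILL that write the Python traceback before the process dies.
CHILD = """
import ctypes, faulthandler, sys
faulthandler.enable(file=sys.stderr)
ctypes.string_at(0)  # invalid memory access -> SIGSEGV
"""


def run_crashing_child() -> subprocess.CompletedProcess:
    return subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)


if __name__ == "__main__":
    result = run_crashing_child()
    if result.returncode < 0:  # negative return code: killed by that signal
        print("killed by", signal.Signals(-result.returncode).name)
    print(result.stderr)  # the report written by the handler at crash time
```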

A crash can be seen as a fatal problem that has to be analyzed after the fact. A hang can be regarded as a live problem, which can be analyzed and resolved while it is happening.

I know that one of the biggest advantages of Linux is that its source code is open. Apart from that, are there other reasons why crashes are easier to resolve on Linux than on other operating systems?

Behman: Because the source code is open, there is plenty of reference documentation at every level of the Linux system. And because the source is open, its development community is equally open. You can take a problem straight to the Linux kernel developers, including the original authors of the code, or even Linus Torvalds himself, and asking for help takes nothing more than an email. As far as I know, operating systems that are not open source lack this.

What are the difficulties and challenges in dealing with hangs?

Wilding: An application can hang for many reasons, including problems in kernel space. This means the cause is sometimes outside the developer's control. But this is where Linux has an advantage: all of the source code is open, so if a process is blocked in the kernel, you can go to the source code and see what the process is doing inside the kernel. In most cases, however, such in-depth investigation is not necessary. To find out why a process has hung, application developers need to examine carefully the state of the software and the evidence it leaves at the application level.
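On Linux, a first look at where a blocked process sits in the kernel does not require reading kernel source: `/proc/<pid>/task/<tid>/status` gives each thread's scheduler state, and `/proc/<pid>/task/<tid>/wchan` names the kernel function it is sleeping in (`/proc/<pid>/stack` shows the full kernel stack but usually requires root). A minimal Python sketch, with the hypothetical helper `kernel_wait_info`:

```python
import os
from pathlib import Path
from typing import Dict


def kernel_wait_info(pid: int) -> Dict[str, str]:
    """Map each thread id of `pid` to its scheduler state and kernel wait channel.

    Linux only. wchan names the kernel function the thread is sleeping in
    ("0" when it is running); some kernels hide it from unprivileged users.
    """
    info: Dict[str, str] = {}
    tasks = sorted(Path(f"/proc/{pid}/task").iterdir(), key=lambda p: int(p.name))
    for task in tasks:
        try:
            status = (task / "status").read_text()
            wchan = (task / "wchan").read_text().strip()
        except OSError:
            continue  # the thread exited, or access was denied
        state = next(
            (line.split(":", 1)[1].strip() for line in status.splitlines()
             if line.startswith("State:")),
            "?",
        )
        info[task.name] = f"{state} wchan={wchan or '-'}"
    return info


if __name__ == "__main__":
    for tid, desc in kernel_wait_info(os.getpid()).items():
        print(tid, desc)
```

A thread that stays in state `D` (uninterruptible sleep) with the same wait channel across repeated samples is a hint that the wait is happening in kernel space.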

Users and support staff generally do not understand how the application works internally, and they cannot investigate at the source-code level, so when a hang occurs they have to handle it as best they can. For example, process A may be waiting for a resource that process B will release only when it finishes, while process B is waiting for a resource that process A holds. This is the so-called "deadlock", a problem that occurs frequently in complex applications and one of the first explanations to check when diagnosing a hang.
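If you know which process is waiting for which, checking for the A/B situation above is a cycle search. A minimal Python sketch, assuming each blocked process waits on exactly one other (the `waits_for` mapping and `find_deadlock` are hypothetical illustrations, not a tool from the book):

```python
from typing import Dict, List, Optional


def find_deadlock(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return the processes in one wait cycle, or None if there is no cycle.

    waits_for maps each blocked process to the process holding what it needs.
    When every process waits on at most one other, a cycle here is a deadlock.
    """
    for start in waits_for:
        position: Dict[str, int] = {}
        path: List[str] = []
        node = start
        # Follow the chain of "is waiting for" edges until it ends or repeats.
        while node in waits_for and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waits_for[node]
        if node in position:
            return path[position[node]:]
    return None


if __name__ == "__main__":
    # A waits for B and B waits for A: a deadlock. C merely waits behind A.
    print(find_deadlock({"A": "B", "B": "A", "C": "A"}))  # ['A', 'B']
    # A waits for B, which is merely slow: no cycle.
    print(find_deadlock({"A": "B"}))  # None
```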

If you do not know exactly what processes A and B are waiting for, you cannot even tell whether this is a deadlock, and you have no choice but to kill both processes and start them again. Situations like this are why it is so important for application software to keep complete track of the resources and locks it holds and waits for: that information helps solve this harder class of problem.
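A minimal Python sketch of that kind of bookkeeping, assuming the application routes its locking through a wrapper (the `TrackedLock` class and `lock_report` function are hypothetical): each lock records its current owner and the threads queued on it, so at hang time the application can report exactly who is waiting for whom.

```python
import threading
import time
from typing import Dict, List, Optional

_guard = threading.Lock()            # protects the bookkeeping below
_registry: List["TrackedLock"] = []  # every TrackedLock ever created


class TrackedLock:
    """Hypothetical lock wrapper that records its owner and its waiters."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()
        self.owner: Optional[str] = None
        self.waiters: List[str] = []
        with _guard:
            _registry.append(self)

    def acquire(self) -> None:
        me = threading.current_thread().name
        with _guard:
            self.waiters.append(me)
        self._lock.acquire()  # may block; we stay listed as a waiter meanwhile
        with _guard:
            self.waiters.remove(me)
            self.owner = me

    def release(self) -> None:
        with _guard:
            self.owner = None
        self._lock.release()

    def __enter__(self) -> "TrackedLock":
        self.acquire()
        return self

    def __exit__(self, *exc: object) -> None:
        self.release()


def lock_report() -> Dict[str, dict]:
    """Snapshot of every tracked lock: who holds it and who is waiting."""
    with _guard:
        return {lk.name: {"owner": lk.owner, "waiters": list(lk.waiters)} for lk in _registry}


if __name__ == "__main__":
    db = TrackedLock("db")
    held, finish = threading.Event(), threading.Event()

    def holder() -> None:
        with db:
            held.set()
            finish.wait()  # keep holding the lock until told to stop

    def contender() -> None:
        with db:
            pass

    t1 = threading.Thread(target=holder, name="worker-1")
    t1.start()
    held.wait()
    t2 = threading.Thread(target=contender, name="worker-2")
    t2.start()
    time.sleep(0.1)       # give worker-2 time to block on the lock
    print(lock_report())  # worker-1 owns "db"; worker-2 should be listed as waiting
    finish.set()
    t1.join()
    t2.join()
```

Feeding this report into a cycle check turns "the two processes seem stuck" into a definite yes or no on deadlock.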

Behman: Another challenge with hangs is that a process or thread usually does not know whether, or when, it has hung. Crashes are different: when a crash occurs, the process can catch most of the signals involved, and signal handlers can be installed to deal with the situation, for example by dumping memory or producing a stack trace. When a hang occurs, this kind of special handling is not completely impossible, but it tends to be ad hoc rather than built in.
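Because a hung thread cannot report on itself, the check has to come from outside it. In Python, the standard `faulthandler.dump_traceback_later()` starts a watchdog thread that writes every thread's stack if a deadline passes; the sketch below wraps it in a hypothetical `run_with_watchdog` helper:

```python
import faulthandler
import sys
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_watchdog(work: Callable[[], T], timeout: float) -> T:
    """Run work(); if it is still running after `timeout` seconds, write the
    stack of every thread to stderr. The process is not killed (exit=False)."""
    faulthandler.dump_traceback_later(timeout, repeat=False, file=sys.stderr, exit=False)
    try:
        return work()
    finally:
        faulthandler.cancel_dump_traceback_later()  # finished: disarm the watchdog


if __name__ == "__main__":
    # Simulated hang: the work outlasts the watchdog, so a traceback is
    # written to stderr while it runs; the work itself still completes.
    run_with_watchdog(lambda: time.sleep(0.5), timeout=0.1)
    print("fast work returned", run_with_watchdog(lambda: 42, timeout=5.0))
```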

When a hang occurs, people often restart the system or the application. Keep in mind that at the moment of a hang, much of the information and evidence needed to diagnose the problem exists only in the running kernel and application. If you restart immediately without collecting it, you will never be able to diagnose the problem, and so you cannot prevent it from happening again.

In some critical environments, system stability and reliability depend directly on how quickly problems are diagnosed and resolved. It is therefore essential to follow a sensible rule: "collect the diagnostic data first, then restart".
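A minimal Python sketch of "collect first, then restart" for a single hung process on Linux (the helper `collect_then_kill`, the file list, and the directory naming are assumptions, not a procedure from the book): save the process's `/proc` entries to a timestamped directory, and only then send the signal.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# /proc/<pid> entries worth keeping before a hung process is killed (Linux only).
PROC_FILES = ["status", "stat", "wchan", "cmdline", "maps"]


def collect_then_kill(pid: int, outdir: Path, sig: int = signal.SIGTERM) -> Path:
    """Hypothetical helper: snapshot /proc/<pid> into a timestamped directory,
    and only after that send `sig` to the process."""
    dest = outdir / f"pid{pid}-{time.strftime('%Y%m%d-%H%M%S')}"
    dest.mkdir(parents=True, exist_ok=True)
    for name in PROC_FILES:
        try:
            data = (Path(f"/proc/{pid}") / name).read_bytes()
        except OSError as err:  # permission denied, process already gone, ...
            data = f"unavailable: {err}\n".encode()
        (dest / name).write_bytes(data)
    os.kill(pid, sig)  # the evidence is on disk; now take the process down
    return dest


if __name__ == "__main__":
    # Stand-in for a hung application: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    with tempfile.TemporaryDirectory() as tmp:
        saved = collect_then_kill(child.pid, Path(tmp))
        child.wait()
        print("saved:", sorted(p.name for p in saved.iterdir()))
        print("child exit status:", child.returncode)
```

The same idea scales up: anything that will vanish with the process, such as stack traces or trace output, belongs in that directory before the restart.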

Compared with a crash, what is the first thing to do when a hang occurs?

Behman: Dealing with a hang at the kernel level is very different from dealing with one at the application level.

If you are asking about the application level: when a crash occurs, a special mechanism called signal handling is invoked, and it can do a variety of things, such as dumping memory and producing a stack trace. So when you are faced with a crash, the first job is generally to collect, organize, and analyze that data.

When a hang occurs, this data is not collected automatically; collecting it is usually a manual process. The two key pieces of data to collect for a hang are trace output and stack tracebacks. Trace output shows what the process is doing, because the tracer monitors the process continuously; for example, it shows whether the process is still doing any work at all. A stack traceback shows which part of the source code the process is currently executing. This is very important for developers, because it lets them work out why the process hung.
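For the trace-output side, a tool such as `strace -p <pid>` attaches to a running process and shows its system calls. For the stack side, a Python program can be prepared in advance, as in this sketch (both helper names are hypothetical): `faulthandler.register()` makes `kill -USR1 <pid>` dump every thread's stack without stopping the process, and `sys._current_frames()` lets the program save the same information to a log itself.

```python
import faulthandler
import signal
import sys
import threading
import traceback


def enable_stack_dump_on_sigusr1() -> None:
    """After this call, `kill -USR1 <pid>` makes the process write every
    thread's stack to stderr and keep running (POSIX only)."""
    faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)


def snapshot_stacks() -> str:
    """Return the current stack of every thread as text, e.g. to save in a log."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"--- thread {names.get(ident, ident)} ---\n")
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


if __name__ == "__main__":
    enable_stack_dump_on_sigusr1()
    print(snapshot_stacks())
```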

What are the main causes of crashes and hangs?

Wilding: Crashes fall into two main types: preventive crashes and error-handling crashes. A preventive crash happens when the kernel or an application detects a serious condition. The software recognizes the problem and deliberately "commits suicide" to stop further errors from occurring, thereby avoiding more serious damage. An error-handling crash means that something illegal has happened in memory, such as an access to an invalid address; almost all of these are programming errors. In this case the hardware detects the fault, and a signal is sent to stop the software from proceeding.
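At the application level on Linux, the two kinds of crash usually end with different signals: a preventive crash typically ends in `SIGABRT` because the program calls `abort()` itself, while an invalid memory access ends in `SIGSEGV`, sent by the kernel after the hardware fault. A small Python sketch that runs one of each in a child interpreter and reports the terminating signal (the invariant in `PREVENTIVE` is made up for illustration):

```python
import signal
import subprocess
import sys

# Preventive crash: the program detects an impossible state and aborts on purpose.
PREVENTIVE = """
import os
balance = -5           # hypothetical invariant: a balance must never be negative
if balance < 0:
    os.abort()         # deliberate "suicide" before the bad state spreads (SIGABRT)
"""

# Error-handling crash: an invalid memory access trapped by the hardware.
FAULT = """
import ctypes
ctypes.string_at(0)    # read address 0: the MMU faults and the kernel sends SIGSEGV
"""


def terminating_signal(code: str) -> str:
    """Run `code` in a child interpreter and report how it ended."""
    rc = subprocess.run([sys.executable, "-c", code], capture_output=True).returncode
    return signal.Signals(-rc).name if rc < 0 else f"exit status {rc}"


if __name__ == "__main__":
    print("preventive:", terminating_signal(PREVENTIVE))
    print("error-handling:", terminating_signal(FAULT))
```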

Hangs generally have two causes. The first is a process or thread waiting for a resource, a wait that may or may not ever be satisfied. Another process or thread controls the resource and keeps holding it while it is itself waiting, so everyone else can only wait too. One example is a process that holds a lock on an important resource while it waits indefinitely for data from the network. The second common cause is a circular wait, in which two or more processes are each waiting for resources held by the others and so fall into a deadlock. This can be resolved by releasing one of the locks, or by dealing with the shared memory involved.
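Releasing a lock breaks a cycle after it has formed. A common way to prevent circular waits in the first place, not something the interview itself describes, is to make every thread acquire locks in one global order. A minimal Python sketch, ordering locks by `id()` (the helper names are hypothetical):

```python
import threading
from typing import List, Sequence

# threading.Lock() returns an instance of this type; used only for annotations.
LockType = type(threading.Lock())


def acquire_in_order(locks: Sequence[LockType]) -> List[LockType]:
    """Acquire all `locks` in one global order (here, by id()).

    If every thread follows the same order, no thread can hold a later lock
    while waiting for an earlier one, so a circular wait cannot form.
    """
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered: List[LockType]) -> None:
    for lock in reversed(ordered):
        lock.release()


def worker(first: LockType, second: LockType, rounds: int) -> None:
    # Callers name the locks in different orders; acquire_in_order normalizes it.
    for _ in range(rounds):
        held = acquire_in_order([first, second])
        release_all(held)


if __name__ == "__main__":
    a, b = threading.Lock(), threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(a, b, 10_000)),
        threading.Thread(target=worker, args=(b, a, 10_000)),  # opposite order
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("both threads finished: no deadlock")
```

Without the sorting step, the two workers above could each grab their first lock and wait forever for the other's, which is exactly the A/B deadlock from the interview.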

When crashes and hangs occur, what basic investigative guidelines can administrators apply?

Wilding: One of the best basic guidelines is to stay organized. It is very important to keep the data you collect in a clearly defined place so that it can easily be found later. This is especially useful when you are dealing with several problems at the same time.

Behman: Another basic guideline is to collect data quantitatively rather than qualitatively. For example, "At 6 o'clock last night, the system was running low on memory" is a qualitative observation, and it is of little help in solving the problem. The quantitative version of this example would capture and save the complete output of the command that reports memory usage, along with the output of other related diagnostic commands. The aim is to collect enough data the first time so that the problem does not have to recur; this is the "one-time" approach: rather than reproducing the problem several times to gather more complete data, you get what you need from a single occurrence.
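A minimal Python sketch of quantitative collection on Linux (the helpers and the JSON-lines log file are assumptions for illustration): rather than noting that memory "was low", append timestamped snapshots of `/proc/meminfo` to a file kept with the rest of the problem's data.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Dict


def read_meminfo() -> Dict[str, int]:
    """Parse /proc/meminfo (Linux) into {field: value}; most values are in kB."""
    fields: Dict[str, int] = {}
    for line in Path("/proc/meminfo").read_text().splitlines():
        key, value = line.split(":", 1)
        fields[key] = int(value.split()[0])
    return fields


def record_sample(log: Path) -> Dict[str, object]:
    """Append one timestamped snapshot to `log` as a JSON line."""
    sample: Dict[str, object] = {"time": time.strftime("%Y-%m-%dT%H:%M:%S"), **read_meminfo()}
    with log.open("a") as f:
        f.write(json.dumps(sample) + "\n")
    return sample


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = record_sample(Path(tmp) / "meminfo.jsonl")
        print(sample["time"], "MemTotal kB:", sample["MemTotal"],
              "MemAvailable kB:", sample.get("MemAvailable"))
```

Run periodically, for example from cron or a loop, such a log can answer "how low was memory, and when?" after the fact without waiting for the problem to happen again.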

Copyright © Windows knowledge All Rights Reserved