There are many reasons why Web sites may stop working properly, which makes it difficult to check for every issue systematically. The following focuses on the most common problems that cause Web sites to crash. Being able to solve these general problems also builds the ability to deal with unexpected situations.
The disk is full
The most likely reason a system stops working properly is a full disk. A good network administrator keeps a close eye on disk usage and, from time to time, moves part of the data on disk to backup storage media (such as tape).
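As a minimal sketch of the kind of check an administrator might schedule periodically, the Python functions below report how full the file system holding a given path is, using the standard library's `shutil.disk_usage`. The function names and the 85% threshold are illustrative assumptions, not values taken from any particular monitoring tool.

```python
import shutil


def disk_usage_fraction(path: str = "/") -> float:
    """Return the fraction (0.0 to 1.0) of the file system holding `path` in use."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return usage.used / usage.total


def needs_attention(path: str = "/", threshold: float = 0.85) -> bool:
    """True when the file system is at least `threshold` full.

    The 0.85 default is an illustrative assumption; choose a cutoff that
    leaves time to move old data to backup media before the disk fills.
    """
    return disk_usage_fraction(path) >= threshold
```

A scheduled job could call `needs_attention` on each important file system and alert an administrator whenever it returns True.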
Log files can quickly consume all available disk space. Web server log files, SQL*Net log files, JDBC log files, and application server log files all behave like memory leaks: they keep growing until something stops them. One precaution is to keep log files on a different file system from the operating system. The web server will still hang when the log file system fills up, but the chance of the machine itself hanging is greatly reduced.
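Keeping logs on their own file system limits the damage; log rotation is a complementary technique that caps how much space the logs can use at all. A minimal Python sketch using the standard library's `logging.handlers.RotatingFileHandler`, assuming the application writes its own log; the helper names and size limits are hypothetical:

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def make_rotating_logger(path: str, max_bytes: int = 10 * 1024 * 1024,
                         backup_count: int = 5) -> logging.Logger:
    """Return a logger whose files together stay near (backup_count + 1) * max_bytes.

    When the active file would exceed max_bytes it is renamed to path.1
    (older copies shift to .2, .3, ...) and the oldest beyond backup_count
    is deleted, so the log set cannot grow without bound.
    """
    logger = logging.getLogger(f"rotating:{path}")
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                  backupCount=backup_count, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep these records out of the root logger
    return logger


def log_file_sizes(path: str) -> list[int]:
    """Sizes in bytes of the active log file and its rotated backups."""
    directory = os.path.dirname(path) or "."
    base = os.path.basename(path)
    return [os.path.getsize(os.path.join(directory, name))
            for name in os.listdir(directory) if name.startswith(base)]
```

With a 10 MB limit and five backups, the log set stays around 60 MB no matter how long the server runs.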
C pointer error
Programs written in C or C++, such as Web server API modules, can crash the system: a single faulty pointer dereference (that is, an access to the memory a pointer points to) causes the operating system to terminate the process, and for a module running inside the server process that means the whole Web server. The Java counterpart of a bad C pointer is a null object reference. Dereferencing a null reference in Java throws a NullPointerException instead of crashing the JVM outright, so the program can keep running as long as the programmer handles the exception properly. In this respect Java is more forgiving, but the extra runtime checks that buy this reliability cost some performance.
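To illustrate the exception handling this relies on, here is a minimal sketch in Python, where dereferencing `None` (Python's null) raises an exception much as a null reference does in Java. The handler, the request shape, and the status codes are assumptions for illustration; a real server framework would supply its own equivalents.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("server")


def handle_safely(handler: Callable[[dict], str], request: dict) -> Tuple[int, str]:
    """Run one request handler, converting any exception into a 500 response.

    The failure is logged and confined to this request; the worker that
    called us keeps running and can serve the next request.
    """
    try:
        return 200, handler(request)
    except Exception:
        log.exception("handler failed for %s", request.get("path"))
        return 500, "Internal Server Error"


def greeting_page(request: dict) -> str:
    """Hypothetical handler containing a null-reference bug."""
    user: Optional[dict] = request.get("user")  # None when nobody is logged in
    return "Hello, " + user["name"]  # raises TypeError when user is None
```

The buggy request costs one error response instead of the whole server process.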
Memory leaks
C/C++ programs can also suffer from another pointer problem: losing the reference to allocated memory. This typically happens when a subroutine allocates memory and the program returns from the subroutine without freeing it. The reference to the allocated memory is then lost, and the process holds on to that memory until it exits. As the process keeps growing, system performance degrades, and eventually the machine can grind to a halt once memory is exhausted.
One remedy is to analyze the code carefully with a tool such as Purify to find possible leaks. However, this cannot catch leaks inside libraries whose source code is unavailable. Another approach is to stop and restart processes at regular intervals; this is one reason the Apache web server periodically retires and re-creates its child processes (see its MaxRequestsPerChild directive, renamed MaxConnectionsPerChild in Apache 2.4).
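For Python code, the standard `tracemalloc` module plays a role similar to a leak checker. A minimal sketch, assuming a hypothetical `leaky_handler` that keeps every buffer it allocates in a module-level list: snapshots taken before and after repeated calls show how much memory was retained and which source line is responsible.

```python
import tracemalloc

_kept = []  # module-level list that outlives every call: the "leak"


def leaky_handler(size: int) -> None:
    """Hypothetical handler that never drops the buffer it allocates."""
    _kept.append(bytearray(size))


def find_leak(func, calls: int, *args):
    """Call func repeatedly; return (net bytes grown, "file:line") of the top growth."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(calls):
        func(*args)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Exclude tracemalloc's own allocations (such as the first snapshot's data).
    ignore = (tracemalloc.Filter(False, tracemalloc.__file__),)
    diffs = after.filter_traces(ignore).compare_to(
        before.filter_traces(ignore), "lineno")
    top = diffs[0]  # compare_to sorts by largest absolute size change first
    frame = top.traceback[0]
    return top.size_diff, f"{frame.filename}:{frame.lineno}"
```

Running `find_leak(leaky_handler, 100, 10_000)` reports roughly a megabyte retained and points at the `_kept.append(...)` line.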
Although Java has no pointers, Java programs generally use memory less efficiently than C programs. Java code creates objects frequently, and the garbage collector can reclaim an object's memory only after every reference to it has disappeared. Moreover, in many JVMs the memory the garbage collector reclaims is returned to the Java virtual machine (VM) for reuse rather than to the operating system, so a Java program tends to grow to its full heap and never give it back. Because code generated by the Just In Time (JIT) compiler lives outside the heap, a Java process can sometimes grow to several times its maximum heap size.
Database connections can leak in a similar way: a connection is taken from the connection pool but never returned to it. Some connection pools have an inactivity timer that releases a connection after it has sat idle for a while, but this cannot keep up when badly written code leaks connections quickly.
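The usual cure is to make returning the connection automatic rather than relying on every code path to remember it. A minimal Python sketch of a toy pool (the `ConnectionPool` class is hypothetical; production pools add connection validation and the inactivity timer mentioned above) in which a context manager returns the connection even when the query raises:

```python
import contextlib
import queue


class ConnectionPool:
    """Toy connection pool; `factory` creates one connection object."""

    def __init__(self, factory, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextlib.contextmanager
    def connection(self, timeout: float = 5.0):
        # Blocks up to `timeout` seconds, then raises queue.Empty if exhausted.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # returned on success and on error alike

    def available(self) -> int:
        """Number of idle connections currently in the pool."""
        return self._idle.qsize()
```

Because the return happens in `finally`, a failing query can no longer drain the pool one connection at a time.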
Processes running out of file descriptors
If a web server or other critical process has used up the file descriptors allotted to it and needs more, it will hang or report errors until it can obtain one. File descriptors keep track of open files and open sockets, which are central to a Web server: its main task is copying files to network connections. Each process inherits its descriptor limit from the shell that starts it, and the default is often low (64 on some older systems, 1024 on many current Linux systems), meaning each process can have only that many files and network connections open at once. Most shells provide a built-in ulimit command for raising the limit.
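From inside a program, the same limit can be inspected and raised with the standard `resource` module on Unix-like systems (the shell equivalent is `ulimit -n 4096`). A minimal sketch; the helper name and the target of 4096 are assumptions. An unprivileged process can raise its soft limit only as far as its hard limit.

```python
import resource  # Unix only


def raise_fd_limit(desired: int = 4096) -> int:
    """Raise this process's soft open-file limit toward `desired`.

    Never lowers the limit and never exceeds the hard limit.
    Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < desired:
        # If the hard limit is unlimited, `desired` is allowed; else cap at hard.
        target = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

A server could call this once at startup, before opening its listening socket, so that its thread count and descriptor limit stay in step.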
Thread deadlock
Multithreading improves performance at some cost in reliability, mainly because threads can deadlock. In a deadlock, the first thread waits for the second to release a resource while the second waits for the first. Picture two people meeting face to face on a sidewalk. To let the other pass, each steps aside at the same moment and in the same direction, so neither can get by; then both step the other way at the same time and block each other again. If this continues indefinitely, neither ever gets past, which is the essence of a deadlock. (Strictly speaking, this back-and-forth is called a livelock, but the result is the same: no one makes progress.)
Deadlocks are hard to eliminate because they arise only under very specific conditions, often under very heavy load. Most software testing does not generate enough load, so it cannot expose every threading error. Deadlocks can occur in any language that uses threads. Because threads are easier to program in Java than in C, more Java programmers use them, and deadlocks are correspondingly more common. Synchronizing more coarsely in Java code (for example, guarding related resources with a single lock) can reduce the chance of deadlock, but it also reduces concurrency and therefore performance. Under very heavy load, deadlocks can also occur inside the database.
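One widely used way to prevent this kind of deadlock is to make every thread acquire locks in the same global order, so that no two threads can each hold one lock while waiting for the other. A minimal Python sketch, assuming a hypothetical `Account` class whose locks are ordered by account name:

```python
import threading


class Account:
    """Hypothetical bank account guarded by its own lock."""

    def __init__(self, name: str, balance: int):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move `amount` from src to dst without risking a lock-ordering deadlock.

    Both locks are always taken in the same order (sorted by name), so a
    transfer a->b and a concurrent transfer b->a cannot each hold one lock
    while waiting for the other.
    """
    if src is dst:
        return  # threading.Lock is not reentrant; locking it twice would hang
    first, second = sorted((src, dst), key=lambda a: a.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```

Without the `sorted` step, two threads transferring in opposite directions could each grab their source lock first and then wait forever for the other's.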
If a program uses a persistent lock, such as a lock file, and does not release it when the program ends, other processes may be unable to use that lock at all: they can neither acquire nor release it. This in turn can keep the system from working properly, and the lock must then be removed manually.
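On Unix-like systems, one way to avoid stale locks is to use a kernel advisory lock instead of relying on a lock file's mere existence: a `flock()` lock belongs to an open file, and the kernel releases it when that file is closed, including when the process dies. A minimal Python sketch using the standard `fcntl` module; the helper name and lock path are hypothetical:

```python
import contextlib
import fcntl  # Unix only
import os


@contextlib.contextmanager
def exclusive_lock(path: str):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    The lock is released in `finally`, and the kernel also drops it if the
    process crashes, so no stale lock is left behind. Raises
    BlockingIOError immediately if another holder already has the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

The lock file itself may remain on disk, but it no longer carries any meaning once its holder has exited, so no manual cleanup is needed.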
Server overloading
The Netscape web server uses one thread per connection. Netscape Enterprise Server hangs once all its threads are in use, providing no service even to existing connections. If a load-distribution mechanism detects that a server is not responding, it may shift that server's load to the other web servers, which can then run out of threads one after another until the entire server farm hangs. Meanwhile, the operating system may keep accepting new connections that the application (the web server) cannot service. Users see a "connected" message in the browser's status bar, but nothing further happens.
One solution is to set the obj.conf parameter RqThrottle to a value below the number of threads, so that once RqThrottle simultaneous requests are being handled, the server stops accepting new connections. Clients that cannot connect will be turned away, and connected clients will see slower responses, but at least the server will not hang on the connections it has already accepted. In this setup, the file descriptor limit should be at least as high as the number of threads; otherwise file descriptors become the bottleneck.
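The idea behind RqThrottle, capping concurrent requests below the thread count, can be sketched in a few lines. This is an illustration in Python, not Netscape's implementation: the `RequestThrottle` class and `serve` function are hypothetical, and the sketch answers excess requests with HTTP 503 rather than leaving new connections unaccepted.

```python
import threading


class RequestThrottle:
    """Allow at most `max_concurrent` requests in progress at once."""

    def __init__(self, max_concurrent: int):
        # BoundedSemaphore raises ValueError if released more often than acquired.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_enter(self) -> bool:
        """Claim a slot without waiting; False means the server is at its cap."""
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()


def serve(throttle: RequestThrottle, handler) -> int:
    """Return an HTTP status: 503 when over the cap, else run the handler."""
    if not throttle.try_enter():
        return 503  # Service Unavailable: shed load instead of hanging
    try:
        handler()
        return 200
    finally:
        throttle.leave()
```

Setting `max_concurrent` a little below the worker-thread count keeps a few threads free, so requests already in progress can still finish.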
Not enough temporary tables in the database
Many databases have a fixed number of temporary tables (cursors), the memory areas that hold query results. A temporary table is released once its data has been read, but a large number of simultaneous queries can use up the entire fixed supply. Further queries must then wait in line until a temporary table is released.
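On the application side, one simple habit reduces the pressure: close each cursor as soon as its rows have been read, instead of leaving it open until garbage collection. A minimal Python sketch using the standard `sqlite3` module purely because it ships with Python; SQLite itself does not impose a fixed cursor limit, unlike, for example, Oracle, whose OPEN_CURSORS parameter caps open cursors per session. The helper name is hypothetical.

```python
import sqlite3
from contextlib import closing


def fetch_all(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Run a query, read every row, and close the cursor immediately.

    Closing the cursor as soon as its results are consumed frees the
    resources held for that result set, so finished queries do not keep
    occupying a limited pool of cursors.
    """
    with closing(conn.cursor()) as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```

Reading all rows and closing in one place also keeps cursor lifetimes short and predictable under load.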
Programmers rarely notice this problem, but it shows up during load testing. To a database administrator (DBA), however, it is obvious.
Related problems include insufficient tablespace and sequence limits set too low, either of which can lead to table-overflow errors. These issues show why a good DBA should periodically review the settings and performance of production databases. Most database vendors also provide monitoring and modeling tools to help with these problems.
Many other factors can also stop a Web site from working, such as dependencies between components, subnet traffic overload, faulty device drivers, hardware failures, wildcards that match the wrong files, and key tables locked inadvertently.