Dec 13, 2010

Linux load average

I wanted to write an article about a topic that still generates a lot of confusion: the load average.

The system load average is a set of three numbers reported by tools such as uptime or top. On Linux, these values represent the average number of processes that, over the last 1, 5 and 15 minutes, were either running, waiting for the CPU, or blocked in an uninterruptible wait for some other system resource (most commonly disk access).

[root@centos ~]# uptime
15:43:45 up 9 days,  5:19,  1 user,  load average: 1.62, 1.49, 1.39

[root@centos ~]# top
top - 15:44:32 up 9 days,  5:20,  1 user,  load average: 1.38, 1.43, 1.37
...

In the uptime output above, 1.49 means that over the last 5 minutes an average of 1.49 processes were running or blocked waiting for some resource.
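
As a minimal sketch, the three values can be pulled out of an uptime line like the one above with a few lines of Python. The function name parse_load_average is hypothetical, chosen only for this illustration; on a live system, os.getloadavg() from the standard library returns the same three numbers directly.

```python
import re

def parse_load_average(line):
    """Return the (1, 5, 15)-minute load averages found in an uptime/top header line."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())

# The uptime line shown earlier in this post.
sample = "15:43:45 up 9 days,  5:19,  1 user,  load average: 1.62, 1.49, 1.39"
one, five, fifteen = parse_load_average(sample)
print(one, five, fifteen)
```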

For instance, I usually set a trigger in Zabbix that fires when the 5-minute load average is higher than the number of cores available on the monitored machine.
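
The comparison behind that trigger can be sketched in Python. This is not Zabbix trigger syntax, only the same rule written out by hand; load_exceeds_cores and check_local_host are hypothetical names used for this illustration.

```python
import os

def load_exceeds_cores(load5, cores):
    """True when the 5-minute load average is above the number of cores."""
    return load5 > cores

def check_local_host():
    """Apply the rule to the machine running this script, if the data is available."""
    try:
        _, load5, _ = os.getloadavg()   # (1 min, 5 min, 15 min)
    except (AttributeError, OSError):
        return None                     # load average not obtainable on this platform
    cores = os.cpu_count() or 1         # cpu_count() may return None
    return load_exceeds_cores(load5, cores)

print(check_local_host())
```

With the values from the uptime output above (5-minute load of 1.49), the rule would fire on a single-core machine but not on a dual-core one.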

When an alarm of this type is raised, it does not necessarily mean that the CPU is overloaded. At that point we have to use other Linux tools, such as top, vmstat, iostat, vnstat, etc., to find out which process or processes are involved and which resources are affected.
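
As a rough complement to those tools, the sketch below lists the processes that are currently in state R (running or runnable) or D (uninterruptible wait) by reading /proc/<pid>/stat. It assumes a Linux /proc filesystem and only looks at each process's main thread, so it is a quick first look rather than a replacement for top or iostat. The names parse_stat and load_contributors are hypothetical.

```python
import os

def parse_stat(text):
    """Split the start of a /proc/<pid>/stat record into (pid, name, state)."""
    # The name is wrapped in parentheses and may itself contain spaces or ")",
    # so the last ")" in the record marks where the name ends.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    name = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, name, state

def load_contributors(proc_root="/proc"):
    """Yield (pid, name, state) for processes currently in state R or D."""
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, name, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited meanwhile, or the record was unreadable
        if state in ("R", "D"):
            yield pid, name, state

for pid, name, state in load_contributors():
    print(pid, state, name)
```

Many entries in state D usually point at disk or other I/O waits, while many in state R point at CPU contention.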

Finally, I want to emphasize the first word of the title (Linux), because on traditional UNIX systems these values count only the processes that are using the CPU (running) or waiting for it (runnable).

