How to Monitor IO Performance and Disk Activity on Solaris

On Solaris, "top" gives a quick overall view of IO activity. The "iowait" figure on the "CPU states" line is a good indicator to watch:
$ top
load averages:  5.47,  4.94,  5.10             orcl              19:49:02
475 processes: 147 sleeping, 1 stopped, 7 on cpu
CPU states: 38.0% idle, 37.6% user, 20.5% kernel, 13.9% iowait,  0.0% swap
Memory: 10.0G real, 1.6G free, 4.8G swap in use, 5.5G swap free
   PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS    CPU COMMAND
  9398 oracle     2  0   0  6.1G 50.3M sleep 124.0H    0  6.25% oracle
  9400 oracle     2  0   0  6.1G 52.0M cpu45  71.7H    0  5.42% oracle
  9402 oracle     2  0   0  6.1G 50.5M sleep  37.5H    0  4.68% oracle
...

On Linux, "top" reports the same figure as "wa". Here is the definition of "wa" from the Ubuntu manual page (Ubuntu Manual: top):
wa  --  iowait
Amount of time the CPU has been waiting for I/O to complete.
That is, 50% iowait means the CPUs spent half of the sampled time idle while IO requests were still outstanding. It is a share of CPU time, not a count of processes waiting for IO.
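
If you want to watch "iowait" from a script rather than by eye, the "CPU states" line is easy to pick apart. The following is only a minimal sketch in Python: the function name parse_cpu_states and the 10% alert threshold are assumptions made for illustration, and the sample line is the one copied from the "top" output above.

```python
import re


def parse_cpu_states(line):
    """Turn a top 'CPU states:' line into a {state: percent} dict."""
    # Keep only what follows the first colon, e.g. " 38.0% idle, 37.6% user, ..."
    _, _, rest = line.partition(":")
    states = {}
    # Each entry looks like "<number>% <name>"
    for value, name in re.findall(r"([\d.]+)%\s+(\w+)", rest):
        states[name] = float(value)
    return states


# Line copied from the top output shown earlier in this post.
sample = "CPU states: 38.0% idle, 37.6% user, 20.5% kernel, 13.9% iowait,  0.0% swap"
states = parse_cpu_states(sample)

IOWAIT_ALERT = 10.0  # hypothetical threshold, tune it for your own system
if states.get("iowait", 0.0) > IOWAIT_ALERT:
    print(f"iowait is {states['iowait']}%, worth a closer look with iostat")
```

A high value here only says that IO is slowing the system down somewhere; it does not say which disk is responsible.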

For finer-grained monitoring, use "iostat". Adding "-x" or "-xn" shows how busy each disk is. For example:
$ iostat -x 5
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
...
sd10       56.0    4.9 2514.7   80.3  0.0  0.1    1.1   0   5
sd11      109.0    5.4 3585.9  109.6  0.0  0.1    0.7   0   6
sd12       76.3    4.0 2911.4   82.0  0.0  0.1    0.9   0   5
sd13       37.2    5.7 2000.3   87.3  0.0  0.1    1.6   0   4
...
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
...
sd10        2.6   10.4   20.8   39.9  0.0  0.0    0.6   0   1
sd11       77.0    0.6 3123.2    4.8  0.0  0.0    0.5   0   3
sd12      379.4    0.2 4562.5    1.6  0.0  0.1    0.3   0   5
sd13        5.8    9.6   46.4   33.5  0.0  0.0    0.6   0   1
...
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
...
sd10        1.6    4.4   12.8   42.1  0.0  0.0    1.3   0   1
sd11      206.8    0.8 6059.5    6.4  0.0  0.1    0.6   0   8
sd12      140.6    0.0 3659.4    0.0  0.0  0.1    0.4   0   4
sd13        2.0    4.8   16.0   45.3  0.0  0.0    1.4   0   1
...
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
...
sd10        1.6    7.6   12.8   37.7  0.0  0.0    1.5   0   1
sd11      174.2    0.4 6462.5    3.2  0.0  0.1    0.5   0   6
sd12      613.6    0.0 7833.8    0.0  0.0  0.2    0.4   0   9
sd13        1.8    8.0   14.4   40.9  0.0  0.0    0.9   0   1
...
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
...
sd10        1.6   16.6   12.8   52.4  0.0  0.0    0.7   0   1
sd11       33.6    0.6 1513.5    4.8  0.0  0.0    0.5   0   1
sd12      190.6    0.2 2255.9    1.6  0.0  0.1    0.4   0   3
sd13        0.6   15.6    4.8   44.4  0.0  0.0    0.4   0   1
...

The report refreshes every 5 seconds. Note that the first report shows averages since boot, while each later report covers only the preceding interval. The next question is how to interpret the columns. According to the Oracle documentation (http://docs.oracle.com/cd/E23824_01/html/821-1451/spmonitor-4.html), the columns in "iostat -x" are defined as follows:
r/s     Reads per second
w/s     Writes per second
kr/s    Kbytes read per second
kw/s    Kbytes written per second
wait    Average number of transactions that are waiting for service (queue length)
actv    Average number of transactions that are actively being serviced
svc_t   Average service time, in milliseconds
%w      Percentage of time that the queue is not empty
%b      Percentage of time that the disk is busy
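This is how we trace an IO bottleneck down to specific disks for tuning: the disks whose "%b" stays high, or whose "wait" and "%w" are consistently above zero, are the ones to look at first.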
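
To compare disks across several reports without reading every row, the output can be parsed and averaged per device. The Python below is a rough sketch under a few assumptions: the helper names parse_iostat_x and busiest are made up for this post, the input has the same layout as the "iostat -x" output above, and the first report is skipped because it is the since-boot summary rather than a 5-second sample. The sample text reuses the sd11 and sd12 rows from the first three reports.

```python
from collections import defaultdict


def parse_iostat_x(text):
    """Split `iostat -x` output into reports; each report maps device -> {column: value}."""
    reports, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            # Header row: remember the column names and start a new report
            columns = fields[1:]
            reports.append({})
        elif columns and len(fields) == len(columns) + 1:
            try:
                values = [float(f) for f in fields[1:]]
            except ValueError:
                continue  # not a data row
            reports[-1][fields[0]] = dict(zip(columns, values))
    return reports


def busiest(reports, column="%b", skip_first=True):
    """Average one column per device and sort devices, highest first."""
    # Assumption: the first report is the since-boot summary, so drop it
    samples = reports[1:] if skip_first and len(reports) > 1 else reports
    seen = defaultdict(list)
    for report in samples:
        for device, stats in report.items():
            seen[device].append(stats[column])
    averages = {dev: sum(v) / len(v) for dev, v in seen.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


sample = """\
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd11      109.0    5.4 3585.9  109.6  0.0  0.1    0.7   0   6
sd12       76.3    4.0 2911.4   82.0  0.0  0.1    0.9   0   5
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd11       77.0    0.6 3123.2    4.8  0.0  0.0    0.5   0   3
sd12      379.4    0.2 4562.5    1.6  0.0  0.1    0.3   0   5
                  extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd11      206.8    0.8 6059.5    6.4  0.0  0.1    0.6   0   8
sd12      140.6    0.0 3659.4    0.0  0.0  0.1    0.4   0   4
"""

for device, busy in busiest(parse_iostat_x(sample)):
    print(f"{device}: average %b = {busy}")
```

Passing column="svc_t" or column="wait" to busiest ranks the same disks by service time or queue length instead. In this sample every disk is under 10% busy, so none of them looks like a bottleneck.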
