The nmon Manual
Warranty
- The nmon tool is NOT OFFICIALLY SUPPORTED BY IBM.
- No warranty is given or implied, and you cannot obtain help from IBM.
- Help can be found at the nmon Forum.
The postings on this site solely reflect the personal views of the authors and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.
Briefly
nmon is a free performance monitoring tool for AIX and Linux and is downloadable from this Wiki.
This Wiki is the sole place to get nmon.
nmon now includes other tools like
- nmon2rrd tool for creating web pages with the nmon performance data and
- nmonmerge tool for joining save files together.
There is also a free spreadsheet analyser for nmon captured data from Stephen Atkins from
This nmon tool gives you a huge amount of information on one screen and can save data to a comma separated values (.csv) file for later analysis. This tool runs on:
- AIX 5.1, AIX 5.2 and AIX 5.3 using nmon version 10. This version now supports AIX 5.3 on POWER5 processor based machines with SMT and Shared CPU micro-Partitions
- Linux on POWER based machines, like pSeries p5 and OpenPower, using nmon for Linux version 9e and running SUSE SLES 9, Red Hat EL 3 and 4, or Debian
- Linux on x86 based machines, like Intel and AMD, using nmon for Linux version 9e and running SUSE 9 & SLES 9, Fedora & RedHat EL 2.1, 3 and 4 and many recent distributions based on the Linux 2.4 or 2.6 kernel
- Linux on zSeries/mainframe machines using nmon for Linux version 9e running SUSE and RedHat
- AIX 4 (4.1.5, 4.2.0 and 4.3.3) using nmon version 9f. This version is functionally stabilised and will not be developed further.
Once you have proved these versions are OK, all previous versions of nmon should be deleted.
About the author and contributors
Nigel works in IBM EMEA pSeries Technical Support - specialises in Linux on POWER and AIX, performance, sizing, tools and benchmarks. The nmon tool was first developed to support his personal use in AIX benchmarks and performance tuning but "by popular demand" it is given away to deserving friends. Developing and improving nmon is good for performance skills development but it is not Nigel's job - it is a spare time low priority project.
For assistance, comments or suggestions contact the author at nag@uk.ibm.com - he will do his best to respond, helpfully. People sending bugs or suggestions automatically get put on the nmon email list. Alternatively, email the author with a title line of "nmon mailing list" to get placed directly on the email list which includes a few hundred people. This means you will get an email reminder each time nmon gets updated so you can download the new version. Unless there is a serious problem with nmon this email list is not used more than once every three months.
Also a big thank you to the following nmon developers and testers: Ralf Schmidt-Dannert, Dave Williams, Erik Hendrix, Michael Pearson, Maarten J Kreuger, Jean-Armand Broyelle plus Franck Almarcha and the Montpellier benchmark team, Carol Davis and SAP team, and lastly Robert Reuben (the boss) for management approval for the use of IBM hardware.
Reporting problems
If you are reporting a problem, please read this document first, particularly the Frequently Asked Questions (FAQ) section.
Make the email title line meaningful and not just "nmon bug".
For example, "nmon 9a on AIX 5.3ML5 - WLM stats are missing" and then include:
- The nmon version - like 9a or 10j.
- The hardware platform - like POWER, x86, mainframe.
- The operating system and maintenance level.
Like AIX 5.3 ML2 or for Linux SUSE SLES 9 service pack 1.
- How to reproduce the problem?
Like using nmon online and I hit the w option for WLM
- The symptoms or results you obtained.
Like no WLM statistics were displayed even though WLM is on
- What you were expecting? - like the WLM statistics
- Other information
I have recently upgraded AIX to ML04
If you want an on-screen capture (as an example of the problem), then please restart nmon with the -B option to remove the boxes.
If possible, include a small sample captured file as this has lots of other configuration information in it. Send the raw nmon output (not the massive Excel file) and less than 200 snapshots please.
nmon and Security
It was recently pointed out to me by one of the nmon developers that nmon for AIX runs operating system commands in the background to get some data. Also, as access to /dev/kmem was required for nmon (up to nmon version 9), some sites use the Set UID option with a file owner of root to grant access. Add this together and there is a possible security issue, as users could gain access to root privileges by changing the shell variable PATH.
The answer is for root users to not set SUID for nmon. For added security in nmon 10, this is strictly enforced: the code checks for SUID or SGID with a file owner of root, and nmon will stop immediately with an error message. Also, as nmon starts, the PATH variable is adjusted with the default PATH added to the front of the list, to help ensure the correct commands are run.
This has never been a problem for nmon on Linux.
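The check described above can be rehearsed from the shell before installing nmon. This is a minimal sketch, not nmon's own code: the function name has_setid and the NMON_BIN path are hypothetical, and it tests only the SUID/SGID bits (nmon 10 is described as additionally checking for a root file owner).

```shell
# Return success if the file has the set-user-ID or set-group-ID bit set.
# Hypothetical helper; this only mirrors the condition described in the text.
has_setid() {
    # -u tests the set-user-ID bit, -g tests the set-group-ID bit
    [ -u "$1" ] || [ -g "$1" ]
}

# Hypothetical install path; adjust to where your nmon binary lives.
NMON_BIN="${NMON_BIN:-/usr/local/bin/nmon}"

if [ -e "$NMON_BIN" ] && has_setid "$NMON_BIN"; then
    echo "WARNING: $NMON_BIN has SUID/SGID set; clear it with: chmod ug-s $NMON_BIN"
fi
```

If the warning appears, clearing the bits with chmod ug-s (as root) should bring the file into the state the text recommends.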
Upgrade
- AIX 4 users carry on running nmon 9 unchanged
- AIX 5 users remove nmon 9 and replace with nmon 10 for AIX
- Linux users remove nmon 9 and replace with nmon 10 for Linux
Introduction
The nmon tool is designed for AIX and IBM eServer pSeries performance specialists to use for analyzing AIX performance data, including the following:
- CPU utilisation - AIX and Linux
- Memory use - AIX and Linux (the statistics differ)
- Kernel statistics and run queue information - AIX only
- Disks I/O rates, transfers, and read/write ratios - AIX and Linux
- Free space on file systems - AIX and Linux
- Disk adapters - AIX only
- Network I/O rates, transfers, and read/write ratios - AIX and Linux
- Paging space and paging rates - AIX and Linux
- Machine details, CPU and OS specification - AIX and Linux
- Top processes - AIX and Linux
- User defined disk groups - AIX only
- Asynchronous I/O - AIX only
- Workload Manager - AIX only
- ESS and other disk subsystem - AIX only
- NFS - AIX Only
- Dynamic LPAR changes - AIX and Linux (on POWER H/W)
Also included is a new tool to generate graphs from the nmon output and create .gif files that can be displayed on a website.
Benefits of the nmon tool
The nmon tool is helpful in presenting all the important performance tuning information on one screen and dynamically updating it. The tool works on any dumb screen, telnet session, or even dial-up line. In addition, the tool is very efficient. It does not consume many CPU cycles, usually below 2%. On newer machines, CPU usage is well below 1%.
Data is displayed on the screen and updated once every two seconds using a dumb screen. However, you can easily change this interval to a longer or shorter time period. If you display the data on X-Windows, VNC, putty or similar and stretch the window, nmon can output a great deal of information all in one place.
The nmon tool can also capture the same data to a text file for later analysis and graphing for reports. The output is in a spreadsheet format (.csv).
Installing the nmon tool
The tool is one stand-alone binary file (a different file for each AIX or Linux version) that you can install in five seconds and if you type fast, probably less. Installation is simple:
- Copy the nmonXX.tar.Z file to the machine - if using FTP, remember to use binary mode. Note: the XX in this example stands for the version.
- To uncompress the file run uncompress nmonXX.tar.Z
- To extract the files run tar xvf nmonXX.tar
- Read the README file.
- Start nmon by typing the command nmon
- If you are the root user you may need to type ./nmon
This is under review to make it simpler
Extra notes for using nmon 9 for AIX 4 only
- You must be the root user or allow regular users to read the /dev/kmem file by typing the following command (as root):
chmod ugo+r /dev/kmem
- If you want the disk statistics, then also run (as root):
chdev -l sys0 -a iostat=true
Extra notes for using nmon 10 for AIX 5.1 only
You will need to have installed the libperfstat library from the AIX CDROMs.
This is in the bos.perf.libperfstat package. Without it, nmon will fail to start and report that it is missing.
To run interactively
Just type the name of the nmon tool for your operating system - see the FAQ section if you cannot work this out. Then use the one-key commands to see the data you want and to hide it again. For example, to get CPU, Memory, and Disk statistics, start nmon and type: cmd
To get help information while running interactively, type h
To get more help information
- For brief details, type nmon -?
- For full details, type nmon -h
Hints and Tips:
- The Microsoft Windows telnet is not recommended as it is a very poor terminal emulator.
- Use a larger window than 80x25 characters.
- The developer uses VNC and putty to display nmon from a Windows machine.
- On AIX with X Windows, the "dtterm" terminal emulator allows colours and nmon can use these to highlight information.
Capturing data to a file for later analysis and graphing
Run nmon with the -f or -F flag. See nmon -h for the details, but as an example, to run nmon for an hour capturing data snapshots every 15 seconds (15*240=3600 seconds) use:
- nmon -f -s 15 -c 240
- nmon -fT -s 15 -c 240
The second line also captures the top processes. Both of these will create the output file in the current directory called: <hostname>_date_time.nmon
This file is in a comma separated values format and can be imported into a spreadsheet directly.
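The relationship between the -s interval, the -c count and the capture duration is simple arithmetic. As a minimal sketch (the helper name nmon_count is hypothetical, not part of nmon), the count for a wanted duration can be worked out like this:

```shell
# Print the -c value needed to cover a duration (in seconds) at a given -s interval.
# Hypothetical helper for illustration only.
nmon_count() {
    duration="$1"
    interval="$2"
    # Round up so the capture never ends before the requested duration.
    echo $(( (duration + interval - 1) / interval ))
}

count=$(nmon_count 3600 15)
echo "nmon -f -s 15 -c $count"
```

For the one hour example in the text this gives a count of 240, matching 15*240=3600 seconds.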
- FOR Lotus 1-2-3 ONLY (NOT RECOMMENDED ANY LONGER)
- If you are using Lotus 1-2-3 the file needs to be sorted before importing. On AIX, follow this example: sort -A mymachine_311204_1030.nmon > xxx.csv
This sorting is not required for the Excel version of the nmon analyser.
Hints and Tips:
- To load this into a spreadsheet, check the spreadsheet documentation for loading comma-separated value (.csv) data files. Many spreadsheets accept this data as just one of the possible files to load or provide an import function to do this. Many spreadsheets have fixed maximum numbers of columns and rows.
- We suggest you collect a maximum of 300 snapshots to avoid hitting these limits and to make a nice looking graph. Also note that a 1024x768 computer screen with 350 data points would only have approximately two pixels per data point. There is little purpose in having more data points.
- When you are capturing data to a file, the nmon tool disconnects from the shell, to ensure that it continues running even if you log out. This means that nmon can appear to crash, but it is still running in the background.
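To check a capture file against the suggested 300 snapshot limit, the snapshots can be counted from the shell. This is a rough sketch under one loud assumption that this manual does not state: that each snapshot in the capture file begins with one line starting "ZZZZ,T". The function names are hypothetical.

```shell
# Count snapshots in an nmon capture file.
# ASSUMPTION (not stated in this manual): each snapshot starts with a line
# beginning "ZZZZ,T", for example "ZZZZ,T0001,...".
count_snapshots() {
    grep -c '^ZZZZ,T' "$1"
}

# Report the count and point out files above the suggested maximum of 300.
check_capture() {
    n=$(count_snapshots "$1")
    if [ "$n" -gt 300 ]; then
        echo "$1: $n snapshots (more than the suggested 300)"
    else
        echo "$1: $n snapshots"
    fi
}
```

Usage would be along the lines of: check_capture mymachine_311204_1030.nmon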
Sample output for nmon
Take this link to see the nmon Sample On Screen Output
Ideas for the next release for AIX wanted
These are the current ideas for the next release:
- Proper pop-down menu system for selecting the statistics to be displayed
- User decides which stats appear first = at the top of the screen
- Disk stats by Volume Group
- Detailed paging space file use, sizes, %used, active, auto etc.
Please vote now by email to the author or forward any other ideas.
Ideas for the next release of Linux wanted
The Linux version is the prime focus for the next release and there are many functions to be added. As Linux gets more production use it is important that production style performance monitoring and tuning becomes available. Classic Linux tuning involves rebuilding the kernel and adding modules or packages. In many sites this is just not acceptable and passive monitoring (like nmon) is the only option. Some of the options are:
- Upgrade to the nmon 10 user interface - now in nmon for Linux 11
- Add NFS support - now in nmon for Linux 11
- Add kernel stats to run queues, context switches, fork etc. - some in nmon for Linux 11
- User Defined Disk Groups - now in nmon for Linux 11
If you have any suggestions or more importantly know where the data can be found, please let me know via email.
New features for nmon on AIX version 10
| Name | Online key | Command Line Option | Comments |
| --- | --- | --- | --- |
| NFS | N | -N | NFS is completely new for nmon 10. This removes the need to run a separate program and merge the data for analysis. This is complicated as NFS versions 2 and 3 have slightly different protocols and hence different field names. Source: /usr/include/libperfstat.h perfstat_protocol() |
| Shared CPU LPAR | p | automatic | Partition statistics for Shared CPU partitions - this is the big pSeries p5 and AIX 5.3 feature. Some of these numbers are very hard to understand and this is what has caused nmon 10 to take a long time to be released. With Shared Processor partitions the utilisation numbers (i.e. user, sys, wait and idle) become largely meaningless. Instead it is important to monitor Entitlement and physical CPU use, particularly for uncapped logical partitions as they can range from 2 thousandths of a CPU to 10 times their Entitlement and can still report 80 percent utilisation throughout. If you are not running POWER5, AIX 5.3 or a Shared Processor logical partition these statistics are not available as they would be meaningless. These are saved to the captured file if it is an LPAR, a shared CPU LPAR, POWER5 and AIX 5.3. Source: /usr/include/sys/dr.h lpar_get_info() |
| CPU Utilisation for lots of CPUs | C | N/A | Showing CPU utilisation with one graph per line is not practical for machines with high numbers of CPUs (up to 128 lines). So with higher numbers of CPUs, use capital C (instead of lower case c) to see the CPUs across the page, in two lots with more than 64 logical CPUs. This way you can see up to 128 logical CPUs on one screen, which is the maximum with POWER5 (currently). Source: /usr/include/libperfstat.h perfstat_cpu() |
| CPU Utilisation for fewer CPUs | c | default | In addition to the CPU utilisation for each CPU, if you are on a POWER5 with AIX 5.3 and in a Shared CPU environment you have details of your Entitlement and Physical CPU use, as the utilisation numbers are pointless without understanding how much CPU power the utilisation applies to. Source: /usr/include/libperfstat.h perfstat_cpu() |
| CPU Long term | l | N/A | The long term stats, which show 75 snapshots and are useful in visualising very peaky workloads, have not changed except for the Entitlement and Physical CPU use when appropriate (see above). Source: /usr/include/libperfstat.h perfstat_cpu() |
| WLM Classes | W | -W | Same as previous version. Source: /usr/include/sys/wlm.h wlm_get_info() |
| WLM Sub-Classes | S | -S | Shows WLM subclasses - normally only WLM classes are shown. Warning: if you have lots of subclasses you might not be able to see them all (the maximum is ~8000) and this could cause major problems with saving to file as it would also break the line width limit in Excel. Source: /usr/include/sys/wlm.h wlm_get_info() |
| Disk Graphs | d | default | This shows the utilisation, read, write and transfers per second for each disk. This is unchanged from the previous version. Source: /usr/include/libperfstat.h perfstat_disk() |
| Disk Details | D repeatedly | N/A | This includes more details, including the old peak counters and numbers instead of graphs (see above), but also now has disk size and free space, number of paths, adapter name, volume group and disk description. Hit D repeatedly to loop through the various details. Source: /usr/include/libperfstat.h perfstat_disk() |
| Disk Map | o | N/A | The disk map, showing hundreds of disks using one character each, has only changed a little in that the characters used have been improved. Source: /usr/include/libperfstat.h perfstat_disk() |
| Adapters | a | default | Disk Adapter details now include their full type and title, which can help identify bottlenecks. Source: /usr/include/libperfstat.h perfstat_diskadapter() |
| Resources | r | default | Resources includes your CPU speed in MHz, which is helpful when you want to check the processor speed of the machine as it can be difficult to find out, and some more details. Source: many places |
| Kernel | k | default | Many new fields in this section: Hardware and Software Interrupts per second, Up time in days as reported by the "uptime" command, Kernel process starts, overflows (failure to start) and exits, and Load Averages for the past 1 minute, 5 minutes and 15 minutes. Note: these new fields are not saved to the captured file. Source: /usr/include/libperfstat.h perfstat_cpu_total() |
| Large Page | L | -L | This gives you the large page statistics including the large page size (usually 16 MB), total number and size of large pages in MB, number and size of pages in use, and size free and in-use percentage. This is popular with High Performance Computing (HPC) people, as it is now simple to track this over time to tune the large page pool. Source: /usr/include/sys/vminfo.h vmgetinfo() |
| Network | n | default | This gives you information about your network but now includes adapter details, MTU, error counters & collisions, and megabit rating, which can help identify bottlenecks. For example, 100 Mbit/second Ethernet is roughly 10 MB/second; when it gets to 80% of its rating the number of collisions rises sharply, so it can be a bottleneck at around 8 MB/second. These are included in the captured file output with a new NETERROR section for the extra data. Source: /usr/include/libperfstat.h perfstat_netinterface() |
| Memory | m | default | Memory now gives you more details of where memory is going. First: file system cache (numperm) + system (kernel) + processes and free add up to 100%. There is also User (i.e. non-system marked pages) and Pinned pages (can't be paged out) - note these overlap with the previous statistics. The virtual memory stats, which are normally RAM plus paging size, are shown along with how much has been accessed (vmstat reports avm = accessed active virtual memory). There are minfree & maxfree, minpgahead & maxpgahead (page ahead). The minperm & maxperm are an issue because they are in pages. Nmon calculates the percentage based on these numbers but the numbers are different to those produced by AIX tools. Sorry but the AIX documentation is wrong (it is not a straight percentage of total memory) and the official percentages are not available via the libperfstat API. Source: /usr/include/libperfstat.h perfstat_memory_total() |
| Boxes | B | -B | The -B start up option removes the boxes. Some purists don't like to waste the screen space with the box lines. This removes the lines and "wasted" space. If the boxes don't work for you then $TERM is probably set wrong or your terminal emulator is duff. Don't blame me. You can also hit B when online but curses sometimes gets in a muddle and it's low priority. The -B start seems to work well. Do not forget the NMON shell variable can make this automatic. |
| AIO | A | -A | Saves Asynchronous I/O stats to the nmon file. If you want AIO stats then add the -A option to the nmon command as in nmon -fTA .... This allows applications using AIO to be monitored long term and the minimum and maximum AIO server parameters set appropriately. Source: this is collected from the top process data |
| Top Processes | t, u or U | -t -T | This has not changed for nmon 10. t or -T gives you the top processes, u gives you the user command line and U or T also gives you the WLM classes. Source: /usr/include/procinfo.h getprocs64() |
Disk Statistics - double counting
It has been reported that multiple path I/O like EMC Powerpath and DAC drivers for FAST results in the disk statistics being double what they should be. This is a feature of the libperfstat library - nmon just reports what it finds. If you supply me with sample captures I may be able to work out heuristics to avoid this for the next version. This may be improving in future AIX versions.
Back porting to AIX 4?
nmon version 10 will NEVER be ported back to AIX 4 because the APIs are missing, so nmon 9a is the "functionally stabilised" version for AIX 4.
Starting nmon
There is also now a small shell script called "nmon" that starts the right nmon version. Put this script and the nmon binaries in a directory in your $PATH and just type: nmon
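The wrapper script itself is not reproduced in this manual. As a rough sketch of the idea only, the selection could look like the following. The function name pick_nmon_binary is hypothetical, and every binary name except nmon_aix53 (which appears in the crontab examples later in this manual) is a made-up placeholder.

```shell
# Map an operating system name and release to an nmon binary name.
# Hypothetical helper; all names except nmon_aix53 are placeholders.
pick_nmon_binary() {
    os="$1"
    level="$2"
    case "$os:$level" in
        AIX:5.3*) echo nmon_aix53 ;;
        AIX:5.2*) echo nmon_aix52 ;;
        AIX:5.1*) echo nmon_aix51 ;;
        Linux:*)  echo nmon_linux ;;
        *)        return 1 ;;   # unknown platform: let the caller report it
    esac
}

# Possible use inside a wrapper (not run here), assuming oslevel on AIX
# and uname -r elsewhere supply the release:
#   bin=$(pick_nmon_binary "$(uname -s)" "$(oslevel 2>/dev/null || uname -r)") && exec "$bin" "$@"
```

Keeping the mapping in one function means a new operating system level only needs one extra case line.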
This version is now only compiled in 32 bit mode, so it will run on 32 and 64 bit hardware. The idea is to make it easier to install/run.
/dev/kmem - not used in nmon 10
The nmon10 version does not need access to /dev/kmem nor does it need to be run as the root user. You can set /dev/kmem back to read access by user and group only.
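If /dev/kmem was opened up earlier with chmod ugo+r, the reverse step is to remove read access for other users. A minimal sketch follows; the function name is hypothetical, it takes an optional path so it can be rehearsed on a scratch file, and on the real device it must be run as root.

```shell
# Remove read access for "other" users, leaving user and group permissions untouched.
# Hypothetical helper: defaults to /dev/kmem, or pass another path to rehearse.
restrict_to_user_and_group() {
    chmod o-r "${1:-/dev/kmem}"
}
```

Run as root with no argument, this would undo the "other" part of the earlier chmod ugo+r /dev/kmem.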
Terminal Emulation
- If you do not get the nice boxes around your online statistics then either you have the shell variable TERM set wrongly or your terminal emulator is rubbish or set up badly.
- Do not use Microsoft Windows telnet, and use a larger window than 80x25 characters. The author uses putty (with built in ssh) for Linux and VNC to display nmon from a Thinkpad running Windows XP. VNC then runs a dtterm terminal emulator so nmon can use colours, as the vim editor does for colourising C code - why not do the same?
nmon External Data Collectors
The external data collectors feature is to get nmon to run other commands that you can then add to the nmon data file for analysis. A typical example is to collect DB2 or Oracle stats to compare against nmon data. You can run a command when:
- nmon starts using the shell variable NMON_START
- nmon ends using the shell variable NMON_END
- each snap shot using the shell variable NMON_SNAP
- a subset of snap shots using the shell variable NMON_ONE_IN
This is controlled by shell variables set before you run nmon. The separate file that the data collectors generate is merged into the nmon file before analysis with the cat command. You don't need to have all of these - i.e. could do start + end or just the snap shots or - a special start-up plus snap shots. This is a bit complex so here is a worked example.
First set the TIMESTAMP shell variable:
- if TIMESTAMP = 0 then lines will have the classic nmon Tnnnn timestamps at the start of the line and work well with the nmon data file
- if TIMESTAMP = 1 then lines will have a time stamp that has the hours, minutes, seconds and day, month, year - this can be used if you don't want to merge the data with the nmon file for analysis.
Process Counter External Data Collectors Example
export TIMESTAMP=0
export NMON_START="mystart"
export NMON_SNAP="mysnap"
export NMON_END="myend"
export NMON_ONE_IN=1 # 1 is the default
The shell variables set above each refer to a program or shell script.
Suppose mystart, mysnap and myend are the following shell scripts.
mystart:
ps -ef >start_ps.txt
echo "PROCCOUNT,Process Count, Procs" >ps.csv
mysnap:
echo PROCCOUNT,$1,`ps -ef | wc -l` >>ps.csv
myend:
ps -ef >end_ps.txt
Now run nmon as normal, for example: nmon -f -s 2 -c 10
At the end of the capture, the ps.csv file might contain (for example):
PROCCOUNT,T0001,56
PROCCOUNT,T0002,58
PROCCOUNT,T0003,67
PROCCOUNT,T0004,65
PROCCOUNT,T0005,71
PROCCOUNT,T0006,68
PROCCOUNT,T0007,66
PROCCOUNT,T0008,58
PROCCOUNT,T0009,57
PROCCOUNT,T0010,60
The start_ps.txt and end_ps.txt files would have a list of running processes at the time. The ps.csv file can be merged with the nmon output file (below called this_050607_0916.nmon, yes my machine is called "this") after nmon finishes with the following command:
- cat this_050607_0916.nmon ps.csv >combined.csv
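The collect-then-merge flow can be rehearsed in a scratch directory without running nmon at all. In this sketch the shell function mysnap stands in for the mysnap script, the loop stands in for nmon calling it once per snapshot with the timestamp as the first argument, and the one-line capture file is a made-up placeholder.

```shell
# Rehearse the external data collector flow in a scratch directory (no nmon involved).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the start script: write the header line.
echo "PROCCOUNT,Process Count, Procs" >ps.csv

# Stand-in for the snap script: the timestamp arrives as $1.
mysnap() {
    echo "PROCCOUNT,$1,$(ps -ef | wc -l | tr -d ' ')" >>ps.csv
}

# Simulate three snapshots.
for t in T0001 T0002 T0003; do
    mysnap "$t"
done

# Placeholder capture file: real nmon output has many more lines.
echo "AAA,host,this" >this_050607_0916.nmon

# Merge exactly as described in the text.
cat this_050607_0916.nmon ps.csv >combined.csv
cat combined.csv
```

The tr -d ' ' removes the leading blanks that some wc implementations print, which keeps the data line strictly comma separated.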
Then run the nmon Analyser on the combined file - if you are lucky the analyser may draw you a graph. Here is what was produced:

Hints:
- comma separate the data and don't go over 2K bytes in line length
- make the important data in the first couple of columns.
- keep the stats in the same range - i.e. all KB/s or all percentages
If you set the NMON_ONE_IN variable you can also run the NMON_SNAP command less often.
By default this is set to 1 (run it every time), but if the command you want to run is heavy in CPU terms or takes a long elapsed time to finish, you can run it less often. For example, to run it in just one in ten snapshots: export NMON_ONE_IN=10
Oracle Transaction Counters External Data Collectors Example
Here is another example collecting transaction commits and rollback statistics from the Oracle database using two scripts called oraclestart and oraclesnap that run an SQL statement and save the data in a file called dbstats.csv.
oraclestart:
echo "DATABASE,Transactions,commit,rollback" >dbstats.csv
oraclesnap:
export ORACLE_SID=MYDATABASE
( sqlplus -s "system/manager as sysdba" <<EOF
set heading off
set headsep off
set echo off
set lines 2000
set feedback off
set newpage none
set recsep off
select 'DATABASE,$1,'||
sum(decode(name, 'user commits', value, 0))||','||
sum(decode(name, 'user rollbacks', value, 0))
from
sys.v_\$sysstat;
EOF
) >> dbstats.csv
Then set the shell variables:
export TIMESTAMP=0
export NMON_START="oraclestart"
export NMON_SNAP="oraclesnap"
unset NMON_END
Now run nmon.
You need to ensure the ORACLE_SID, username and password work in your environment. Do this by running the command manually with: oraclesnap T9999
and checking the results in the file dbstats.csv
This should put one line in the file dbstats.csv. This script has to log on to the Oracle database each time it runs, so you should not be doing this every second as it will take elapsed time and CPU resources. But if you are collecting nmon data once a minute or more this overhead should be small.
Thanks to Ralf Schmidt-Dannert of the IBM SAP and Oracle Solutions team in Minneapolis, USA for this example.
One Caveat on External Data Collectors
The "T" or "t" as the first letter of the second column is used by tools to recognise the difference between new header lines of new data sections and the data lines (i.e. those containing the timestamp values for example, T0000, T0001, etc.) So do not use a header line like "PROCCOUNT,The Process Count, Procs" - the "T" in "The" will cause problems.
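A collector file can be checked for this problem before it is merged. The following is a minimal sketch, with a hypothetical function name, that assumes data lines always carry a timestamp of the form T followed only by digits in the second column, as in the examples above.

```shell
# Flag lines whose second column starts with T or t but is not a Tnnnn timestamp.
# Hypothetical helper; exits non-zero if any such line is found.
check_headers() {
    awk -F, '
        $2 ~ /^[Tt]/ && $2 !~ /^T[0-9]+$/ {
            printf "line %d: second column \"%s\" starts with T\n", NR, $2
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running check_headers ps.csv before the cat merge step would report any header line that the tools might mistake for data.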
Description of other features from before nmon 10
Hopefully this will give you some of the background to other nmon features that might not be clear from the simple help information provided.
Workload Manager Statistics
This is an AIX feature. Workload Manager (WLM) statistics are shown by typing W (upper-case). Note: AIX 4.3.3 does not support the gathering of WLM stats. Workload Manager is a major benefit of AIX and is no charge too. I have written a white paper on this, find it at: http://www.ibm.com/developerworks/eserver/library/es-Practical_WLM.html
If you use passive mode you can use WLM to find out which applications are taking the CPU, RAM and IO resources of the machine with zero overhead. I tested WLM and could not detect WLM taking any resources at all or at least below 0.25% of one CPU. nmon outputs
- actual resource use percentage per class
- desired percent AIX sets as a target based on active class shares and limits. These are worth watching as for example classes without processes get zero targets. See the Junk class in the example below.
- share values (-1 means it is not set)
- number of processes per class (try for zero in Default class)
- class Inheritance and Shared Memory flags
Is there missing data you need? Remember that things like min, hard and soft limits exist for CPU, RAM and Block IO for each class, so there are limits to what we can output on the screen. The -S option allows you to see sub-classes but if you have lots they may not fit on the screen or may overrun the captured data file line length limit from Excel.
The nmon file capture records the full WLM details once (at the start) in the BBBP section but then only the actual resources used to reduce output. Online the output looks like this:
Work Load Manager CPU MEM BIO CPU MEM IO CPU MEM BIO Tier Inheritance
Class Name |---Used----||--Desired-||----Shares-----|Proc's T I Localshm
Unclassified 0% 0% 0% 100 100 100 -1 -1 -1 1 0 0 0
Unmanaged 0% 11% 0% 100 99 100 -1 -1 -1 1 0 0 0
Default 0% 29% 0% 100 98 100 -1 -1 -1 34 0 0 0
Shared 0% 21% 0% 100 98 100 -1 -1 -1 0 0 0 0
System 0% 50% 0% 100 99 100 50 -1 -1 80 0 0 0
database 72% 0% 0% 75 100 100 300 -1 -1 9 0 1 0
batch 26% 0% 0% 25 100 100 100 -1 -1 4 0 1 0
junk 0% 0% 0% 100 100 100 400 -1 -1 0 0 0 0
Round Robin Database - rrdtool
RRD support for graphing - a new version in C, with source, called nmon2rrd. It generates about 33 to 50 or more graphs in .gif format (depending on the data file contents) and an index.html ready for a web server, so that you can access it from any web browser. It runs on AIX and so can be completely automated. This eliminates the need for file transfers to Windows for analysis with Excel or 1-2-3 spreadsheets, and avoids the terrible problems Stephen Atkins has to deal with due to limits in the spreadsheets. Note: I include the source code so you can fix up the code and return it to me. You will, of course, require a compiler but it is straightforward stuff (I learnt a lot about some silly stuff in the nmon output file, sorry Steve) and it compiles with no options: cc nmon2rrd.c - it is not my best code but a quick hack.
- Note: use standard nmon output files and NOT the rrd format - (i.e. NOT the -R flag).
- Note: nmon2rrd does not support (i.e. ignores) new stuff like AIO, NFS and WLM.
- Use: nmon2rrd -? for hints
Use: nmon2rrd -f nmonfile [-d directory] [-x]
-f nmonfile the regular CSV nmon output file
-d directory dirname for the output
-x execute the output files
Example:
nmon2rrd -f m1_030811_1534.nmon -d /webpages/docs/m1/030811 -x
This assumes that rrdtool is on your system and in your PATH. rrdtool is on my top tools list with VNC, vim, filezilla and, of course, Linux. The rrdtool creator is Tobi Oetiker. To learn more about rrdtool and the writer go to:
Miss off the -x if you want to run the generated scripts yourself as in
$ mkdir output
$ nmon2rrd -f my.nmon -d output
$ cd output
$ rrdtool - <rrd_create >rrd_create.log
$ rrdtool - <rrd_update >rrd_update.log
# if there is a TOP section in the nmon output file
$ rrdtool - <rrd_top >rrd_top.log
$
$ rrdtool - <rrd_graph >rrd_graph.log
This allows you to change the create script to collect long term data and see how to create more/different graphs.
If you want a very small and ultra low risk web server (because it only serves out .html, .gif, .jpg files from a fixed directory) then take a look at nweb which is available from IBM at:
http://www.ibm.com/developerworks/eserver/library/es-nweb.html
Included is a sample nmon output file sample.nmon. Get a copy of rrdtool and place it and nmon2rrd in your path. To see the output run:
$ mkdir /tmp/test
$ cp sample.nmon /tmp/test
$ cd /tmp/test
$ nmon2rrd -f sample.nmon -x
You need to then get it available via a web server and start at the generated index.html file.
Alternatively use nweb:
If your AIX machine hostname is bonzo.abc.com - in your web browser go to
- http://bonzo.abc.com:8181/index.html
For people familiar with rrdtool you can get nmon to directly generate rrdtool friendly output using the -R flag. You will have to create your own round robin databases but then this nmon output can be passed straight to the database using a named pipe, see the "Immediate use of nmon output for other tools" section.
Minimum CPU threshold
nmon will not save to file processes using less than 0.1% of a CPU. This is to reduce the file output to useful information. But 0.1% of the fastest CPU is now quite a lot of CPU power, so the threshold is now changeable using the -I option. This was requested by an nmon user as a useful idea. So add the -I option, followed by a percentage, when you start nmon.
This sets the Ignore Process Percent threshold (default 0.1), i.e. nmon does not save TOP stats if a process is using less CPU than this percentage. Example:
- nmon -f -I 0.01 -s 10 -c 300
This will mean a lot more top processes statistics will be gathered.
nmon and cron
The nmon default capture to file filenames has bee carefully chosen. If you save the output of many machines and captures in one directory and list the directory you will have the files in first machine hostname order and second orders by time (and date). This is a sensible ordering. Many people have written scripts to start nmon via cron and many of the scripts are a complete waste of time or even wrong. One feature that was added to nmon to make this easy was the -m flag so the nmon moves to a particular directory before saving data.
So here is what I put in my crontab (use crontab -e to add tasks to your crontab file). This starts a collection once a day in the directory /home/nmon_data, taking a snapshot every 5 minutes for 288 snapshots, which gives an excellent level of graph detail. It also collects top processes and user command lines (T), NFS stats (N), Workload Manager but no Subclasses (W), Large page stats (L) and Asynchronous I/O details (A). The reporting threshold is 0.001 percent of a CPU.
0 0 * * * /usr/lbin/nmon_aix53 -fTNWLA -I 0.001 -s 300 -c 288 -m /home/nmon_data
There is no need of any shell scripts to start this collection.
Note: if you start two nmon processes at the same time they will have the same filename. So if you want to collect, for example, both detailed and summary stats, start them at least one minute apart. So if I also wanted hourly statistics with less top process detail, a second crontab entry might be:
2 0 * * * /usr/lbin/nmon_aix53 -ftNWLA -s 3600 -c 24 -m /home/nmon_data
Also note that only one of f, F, z, x or X should be used and it should be the first argument. You have been warned as not following this can cause confusion.
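The -s and -c values in the two crontab entries above are chosen so that interval multiplied by count covers exactly one day. A small sketch for checking a pair of values before putting them into cron (the function name is made up for illustration):

```shell
# Hypothetical helper: seconds covered by "nmon -s <interval> -c <count>".
coverage_seconds() {
    interval=$1
    count=$2
    echo $(( interval * count ))
}

coverage_seconds 300 288     # 5 minute snapshots: prints 86400 (24 hours)
coverage_seconds 3600 24     # hourly snapshots:   prints 86400 (24 hours)
```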
Automatic starting with certain statistics for online mode
Use the NMON shell variable to determine which statistics are shown automatically at start-up. If you find you always want CPU, kernel, Memory and Disks, i.e. you type ckmd, then set the shell variable as below:
export NMON=ckmd
Killing nmon
This is a thoroughly unpleasant thing to do but can be done safely as detailed below. One case where this makes sense is in benchmarks, where once the benchmark run is finished you want to stop nmon as any further details are not required. nmon in file capture mode detaches itself from the shell session so that it will continue to run even if you log out or switch off your terminal or X Windows session. This can make it hard to kill, as you have to search the "ps -ef | grep nmon" command output to find the nmon process and, if there is more than one, you have to guess. If you add the -p option nmon will return the process id of the nmon process. For example,
- $ nmon_aix53 -f -s60 -c 60 -p
428963
The 428963 is the PID. To shut down nmon cleanly, use the signal USR2. This requests nmon to stop after the next collection and thus avoids the last line of output being incomplete (often the case if you use kill -9). So in this example use
- $ kill -USR2 428963
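In a benchmark script the two steps above can be wrapped so the PID never has to be copied by hand. This is only a sketch: the function names are made up, and it assumes that -p prints nothing but the PID (as in the example above) and that the detached nmon does not hold the shell's output open.

```shell
# Start a file capture and print the PID that -p reports.
# $1 = the nmon binary to run, e.g. nmon_aix53 (assumption: -p prints only the PID).
start_capture() {
    "$1" -f -s 60 -c 60 -p
}

# Ask the process to stop cleanly after its next collection.
# $1 = the PID returned by start_capture.
stop_capture() {
    kill -USR2 "$1"
}

# Usage:
#   NMON_PID=$(start_capture nmon_aix53)
#   ... run the benchmark ...
#   stop_capture "$NMON_PID"
```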
Limiting top processes to certain commands
If there are lots of processes running but you want to limit your monitoring to just a few commands of particular interest you can do this in two ways for online and file capture modes. Note these are the program names and don't include the parameters.
Using shell variables:
There are 64 shell variables (NMONCMD0 to NMONCMD63) that you can set to the commands you want to monitor. Follow this simple example to monitor just the ksh, vi and syncd commands:
export NMONCMD0=ksh
export NMONCMD1=vi
export NMONCMD2=syncd
Then start nmon and it will show you just these commands.
Using the command line: this involves the -C option with a colon separated list of commands, for example -C ksh:vi:syncd
Note the command is only checked up to the characters you give it, so "or" will match "oracle" and "orifice" = limited wild cards!
If you are new to UNIX then also note that you use the "unset" command to remove these shell variables, as in: unset NMONCMD0 NMONCMD1 NMONCMD2
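The "checked up to the characters you give it" rule described above is a simple prefix match. The sketch below only illustrates that rule in shell; it is not nmon's own code and the function name is made up.

```shell
# Illustration only: succeed when command name $2 starts with pattern $1.
cmd_matches() {
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

cmd_matches or oracle  && echo "or matches oracle"
cmd_matches or orifice && echo "or matches orifice"
cmd_matches or ksh     || echo "or does not match ksh"
```

So give enough characters to exclude commands you do not want to see.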
Immediate use of nmon output for other tools
Some people want to extract data from the nmon output for immediate use in other tools. Given that nmon's data is not available via a sensible programming API, writing your own C language tool is a much better idea. But if you have to use nmon as the collector, here is how to do it. First note that trying to get the data from the online mode is in the "barking mad" category. For file capture mode, follow this example of using a named FIFO (pipe) and the -F option to force nmon to use it:
mkfifo /tmp/mypipe
nmon -F /tmp/mypipe -s 1 -c 6400
grep CPU_ALL /tmp/mypipe | your_tool_here
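If you want to try the plumbing before pointing nmon at the pipe, the sketch below runs the same pattern with a stand-in writer. The function names, the pipe path and the two sample records are all made up for illustration; real nmon records will have different values and more fields.

```shell
# Read only the CPU_ALL records from a named pipe.
# $1 = path of the FIFO to read from
filter_cpu_all() {
    grep '^CPU_ALL' "$1"
}

demo_fifo() {
    pipe="${TMPDIR:-/tmp}/nmon_demo_pipe.$$"
    mkfifo "$pipe" || return 1
    # Stand-in for "nmon -F $pipe ...": write two made-up records in the background
    printf 'CPU_ALL,T0001,10.0,5.0,1.0,84.0\nMEM,T0001,50.0\n' > "$pipe" &
    # The reader sees end of file once the writer closes the pipe
    filter_cpu_all "$pipe"
    wait
    rm -f "$pipe"
}

demo_fifo    # prints CPU_ALL,T0001,10.0,5.0,1.0,84.0
```

The writer blocks until a reader opens the pipe, which is why it has to run in the background.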
Hundreds of disks and a "can't see the wood for the trees" problem
For machines with hundreds of disks it is hard to see how many are being actively used and which are getting hot. The Disk %Busy Map (hit o) will show you: there is one character for each disk, and the more pixels in the character, the hotter the disk is getting. For example:
Disk-Busy-Map-Key(%): @=90 #=80 X=70 8=60 O=50 0=40 o=30 +=20 -=10 .=5 _=0%
hdisks numbers-> 1 2 3 4
01234567890123456789012345678901234567890123456789
hdisk0 to 49 __X_X_.__X__Oooo+____#_+___---___X_____@___.______
_#___@____X_O___.__O__X____--____.__--__@@_.__@___
___++_O__+__O_O_.__._@@@#___#__oOOo____@__________
_--_X@@OO_oo+__#.___X_.__O_+_______@ @XoOOO##@0O_-
The above example shows 50 disks per line and 400 disks. The more pixels in the character for a disk, the busier that disk is. You can get a feel for how well spread the data access is across the disks and how many disks are not doing anything.
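The key line of the map can be read as a lookup from %busy to a character. A minimal sketch of that lookup, assuming each number in the key is the lower bound for its character and that the percentage is a whole number (the function name is made up, and this is not nmon's own code):

```shell
# Map an integer %busy value to the character used in the Disk %Busy Map key.
busy_char() {
    pct=$1
    if   [ "$pct" -ge 90 ]; then printf '%s\n' '@'
    elif [ "$pct" -ge 80 ]; then printf '%s\n' '#'
    elif [ "$pct" -ge 70 ]; then printf '%s\n' 'X'
    elif [ "$pct" -ge 60 ]; then printf '%s\n' '8'
    elif [ "$pct" -ge 50 ]; then printf '%s\n' 'O'
    elif [ "$pct" -ge 40 ]; then printf '%s\n' '0'
    elif [ "$pct" -ge 30 ]; then printf '%s\n' 'o'
    elif [ "$pct" -ge 20 ]; then printf '%s\n' '+'
    elif [ "$pct" -ge 10 ]; then printf '%s\n' '-'
    elif [ "$pct" -ge 5  ]; then printf '%s\n' '.'
    else                         printf '%s\n' '_'
    fi
}

busy_char 72    # prints X
busy_char 3     # prints _
```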
AIO monitoring
It was nearly impossible to work out how many AIO servers you need, until now! By monitoring them you can determine if you have enough or too many and how many are really active. This will help in tuning them, for example with Oracle Asynchronous I/O Processes. Type A online or use the -A option for file capture mode. The online data looks like this:
Total AIO processes=500 Actually in use= 23 CPU used= 12.1%
all time peak=400 recent peak= 45 peak= 24.9% (use 0 to reset)
Busy disks and Processes only
Only want the busy disks and processes actually running on the CPU?
The dot command (hit .) does this. This works for ESS/vpaths too. For example:
ESS I/O AvgBusy read-KB/s write-KB/s xfers/s Total vpaths=35
vpath9 0.0% 0.0 8.0 0.5
vpath21 0.0% 0.0 8.0 0.5
TOTALS 0.0% 0.0KB/s 16.0KB/s 1.0 TOTAL=16.0KB/s
Direct rrdtool output
The "rrd" is the round robin database and rrdtool is a freeware tool; more information can be found at:
First create the rrd databases - see the script provided. This will need changing depending on the number of disks etc. Then use the nmon -R option to save files with seconds since 1970 timestamps. You need to save the nmon output for immediate reading by your filter, so follow this example:
- mkfifo xyz
- then start nmon with output redirected to the fifo
- nmon -F xyz -R
- then read from the FIFO to add the data directly into the rrd database in real time (replace cat with whatever your filter is called to load data into rrdtool)
- cat <xyz
Zeroing peak counters
The Network stats, the Disk stats (not the graphs; hit D, upper case d) and the AIO statistics track peak values and display them. The CPU graphs also provide a peak indicator. These can all be reset to zero by typing 0 (zero).
User Defined Disk Groups
On a recent benchmark with 3 x ESS = 1024 disks it became impossible to monitor them to ensure balanced I/O loading. So this was developed. The idea is to merge the disks into sets and monitor the sets. It is like the adapter stats but you get to choose which disks go into which set (adapter). Three obvious ways of doing this are by the:
- disk use = group disks that have common data, for example a database's data, index, sort, logs, archive = 5 disk groups
- disk placement = the disks in a particular rack/drawer for example ESS, cluster, rank, loop - makes 8 groups per ESS
- disk type or volume group/logical volume
- Or anything else you think up.
To set this up create a file with:
- one line per disk group
- starting with the name of the group
- then a list of hdisks
- all space separated
Then start nmon with the following option: -g filename
- If online hit: g
- If saving to a file there will be more sections for diskgroups = DGxxxx. The nmon analyser understands these new sections thanks to Stephen Atkins its developer.
Here are a few examples:
For my ESS placement disk groups I used the following script (this assumes you have the lsess command installed):
FILE1=/tmp/lsess_arary.tmp1
FILE2=/tmp/lsess_arary.tmp2
# Save the lsess output once and reuse it below
lsess >$FILE1
# Unique ESS identifiers: bytes 4-8 of the third lsess column
grep hdisk $FILE1 | grep -v "not ready" | awk '{ print $3 }' | cut -b 4-8 | sort | uniq >$FILE2
for j in `cat $FILE2`
do
    for i in 1100 1101 1300 1301 1500 1501 1700 1701 1000 1001 1200 1201 1400 1401 1600 1601
    do
        # Group name first, then every matching hdisk on the same line
        echo "ESS${j}_${i} \c"
        grep hdisk $FILE1 | grep $j | grep ${i} | awk '{ printf " " $1 }'
        echo
    done
done
rm $FILE1 $FILE2
exit
and generated the following disk group file:
array_1100 hdisk44 hdisk45 hdisk46 hdisk47 hdisk48 hdisk49 hdisk50 hdisk51
array_1101 hdisk52 hdisk53 hdisk54 hdisk55 hdisk56 hdisk57 hdisk58 hdisk59
array_1300 hdisk60 hdisk61 hdisk62 hdisk63 hdisk64 hdisk65 hdisk66 hdisk67
array_1301 hdisk68 hdisk69 hdisk70 hdisk71 hdisk72 hdisk73 hdisk74 hdisk75
... etc.
As another example, for a database, you might need to work out the disks and create something like:
root hdisk0 hdisk1
home hdisk2 hdisk3
apps hdisk4 hdisk5 hdisk6
data hdisk7 hdisk8 hdisk9 hdisk10 hdisk11 hdisk12 hdisk13 hdisk14
index hdisk15 hdisk16 hdisk17 hdisk18 hdisk19 hdisk20 hdisk21 hdisk22
archive hdisk23 hdisk24 hdisk25
sort hdisk26 hdisk27 hdisk28 hdisk29 hdisk30
logs hdisk31 hdisk32
others hdisk33 hdisk34
nmon will report errors if it does not like your disk group file but starts anyway. It is worth checking that the number of disks it found for each disk group is as you expected. If the error messages are only displayed on the screen for a very short time and you can't read them quickly enough, use the capture to file mode.
Note: the same disk can be in more than one group - so you could have for example, disk placement and disk usage monitored at the same time by different group names.
Limits:
- Make the disk group name 14 or fewer characters and no blank lines.
- Only 64 user defined disk groups
- Only 512 disks per group
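Since nmon starts even when it dislikes the file, it can help to check the file yourself first. The sketch below checks a disk group file against the limits listed above and prints the disk count for each group. The function name and messages are made up, and it only checks the rules stated in this section, not everything nmon itself may reject.

```shell
# Check a disk group file against the limits in this section.
# $1 = path of the disk group file; exit status is non-zero if a rule is broken.
check_diskgroups() {
    awk '
        NF == 0 { printf "line %d: blank line\n", NR; bad = 1; next }
        {
            groups++
            if (length($1) > 14) {
                printf "line %d: group name \"%s\" is longer than 14 characters\n", NR, $1
                bad = 1
            }
            if (NF - 1 > 512) {
                printf "line %d: %d disks, limit is 512\n", NR, NF - 1
                bad = 1
            }
            # Report the disk count so it can be compared with what was expected
            printf "%s: %d disks\n", $1, NF - 1
        }
        END {
            if (groups > 64) { printf "%d groups, limit is 64\n", groups; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Run it as check_diskgroups followed by the name of your file, then start nmon with -g and the same file.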
Online help
This is the output from nmon -h on AIX
nmon_aix53 -h
Hint: nmon_aix53 [-h] [-s <seconds>] [-c <count>] [-f -d -t -r <name>] [-x]
-h FULL help information - much more than here
Interactive-Mode:
read startup banner and type: "h" once it is running
For Data-Collect-Mode (-f)
-f spreadsheet output format [note: default -s300 -c288]
optional
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-t spreadsheet includes top processes
-x capacity planning (15 min for 1 day = -fdt -s 900 -c 96)
For Interactive-Mode
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-g <filename> User decided Disk Groups
- file = on each line: group_name <hdisk_list> space separated
- like: rootvg hdisk0 hdisk1 hdisk2
- upto 32 groups hdisks can appear more than once
-b black and white [default is colour]
-B no boxes [default is show boxes]
example: nmon_aix53 -s 1 -c 100
For Data-Collect-Mode = spreadsheet format (comma separated values)
Note: use only one of f,F,z,x or X and make it the first argument
-f spreadsheet output format [note: default -s300 -c288]
output file is <hostname>_YYYYMMDD_HHMM.nmon
-F <filename> same as -f but user supplied filename
-r <runname> goes into spreadsheet file [default hostname]
-t include top processes in the output
-T as -t plus saves command line arguments in UARG section
-s <seconds> between snap shots
-c <number> of refreshes
-l <dpl> disks/line default 150 to avoid spreadsheet issues. EMC=64.
-g <filename> User decided Disk Groups (see above -g)
-D Skip disk configuration sections
-E Skip ESS configuration sections
-N Include NFS section
-W Include WLM sections
-S Include WLM sections with SubClasses
-L Include LARGE page section.
-I <percent> Ignore process percent threshold (default 0.1)
don't save TOP stats if proc using less CPU than this %
-A Include Async I/O Section
-m <dir> nmon changes to this directory before saving data to a file
example: collect for 1 hour at 30 second intervals with top procs
nmon_aix53 -f -t -r Test1 -s30 -c120
To load into a spreadsheet like Lotus 1-2-3:
sort -A *nmon >stats.csv
transfer the stats.csv file to your PC
Start 1-2-3 and then Open <char-separated-value ASCII file>
Capacity planning mode - use cron to run each day
-x sensible spreadsheet output for CP = one day
every 15 mins for 1 day ( i.e. -ft -s 900 -c 96)
-X sensible spreadsheet output for CP = busy hour
every 30 secs for 1 hour ( i.e. -ft -s 30 -c 120)
Set-up and installation
To enable disk stats as root: chdev -l sys0 -a iostat=true
- this adds the disk % busy numbers (otherwise they are zero)
If you have hundreds of disk this can take 1% to 2% CPU
Interactive Mode Commands
key --- Toggles to control what is displayed ---
h = Online help information
r = Resources pSeries type, machine name, cache details and AIX version + LPAR
p = Partitions stats
c = CPU by processor stats with bar graphs
C = CPU by processor stats for high numbers of CPU
l = long term CPU (over 75 snapshots) with bar graphs
m = Memory and Paging stats
k = Kernel Internal stats
n = Network stats
N = NFS Network File System stats
d = Disk I/O Graphs
D = Disk I/O Stats
o = Disk I/O Map (one character per disk showing how busy it is)
g = Disk Group I/O Stats (have to use -g command line option)
a = Adapter I/O Stats
e = ESS vpath Logical Disk I/O Stats
j = JFS Stats
t = Top Process Stats 1=Basic-Details 2=Accumulated-CPU
Performance sorted by 3=CPU 4=Size 5=I/O
u = Top but with command arguments shown (used with 3,4 & 5)
to refresh arguments (for new processes) hit u twice
U = as u plus Workload Management Classes
W = Workload Management (WLM) Stats
S = WLM with SubClasses
w = use with top to show AIX wait processes (good for SMP)
A = Summarise Async I/O (aioserver) processes
v = Verbose this highlights problems on the machine and
categorises them as either danger, warnings or OK
b = black and white mode (or use -b option)
. = minimum mode i.e. only busy disks and processes
key --- Other Controls ---
+ = double the screen refresh time
- = halves the screen refresh time
q = quit (also x, e or control-C)
0 = reset peak counts to zero (peak = ">")
space = refresh screen now
Startup Control
If you find you always type the same toggles every time you start
then place them in the NMON shell variable. For example:
export NMON=cmdrvtan
Others:
a) Use shell variable NMONAIX=4.3.2 to a force AIX version
To you want to stop nmon - kill -USR2 <nmon-pid>
b) Use -p and nmon outputs the background process pid
c) To limit the processes nmon lists (online and to a file)
Either set NMONCMD0 to NMONCMD63 to the program names
or use -C cmd:cmd:cmd etc. example: -C ksh:vi:syncd
d) If you want to pipe nmon output to other commands use a FIFO:
mkfifo /tmp/mypipe
nmon -F /tmp/mypipe &
grep /tmp/mypipe
e) If nmon fails please report it with:
1) nmon version like: v10i
2) the output of lslpp -L bos.mp (or for uniprocessor bos.up)
3) some clue of what you were doing
4) I may ask you to run the debug version
f) From version 7 nmon can output rrdtool friendly output
Use -R - you then have to create suitable rrd databases
and can run nmon output via ksh to update them
This is still experimental - help needed (see the README.txt)
Written by Nigel Griffiths nag@uk.ibm.com
Feedback welcome - on the current release only and state exactly the problem
Version v10i - updated for each AIX release
No warranty given or implied.
This is the output from nmon -h on Linux
Note nmon for Linux is still using the nmon 9 look and feel. This should be improved in the next release to the nmon 10 look and feel. It is also somewhat limited in statistics, again it is planned to be improved - your suggestions are welcome.
Hint: nmon [-h] [-s <seconds>] [-c <count>] [-f -d -t -r <name>] [-x]
-h FULL help information - much more than here
Interactive-Mode:
read startup banner and type: "h" once it is running
For Data-Collect-Mode (-f)
-f spreadsheet output format [note: default -s300 -c288]
optional
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-t spreadsheet includes top processes
-x capacity planning (15 min for 1 day = -fdt -s 900 -c 96)
For Interactive-Mode
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-b black and white [default is colour]
example: nmon -s 1 -c 100
For Data-Collect-Mode = spreadsheet format (comma separated values)
Note: use only one of f,F,z,x or X and make it the first argument
-f spreadsheet output format [note: default -s300 -c288]
output file is <hostname>_YYYYMMDD_HHMM.nmon
-F <filename> same as -f but user supplied filename
-r <runname> goes into spreadsheet file [default hostname]
-t include top processes in the output
-T as -t plus saves command line arguments in UARG section
-s <seconds> between snap shots
-c <number> of refreshes
-l <dpl> disks/line default 150 to avoid spreadsheet issues. EMC=64.
-D Skip disk configuration sections
example: collect for 1 hour at 30 second intervals with top procs
nmon -f -t -r Test1 -s30 -c120
To load into a spreadsheet like Lotus 1-2-3:
sort -A *nmon >stats.csv
transfer the stats.csv file to your PC
Start 1-2-3 and then Open <char-separated-value ASCII file>
Capacity planning mode - use cron to run each day
-x sensible spreadsheet output for CP = one day
every 15 mins for 1 day ( i.e. -ft -s 900 -c 96)
-X sensible spreadsheet output for CP = busy hour
every 30 secs for 1 hour ( i.e. -ft -s 30 -c 120)
Set-up and installation
If you get a "can't open /dev/kmem" message
then as root run: chmod ugo+r /dev/kmem
or run the tool as the root user
To enable disk stats as root: chdev -l sys0 -a iostat=true
- this adds the disk % busy numbers (otherwise they are zero)
If you have hundreds of disk this can take 1% to 2% CPU
Interactive Mode Commands
key --- Toggles to control what is displayed ---
h = Online help information
r = Machine type, machine name, cache details and OS version + LPAR
c = CPU by processor stats with bar graphs
l = long term CPU (over 75 snapshots) with bar graphs
m = Memory and Paging stats
n = Network stats
N = Network errors
d = Disk I/O Graphs
D = Disk I/O Stats
o = Disk I/O Map (one character per disk showing how busy it is)
p = Logical Partitions Stats
b = black and white mode (or use -b option)
. = minimum mode i.e. only busy disks and processes
key --- Other Controls ---
+ = double the screen refresh time
- = halves the screen refresh time
q = quit (also x, e or control-C)
0 = reset peak counts to zero (peak = ">")
space = refresh screen now
Startup Control
If you find you always type the same toggles every time you start
then place them in the NMON shell variable. For example:
export NMON=cmdrvtan
Others:
a) To you want to stop nmon - kill -USR2 <nmon-pid>
b) Use -p and nmon outputs the background process pid
c) To limit the processes nmon lists (online and to a file)
Either set NMONCMD0 to NMONCMD63 to the program names
or use -C cmd:cmd:cmd etc. example: -C ksh:vi:syncd
d) If you want to pipe nmon output to other commands use a FIFO:
mkfifo /tmp/mypipe
nmon -F /tmp/mypipe &
grep /tmp/mypipe
e) If nmon fails please report it with:
1) nmon version like: Linux9c7
2) the output of cat /proc/cpuinfo
3) some clue of what you were doing
4) I may ask you to run the debug version
Written by Nigel Griffiths nag@uk.ibm.com
Feedback welcome - on the current release only and state exactly the problem
No warranty given or implied.
nmon Documentation about the data source for nmon
This section describes where nmon gets the information you can see. Hopefully this can help you work out what it all means by studying the AIX and Linux manuals.
AIX versions
- The data displayed by nmon are similar to the data from the standard AIX commands such as vmstat, iostat, netpmon, df, and sar. Use the manual pages for these standard commands to understand what the data means.
- As the data comes from the AIX APIs you can also look at the C programming header files.
Process data
- The process details come from the getprocs64() system call which returns the procentry64 structure from the C header file /usr/include/procinfo.h. Feel free to study this data structure to learn more.
Other data
- Most other data comes from the libperfstat library, the many functions and data structures are contained in the C header file /usr/include/libperfstat.h
LPAR
- The Logical Partition data comes from the lpar_get_info() system call, which returns two data structures and is documented in the C header file /usr/include/sys/dr.h
Memory
- A few further items of memory come from the vmgetinfo() system call and are documented in the C header file /usr/include/sys/vminfo.h
To display these nmon works in the following ways.
- For thresholds and variables, nmon just prints them in a suitable format, sometimes scaled down to KB or MB.
- For counters, by which I mean numbers that are just incremented when events happen, nmon has to compare the current value and the previous value, take the difference and divide by the elapsed time. This gives the number of events per second. Some of these are shown as text graphs where that is helpful.
- For percentages, the data is calculated from other data.
- For extracted data, for example AIO, Adapters, ESS and User Defined Disk Groups, the data does not really exist at all. It is deduced from other data. AIO is found by searching the top processes for particular command names. Adapters, strictly speaking Disk Adapters, were found by working out the adapter to disk mapping and adding up the disk stats. In nmon 10 the libperfstat library does this. ESS and User Defined Disk Groups are both like disk adapters but with different mappings.
- For completely made up data, hopefully, not too much of this.
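The counter calculation described above (difference between two samples divided by the elapsed time) can be sketched as follows. The function name and the sample numbers are made up for illustration.

```shell
# Events per second from two samples of an ever-increasing counter.
# $1 = previous value, $2 = current value, $3 = elapsed seconds between them
rate_per_second() {
    awk -v prev="$1" -v curr="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", (curr - prev) / secs }'
}

rate_per_second 1000 1600 2    # prints 300.0
```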
Linux versions:
The data comes from the /proc filesystem but this is poorly documented apart from the kernel source code! The primary files to check are
- /proc/cpuinfo,
- /proc/meminfo,
- /proc/stat,
- /proc/version,
- /proc/diskstats or /proc/partitions or /proc/diskinfo,
- /proc/net/dev,
- /proc/<PID>/stat,
- /proc/<PID>/statm
- /proc/ppc64/lparcfg for POWER5 logical partitions data.
Also some data about the Linux version has to be harvested from uname and /etc/*ease and /etc/*version etc.
It is truly odd that it is very hard to work out what Linux distro and version you are running - welcome to open source.
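As an example of working from these files, the sketch below turns two samples of the first "cpu" line of /proc/stat into a busy percentage, using the same difference approach that applies to any counter. It is not nmon's own code: the function name is made up, it assumes the usual field order (user, nice, system, idle, then iowait and so on for newer kernels), and it counts everything except the idle field as busy, so I/O wait is included.

```shell
# Busy percentage between two samples of the "cpu" line from /proc/stat.
# $1 = earlier line, $2 = later line
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum the time fields; stop at field 9 so guest time is not counted twice
            last = (NF < 9) ? NF : 9
            total = 0
            for (i = 2; i <= last; i++) total += $i
            if (NR == 1) { total0 = total; idle0 = $5 }
            else         { total1 = total; idle1 = $5 }
        }
        END {
            dt = total1 - total0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (dt - (idle1 - idle0)) / dt
        }'
}

# Made-up sample lines: 200 ticks elapsed, 120 of them idle
cpu_busy_pct "cpu 100 0 50 850" "cpu 160 0 70 970"    # prints 40.0

# On a live Linux system:
#   first=$(grep '^cpu ' /proc/stat); sleep 5; second=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$first" "$second"
```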
IBM RedBooks and IBM AIX Manuals
Resources
- AIX 5 performance series: CPU monitoring and tuning. This article shows you the standard AIX tools that can help you determine CPU bottlenecks, and shows how to interpret the reports generated by the tools so you can tune for performance.
- Understanding IBM pSeries Performance and Sizing (SG24-4810-1). 400-page Redbook. For performance tuning on pSeries and AIX.
- Database Performance on AIX in the DB2 UDB and Oracle Environments (SG24-5511). 450-page Redbook. The techie's bible for tuning these databases for high performance.
- AIX 5L Performance Tools Handbook (SG24-6039). 950-page Redbook. All the latest tools for AIX5L including truss and WLM.
- The AIX Performance Management Guide - from the AIX manuals. Start from
- For AIX Workload Management this article is an excellent start -
Some hints about the data - this needs more work (but it's unlikely to get it)
CPU
The four types of CPU workload below always add up to 100% (the CPU has to be doing something).
- User = Application code (kernel programmers call this user mode) this includes programs and RDBMS
- Sys = AIX Kernel code - this is invoked by either a system call or hardware interrupt including the regular clock interrupts
- Wait = waiting for IO. This really is idle but there is outstanding disk I/O.
- Idle = nothing else to run
Memory Use
- Virtual - memory backed up by paging space & called virtual memory
- Physical - the actual RAM in the machine
- Paging of Memory - the transfers between RAM and disk: In=to RAM, Out=to disk
- % Used - percentage of memory allocated and being used
- % Free - percentage of memory not allocated and available
- MB Used - amount of memory allocated and being used in megabytes
- MB Free - amount of memory not allocated and available in megabytes
- Total(MB) - above columns added up
Verbose Mode
- CPU - If this is high response times will suffer
- Paging Space - Virtual/Real Ratio: for small memory systems (<128MB) this should be greater than 2.5 and for large memory systems 1.2 is a recommended minimum. Also 20% of paging space should be free
- Page Faults - If this is above 20 times the CPU this will be slowing down performance
- Top Disk - If the hottest disk is above 60% busy this could be slowing down performance.
Paging Space
- to Paging Space - transfers between RAM and allocated paging logical volumes
- to File System - transfers between RAM and allocated journal file system (i.e. read only program code)
- Paging Scans - this is how often AIX is looking for memory pages to release to the free list.
- Paging Cycles - this is how often it scans all memory and fails to free any pages
- Page Reclaims - this is where a program grabs the page back from the freelist before it was used, as it really needs it.
VM parameters
- numperm - amount of RAM used by the JFS cache
- minperm and maxperm - low and high water levels that affect the VMM's choice of which pages to take on to the freelist
- minfree and maxfree - the low and high water levels for starting and stopping the lrud daemon to look for memory pages to free up
Network
- read kB/s and write kB/s - the kilobytes read and written on the network
- packin and packout - the number of network packets in=received out=sent
- insize and outsize - the average packet size in=received out=sent
Disk I/O
- Busy - The percentage of the time the disk was found in use
- Read kB/s and Write kB/s - read and written kilobytes by drive per second
- Xfers - blocks transferred per second
- Rsize and Wsize - average transfer size
- Peak% - the peak Busy percentage
- Peak-RW KB/s - the peak Read plus Write
- Used - highlights when disk is being used
Adapter I/O
- see Disk I/O; the I/O for all disks on the adapter is added up
CPU mode 1
- PID - Process identity
- PPID - Parent Process identity
- UID - User identity
- Pgrp - Process group
- Nice - used in process priority calculation
- Status - see /usr/include/sys/proc.h files
- proc-Flag - see above include files
- Thrds - number of threads within the process
- Files - number of files open within the process
- Command - the simple form of the command used to start the process
CPU mode 2
- Time Start - time the process was started
- System - time this process was running in system (Kernel) mode inside AIX
- User - time this process was running user application code
- Child - CPU time used by processes started by this process
- Delta - time differences between the last two screen updates
CPU mode 3
- %CPU Used - percentage of one CPU used by this process
- Size K - process size in kilobytes
- Res set K - resident set in kilobytes (RAM actually allocated)
- Res Text - resident set in kilobytes for code part of program
- Res Data - resident set in kilobytes for data and stack part of program
- RAM Use - percentage of memory used
- Paging io - paging caused by this process doing I/O
- Paging other - paging caused by this process (not including doing I/O)
- Paging repage - paging caused by this process repeatedly needing pages
Kernel Internal Statistics
- RunQueue - run queue length (processes waiting for CPU)
- SwapIn - processes swapped back in after thrashing
- iget - inode (JFS file descriptor) get from the disk
- namei - file/directory lookup within the JFS
- dirblk - directory block read within the JFS
- pswitch - process switches (changes between user applications)
- syscall - system calls (application requesting AIX services)
- rawch - raw character read in from tty
- canch - canonical character read in and processed
- outch - character output to tty
- read - read system call (all types of device disk, network/socket, pipe)
- write - write system call (all types of device disk, network/socket, pipe)
- fork - new process creation (clone current process)
- exec - new program code started (over writing current process with a new program)
- readch - characters read
- writech - characters written
- msg - shared messages written between applications
- sem - shared semaphore operations between applications
Disk %Busy Map
- This shows how busy all the disks get but with hundreds on one screen
Asynchronous I/O Processes
- Total AIO processes = count of the number running
- all Time peak = maximum ever spotted running by nmon
- Actually in use = number during current snapshot
- Recent peak = number since user hit 0 key
- Peak CPU used = highest total CPU used by the AIO processes since user hit 0 key
ESS I/O
JFS