This is a discussion on Sar questions within the SCO Unix forums, part of the Unix Operating Systems category.
Below is my sar output. How can we tell what is causing the CPU to
be 100% busy all the time?

#sar

00:00:00    %usr    %sys    %wio   %idle (-u)
01:00:00      25      75       0       0
02:00:00      25      75       0       0
03:00:00      24      76       0       0
04:00:00      24      76       0       0
05:00:00      24      76       0       0
06:00:00      24      76       0       0
..
..
(%idle = 0 all the time)

#sar 1 5

23:30:19    %usr    %sys    %wio   %idle (-u)
23:30:20      24      76       0       0
23:30:21      24      76       0       0
23:30:22      16      84       0       0
23:30:23      14      86       0       0
23:30:24      20      80       0       0

Average       19      81       0       0

Thanks,
chalawal
| |||
On 24 Dec 2003 08:34:35 -0800, chalawal@hotmail.com (Chalawal Maliwan) wrote:

>Below is my sar output. How can we tell what is causing the CPU to
>be 100% busy all the time?
>
>#sar
>
>00:00:00    %usr    %sys    %wio   %idle (-u)
>01:00:00      25      75       0       0
>02:00:00      25      75       0       0
>03:00:00      24      76       0       0
>04:00:00      24      76       0       0
>05:00:00      24      76       0       0
>06:00:00      24      76       0       0

Download cpuhog, iohog, and memhog.
  http://www.caldera.com/skunkware/sysadmin/
Cpuhog should identify the culprit.

You can also do it manually with the ps command:
  http://docsrv.sco.com:507/en/man/html.C/ps.C.html
The pcpu column will show the percentage of CPU use for each process.

Also, watch out for situations where more than one process is hogging
the CPU cycles. I've seen it happen once or twice.

--
Jeff Liebermann 150 Felker St #D Santa Cruz CA 95060
(831)421-6491 pgr (831)336-2558 home
http://www.LearnByDestroying.com AE6KS
jeffl@comix.santa-cruz.ca.us jeffl@cruzio.com
| |||
Chalawal Maliwan <chalawal@hotmail.com> wrote:
>Below is my sar output. How can we tell what is causing the CPU to
>be 100% busy all the time?

>#sar

>00:00:00    %usr    %sys    %wio   %idle (-u)
>01:00:00      25      75       0       0
>02:00:00      25      75       0       0
>03:00:00      24      76       0       0
>04:00:00      24      76       0       0
>05:00:00      24      76       0       0
>06:00:00      24      76       0       0

Usually fairly simply. From http://aplawrence.com/Unixart/slow.html

If it is the cpu that is pegged busy, it *may* be a runaway process
that is eating cpu cycles. Do this:

for x in 1 2 3 4 5
do
    ps -e | sort -r +2 | head -5
    echo "==="
    sleep 5
done

Look for a process whose time column has gone up by 3 to 5 seconds each
time- if you have something like that, that's your problem- you need to
kill it. The TIME column is time on the cpu- normally a process doesn't
spend a great deal of time actually running- it's waiting for the disk,
waiting for you to type something, etc. Most processes spend most of
their time sleeping, waiting for something else to happen, so something
that gains 3 seconds or more in 5 seconds of wall time is usually
suspect. If you watch it over a few minutes, the time it gains here
divided by the elapsed wall clock time is the percentage of your cpu
this process is taking for itself.

A short-lived process can take a lot of the cpu to print, or to redraw
an X screen etc., so you have to use some good judgement here. But 3
seconds out of 5 is very likely a real problem. Of course you need to
understand what you are killing: you probably wouldn't want to kill the
main Oracle database, for example.

If you kill the errant process and another copy of it pops right back
to the top of the list, then you need to track down its parent:

# for example, if process 15246 is the problem
ps -p 15246 -o ppid

Of course, it may go further up the chain. Here's a script that traces
back to init:

# This works on SCO or Linux, just pass a process ID as an argument.
MYPROC=$1
NEXTPROC=$MYPROC
# stop at PID 0, or if ps prints nothing because the process is gone
while [ -n "$NEXTPROC" ] && [ "$NEXTPROC" != 0 ]
do
    ps -lp $NEXTPROC
    MYPROC=$NEXTPROC
    # "ppid=" prints the parent PID with no header; tr strips the padding
    NEXTPROC=`ps -p $MYPROC -o "ppid=" | tr -d ' '`
done

Sometimes you'll have a badly written network program that starts
sucking resources when its client dies. If you can't get the supplier
to fix it, you may want to write a script to track down and kill these
things. One clue that might help: the difference between a good "xyz"
process and a bad one might just be whether or not it has an attached
tty. So, if you see this:

5821 ?     00:00:42 xyz
6689 ttyp0 00:00:08 xyz
7654 ttyp1 00:00:12 xyz

It's probably the one with a "?" that will start accumulating time. So
a script that watched for and killed those might look like this:

set -f # turn off shell expansion because of "?"
ps -e | grep "xyz$" | while read line
do
    set $line
    [ "$2" = "?" ] && kill -9 $1
done

If you can't do it that way, you have to get more clever, and watch
for changing time:

set -f
mkdir /tmp/mystuff
ps -e | grep "xyz$" | while read line
do
    set $line
    ps -p $1 > /tmp/mystuff/first
    sleep 5 #adjust sleep as necessary
    ps -p $1 > /tmp/mystuff/second
    # diff exits non-zero when the snapshots differ, i.e. TIME has changed
    diff /tmp/mystuff/first /tmp/mystuff/second || kill -9 $1
done

And even that may not be clever enough for your particular situation,
so test and tread carefully. You may even need to do math on the time
field to see what has really happened.

Bela Lubkin made an interesting post about an apparently slow CPU2 on
an SMP system. Read it at http://aplawrence.com//Bofcusm/1695.html.

Another thing you may see is a process that has used a lot of time but
isn't gaining time right now. I've seen that many times where the
process is "deliver"- MMDF's mail delivery agent on SCO systems that
aren't running sendmail. What happens is that for whatever reason (a
root.lock file from a crash in /usr/spool/mail or a missing "sys" home
directory), there are thousands of undelivered messages in the
subdirectories of /usr/spool/mmdf/lock/home

The fix for that is simple if you don't care about the messages: rm -r
all those directories and recreate them empty with the same ownership
and permissions

cd /usr/spool/mmdf/lock/home
/etc/rc2.d/P86mmdf stop
rm -r *
# recreate the directories just removed (mkdir) before the next two steps
chown mmdf:mmdf *
chmod 777 *
cd /usr/spool/mail
rm *.lock
/etc/rc2.d/P86mmdf start

You'd then want to verify that mail is working normally and that
whatever caused the problem isn't still happening- for example, if
/usr/sys is missing this problem will come right back again very
quickly.

Another possibility is a program that is rapidly spawning off other
programs. You should be able to see that in "ps -e". First, is the
number of processes growing?

ps -e | wc -l
sleep 5
ps -e | wc -l

Or, are there new processes briefly showing up at the end of the
listing?

ps -e | tail
sleep 5
ps -e | tail

In either case, you need to track down the parent and kill it.

--
tony@aplawrence.com Unix/Linux/Mac OS X resources: http://aplawrence.com
Get paid for writing about tech: http://aplawrence.com/publish.html
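A rough sketch of the "math on the time field" idea from the post above, for anyone who wants a number rather than an eyeballed diff. It assumes a POSIX shell and a ps that accepts -o time= (older SCO releases may not). The function names to_seconds and cpu_percent are made up for illustration and are not part of any existing tool. It converts a TIME value to seconds, then divides the gain between two samples by the elapsed wall clock time:

```shell
# Convert a ps TIME value ([DD-]HH:MM:SS or MM:SS) to whole seconds.
to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; t = $1
        # an optional "DD-" prefix carries whole days
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        m = split(t, p, ":")
        secs = 0
        for (i = 1; i <= m; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# Percentage of one CPU used between two TIME samples.
# $1 = first sample, $2 = second sample, $3 = seconds between them
cpu_percent() {
    first=$(to_seconds "$1")
    second=$(to_seconds "$2")
    echo $(( (second - first) * 100 / $3 ))
}

# Usage on a live system (15246 is the example PID from the post):
#   t1=$(ps -p 15246 -o time=); sleep 5; t2=$(ps -p 15246 -o time=)
#   cpu_percent $t1 $t2 5
```

A process that goes from 00:00:42 to 00:00:45 over a 5 second sleep has gained 3 seconds, which this reports as 60. The arithmetic is integer only, so short samples are coarse; a longer sleep gives a steadier figure.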
| |||
On Wed, 24 Dec 2003 09:48:03 -0800, Jeff Liebermann
<jeffl@comix.santa-cruz.ca.us> wrote:

>You can also do it manually with the ps command:
> http://docsrv.sco.com:507/en/man/html.C/ps.C.html
>The pcpu column will show the percentage of CPU use for each process.

I found this on the man page for ps:

  Display in decreasing order, the IDs and percentage CPU usage
  of all processes where usage is more than 5% of CPU time:

  ps -A -o "pid=" -o "pcpu=" | awk '$2 > 5 {print $1" "$2}' | sort -r +1

Methinks it might be handy. The +1 tells sort to skip the first field,
so this already sorts by the pcpu percentage rather than the process ID.
I would add -n so that the percentages are compared as numbers instead
of as strings:

  ps -A -o "pid=" -o "pcpu=" | awk '$2 > 5 {print $1" "$2}' | sort -rn +1

(Note: I didn't try this because my ancient 3.2v4.2 ps command doesn't
support the -o option).

--
Jeff Liebermann
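Many current sort implementations no longer accept the old +N keys, so here is a sketch of the same filter using the POSIX -k form. In the old syntax fields are counted from zero, so a key written as +1 -2 corresponds to -k2,2, the second field (pcpu here). The helper name top_cpu is hypothetical, and the live invocation in the comment assumes a ps that supports -o:

```shell
# Read "pid pcpu" pairs on stdin, keep those above a threshold
# (default 5), and print them ordered by pcpu, highest first.
top_cpu() {
    awk -v min="${1:-5}" '$2 > min { print $1, $2 }' |
        LC_ALL=C sort -k2,2nr   # key = field 2 only, numeric, reversed
}

# On a live system:
#   ps -A -o pid= -o pcpu= | top_cpu 5
```

The n modifier matters: without it, 7.1 would sort ahead of 12.5 because the comparison is done character by character.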
| |||
On Wed, 24 Dec 2003 18:13:47 +0000 (UTC), Tony Lawrence
<apl@shell01.TheWorld.com> wrote:

(...)

We both forgot about the "top" program.
  http://www.caldera.com/skunkware/sysadmin/
which will also display the top cpu hogs.

>In either case, you need to track down the parent and kill it.

That's what happens with child abuse.

--
Jeff Liebermann
| |||
Jeff Liebermann <jeffl@comix.santa-cruz.ca.us> wrote:
>On Wed, 24 Dec 2003 18:13:47 +0000 (UTC), Tony Lawrence
><apl@shell01.TheWorld.com> wrote:

>(...)

>We both forgot about the "top" program.
> http://www.caldera.com/skunkware/sysadmin/
>which will also display the top cpu hogs.

I didn't forget. Although it is extraordinarily popular, I think its
usefulness is overblown and its popularity undeserved.

>>In either case, you need to track down the parent and kill it.

>That's what happens with child abuse.

:-)

--
tony@aplawrence.com
| ||||
> Download cpuhog, iohog, and memhog.
>   http://www.caldera.com/skunkware/sysadmin/
> Cpuhog should identify the culprit.
>
> You can also do it manually with the ps command:
>   http://docsrv.sco.com:507/en/man/html.C/ps.C.html
> The pcpu column will show the percentage of CPU use for each process.
>
> Also, watch out for situations where more than one process is hogging
> the CPU cycles. I've seen it happen once or twice.

I tried cpuhog and saw some particular processes that were consuming
the CPU time. Thanks to all who answered my questions.

chalawal