Unix Technical Forum

Need assistance locating I/O bottleneck - lsof help, perhaps?

A discussion in the comp.unix.solaris forum, part of the Solaris Operating System category.


#1
01-06-2008, 07:15 PM
John_B
Need assistance locating I/O bottleneck - lsof help, perhaps?

We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
disks. We've recently started experiencing *severe* I/O degradation.

After running guds and forwarding that information to Sun, two separate
engineers determined that our bootdisks in our D240 are the bottleneck
with system calls bottling up the I/O. A further examination (iostat
-xnp) of the boot drives shows that every 30 seconds, the disk is
getting slammed with approximately 200-300 I/O writes in a two-second
period with blocking averaging between 75-100%.
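To see the 30-second pattern without eyeballing screens of output, the busy samples for the boot disk can be pulled out of captured iostat text. This is a minimal Python sketch, assuming the standard 11-column "iostat -xn" layout (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device) and an "extended device statistics" banner before each sample. The function name busy_samples, the 75% default threshold, the device name c1t0d0s7 and the sample numbers are all illustrative, not real data.

```python
# Sketch: flag busy samples for one device in Solaris "iostat -xn" output.
# Assumes the 11-column layout "r/s w/s kr/s kw/s wait actv wsvc_t asvc_t
# %w %b device" and an "extended device statistics" banner per sample.
from typing import List, Tuple


def busy_samples(text: str, device: str,
                 min_busy: float = 75.0) -> List[Tuple[int, float, float]]:
    """Return (sample_number, writes_per_sec, percent_busy) for each sample
    in which `device` was at least `min_busy` percent busy."""
    hits = []
    sample = -1  # bumped at each banner, so the first sample is number 0
    for line in text.splitlines():
        if "extended device statistics" in line:
            sample += 1
            continue
        fields = line.split()
        # Data rows have 11 fields and end with the device name; the
        # column-header row also has 11 fields but ends with "device".
        if len(fields) != 11 or fields[-1] != device:
            continue
        writes_per_sec = float(fields[1])  # w/s column
        percent_busy = float(fields[9])    # %b column
        if percent_busy >= min_busy:
            hits.append((sample, writes_per_sec, percent_busy))
    return hits


if __name__ == "__main__":
    # Invented sample text, not real measurements; c1t0d0s7 is hypothetical.
    SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    2.0    0.0   16.0  0.0  0.0    0.0    5.1   0   1 c1t0d0s7
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  140.0    0.0 1120.0  3.2  1.0   22.9    7.1  45  92 c1t0d0s7
"""
    print(busy_samples(SAMPLE, "c1t0d0s7"))  # [(1, 140.0, 92.0)]
```

With an encapsulated boot disk this still only reports the slice 7 totals; it quantifies the bursts but cannot say which file system they land on.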

I have a couple of problems isolating the cause of the I/O bottleneck,
however. One is that we have over 2,300 users with over 7,000 processes
running during a normal day. The second is that the boot disks are
encapsulated under VxVM, so all of the activity shows up under
slice 7, the public region, instead of under the actual /opt, /var,
or / file systems. As a result, iostat gives no way to determine exactly
where the hundreds of writes are coming from.

Sun recommended lsof, but since it's an open-source utility, they don't
support it. lsof has a boatload of options for narrowing down the
data, but running lsof by itself is useless because of the huge
amount of I/O that we get on a normal day.
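One way to cut lsof output down: given a file system's mount point as an argument, lsof lists only files open on that file system, and its -F option prints machine-readable field output, one field per line prefixed by a letter (p for PID, c for command, f for file descriptor, a for access mode, n for name). The minimal Python sketch below assumes output captured with something like "lsof -F pcfan /var" and reports which processes hold files open for writing there. lsof shows open files at one instant, not write rates, so this only narrows the list of suspects. The function name writers and the sample processes and paths are hypothetical.

```python
# Sketch: summarize "lsof -F pcfan <mount-point>" output.
# Field letters: p=PID, c=command, f=file descriptor,
# a=access mode (r, w or u), n=file name.
from collections import defaultdict
from typing import Dict, List, Tuple


def writers(field_output: str) -> Dict[Tuple[int, str], List[str]]:
    """Map (pid, command) -> names of files open with mode 'w' or 'u'."""
    result: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    pid, command = -1, ""
    access, name = "", ""

    def flush() -> None:
        # Record the file set just finished if it was open for writing.
        if name and access in ("w", "u"):
            result[(pid, command)].append(name)

    for line in field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":        # new process set
            flush()
            pid, command = int(value), ""
            access, name = "", ""
        elif tag == "c":
            command = value
        elif tag == "f":      # new file set within the current process
            flush()
            access, name = "", ""
        elif tag == "a":
            access = value
        elif tag == "n":
            name = value
    flush()  # last file set in the output
    return dict(result)


if __name__ == "__main__":
    # Hypothetical capture; process names and paths are invented.
    SAMPLE = ("p1234\ncdbwriter\nf3\naw\nn/var/tmp/app.log\n"
              "f4\nar\nn/var/adm/messages\n"
              "p77\ncsyslogd\nf5\nau\nn/var/adm/messages\n")
    for (pid, cmd), files in writers(SAMPLE).items():
        print(pid, cmd, files)
```

Running the capture a few times, including just before and during a burst, and comparing the writer lists is one way to spot a process that only appears around the spikes.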

I believe I have Adrian Cockcroft's Solaris tuning book at home, but
it will still take time to read through and try to figure out what might
be happening. Because this is our production system, I obviously can't
do anything on the fly that might require a reboot.

Has anyone run into these kinds of problems? Any ideas on what to look
for? Any suggestions on the recommended syntax for lsof? Are there
better tools out there to try to find this bottleneck?
#2
01-06-2008, 07:16 PM
Gary Mills
Re: Need assistance locating I/O bottleneck - lsof help, perhaps?

In <DdWdneIycqzeqXHcRVn-iA@giganews.com> John_B <spam.blows.and@spammers.suck.com> writes:

>We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
>disks. We've recently started experiencing *severe* I/O degradation.


What operating system? Are you using UFS logging?

--
-Gary Mills- -Unix Support- -U of M Academic Computing and Networking-
#3
01-06-2008, 07:16 PM
CJT
Re: Need assistance locating I/O bottleneck - lsof help, perhaps?

John_B wrote:
> We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
> disks. We've recently started experiencing *severe* I/O degradation.
>
> After running guds and forwarding that information to Sun, two separate
> engineers determined that our bootdisks in our D240 are the bottleneck
> with system calls bottling up the I/O. A further examination (iostat
> -xnp) of the boot drives shows that every 30 seconds, the disk is
> getting slammed with approximately 200-300 I/O writes in a two-second
> period with blocking averaging between 75-100%.

<snip>

If it's getting hit every 30 seconds, perhaps you can tie it to a
process that runs on that schedule (as another line of attack).
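That line of attack depends on the period being stable. A minimal Python sketch, assuming you have logged a per-second series of write counts for the boot disk (for example, the w/s column from "iostat -xn 1"): it finds where the series first rises to a chosen threshold and reports the median gap between those burst starts. The function names, the threshold and the synthetic series are illustrative assumptions.

```python
# Sketch: estimate the spacing of write bursts from per-second write counts.
from statistics import median
from typing import List, Optional


def burst_starts(writes_per_sec: List[float], threshold: float) -> List[int]:
    """Seconds at which the series rises from below to >= threshold."""
    starts = []
    above = False
    for second, value in enumerate(writes_per_sec):
        if value >= threshold and not above:
            starts.append(second)
        above = value >= threshold
    return starts


def typical_period(writes_per_sec: List[float],
                   threshold: float) -> Optional[float]:
    """Median gap, in seconds, between consecutive burst starts."""
    starts = burst_starts(writes_per_sec, threshold)
    if len(starts) < 2:
        return None  # not enough bursts to measure a period
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return float(median(gaps))


if __name__ == "__main__":
    # Synthetic 90-second series with two-second bursts 30 seconds apart.
    series = [250.0 if s in (5, 6, 35, 36, 65, 66) else 0.0
              for s in range(90)]
    print(typical_period(series, threshold=100.0))  # 30.0
```

A steady value near 30 seconds could then be compared against processes that wake on a fixed timer; a drifting value points away from a simple timer.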

--
The e-mail address in our reply-to line is reversed in an attempt to
minimize spam. Our true address is of the form che...@prodigy.net.
#4
01-06-2008, 07:16 PM
Darren Dunham
Re: Need assistance locating I/O bottleneck - lsof help, perhaps?

John_B <spam.blows.and@spammers.suck.com> wrote:
> We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
> disks. We've recently started experiencing *severe* I/O degradation.


> After running guds and forwarding that information to Sun, two separate
> engineers determined that our bootdisks in our D240 are the bottleneck
> with system calls bottling up the I/O. A further examination (iostat
> -xnp) of the boot drives shows that every 30 seconds, the disk is
> getting slammed with approximately 200-300 I/O writes in a two-second
> period with blocking averaging between 75-100%.


Since it's happening at a specific point in time, you might be able to
use the kernel I/O tracing facility. With a little poking, you could
get a list of the PIDs scheduling the I/O and see whether user or system
processes are involved.

Here's a short tutorial.
http://www.sun.com/sun-on-net/itworl...61001perf.html
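Once the trace data has been extracted, the remaining step is counting write requests per PID and separating user processes from system ones. The minimal Python sketch below is an illustration under stated assumptions: the IoEvent record layout is hypothetical (map it onto whatever fields your extracted trace actually contains), and treating UID 0 as "system" is a simplification.

```python
# Sketch: tally traced write requests per process, split by owner.
# The IoEvent layout is hypothetical; UID 0 counts as "system" here.
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Ranking = List[Tuple[Tuple[int, str], int]]  # [((pid, command), count), ...]


@dataclass(frozen=True)
class IoEvent:
    pid: int
    uid: int
    command: str
    is_write: bool


def top_writers(events: Iterable[IoEvent], n: int = 5) -> Tuple[Ranking, Ranking]:
    """Return the n busiest writers as (system_ranking, user_ranking)."""
    system: Counter = Counter()
    user: Counter = Counter()
    for ev in events:
        if not ev.is_write:
            continue  # only writes matter for these bursts
        bucket = system if ev.uid == 0 else user
        bucket[(ev.pid, ev.command)] += 1
    return system.most_common(n), user.most_common(n)


if __name__ == "__main__":
    # Invented events; "rootjob" and "userjob" are placeholder names.
    events = ([IoEvent(101, 0, "rootjob", True)] * 3
              + [IoEvent(2001, 500, "userjob", True)] * 2
              + [IoEvent(2001, 500, "userjob", False)])
    print(top_writers(events))
```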

--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
All times are GMT.
www.UnixAdminTalk.com