Need assistance locating I/O bottleneck - lsof help, perhaps? (comp.unix.solaris, Solaris Operating System category)
We have an SF6800 server with 20 CPUs and a boatload of A5x00s (over 150 disks). We've recently started experiencing *severe* I/O degradation.

After we ran guds and forwarded the results to Sun, two separate engineers determined that the boot disks in our D240 are the bottleneck, with system calls bottling up the I/O. A further examination of the boot drives (iostat -xnp) shows that every 30 seconds the disk gets slammed with approximately 200-300 writes in a two-second period, with %b (percent busy) averaging 75-100%.

I have several problems isolating the cause of the bottleneck, however. First, we have over 2,300 users with over 7,000 processes running on a normal day. Second, the boot disks are encapsulated under VxVM, so all of the activity shows up under slice 7 (the public region) instead of under /opt, /var, or /. There is therefore no way to tell from iostat exactly where the hundreds of writes are coming from.

Sun recommended lsof, but since it's an open-source utility, they don't support it. lsof has a boatload of options for narrowing down its output; running it with no options is useless given the amount of I/O we see on a normal day. I believe I have Adrian Cockroft's Solaris tuning book at home, but it will still take time to read through it and figure out what might be happening. Because this is our production system, I obviously can't do anything on the fly that might require a reboot.

Has anyone run into these kinds of problems? Any ideas on what to look for? Any suggestions on the recommended syntax for lsof? Are there better tools out there for finding this bottleneck?
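One way to make lsof output manageable is to ask for field output (-F) with only the PID, command, access-mode and name fields, pass a mount point such as /var so lsof lists only files open on that file system, and keep just the files opened for writing ('w') or read/write ('u'). The sketch below assumes Python 3.7+ and lsof on the PATH; the function names are hypothetical. Note that an open-for-write file only shows who *could* be writing, not who wrote during a burst.

```python
import subprocess
from collections import defaultdict


def parse_lsof_fields(text):
    """Parse `lsof -F pcan` output into (pid, command, access, name) tuples.

    In lsof field output each line starts with a one-character tag:
    'p' starts a process set, 'c' is its command, 'f' starts a file set,
    'a' is the file's access mode and 'n' its name.
    """
    records = []
    pid = command = None
    current = None  # fields of the file set currently being read

    def flush():
        # Emit the finished file set if it carried a name.
        if current is not None and "n" in current:
            records.append((pid, command, current.get("a", ""), current["n"]))

    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            flush()
            current = None
            pid, command = int(value), None
        elif tag == "c":
            command = value
        elif tag == "f":
            flush()
            current = {"f": value}
        elif current is not None:
            current[tag] = value
    flush()
    return records


def writers(records):
    """Group names opened for write ('w') or read/write ('u') by (pid, command)."""
    out = defaultdict(list)
    for pid, command, access, name in records:
        if access in ("w", "u"):
            out[(pid, command)].append(name)
    return dict(out)


def lsof_writers(mount_point):
    """Run lsof against one mount point and return its writers (needs lsof on PATH)."""
    result = subprocess.run(["lsof", "-F", "pcan", mount_point],
                            capture_output=True, text=True)
    return writers(parse_lsof_fields(result.stdout))
```

For example, lsof_writers("/var") returns a dictionary keyed by (PID, command). Running it once per file system while a burst is under way is one way to shrink the list of 7,000 candidates.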
In <DdWdneIycqzeqXHcRVn-iA@giganews.com> John_B <spam.blows.and@spammers.suck.com> writes:

>We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
>disks. We've recently started experiencing *severe* I/O degradation.

What operating system? Are you using UFS logging?

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
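Gary's UFS logging question can be answered from the mount options. A minimal sketch, assuming the Solaris /etc/mnttab layout (special, mount point, file-system type, options, time) and assuming an enabled log shows up as a `logging` entry in the options field; the function names are hypothetical.

```python
def ufs_without_logging(mnttab_text):
    """Return mount points of UFS file systems whose mount options lack 'logging'.

    Assumes whitespace-separated mnttab fields:
    special, mount_point, fstype, options, time.
    """
    result = []
    for line in mnttab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ufs":
            continue  # skip malformed lines and non-UFS file systems
        if "logging" not in fields[3].split(","):
            result.append(fields[1])
    return result


def check_mnttab(path="/etc/mnttab"):
    """Read the live mount table and report UFS mounts without logging."""
    with open(path) as f:
        return ufs_without_logging(f.read())
```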
John_B wrote:
> We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
> disks. We've recently started experiencing *severe* I/O degradation.
>
> After running guds and forwarding that information to Sun, two separate
> engineers determined that our bootdisks in our D240 are the bottleneck
> with system calls bottling up the I/O. A further examination (iostat
> -xnp) of the boot drives shows that every 30 seconds, the disk is
> getting slammed with approximately 200-300 I/O writes in a two-second
> period with blocking averaging between 75-100%.
<snip>

If it's getting hit every 30 seconds, perhaps you can tie that schedule to a particular process (as another line of attack).

--
The e-mail address in our reply-to line is reversed in an attempt to minimize spam. Our true address is of the form che...@prodigy.net.
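Before chasing processes, it may help to confirm the cadence precisely: capture `iostat -xn 1` (one-second samples) during a bad period and measure the spacing between write spikes on the encapsulated slice. A minimal sketch, assuming the Solaris `iostat -xn` column layout (r/s, w/s, kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, device); the default threshold and the function names are hypothetical and should be tuned.

```python
def parse_iostat_xn(text):
    """Split `iostat -xn INTERVAL` output into samples.

    Returns a list with one dict per sample, mapping device -> {column: float}.
    Each "r/s ... device" header line starts a new sample.
    """
    samples = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r/s":
            columns = fields[:-1]  # numeric column names; last field is "device"
            samples.append({})
        elif columns and samples and len(fields) == len(columns) + 1:
            try:
                values = [float(v) for v in fields[:-1]]
            except ValueError:
                continue  # not a data row
            samples[-1][fields[-1]] = dict(zip(columns, values))
    return samples


def spike_gaps(samples, device, column="w/s", threshold=50.0):
    """Return (burst start indexes, gaps between consecutive burst starts)."""
    hits = [i for i, s in enumerate(samples)
            if s.get(device, {}).get(column, 0.0) > threshold]
    # Collapse runs of consecutive hot samples (one multi-second burst) to one start.
    starts = [i for n, i in enumerate(hits) if n == 0 or i != hits[n - 1] + 1]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return starts, gaps
```

With a one-second interval, gaps of about 30 would match the pattern described in the original post. The first iostat sample reports averages since boot, so it is worth dropping (samples[1:]) before looking for spikes.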
John_B <spam.blows.and@spammers.suck.com> wrote:
> We have an SF6800 server w/ 20 CPUs and a boatload of A5x00s - over 150
> disks. We've recently started experiencing *severe* I/O degradation.
> After running guds and forwarding that information to Sun, two separate
> engineers determined that our bootdisks in our D240 are the bottleneck
> with system calls bottling up the I/O. A further examination (iostat
> -xnp) of the boot drives shows that every 30 seconds, the disk is
> getting slammed with approximately 200-300 I/O writes in a two-second
> period with blocking averaging between 75-100%.

Since it's happening at a specific point in time, you might be able to use the kernel I/O tracing facility. With a little poking, you could get a list of the PIDs scheduling the I/O and see whether user or system processes are involved. Here's a short tutorial:
http://www.sun.com/sun-on-net/itworl...61001perf.html

--
Darren Dunham                                           ddunham@taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
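Once a trace has been taken, the useful reduction is a per-PID count of the writes that land inside the bursts. A minimal sketch, assuming the trace has already been boiled down to (timestamp in seconds, PID, is_write) tuples; that tuple layout, the two-second window, and the function name are hypothetical, and the tutorial's own output format is not reproduced here.

```python
from collections import Counter


def writers_in_bursts(records, burst_starts, window=2.0):
    """Count write events per PID that fall inside any burst window.

    records: iterable of (timestamp_seconds, pid, is_write) tuples; this layout
    is a hypothetical reduction of the trace, not the tracing tool's own format.
    burst_starts: burst start times on the same clock as the timestamps.
    window: burst length in seconds (about two seconds in the original report).
    """
    counts = Counter()
    for ts, pid, is_write in records:
        if is_write and any(start <= ts < start + window for start in burst_starts):
            counts[pid] += 1
    # Busiest PIDs first.
    return counts.most_common()
```

A PID that dominates every burst is the obvious first suspect, and `ps -p PID` will show whether it is a user or a system process.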