This is a discussion on "How often should I reboot Solaris and LynxOS" within the Sun Solaris Administration forums, part of the Solaris Operating System category.
"EKL" <En-Kuang_Lung@raytheon.com> wrote:
]Would someone please give me some pointers on a trend analysis on resource
]leaks for Solaris 9 and LynxOS (or in general on UNIX machines). Basically,
]I need statistics to determine how often I need to restart these machines to
]avoid unplanned failures. Thanks.

For the statistics, I run a program that checks various things, such as swap usage, swap activity, CPU usage, disk usage, inode usage, network errors, etc., and logs the results using rrdtool. I then display daily and yearly stats, so I can see trends and predict when a filesystem will fill up. This works very well.

As for reboots, many will say that you don't need them. From experience, you don't need regular reboots, but if you install an application or a patch, you should consider scheduling an attended reboot shortly afterwards, at a time that suits everyone. You want to make sure the machine still boots cleanly, in case of an unscheduled, unattended reboot later on, and that everything starts up as it should.
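The approach described above (sample swap, CPU, disk, inode and network counters, store them with rrdtool, and read trends off the graphs) can be approximated for the disk part in a few lines. This is a hypothetical Python sketch, not the poster's program: it does not use rrdtool, the names `fs_usage` and `seconds_until_full` are invented here, and it assumes a straight-line (least-squares) trend is good enough to estimate when a filesystem fills up.

```python
import os
from typing import Optional, Sequence, Tuple


def fs_usage(path: str = "/") -> dict:
    """Block and inode usage for the filesystem holding `path` (Unix only)."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize        # filesystem size in bytes
    free = st.f_bfree * st.f_frsize          # free bytes, including root-reserved blocks
    return {
        "bytes_total": total,
        "bytes_used": total - free,
        "inodes_total": st.f_files,
        "inodes_used": st.f_files - st.f_ffree,
    }


def seconds_until_full(samples: Sequence[Tuple[float, float]],
                       capacity: float) -> Optional[float]:
    """Fit used = a + b*t by least squares and extrapolate to `capacity`.

    `samples` are (unix_time, used) pairs, oldest first. Returns the number of
    seconds after the last sample, or None if usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    if sxx == 0:
        return None                          # all samples taken at the same instant
    slope = sxy / sxx                        # growth per second
    if slope <= 0:
        return None                          # flat or shrinking: no fill-up date
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope  # when the trend line reaches capacity
    return max(0.0, t_full - samples[-1][0])
```

For example, daily samples of 78, 79 and 80 GB used on a 100 GB filesystem give a slope of 1 GB/day, so `seconds_until_full` returns about 20 days' worth of seconds. The same function can be fed `inodes_used` against `inodes_total`. A straight line is a deliberate simplification; usage that grows in bursts (log rotation, backups) will make the estimate jumpy.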
Frank-Christian Kruegel wrote:
> On Thu, 24 Jul 2003 11:18:08 -0400, "EKL" <En-Kuang_Lung@raytheon.com> wrote:
>
>>Would someone please give me some pointers on a trend analysis on resource
>>leaks for Solaris 9 and LynxOS (or in general on UNIX machines). Basically,
>>I need statistics to determine how often I need to restart these machines to
>>avoid unplanned failures. Thanks.
>
> My uptime record was 611 days on a Netra T1. On the 612th day I had to swap both
> hard disks. No problems during that time.

See, that's what happens when you allow the hard disks to hit the 611.5-day timer, at which point they become so fragmented not even Humpty Dumpty can put them back together.

The only issue I can recall is that after "N" days of operation the file modification dates would go bogus. Unfortunately, I can't remember what "N" was (240 days?), or whether this was a SunOS 4.x feature or only on a machine based on SunOS 4.x.
Greg <go@at.ends> writes:
>In article <Xns93C2A0A186616davetsccorpcom@199.45.49.11>, dave@tsc-
>corp.com says...
>> "Michael Vilain <vilain@spamcop.net>" wrote in news:news-
>> 0A1283.10123524072003@news.tdl.com:
>>
>> > If you just run without users
>>
>> Yup, it's them darned users that are the problem!
>Just "kill -SIGSTOP" all their processes, except for their shells.
>(Don't use SIGTERM or SIGKILL. That just pisses the users off. SIGSTOP
>is just as effective in terms of preventing the processes from hosing
>the system, but it's less psychologically traumatic to the users.)

SIGSEGV is more fun; I remember a sysadmin using it to punish certain
students. (Yes, he is starting a debugger!)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
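Greg's point is that SIGSTOP, unlike SIGTERM or SIGKILL, is reversible: it cannot be caught or ignored, a stopped process gets no CPU time (though it still holds its memory), and SIGCONT resumes it. A minimal Python sketch of that cycle on a throwaway child process; the function name is made up for illustration, and nothing here touches real users' processes.

```python
import os
import signal
import subprocess
import sys


def stop_then_resume_then_terminate() -> tuple:
    """Freeze a throwaway child with SIGSTOP, thaw it with SIGCONT, then end it."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGSTOP)
    # WUNTRACED makes waitpid report a stopped child instead of waiting for exit.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    stopped_by = os.WSTOPSIG(status) if os.WIFSTOPPED(status) else None
    # The stopped child keeps its memory and open files; SIGCONT lets it run again.
    child.send_signal(signal.SIGCONT)
    child.terminate()                        # SIGTERM, delivered now that it is running
    return stopped_by, child.wait()          # POSIX: negative signal number on death by signal
```

`stopped_by` comes back as `signal.SIGSTOP`, and the exit status is `-signal.SIGTERM`, which is Python's way of saying "killed by SIGTERM". SIGCONT is sent before SIGTERM because a signal sent to a stopped process (other than SIGKILL) simply stays pending until the process is continued.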
Lon Stowell <lon.stowell@comcast.net> writes:
> The only issue I can recall is that after "N" days of operation
> the file modification dates would go bogus. Unfortunately
> can't remember what "N" was ?240 days? or if this was a
> SunOS 4.x feature or only on a machine based on SunOS 4.x

Sure this wasn't the "248 days, I hang" bug in Solaris 2.x?

The comments in the bug report were something like "customer may
lose satellite if this happens again".

Casper
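The thread never says what caused the "248 days" hang, but the number itself is suggestive. Assuming (this is not stated anywhere in the thread) that the culprit was a signed 32-bit counter of clock ticks incremented 100 times a second, it overflows after 2^31 / 100 s, about 21,474,836 seconds, or roughly 248.55 days. A quick check with a hypothetical helper:

```python
def days_until_wrap(bits: int = 32, hz: int = 100, signed: bool = True) -> float:
    """Days until a counter bumped `hz` times per second runs out of range.

    A signed counter overflows after 2**(bits - 1) ticks; an unsigned one
    wraps after 2**bits ticks.
    """
    ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return ticks / hz / 86_400               # 86,400 seconds per day


print(f"{days_until_wrap():.2f}")              # 248.55: signed 32-bit at 100 Hz
print(f"{days_until_wrap(signed=False):.2f}")  # 497.10: unsigned 32-bit at 100 Hz
```

Under the same assumption, a faster clock shortens the window: at 1000 Hz a signed 32-bit tick counter lasts a little under 25 days.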
In comp.sys.sun.admin Casper H.S. Dik <Casper.Dik@sun.com> wrote:
: SIGSEGV is more fun; I remember a sysadmin using it to punish certain
: students. (Yes, he is starting a debugger!)

: Casper
: --

Heh. Or, more creatively, send an FPE signal, especially if you *know*
there should be no floating point in the application. That will really
confuse them!

(Why is my hello-world program constantly getting a floating point error?)

Chris Barrera
cbarrera@ t i . c o m
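The joke works because `kill` can deliver any signal to any process you own, whether or not the condition the signal is named after ever happened. SIGFPE's default action is to terminate the process (with a core dump where core dumps are enabled), and shells typically report that as a floating point exception. A Python sketch using a throwaway "hello world" child; the function and constant names are invented for illustration, and core dumps are disabled in the child so the demo leaves no core file behind.

```python
import resource
import signal
import subprocess
import sys

HELLO = "print('hello, world', flush=True); import time; time.sleep(60)"


def _no_core_dumps() -> None:
    # Runs in the child just before exec: SIGFPE's default action dumps core,
    # so set the core-file size limit to zero for this demo.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def hello_world_killed_by_sigfpe() -> int:
    """Start a child that does no floating point, then send it SIGFPE from outside."""
    with subprocess.Popen([sys.executable, "-c", HELLO],
                          stdout=subprocess.PIPE,
                          preexec_fn=_no_core_dumps) as child:
        child.stdout.readline()              # wait until it has said hello
        child.send_signal(signal.SIGFPE)     # same effect as `kill -FPE <pid>` from a shell
        return child.wait()                  # POSIX: negative signal number on death by signal
```

On a typical Unix system this returns `-signal.SIGFPE`, even though the child never did any arithmetic at all.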
I R A Darth Aggie wrote:
> On 25 Jul 2003 12:49:10 GMT,
> Casper H.S. Dik <Casper.Dik@Sun.COM>, in
> <3f212746$0$49103$e4fe514c@news.xs4all.nl> wrote:
> +>
> +> The comments in the bug report were something like "customer may
> +> lose satellite if this happens again".
>
> Oh, great, so NASA already has an excuse if they lose one (or more) of
> the recently launched Mars probes??

Nah. It was a company who run telecommunications satellites. For some
reason they get a bit annoyed if they lose their geostationary
satellites every 248 days.

That got fixed back in, I think, Solaris 2.6, and patched back as far as about 2.3.

--
Tony
I like tweaking errno.h

Chris Barrera wrote:
>
> In comp.sys.sun.admin Casper H.S. Dik <Casper.Dik@sun.com> wrote:
> : SIGSEGV is more fun; I remember a sysadmin using it to punish certain
> : students. (Yes, he is starting a debugger!)
>
> : Casper
> : --
>
> Heh. Or more creatively, send a FPE signal, especially if you *know*
> there should be no floating point in the application. That will really
> confuse them!
>
> (Why is my hello-world program constantly getting floating point error?)
>
> Chris Barrera
> cbarrera@ t i . c o m

--
Paul Watson             #
Oninit Ltd              # Growing old is mandatory
Tel: +44 1436 672201    # Growing up is optional
Fax: +44 1436 678693    #
Mob: +44 7818 003457    #
www.oninit.com          #
Casper H.S. Dik wrote:
> Lon Stowell <lon.stowell@comcast.net> writes:
>
>> The only issue I can recall is that after "N" days of operation
>> the file modification dates would go bogus. Unfortunately
>> can't remember what "N" was ?240 days? or if this was a
>> SunOS 4.x feature or only on a machine based on SunOS 4.x
>
> Sure this wasn't the "248 days, I hang" bug in Solaris 2.x?

The one I mentioned was in the big Mack Truck Auspex servers [RIP], where the mod times would go foo foo. I poked around, and the recommendation was a 240-day reboot, but that may have been to give busy admins a safety margin. Those were SunOS 4.1-based machines for the control processor.

>
> The comments in the bug report were something like "customer may
> lose satellite if this happens again".

There is probably a story behind that worth telling in
alt.folklore.computers.
On Fri, 25 Jul 2003 15:36:48 +0100,
Tony Walton <tony.walton@s-u-n.com>, in
<3F214080.5020909@s-u-n.com> wrote:
+> I R A Darth Aggie wrote:
+> > On 25 Jul 2003 12:49:10 GMT,
+> > Casper H.S. Dik <Casper.Dik@Sun.COM>, in
+> > <3f212746$0$49103$e4fe514c@news.xs4all.nl> wrote:
+> > +>
+> > +> The comments in the bug report were something like "customer may
+> > +> lose satellite if this happens again".
+> >
+> > Oh, great, so NASA already has an excuse if they lose one (or more) of
+> > the recently launched Mars probes??
+>
+> Nah. It was a company who run telecommunications satellites. For some
+> reason they get a bit annoyed if they lose their geostationary
+> satellites every 248 days.

1. Yeah, I imagine they would get a little peeved if a multi-million-dollar platform went missing...

2. How do you lose a geostationary satellite? By definition, you know exactly where it is, all the time... unless the computer in question sends out orbital correction commands to the satellite(s)!!

James
--
Consulting Minister for Consultants, DNRC
I can please only one person per day. Today is not your day. Tomorrow isn't looking good, either.
I am BOFH. Resistance is futile. Your network will be assimilated.
On Fri, 25 Jul 2003 17:36:02 GMT, Lon Stowell <lon.stowell@comcast.net> wrote:
>> The comments in the bug report were something like "customer may
>> lose satellite if this happens again".
>
> There is probably a story behind that worth telling in
> alt.folklore.computers.

Not really. Of course, I've worked for both "that company" and NASA (as well as SDC, aka Star Wars), so I'm not that easily impressed. I do happen to know the guys who opened that bug report, and yes, it does get management upset when you start losing satellites. Of course, it's not nearly as bad as when you find them on the ground......