[OpenIndiana-discuss] OI Crash

Doug Hughes doug at will.to
Sat Jan 19 04:46:04 UTC 2013


On 1/18/2013 7:53 PM, DormitionSkete at hotmail.com wrote:
> On Jan 17, 2013, at 8:47 PM, Reginald Beardsley wrote:
>
>> As far as I'm concerned, problems like this are a bottomless abyss.  Which is why I'm still putting up w/ my OI box hanging.  It's annoying, but not critical.  It's also why critical stuff still runs on Solaris 10.
>>
>> Intermittent failures are the worst time sink there is. There is no assurance that devoting all your time to the problem will fix it even at very high skill levels w/ a full complement of the very best tools.
>>
>> If you're getting crash dumps there is hope of finding the cause, so that's a big improvement.
>>
>> Good luck,
>> Reg
>>
>> BTW Back in the 80's there was a VAX operator in Texas who went out to his truck, got a .357 and shot the computer.  His employer was not happy.  But I can certainly understand how the operator felt.
>
>
>  From 1992 to 1998, I worked at the Denver Museum of Natural History -- now the Denver Museum of Nature and Science.  We had two or three DEC VAXes and an AIX machine there.  It was their policy that once a week we had to power each of the servers all the way down to clear out any memory problems -- or whatever -- as preventive maintenance.
>
> Since then, I've always had the habit of setting up a cron job to reboot my servers once a week.  It's not as good as a full power down, but it's better than nothing.  And in all these years, I've never had to deal with intermittent problems like this, except for a few brief times when I used Red Hat Linux ten plus years ago.  (I've tried most of Red Hat's versions since 6.2, and RHEL 6 is the first version I've found that runs decent enough on our hardware, and that I'm happy enough with, for us to use.)
>
> So, if you can do it, you might want to try setting up a cron job to reboot your server once a week -- or every night.  I reboot our LTSP thin client server every night just because it gets hit with running lots of desktop applications that I think give it a greater potential for these kinds of memory problems.
>
> On the other hand, we have all of our websites hosted on one of our parishioner's servers -- and he doesn't reboot his machines periodically like I do -- and about every two months, I have to call him up and tell him something is wrong.  And he goes and powers down his system -- sometimes he has to even unplug it -- and then turn it back on, and everything works again.
>
> I know there are system admins that just love to brag about how great their up-times are on their machines -- but this might just save you a lot of time and grief.
>
> Of course, if you're running a real high-volume server, this might not be workable for you; but it only takes 2-5 minutes or so to reboot... Perhaps in the middle of the night you might be able to spare it being down that short time?
>
> Just a friendly suggestion.
>
> Shared experience.
>
> I know others may tell you that this is no longer necessary in these more modern times; but my experience has been otherwise.
>
> I hope it helps.
>
> +Peter, hieromonk
>

Haven't we passed the days of mystical sysadmin without understanding 
and characterization? Keeping up tradition for tradition's sake without 
understanding the underlying reasons really doesn't do anybody a favor. 
If there are memory leaks, we possess the technology to find them. My 
organization has thousands of machines that run jobs sometimes for 
months at a time. If I had to reboot servers once a week, my users would 
be at the doors with pitchforks. The only time we take downtime is when 
there are reasons to do so, including OS updates, hardware failures, and 
user software run amok. They can run a very long time like this.

Not that memory leaks never happen. Of course they do, but they 
eventually get found and fixed, or the program causing them passes into 
obsolescence. Always.
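
As a rough illustration of characterizing a suspected leak instead of
rebooting around it, here is a minimal sketch in Python. It assumes you
already log a process's resident set size at intervals (for example from
periodic ps output); the names growth_rate and looks_like_leak and the
100 KB/hour threshold are hypothetical, picked only for the example. It
fits a least-squares line to the samples and reports the trend.

```python
from typing import Sequence, Tuple

# One sample: (hours since first sample, resident set size in KB).
Sample = Tuple[float, float]


def growth_rate(samples: Sequence[Sample]) -> float:
    """Return the least-squares slope of memory use, in KB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_like_leak(samples: Sequence[Sample],
                    threshold_kb_per_hour: float = 100.0) -> bool:
    """Flag steady growth above an (arbitrary, illustrative) threshold."""
    return growth_rate(samples) > threshold_kb_per_hour


if __name__ == "__main__":
    # Made-up numbers purely for demonstration, not real measurements.
    growing = [(0.0, 1000.0), (1.0, 1200.0), (2.0, 1400.0)]
    print(growth_rate(growing), looks_like_leak(growing))
```

A steady positive slope over days is a reason to go looking with real
tools; a flat or noisy trend suggests the problem is somewhere else.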

I encourage discovery rather than superstition, and diagnosis rather 
than repetition.

Be a knight, not a victim!



