Date: Wed, 24 Mar 2004 16:47:28 -0500
From: Don Bowman <don@sandvine.com>
To: 'Kris Kennaway' <kris@obsecurity.org>, Don Bowman <don@sandvine.com>
Cc: "'current@freebsd.org'" <current@freebsd.org>
Subject: RE: LOR on current
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C85337045D86F8@mail.sandvine.com>

From: Kris Kennaway [mailto:kris@obsecurity.org]
> On Wed, Mar 24, 2004 at 03:23:36PM -0500, Don Bowman wrote:
>
> > > Right, I think that's not the cause of your lockup :)
> >
> > Not being one to believe in coincidences... I'm typing
> > on the serial console. The machine halts, I can no longer type.
> > Some seconds pass, out pops that message. This time too it
> > returned. Most times (when I run two postgresql vacuums
> > simultaneously, for example), that's the end of it.
> >
> > I will continue to investigate.
>
> Check for disk problems. I have often experienced hangs or lockups on
> machines with faulty disks.

6-disk RAID 5 behind ASR. All disks report optimal, controller
reports optimal.

I know the hangs you mean, from the vm swapin etc., which holds all
the locks. I don't think this is one of those. With ahd I would get
SCSI sense errors in the log for machines with problems [CRC errors
etc.]; I don't know what asr does in this case.

Ran a 96-hour memory test (memtest86), with ECC checking; there were
no soft or hard errors. Ran the machine to 40 degrees C ambient in an
environmental chamber; it's all good. It's got 3 power supplies, all
are operational, fed from UPS.

This is a software problem somewhere, I think.

I'm curious, how many people use ASR with current? It seems like it
might be somewhat unloved.