Date:      Mon, 28 Aug 1995 14:08:23 -0700 (PDT)
From:      "Rodney W. Grimes" <rgrimes@gndrsh.aac.dev.com>
To:        rashid@haven.ios.com (Rashid Karimov.)
Cc:        gibbs@freefall.FreeBSD.org, hackers@freebsd.org
Subject:   Re: S.O.S -2.1Stable and ASUSP54TP4
Message-ID:  <199508282108.OAA02461@gndrsh.aac.dev.com>
In-Reply-To: <199508281910.PAA11521@haven.ios.com> from "Rashid Karimov." at Aug 28, 95 03:10:19 pm

> 
> > > >	After having a lot of problems with different
> > > >	motherboards under FreeBSD I've switched to the
> > > >	"editor's choice" - ASUS P54TP4 mb., which was
> > > >	recommended here by Jordan :)
> > > >	A few problems went away (like random reboots and
> > > >	stuff), but one remains consistent:
> > > >
> > > >	the freaking system locks up each and every day, and
> > > >	I think this is because of .... something related
> > > >	to HD activity/driver/adapter/whatever.
> > > >
> > > >	The system is a P90, Adaptec 2940 PCI SCSI adapter,
> > > >	SMC EtherPower.
> > > >
> > > >
> > > >	The symptoms:
> > > >
> > > >	The system locks at random times without any messages on the console
> > > >	or in the log files. Locks means the system becomes unreachable both
> > > >	from the local net and from the console.
> > > >	After I hit the "reboot" switch, the system reboots up to the fsck
> > > >	stage and starts complaining that it can't read partition
> > > >	information off the second HDD (Seagate Barracuda 4 GB) (!).
> > > >
> > > >	If one hits "reboot" again, goes into the Adaptec BIOS and runs
> > > >	the disk utilities --> media check from there, the BIOS (!) complains
> > > >	that it cannot talk to the second HD.
> > > 
> > > It sounds like your Barracuda is overheating.
> > 
> > I agree with that assessment of the facts given here, but would like a
> > few ``details'' filled in.
> > 
> > a) When the lock up occurs are any drive select lights on solid?

This bit of information could be valuable.

> > b) Do you have an LED hooked to the controller and what state is it when
> >    the lock occurs?
> 
> 	The LED on the adapter ( ADAPTEC ) is OFF. No activity at all - 
> 	at least for the time I'd been watching it.

Okay, good.  (Justin, is the LED on the Adaptec driven by the firmware,
or is it simply tied to the scsi BUSY signal?  Or do you even know?)

> > c) Have you had any processes that dump core on occasion, or system panics
> >    of any form?  [Looking for memory related problems, which usually manifest
> >    themselves as random signal 11s if they occur in user land, and
> >    kernel panics if they occur in the kernel]
> 
> 	Not with this particular system. I do see 5-10 messages
> 	every 8-10 hours about processes dying with SIG 3 (weird).

Not really weird: someone is using ^\ to abort something.  See if the uid
is always the same, so you can track down who is doing it and ask them why
they like to create core files :-).
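
A minimal sketch of that uid check, in Python. It assumes the kernel logs each signal exit to syslog as a line like `pid 123 (name), uid 1001: exited on signal 3 (core dumped)`; that format is an assumption, so check it against what your own /var/log/messages actually shows and adjust the pattern. The function name `quit_exits_by_uid` is hypothetical.

```python
import re
import signal
from collections import Counter

# ASSUMED log format; verify against your own /var/log/messages.
SIGEXIT = re.compile(r"pid (\d+) \((.*?)\), uid (\d+): exited on signal (\d+)")


def quit_exits_by_uid(lines):
    """Count processes that exited on SIGQUIT (signal 3), grouped by uid."""
    counts = Counter()
    for line in lines:
        m = SIGEXIT.search(line)
        # Group 3 is the uid, group 4 the signal number.
        if m and int(m.group(4)) == signal.SIGQUIT:
            counts[int(m.group(3))] += 1
    return counts
```

Usage, e.g. `with open("/var/log/messages") as f: print(quit_exits_by_uid(f).most_common())`. If one uid dominates the tally, that is the user to talk to.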

> 	They are random, though, and they don't happen right before the
> 	system locks up.

These are not caused by machine problems (or at least I have never seen
a hardware failure cause a SIGQUIT in my life).
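
To make the distinction concrete: signal 3 is SIGQUIT, which the tty driver delivers when someone types ^\ (or which anything can send with kill), whereas memory trouble in user land tends to show up as signal 11, SIGSEGV. A small POSIX-only Python sketch (the helper name `exit_signal_of_child` is hypothetical) forks a child, has it send itself a signal with the default handler in place and core dumps disabled, and reports which signal the parent sees it die from:

```python
import os
import resource
import signal


def exit_signal_of_child(sig):
    """Fork a child that sends itself `sig` under the default disposition.

    Returns the signal number the parent sees the child die from,
    or None if the child exited normally instead.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # Child: no core file, default action, signal not blocked.
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
            signal.signal(sig, signal.SIG_DFL)
            signal.pthread_sigmask(signal.SIG_UNBLOCK, [sig])
            os.kill(os.getpid(), sig)
        finally:
            # Reached only if the signal did not terminate the child.
            os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


print(int(signal.SIGQUIT), exit_signal_of_child(signal.SIGQUIT))
```

On FreeBSD and Linux the last line prints `3 3`: the child dies of SIGQUIT only because something explicitly sent it one, which is why a log full of signal 3 exits points at a person rather than at the hardware.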

> > 
> > d) And my most famous question, which I always ask: have you _triple_ checked
> >    that your scsi bus is properly terminated and built using high quality
> >    double shielded scsi-II rated cables?  [Applies to external cables;
> >    internal cables should be 110 Ohm 26AWG flat ribbon cable.]  Also make
> >    sure you are using ACTIVE termination; at fast scsi-II speeds anything
> >    less is dangerous.
> 
> 	Well, the last HD is terminated, the first one is not.

What about all the ones in between?
And what about the controller???

> 	The terminated one gets the power from the HD. Should 
> 	I change it to "from the SCSI cable" ?

No, getting it from the HD is the right place, especially on long
chains.
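
One reason active termination matters can be sketched with a little arithmetic. This assumes the commonly cited scsi-II terminator values: a passive terminator puts 220 Ohm to TERMPWR and 330 Ohm to ground on each line, while an active one drives each line through 110 Ohm from a regulated 2.85 V source. The TERMPWR values of 5.0 V and a sagging 4.25 V are illustrative only, and the helper names `thevenin` and `reflection` are hypothetical. The sketch computes each terminator's Thevenin equivalent and the reflection coefficient it presents to a 110 Ohm cable:

```python
def thevenin(r_up, r_down, v_supply):
    """Thevenin equivalent of a pull-up/pull-down divider: (ohms, volts)."""
    r = r_up * r_down / (r_up + r_down)
    v = v_supply * r_down / (r_up + r_down)
    return r, v


def reflection(z_load, z_cable=110.0):
    """Fraction of an incident edge reflected back: (ZL - Z0) / (ZL + Z0)."""
    return (z_load - z_cable) / (z_load + z_cable)


# Passive (assumed 220/330 ohm): impedance is fixed, but bias tracks TERMPWR.
for termpwr in (5.0, 4.25):
    r, v = thevenin(220.0, 330.0, termpwr)
    print(f"passive TERMPWR={termpwr:.2f} V: {r:.0f} ohm, {v:.2f} V bias, "
          f"reflection {reflection(r):+.3f}")

# Active (assumed 110 ohm to a regulated 2.85 V): matched, fixed bias.
print(f"active: 110 ohm, 2.85 V bias, reflection {reflection(110.0):+.3f}")
```

With these assumed values the passive terminator looks like 132 Ohm (a reflection of about +0.091 on a 110 Ohm cable) and its bias sags from 3.00 V to 2.55 V along with TERMPWR, while the active one is matched to the cable and holds its bias regardless of TERMPWR.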

> 	The HDs are internal ones, so I use flat ribbon cable.
> 	They don't have a brand on them, though... I think they came
> 	with the adapters.

Do the connectors go in firmly when connecting a drive, or do they slip
on fairly loosely?  (I have seen some really cheap, low tension, tin
50 pin IDC connectors that are good for about 4 cycles of use; after
that the contact resistance goes through the roof and scsi problems come
out of the woodwork.)

> > 
> > e) Back to the Barracuda and heat problems: what is the case temperature
> >    of the drives while operating (give them 4 hours to stabilize under
> >    your worst-case load before taking a measurement)?  Also make sure not
> >    to perturb the normal conditions during those 4 hours.
> 
> 	It's quite possible that this is a temperature problem,
> 	since I changed the motherboards on Friday and left the
> 	PC in a poorly air-conditioned room.

:-(.  And you probably don't have any direct air flow onto this heat
monster named after a fish :-).
 
> 	Another thing has happened since my last message:
> 
> 	The system locked up in a weird way: the console
> 	driver was working and I could switch virtual consoles;
> 	when I telnetted in from the net I saw the "Connected" message,
> 	but otherwise the system was dead. I don't know if it is
> 	related to the same heat problem...

That sounds all too much like a scsi subsystem hang.  Did you wait
for a while, or did you go for the reset pretty quickly?  If you had
waited, you probably would have gotten a vnode pager error and a panic
as soon as the kernel needed to go to the disk.  The timeouts are fairly
long, so it can take a while before the resulting panic shows up.

Sometimes it just deadlocks, though, depending on what was going
on when the scsi bus went to lunch.
-- 
Rod Grimes                                      rgrimes@gndrsh.aac.dev.com
Accurate Automation Company                 Reliable computers for FreeBSD
