Date:      Mon, 21 Mar 2005 22:14:07 -0800
From:      "Ted Mittelstaedt" <tedm@toybox.placo.com>
To:        "RacerX" <racerx@makeworld.com>, <FreeBSD-Questions@freebsd.org>
Subject:   RE: Anthony's drive issues.Re: ssh password delay
Message-ID:  <LOBBIFDAGNMAMLGJJCKNIEOAFAAA.tedm@toybox.placo.com>
In-Reply-To: <20050321095647.R83831@makeworld.com>

owner-freebsd-questions@freebsd.org wrote:
> Anthony -
>
>  	I'm curious - with the issues you are having with the drives (SCSI
> I think you mentioned) have you considered these ideas?
>
> 1. Upgrade the system BIOS
> 2. Upgrade the firmware in the SCSI controller
> 3. Upgrade the firmware in the array (if applicable)
>
> There may be a bug-a-boo in one of those. If you have not - consider
> doing so and see if this "may" correct your issues.
>

Racer,

  Anthony has an on-motherboard Adaptec chip in an 8 year old Vectra.
It does not use firmware.  It might be possible that he has a flashable
motherboard BIOS but that BIOS isn't going to have microcode for the
Adaptec controller in it.  (And in any case if he's never flashed his
BIOS I would -strongly- recommend he not do it now, since his EEPROM
has probably had the existing BIOS code burned into it for so long
without an update that a reflash is risky.)  He is stuck with the ROM
that is burned into the Adaptec controller by the manufacturer.  And I
wouldn't put it past
HP to have tampered with the Adaptec microcode anyway.  Compaq definitely
did with Adaptec controllers they put into their machines that were
made during the same era.

  I also checked his disk drives and neither of them has upgradable
firmware.

  He does not have an array controller.

  As I've told him in the past, he has 2 disks on his SCSI chain: one
is a Seagate that syncs up at 10Mbt to the controller, the other is a
newer Quantum that syncs up at 20Mbt.

  I have told him to go into his Vectra BIOS and limit the sync
negotiation on both disk drives to the same speed - 10Mbt.  He refuses
to try doing this.  I've also told him to remove the Quantum and try
running a FreeBSD system off the Seagate, to see if it errors with just
the single Seagate drive on it.  He refuses to do that either.

  Others have told him to check termination.  It is possible one or both
drives are pinned for termination, and since his chassis provides
termination that would be an error.  It is also possible that one or
both drives are not pinned to supply terminator power to the bus, which
would be a problem as well.  He has dismissed all of these without
checking, claiming his termination is fine.

  The basic problem is that Anthony has an error that is non-damaging
to his data - every once in a while the machine spews a bunch of SCSI
errors, resets the bus and everything on it, things slow down for a
moment, then life continues.  He has, by his own admission, not lost
data - yet.

  So the summary of it is that IMHO he LIKES things the way they are -
it's been happening long enough that he's not afraid of losing data
anymore, yet it gives him an error he can wave around every time he
wants to knock FreeBSD's drivers.  He isn't really interested in finding
the root of the problem or isolating it to either a controller, a disk,
or a software driver issue.  Instead he thinks that the SCSI driver
author can just wave a wand, look at a non-debug output of the error
messages, and magically know exactly what workaround to stick in the
driver to make the error messages go away.

  It is rather amusing or pathetic now, depending on your POV.

  For all we know the SCSI device driver under Windows NT ran into the
exact same error - and simply did the bus reset silently, without
informing the user.  That would be completely in character with how
Microsoft approaches things (i.e. if it doesn't kill the system, the
user doesn't need to know about it).

  As I have told him before, the only way to find the error is to install
a SCSI analyzer onto the SCSI bus, and only Adaptec and the disk drive
manufacturers have such a tool.  If someone did, they would almost
certainly find out it is some kind of low-level timing or SCSI command
set implementation issue that would need a correction in either the
Adaptec controller microcode or one of the disk drives' microcode - and
you could identify which disk it was a lot more simply and quickly by
just doing the troubleshooting suggestions that have already been given
to him.  Besides which, a half hour of time on such a tool would probably
cost more than the price of a brand new server.

Ted


