Date:      Tue, 29 Jun 1999 20:58:59 -0500 (CDT)
From:      Joe Greco <jgreco@ns.sol.net>
To:        ken@plutotech.com (Kenneth D. Merry)
Cc:        scsi@freebsd.org
Subject:   Re: FreeBSD panics with Mylex DAC960SX
Message-ID:  <199906300159.UAA13916@aurora.sol.net>
In-Reply-To: <199906292300.RAA29666@panzer.kdm.org> from "Kenneth D. Merry" at "Jun 29, 1999  5:00:50 pm"

> > but during all of these crash-boots, the third line is
> > 
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device 
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: A
> 
> That should probably read "Attempt to query device size failed ...."
> 
> You may be losing characters over the serial console or something.

No.  When booting with a VGA console, it shows a graphic character or two.

It does not interleave characters from the "changing root device..."
though.  :-)

> > If I can provide further information to assist in tracking down this bug,
> > please let me know.
> 
> My first guess is that it's happening during the open() routine, for some
> reason.  That's why fsck seems to cause the problem.
> 
> You're probably right about the device returning a size of zero.  It isn't
> immediately clear to me why the open routine would cause a panic, *unless*
> the Mylex unit returns good status for the read capacity command, but
> returns a capacity of 0.
> 
> It would be helpful to get a stack trace from the machine, if you can.
> Enabling DDB at least will give us a DDB stack trace.

Okay.  Alas, I must go physically bop the power on the machine to cause
the Mylex to reset; once it is up and running it is _very_ happy.  So I
may not get to this for the next day or so.

> > Also, I was wondering more generally about what the proper way to deal with
> > a device such as this is.  Assuming FreeBSD didn't actually crash when
> > trying to access the device, it is still possible to attempt booting when
> > the DAC controller is not ready, which will result - presumably - in fsck
> > exiting and complaining about that filesystem.  What is the "correct" way
> > to wait for something like this to become ready?  Is there a "correct" way,
> > even?
> 
> Well, it really depends on how the device behaves.  Here's what happens
> after the initial probe phase:
> 
> - the da driver sends a read capacity to the disk, with a retry count of 4
>   and a timeout of 5 seconds.
> 
> 	1.  The read capacity succeeds, and the probe continues normally.
> 	2.  The read capacity fails, and one of a few things happens:
> 
> 		1.  If the error has an associated error recovery action,
> 		    we may send a start unit to the disk, or one TUR every
> 		    half second for a minute.  Then we retry the original
> 		    command.
> 		2.  If the error has no associated error recovery action,
> 		    we just retry it until the retry count is exhausted.
> 
> My guess is that the error returned by the Mylex unit may not be an
> error with an associated recovery action.  So we just retry it four times
> and then report the "Attempt to query device size failed ..." where ... is
> the error.
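The retry behavior described above can be modelled roughly as follows. This is a hypothetical Python sketch, not the da driver's code: the function names, the error tags, and the reading of "retry count of 4" as one initial attempt plus four retries are assumptions, and the 5-second per-command timeout is not modelled. Only the counts and intervals (4 retries, one TUR every half second for a minute, so at most 120 polls) come from the description.

```python
from typing import Callable, Optional

# Hypothetical model of the READ CAPACITY retry policy described in this
# thread; names and error tags are illustrative, not the real CAM/da code.
RETRIES = 4          # assumed: retries after the first attempt
TUR_INTERVAL = 0.5   # seconds between TEST UNIT READY commands
TUR_WINDOW = 60.0    # poll with TUR for at most one minute
MAX_TURS = int(TUR_WINDOW / TUR_INTERVAL)  # 120 polls


def probe_capacity(read_capacity: Callable[[], Optional[str]],
                   test_unit_ready: Callable[[], bool],
                   start_unit: Callable[[], None],
                   sleep: Callable[[float], None]) -> Optional[str]:
    """Return None on success, else the last error once retries run out.

    read_capacity() returns None on success or an error tag:
      "stopped"     -> recovery action: send START UNIT, then retry
      "not_ready"   -> recovery action: poll with TUR, then retry
      anything else -> no recovery action: plain retry
    """
    error: Optional[str] = None
    for attempt in range(1 + RETRIES):
        error = read_capacity()
        if error is None:
            return None
        if attempt == RETRIES:
            break  # no retries left, so skip the recovery action
        if error == "stopped":
            start_unit()
        elif error == "not_ready":
            for _ in range(MAX_TURS):
                if test_unit_ready():
                    break
                sleep(TUR_INTERVAL)
    # Retries exhausted: the driver would print
    # "Attempt to query device size failed ..." with this error.
    return error
```

Under this model, an error with no recovery action is retried back to back and the probe gives up almost immediately, while a "not ready" condition can buy up to a minute of polling per attempt. That is consistent with the guess above, and it is why the exact sense data from a boot with -v matters.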
> 
> Unfortunately, you're not getting the error printout, probably because of
> serial console weirdness.  Could you try booting with -v?  That will cause
> the full sense information for the error to get printed out, and maybe
> we'll have a better chance of figuring out what the error is.
> 
> Also, once you boot up in single user mode, you might try the following
> camcontrol command:
> 
> camcontrol cmd -n da -u 1 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
> 
> That will issue a read capacity command to da1, and print out the address
> of the last block on the disk (one less than the total number of blocks)
> and the block size.  The -v will tell camcontrol to print out sense
> information.
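For reference, the camcontrol command above sends a 10-byte READ CAPACITY(10) CDB (opcode 0x25) and decodes the 8-byte reply as two 4-byte integers ("i4 i4"): the address of the last logical block and the block length in bytes, both big-endian. A minimal Python sketch of the same decoding follows. The function names are hypothetical, and the zero-block-length check is an illustrative guard for the "good status but capacity of 0" case, not something taken from the da driver.

```python
import struct
from typing import Tuple

READ_CAPACITY_10 = 0x25  # SCSI READ CAPACITY(10) opcode


def read_capacity_cdb() -> bytes:
    """10-byte CDB matching "25 0 0 0 0 0 0 0 0 0" in the camcontrol command."""
    return bytes([READ_CAPACITY_10] + [0] * 9)


def parse_read_capacity(data: bytes) -> Tuple[int, int, int]:
    """Decode the 8-byte reply into (last_lba, block_len, total_bytes)."""
    if len(data) != 8:
        raise ValueError("READ CAPACITY(10) data must be exactly 8 bytes")
    # Two big-endian unsigned 32-bit fields, like camcontrol's "i4 i4".
    last_lba, block_len = struct.unpack(">II", data)
    # Hypothetical sanity check: a zero block length means the unit reported
    # good status but no usable geometry, so refuse it instead of using it.
    if block_len == 0:
        raise ValueError("device reported a block length of 0")
    # last_lba addresses the final block, so the block count is last_lba + 1.
    return last_lba, block_len, (last_lba + 1) * block_len
```

For example, a reply of `struct.pack(">II", 1048575, 512)` decodes to 1048576 blocks of 512 bytes, or 536870912 bytes in total.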

I will be delighted to.  :-)  Unfortunately, I will probably have to putz
with it a bit, because the Mylex generally becomes ready within a minute
of me making it to single user mode.  Sigh.

I'll also see if it is any different if I break the array, which also
causes a panic (but might result in different specifics).

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator			      jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI			   414/342-4847


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message