Date:      Tue, 28 Feb 2006 09:55:21 -0700
From:      "Kenneth D. Merry" <ken@freebsd.org>
To:        ticso@cicely.de
Cc:        Bernd Walter <ticso@cicely12.cicely.de>, freebsd-scsi@freebsd.org
Subject:   Re: Automatic unit start broken?
Message-ID:  <20060228165521.GA9261@nargothrond.kdm.org>
In-Reply-To: <20060228162647.GZ64548@cicely12.cicely.de>
References:  <20060227201644.GR64548@cicely12.cicely.de> <20060227202254.GA1016@nargothrond.kdm.org> <20060227204326.GS64548@cicely12.cicely.de> <20060228161004.GA9002@nargothrond.kdm.org> <20060228162647.GZ64548@cicely12.cicely.de>

On Tue, Feb 28, 2006 at 17:26:47 +0100, Bernd Walter wrote:
> On Tue, Feb 28, 2006 at 09:10:04AM -0700, Kenneth D. Merry wrote:
> > On Mon, Feb 27, 2006 at 21:43:27 +0100, Bernd Walter wrote:
> > > On Mon, Feb 27, 2006 at 01:22:54PM -0700, Kenneth D. Merry wrote:
> > > > On Mon, Feb 27, 2006 at 21:16:45 +0100, Bernd Walter wrote:
> > > > What error code do your disks return?  You will probably see some console
> > > > output if GEOM has tried to read metadata off the disk and that initial
> > > > read fails.
> > > > 
> > > > If the drive returns 0x04,0x02 ("Logical unit not ready, initializing cmd.
> > > > required"), CAM will attempt to spin the disk up automatically and retry
> > > > the command.
> > > 
> > > During the first tests I waited 90s in loader to let all delayed spin
> > > up drives spin up.
> > > This is with recent RELENG_6 and a drive that doesn't spin up by itself:
> > > [...]
> > > da7 at esp1 bus 0 target 10 lun 0
> > > da7: <SEAGATE ST336706LC 8A03> Fixed Direct Access SCSI-3 device 
> > > da7: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
> > > da7: Attempt to query device size failed: NOT READY, Logical unit not ready, initial
> > 
> > That's rather odd, since it looks like you've got an 0x04,0x02 response,
> > but the device must have rejected the start unit command if we failed to
> > get capacity information.
> 
> At least the drive won't fail a start unit when done via camcontrol.

That's good.

> > > [...]
> > > No GEOM message about this drive until rc sends a start command and
> > > GEOM is retriggered to reread the drive:
> > > Unit started successfully
> > > GEOM_LABEL: Label for provider da7 is ufs/dump1.
> > > The following commands were used in rc:
> > > camcontrol start -n da -u 7
> > > cat /dev/null > /dev/da7
> > > 
> > > Without the loader delay other disks are having problems as well:
> > > da9 at esp1 bus 0 target 14 lun 0
> > > da9: <IBM DDYS-T36950M S80D> Fixed Direct Access SCSI-3 device 
> > > da9: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
> > > da9: Attempt to query device size failed: NOT READY, Logical unit is in process of b
> > > 
> > 
> > That's a different error.  We won't send a start unit in that case.  The
> > error recovery action for 0x04,0x01 is to send a test unit ready every half
> > second for a minute until the device becomes ready.
> > Evidently it didn't become ready after that period of time.
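The two NOT READY sense codes discussed so far lead to different recovery actions: 0x04,0x02 gets a START UNIT followed by a retry, while 0x04,0x01 gets test unit ready polling. The sketch below is a hypothetical illustration of that decision, not CAM's actual code. The enum and `classify_sense()` are invented here, and only fixed-format sense data is handled (descriptor format, response codes 0x72/0x73, is ignored).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical recovery actions; CAM's real error-recovery table uses
 * its own names (e.g. SS_TUR), which are not reproduced here. */
enum recovery_action {
    ACT_FAIL,             /* give up and report the error */
    ACT_START_UNIT_RETRY, /* send START UNIT, then retry the command */
    ACT_POLL_TUR          /* send TEST UNIT READY until the unit is ready */
};

/* Classify fixed-format sense data (response code 0x70 or 0x71).
 * Byte 2, low nibble: sense key; byte 12: ASC; byte 13: ASCQ. */
static enum recovery_action classify_sense(const uint8_t *sense, size_t len)
{
    assert(sense != NULL);
    if (len < 14)
        return ACT_FAIL;                 /* too short to hold ASC/ASCQ */

    uint8_t resp = sense[0] & 0x7f;      /* bit 7 is the VALID bit */
    if (resp != 0x70 && resp != 0x71)
        return ACT_FAIL;                 /* descriptor format not handled */

    uint8_t key = sense[2] & 0x0f;
    uint8_t asc = sense[12];
    uint8_t ascq = sense[13];

    if (key != 0x02)                     /* 0x02 = NOT READY */
        return ACT_FAIL;
    if (asc == 0x04 && ascq == 0x02)     /* initializing cmd. required */
        return ACT_START_UNIT_RETRY;
    if (asc == 0x04 && ascq == 0x01)     /* in process of becoming ready */
        return ACT_POLL_TUR;
    return ACT_FAIL;
}
```

Only the first case ever results in a START UNIT, which is why a drive reporting 0x04,0x01 is polled rather than spun up.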
> 
> That may work, but a minute is hardly enough for a drive
> with ID 14: assuming 6s per ID, the given drive requires
> 84s after power-up.
> But I doubt that the kernel waits at all; I would have noticed
> a whole-minute delay.
> Where is the minute defined?
> If it is not solved by raising the wait to 120s it likely won't work.
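Bernd's estimate assumes the enclosure staggers spin-up by a fixed delay per SCSI target ID. A minimal sketch of that arithmetic follows; `spinup_start_s()` is a hypothetical helper, and the 6 s per ID figure is his assumption rather than a documented value.

```c
#include <assert.h>

/* Assumption taken from the message above: the enclosure delays each
 * drive's spin-up by a fixed number of seconds per SCSI target ID, so
 * target N only begins spinning N * per_id_delay_s after power-up.
 * The drive's own spin-up time comes on top of this. */
static unsigned spinup_start_s(unsigned target_id, unsigned per_id_delay_s)
{
    assert(target_id <= 15);   /* wide SCSI: target IDs 0..15 */
    return target_id * per_id_delay_s;
}
```

With 6 s per ID, target 14 (da9 in the log above) does not begin spinning until 84 s after power-up. How much of that a one-minute polling window covers depends on when the kernel probes the bus relative to power-up.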

Look in cam_periph.c, in camperiphscsisenseerror(), in the SS_TUR/SSQ_MANY
case.  Increase the retry count to 240 and you'll get up to 240 test unit
ready commands sent at half-second intervals, i.e. about two minutes of polling.
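The polling budget is roughly the retry count times the half-second interval, so one minute corresponds to about 120 test unit ready commands and 240 commands to about two minutes. The simulation below is a hypothetical sketch of such a loop, not the code in cam_periph.c. `poll_until_ready()` and the simulated device are invented here, and the device is assumed to become ready a fixed time after polling starts.

```c
#include <assert.h>

/* Simulated device: reports ready once ready_at_ms has elapsed
 * since polling began (an assumption for illustration). */
static int tur_ok(unsigned now_ms, unsigned ready_at_ms)
{
    return now_ms >= ready_at_ms;
}

/* Send TEST UNIT READY up to `retries` times, interval_ms apart.
 * Returns the number of commands sent when the unit became ready,
 * or -1 if the retry budget ran out first. */
static int poll_until_ready(unsigned retries, unsigned interval_ms,
                            unsigned ready_at_ms)
{
    assert(interval_ms > 0);
    unsigned now_ms = 0;
    for (unsigned i = 1; i <= retries; i++) {
        if (tur_ok(now_ms, ready_at_ms))
            return (int)i;
        now_ms += interval_ms;   /* wait before the next command */
    }
    return -1;
}
```

For a drive that needs 84 s, `poll_until_ready(120, 500, 84000)` gives up (the last command goes out at 59.5 s), while `poll_until_ready(240, 500, 84000)` succeeds on the 169th command.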

But I think you may be right that the kernel isn't waiting.  See
below.  I suspect the driver is broken.

> > > On Shell:
> > > [30]cicely19# dd if=/dev/da7 bs=1k count=1 of=/dev/null
> > > 1+0 records in
> > > 1+0 records out
> > > 1024 bytes transferred in 0.008765 secs (116829 bytes/sec)
> > > [31]cicely19# camcontrol stop -n da -u 7
> > > Unit stopped successfully
> > > [32]cicely19# dd if=/dev/da7 bs=1k count=1 of=/dev/null
> > > dd: /dev/da7: Input/output error
> > > 0+0 records in
> > > 0+0 records out
> > > 0 bytes transferred in 0.004810 secs (0 bytes/sec)
> > > Exit 1
> > 
> > What errors do you see on the console at that point?  In order for CAM to
> > automatically spin up the disk, it needs to send back 0x04,0x02 when it is
> > spun down, and it needs to actually spin up the disk in response to a start
> > unit.
> 
> I don't see anything on console.

That's strange.

> > What happens when you:
> > 
> > camcontrol stop da7
> > camcontrol tur da7 -v
> > camcontrol start da7 -v
> 
> [52]raven# camcontrol stop da7
> Unit stopped successfully
> [53]raven# camcontrol tur da7 -v
> Unit is not ready
> (pass8:esp1:0:10:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 
> (pass8:esp1:0:10:0): CAM Status: CCB request is in progress
> Exit 1

Okay, that's wrong.  The CCB status is never set properly, even though the
command was completed.

It looks like the driver may be broken.  It should set the CAM status to
CAM_SCSI_STATUS_ERROR in this case, but there is no place in the driver
(that I can see) where it ever sets that status.
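The diagnosis is that the HBA driver (esp, per the probe messages) finishes the command without ever updating the CCB's status, so camcontrol still sees "CCB request is in progress". The userland mock below sketches what a completion path should do. `struct mock_ccb`, the `MOCK_*` constants and `complete_scsi_cmd()` are hypothetical stand-ins. In a real driver the corresponding fields are `ccb->ccb_h.status` and `ccb->csio.scsi_status`, and the CCB is handed back with `xpt_done()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the CAM status names mentioned in this thread; the
 * real definitions live in the kernel's CAM headers. */
enum mock_cam_status {
    MOCK_CAM_REQ_INPROG,        /* role of CAM_REQ_INPROG */
    MOCK_CAM_REQ_CMP,           /* role of CAM_REQ_CMP */
    MOCK_CAM_SCSI_STATUS_ERROR  /* role of CAM_SCSI_STATUS_ERROR */
};

#define MOCK_SCSI_STATUS_OK         0x00  /* SCSI GOOD */
#define MOCK_SCSI_STATUS_CHECK_COND 0x02  /* SCSI CHECK CONDITION */

struct mock_ccb {
    enum mock_cam_status status;  /* zero-initialized: "in progress" */
    uint8_t scsi_status;
    uint8_t sense[18];
    int sense_valid;              /* real CAM uses CAM_AUTOSNS_VALID */
};

/* Hypothetical completion handler: always set a final status, and
 * on CHECK CONDITION pass the sense data up so CAM can run its
 * error recovery (e.g. START UNIT for 0x04,0x02). */
static void complete_scsi_cmd(struct mock_ccb *ccb, uint8_t scsi_status,
                              const uint8_t *sense, size_t sense_len)
{
    assert(ccb != NULL);
    ccb->scsi_status = scsi_status;
    if (scsi_status == MOCK_SCSI_STATUS_OK) {
        ccb->status = MOCK_CAM_REQ_CMP;
        return;
    }
    ccb->status = MOCK_CAM_SCSI_STATUS_ERROR;
    if (scsi_status == MOCK_SCSI_STATUS_CHECK_COND && sense != NULL) {
        size_t n = sense_len < sizeof ccb->sense ? sense_len
                                                 : sizeof ccb->sense;
        memcpy(ccb->sense, sense, n);
        ccb->sense_valid = 1;
    }
}
```

Skipping the status assignment entirely leaves `status` at its initial in-progress value, which matches the `camcontrol tur -v` output quoted above.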

> [54]raven# camcontrol start da7 -v
> Unit started successfully

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG


