Date:      Tue, 29 Feb 2000 22:50:36 -0700
From:      "Kenneth D. Merry" <ken@kdm.org>
To:        Mike Smith <msmith@FreeBSD.ORG>
Cc:        scsi@FreeBSD.ORG
Subject:   Re: chio trap with not-ready changer
Message-ID:  <20000229225036.B27747@panzer.kdm.org>
In-Reply-To: <200003010537.VAA03162@mass.cdrom.com>; from msmith@FreeBSD.ORG on Tue, Feb 29, 2000 at 09:37:29PM -0800
References:  <20000229221227.A27407@panzer.kdm.org> <200003010537.VAA03162@mass.cdrom.com>


--r5Pyd7+fXNt84Ff3
Content-Type: text/plain; charset=us-ascii

On Tue, Feb 29, 2000 at 21:37:29 -0800, Mike Smith wrote:
> > > Is there any interest in debugging this at the moment, or should I hang 
> > > back until someone familiar with the 'ch' driver has some time to tinker?
> > 
> > Well, I don't have infinite time to debug it, but I can certainly help you look
> > into it.
> 
> Ok, thanks.  Matt Jacob sounds like he's also interested, so I'm going to 
> answer his request here as well.  Here are some relevant snips from a 
> verbose boot:
> 
> Waiting 3 seconds for SCSI devices to settle
> (noperiph:ahc0:0:-1:-1): SCSI bus reset delivered. 0 SCBs aborted.
> ahc0: target 5 synchronous at 5.0MHz, offset = 0xb
> ahc0: target 6 synchronous at 5.0MHz, offset = 0xb
> sa0 at ahc0 bus 0 target 5 lun 0
> sa0: <EXABYTE EXB8500C8CQANXR4 0620> Removable Sequential Access SCSI-2 device
> sa0: Serial Number 02110501
> sa0: 5.000MB/s transfers (5.000MHz, offset 11)
> sa1 at ahc0 bus 0 target 6 lun 0
> sa1: <EXABYTE EXB8500C8CQANXR4 0620> Removable Sequential Access SCSI-2 device
> sa1: Serial Number 02088201
> sa1: 5.000MB/s transfers (5.000MHz, offset 11)
> pass0 at ahc0 bus 0 target 4 lun 0
> pass0: <SPECTRA STL-8000 1.94> Removable Changer SCSI-2 device
> pass0: Serial Number 3
> pass0: 3.300MB/s transfers
> ...
> (ch0:ahc0:0:4:0): MODE SENSE(06). CDB: 1a 0 1d 0 20 0
> (ch0:ahc0:0:4:0): UNIT ATTENTION asc:29,0
> (ch0:ahc0:0:4:0): Power on, reset, or bus device reset occurred
> (ch0:ahc0:0:4:0): fatal error, failed to attach to device
> (ch0:ahc0:0:4:0): lost device
> (ch0:ahc0:0:4:0): removing device entry
> ...
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x64
> fault code              = supervisor write, page not present
> instruction pointer     = 0x8:0xc011f124
> stack pointer           = 0x10:0xc02635dc
> frame pointer           = 0x10:0xc02635ec
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = cam
> kernel: type 12 trap, code=0
> Stopped at      xpt_release_ccb+0x20:   decl    0x20(%eax)
> db> tr
> xpt_release_ccb(c0a22800,c0a22800,c09d1400,40080400,c0738700) at xpt_release_ccb+0x20
> chdone(c0a2dc80,c0a22800,c0738a40,40000400,ffffffff) at chdone+0x26e
> camisr(c0280e10,c02636ac,c021d4a0,40000400,c022d3d2) at camisr+0x1eb
> swi_cambio(40000400,c022d3d2,c022cefb,40000400,c09be600) at swi_cambio+0xd
> splz_swi(c0738a80,0,c00f0010,1cd0010,c7d00010) at splz_swi+0x14
> Xresume10() at Xresume10+0x2b
> --- interrupt, eip = 0xc0226552, esp = 0xc02636f4, ebp = 0 ---
> default_halt() at default_halt+0x2

Okay, I think I at least know why the panic is happening.  I don't
understand why you're getting the unit attention error, though, since we
generally retry unit attention errors unconditionally.

Try the attached patch and see if it fixes things up.  Once you've got the
panic fixed, we can attempt to solve the unit attention problem.

The comment in the patch should explain what is wrong.  This looks like an
oversight -- I didn't finish fixing this driver when I went around changing
the way we handle probe failures in drivers that need an extra command to
complete during probe (e.g. da, cd, ch).
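
To make the ordering problem concrete, here is a toy model of it.  This is
only a sketch and not the real CAM code: the toy_* names and fields are made
up for illustration, and the "freed" flag stands in for the peripheral
actually being torn down.  The old path drops the probe lock and then
releases the CCB; the patched path releases the CCB first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a peripheral and a CCB; NOT the CAM structs. */
struct toy_periph {
	int  locks;     /* outstanding locks on the peripheral */
	int  ccbs_out;  /* CCBs currently allocated against it */
	bool invalid;   /* set once the device has been "lost" */
	bool freed;     /* set instead of calling free(), so it can be inspected */
};

struct toy_ccb {
	struct toy_periph *periph;
};

/* Drop one lock; an invalidated peripheral goes away with its last lock. */
void
toy_periph_unlock(struct toy_periph *p)
{
	assert(!p->freed);
	if (--p->locks == 0 && p->invalid)
		p->freed = true;
}

/*
 * Releasing a CCB updates bookkeeping in the owning peripheral.
 * Returns false if the peripheral is already gone; in a real kernel
 * that access would be a use-after-free rather than a clean failure.
 */
bool
toy_release_ccb(struct toy_ccb *ccb)
{
	if (ccb->periph->freed)
		return (false);
	ccb->periph->ccbs_out--;
	return (true);
}

/* Old order: unlock first, then release the CCB. */
bool
toy_done_old(struct toy_ccb *ccb)
{
	toy_periph_unlock(ccb->periph);
	return (toy_release_ccb(ccb));
}

/* Patched order: release the CCB while the lock still pins the peripheral. */
bool
toy_done_new(struct toy_ccb *ccb)
{
	bool ok = toy_release_ccb(ccb);

	toy_periph_unlock(ccb->periph);
	return (ok);
}
```

With an invalidated toy peripheral holding one lock, toy_done_old() finds the
peripheral already gone when it tries to release the CCB, while
toy_done_new() releases the CCB cleanly and only then lets the peripheral go.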

> > My guess is that the problem is that the mode sense that gets fired off on
> > probe is failing since the drive isn't initialized.
> > 
> > So the question is, how does it fail?  Does the command time out (you might
> > get a "timed out ..." message from the Adaptec driver), or does it just
> > return an error message and fail to attach?
> 
> Looks like chdone() is releasing a CCB that someone else has already 
> thrown away.  I'm not familiar enough with who-does-what in the CAM stack 
> to be any more certain about this - the error handling there is a bit 
> convoluted. 8)

Yeah, it is somewhat tricky.  Justin has a rewrite in the works, though.

> > IIRC, I've heard bad things about SpectraLogic changers, so I'm not sure
> > this is a surprise.
> 
> I've certainly heard mixed reports; I've been reasonably happy with this 
> unit so far in terms of it actually working once I worked out how to get 
> it going.  It loads and unloads fine, the mechanicals seem to be quite 
> robust, and basically "it just works" with the one exception above.  I'm 
> sure that I'll have more to say about it later, but for $450.oo I'm not 
> going to complain _too_ loudly. 8)

Well, hopefully we can get it at least functional. :)

Ken
-- 
Kenneth Merry
ken@kdm.org

--r5Pyd7+fXNt84Ff3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="scsi_ch.c.patch.20000229"

==== //depot/FreeBSD-ken/src/sys/cam/scsi/scsi_ch.c#3 - /a/ken/perforce/FreeBSD-ken/src/sys/cam/scsi/scsi_ch.c ====
*** /tmp/tmp.27814.0	Tue Feb 29 22:44:32 2000
--- /a/ken/perforce/FreeBSD-ken/src/sys/cam/scsi/scsi_ch.c	Tue Feb 29 22:44:15 2000
***************
*** 688,695 ****
  			xpt_announce_periph(periph, announce_buf);
  		softc->state = CH_STATE_NORMAL;
  		free(mode_header, M_TEMP);
  		cam_periph_unlock(periph);
! 		break;
  	}
  	case CH_CCB_WAITING:
  	{
--- 688,704 ----
  			xpt_announce_periph(periph, announce_buf);
  		softc->state = CH_STATE_NORMAL;
  		free(mode_header, M_TEMP);
+ 		/*
+ 		 * Since our peripheral may be invalidated by an error
+ 		 * above or an external event, we must release our CCB
+ 		 * before releasing the probe lock on the peripheral.
+ 		 * The peripheral will only go away once the last lock
+ 		 * is removed, and we need it around for the CCB release
+ 		 * operation.
+ 		 */
+ 		xpt_release_ccb(done_ccb);
  		cam_periph_unlock(periph);
! 		return;
  	}
  	case CH_CCB_WAITING:
  	{
***************
*** 697,702 ****
--- 706,713 ----
  		wakeup(&done_ccb->ccb_h.cbfcnp);
  		return;
  	}
+ 	default:
+ 		break;
  	}
  	xpt_release_ccb(done_ccb);
  }

--r5Pyd7+fXNt84Ff3--





