From owner-freebsd-scsi@FreeBSD.ORG  Thu Aug 10 19:35:38 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 46DA516A4DE
	for <freebsd-scsi@freebsd.org>; Thu, 10 Aug 2006 19:35:38 +0000 (UTC)
	(envelope-from anderson@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5820E43D46
	for <freebsd-scsi@freebsd.org>; Thu, 10 Aug 2006 19:35:37 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220])
	by mh1.centtech.com (8.13.1/8.13.1) with ESMTP id k7AJZa3K031584
	for <freebsd-scsi@freebsd.org>; Thu, 10 Aug 2006 14:35:36 -0500 (CDT)
	(envelope-from anderson@centtech.com)
Message-ID: <44DB8A9C.8090609@centtech.com>
Date: Thu, 10 Aug 2006 14:35:56 -0500
From: Eric Anderson <anderson@centtech.com>
User-Agent: Thunderbird 1.5.0.5 (X11/20060802)
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.87.1/1644/Wed Aug 9 22:55:42 2006 on mh1.centtech.com
X-Virus-Status: Clean
Subject: isp issues on recent -STABLE
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Aug 2006 19:35:38 -0000

Lately (the past week or so), I've been having a lot of trouble with one 
of my servers.  The system has two QLogic (2312) cards in it, only one 
connected to the storage (via fiber channel switch).

Basically, under heavy disk load, I get mass warnings to the console, 
and then the system hangs, unpingable.  Hitting the power button 
(sending ACPI power down) doesn't do anything, except for a warning.

I'm running -STABLE as of about 2 days ago, but prior to that I was 
running from about early June time frame.

The lock-up happens nearly daily, when my backups are running (using 
rsync), so I'm sure it will happen again tonight.  I've got the debugger 
and all enabled in the kernel, but I couldn't seem to break into it last 
time it died.

I know there have been recent changes to the isp driver, so I'm 
wondering if it's related.  I may try reverting back to older -stable 
and see if it goes away.  In the meantime, any suggestions for debugging?

Eric


-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------

From owner-freebsd-scsi@FreeBSD.ORG  Fri Aug 11 11:52:44 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2D19016A4DA
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 11:52:44 +0000 (UTC)
	(envelope-from anderson@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C6C6743D45
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 11:52:43 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220])
	by mh1.centtech.com (8.13.1/8.13.1) with ESMTP id k7BBqgDV086239
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 06:52:42 -0500 (CDT)
	(envelope-from anderson@centtech.com)
Message-ID: <44DC6F9F.4060405@centtech.com>
Date: Fri, 11 Aug 2006 06:53:03 -0500
From: Eric Anderson <anderson@centtech.com>
User-Agent: Thunderbird 1.5.0.5 (X11/20060802)
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
References: <44DB8A9C.8090609@centtech.com>
In-Reply-To: <44DB8A9C.8090609@centtech.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.87.1/1646/Fri Aug 11 04:51:17 2006 on
	mh1.centtech.com
X-Virus-Status: Clean
Subject: Re: isp issues on recent -STABLE
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Aug 2006 11:52:44 -0000

On 08/10/06 14:35, Eric Anderson wrote:
> Lately (the past week or so), I've been having a lot of trouble with one 
> of my servers.  The system has two QLogic (2312) cards in it, only one 
> connected to the storage (via fiber channel switch).
> 
> Basically, under heavy disk load, I get mass warnings to the console, 
> and then the system hangs, unpingable.  Hitting the power button 
> (sending ACPI power down) doesn't do anything, except for a warning.
> 
> I'm running -STABLE as of about 2 days ago, but prior to that I was 
> running from about early June time frame.
> 
> The lock-up happens nearly daily, when my backups are running (using 
> rsync), so I'm sure it will happen again tonight.  I've got the debugger 
> and all enabled in the kernel, but I couldn't seem to break into it last 
> time it died.
> 
> I know there have been recent changes to the isp driver, so I'm 
> wondering if it's related.  I may try reverting back to older -stable 
> and see if it goes away.  In the meantime, any suggestions for debugging?
> 
> Eric
> 
> 


Just to follow up with more details, here are the messages I get before 
the lockup:

[..snip..]
Aug  9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Retrying Command
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 253
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Retrying Command
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 252
[..continuing in this pattern..]
Aug 10 00:46:10 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 10 00:46:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 10 00:46:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Queue Full
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): tagged openings now 96
[..snip..]
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Queue Full
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): tagged openings now 12
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 01:07:30 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 01:07:30 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Queue Full
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): tagged openings now 254
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Retrying Command
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Queue Full
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): tagged openings now 253
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Retrying Command
Aug 10 02:00:24 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 02:00:24 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 254
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 253
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 252
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 251
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 250
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 249
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 248
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 247
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 246
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:31:38 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 02:31:38 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 02:42:14 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 02:42:14 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 02:42:14 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
[..machine locks up around 03:08..]

This happened while doing an fsck on one of the filesystems on one of 
these devices (I can't recall which).
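
A rough way to tally the messages above per device, as a minimal sketch 
only: it assumes the syslog lines look exactly like the excerpt (a CAM 
peripheral tag such as "(da8:isp0:0:2:2):" followed by the message).  The 
function and field names are made up for illustration; nothing here comes 
from the isp driver itself.

```python
import re
import sys
from collections import defaultdict

# Matches CAM peripheral messages of the form seen in the excerpt, e.g.
#   "(da8:isp0:0:2:2): tagged openings now 96"
# The bare "isp0: command timed out for 0.2.2" lines are deliberately not
# matched, so each timeout is counted once.
PERIPH_RE = re.compile(r"\((da\d+):isp\d+:\d+:\d+:\d+\): (.*)$")
OPENINGS_RE = re.compile(r"tagged openings now (\d+)")


def summarize(lines):
    """Count timeouts and Queue Full events per device and track the
    lowest 'tagged openings' value reported for each one."""
    stats = defaultdict(
        lambda: {"timeouts": 0, "queue_full": 0, "min_openings": None}
    )
    for line in lines:
        match = PERIPH_RE.search(line)
        if not match:
            continue
        dev, msg = match.group(1), match.group(2).strip()
        entry = stats[dev]
        if msg == "Command timed out":
            entry["timeouts"] += 1
        elif msg == "Queue Full":
            entry["queue_full"] += 1
        else:
            openings = OPENINGS_RE.fullmatch(msg)
            if openings:
                value = int(openings.group(1))
                if entry["min_openings"] is None or value < entry["min_openings"]:
                    entry["min_openings"] = value
    return dict(stats)


if __name__ == "__main__":
    # Hypothetical usage: python summarize.py < /var/log/messages
    for dev, entry in sorted(summarize(sys.stdin).items()):
        print(dev, entry)
```

Comparing the timeout and Queue Full counts per device may help show 
whether the timeouts cluster on particular LUNs.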

Eric


-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------

From owner-freebsd-scsi@FreeBSD.ORG  Fri Aug 11 15:43:54 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 00A0A16A4DD
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 15:43:54 +0000 (UTC)
	(envelope-from geoffb@chuggalug.clues.com)
Received: from chuggalug.clues.com (chuggalug2.demon.co.uk [83.104.169.191])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5016043D49
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 15:43:52 +0000 (GMT)
	(envelope-from geoffb@chuggalug.clues.com)
Received: from chuggalug.clues.com (localhost [127.0.0.1])
	by chuggalug.clues.com (8.12.10/8.12.10) with ESMTP id k7BFhmKP084148; 
	Fri, 11 Aug 2006 15:43:48 GMT
	(envelope-from geoffb@chuggalug.clues.com)
Received: (from geoffb@localhost)
	by chuggalug.clues.com (8.12.10/8.12.10/Submit) id k7BFhmkw084147;
	Fri, 11 Aug 2006 15:43:48 GMT (envelope-from geoffb)
Date: Fri, 11 Aug 2006 15:43:48 +0000
From: Geoff Buckingham <geoffb@chuggalug.clues.com>
To: Eric Anderson <anderson@centtech.com>
Message-ID: <20060811154348.GA83765@chuggalug.clues.com>
References: <44DB8A9C.8090609@centtech.com> <44DC6F9F.4060405@centtech.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <44DC6F9F.4060405@centtech.com>
User-Agent: Mutt/1.4.1i
Cc: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Aug 2006 15:43:54 -0000

On Fri, Aug 11, 2006 at 06:53:03AM -0500, Eric Anderson wrote:
> [..snip..]
> Aug  9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
> Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
> Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
> Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
> Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254

I don't know what may have changed in the driver recently, but from your post
you seem to be using the FC isp and potentially a "SAN" presenting arrays as 
LUNs to you rather than JBOD on a loop or switch.

If you have some kind of data mover presenting arrays, your SAN vendor 
may well recommend a maximum queue size per LUN (often 20-30).

man camcontrol, look at the tags section.

Your commands may be timing out because you have managed to queue too many 
commands (which I hope should not happen). Or.....

Your queues could be filling because your commands are timing out. Which
would imply something is broke :-(
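
To make the camcontrol suggestion concrete, here is a small sketch that 
only builds the command lines and prints them; it does not run anything.  
The device names are taken from the log excerpt, and the limit of 32 is a 
placeholder, not a vendor recommendation.  The helper name is made up for 
illustration; check camcontrol(8) on your release before using the output.

```python
def tag_commands(devices, max_tags):
    """Build one `camcontrol tags` command line per device.

    -N sets the number of tagged openings and -v asks camcontrol to
    print the resulting settings.
    """
    if max_tags < 1:
        raise ValueError("max_tags must be at least 1")
    return [
        ["camcontrol", "tags", dev, "-N", str(max_tags), "-v"]
        for dev in devices
    ]


if __name__ == "__main__":
    # Placeholder values: substitute the devices and the limit your
    # array vendor actually recommends.
    for cmd in tag_commands(["da3", "da8"], 32):
        print(" ".join(cmd))
```

The printed commands would need to be run as root, and as far as I know 
the setting does not persist across a reboot.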

From owner-freebsd-scsi@FreeBSD.ORG  Fri Aug 11 16:01:35 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EB5F216A4DE
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 16:01:35 +0000 (UTC)
	(envelope-from anderson@centtech.com)
Received: from mh2.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7BDC343D45
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 16:01:35 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220])
	by mh2.centtech.com (8.13.1/8.13.1) with ESMTP id k7BG1WA6018701;
	Fri, 11 Aug 2006 11:01:32 -0500 (CDT)
	(envelope-from anderson@centtech.com)
Message-ID: <44DCA9F1.1000302@centtech.com>
Date: Fri, 11 Aug 2006 11:01:53 -0500
From: Eric Anderson <anderson@centtech.com>
User-Agent: Thunderbird 1.5.0.5 (X11/20060802)
MIME-Version: 1.0
To: Geoff Buckingham <geoffb@chuggalug.clues.com>
References: <44DB8A9C.8090609@centtech.com> <44DC6F9F.4060405@centtech.com>
	<20060811154348.GA83765@chuggalug.clues.com>
In-Reply-To: <20060811154348.GA83765@chuggalug.clues.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.87.1/1646/Fri Aug 11 04:51:17 2006 on
	mh2.centtech.com
X-Virus-Status: Clean
Cc: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Aug 2006 16:01:36 -0000

On 08/11/06 10:43, Geoff Buckingham wrote:
> On Fri, Aug 11, 2006 at 06:53:03AM -0500, Eric Anderson wrote:
>> [..snip..]
>> Aug  9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
>> Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
>> Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
>> Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
>> Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254
> 
> I don't know what may have changed in the driver recently, but from your post
> you seem to be using the FC isp and potentially a "SAN" presenting arrays as 
> LUNs to you rather than JBOD on a loop or switch.
> 
> If you have some kind of data mover presenting arrays, your SAN vendor 
> may well recommend a maximum queue size per LUN (often 20-30).

I have this one host connected to a single QLogic fiber channel switch, 
which has 5 ACNC fiber channel arrays attached to it, along with a tape 
robot and tape drives.  Three of the arrays present 3 LUNs each (2TB per 
LUN), and two of them present 2 LUNs each, 4GB and 10TB - I'm not using 
these two arrays much yet, and they are not really associated with the problems.

> man camcontrol, look at the tags section.
> 
> Your commands may be timing out because you have managed to queue too many 
> commands (which I hope should not happen). Or.....
> 
> Your queues could be filling because your commands are timing out. Which
> would imply something is broke :-(

Strange that I've never hit this in the past, but now I seem to be 
hitting it quite often.  The vendor of the arrays says queue depth is 
256 per LUN, and that coincides with my messages above I believe.

Eric


-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------

From owner-freebsd-scsi@FreeBSD.ORG  Sat Aug 12 00:36:16 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1129616A4E0
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 00:36:16 +0000 (UTC)
	(envelope-from jrhett@svcolo.com)
Received: from outbound0.sv.meer.net (outbound0.mx.meer.net [209.157.153.23])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AF54C43D49
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 00:36:15 +0000 (GMT)
	(envelope-from jrhett@svcolo.com)
Received: from mail.meer.net (mail.meer.net [209.157.152.14])
	by outbound0.sv.meer.net (8.12.10/8.12.6) with ESMTP id k7C0aFih083418
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 17:36:15 -0700 (PDT)
	(envelope-from jrhett@svcolo.com)
Received: from [10.66.240.106] (public-wireless.sv.svcolo.com [64.13.135.30])
	by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id k7C0ZGfh044335
	for <freebsd-scsi@freebsd.org>; Fri, 11 Aug 2006 17:35:16 -0700 (PDT)
	(envelope-from jrhett@svcolo.com)
Mime-Version: 1.0 (Apple Message framework v752.2)
Content-Transfer-Encoding: 7bit
Message-Id: <27BE7ACB-BCF9-40A5-96F4-CE90A297CE71@svcolo.com>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
To: freebsd-scsi@freebsd.org
From: Jo Rhett <jrhett@svcolo.com>
Date: Fri, 11 Aug 2006 17:35:15 -0700
X-Mailer: Apple Mail (2.752.2)
Subject: myl driver failing during server shutdown
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Aug 2006 00:36:16 -0000

So I had thought that my motherboard didn't honor the acpi reset or  
power down command.  It turns out that it does just fine -- but the  
shutdown is failing/hanging.  Attaching a serial console to it, I see  
this:

Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...3 0 2 0 0 done
All buffers synced.
Uptime: 6d18h12m6s
(da0:mly0:1:0:0): Synchronize cache failed, status == 0xb, scsi status == 0x0
mly0: flushing cache...kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xe25c1ac0
frame pointer           = 0x28:0x0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1 (init)
trap number             = 12
panic: page fault
Uptime: 6d19h12m38s
(da0:mly0:1:0:0): Synchronize cache failed, status == 0xb, scsi status == 0x0
Dumping 991 MB (2 chunks)
Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0

** DUMP FAILED (ERROR 5) **

This is 100% reproducible.  Anyone have any ideas where to start on  
this problem?  What does this error mean?

Note: if you want to debug this, I can provide root access.  It's  
just a personal box :-)

-- 
Jo Rhett
senior geek
Silicon Valley Colocation


From owner-freebsd-scsi@FreeBSD.ORG  Sat Aug 12 03:27:28 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E4D6016A4DF
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 03:27:28 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2DB6F43D45
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 03:27:27 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [192.168.254.11] (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k7C3RK1N005782;
	Fri, 11 Aug 2006 21:27:26 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Message-ID: <44DD4A97.2050203@samsco.org>
Date: Fri, 11 Aug 2006 21:27:19 -0600
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US;
	rv:1.7.13) Gecko/20060414
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Jo Rhett <jrhett@svcolo.com>
References: <27BE7ACB-BCF9-40A5-96F4-CE90A297CE71@svcolo.com>
In-Reply-To: <27BE7ACB-BCF9-40A5-96F4-CE90A297CE71@svcolo.com>
Content-Type: multipart/mixed; boundary="------------010706020105070808080501"
X-Spam-Status: No, score=-1.4 required=3.8 tests=ALL_TRUSTED autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-scsi@freebsd.org
Subject: Re: myl driver failing during server shutdown
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Aug 2006 03:27:29 -0000

This is a multi-part message in MIME format.
--------------010706020105070808080501
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Jo Rhett wrote:
> So I had thought that my motherboard didn't honor the acpi reset or  
> power down command.  It turns out that it does just fine -- but the  
> shutdown is failing/hanging.  Attaching a serial console to it, I see  
> this:
> 
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...3 0 2 0 0 done
> All buffers synced.
> Uptime: 6d18h12m6s
> (da0:mly0:1:0:0): Synchronize cache failed, status == 0xb, scsi  status 
> == 0x0
> mly0: flushing cache...kernel trap 12 with interrupts disabled
> 
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x0
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0x0
> stack pointer           = 0x28:0xe25c1ac0
> frame pointer           = 0x28:0x0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 1 (init)
> trap number             = 12
> panic: page fault
> Uptime: 6d19h12m38s
> (da0:mly0:1:0:0): Synchronize cache failed, status == 0xb, scsi  status 
> == 0x0
> Dumping 991 MB (2 chunks)
> Aborting dump due to I/O error.
> status == 0xb, scsi status == 0x0
> 
> ** DUMP FAILED (ERROR 5) **
> 
> This is 100% reproducible.  Anyone have any ideas where to start on  
> this problem?  What does this error mean?
> 
> Note: if you want to debug this, I can provide root access.  It's  just 
> a personal box :-)
> 

Give this (untested) patch a try.  If that doesn't work, it's going to
need a lot more digging, and I unfortunately don't have the time for
that right now.

Scott


--------------010706020105070808080501
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0";
 name="mly.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="mly.diff"

Index: mly.c
===================================================================
RCS file: /usr/ncvs/src/sys/dev/mly/mly.c,v
retrieving revision 1.39
diff -u -r1.39 mly.c
--- mly.c	8 Aug 2005 12:23:26 -0000	1.39
+++ mly.c	10 Aug 2006 11:57:54 -0000
@@ -1128,9 +1128,12 @@
 	    mc->mc_data = *data;
 	    mc->mc_flags |= MLY_CMD_DATAOUT;
 	}
-	mc->mc_length = datasize;
-	mc->mc_packet->generic.data_size = datasize;
+    } else if (datasize != 0) {
+	error = EINVAL;
+	goto out;
     }
+    mc->mc_length = datasize;
+    mc->mc_packet->generic.data_size = datasize;
     
     /* run the command */
     if ((error = mly_immediate_command(mc)))

--------------010706020105070808080501--

From owner-freebsd-scsi@FreeBSD.ORG  Sat Aug 12 18:02:06 2006
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
X-Original-To: freebsd-scsi@freebsd.org
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7481A16A4DD
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 18:02:06 +0000 (UTC)
	(envelope-from lydianconcepts@gmail.com)
Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.186])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 57E8943D5D
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 18:01:51 +0000 (GMT)
	(envelope-from lydianconcepts@gmail.com)
Received: by nf-out-0910.google.com with SMTP id g2so1468569nfe
	for <freebsd-scsi@freebsd.org>; Sat, 12 Aug 2006 11:01:50 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
	b=fNmF+yncSMtcGSzy/k/6Sywkm0sOBlw7cqGDnueux0BkiAyMk9FD3LMoM1mqOlqlai/NmCJ5sCdlhiVGlN1GI4HHN2ltxvCp1CevDkLQrezRtTCRmDgp/ZYR8SqNl6rTJSWBpo8CO71oreMF/3GSJSo2ICWG7vsv5//9eiTU7FE=
Received: by 10.78.127.6 with SMTP id z6mr2662719huc;
	Sat, 12 Aug 2006 11:01:50 -0700 (PDT)
Received: by 10.78.134.9 with HTTP; Sat, 12 Aug 2006 11:01:49 -0700 (PDT)
Message-ID: <7579f7fb0608121101g112e006cy1112d282fab753d3@mail.gmail.com>
Date: Sat, 12 Aug 2006 11:01:49 -0700
From: "Matthew Jacob" <lydianconcepts@gmail.com>
To: "Eric Anderson" <anderson@centtech.com>
In-Reply-To: <44DCA9F1.1000302@centtech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <44DB8A9C.8090609@centtech.com> <44DC6F9F.4060405@centtech.com>
	<20060811154348.GA83765@chuggalug.clues.com>
	<44DCA9F1.1000302@centtech.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Aug 2006 18:02:06 -0000

Hmm. I have no special help on this one. This doesn't seem
*particularly* related to any changes I've made recently.

If you could isolate a date/change when this occurred for you it would help.

All I see in the messages below is indications that we've given your
storage way too much work to do.

On 8/11/06, Eric Anderson <anderson@centtech.com> wrote:
> On 08/11/06 10:43, Geoff Buckingham wrote:
> > On Fri, Aug 11, 2006 at 06:53:03AM -0500, Eric Anderson wrote:
> >> [..snip..]
> >> Aug  9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
> >> Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
> >> Aug  9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
> >> Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
> >> Aug  9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254
> >
> > I don't know what may have changed in the driver recently, but from your post
> > you seem to be using the FC isp and potentially a "SAN" presenting arrays as
> > LUNs to you rather than JBOD on a loop or switch.
> >
> > If you have some kind of data mover presenting arrays, your SAN vendor
> > may well recommend a maximum queue size per LUN (often 20-30).
>
> I have this one host connected to a single QLogic fiber channel switch,
> which has 5 ACNC fiber channel arrays attached to it, along with a tape
> robot and tape drives.  Three of the arrays present 3 LUNs each (2TB per
> LUN), and two of them present 2 LUNs each, 4GB and 10TB - I'm not using
> these two arrays much yet, and they are not really associated with the problems.
>
> > man camcontrol, look at the tags section.
> >
> > Your commands may be timing out because you have managed to queue too many
> > commands (which I hope should not happen). Or.....
> >
> > Your queues could be filling because your commands are timing out. Which
> > would imply something is broke :-(
>
> Strange that I've never hit this in the past, but now I seem to be
> hitting it quite often.  The vendor of the arrays says queue depth is
> 256 per LUN, and that coincides with my messages above I believe.
>
> Eric
>
>
>
> --
> ------------------------------------------------------------------------
> Eric Anderson        Sr. Systems Administrator        Centaur Technology
> Anything that works is better than anything that doesn't.
> ------------------------------------------------------------------------