From: Eric Anderson
Date: Thu, 10 Aug 2006 14:35:56 -0500
To: freebsd-scsi@freebsd.org
Subject: isp issues on recent -STABLE

Lately (the past week or so), I've been having a lot of trouble with one
of my servers. The system has two QLogic (2312) cards in it, only one of
which is connected to the storage (via a Fibre Channel switch).

Basically, under heavy disk load, I get a flood of warnings on the
console, and then the system hangs and becomes unpingable. Hitting the
power button (sending an ACPI power-down) doesn't do anything except
print a warning.

I'm running -STABLE as of about 2 days ago, but prior to that I was
running a build from around early June.
The lock-up happens nearly daily, when my backups are running (using
rsync), so I'm sure it will happen again tonight. I've got the debugger
and everything enabled in the kernel, but I couldn't seem to break into
it the last time it died.

I know there have been recent changes to the isp driver, so I'm
wondering if it's related. I may try reverting to an older -STABLE and
see if the problem goes away. In the meantime, any suggestions for
debugging?

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------

From: Eric Anderson
Date: Fri, 11 Aug 2006 06:53:03 -0500
To: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE

On 08/10/06 14:35, Eric Anderson wrote:
> Lately (the past week or so), I've been having a lot of trouble with one
> of my servers.
> [..snip..]

Just to follow up with more details, here are the messages I get before
the lock-up:

[..snip..]
Aug 9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Retrying Command
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 253
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Retrying Command
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 252
[..continuing in this pattern..]
Aug 10 00:46:10 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 10 00:46:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 10 00:46:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Queue Full
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): tagged openings now 96
[..snip..]
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Queue Full
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): tagged openings now 12
Aug 10 00:52:18 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 01:07:30 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 01:07:30 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 01:07:30 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Queue Full
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): tagged openings now 254
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Retrying Command
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Queue Full
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): tagged openings now 253
Aug 10 01:24:16 snapshot1 kernel: (da2:isp0:0:0:2): Retrying Command
Aug 10 02:00:24 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 02:00:24 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 02:00:24 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 254
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 253
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 252
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 251
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 250
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 249
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 248
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 247
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Queue Full
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): tagged openings now 246
Aug 10 02:27:53 snapshot1 kernel: (da5:isp0:0:1:2): Retrying Command
Aug 10 02:31:38 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 02:31:38 snapshot1 kernel: isp0: command timed out for 0.2.2
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
Aug 10 02:31:38 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
Aug 10 02:42:14 snapshot1 kernel: isp0: command timed out for 0.2.1
Aug 10 02:42:14 snapshot1 kernel: (da7:isp0:0:2:1): Command timed out
Aug 10 02:42:14 snapshot1 kernel: (da7:isp0:0:2:1): Retrying Command
[..machine locks up around 03:08..]

This happened while doing an fsck on one of the filesystems on one of
these devices (I can't recall which).

Eric

From: Geoff Buckingham
Date: Fri, 11 Aug 2006 15:43:48 +0000
To: Eric Anderson
Cc: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE
On Fri, Aug 11, 2006 at 06:53:03AM -0500, Eric Anderson wrote:
> [..snip..]
> Aug 9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
> Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
> Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
> Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
> Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254

I don't know what may have changed in the driver recently, but from your
post you seem to be using the FC isp and potentially a "SAN" presenting
arrays as LUNs to you, rather than JBOD on a loop or switch.

If you have some kind of data mover presenting arrays, your SAN vendor
may well recommend a maximum queue size per LUN (often 20-30).

man camcontrol, look at the tags section.

Your commands may be timing out because you have managed to queue too
many commands (which I hope should not happen). Or.....

Your queues could be filling because your commands are timing out.
Which would imply something is broken :-(

From: Eric Anderson
Date: Fri, 11 Aug 2006 11:01:53 -0500
To: Geoff Buckingham
Cc: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE

On 08/11/06 10:43, Geoff Buckingham wrote:
> On Fri, Aug 11, 2006 at 06:53:03AM -0500, Eric Anderson wrote:
>> [..snip..]
>> Aug 9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
>> Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
>> Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
>> Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
>> Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254
>
> I don't know what may have changed in the driver recently, but from your
> post you seem to be using the FC isp and potentially a "SAN" presenting
> arrays as LUNs to you, rather than JBOD on a loop or switch.
>
> If you have some kind of data mover presenting arrays, your SAN vendor
> may well recommend a maximum queue size per LUN (often 20-30).

I have this one host connected to a single QLogic Fibre Channel switch,
which has 5 ACNC Fibre Channel arrays attached to it, along with a tape
robot and tape drives. Three of the arrays present 3 LUNs each (2TB per
LUN), and two of them present 2 LUNs each (4GB and 10TB). I'm not using
these two arrays much yet, and they are not really associated with the
problems.

> man camcontrol, look at the tags section.
>
> Your commands may be timing out because you have managed to queue too
> many commands (which I hope should not happen). Or.....
>
> Your queues could be filling because your commands are timing out. Which
> would imply something is broken :-(

It's strange that I've never hit this in the past, but now I seem to be
hitting it quite often. The array vendor says the queue depth is 256 per
LUN, which I believe coincides with the messages above.

Eric
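The "Queue Full" / "tagged openings now 254, 253, 252..." pattern in the logs can be illustrated with a toy model in which the host lowers its per-LUN command limit by one each time the array answers QUEUE FULL, down to a floor. This is a hypothetical sketch that only reproduces the sequence seen in the log: `struct lun_tags` and `on_queue_full` are invented names, and CAM's real QUEUE FULL handling may compute the new limit differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-LUN state; not a real CAM structure. */
struct lun_tags {
    int openings;  /* commands the host is willing to queue to this LUN */
    int min_tags;  /* floor below which this model will not back off */
};

/*
 * Toy back-off rule: each QUEUE FULL (SCSI status 0x28) lowers the limit
 * by one until it reaches min_tags. Returns the new limit, i.e. the value
 * a "tagged openings now N" message would report in this model.
 */
int
on_queue_full(struct lun_tags *lt)
{
    assert(lt != NULL && lt->min_tags >= 1);
    if (lt->openings > lt->min_tags)
        lt->openings--;
    return lt->openings;
}
```

Starting from a limit of 255, three consecutive calls return 254, 253 and 252, the same steps the da3 and da5 entries show. Geoff's suggestion amounts to starting from a much lower limit instead; the `tags` subcommand of camcontrol(8) (for example `camcontrol tags da8 -N 32`) is the administrative knob for that, though the right value depends on what the array vendor recommends.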
From: Jo Rhett
Date: Fri, 11 Aug 2006 17:35:15 -0700
To: freebsd-scsi@freebsd.org
Subject: myl driver failing during server shutdown

So I had thought that my motherboard didn't honor the ACPI reset or
power-down command. It turns out that it does just fine; the shutdown
itself is failing/hanging. Attaching a serial console to it, I see this:

Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...3 0 2 0 0 done
All buffers synced.
Uptime: 6d18h12m6s
(da0:mly0:1:0:0): Synchronize cache failed, status == 0xb, scsi status == 0x0
mly0: flushing cache...kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xe25c1ac0
frame pointer           = 0x28:0x0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1 (init)
trap number             = 12
panic: page fault
Uptime: 6d19h12m38s
(da0:mly0:1:0:0): Synchronize cache failed, status == 0xb, scsi status == 0x0
Dumping 991 MB (2 chunks)
Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0

** DUMP FAILED (ERROR 5) **

This is 100% reproducible. Anyone have any ideas where to start on this
problem? What does this error mean?

Note: if you want to debug this, I can provide root access.
It's just a personal box :-)

-- 
Jo Rhett
senior geek
Silicon Valley Colocation

From: Scott Long
Date: Fri, 11 Aug 2006 21:27:19 -0600
To: Jo Rhett
Cc: freebsd-scsi@freebsd.org
Subject: Re: myl driver failing during server shutdown
Jo Rhett wrote:
> So I had thought that my motherboard didn't honor the ACPI reset or
> power-down command. It turns out that it does just fine; the shutdown
> itself is failing/hanging.
> [..snip..]
> This is 100% reproducible. Anyone have any ideas where to start on this
> problem? What does this error mean?
> [..snip..]

Give this (untested) patch a try. If that doesn't work, it's going to
need a lot more digging, and unfortunately I don't have the time for
that right now.
Scott

[Attachment: mly.diff]

Index: mly.c
===================================================================
RCS file: /usr/ncvs/src/sys/dev/mly/mly.c,v
retrieving revision 1.39
diff -u -r1.39 mly.c
--- mly.c	8 Aug 2005 12:23:26 -0000	1.39
+++ mly.c	10 Aug 2006 11:57:54 -0000
@@ -1128,9 +1128,12 @@
 	    mc->mc_data = *data;
 	    mc->mc_flags |= MLY_CMD_DATAOUT;
 	}
-	mc->mc_length = datasize;
-	mc->mc_packet->generic.data_size = datasize;
+    } else if (datasize != 0) {
+	error = EINVAL;
+	goto out;
     }
+    mc->mc_length = datasize;
+    mc->mc_packet->generic.data_size = datasize;
 
     /* run the command */
     if ((error = mly_immediate_command(mc)))
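Read on its own, the patch changes two things in this part of mly.c: a nonzero `datasize` with no data buffer (`data == NULL`) is now rejected with `EINVAL`, and `mc_length` / `generic.data_size` are written on every remaining path instead of only when a buffer is supplied. The sketch below models just that control flow in user space, before and after the patch. `struct model_cmd`, `setup_data_old` and `setup_data_new` are hypothetical stand-ins; buffer allocation and the driver's `goto out` cleanup are left out, and the model is not a claim that this path is what triggers the shutdown panic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the few mly command fields the patch touches. */
struct model_cmd {
    void   *data;       /* plays the role of mc->mc_data */
    size_t  length;     /* plays the role of mc->mc_length */
    size_t  data_size;  /* plays the role of mc->mc_packet->generic.data_size */
};

/* Pre-patch flow: the length fields are set only when data != NULL. */
int
setup_data_old(struct model_cmd *mc, void **data, size_t datasize)
{
    assert(mc != NULL);
    if (data != NULL) {
        mc->data = *data;  /* the driver allocates when *data is NULL; omitted */
        mc->length = datasize;
        mc->data_size = datasize;
    }
    return 0;
}

/* Post-patch flow: reject a size without a buffer, then always set lengths. */
int
setup_data_new(struct model_cmd *mc, void **data, size_t datasize)
{
    assert(mc != NULL);
    if (data != NULL) {
        mc->data = *data;  /* allocation for *data == NULL omitted, as above */
    } else if (datasize != 0) {
        return EINVAL;     /* the driver sets error and jumps to its cleanup label */
    }
    mc->length = datasize;
    mc->data_size = datasize;
    return 0;
}
```

With `data == NULL`, the old flow leaves whatever values were already in the length fields, while the new flow either fails with `EINVAL` (nonzero size) or explicitly sets them to zero.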
From: Matthew Jacob
Date: Sat, 12 Aug 2006 11:01:49 -0700
To: Eric Anderson
Cc: freebsd-scsi@freebsd.org
Subject: Re: isp issues on recent -STABLE

Hmm. I have no special help on this one. This doesn't seem *particularly*
related to any changes I've made recently. If you could isolate the date
or change after which this started happening for you, it would help.

All I see in the messages below are indications that we've given your
storage way too much work to do.

On 8/11/06, Eric Anderson wrote:
> On 08/11/06 10:43, Geoff Buckingham wrote:
> > On Fri, Aug 11, 2006 at 06:53:03AM -0500, Eric Anderson wrote:
> >> [..snip..]
> >> Aug 9 23:02:10 snapshot1 kernel: isp0: command timed out for 0.2.2
> >> Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Command timed out
> >> Aug 9 23:02:10 snapshot1 kernel: (da8:isp0:0:2:2): Retrying Command
> >> Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): Queue Full
> >> Aug 9 23:26:58 snapshot1 kernel: (da3:isp0:0:1:0): tagged openings now 254
> [..snip..]
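Jo's question about what `status == 0xb, scsi status == 0x0` means can be approached by decoding the two fields separately. The sketch below assumes the first field is a CAM status whose low six bits (mask 0x3f) hold a value from the `cam_status` enum in FreeBSD's sys/cam/cam.h, with the upper bits used as flags, and that the second is a standard SCSI status byte. Only partial tables are included, the helper names are hypothetical, and the values should be checked against the headers of the source tree actually in use. Under those assumptions, 0xb decodes as CAM_CMD_TIMEOUT, suggesting the SYNCHRONIZE CACHE command timed out rather than completing with an error status from the device; SCSI status 0x28 (TASK SET FULL) is the condition the isp logs above report as "Queue Full".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First entries of FreeBSD's cam_status enum (sys/cam/cam.h); partial table. */
static const char *const cam_status_names[] = {
    "CAM_REQ_INPROG",        /* 0x00 */
    "CAM_REQ_CMP",           /* 0x01 */
    "CAM_REQ_ABORTED",       /* 0x02 */
    "CAM_UA_ABORT",          /* 0x03 */
    "CAM_REQ_CMP_ERR",       /* 0x04 */
    "CAM_BUSY",              /* 0x05 */
    "CAM_REQ_INVALID",       /* 0x06 */
    "CAM_PATH_INVALID",      /* 0x07 */
    "CAM_DEV_NOT_THERE",     /* 0x08 */
    "CAM_UA_TERMIO",         /* 0x09 */
    "CAM_SEL_TIMEOUT",       /* 0x0a */
    "CAM_CMD_TIMEOUT",       /* 0x0b */
    "CAM_SCSI_STATUS_ERROR", /* 0x0c */
};

/* Low bits carry the status code; higher bits are flags such as "queue frozen". */
#define MODEL_CAM_STATUS_MASK 0x3f

const char *
cam_status_name(unsigned int status)
{
    unsigned int code = status & MODEL_CAM_STATUS_MASK;

    if (code < sizeof(cam_status_names) / sizeof(cam_status_names[0]))
        return cam_status_names[code];
    return "not in this partial table";
}

/* A few standard SCSI status byte values. */
const char *
scsi_status_name(unsigned int status)
{
    switch (status) {
    case 0x00: return "GOOD";
    case 0x02: return "CHECK CONDITION";
    case 0x08: return "BUSY";
    case 0x18: return "RESERVATION CONFLICT";
    case 0x28: return "TASK SET FULL (QUEUE FULL)";
    default:   return "not in this partial table";
    }
}
```

Masking first means a status that also carries flag bits (for example 0x4b) still decodes to the same code as 0xb.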