From: Mike Andrews
Date: Mon, 7 Jul 2008 19:26:54 -0400 (EDT)
To: freebsd-hardware@freebsd.org
Message-ID: <20080707190237.K70038@beast.int.bit0.com>
Subject: Debugging 3Ware 9000 series hangs under load

I've occasionally had problems with 3Ware 9550SX, 9650SE, and now 9690SA cards hanging under load. By "hanging" I mean "swap_pager: indefinite wait buffer" messages on the console, machine still pingable, etc. -- basically a livelock situation.
"Heavy load" usually involves a busy MySQL instance combined with, say, copying some multi-gigabyte files, maybe also combined with rsync, all running concurrently. The problem is the hangs are extremely sporadic and not easily reproducible on demand. It seems to happen with both ZFS and UFS2; I've been sticking to UFS2 for this for now due to some weird ZFS+MySQL issues (which are annoying but not relevant to this particular problem)... On a FreeBSD/amd64 7-STABLE system built from less-than-one-week-old source, with serial console and the kernel debugger compiled in, but being a complete idiot on how to use KDB, can someone tell me what would be the most useful info I could gather from KDB the next time this happens? Given the state of the controller at that point, I doubt forcing a crash dump from KDB is going to work unless I was able to do it to a disk on another controller... so I will work on getting that set up in the meantime. I suspect this is either a twa driver or kernel issue rather than hardware, as it's happened with multiple cards, cables, etc, but again, hard to tell if I don't know what to ask KDB for. :) On the off chance that it is driver related and takes a while to fix, anyone have recommendations for good 8-port SAS controllers for FreeBSD 7-STABLE amd64 that can take a heavy beating and play nice with Intel 3000 chipset-based boards? I'm eyeing the LSI Megaraid 8888ELP as a possible alternative... but don't have a great deal of LSI experience. The 3ware SAS setup I'm beating on now is replacing an old Adaptec 2120S SCSI setup.