From: Barrett Lyon
Date: Tue, 8 May 2007 15:23:30 -0700
To: adam radford
Cc: freebsd-current@freebsd.org
Subject: Re: Functional RAID controller?

> If you have "a good idea what's wrong with the twa driver", would you mind
> sharing a stack trace or other information? So far I have only been told
> that "system hangs when I do heavy I/O". This is _not_ reproducible here.
> Have you run memtest86 on the machine? Have you run a PCI analyzer on
> your machine to see who is on the PCI bus before/during the hang?

We have done everything, including asking to bring the crashing machines to AMCC's offices, which are down the street.
I have not been doing the technical debugging myself, but a few members of AMCC's staff have been trying to help, and we have been running memtest and similar tests. When the machines hang, no debugging options are available: the system is completely frozen, with nothing pointing to a cause. From that state it's not clear whether the problem is an unacknowledged interrupt or a mutex deadlock of some sort. Our working assumption is that the driver starts work on the assumption that the interrupt is valid and then either gets stuck or returns early before the interrupt is acknowledged, so the interrupt fires over and over again. If you want to see it reproduced, we are more than happy to provide you with two machines that both exhibit this condition.

> You claim the hang doesn't happen on the 6.2 series twa driver,
> the driver changes between the 6.x and 7.x twa driver are _very_ minimal,
> some simple time keeping changes, and some XPT_* path inquiry handling
> changes.

Under 6.x, the systems as built are completely stable.

> I am really surprised that you are trying to design servers around the
> FreeBSD un-stable kernel.

There are other reasons for this that I'd rather not discuss here, but the other components we are using work very well under 7.0, and the performance gains make it worth running a development kernel. 10GbE drivers such as mxge are getting a lot of development work in HEAD, and as a result 6.x is falling behind for some of the work we are doing. At the very least, I want to make sure the hardware I deploy will keep working beyond 6.x.

-Barrett