From: Greg Byshenk
To: Daniel Bond
Cc: FreeBSD Stable
Date: Mon, 5 Oct 2009 22:09:00 +0200
Subject: Re: em0 watchdog timeouts

On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:

> What I need is useful advice/help. I never stated I needed a driver
> developer.
>
> I'd like to be able to run my favorite OS on cool hardware, in the
> future, for a high-performing NFS-server, without problems like I've
> experienced the past 6months, on a production system.
> Please note that I'm managing a server-park almost completely based on
> FreeBSD, and I'm running many NFS servers on other hardware, for other
> services, without issues.
>
> I've seen several other FreeBSD-users having problems with this too,
> so I think it's of importance for the project. As I mentioned
> originally, I'm happy to dispose the hardware to any FreeBSD developer
> that might want to look further into this. Debugging it further is
> above my skill-set, I don't even know where to begin looking,
> especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation:
check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very
happy with their performance, but some months ago I began to experience
problems with one of them. These problems were 'watchdog timeout'
errors. I tried all manner of things (different NICs of different
types, changing settings, etc.), but nothing helped over the long term.

At some point, when very heavy i/o was going on to our Beowulf cluster,
the 'watchdog timeouts' would begin. What was strange is that other
(supposedly identical) machines handled _more_ i/o without a problem.

Finally, while doing some comparisons, I realized that the motherboard
in the machine having the problem was _not_ the same as the others; it
was similar, but not identical. I changed the motherboard and all the
problems went away, never to reappear.

I don't know if it was a specific problem with that particular
motherboard, or something about that model, but for whatever reason,
it appears that the buses just couldn't handle a RAID card and three
active NICs.

-- 
greg byshenk  -  gbyshenk@byshenk.net  -  Leiden, NL