From owner-freebsd-stable@FreeBSD.ORG Mon Jun 20 12:12:38 2005
From: Eirik Øverby
To: Robert Watson
Subject: Re: NFS-related hang in 5.4?
Date: Mon, 20 Jun 2005 10:53:19 +0200
Resent-Date: Mon, 20 Jun 2005 14:12:02 +0200
Resent-To: stable@freebsd.org
List-Id: Production branch of FreeBSD source code

On 20. jun. 2005, at 10.38, Robert Watson wrote:

> On Mon, 20 Jun 2005, Eirik Øverby wrote:
>
>>> Hmm. Looks like a bug in dummynet. ipfw should not be directly
>>> re-injecting UDP traffic back into the input path from an
>>> outbound path, or it risks re-entering, generating lock order
>>> problems, etc. It should be getting dropped into the netisr queue
>>> to be processed from the netisr context.
>>
>> This problem would exist across all 5.4 installations, both i386
>> and amd64?
>> Would it depend on heavy load, or could it
>> theoretically happen at any time when there's traffic? All three
>> of my fbsd5 servers (dual opteron, dual p3-1ghz, dual p3-700mhz)
>> are experiencing random hangs with ~a few weeks between,
>> impression is that if running single-cpu mode they are all stable.
>> All using dummynet in a comparable manner. Ideas?
>
> Yes. Basically, the network stack avoids recursion in processing
> for "complicated" packets by deferring processing of an offending
> packet to a thread called the 'netisr'. Whenever the stack reaches
> a possible recursion point on a packet, it's supposed to queue the
> packet for processing 'later' in a per-protocol queue, unwind, and
> then when the netisr runs, pick up and continue processing. In the
> stack trace you provide, dummynet appears to immediately invoke the
> in-bound network path from the out-bound network path, walking back
> into the network stack from the outbound path. This is generally
> forbidden, for a variety of reasons:
>
> - We do allow the in-bound path to call the out-bound path, so that
>   protocols like TCP, and services like NFS can turn around packets
>   without a context switch. If further recursion is permitted, the
>   stack may overflow.
>
> - Both paths may hold network stack locks over calls in either
>   direction -- specifically, we allow protocol locks to be held over
>   calls into the socket layer, as the protocol layer drives
>   operation; if a recursive call is made, deadlocks can occur due to
>   violating the lock order. This is what is happening in your case.
>
> Pretty much all network code is entirely architecture-independent,
> so bugs typically span architectures, although race conditions can
> sometimes be hard to reproduce if they require precise timing and
> multiple processors.

So I'm lucky to have seen this one...
Great ;)

>>> Is it possible to configure dummynet out of your configuration,
>>> and see if the problem goes away?
>>
>> I'm running a test right now, will let you know in the morning.
>
> Thanks.

I know enough not to call this a "confirmation", but disabling
dummynet did indeed allow me to finish the backup. I never made it
past 15 GB before; now the full 19 GB tar.gz file is done, and the
boxes are both still running.

The funny thing is - I only disabled dummynet on one of the boxes
now - the source of the backup, the box that pushes data. The other
box has pretty much 100% the same setup, and is also i386. But as
traffic shaping can only happen on outgoing packets, I suppose that
makes sense.

I can try re-running the test if you wish, in order to gain more
statistics. It's just too bad it takes a while ;)

/Eirik

> Robert N M Watson
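
For readers following the thread, the "queue, unwind, process later"
rule Robert describes can be illustrated with a toy model. This is
only a sketch and not FreeBSD code: the class ToyStack and the names
inbound, outbound, run_deferred and max_nesting are hypothetical,
invented for this illustration, and locks are not modelled at all. It
only shows that direct re-entry from the out-bound path nests deeper
with every turnaround, while deferring to a queue keeps nesting flat.

```python
from collections import deque


class ToyStack:
    """Toy model of in-bound/out-bound processing. Hypothetical, not FreeBSD code."""

    def __init__(self, defer):
        self.defer = defer        # True: queue re-injected packets (netisr-style)
        self.queue = deque()      # stands in for a per-protocol deferred queue
        self.depth = 0            # current call nesting
        self.max_depth = 0        # deepest nesting observed

    def _enter(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def inbound(self, hops):
        # In-bound path: calling the out-bound path directly is allowed.
        self._enter()
        if hops > 0:
            self.outbound(hops)
        self.depth -= 1

    def outbound(self, hops):
        # Out-bound path: re-injecting into the input path is the recursion point.
        self._enter()
        if self.defer:
            self.queue.append(hops - 1)   # queue the packet, then unwind
        else:
            self.inbound(hops - 1)        # direct re-entry (the forbidden case)
        self.depth -= 1

    def run_deferred(self):
        # Drain the queue from a fresh context, one packet at a time.
        while self.queue:
            self.inbound(self.queue.popleft())


def max_nesting(defer, hops=5):
    """Return the deepest call nesting seen while a packet is turned around `hops` times."""
    stack = ToyStack(defer)
    stack.inbound(hops)
    stack.run_deferred()
    return stack.max_depth
```

In this model max_nesting(True) stays at 2 no matter how large hops
is, whereas max_nesting(False) grows as 2 * hops + 1. The lock order
problem from the thread is a separate consequence of the same
re-entry and is not captured here.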