From: Gleb Smirnoff
To: Julian Elischer
Cc: Maxime Henrion, net@FreeBSD.org
Date: Fri, 14 Dec 2007 12:25:39 +0300
Subject: Re: Deadlock in the routing code

On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
J> Maxime Henrion wrote:
J> > Replying to myself on this one, sorry about that.
J> > I said in my previous mail that I didn't know yet what process was
J> > holding the lock of the rtentry that the routed process is dealing
J> > with in rt_setgate(), and I just could verify that it is held by
J> > the swi1: net thread.
J> >
J> > So, in a nutshell:
J> >
J> > - The routed process does its business on the routing socket, that ends up
J> >   calling rt_setgate().  While in rt_setgate() it drops the lock on its
J> >   rtentry in order to call rtalloc1().  At this point, the routed
J> >   process holds the gateway route (rtalloc1() returns it locked), and it
J> >   now tries to re-lock the original rtentry.
J> > - At the same time, the swi net thread calls arpresolve() which ends up
J> >   calling rt_check().  Then rt_check() locks the rtentry, and tries to
J> >   lock the gateway route.
J> >
J> > A classical case of deadlock with mutexes because of different locking
J> > order.  Now, it's not obvious to me how to fix it :-).
J>
J> On failure to re-lock, the routed call to rt_setgate should completely abort
J> and restart from scratch, releasing all locks it has on the way out.

Do you suggest mtx_trylock?

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
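
To make the trylock idea concrete, here is a rough userland sketch of the
back-off step, not kernel code and not a proposed patch. It assumes POSIX
threads stand in for mutex(9); struct fake_rtentry and relock_or_backoff()
are hypothetical names invented for illustration. Note the return
conventions differ: pthread_mutex_trylock() returns 0 on success, while
mtx_trylock() returns non-zero on success.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtentry; only the lock matters here. */
struct fake_rtentry {
	pthread_mutex_t lock;
};

/*
 * Models the point in rt_setgate() right after rtalloc1() has returned:
 * the caller holds gw's lock and has dropped rt's lock.
 *
 * Returns 0 with both locks held, or EAGAIN with neither held, in which
 * case the caller is expected to restart its whole operation from scratch.
 */
static int
relock_or_backoff(struct fake_rtentry *rt, struct fake_rtentry *gw)
{
	assert(rt != NULL && gw != NULL);

	/* Never sleep on rt while holding gw: that is the reversed order. */
	if (pthread_mutex_trylock(&rt->lock) == 0)
		return (0);

	/* Someone else holds rt (and may want gw), so give gw up. */
	pthread_mutex_unlock(&gw->lock);
	return (EAGAIN);
}
```

A caller would loop: lock rt, do its work, drop rt, obtain gw locked, call
relock_or_backoff(), and on EAGAIN start over. Whether redoing the lookup
each time is acceptable in the real rt_setgate() path is a separate question
that this sketch does not answer.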