Date:      Mon, 17 Dec 2007 10:26:58 -0800
From:      Julian Elischer <julian@elischer.org>
To:        Maxime Henrion <mux@FreeBSD.org>
Cc:        Gleb Smirnoff <glebius@FreeBSD.org>, net@FreeBSD.org
Subject:   Re: Deadlock in the routing code
Message-ID:  <4766BF72.7000005@elischer.org>
In-Reply-To: <20071217101009.GL71713@elvis.mu.org>
References:  <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org>

Maxime Henrion wrote:
> Julian Elischer wrote:
>> Gleb Smirnoff wrote:
>>> On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
>>> J>  Maxime Henrion wrote:
>>> J> > Replying to myself on this one, sorry about that.
>>> J> > I said in my previous mail that I didn't know yet what process was
>>> J> > holding the lock of the rtentry that the routed process is dealing
>>> J> > with in rt_setgate(), and I just could verify that it is held by
>>> J> > the swi1: net thread.
>>> J> > So, in a nutshell:
>>> J> > - The routed process does its business on the routing socket, that ends up
>>> J> >   calling rt_setgate().  While in rt_setgate() it drops the lock on its
>>> J> >   rtentry in order to call rtalloc1().  At this point, the routed
>>> J> >   process hold the gateway route (rtalloc1() returns it locked), and it
>>> J> >   now tries to re-lock the original rtentry.
>>> J> > - At the same time, the swi net thread calls arpresolve() which ends up
>>> J> >   calling rt_check().  Then rt_check() locks the rtentry, and tries to
>>> J> >   lock the gateway route.
>>> J> > A classical case of deadlock with mutexes because of different locking
>>> J> > order.  Now, it's not obvious to me how to fix it :-).
>>> J> 
>>> J>  On failure to re-lock, the routed call to rt_setgate should completely abort
>>> J>  and restart from scratch, releasing all locks it has on the way out.
>>>
>>> Do you suggest mtx_trylock?
>> I think that would be the cleanest way..
> 
> So, here's what I've got.  I have yet to test it at all, I hope that
> I'll be able to do so today, or tomorrow.  Any input appreciated.
> 
> Cheers,
> Maxime
> 

This code is, I think (from memory), called only from user context, right?
If so, on failure to take the lock it could delay for 1 tick or so before
retrying.

(I don't have the code in front of me right now.)

Otherwise I think that might do the job. More comments later.
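
To make the trylock idea concrete, here is a minimal userland sketch using
POSIX threads rather than the kernel's mutex API. It is not the patch under
discussion: struct route, lock_both_or_back_off() and max_tries are
hypothetical names made up for illustration, and sched_yield() merely stands
in for the one-tick delay. The point is only the shape of the fix: take the
gateway lock, try the original entry without blocking, and on failure drop
everything and start over.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for a route entry; not the kernel's struct rtentry. */
struct route {
	pthread_mutex_t lock;
};

/*
 * Acquire gw first (as if a lookup had returned it locked), then try to
 * take rt without blocking.  If rt is busy, release gw so that a thread
 * locking in the opposite order (rt, then gw) can make progress, yield,
 * and restart from scratch.
 *
 * Returns the number of failed attempts before success (both locks held),
 * or -1 if max_tries attempts all failed (no locks held).
 */
static int
lock_both_or_back_off(struct route *rt, struct route *gw, int max_tries)
{
	assert(max_tries > 0);

	for (int tries = 0; tries < max_tries; tries++) {
		pthread_mutex_lock(&gw->lock);
		if (pthread_mutex_trylock(&rt->lock) == 0)
			return (tries);		/* both locks are held */

		/* Would have deadlocked here with a blocking lock. */
		pthread_mutex_unlock(&gw->lock);
		sched_yield();			/* stand-in for a 1 tick delay */
	}
	return (-1);
}
```

A caller would initialise both locks (for example with
PTHREAD_MUTEX_INITIALIZER), call lock_both_or_back_off(&rt, &gw, n), and on a
non-negative return do its work and unlock both. On -1 it holds nothing and
can fail the request or retry later. Bounding the retries is a choice made
for this sketch; the thread does not say whether the real code should give up.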



