From: John Baldwin
To: Bruce Simpson
Cc: net@freebsd.org
Date: Mon, 9 Jan 2012 11:28:10 -0500
Subject: Re: Deferring inp_freemoptions() to an asychronous task

On Monday, January 09, 2012 10:23:48 am Bruce Simpson wrote:
> John,
>
> Sorry it's taken me so long to reply.
>
> No objections in principle to your change, but this seems to point at a
> more general issue with modern network controllers.
>
> You've also stumbled on the behaviour specific to how BSD has
> traditionally dealt with broadcast/multicast sockets. The pcbinfo
> structure can't really be disentangled from this.
>
> Of course, it doesn't help that we have historically required these
> sockets to be bound to INADDR_ANY. It might be useful to break reception
> out using a separate hash/tree, rather than walking all sockets as is
> currently done, but legacy usage needs to be supported.
>
> Interestingly enough, Microsoft has probably done something similar,
> judging from things which appear in MSDN.
>
> John Baldwin wrote:
> > I have a workload at work where a particular device driver can take a
> > while to update its MAC filter table when adding or removing multicast
> > link-layer addresses. One of the ways I've tackled fixing this is to
> > change inp_freemoptions() so that it does all of its actual work
> > asynchronously in a separate task. Currently it does its work
> > synchronously; however, it can be invoked while the associated protocol
> > holds a write lock on its pcbinfo lock (e.g. from in_pcbdetach() called
> > from udp_detach()). This stalls all packet reception for that protocol
> > since received packets need a read lock on the pcbinfo to look up the
> > socket associated with a given (ip, port) tuple.
>
> There is often a delay between asking for the group and actually getting
> the hash filter entry set up in the MAC, so the operations are async.
>
> I can see many apps like to assume the operation is instantaneous rather
> than deferred; they are probably being naive...
>
> The same being true for taking down the hash filter entry is not surprising.

The other fun part in this case is that if it is going to take a long
time, a driver should probably be enabling reception of all multicast
traffic (the equivalent of IFF_ALLMULTI) while it reprograms the table to
avoid dropping packets for already-joined groups.
I'm not currently doing this as we are using a different hack, but I think
that is something drivers should probably be doing.

--
John Baldwin
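The deferral described in the quoted proposal can be modelled in userland as a minimal sketch. It assumes a single-threaded model: `struct moptions`, `freemoptions_deferred()`, `freemoptions_task()`, `detach_socket()` and the counters are all hypothetical stand-ins, not the kernel's `ip_moptions` or `inp_freemoptions()`. The point it shows is that the detach path only links the options onto a pending list while the pcbinfo write lock is held, and the slow per-group driver work happens later in a task that runs without that lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's multicast options: only the
 * number of joined groups matters for this sketch. */
struct moptions {
    int ngroups;               /* groups whose MAC filter entries must go */
    struct moptions *next;     /* link on the deferred-free list */
};

/* Hypothetical global state. In the kernel the pending list would need
 * its own mutex and the drain would presumably be scheduled through
 * taskqueue(9); this userland model is single-threaded. */
struct moptions *pending_head = NULL;
int pcbinfo_wlocked = 0;          /* models the protocol's pcbinfo write lock */
int slow_leaves_under_lock = 0;   /* slow driver calls made with the lock held */
int leaves_done = 0;              /* total slow driver calls made */

/* Models a slow driver MAC filter update for one departing group. */
void slow_leave_group(void)
{
    if (pcbinfo_wlocked)
        slow_leaves_under_lock++;
    leaves_done++;
}

/* Deferred free: constant time, safe to call with the write lock held. */
void freemoptions_deferred(struct moptions *imo)
{
    if (imo == NULL)
        return;
    imo->next = pending_head;
    pending_head = imo;
}

/* Task body: takes the whole pending list, then does the slow work
 * with no pcbinfo lock held. */
void freemoptions_task(void)
{
    assert(!pcbinfo_wlocked);
    struct moptions *list = pending_head;
    pending_head = NULL;
    while (list != NULL) {
        struct moptions *next = list->next;
        for (int i = 0; i < list->ngroups; i++)
            slow_leave_group();
        free(list);
        list = next;
    }
}

/* Models udp_detach() -> in_pcbdetach() running under the write lock. */
void detach_socket(struct moptions *imo)
{
    pcbinfo_wlocked = 1;
    freemoptions_deferred(imo);   /* receivers are blocked only briefly */
    pcbinfo_wlocked = 0;
}
```

After two `detach_socket()` calls nothing slow has run yet; a later `freemoptions_task()` performs every driver update with `slow_leaves_under_lock` still at zero, which is the property that keeps packet reception from stalling.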
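The suggestion that a slow driver accept all multicast while it reprograms its table might look roughly like the following hypothetical sketch. `struct nic`, `mac_hash()` and the `nic_*` functions are invented for illustration. The 6-bit XOR hash is an assumption chosen only to keep the example short; real controllers typically derive the bin from CRC bits. Reprogramming is split into a begin and an end step so the slow window in between is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_GROUPS 16

/* Hypothetical driver soft state: a 64-bin multicast hash filter plus an
 * accept-all-multicast mode (the IFF_ALLMULTI equivalent). */
struct nic {
    uint64_t hash_filter;          /* bins the hardware currently matches */
    bool allmulti;                 /* hardware accepts every multicast frame */
    bool saved_allmulti;           /* mode to restore after reprogramming */
    uint8_t groups[MAX_GROUPS][6]; /* joined link-layer group addresses */
    int ngroups;
};

/* Hypothetical 6-bit hash; real controllers typically use CRC bits. */
unsigned mac_hash(const uint8_t mac[6])
{
    unsigned h = 0;
    for (int i = 0; i < 6; i++)
        h ^= mac[i];
    return h & 63;
}

/* Receive-side check for a multicast destination address. */
bool nic_accepts(const struct nic *sc, const uint8_t mac[6])
{
    return sc->allmulti || ((sc->hash_filter >> mac_hash(mac)) & 1);
}

int nic_add_group(struct nic *sc, const uint8_t mac[6])
{
    if (sc->ngroups == MAX_GROUPS)
        return -1;
    memcpy(sc->groups[sc->ngroups++], mac, 6);
    return 0;
}

int nic_del_group(struct nic *sc, const uint8_t mac[6])
{
    for (int i = 0; i < sc->ngroups; i++) {
        if (memcmp(sc->groups[i], mac, 6) == 0) {
            /* Move the last entry into the hole (may be the same slot). */
            memmove(sc->groups[i], sc->groups[--sc->ngroups], 6);
            return 0;
        }
    }
    return -1;
}

/* Start of a (possibly slow) reprogram: accept all multicast first so
 * frames for groups that stay joined are not dropped meanwhile. */
void nic_reprogram_begin(struct nic *sc)
{
    sc->saved_allmulti = sc->allmulti;
    sc->allmulti = true;
    sc->hash_filter = 0;   /* table contents are in flux from here on */
}

/* End of the reprogram: write the new table, then restore the mode. */
void nic_reprogram_end(struct nic *sc)
{
    assert(sc->allmulti);  /* must follow nic_reprogram_begin() */
    uint64_t f = 0;
    for (int i = 0; i < sc->ngroups; i++)
        f |= UINT64_C(1) << mac_hash(sc->groups[i]);
    sc->hash_filter = f;
    sc->allmulti = sc->saved_allmulti;
}
```

Restoring `saved_allmulti` rather than clearing the flag keeps an interface that was already in all-multicast mode (for example at the user's request) in that mode afterwards. The cost of the approach is that the host sees extra multicast traffic for the duration of the reprogram.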