Date: Mon, 9 Jan 2012 11:28:10 -0500
From: John Baldwin <jhb@freebsd.org>
To: Bruce Simpson <bms@incunabulum.net>
Cc: net@freebsd.org
Subject: Re: Deferring inp_freemoptions() to an asychronous task
Message-ID: <201201091128.10193.jhb@freebsd.org>
In-Reply-To: <4F0B0684.8040609@incunabulum.net>
References: <201112221115.10239.jhb@freebsd.org> <4F0B0684.8040609@incunabulum.net>

On Monday, January 09, 2012 10:23:48 am Bruce Simpson wrote:
> John,
>
> Sorry it's taken me so long to reply.
>
> No objections in principle to your change, but this seems to point at a
> more general issue with modern network controllers.
>
> You've also stumbled on the behaviour specific to how BSD has
> traditionally dealt with broadcast/multicast sockets. The pcbinfo
> structure can't really be disentangled from this.
>
> Of course, it doesn't help that we have historically required these
> sockets to be bound to INADDR_ANY. It might be useful to break reception
> out using a separate hash/tree, rather than walking all sockets as is
> currently done, but legacy usage needs to be supported.
>
> Interestingly enough, Microsoft has probably done something similar,
> judging from things which appear in MSDN.
>
> John Baldwin wrote:
> > I have a workload at work where a particular device driver can take a while to
> > update its MAC filter table when adding or removing multicast link-layer
> > addresses. One of the ways I've tackled fixing this is to change
> > inp_freemoptions() so that it does all of its actual work asynchronously in a
> > separate task. Currently it does its work synchronously; however, it can be
> > invoked while the associated protocol holds a write lock on its pcbinfo lock
> > (e.g. from in_pcbdetach() called from udp_detach()). This stalls all packet
> > reception for that protocol since received packets need a read lock on the
> > pcbinfo to look up the socket associated with a given (ip, port) tuple.
>
> There is often a delay between asking for the group and actually getting
> the hash filter entry set up in the MAC, so the operations are async.
>
> I can see many apps like to assume the operation is instantaneous rather
> than deferred; they are probably being naive...
>
> The same being true for taking down the hash filter entry is not surprising.

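The deferral described in the quoted text can be illustrated with a small user-space sketch. This is not the actual kernel change: `struct moptions`, `defer_freemoptions()`, `drain_deferred()` and the `filter_updates` counter are hypothetical stand-ins, and a real version would need its own lock around the list and something like a kernel task to run the drain. The point is only that the path running under the pcbinfo write lock queues the work, and the slow filter updates happen later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket's multicast options. */
struct moptions {
    int ngroups;               /* groups still joined */
    struct moptions *next;     /* link on the deferred-free list */
};

static struct moptions *deferred_head; /* pending frees (unlocked in this sketch) */
static int filter_updates;             /* counts slow "driver" filter updates */

/* Fast path: imagined to run with the pcbinfo write lock held; only queues. */
static void
defer_freemoptions(struct moptions *mo)
{
    mo->next = deferred_head;
    deferred_head = mo;
}

/* Slow path: imagined to run later from a task, with no pcbinfo lock held. */
static int
drain_deferred(void)
{
    int drained = 0;
    struct moptions *mo;

    while ((mo = deferred_head) != NULL) {
        deferred_head = mo->next;
        /* Each group leave stands in for one slow MAC filter update. */
        filter_updates += mo->ngroups;
        mo->ngroups = 0;
        drained++;
    }
    return (drained);
}
```

In this model, packet reception would only ever wait for the short list insertion, never for the filter updates themselves.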
The other fun part in this case is that if it is going to take a long time, a
driver should probably be enabling reception of all multicast (equivalent of
IFF_ALLMULTI) while it reprograms the table to avoid dropping packets for
already-joined groups.  I'm not currently doing this as we are using a
different hack, but I think that is something drivers should probably be doing.

-- 
John Baldwin
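
As a rough illustration of the all-multicast idea above, here is a user-space model of a filter being reprogrammed in two steps. Everything in it is hypothetical (`struct nic`, `nic_accepts()`, `nic_reprogram_begin()`, `nic_reprogram_end()`, and the use of plain integers as group ids); it is not taken from any real driver. It only shows that an already-joined group keeps being accepted while the table is empty, because the filter is opened first and narrowed again last.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FILTERS 8

/* Hypothetical model of a NIC's multicast filter state. */
struct nic {
    bool allmulti;                /* accept every multicast frame */
    unsigned table[MAX_FILTERS];  /* programmed group ids */
    int nentries;
};

/* Would the modelled hardware pass a frame for this group? */
static bool
nic_accepts(const struct nic *n, unsigned group)
{
    if (n->allmulti)
        return (true);
    for (int i = 0; i < n->nentries; i++)
        if (n->table[i] == group)
            return (true);
    return (false);
}

/* Step 1: open the filter, then wipe the table; the slow rewrite starts here. */
static void
nic_reprogram_begin(struct nic *n)
{
    n->allmulti = true;
    n->nentries = 0;
}

/* Step 2: load the new table, then narrow the filter again. */
static void
nic_reprogram_end(struct nic *n, const unsigned *groups, int count)
{
    assert(count >= 0 && count <= MAX_FILTERS);
    memcpy(n->table, groups, (size_t)count * sizeof(groups[0]));
    n->nentries = count;
    n->allmulti = false;
}
```

The cost of this ordering is that the host briefly receives multicast traffic it did not ask for, which the stack then has to discard in software.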