Date: Tue, 10 Jan 2012 08:19:18 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Cc: Mikolaj Golub <trociny@freebsd.org>, arch@freebsd.org, Robert Watson <rwatson@freebsd.org>, Kostik Belousov <kib@freebsd.org>
Subject: Re: unix domain sockets on nullfs(5)
Message-ID: <201201100819.18892.jhb@freebsd.org>
In-Reply-To: <86sjjobzmn.fsf@kopusha.home.net>
References: <86sjjobzmn.fsf@kopusha.home.net>
On Monday, January 09, 2012 11:37:52 am Mikolaj Golub wrote:
> Hi,
>
> There is a longstanding problem with nullfs(5): unix sockets do
> not work between the lower and upper layers.
>
> See, e.g., kern/51583, kern/159663.
>
> When a unix socket is bound, the created socket is referenced in the
> vnode field v_socket. This field is used on connect (from the vnode
> returned by lookup). Unix socket functions like unp_bind/connect
> set/access this field directly.
>
> This is an issue for nullfs, which uses a two-layer vnode approach:
> when binding to the upper layer, the socket reference is stored in the
> upper vnode; when binding to the lower fs, the socket reference is
> stored in the lower vnode and is not seen from the upper layer.
>
> E.g., having /mnt/upper nullfs mounted on /mnt/lower:
>
> 1) if we bind to /mnt/lower/test.sock we can connect only to
> /mnt/lower/test.sock.
>
> 2) if we bind to /mnt/upper/test.sock we can connect only to
> /mnt/upper/test.sock.
>
> The desired behavior is that one can connect via both the lower and
> the upper paths regardless of whether we bind to /mnt/lower/test.sock
> or /mnt/upper/test.sock.
>
> In kern/159663 two approaches were discussed:
>
> 1) copy the socket pointer from the lower vnode to the upper vnode on
> upper vnode get (this fixes the case when one binds to the lower fs and
> wants to connect via the upper, but not the case when one binds to the
> upper and wants to connect via the lower fs);
>
> 2) make null_lookup/create return the lower vnode for VSOCK vnodes.
>
> Both approaches have issues and look rather hackish.
>
> kib@ suggested that the issue could be fixed by adding new VOP_*
> operations for setting and accessing the vnode's v_socket field.
>
> The attached patch implements this. It can also be found here:
>
> http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch
>
> It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
> VOP_UNPDETACH.
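The per-layer failure described above can be illustrated with a minimal user-space sketch. It is not kernel code: struct model_vnode, model_bind(), model_connect() and model_reachable() are hypothetical stand-ins that keep only a v_socket pointer per vnode, assuming the old unp_bind()/unp_connect() simply write and read that field on whichever vnode the lookup returned.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not kernel code: a nullfs mount has
 * one vnode per layer, and each vnode carries its own v_socket.
 */
struct model_socket {
	int id;
};

struct model_vnode {
	struct model_socket *v_socket;	/* set on bind, read on connect */
	struct model_vnode *lower;	/* vnode below a nullfs vnode, else NULL;
					   the old code never looks at it */
};

/* Old unp_bind(): store the socket in whichever vnode lookup returned. */
static void
model_bind(struct model_vnode *vp, struct model_socket *so)
{
	assert(vp != NULL);
	vp->v_socket = so;
}

/* Old unp_connect(): read the socket from the looked-up vnode. */
static struct model_socket *
model_connect(const struct model_vnode *vp)
{
	return (vp->v_socket);
}

/* Returns 1 if a socket bound via `bound` is found when connecting via `via`. */
static int
model_reachable(struct model_vnode *bound, struct model_vnode *via)
{
	struct model_socket so = { 1 };
	int found;

	model_bind(bound, &so);
	found = (model_connect(via) == &so);
	bound->v_socket = NULL;		/* old unp_detach(): clear the field */
	return (found);
}
```

With a lower vnode `lower` and a nullfs vnode `upper` stacked on it, model_reachable(&lower, &lower) and model_reachable(&upper, &upper) succeed, while model_reachable(&lower, &upper) and model_reachable(&upper, &lower) fail, matching cases 1) and 2) above.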
> Their purpose can be understood from the modifications
> in uipc_usrreq.c:
>
> - vp->v_socket = unp->unp_socket;
> + VOP_UNPBIND(vp, unp->unp_socket);
>
> - so2 = vp->v_socket;
> + VOP_UNPCONNECT(vp, &so2);
>
> - unp->unp_vnode->v_socket = NULL;
> + VOP_UNPDETACH(unp->unp_vnode);
>
> The default functions just do these simple operations, while
> filesystems like nullfs can do more complicated things.
>
> The patch also implements these functions for nullfs. By default the
> old behavior is preserved. To get the new behavior the filesystem
> should be (re)mounted with the sobypass option. Then the socket
> operations are bypassed to the lower vnode, which makes the socket
> accessible from both layers.
>
> I am very interested to hear other people's opinions on this.

I think this is a decent solution.

Why not make the locking notes for VOP_UNPCONNECT() be "L" instead of "E"?
A read lock should be sufficient to fetch the socket? In fact, I suspect
that unp_connect() could actually use a shared lock on the vnode by adding
'LOCKSHARED' to the flags passed to namei() via NDINIT().

-- 
John Baldwin
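The proposed indirection can be sketched in the same style. This is again a hypothetical user-space model, not the patch itself: struct mvop_vector stands in for a filesystem's vnode operations vector, the std_* functions model the default implementations (the plain field accesses from the diff), and the null_* functions model a nullfs that forwards to the lower vnode when mounted with sobypass. All names other than v_socket and sobypass are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the proposed VOP_UNP* indirection. */
struct msocket { int id; };
struct mvnode;

/* Stand-in for a filesystem's vnode operations vector. */
struct mvop_vector {
	void (*unpbind)(struct mvnode *vp, struct msocket *so);
	void (*unpconnect)(struct mvnode *vp, struct msocket **sop);
	void (*unpdetach)(struct mvnode *vp);
};

struct mvnode {
	const struct mvop_vector *ops;
	struct msocket *v_socket;
	struct mvnode *lower;	/* vnode below a nullfs vnode, else NULL */
	bool sobypass;		/* models the proposed sobypass mount option */
};

/* Default operations: the plain v_socket accesses from the diff. */
static void std_unpbind(struct mvnode *vp, struct msocket *so) { vp->v_socket = so; }
/* Read-only access: the operation a shared vnode lock could cover. */
static void std_unpconnect(struct mvnode *vp, struct msocket **sop) { *sop = vp->v_socket; }
static void std_unpdetach(struct mvnode *vp) { vp->v_socket = NULL; }

static const struct mvop_vector std_ops = {
	.unpbind = std_unpbind,
	.unpconnect = std_unpconnect,
	.unpdetach = std_unpdetach,
};

/* nullfs-style operations: forward to the lower vnode only with sobypass. */
static void
null_unpbind(struct mvnode *vp, struct msocket *so)
{
	if (vp->sobypass)
		vp->lower->ops->unpbind(vp->lower, so);
	else
		std_unpbind(vp, so);	/* old behavior: this layer only */
}

static void
null_unpconnect(struct mvnode *vp, struct msocket **sop)
{
	if (vp->sobypass)
		vp->lower->ops->unpconnect(vp->lower, sop);
	else
		std_unpconnect(vp, sop);
}

static void
null_unpdetach(struct mvnode *vp)
{
	if (vp->sobypass)
		vp->lower->ops->unpdetach(vp->lower);
	else
		std_unpdetach(vp);
}

static const struct mvop_vector null_ops = {
	.unpbind = null_unpbind,
	.unpconnect = null_unpconnect,
	.unpdetach = null_unpdetach,
};

/* Bind through `bound`, connect through `via`, then detach `bound`. */
static bool
mreachable(struct mvnode *bound, struct mvnode *via)
{
	struct msocket so = { 1 };
	struct msocket *found = NULL;

	assert(bound != NULL && via != NULL);
	bound->ops->unpbind(bound, &so);	/* models unp_bind() */
	via->ops->unpconnect(via, &found);	/* models unp_connect() */
	bound->ops->unpdetach(bound);		/* models unp_detach() */
	return (found == &so);
}
```

With sobypass cleared, the model keeps the old per-layer behavior; with it set, a socket bound through either path is found through the other, because both layers read and write the single v_socket of the lower vnode. Since unpconnect only reads that pointer, it is the operation for which a shared vnode lock, as suggested above, would appear to be enough.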