Date: Sat, 15 Oct 2011 21:41:07 +0300
From: Mikolaj Golub <trociny@freebsd.org>
To: Robert Millan <rmh@freebsd.org>
Cc: Josef Karthauser <joe@freebsd.org>, freebsd-bugs@freebsd.org, Adrian Chadd <adrian@freebsd.org>, freebsd-fs@freebsd.org, Robert Watson <rwatson@freebsd.org>, Kostik Belousov <kostikbel@gmail.com>
Subject: Re: kern/159663: sockets don't work though nullfs mounts
Message-ID: <86obxim724.fsf@kopusha.home.net>
References: <201108102152.p7ALqUl4075207@red.freebsd.org> <201108102200.p7AM0Nu9026320@freefall.freebsd.org> <CAOfDtXMa6r%2BK5ZmTfuKV5qXNOoqS7kJvRhy4W%2B0jwBhFqfk1PQ@mail.gmail.com> <CAOfDtXM45OT-aZ71-=JE7ZaG4%2B4Db1y4poO9L%2BePZW2%2BAMFXXg@mail.gmail.com> <CAOfDtXMtqd8WonbdwBWL1vaFNte47G-Qo4JAskgM0Y99Ru6U2g@mail.gmail.com> <86k48wz3mc.fsf@kopusha.home.net>
On Mon, 26 Sep 2011 00:58:03 +0300 Mikolaj Golub wrote to Robert Millan:

 MG> Hi,

 MG> On Sun, 25 Sep 2011 17:32:27 +0200 Robert Millan wrote:

 RM>> 2011/9/24 Robert Millan <rmh@freebsd.org>:
 >>> I found a thread from 2007 with further discussion about this problem:
 >>>
 >>> http://lists.freebsd.org/pipermail/freebsd-fs/2007-February/002669.html

 RM>> Hi,

 RM>> I've looked at the situation in a bit more detail, for now only with
 RM>> sockets in mind (not named pipes). My understanding is (please
 RM>> correct me if I'm wrong):

 RM>> - nullfs holds reference counts for each vnode, but sockets have their
 RM>> own mechanism for reference counting (so_count / soref / sorele).
 RM>> vnode reference counting doesn't protect against socket being closed,
 RM>> which would leave a stale pointer in the upper nullfs layer.

 RM>> - Increasing the reference count of the socket itself can't be done in
 RM>> null_nodeget() because this function is merely a getter whose call
 RM>> doesn't indicate any meaningful event.

 RM>> - It's not clear to me that there's any event in time where the socket
 RM>> reference can be increased. If mounting a nullfs were that event,
 RM>> then all existing sockets would be soref'ed but we wouldn't be
 RM>> soref'ing future sockets created in the lower layer after the mount.
 RM>> This doesn't seem correct.

 RM>> - Possible solution: null_nodeget() semantics are replaced with
 RM>> something that actually allows vnodes in the upper layer to be created
 RM>> and destroyed.

 RM>> - Possible solution: upper layer has a memory structure to keep track
 RM>> of which sockets in the lower layer have been soref'ed.

 MG> It looks like there is no need to set vp->v_un = lowervp->v_un for
 MG> VFIFO. FIFOs work without this modification: vnode operations are
 MG> bypassed to the lower node and lowervp->v_un is used.

 MG> The issue is only with local sockets, because when bind or connect is
 MG> called for a nullfs file the upper v_un is used.
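For anyone who wants to check the bind/connect behavior described above from userland, here is a minimal sketch. It is not part of the patch. The helper name bind_then_connect() and the example paths are made up for illustration. It binds and listens on a local socket at one path, then tries to connect through a second path that is supposed to name the same file.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for `path`; returns 0 or ENAMETOOLONG. */
static int fill_addr(struct sockaddr_un *sa, const char *path)
{
    if (strlen(path) >= sizeof(sa->sun_path))
        return ENAMETOOLONG;
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;
    strcpy(sa->sun_path, path);
    return 0;
}

/*
 * Hypothetical helper: bind + listen at bind_path, then connect via
 * connect_path. Returns 0 on success, otherwise the errno of the step
 * that failed. The socket file at bind_path is removed before returning.
 */
int bind_then_connect(const char *bind_path, const char *connect_path)
{
    struct sockaddr_un sa;
    int srv, cli = -1, err;

    assert(bind_path != NULL && connect_path != NULL);

    if ((err = fill_addr(&sa, bind_path)) != 0)
        return err;
    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0)
        return errno;
    unlink(bind_path);              /* remove a leftover socket file, if any */
    if (bind(srv, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(srv, 1) < 0) {
        err = errno;
        goto out;
    }

    if ((err = fill_addr(&sa, connect_path)) != 0)
        goto out;
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cli < 0) {
        err = errno;
        goto out;
    }
    if (connect(cli, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        err = errno;                /* e.g. ECONNREFUSED or ENOENT */
out:
    if (cli >= 0)
        close(cli);
    close(srv);
    unlink(bind_path);
    return err;
}
```

Assuming /lower/dir is nullfs-mounted on /upper/dir (example paths only), bind_then_connect("/lower/dir/s", "/upper/dir/s") exercises the "bind below, connect above" case, and swapping the arguments exercises the opposite one. With the same path for both arguments it should return 0 on any file system.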
 MG> For me the approach "vp->v_un = lowervp->v_un" has many complications.
 MG> Maybe it is much easier to always use only the lower vnode? What we
 MG> need for this is to make bind and connect get the lower vnode when they
 MG> are called on a nullfs file.

Thinking more about the "vp->v_un = lowervp->v_un" approach, it looks to me
that there should not be any coherency issues with the contents of v_un
between the two file system layers (the main worry about this approach in
the thread mentioned above).

Consider a scenario of binding to the lower fs vnode and then connecting to
the upper fs path. On connect, lookup returns a nullfs node with:

  lvnp->v_un = bind_socket
  uvnp->v_un = bind_socket

uvnp is locked (usecount is 1). bind_socket is used to establish the
connection. After the connection is established uvnp is released by vput(),
usecount is 0, so the nullfs vnode is deactivated and destroyed. Thus
uvnp->v_un has a short lifetime and it looks like it can't become stale
during this time.

When we bind to the upper fs vnode, in bind VOP_CREATE will return a nullfs
node with:

  lvnp->v_un = NULL
  uvnp->v_un = NULL

bind sets uvnp->v_un, lvnp->v_un remains NULL. The nullfs node remains
active until the bound socket is closed, so on connect uvnp->v_un of this
node is used. A connection to the lower fs path will return ECONNREFUSED.

Thus I don't see a scenario where uvnp->v_un would be stale. I did some
crash testing and did not manage to panic the system.

But the issue is that if we bind to an upper fs path, we can't connect to
the lower fs path. This behavior contradicts the overall nullfs behavior
(all changes done on the upper layer are seen from the lower layer) and is
more unionfs-like.

That is why my proposal (return the lower vnode instead of the upper vnode
in null_lookup and null_create if the vnode type is VSOCK) looks more
interesting to me. But as I wrote, it also has an issue: you can bind using
the upper fs path and then unmount nullfs without force while the socket is
still bound.
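The two scenarios above can be modelled in userland with a toy sketch. Everything in it (toy_vnode, toy_null_nodeget, toy_bind, toy_connect) is made up for illustration. These are not the kernel structures or functions, and only the copy of v_un at node creation is modelled, with no locking or reference counting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins, NOT kernel types: only the field discussed here. */
struct toy_socket { int id; };
struct toy_vnode  { struct toy_socket *v_un; };

/* Models creating the nullfs (upper) node with "vp->v_un = lowervp->v_un". */
static struct toy_vnode toy_null_nodeget(const struct toy_vnode *lowervp)
{
    struct toy_vnode vp;

    assert(lowervp != NULL);
    vp.v_un = lowervp->v_un;
    return vp;
}

/* Models bind(): the socket is attached only to the vnode bind was called on. */
static void toy_bind(struct toy_vnode *vp, struct toy_socket *so)
{
    vp->v_un = so;
}

/* Models connect(): succeeds only if this vnode has a socket attached. */
static int toy_connect(const struct toy_vnode *vp)
{
    return vp->v_un != NULL ? 0 : ECONNREFUSED;
}

/* Bind on the lower vnode, then look up the upper node and connect to it. */
int scenario_bind_lower_connect_upper(void)
{
    struct toy_socket bind_socket = { 1 };
    struct toy_vnode lvn = { NULL };
    struct toy_vnode uvn;

    toy_bind(&lvn, &bind_socket);
    uvn = toy_null_nodeget(&lvn);   /* uvn.v_un == lvn.v_un == &bind_socket */
    return toy_connect(&uvn);
}

/* Bind on the upper vnode, then connect through the lower vnode. */
int scenario_bind_upper_connect_lower(void)
{
    struct toy_socket bind_socket = { 2 };
    struct toy_vnode lvn = { NULL };
    struct toy_vnode uvn = toy_null_nodeget(&lvn);  /* both v_un are NULL */

    toy_bind(&uvn, &bind_socket);   /* only uvn.v_un is set */
    assert(toy_connect(&uvn) == 0); /* same upper node still works */
    return toy_connect(&lvn);       /* lvn.v_un was never set */
}
```

In this model the first scenario returns 0 and the second returns ECONNREFUSED, which is the asymmetry described above: a socket bound through the upper path is not visible through the lower one.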
The updated patch can be found here:

http://people.freebsd.org/~trociny/nullfs.VSOCK.patch

Anyway, to me either of these solutions, although not ideal, looks better
than having nothing at all, perhaps with the behavior just documented in the
BUGS section.

 MG> As a proof of concept below is a patch that implements it. Currently I
 MG> am not sure that the vrele/vref magic is done properly, but it looks
 MG> like it works for me.

 MG> The issues with this approach I see so far:

 MG> - we need an additional flag for namei;
 MG> - nullfs can be unmounted with a socket file still being opened.

 MG> --
 MG> Mikolaj Golub

-- 
Mikolaj Golub