Date: Fri, 25 Apr 2008 15:19:16 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Daichi GOTO <daichi@freebsd.org>
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Masanori OZAWA <ozawa@ongs.co.jp>, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/fs/unionfs union_subr.c
Message-ID: <20080425150600.V97018@fledge.watson.org>
In-Reply-To: <4811DE6F.9040604@freebsd.org>
References: <200804250953.m3P9rrpd011741@repoman.freebsd.org> <20080425131229.C80552@fledge.watson.org> <4811DE6F.9040604@freebsd.org>

On Fri, 25 Apr 2008, Daichi GOTO wrote:

>> Per my earlier e-mail, and assuming I understand correctly, I feel not only
>> will this lead to new panics (due to dangling socket pointers and
>> incomplete garbage collection),
>
> We, I and ozawa-san, cannot image the case that leads new panic. Would you
> tell us how to get the panic if you can get it? Is it rare case to get or
> not?

The explanation is somewhat complicated, so I apologize if I'm unclear in
explaining it.

The UNIX domain (local) socket subsystem provides an IPC service based on
sockets, but uses the file system as a name space so that processes can
rendezvous with one another. A server process, such as syslogd, will call
bind(2) to associate a socket with a path, such as /var/run/log or
/var/run/logpriv. In the file system, the way this works is that a vnode of
type VSOCK is hooked up to the namespace, its v_socket pointer is initialized
to point at the socket structure, and presumably the file system does some
underlying storage magic to put it on disk (such as creating an inode). The
socket also maintains a back pointer to the vnode, unp_vnode, which is used
when the socket is closed, and that is where things get tricky.

Consider the implementation of UNIX domain socket close. When the socket is
closed, the protocol state is detached by uipc_detach, which clears the
pointer from the vnode to the socket and vice versa:

        if ((vp = unp->unp_vnode) != NULL) {
                unp->unp_vnode->v_socket = NULL;
                unp->unp_vnode = NULL;
        }

Once uipc_detach has returned, the unpcb structure (pointed to by unp in the
above code) is no longer valid, and shortly thereafter the socket pointer is
also invalid, with both pointing to freed memory. The UNIX domain socket code
is very careful to remove the reference from the vnode so that new threads
won't dereference the pointer improperly.
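
To make the two-way linkage concrete, here is a small userspace model of it.
This is only a sketch, not kernel code: the model_* names are made up for
illustration, and the model collapses the socket and the unpcb into a single
structure, whereas in the kernel v_socket points at the socket and the unpcb
hangs off that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; only the link fields are modelled. */
struct model_unpcb;

struct model_vnode {
	struct model_unpcb *v_socket;	/* vnode -> socket state */
};

struct model_unpcb {
	struct model_vnode *unp_vnode;	/* socket state -> vnode (back pointer) */
};

/* Model of bind(2): allocate socket state and link it to the vnode both ways. */
static struct model_unpcb *
model_bind(struct model_vnode *vp)
{
	struct model_unpcb *unp = calloc(1, sizeof(*unp));

	if (unp == NULL)
		return (NULL);
	vp->v_socket = unp;
	unp->unp_vnode = vp;
	return (unp);
}

/* Model of detach on close: clear both links, then free the socket state. */
static void
model_detach(struct model_unpcb *unp)
{
	struct model_vnode *vp;

	if ((vp = unp->unp_vnode) != NULL) {
		assert(vp->v_socket == unp);	/* links must agree */
		vp->v_socket = NULL;
		unp->unp_vnode = NULL;
	}
	free(unp);
}
```

The point of the model is that detach can only clear the one v_socket it can
reach through unp_vnode. Any other copy of that pointer, held somewhere the
socket does not know about, is left untouched when the memory is freed.
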

However, notice that in the uipc_detach code above, only the "bottom" layer
v_socket pointer is cleared, not those of higher layers, which means that the
higher layers will now point at freed memory, which may lead to panics. I
haven't tried this, but I suspect you will be able to reproduce the panic if
you:

  1. start syslogd against a base file system,
  2. union mount it to a new location,
  3. run the "syslog" command relative to the new file system mount,
  4. kill the base syslogd, then
  5. run the "syslog" command a second time.

On the first occasion, it will work, since the v_socket pointer in the top
layer points at the socket referenced by the bottom layer. However, when you
kill syslogd, it closes the socket, which frees the socket structure pointed
to by v_socket, and the pointer is cleared in the bottom layer but not in the
top layer. The next run of syslog will follow the stale v_socket pointer.

Does this make sense?

>> but it will also lead to possibly incorrect semantics for unionfs(upper
>> layers can write to objects readable via the lower layer).
>
>> Some parts of this patch are fine, but the copying of v_socket pointers
>> between layers is not correct. Please consider backing that part of the
>> change out.
>
> Yes, we have noticed above. But.... at least, our patch solves problem of
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/118346 we believe. If I roll
> back that, that problem is still there.

I would assert that an error is better than a panic. :-)

> Uhmmm... is it better to get back it? And if you have some ideas to solve
> this issue, please tell us :) Thanks

I'm not 100% sure what the right solution is, but one approach might be to
have the vnodes at the different layers simply refer to different sockets.
Applications should unlink the old socket in the top layer when they discover
a stale socket there, and then create a new socket (masking the bottom layer
socket), which should just work. Have you tried unlinking the top layer
socket and testing whether that works?
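
For reference, the application-side approach (unlink the stale socket, then
bind a fresh one) might look roughly like the following. This is an untested
sketch under several assumptions: bind_replacing_stale is a made-up helper
name, it uses SOCK_STREAM for simplicity (syslogd itself uses datagram
sockets), and it assumes that connecting to a socket file with no listener
behind it fails with ECONNREFUSED. Whether unionfs reports a stale top-layer
socket that way is exactly what would need testing.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a listening UNIX domain socket at `path`,
 * unlinking a stale socket file first if one is found there.
 * Returns the listening descriptor, or -1 with errno set.
 */
static int
bind_replacing_stale(const char *path)
{
	struct sockaddr_un addr;
	int probe, s, saved;

	assert(path != NULL);
	if (strlen(path) >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return (-1);
	}
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);	/* length checked above */

	/* Probe the path: if a server answers, the socket is live. */
	probe = socket(AF_UNIX, SOCK_STREAM, 0);
	if (probe == -1)
		return (-1);
	if (connect(probe, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		close(probe);
		errno = EADDRINUSE;
		return (-1);
	}
	saved = errno;
	close(probe);

	if (saved == ECONNREFUSED) {
		/* Socket file exists but nobody is behind it: remove it. */
		(void)unlink(path);
	} else if (saved != ENOENT) {
		errno = saved;
		return (-1);
	}

	/* Create and bind the replacement socket at the same path. */
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
	    listen(s, 5) == -1) {
		saved = errno;
		close(s);
		errno = saved;
		return (-1);
	}
	return (s);
}
```

On a union mount, the unlink and bind would act on the top layer, so the new
socket masks the bottom layer one and each layer's vnode refers to its own
socket.
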

Robert N M Watson
Computer Laboratory
University of Cambridge