FreeBSD Mail Archives

Date:      Fri, 25 Apr 2008 15:19:16 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Daichi GOTO <daichi@freebsd.org>
Cc:        cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Masanori OZAWA <ozawa@ongs.co.jp>, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/fs/unionfs union_subr.c
Message-ID:  <20080425150600.V97018@fledge.watson.org>
In-Reply-To: <4811DE6F.9040604@freebsd.org>
References:  <200804250953.m3P9rrpd011741@repoman.freebsd.org> <20080425131229.C80552@fledge.watson.org> <4811DE6F.9040604@freebsd.org>

On Fri, 25 Apr 2008, Daichi GOTO wrote:

>> Per my earlier e-mail, and assuming I understand correctly, I feel not only 
>> will this lead to new panics (due to dangling socket pointers and 
>> incomplete garbage collection),
>
> We, I and ozawa-san, cannot imagine the case that leads to a new panic. Would
> you tell us how to get the panic if you can get it? Is it a rare case or
> not?

The explanation is somewhat complicated, so I apologize if I'm unclear in 
explaining it.

The UNIX domain (local) socket subsystem provides an IPC service based on 
sockets, but using the file system as a name space so that processes can 
rendezvous with one another.  A server process, such as syslogd, will call 
bind(2) to associate a socket with a path, such as /var/run/log or 
/var/run/logpriv.  In the file system, the way this works is that a vnode of 
type VSOCK is hooked up to the namespace, its v_socket pointer is initialized 
to point at the socket structure, and presumably the file system does some 
underlying storage magic to put it on disk (such as creating an inode).  The 
socket also maintains a back pointer to the vnode, unp_vnode, which is used 
when the socket is closed, and that is where things get tricky.

Consider the implementation of UNIX domain socket close -- when the socket is 
closed, the protocol state is detached by uipc_detach, which clears the 
pointer from the vnode to the socket and vice versa:

         if ((vp = unp->unp_vnode) != NULL) {
                 unp->unp_vnode->v_socket = NULL;
                 unp->unp_vnode = NULL;
         }

Once uipc_detach has returned, the unpcb structure (pointed to by unp in the 
above code) is no longer valid, and shortly thereafter, the socket pointer is 
also invalid, both pointing to freed memory.  The UNIX domain socket code is 
very careful to remove the reference from the vnode so that new threads won't 
dereference the pointer improperly.  However, notice that in the above code, 
only the "bottom" layer v_socket pointer is cleared, not higher layers, which 
means that those higher layers will now point at freed memory, which may lead 
to panics.
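
To make the stale pointer concrete, here is a small user-space model of the 
situation.  It is only a sketch: the model_* structures and functions are 
hypothetical stand-ins, not the kernel's, and the copy of v_socket into the 
upper layer is modelled on my understanding of the change under discussion. 
The model sets a "freed" flag instead of really freeing memory, so that the 
stale state can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; these are NOT the kernel structures. */
struct model_socket {
	bool freed;		/* set instead of calling free() */
};

struct model_vnode {
	struct model_socket *v_socket;
};

struct model_unpcb {
	struct model_vnode *unp_vnode;	/* knows the lower-layer vnode only */
	struct model_socket *unp_socket;
};

/* Assumed unionfs behaviour: the upper vnode copies the lower v_socket. */
void
model_union_copy(struct model_vnode *upper, const struct model_vnode *lower)
{
	upper->v_socket = lower->v_socket;
}

/* Mirrors the detach excerpt: only the vnode known to the unpcb is cleared. */
void
model_detach(struct model_unpcb *unp)
{
	assert(unp != NULL && unp->unp_socket != NULL);
	if (unp->unp_vnode != NULL) {
		unp->unp_vnode->v_socket = NULL;
		unp->unp_vnode = NULL;
	}
	unp->unp_socket->freed = true;	/* stands in for freeing the socket */
}

/* True if following vp->v_socket would reach a torn-down socket. */
bool
model_is_stale(const struct model_vnode *vp)
{
	return (vp->v_socket != NULL && vp->v_socket->freed);
}
```

After model_union_copy() followed by model_detach(), the lower vnode has a 
NULL v_socket, while model_is_stale() reports true for the upper vnode.  In 
the kernel, that upper pointer would refer to freed memory.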

I haven't tried this, but I suspect you will be able to reproduce the panic if 
you: start syslogd against a base file system, union mount it to a new 
location, run the "syslog" command relative to the new file system mount, kill 
the base syslogd, then run the "syslog" command a second time.  On the first 
occasion, it will work, since the v_socket pointer in the top layer points at 
the socket referenced by the bottom layer.  However, when you kill syslogd, it 
closes the socket, which frees the socket structure pointed to by v_socket in 
the bottom layer, but not in the top layer.  The next run of syslog will 
follow the stale v_socket pointer.
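
For the "run the command a second time" step, a small probe along the 
following lines could stand in for the client.  This is a hypothetical 
helper, not an existing tool: probe_local_socket and its arguments are names 
made up for this sketch.  It does not trigger the panic itself; it only 
issues the connect(2) that would exercise v_socket.  Without the stale 
pointer, I would expect the second probe to fail with an error such as 
ECONNREFUSED rather than crash.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical probe: try to connect(2) a socket of the given type
 * (SOCK_DGRAM or SOCK_STREAM) to the local socket at `path`.
 * Returns 0 on success, otherwise the errno value of the failing call.
 */
int
probe_local_socket(const char *path, int type)
{
	struct sockaddr_un sun;
	int error, fd;

	assert(path != NULL);
	if (strlen(path) >= sizeof(sun.sun_path))
		return (ENAMETOOLONG);
	fd = socket(AF_UNIX, type, 0);
	if (fd < 0)
		return (errno);
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	strcpy(sun.sun_path, path);	/* length checked above */
	error = 0;
	if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) < 0)
		error = errno;
	close(fd);
	return (error);
}
```

Running the probe against the path under the union mount before and after 
killing the server would correspond to the first and second runs described 
above.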

Does this make sense?

>> but it will also lead to possibly incorrect semantics for unionfs (upper 
>> layers can write to objects readable via the lower layer).
>
>> Some parts of this patch are fine, but the copying of v_socket pointers 
>> between layers is not correct.  Please consider backing that part of the 
>> change out.
>
> Yes, we have noticed above. But.... at least, our patch solves problem of 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/118346 we believe. If I roll 
> back that, that problem is still there.

I would assert that an error is better than a panic. :-)

> Uhmmm... is it better to get back it?  And if you have some ideas to solve 
> this issue, please tell us :)  Thanks

I'm not 100% sure what the right solution is, but one approach might be to 
have the vnodes at the different layers simply refer to different sockets. 
Applications should unlink the old socket in the top layer when they discover 
a stale socket there, and then create a new socket (masking the bottom layer 
socket), which should just work.  Have you tried unlinking the top layer 
socket and testing whether that works?
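
From the application side, the unlink-and-recreate approach might look 
roughly like the sketch below.  The function name bind_replacing_stale is 
hypothetical, the sketch assumes datagram sockets, and it treats a refused 
connection as the sign of a stale node.  There is an unavoidable window 
between the probe and the unlink, so this is an illustration rather than a 
robust recipe.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a datagram socket to `path`, first removing a
 * stale socket node if one is found.  Returns the descriptor, or -1 with
 * errno set (EADDRINUSE if a live server still answers at `path`).
 */
int
bind_replacing_stale(const char *path)
{
	struct sockaddr_un sun;
	int fd, probe, saved;

	assert(path != NULL);
	if (strlen(path) >= sizeof(sun.sun_path)) {
		errno = ENAMETOOLONG;
		return (-1);
	}
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	strcpy(sun.sun_path, path);	/* length checked above */

	/* Probe: a refused connection means the node exists but is unowned. */
	probe = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (probe < 0)
		return (-1);
	if (connect(probe, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
		close(probe);
		errno = EADDRINUSE;
		return (-1);
	}
	saved = errno;
	close(probe);
	if (saved == ECONNREFUSED) {
		if (unlink(path) < 0)	/* drop the stale node */
			return (-1);
	} else if (saved != ENOENT) {
		errno = saved;
		return (-1);
	}

	/* Create the replacement socket at the same name. */
	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return (-1);
	if (bind(fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return (-1);
	}
	return (fd);
}
```

On a union mount, the intent is that the unlink happens in the top layer and 
the new node masks the bottom-layer socket, so the two layers end up 
referring to different sockets.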

Robert N M Watson
Computer Laboratory
University of Cambridge
