From owner-freebsd-fs@FreeBSD.ORG Wed Mar 26 14:53:26 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3E5D106566B; Wed, 26 Mar 2008 14:53:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 649518FC1F; Wed, 26 Mar 2008 14:53:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 2808C46B04; Wed, 26 Mar 2008 10:53:25 -0400 (EDT) Date: Wed, 26 Mar 2008 14:53:25 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: "Alexander V. Chernikov" In-Reply-To: <47E9448F.1010304@ipfw.ru> Message-ID: <20080326142115.K34007@fledge.watson.org> References: <47E9448F.1010304@ipfw.ru> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Mar 2008 14:53:26 -0000 On Tue, 25 Mar 2008, Alexander V. Chernikov wrote: > I have made patches solving first 4 problems These patches are available at > http://ipfw.ru/patches/ unionfs2.diff fixes fs mounting onto upper layer, > unionfs_lmount.diff fixes lower unionfs_threads.diff and unionfs_unix.diff > fixes cases 2) and 3) unionfs_rename.diff fixes case with renaming > > Can anybody comment/review ? Dear Alexander, Unfortunately, I don't know too much about unionfs. However, I can comment on the UNIX domain socket patch: > --- sys/fs/unionfs/union_subr.c.orig 2008-03-13 23:10:32.000000000 +0300 > +++ sys/fs/unionfs/union_subr.c 2008-03-13 23:17:34.000000000 +0300 > @@ -160,6 +160,8 @@ > unp->un_path[cnp->cn_namelen] = '\0'; > } > vp->v_type = (uppervp != NULLVP ? uppervp->v_type : lowervp->v_type); > + if (vp->v_type == VSOCK) > + vp->v_socket = (uppervp != NULLVP) ? uppervp->v_socket : lowervp->v_socket; > if ((lowervp != NULLVP) && (lowervp->v_type == VDIR)) > vp->v_mountedhere = lowervp->v_mountedhere; > vp->v_data = unp; I'm a bit worried about this assignment, as it represents an untracked alias for the socket. Let me explain why: UNIX domain sockets may have file system bindings, allowing them to use the file system namespace as a rendezvous for communication. Typical use is that a socket is created, bind() is called on it with a path in some location like /var/run/log. Other processes turn up and connect() to the path, causing a file system lookup to reach the vnode of the socket, and then the socket code follows vp->v_socket to find the socket to connect to. When a bound socket is closed, we follow a back-pointer from the UNIX domain socket to the vnode, and then clear the pointer. Doing this in a race-free manner is somewhat tricky, and I'm not 100% convinced it's correct currently, although it appears to be somewhat close to right. The upshot of all this is that if you copy the pointer value to other vnodes, such as vnodes on upper layer, the UNIX domain socket code won't clear those pointers before freeing the socket they point at. This means that the above code snippet may lead to a v_socket pointer on a higher layer vnode pointing at the right socket, the wrong socket, or possibly some other bit of freed and maybe reused memory. You can imagine a number of schemes to replicate pointer changes around or track the various outstanding references, but I think a more fundamental question is whether this is in fact the right behavior at all. The premise of is that writes flow up, but not down, and "connections" to sockets are read-write events, not read events, most typically. If you're using unionfs to take a template system and "broadcast it" to many jails, you probably don't want all the jails talking to the same syslogd, you want them each talking to their own. When syslogd in a jail finds a disconnected socket, which is effectively what a NULL v_socket pointer means, in /var/run/log, it should be unlinking it and creating a new socket, not reusing the existing file on disk. Robert N M Watson Computer Laboratory University of Cambridge