Date: Wed, 26 Mar 2008 14:53:25 +0000 (GMT) From: Robert Watson <rwatson@FreeBSD.org> To: "Alexander V. Chernikov" <melifaro@ipfw.ru> Cc: freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org Subject: Re: unionfs status Message-ID: <20080326142115.K34007@fledge.watson.org> In-Reply-To: <47E9448F.1010304@ipfw.ru> References: <47E9448F.1010304@ipfw.ru>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 25 Mar 2008, Alexander V. Chernikov wrote: > I have made patches solving first 4 problems These patches are available at > http://ipfw.ru/patches/ unionfs2.diff fixes fs mounting onto upper layer, > unionfs_lmount.diff fixes lower unionfs_threads.diff and unionfs_unix.diff > fixes cases 2) and 3) unionfs_rename.diff fixes case with renaming > > Can anybody comment/review ? Dear Alexander, Unfortunately, I don't know too much about unionfs. However, I can comment on the UNIX domain socket patch: > --- sys/fs/unionfs/union_subr.c.orig 2008-03-13 23:10:32.000000000 +0300 > +++ sys/fs/unionfs/union_subr.c 2008-03-13 23:17:34.000000000 +0300 > @@ -160,6 +160,8 @@ > unp->un_path[cnp->cn_namelen] = '\0'; > } > vp->v_type = (uppervp != NULLVP ? uppervp->v_type : lowervp->v_type); > + if (vp->v_type == VSOCK) > + vp->v_socket = (uppervp != NULLVP) ? uppervp->v_socket : lowervp->v_socket; > if ((lowervp != NULLVP) && (lowervp->v_type == VDIR)) > vp->v_mountedhere = lowervp->v_mountedhere; > vp->v_data = unp; I'm a bit worried about this assignment, as it represents an untracked alias for the socket. Let me explain why: UNIX domain sockets may have file system bindings, allowing them to use the file system namespace as a rendezvous for communication. Typical use is that a socket is created, bind() is called on it with a path in some location like /var/run/log. Other processes turn up and connect() to the path, causing a file system lookup to reach the vnode of the socket, and then the socket code follows vp->v_socket to find the socket to connect to. When a bound socket is closed, we follow a back-pointer from the UNIX domain socket to the vnode, and then clear the pointer. Doing this in a race-free manner is somewhat tricky, and I'm not 100% convinced it's correct currently, although it appears to be somewhat close to right. The upshot of all this is that if you copy the pointer value to other vnodes, such as vnodes on upper layer, the UNIX domain socket code won't clear those pointers before freeing the socket they point at. This means that the above code snippet may lead to a v_socket pointer on a higher layer vnode pointing at the right socket, the wrong socket, or possibly some other bit of freed and maybe reused memory. You can imagine a number of schemes to replicate pointer changes around or track the various outstanding references, but I think a more fundamental question is whether this is in fact the right behavior at all. The premise of is that writes flow up, but not down, and "connections" to sockets are read-write events, not read events, most typically. If you're using unionfs to take a template system and "broadcast it" to many jails, you probably don't want all the jails talking to the same syslogd, you want them each talking to their own. When syslogd in a jail finds a disconnected socket, which is effectively what a NULL v_socket pointer means, in /var/run/log, it should be unlinking it and creating a new socket, not reusing the existing file on disk. Robert N M Watson Computer Laboratory University of Cambridge _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080326142115.K34007>