Date: Fri, 6 Feb 1998 19:26:08 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: kato@migmatite.eps.nagoya-u.ac.jp (KATO Takenori)
Cc: current@FreeBSD.ORG
Subject: Re: unionfs clobbers a file
Message-ID: <199802061926.MAA15103@usr01.primenet.com>
In-Reply-To: <19980206210958N.kato@gneiss.eps.nagoya-u.ac.jp> from "KATO Takenori" at Feb 6, 98 09:09:58 pm
> The current major problem of unionfs is:
> Writing a file via unionfs sometimes clobbers the file.
>
> When a new file is created and modified on unionfs, part of the file
> is filled with zeros. The size of the zero-filled part is always a
> multiple of 4096 bytes. An easy way to reproduce the problem is:
>
> # mount -t union /foo /usr/obj
> # cd /usr/src
> # make world
>
> When you get signal 11 or another error, please look at
> /usr/obj/usr/src/tmp/usr/bin/make and
> /usr/obj/usr/src/usr.bin/make/.depend. One of them contains a
> zero-filled region.
>
> Do you have any idea how to solve it?

4096 bytes is a page.

Pages are hung off the vnode when the vnode pager is using the file for backing store.

Aliases are created because of the way vnodes stack, combined with three gaps:
- there is no general mechanism for obtaining the backing vnode for a given vnode at the top of a stack;
- VOP_GETPAGES and VOP_PUTPAGES are not generally supported in all local media FS implementations;
- the vnode pager has no knowledge of whether a given FS is implemented on local media.

Suppose I have a vnode that is the local media vnode and it has pages in place, and I create an overlay vnode that has aliases for those pages. I can then get into a situation where the overlay vnode and the local media vnode both reference the same pages as existing, but only one of them has copies of the disk pages. When this happens and you reference the page through the wrong vnode, you get a zero-filled page instead. This is just what you would get when extending a file or accessing a page in a sparse file.

The easy fix is to modify the vnode pager so that it does not know where the pages are located on the vnode. This will have two consequences:

1) You *must* support VOP_GETPAGES/VOP_PUTPAGES in local media filesystems for them to continue to work. If you do this, the "bypass" mechanism of the stacking vnode architecture will "do the right thing" for FS's that do not have these functions in their vnops structure, and the aliases will go away.
2) Most stacking FS's will start to work, except where they've been modified, like the commits that have been threatened to the umapfs.

The easy fix is *WRONG*. The unionfs will, in fact, still not work because of VOP_LOCK and VOP_ADVLOCK (I think it won't; you can probably kludge it). There are deadlocks and recursion panics.

The harder fix is to add a VOP_FINALVP to all local media filesystems. Adding a VOP_FINALVP will allow an upper layer to get the backing vnode for a VM object, no matter how deeply other stacks bury it. This fix will have three consequences:

1) The vnode pager *must* be modified to call VOP_FINALVP to get the backing object on which it is going to operate, instead of using page aliases from random vnodes in the stack. If you do this, the "bypass" mechanism of the stacking vnode architecture will "do the right thing" for FS's that do not have this function in their vnops structure, and the aliases will go away.
2) The advisory locking will need to be hung off a pointer in the generic vnode, instead of off a pointer in the FS-specific inode. All advisory locks should be asserted in upper level code instead of in FS code, and should be veto based. The upper level code will use VOP_FINALVP to get the backing node(s) for the lock range. The locks will then be associated with the data they are locking.
3) Most stacking FS's will start to work, except where they've been modified, like the commits that have been threatened to the umapfs. The unionfs, as a multiplexer, *will* work for VOP_ADVLOCK, since it implements the bypass and no longer has to assert sub-locks on per-FS objects.

An FS which aggregates multiple vp's into a single vp will still need to maintain alias coherency. This is a much smaller problem. The upper level code will assert the VOP_ADVLOCK against the alias vp. Instead of being a null "non-veto" of the assert, that VOP_ADVLOCK will have to do the assert into the lower layers. This will generally be a non-problem.
There are currently no FS's which do this. The places where it *is* done are handled as drivers (the vnconfig and ccd code), which is probably the correct way to do it anyway.

The unionfs may still fail because of VOP_LOCK, depending on how it is implemented this week. If it's still using the lockmgr code, it will definitely fail, because that code projects a three-dimensional geodesic into a two-dimensional space. I can explain how to fix this, if you are interested.

Generally, allowing the lock to recurse could make it run. It would, however, leave a race condition in the case where the projected image of the lock relationship could have come from the shadow of more than one possible geodesic. To approximate the problem, make a triangle out of straws and hold it up to a projection screen until you see only a line.

I have, at various times, posted the code to implement the second fix to the -current mailing list. The code should be in the archives; the VOP_ADVLOCK/veto code will be listed under "NFS Client locking".

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.