From owner-freebsd-fs@FreeBSD.ORG Wed Jul 6 20:10:01 2005 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3EFA716A41C; Wed, 6 Jul 2005 20:10:01 +0000 (GMT) (envelope-from refugee@astraldream.net) Received: from efnet-math.org (efnet-math.org [69.60.109.125]) by mx1.FreeBSD.org (Postfix) with ESMTP id 558AB43D46; Wed, 6 Jul 2005 20:10:00 +0000 (GMT) (envelope-from refugee@astraldream.net) Received: from [192.168.0.99] ([70.108.75.148]) (authenticated bits=0) by efnet-math.org (8.13.1/8.13.1) with ESMTP id j66K9vAu014110 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NO); Wed, 6 Jul 2005 16:09:57 -0400 In-Reply-To: <20050705030015.N790@epsplex.bde.org> References: <20050703181616.GC89744@cvs.freesbie.org> <42C83643.4010506@samsco.org> <20050703201621.GD89744@cvs.freesbie.org> <1120425831.77984.37993.camel@palm> <42C87CAE.7080802@samsco.org> <1120436351.77984.38195.camel@palm> <42C88121.8010602@samsco.org> <1120437780.77984.38252.camel@palm> <20050704095431.GA31030@cvs.freesbie.org> <20050705030015.N790@epsplex.bde.org> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Suleiman Souhlal Date: Wed, 6 Jul 2005 16:09:51 -0400 To: Bruce Evans X-Mailer: Apple Mail (2.730) Cc: freebsd-fs@freebsd.org, ken@freebsd.org, Stephan Uphoff Subject: Re: [saturnero@freesbie.org: Weird behaviour of mount_unionfs with executables] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2005 20:10:01 -0000 Hi, On Jul 4, 2005, at 2:52 PM, Bruce Evans wrote: > On Mon, 4 Jul 2005, Dario Freni wrote: > > >> On Sun, Jul 03, 2005 at 08:43:01PM -0400, Stephan Uphoff wrote: >> >>> YES - and it looks like just specifying noatime for the union mount >>> should fix the copy problem for FreeSBIE. >>> >> >> Correct. Using -o noatime when mounting the unionfs resolves all >> above >> problems. Thank you very much. >> > > Hmm. This shouldn't work, but helps avoid the copy due to a layering > violation. In kern_execve(), the test of MNT_NOATIME is supposed to > be just an optimization to avoid making null VOP_SETATTR(), but it > only looks at the mount flag for the top layer so it doesn't work > right if the mount flag is different in lower layers. This gives the > bizarre behaviour that the noatime option for unionsfs "works" for > exec() but not for read(). > > I could only see the duplication problem for mount_unionfs -b. There > are lots of bugs in that case: > = no copy is made for the modification caused by read(), despite > mount_unionfs(8) claiming that: > > Any other operation which would ultimately require > modification to the > ^^^ > lower layer fails with EROFS. > > - the error for failure is often not EROFS (but this may be a side > effect). > - exec() makes a bad copy. It fails to copy the access times and > doesn't > even copy the permissions; in particular, it loses the x bits so the > file becomes non-executable. > - touch(1) makes a bad copy like exec() and then fails to actually > touch > the file except for the clobbered times in the non-copy being > current. > (touch tries utimes(2) twice, first with the current times and > then with > a null times pointer. Both calls fail with EACCES. I think a > bad copy > is made on the first call ...). Subsequent touch(1)'s work normally > (... touch tries utimes(2) only once, only with the current > times, and > since this works there seems to be a race for the earlier calls > to fail > (I tested as root, so there should be no permissions problems > with either > call, and since the middle utimes(path, NULL) call failed the bug > can't > be just a stale vp after making the copy in the first call).) > > The comment in union_setattr() only claims to make a copy to handle > truncation, but a copy seems to be made when any attribute is set: > > % static int > % union_setattr(ap) > % ... > % /* > % * Handle case of truncating lower object to zero size > % * by creating a zero length upper object. This is to > % * handle the case of open with O_TRUNC and O_CREAT. > % */ > % if (un->un_uppervp == NULLVP && (un->un_lowervp->v_type == > VREG)) { > % error = union_copyup(un, (ap->a_vap->va_size != 0), > % ap->a_cred, ap->a_td); > % if (error) > % return (error); > % } > > Copying on any change to an attribute seems right, but it doesn't > actually handle implicit changes that don't go through setattr -- > mainly setting of atimes on read(). Copying on read() wouldn't be > right. > > % % /* > % * Try to set attributes in upper layer, > % * otherwise return read-only filesystem error. > % */ > % error = EROFS; > % if ((uppervp = union_lock_upper(un, td)) != NULLVP) { > % error = VOP_SETATTR(un->un_uppervp, ap->a_vap, > % ap->a_cred, ap->a_td); > % if ((error == 0) && (ap->a_vap->va_size != VNOVAL)) > % union_newsize(ap->a_vp, ap->a_vap->va_size, VNOVAL); > % union_unlock_upper(uppervp, td); > % } > % return (error); > % } > > The second clause in the comment doesn't match the code here -- the > error > is very rarely EROFS since it is usually the error returned by > VOP_SETATTR() > and we have avoided getting this far in most cases where VOP_SETATTR() > would return EROFS. The comment was closer to matching the code in > rev.1.1 > where decideding EROFS was left to lower layers. > > Bruce The vnode locking is also wrong for several cases: - According sys/kern/vnode_if.src we should return from VOP_OPEN() with vp locked, but union_open() unlocks it before returning it. - Same thing in union_access(). - We are not making sure we are calling VOP_CLOSE() with the vnode exclusively locked in union_close(). - We don't lock the upper and lower vnodes when doing VOP_GETATTR() on them in union_getattr(). - We don't lock othervp when doing VOP_STRATEGY() in union_strategy(). I think there are also other problems, but haven't had time to look for them yet.