From: Bruce Evans
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Wed, 28 Apr 2004 00:40:17 -0700 (PDT)
Message-Id: <200404280740.i3S7eHcG049167@freefall.freebsd.org>

The following reply was made to PR bin/53475; it has been noted by GNATS.

From: Bruce Evans
To: "Dorr H. Clark"
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Wed, 28 Apr 2004 17:37:25 +1000 (EST)

On Tue, 27 Apr 2004, Dorr H. Clark wrote:

> ...
> -/*
> - * mastercmp --
> - * The comparison function for the copy order. The order is to copy
> - * non-directory files before directory files. The reason for this
> - * is because files tend to be in the same cylinder group as their
> - * parent directory, whereas directories tend not to be. Copying the
> - * files first reduces seeking.
> - */

According to cp -pRv, mastercmp() gets this perfectly backwards: cp
actually copies directories first. It seems to just randomize the order
of regular files; this is presumably because mastercmp() doesn't
distinguish between all pairs of different files and qsort() doesn't
preserve the original order.

> ...
> As quoted above, the comments in cp.c tell us the function
> mastercmp() is an attempt to improve performance based on
> knowing something about physical disks.
>
> This is an old optimization strategy (it's in the original
> version of cp.c). AFAIK, in the updated BSD filesystem,
> when we copy a file, we don't actually move the
> physical data block of the file but change the information in its
> inode such as the address of its data block and owner.

Copying still involves lots of physical i/o. The difference in
relatively recent versions of ffs is that it doesn't scatter the files
so much by switching the cylinder group too often. IIRC, it switched
for every directory.

> The next question is whether deleting mastercmp eliminates
> an optimization. Our testing shows the exact opposite,
> mastercmp is degrading performance. We did several experiments
> with cp -R to measure elapsed time on transfers between devices
> of differing file system types (to avoid UFS2 optimizations).
> Our results show removing mastercmp yields a small performance
> gain (note: we had no SCSI devices available, and second note:
> variability in file system performance seems dominated
> by other factors).

It would be interesting to know if mastercmp() works better if it does
what its comment says it does. I suspect that the backwardsness doesn't
make much difference, but is worse than it used to be because there is
now more competition for space in the same cylinder group. I think
benchmarks that don't descend into subdirs would show that using
mastercmp really is an optimization for that access pattern, but I think
that access pattern is relatively unusual.
Optimizing for the default fts order seems as good as anything.

> M. K. McKusick has indicated in seminars that modern disk drives
> lie to the driver about their physical layouts. The use of
> mastercmp in cp.c is a legacy optimization from a different
> era of disk technology. We recommend removing this call
> from cp.c to address 53475.

Large seeks (especially ones larger than the drive's cache) still
matter, and I think drives rarely lie about these. cp's attempted
optimization is more about second-guessing what ffs does. I agree
that it shouldn't do this. The file system might not even be ffs.

Bruce
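
The point above about mastercmp() not distinguishing between all pairs
of different files can be sketched as follows. This is only an
illustration, not the code in cp.c: struct entry, entry_cmp() and
sort_entries() are hypothetical names, and struct entry is a stand-in
for the real fts entry type. The comparison puts non-directories first,
as the quoted comment says it should, and breaks ties on the original
position so that qsort(), which is not guaranteed to be stable, has no
equal elements to reorder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fts entry; not the real FTSENT. */
struct entry {
	const char *name;
	int is_dir;	/* nonzero for a directory */
	size_t seq;	/* position in the original (pre-sort) order */
};

/*
 * Orders non-directories before directories.  Entries of the same kind
 * are ordered by seq, so no two distinct entries compare equal and
 * qsort() cannot shuffle them.
 */
static int
entry_cmp(const void *a, const void *b)
{
	const struct entry *ea = a;
	const struct entry *eb = b;
	int da = (ea->is_dir != 0);
	int db = (eb->is_dir != 0);

	if (da != db)
		return (da ? 1 : -1);
	if (ea->seq != eb->seq)
		return (ea->seq < eb->seq ? -1 : 1);
	return (0);
}

/* Records the incoming order in seq, then sorts files before dirs. */
static void
sort_entries(struct entry *v, size_t n)
{
	size_t i;

	if (n == 0)
		return;
	assert(v != NULL);
	for (i = 0; i < n; i++)
		v[i].seq = i;
	qsort(v, n, sizeof(v[0]), entry_cmp);
}
```

With a comparison that returns 0 for every file/file pair instead, the
relative order of the regular files after qsort() is unspecified, which
would be consistent with the apparently random order seen with cp -pRv.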