From: Garance A Drosihn
To: "Paul A. Hoadley", freebsd-questions@freebsd.org
Date: Sat, 14 Aug 2004 20:11:54 -0400
Subject: Re: find -exec surprisingly slow
In-Reply-To: <20040814230143.GB8610@grover.logicsquad.net>

At 8:31 AM +0930 8/15/04, Paul A. Hoadley wrote:
>Hello,
>
>I'm in the process of cleaning a Maildir full of spam.  It has
>somewhere in the vicinity of 400K files in it.  I started running
>this yesterday:
>
>find . -atime +1 -exec mv {} /home/paulh/tmp/spam/sne/ \;
>
>It's been running for well over 12 hours.  It certainly is
>working---the spams are slowly moving to their new home---but
>it is taking a long time.  It's a very modest system, running
>4.8-R on a P2-350.  I assume this is all overhead for spawning
>a shell and running mv 400K times.

Some of it is that, and some of it is the performance penalty of
deleting files from a directory which has 400K filenames in it,
only to add the same files into a directory which will eventually
have 400K filenames in it.  Directory adds/deletes are not fast
when a directory has that many filenames.  It is probably even
worse if there are other processes still working on the same
directory (such as sendmail importing more mail).

Where is '.' in the above `find .' command?  Is it on the same
partition as /home/paulh/tmp/spam/sne/ ?

You may find it much faster to do something like:

    mkdir usermail.new
    chown user:group usermail.new
    mv usermail usermail.bigspam
    mv usermail.new usermail
    cd usermail.bigspam
    find . \! -atime +1 -exec mv {} ../usermail \;

My assumption there is that you have a LOT fewer "good files" than
you have "bad files", so there will be fewer files to move.  But I
am also making the assumption that all your files are in a single
directory (and not a tree of directories), which may be a bad
assumption.

>Is there a better way to move all files based on some characteristic
>of their date stamp?  Maybe separating the find and the move, piping
>it through xargs?

The thing to use is the '-J' option of xargs.  That way you can have
the destination directory be the last argument in the command that
gets executed, and yet you're still moving as many files in a single
`mv' command as possible.  E.g., change my earlier `find' command to:

    find . \! -atime +1 -print0 | xargs -0J[] mv [] ../usermail

Check the man page for xargs for a description of -J.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu
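
A footnote for anyone whose xargs lacks -J: the same batching can be
sketched with the POSIX `-exec ... {} +' form of find, wrapped in a
small `sh -c' so that the destination directory still ends up as the
last argument to mv.  This is only an illustration under assumptions
that are not in the message above: the scratch directory and sample
file names are invented for the demo, `-type f' is added so that `.'
itself is never handed to mv, and a find too old to support `{} +'
will not run it.

```shell
#!/bin/sh
# Sketch: move many files per mv invocation using find's `{} +` form.
# The scratch directory and file names are hypothetical demo values.
set -eu

work=$(mktemp -d "${TMPDIR:-/tmp}/mvdemo.XXXXXX")
mkdir "$work/usermail.bigspam" "$work/usermail"
cd "$work/usermail.bigspam"

# A few recently touched sample files, one with a space in its name.
touch msg1 msg2 msg3 "msg with space"

# `{} +` appends as many pathnames as fit to each `sh -c` call;
# inside it, "$@" expands to those pathnames and the destination
# directory is placed last, as mv requires.
# -type f (an addition for this demo) keeps `.` out of the list.
find . -type f \! -atime +1 -exec sh -c 'mv "$@" ../usermail/' sh {} +

moved=$(ls ../usermail | wc -l | tr -d ' ')
left=$(ls | wc -l | tr -d ' ')
echo "moved=$moved left=$left"
```

Run as written, this should print `moved=4 left=0'.  The extra `sh'
after the quoted script fills $0, so that every pathname from find
lands in "$@".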