Date: Sat, 14 Aug 2004 20:11:54 -0400
From: Garance A Drosihn <drosih@rpi.edu>
To: "Paul A. Hoadley" <paulh@logicsquad.net>, freebsd-questions@freebsd.org
Subject: Re: find -exec surprisingly slow
Message-ID: <p06110400bd4454ff72b3@[128.113.24.47]>
In-Reply-To: <20040814230143.GB8610@grover.logicsquad.net>
References: <20040814230143.GB8610@grover.logicsquad.net>

At 8:31 AM +0930 8/15/04, Paul A. Hoadley wrote:
>Hello,
>
>I'm in the process of cleaning a Maildir full of spam.  It has
>somewhere in the vicinity of 400K files in it.  I started running
>this yesterday:
>
>find . -atime +1 -exec mv {} /home/paulh/tmp/spam/sne/ \;
>
>It's been running for well over 12 hours.  It certainly is
>working---the spams are slowly moving to their new home---but
>it is taking a long time.  It's a very modest system, running
>4.8-R on a P2-350.  I assume this is all overhead for spawning
>a shell and running mv 400K times.

Some of it is that, and some of it is the performance penalty of
deleting files from a directory which has 400K filenames in it,
only to add the same files into a directory which will eventually
have 400K filenames in it.  Directory adds/deletes are not fast
when a directory has that many filenames.  It is probably even
worse if there are other processes still working on the same
directory (such as sendmail importing more mail).

Where is '.' in the above `find .' command?  Is it on the same
partition as /home/paulh/tmp/spam/sne/ ?

You may find it much faster to do something like:

    mkdir usermail.new
    chown user:group usermail.new
    mv usermail usermail.bigspam
    mv usermail.new usermail
    cd usermail.bigspam
    find . \! -atime +1 -exec mv {} ../usermail \;

My assumption there is that you have a LOT fewer "good files" than
you have "bad files", so there will be fewer files to move.  But I
am also making the assumption that all your files are in a single
directory (and not a tree of directories), which may be a bad
assumption.

>Is there a better way to move all files based on some characteristic
>of their date stamp?  Maybe separating the find and the move, piping
>it through xargs?

The thing to use is the '-J' option of xargs.  That way you can
have the destination directory be the last argument in the command
that gets executed, and yet you're still moving as many files in a
single `mv' command as possible.  E.g., change my earlier `find'
command to:

    find . \! -atime +1 -print0 | xargs -0J[] mv [] ../usermail

Check the man page for xargs for a description of -J.

--
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu
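
Editor's sketch, not part of the original message: the '-J' option
is an xargs extension and is not in POSIX, so it may be missing on
other systems.  The same batching idea (many files per `mv', with
the destination as the last argument) can be approximated with
`find ... -exec ... {} +' and a small inline sh script.  The
function name move_recent and its two arguments are hypothetical,
chosen only for illustration.  The sketch assumes DEST is an
existing directory outside SRC, and it adds `-type f' so that
directories such as '.' are never handed to mv.

```shell
#!/bin/sh
# move_recent SRC DEST   (hypothetical helper, for illustration only)
#
# Move every regular file under SRC whose access time is NOT more
# than one day old into DEST, passing many filenames to each mv.
# Assumes DEST already exists and is not located inside SRC.
move_recent() {
    mr_src=$1
    mr_dest=$2
    # "{} +" makes find append as many pathnames as fit on one
    # command line.  The inline script receives DEST as $1, shifts
    # it away, and puts it back as the LAST argument of mv.
    find "$mr_src" -type f \! -atime +1 \
        -exec sh -c 'dest=$1; shift; mv -- "$@" "$dest"' sh "$mr_dest" {} +
}
```

Usage would look like `move_recent usermail.bigspam "$PWD/usermail"'.
Note that the exact rounding of `-atime +1' differs between find
implementations, so files close to the one-day boundary may be
treated differently from one system to another.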