Date: Tue, 7 Feb 2006 11:30:14 +0700 (ICT)
From: Olivier Nicole <on@cs.ait.ac.th>
To: freebsd-questions@freebsd.org
Subject: Optimize shell
Message-ID: <200602070430.k174UEGT086010@banyan.cs.ait.ac.th>

Hello,

I am setting up a machine to work as a mail back-up. It receives a copy of every email for every user. When the disk is almost full, I want to delete the oldest messages, up to a total size of 4000000000 bytes.

Messages are stored in /home/sub_home/user/Maildir/cur in maildir format. Message names are of the form

    1137993135.86962_0.machine.cs.ait.ac.th

where the first number is a Unix time stamp.

I came up with the following shell script to find the messages of all users, sort them by date and compute the total size up to 4 GB:

    for i in `/usr/bin/find /home -mindepth 5 -ls |
        /usr/bin/grep /Maildir/cur/ |
        /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
        /usr/bin/sort -n +0 -1 |
        /usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
        /bin/rm $i
    done

find /home -mindepth 5 -ls
    makes a list of all files and directories at a depth of 5 and more, because my directory structure is such that messages are stored at level 6.

grep /Maildir/cur/
    because Courier-IMAP tends to put things in other directories that it creates when it needs to.

These two commands give me a list of the form:

    1397490 8 -rw------- 1 on staff 3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th

where 3124 is the size.

The sed command transforms the line into date, size, filename:

    1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th

Then sort orders the lines on the date field, and awk sums the size field and prints the filename until the total of 4 GB is reached.

That works OK, but it is damn slow: for 200 users, 7800 messages and 302 MB it takes something like 3+ minutes. For 25 GB of email it should take more than 4 hours, which is too much.
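
The sed, sort and awk stages described above can be tried in isolation on a few lines of input. The following is only a minimal sketch under stated assumptions: the three sample lines are invented (modelled on the `find -ls` output shown above), the function name select_oldest is hypothetical, the limit is lowered to 4000 bytes so that the cut-off is visible, and `sort -n -k1,1` is used as the POSIX spelling of the older `sort -n +0 -1`.

```shell
#!/bin/sh
# Invented sample of `find -ls` output (sizes 3124, 2466 and 1000 bytes).
sample='1397490 8 -rw------- 1 on staff 3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mail.cs.ait.ac.th
1397491 8 -rw------- 1 on staff 2466 Jan 23 12:37 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
1397492 8 -rw------- 1 on staff 1000 Jan 25 07:53 /home/java/on/Maildir/cur/1138150000.100_0.mail.cs.ait.ac.th'

# select_oldest LIMIT (hypothetical helper):
# reads `find -ls` lines on stdin and prints the oldest files for as long
# as their cumulative size stays below LIMIT bytes.
select_oldest() {
    # Rewrite each line as "timestamp size path".
    sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' |
    # Oldest first: numeric sort on the first field only.
    sort -n -k1,1 |
    # Running total of the size field; print the path while under the limit.
    awk -v limit="$1" '{ sum += $2; if (sum < limit) print $3 }'
}

result=$(printf '%s\n' "$sample" | select_oldest 4000)
printf '%s\n' "$result"
```

With this input the two oldest files (2466 + 1000 = 3466 bytes) are listed, and the newest one is left out because adding its 3124 bytes would pass the 4000-byte limit. Nothing is deleted here; the sketch only prints the names that would be handed to rm.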

It seems that the slow part is the sort.

Without sort:

    time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | cat /dev/null
    0.026u 0.035s 0:07.67 0.6% 51+979k 0+0io 0pf+0w

With sort:

    time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat /dev/null
    0.281u 0.366s 3:44.75 0.2% 39+1042k 0+0io 0pf+0w

Any idea how to speed things up?

Thanks in advance,

Olivier