Date: Wed, 8 Feb 2006 10:26:48 +0700 (ICT)
From: Olivier Nicole <on@cs.ait.ac.th>
To: on@cs.ait.ac.th
Cc: freebsd-questions@freebsd.org
Subject: Re: Optimize shell
Message-ID: <200602080326.k183QmwW002090@banyan.cs.ait.ac.th>
In-Reply-To: <200602070430.k174UEGT086010@banyan.cs.ait.ac.th> (message from Olivier Nicole on Tue, 7 Feb 2006 11:30:14 +0700 (ICT))
References: <200602070430.k174UEGT086010@banyan.cs.ait.ac.th>
Thanks for the suggestions.

> I am setting up a machine to work as a mail back-up. It receives copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.

Moving to database storage was a good idea, but it is not an option as
the system is already running.

Using the delete functions of other tools could be a solution, though I
doubt they work across all users.

Using bash could be a way to go, as could using locate (possible, but
it would then need a second command to get the file size, so I am not
sure it would save much).

And my assumption was wrong: most of the time was spent in sed, not in
sort. In fact I did not need sed at all, as I could have sort split the
fields on / and pick the correct field in awk.

Using xargs also sped things up a little.

Here is the final solution:

mailback<root>66: cat func5
#!/bin/sh
/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sort -t/ -n +6 | /usr/bin/awk '{sum+=$7; if (sum < 200000000) print $11;}'|xargs cat >/dev/null
mailback<root>67: time ./func5
0.806u 3.086s 0:35.69 10.8% 67+405k 9864+21io 5pf+0w

And the original one:

mailback<root>68: cat func1
#!/bin/sh
for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 200000000) print $3;}'`; do
cat $i >/dev/null
done
mailback<root>69: time ./func1
223.665u 12.341s 4:53.42 80.4% 48+315k 9100+13io 0pf+0w

35 seconds is OK.

Best regards,

Olivier

Original question:

> I am setting up a machine to work as a mail back-up. It receives copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.
>
> Messages are stored in /home/sub_home/user/Maildir/cur in maildir
> format.
>
> Message name is of the form 1137993135.86962_0.machine.cs.ait.ac.th
> where the first number is a Unix time stamp.
>
> I came up with the following shell script to find the messages of all
> users, sort them by date and compute the total size up to 4 GB.
>
> for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
> /bin/rm $i
> done
>
> find /home -mindepth 5 -ls makes a list of all files and directories at
> a depth of 5 and more, because my directory structure is such that
> messages are stored at level 6
>
> grep /Maildir/cur/ because courier-imap tends to put things in other
> directories it creates when it needs to
>
> These two commands give me a list of the form:
>
> 1397490 8 -rw------- 1 on staff 3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th
>
> where 3124 is the size
>
> The sed command transforms the line into date, size, filename:
>
> 1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
>
> Then it sorts on the date field and awk is used to sum on the size
> field and print the filename until the total of 4 GB is reached.
>
> That works OK, but it is damn slow: for 200 users, 7800 messages and
> 302MB it takes something like 3+ minutes... For 25 GB of email it
> should take more than 4 hours, this is too much.
>
> It seems that the long part is the sort:
>
> without sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | cat /dev/null
> 0.026u 0.035s 0:07.67 0.6% 51+979k 0+0io 0pf+0w
>
> with sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat /dev/null
> 0.281u 0.366s 3:44.75 0.2% 39+1042k 0+0io 0pf+0w
>
> Any idea how to speed things up?
>
> Thanks in advance,
>
> Olivier
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
>
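
For anyone reusing the final pipeline, the sort and awk stages can be sketched as a small function, which makes the selection step easy to try on sample lines before it touches real mail. This is only a sketch under assumptions: the function name oldest_within_budget and the awk variable limit are made up here, the paths must follow the /home/sub_home/user/Maildir/cur/ layout described in the message (so the file name is the 7th field when splitting on /), and paths must not contain whitespace. The key is written as -k7,7n, the POSIX form of the obsolete +6 syntax used in the thread.

```shell
#!/bin/sh
# Sketch: print the oldest Maildir messages whose cumulative size stays
# under a byte budget. Reads `find -ls` style lines on stdin.
# Assumptions (not from the original post): the function name, the
# "limit" variable, and that paths contain no whitespace.
oldest_within_budget() {
    # $1 = byte budget
    # Split on "/" so field 7 is the file name, which starts with the
    # Unix time stamp; sort numerically on that field only.
    sort -t/ -k7,7n |
    # With whitespace splitting, $7 is the size and $11 the path.
    awk -v limit="$1" '{ sum += $7; if (sum < limit) print $11 }'
}

# Dry run (commented out): list the candidates instead of deleting them.
# find /home -mindepth 5 -ls | grep /Maildir/cur/ | oldest_within_budget 4000000000
```

The commented dry run only prints file names. Piping that list to xargs rm would be the deletion step, and is probably best done only after checking the list by hand.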