Date:      Wed, 8 Feb 2006 10:26:48 +0700 (ICT)
From:      Olivier Nicole <on@cs.ait.ac.th>
To:        on@cs.ait.ac.th
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Optimize shell
Message-ID:  <200602080326.k183QmwW002090@banyan.cs.ait.ac.th>
In-Reply-To: <200602070430.k174UEGT086010@banyan.cs.ait.ac.th> (message from Olivier Nicole on Tue, 7 Feb 2006 11:30:14 +0700 (ICT))
References:  <200602070430.k174UEGT086010@banyan.cs.ait.ac.th>

Thanks for the suggestions.
> I am setting up a machine to work as a mail back-up. It receives a copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.

Switching to database storage was a good idea, but it is not an option
as the system is already running. Using the delete functions of other
tools could be a solution, though I doubt they work across all the
users.

Using bash could be a way to go, as could using locate (possible, but
it would need a second command to get each file's size, so I am not
sure it would save much).

And my assumption was wrong: most of the time was spent in the sed,
not in the sort. In fact I did not need the sed at all, as I could
have sort split the fields on the / and pick the correct field in
awk. Using xargs also speeds things up a little.

Here is the final solution:

mailback<root>66: cat func5
#!/bin/sh
/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sort -t/ -n +6 | /usr/bin/awk '{sum+=$7; if (sum < 200000000) print $11;}'|xargs cat >/dev/null
mailback<root>67: time ./func5
0.806u 3.086s 0:35.69 10.8%     67+405k 9864+21io 5pf+0w
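In case the sort invocation above looks cryptic: -t/ splits each `find -ls` line on slashes, so for messages under /home/sub_home/user/Maildir/cur the seventh slash field is the message filename, whose leading digits are the Unix timestamp; the old-style key `+6` selects exactly that field (newer sorts spell it -k7). Here is a minimal self-contained sketch of the same pipeline shape on a throwaway tree — the paths and the tiny 9-byte budget are invented for the demo:

```shell
#!/bin/sh
# Throwaway maildir-like tree; names, sizes and budget are invented.
dir=$(mktemp -d)
mkdir -p "$dir/home/java/on/Maildir/cur"
printf '12345' > "$dir/home/java/on/Maildir/cur/1137990000.1_0.host"  # oldest, 5 bytes
printf '123'   > "$dir/home/java/on/Maildir/cur/1137991000.2_0.host"  # 3 bytes
printf '1'     > "$dir/home/java/on/Maildir/cur/1137992000.3_0.host"  # newest, 1 byte
cd "$dir" || exit 1
# ./home/... has the same slash depth as /home/..., so the filename is
# the 7th slash-separated field; -k7 is the modern spelling of +6.
# awk then sums the size column ($7) and prints paths ($11) while the
# running total stays under the (demo) budget of 9 bytes.
selected=$(find ./home -mindepth 5 -ls | grep /Maildir/cur/ \
    | sort -t/ -n -k7 \
    | awk '{sum+=$7; if (sum < 9) print $11}')
echo "$selected"   # the two oldest messages fit the budget
```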

And the original one:

mailback<root>68: cat func1
#!/bin/sh
for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 200000000) print $3;}'`; do
cat $i >/dev/null
done
mailback<root>69: time ./func1
223.665u 12.341s 4:53.42 80.4%  48+315k 9100+13io 0pf+0w

35 seconds is OK.
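One caveat for anyone reusing func5 as-is: it ends in `xargs cat >/dev/null`, presumably as a safe stand-in for deletion while benchmarking; for the real cleanup the tail would become `xargs rm`, as in the original script quoted below. A hedged sketch on an invented throwaway tree (maildir filenames contain no whitespace, so plain xargs is safe here):

```shell
#!/bin/sh
# Invented demo tree; the 9-byte budget stands in for the real 4000000000.
d=$(mktemp -d)
mkdir -p "$d/home/u/a/Maildir/cur"
printf '12345' > "$d/home/u/a/Maildir/cur/1000000000.1_0.h"  # oldest, 5 bytes
printf '123'   > "$d/home/u/a/Maildir/cur/1000000100.2_0.h"  # 3 bytes
printf '1'     > "$d/home/u/a/Maildir/cur/1000000200.3_0.h"  # newest, 1 byte
cd "$d" || exit 1
# Same pipeline as func5, but the tail deletes instead of reading.
find ./home -mindepth 5 -ls | grep /Maildir/cur/ \
    | sort -t/ -n -k7 \
    | awk '{sum+=$7; if (sum < 9) print $11}' \
    | xargs rm
ls home/u/a/Maildir/cur   # only the newest message survives
```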

Best regards,

Olivier

Original question:
>  I am setting up a machine to work as a mail back-up. It receives a copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.
> 
> Messages are stored in /home/sub_home/user/Maildir/cur in maildir
> format. 
> 
> Message name is of the form 1137993135.86962_0.machine.cs.ait.ac.th
> where the first number is a Unix time stamp.
> 
> I came up with the following shell script to find the messages of all
> users, sort them by date and compute the total size up to 4 GB.
> 
> for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
>     /bin/rm $i
> done
> 
> find /home -mindepth 5 -ls makes a list of all files and directories at
>      a depth of 5 and more, because my directory structure is such that
>      messages are stored at level 6
> 
> grep /Maildir/cur/ because Courier-IMAP tends to put things in other
>      directories it creates when it needs to
> 
> These two commands give me a list of the form:
> 
> 1397490    8 -rw-------    1 on               staff            3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th
> 
> where 3124 is the size
> 
> The sed command transforms the line into date, size, filename:
> 
> 1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
> 
> Then it sorts on the date field, and awk is used to sum the size
> field and print the filenames until the total of 4 GB is reached.
> 
> That works OK, but it is damn slow: for 200 users, 7800 messages and
> 302 MB it takes something like 3+ minutes... For 25 GB of email it
> should take more than 4 hours, which is too much.
> 
> It seems that the slow part is the sort:
> 
> without sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | cat >/dev/null
> 0.026u 0.035s 0:07.67 0.6%      51+979k 0+0io 0pf+0w
> 
> with sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat >/dev/null
> 0.281u 0.366s 3:44.75 0.2%      39+1042k 0+0io 0pf+0w
> 
> Any idea how to speed up the things?
> 
> Thanks in advance,
> 
> Olivier


