From: Olivier Nicole
To: on@cs.ait.ac.th
Cc: freebsd-questions@freebsd.org
Date: Wed, 8 Feb 2006 10:26:48 +0700 (ICT)
Subject: Re: Optimize shell
Message-Id: <200602080326.k183QmwW002090@banyan.cs.ait.ac.th>
In-reply-to: <200602070430.k174UEGT086010@banyan.cs.ait.ac.th>

Thanks for the suggestions.

> I am setting up a machine to work as a mail back-up. It receives a copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.

Storing the messages in a database was a good idea, but it is not an
option, as the system is already running.

Using the delete functions of other tools could be a solution, though I
doubt they would work across all the users.

Using bash could be a way to go, as could using locate (possible, but it
would then need a second command to get the file size, so I am not sure
that it would save much).

And my assumption was wrong: most of the time was spent in the sed, not
in the sort. In fact I did not need the sed at all, as I could split the
fields on the / for sort and pick up the correct field in awk. Using
xargs also sped things up a little.

Here is the final solution:

mailback66: cat func5
#!/bin/sh
/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sort -t/ -n +6 | /usr/bin/awk '{sum+=$7; if (sum < 200000000) print $11;}'|xargs cat >/dev/null
mailback67: time ./func5
0.806u 3.086s 0:35.69 10.8% 67+405k 9864+21io 5pf+0w

And the original one:

mailback68: cat func1
#!/bin/sh
for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 200000000) print $3;}'`; do
cat $i >/dev/null
done
mailback69: time ./func1
223.665u 12.341s 4:53.42 80.4% 48+315k 9100+13io 0pf+0w

35 seconds is OK.

Best regards,

Olivier

Original question:

> I am setting up a machine to work as a mail back-up. It receives a copy
> of every email for every user. When the disk is almost full, I want to
> delete older messages up to a total size of 4000000000.
>
> Messages are stored in /home/sub_home/user/Maildir/cur in maildir
> format.
>
> Message names are of the form 1137993135.86962_0.machine.cs.ait.ac.th
> where the first number is a Unix time stamp.
>
> I came up with the following shell script to find the messages of all
> users, sort them by date and compute the total size up to 4 GB.
>
> for i in `/usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | /usr/bin/awk '{sum+=$2; if (sum < 4000000000) print $3;}'`; do
> /bin/rm $i
> done
>
> find /home -mindepth 5 -ls makes a list of all files and directories at
> a depth of 5 and more, because my directory structure is such that
> messages are stored at level 6.
>
> grep /Maildir/cur/ because Courier-IMAP tends to put things in other
> directories that it creates when it needs to.
>
> These two commands give me a list of the form:
>
> 1397490 8 -rw------- 1 on staff 3124 Jan 27 15:23 /home/java/on/Maildir/cur/1138350182.1413_1.mackine.cs.ait.ac.th
>
> where 3124 is the size.
>
> The sed command transforms the line into date, size, filename:
>
> 1137994623 2466 /home/java/on/Maildir/cur/1137994623.87673_0.mail.cs.ait.ac.th
>
> Then it sorts on the date field, and awk is used to sum the size
> field and print the filename until the total of 4 GB is reached.
>
> That works OK, but it is damn slow: for 200 users, 7800 messages and
> 302 MB it takes something like 3+ minutes... For 25 GB of email it
> should take more than 4 hours, which is too much.
>
> It seems that the long part is the sort:
>
> without sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | cat /dev/null
> 0.026u 0.035s 0:07.67 0.6% 51+979k 0+0io 0pf+0w
>
> with sort
> time /usr/bin/find /home -mindepth 5 -ls | /usr/bin/grep /Maildir/cur/ | /usr/bin/sed -E 's/^ *[0-9]+ +[0-9]+ +[-rwx]+ +[0-9]+ +[^ ]+ +[^ ]+ +([0-9]+) +.*(\/home\/.*\/)([0-9]+)(\..*)$/\3 \1 \2\3\4/' | /usr/bin/sort -n +0 -1 | cat /dev/null
> 0.281u 0.366s 3:44.75 0.2% 39+1042k 0+0io 0pf+0w
>
> Any idea how to speed things up?
>
> Thanks in advance,
>
> Olivier
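
The reply above drops the sed regex and lets sort and awk pick the
fields directly. A minimal sketch of that idea follows; it is not the
author's exact script. The function name oldest_messages and its two
arguments (root directory, byte limit) are hypothetical, and instead of
sorting on a fixed /-separated field it takes the timestamp from the
last path component, so it does not depend on directory depth. It
assumes that file names begin with a Unix time stamp and that paths,
user names and group names contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the oldest
# messages under ROOT, one path per line, for as long as their running
# total size stays below LIMIT bytes.
oldest_messages() {
    root=$1
    limit=$2
    # find -ls prints 11 whitespace-separated fields for a regular file:
    # field 7 is the size in bytes, field 11 is the path.
    find "$root" -type f -path '*/Maildir/cur/*' -ls |
        # Emit "timestamp size path"; the timestamp is the part of the
        # file name that precedes the first dot.
        awk '{ n = split($11, p, "/"); split(p[n], t, "."); print t[1], $7, $11 }' |
        # Oldest first.
        sort -n -k1,1 |
        # Same accumulation rule as in the post: print the path while
        # the running total is below the limit.
        awk -v limit="$limit" '{ sum += $2; if (sum < limit) print $3 }'
}
```

Under those assumptions, something like "oldest_messages /home 4000000000"
would print the candidate list, which could be inspected before being
passed to xargs rm.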