From owner-freebsd-hackers Wed Feb 21 10:37:29 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from mobile.wemm.org (c1315225-a.plstn1.sfba.home.com [65.0.135.147])
	by hub.freebsd.org (Postfix) with ESMTP id 4EAA237B503
	for ; Wed, 21 Feb 2001 10:37:18 -0800 (PST)
	(envelope-from peter@netplex.com.au)
Received: from netplex.com.au (localhost [127.0.0.1])
	by mobile.wemm.org (8.11.1/8.11.1) with ESMTP id f1LIb3f26667;
	Wed, 21 Feb 2001 10:37:04 -0800 (PST)
	(envelope-from peter@netplex.com.au)
Message-Id: <200102211837.f1LIb3f26667@mobile.wemm.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Gordon Tetlow
Cc: scanner@jurai.net, Dan Phoenix , freebsd-hackers@FreeBSD.ORG
Subject: Re: qmail IO--qmail vs postfix competition
In-Reply-To:
Date: Wed, 21 Feb 2001 10:37:03 -0800
From: Peter Wemm
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Gordon Tetlow wrote:
> On Tue, 20 Feb 2001 scanner@jurai.net wrote:
>
> > Aha. That explains it. You use HW raid. I wondered why you were
> > only doing 4 million mails for *30* boxes. Dan, is doing 500K, on a
> > completely idle box (cpu/ram/I/O wise), with vinum, Postfix, and RAID-0.
> > Have you seen brad knowles papers on vinum vs HW raid? It's erm
> > enlightening to say the least :) Id be happy to dig up the URL if you are
> > interested. I personally will be using Vinum from now on. The performance
> > is very impressive.
>
> Well, as I said, these boxes are rather bored. I don't think the load
> reaches above 0.05. Most of the time is delivering mail trying to
> negotiate with destination hosts. I don't think that the mailers are IO
> bound, but I haven't really looked to find out to tell you the truth. Once
> the mailers are set up we treat them as black boxes. They just work.
>
> Also, the 500K number, is that per day? The 4 million was in 4 hours, not
> a day.

Another bored box:

mx1.freebsd.org$ grep 'status=sent' /var/log/mail | wc -l
  331877

It is 8 hours since the last rollover.  Unfortunately, it spends most of
its time waiting for something to do and looking at broken mail servers.
It delivers most of its mail in a few seconds.  We see it peaking at
delivering several hundred envelopes per second shortly after getting a
large mailing list to digest.

Here's a quick histogram of what those 8 hours look like:

mx1.freebsd.org$ sh hist.sh
zero           1292     1292   0.36577   0.36577
one            4983     6275   1.41071   1.77648
two            7680    13955   2.17424   3.95072
three         10741    24696   3.04082   6.99154
five          30853    55549   8.73461   15.7261
seven         37626    93175   10.6521   26.3782
ten           48169   141344   13.6368   40.0151
fifteen       66877   208221   18.9332   58.9482
twenty        44244   252465   12.5257   71.4739
thirty        48059   300524   13.6057   85.0796
fourtyfive    23626   324150   6.68862   91.7682
sixty          6902   331052   1.95398   93.7222
ninety         7082   338134   2.00494   95.7271
twomin         2336   340470   0.66133   96.3884
threemin       1521   341991   0.43060    96.819
rest          11236   353227   3.18096       100
total        353227

First field: number of seconds.  Second is the number of deliveries in
that interval, third is the running total of deliveries, fourth is the
percentage of the total that this interval represents, and last is an
accumulated percentage.
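
For anyone wanting to reproduce this kind of table: hist.sh itself is not
shown here, so the following is only a rough Python sketch of how such a
histogram could be computed, not the actual script.  It assumes the
delivery delay is taken from the delay= field on 'status=sent' log lines
and that each label is an inclusive upper bound in seconds (inferred from
the labels above).  The names parse_delays and histogram are made up for
illustration.

```python
import re
from bisect import bisect_left

# Assumed bucket upper bounds (seconds), inferred from the labels in the
# histogram output; the real hist.sh may draw its boundaries differently.
BOUNDS = [0, 1, 2, 3, 5, 7, 10, 15, 20, 30, 45, 60, 90, 120, 180]
LABELS = ["zero", "one", "two", "three", "five", "seven", "ten", "fifteen",
          "twenty", "thirty", "fourtyfive", "sixty", "ninety", "twomin",
          "threemin", "rest"]

DELAY_RE = re.compile(r"\bdelay=(\d+(?:\.\d+)?)")


def parse_delays(lines):
    """Yield the delay in seconds from each 'status=sent' log line."""
    for line in lines:
        if "status=sent" not in line:
            continue
        match = DELAY_RE.search(line)
        if match:
            yield float(match.group(1))


def histogram(delays):
    """Return rows of (label, count, running total, percent, running percent)."""
    counts = [0] * len(LABELS)
    for delay in delays:
        # bisect_left finds the first bound >= delay; anything beyond the
        # last bound falls into the final "rest" bucket.
        counts[bisect_left(BOUNDS, delay)] += 1
    total = sum(counts)
    rows, running = [], 0
    for label, count in zip(LABELS, counts):
        running += count
        pct = 100.0 * count / total if total else 0.0
        cum = 100.0 * running / total if total else 0.0
        rows.append((label, count, running, pct, cum))
    return rows
```

Feeding it open("/var/log/mail") through parse_delays and printing each
row gives the same five columns as above: label, count, running total,
percentage, accumulated percentage.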

This is a 24 hour run for yesterday (1am -> 1am):

> sh hist.sh
zero           3186      3186   0.29641   0.29641
one           13724     16910   1.27684   1.57325
two           19948     36858    1.8559   3.42915
three         29557     66415   2.74989   6.17904
five          87973    154388   8.18473   14.3638
seven        104690    259078   9.74003   24.1038
ten          144142    403220   13.4105   37.5143
fifteen      208335    611555   19.3828   56.8971
twenty       134030    745585   12.4697   69.3669
thirty       148163    893748   13.7846   83.1515
fourtyfive    74129    967877   6.89673   90.0482
sixty         34204   1002081   3.18223   93.2305
ninety        28955   1031036   2.69388   95.9243
twomin         7146   1038182   0.66484   96.5892
threemin       4297   1042479   0.39977    96.989
rest          32364   1074843   3.01104       100
total       1074843

Some random samples of mail servers in the 5 to 20 second range show most
of this delay is due to remote sendmail response time, the ident lookup,
etc.  I'm pretty pleased to see that 83% of mail is delivered in less than
30 seconds and that 90% is out by 45 seconds.  The 'zero' count is because
there are a couple of other well connected postfix servers nearby that
have a handful of subscribers :-)

The machine is only non-trivially busy for a small percentage of its time;
it could easily deliver 10 or 20 times that much mail before it was really
under load.  That is easily 10 to 20 million per day for one box.

This is a p3-800 w/ one ide disk.  We're in the process of switching it to
SCSI because of IDE drive problems.  The postfix spool will probably be
mirrored for safety.  Incidentally, the spool is mostly write-only as the
entire spool fits cached in memory.

mx1.freebsd.org$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
....
F40BC6E323E     2021 Wed Feb 21 02:42:14  owner-cvs-all@FreeBSD.ORG
  (connect to mx1.mainstreet.net[207.5.0.50]: Operation timed out)
                                          john@mj.com
  (connect to foobar.nisse.dk[24.232.51.205]: Operation timed out)
                                          r@nisse.dk
  (connect to osfmail.isc.rit.edu[129.21.2.241]: read timeout)
                                          maf8113@osfmail.isc.rit.edu
  (connect to mx.mainstreet.net[207.5.0.45]: Operation timed out)
                                          alexm@securify.com
....
  (connect to mailhub.state.me.us[141.114.122.227]: No route to host)
                                          darren@bmv.state.me.us
  (connect to mail.is-one.net[210.75.223.43]: read timeout)
                                          col@is-one.net
  (conversation with mbox.iyard.org[140.117.11.95] timed out while sending RCPT TO)
                                          kimkara@iyard.org
  (conversation with relay.orsk.ru[193.233.163.2] timed out while sending RCPT TO)
                                          dm@orsk.ru

-- 104395 Kbytes in 3639 Requests.

The queue (104MB on disk) fits comfortably in memory right now.  postfix
itself is very light on memory demands.

Some other postfix tuning stats:
- parallel outbound smtp sender processes: 500
- various qmgr params changed to keep the queue state in memory (i.e.,
  deal with something like 100,000 recipients and/or envelopes)
- We use bulk_mailer to inject mail on hub.freebsd.org from majordomo and
  avoid the -outgoing aliases.  bulk_mailer was hacked to not split the
  envelopes unless it got to 100,000 recipients and to not sort the
  addresses.
- hub uses mx1 as a mail exploder, leaving hub to the mailing list
  management, archiving and searching roles and mx1 solely to delivery.
  We have seen it pump something like 2000 separate messages in 3 seconds
  flat to mx1.

The only real problems we've had have been DNS related and disk media
errors on the cursed IBM DTLA drives.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5