Date: Fri, 14 Feb 2003 03:44:16 +0100
From: Brad Knowles
To: Terry Lambert
Cc: Brad Knowles, Rahul Siddharthan, freebsd-chat@freebsd.org
Subject: Re: Email push and pull (was Re: matthew dillon)

At 8:09 AM -0800 2003/02/13, Terry Lambert wrote:

> OK, then why do you keep talking about I/O throughput?  Do you
> mean *network I/O*?  Why the hell would you care about disk I/O
> on a properly designed message store, when the bottleneck is
> going to first be network I/O, followed closely by bus bandwidth?

Disk I/O is many orders of magnitude slower than any other thing on the system.
Moreover, disk I/O suffers from issues with synchronous meta-data updates, where entire directories must be locked for the whole time an update is occurring. That reduces, by many more orders of magnitude, the number of small operations (e.g., file creation and deletion, renaming, updating of other file attributes, etc.) that we can perform in a given unit of time.

This is an issue for MTAs, and it is an issue for message stores, especially when the message store uses a meta-data intensive storage mechanism such as that found in Maildir and (to a lesser degree) Cyrus.

> So what's the difference between not enforcing a quota, and ending
> up with the email sitting on your disks in a user maildrop, or
> enforcing a quota, and ending up with the email sitting on your
> disks in an MTA queue?

In "free" systems, quotas are frequently set ridiculously low. In systems with a sustainable business model, you pay for the storage you use. If you want a higher quota, you pay for it (one way or another). In those situations, quotas rarely need to be enforced, and this problem is not one that is faced very often.

In the case where you do have this issue, at the very least you can hold the message in the queue for a while, in the hope that the user will come clean out their mailbox.

> Quotas are actually a strong argument for single image storage.

SIS increases SPOFs, reduces reliability, increases complexity, increases the probability of hot-spots and other forms of contention, and all for very little possible benefit.

> Obviously, unless setting the quota low on purpose is your revenue
> model (HotMail, Yahoo Mail).

As I said above, "free" systems frequently set quotas ridiculously low. They are not of interest for this discussion.

> How?  It's going to sit on your disks, no matter what, the only
> choice you really have on it is *which* disk it's going to sit on.

True, but it's easier for me to deal with multiple gigabytes of DoS crap in the mail queue than it is for the user to try to deal with multiple gigabytes of crap in their mailbox.

There are things that they need to be protected from, because they don't have the access or the power on their end. If they did, they wouldn't need us.

>> If 95-99% of all users never even notice that there is a quota,
>> then I've solved the part of the problem that is feasible to solve.
>> The remainder cannot possibly be solved with any quota at any level,
>> and these users need to be dealt with separately.
>
> Again, how?

Outside of the DoS problem, they need education and proper management of their expectations. TANSTAAFL.

> Flood fill will only work as part of an individual infrastructure,
> not as part of a shared infrastructure, if what you are trying to
> sell is to be any different from what everyone else is giving away
> for free.

Ahh, something akin to the Yasushi model. See .

When restricted to the network internal to the mail system, replicating the mailbox over multiple servers is not a bad idea, although I don't think it matters so much what replication model you use.

>> If you store them on the recipient system, you have what exists
>> today for e-mail.  Of the three, this is the only one that has proved
>> sustainable (so far) and sufficiently reliable.
>
> This argument is flawed.  Messages are not stored on recipient
> systems, they are stored on the systems of the ISP that the
> recipient subscribes to.

That's what I was calling the "recipient system". It is the system where the message was received.

> Yet those same guarantees are specifically disclaimed by HotMail
> and other "free" providers, even though there is no technological
> difference between a POP3 maildrop hosted at EarthLink and accessed
> via a mail client, and a POP3/IMAP4 maildrop hosted at HotMail and
> accessed via a mail client.

Again, you're referencing situations that I consider to be irrelevant to the discussion. I don't give a flying flip about the poor business model they employ. I care about real systems that are paid for by real people and real companies.

> Who the hell uses IDE on servers?!?  Get real!  You can't detach an
> IDE drive during the data transfer on a write, so tagged command
> queueing only works for *reading* data.  For a server that does writes,
> you use *SCSI* (or something else, but *not* IDE).

Okay, so two 15kRPM SCSI hard drives, or FibreChannel. The type of interface doesn't matter when you're talking about a number of disks that is grossly inadequate to the task.

> I think I see the misunderstanding here.  You think IDE disks are
> server parts.  8-).

No, not at all. I think that focusing on disk storage capacity and not paying attention to disk I/O latency and I/O capacity is pure folly.

> Use SCSI, or divide the load between a number of IDE spindles
> equal to the tagged command queue depth for a single SCSI drive
> (hmmm... should I buy five SCSI drives, or should I buy 500 IDE
> drives?).

See above. Regardless of the drive interface technology, what's important is the I/O latency and the I/O capacity.

> It gets rid of the quota problem.

No, not at all. You eliminate damn few duplicate messages, you greatly increase system complexity, you increase SPOFs, you increase system hot-spots, you reduce system reliability (and replication, something which you seem to be so fond of), and all for very, very little benefit.

Try taking a real-world mail server and processing the logs. Count the number of recipients per message and see just how much space you'd actually save. I did that, and included my numbers in the previous message -- an average of ~1.3 recipients per message. You want to do all this for about 30% savings?!?

> Heck, you could even store your indices on a SCSI drive, and then
> store your SIS on an IDE drive, if you wanted.

See above. This is pointless.
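
Coming back to the recipient-count exercise: a minimal sketch of that calculation, assuming you have already pulled one local-recipient count per message out of your MTA logs. The function name sis_savings and the input format are hypothetical, and the sketch pretends that every message is the same size, which real mail is not.

```python
from statistics import mean


def sis_savings(recipients_per_message):
    """Estimate what single-instance storage (SIS) would save.

    recipients_per_message: one local-recipient count per message
    (hypothetical input, e.g. extracted from MTA logs).
    Assumes every message has the same size.
    """
    counts = list(recipients_per_message)
    if not counts or min(counts) < 1:
        raise ValueError("need at least one message, each with >= 1 recipient")

    copies_without_sis = sum(counts)  # one stored copy per recipient
    copies_with_sis = len(counts)     # one stored copy per message

    return {
        "avg_recipients": mean(counts),
        # extra copies held today, relative to a single-copy store
        "extra_copies_ratio": copies_without_sis / copies_with_sis - 1,
        # fraction of today's per-recipient storage that SIS would remove
        "fraction_saved": 1 - copies_with_sis / copies_without_sis,
    }


if __name__ == "__main__":
    # 10 messages: 7 with one recipient, 3 with two -> average of 1.3
    print(sis_savings([1] * 7 + [2] * 3))
```

Under those assumptions, an average of 1.3 recipients per message means about 30% more copies than a single-copy store would hold, or roughly 23% of the space in use today. Whichever baseline you pick, the saving is modest.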

> Mark's wrong.  His assumptions are incorrect, and based on the
> idea that metadata updates are not synchronous in all systems.

Meta-data updates are at least partially synchronous on all systems I know of. Well, unless you are running with asynchronous mounts, but if you're doing that then you shouldn't be running a mail system until you understand why that's a bad idea.

Even if they're not synchronous, they're still bottlenecks to be avoided if possible.

> Cyrus is much closer to commercial usability, but it has its own
> set of problems, too.

It is somewhat closer. If you want real commercial usability, you have to start with the MessagingDirect code, which is based on Cyrus but with lots of bug fixes, increased reliability and robustness, etc. Then you graduate to Sendmail Advanced Message Server, which takes that to the next level.

>> Either way, locking is a very important issue that has to be
>> solved, one way or the other.
>
> No, it's a very important issue that has to be designed around,
> rather than implemented.

Somebody said that when they invented Maildir. I didn't believe it then, and I don't believe it now.

> Yes, and no.  It's very easy to paint a rosy picture in a technical
> paper, particularly when you are in a position to need to obtain
> funding.

Nick didn't need any funding. He was describing a project that was largely complete, and which he had already left by that time. He definitely made use of that design at various customer sites while working for Sendmail, but he couldn't possibly have known that at the time.

> You are unlikely to ever find someone using NFS in this capacity,
> except as a back end for a single server message store.

Show me an IMAP server that actually implements SIS. I don't know of any.
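
On the Maildir point: the way it avoids lock files is also why it is so meta-data intensive. Here is a rough sketch of a Maildir-style delivery, assuming the usual tmp/new/cur directory layout. The function name deliver and the file-naming scheme are illustrative only, and this is not meant as a conforming implementation.

```python
import os
import socket
import tempfile
import time


def deliver(maildir, message_bytes):
    """Maildir-style delivery sketch: write into tmp/, then rename into new/.

    No lock file is taken. The price is several directory (meta-data)
    updates per message: a create in tmp/, then a rename into new/.
    The unique-name scheme below is simplified for illustration.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = f"{int(time.time())}.P{os.getpid()}N{time.monotonic_ns()}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL makes creation fail rather than reuse an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())  # data on disk before the message becomes visible

    # Readers only look in new/ and cur/, so they never see a partial file.
    os.rename(tmp_path, new_path)
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as box:
        print(deliver(box, b"Subject: test\n\nhello\n"))
```

Every delivered message costs at least one file creation and one rename, each of which updates a directory, and that is exactly the kind of small synchronous operation that limits throughput.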

> The point was that, without making changes requiring an in depth
> understanding of the code of the components involved, which Nick's
> solution doesn't really demonstrate, you're never going to get more
> than "marginally better" numbers.

Could be. In that case, we may have to find an alternative message store solution. If I can prove that this really is a problem, then I'll try to help them find a suitable SAN solution and then drop in SAMS. If not, I may end up writing a paper or doing another invited talk.

> It works on NFS.  You just have to run the delivery agent on the
> same machine that's running the access agent, and not try to mix
> multiple hosts accessing the same data.

Nope. mmap on NFS doesn't work.

> I understand you want a distributed, replicated message store, or
> at least the appearance of one, but in order to get that, well,
> you have to "write a distributed, replicated message store".

A distributed, replicated message store would be nice, but is not strictly a requirement of this solution. One thing that was originally given as an absolute requirement was to find a way to put an e-mail front end on NFS. The distributed, replicated message store was a side-effect.

Indeed, the architecture already has a concept of a primary server for a particular mailbox (as determined by LDAP); the only thing we'd have to change is whether or not that mailbox was also accessible from the other servers. However, we do have only one message store mount point at the moment.

> The part of Netscape that Sun bought used to provide an IMAP4
> server (based on heavily modified UW IMAP code).  Is there a
> reason you can't use that?  I guess the answer must be "I have
> been directed to use Open Source".  8-).

Actually, no. They would much prefer commercial software. However, they don't have any money to spend on software, and I know from personal experience that the Netscape/iPlanet stuff doesn't scale.
Indeed, we're already in the process of scrapping all other Netscape/iPlanet software because we've had excessive problems with it.

> This should be no problem.  You should be able to handle this
> with a single machine, IMO, without worrying about locking, at
> all.

Remember, Maildir doesn't do locking.

> 10,000 client machines is nothing.

10,000 LAN clients? With 44MB messages and 200MB mailboxes? On NFS? Sorry, my testing so far indicates that this is a significant load and we need to take care to make sure that it is handled properly.

> At worst, you should
> separate inbound and outbound SMTP servers,

Already planned.

> so you can treat the
> inbound one as a bastion host, and keep the outbound entirely
> inside, and the inbound server should use a transport protocol
> for internal delivery to the machine running the IMAP4 server,
> which makes locking go away.

How does locking go away? Through Maildir? Or did you have something else in mind?

> At worst, you can limit the number
> of bastion to internal server connections, which will make things
> queue up at the bastion, if you get a large activity burst, and
> let it drain out to the internal server, over time.

I'm not worried about internal SMTP connections. But we have to be careful to make sure we don't put any additional limits on POP3 or IMAP connections.

> At most,
> you are well under 40,000 simultaneous TCP connections to the
> IMAP4 server host, even if you are using OutLook, people have
> two mailboxes open, each, and are monitoring incoming mail in
> several folders.

Sorry, I am still not convinced.

--
Brad Knowles

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)