From: Brad Knowles
To: Terry Lambert
Cc: Brad Knowles, Rahul Siddharthan, freebsd-chat@freebsd.org
Subject: Re: Email push and pull (was Re: matthew dillon)
Date: Thu, 13 Feb 2003 14:13:55 +0100

At 7:32 PM -0800 2003/02/12, Terry Lambert wrote:

>> Under what circumstances are you not interested in I/O throughput?!?
>
> When the problem is recipient maildrop overflow, rather than
> inability to handle load.  Since a single RS/6000 with 2 166MHz
> CPUs and a modified Sendmail can handle 500,000 average-sized
> email messages in a 24 hour period, load isn't really the problem:
> it's something you can "throw hardware at", and otherwise ignore.

Again, you're talking about the MTA.  For this discussion, I
couldn't give a flying flip about the MTA.  I care about the message
store and mailbox access methods.

I know how to solve MTA problems.  Solving message store and
mailbox access problems tends to be more difficult, especially if
they're dependent on an underlying technology that you can't touch
or change.

> The issue is not real limits, it is administrative limits, and, if
> you care about being DOS'ed, it's about aggregate limits not
> resulting in overcommit.

Quotas and making sure you have enough disk space are
well-understood problems with well-understood solutions.

> You are looking at the problem from the wrong end.  A quota is good
> for you, but it sucks for your user, who loses legitimate traffic
> if illegitimate traffic pushed them over their quota.

There's no way around this issue.  If you don't set quotas, the
entire system can be trivially taken down by a DOS attack, and this
affects thousands, hundreds of thousands, or millions of other users.
If you do set quotas, the entire system can still be taken down, but
it takes a more concerted effort aimed at more than just one user.
You have to have quotas.  There simply is no other viable
alternative.
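To make the shape of that administrative limit concrete, here is a
minimal sketch (in Python) of a delivery-time quota check.  Everything
in it, including the 500MB cap, the maildir-style layout, and both
helper functions, is an illustrative assumption, not code from any
real MTA or local delivery agent:

    import os

    # Hypothetical per-user cap; set it high enough that 95-99% of
    # users never hit it.
    QUOTA_BYTES = 500 * 1024 * 1024

    def maildrop_usage(maildir):
        """Sum the sizes of all message files under a maildir-style spool."""
        total = 0
        for root, _dirs, files in os.walk(maildir):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total

    def accept_delivery(maildir, message_size):
        """Tempfail any message that would push the user past quota."""
        return maildrop_usage(maildir) + message_size <= QUOTA_BYTES

In practice you would cache the usage figure rather than re-walking
the spool on every delivery, but the decision itself really is that
simple; the hard part is choosing the number.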
The key is setting them high enough that 95-99% of your users
never hit them; the remainder that do would probably have hit *any*
quota that you set, and therefore need to be dealt with in a
different manner.  DOS attacks that take a single user over their
quota are a different issue that has to be addressed in a different
manner.

> What this comes down to is the level of service you are offering
> your customer.  Your definition of "adequate" and their definition
> of "adequate" are likely not the same.

If 95-99% of all users never even notice that there is a quota,
then I've solved the part of the problem that is feasible to solve.
The remainder cannot possibly be solved with any quota at any level,
and these users need to be dealt with separately.

> If we take two examples: HotMail and Yahoo Mail (formerly Rocket
> Mail), it's fairly easy to see that the "quota squeeze" was
> originally intended to act as a circuit breaker for the disk space
> issue.

Right.  A valid use of quotas, especially when you're talking
about a free service.

> However, we now see that it's being used as a lever to attempt to
> extract revenue from a broken business model ("buy more disk space
> for only $9.95/month!").

Another valid use, in this case allowing you to have an actual
sustainable business model.  Or would you prefer that everyone offer
all their services for "free", only to go bankrupt six months later
and force you to go somewhere else for your next fix of "free"
service?  That way lies madness.

> The user convenience being sold here lies in the ability for the
> user to request what is, in effect, a larger queue size, in
> exchange for money.
>
> If this queue size were not an issue, then we would not be having
> this discussion: it would not have value to users, and, not having
> any value, it would not have a market cost associated with its
> reduction.

You have to pay for storage somehow.  If you store it all on the
sender's system, then you run into SPOFs, overload when a billion
people all check their e-mail and read a copy of the same message,
backup problems, etc.  If you use a flood-fill mechanism, then
everyone pays to store everyone's messages all the time, and you run
into the problem of not having enough shared storage space, so old
messages get tossed away very quickly and people simply re-post them.
Look at what's happening to USENET today.  If you store them on the
recipient system, you have what exists today for e-mail.  Of the
three, this is the only one that has proved sustainable (so far) and
sufficiently reliable.

> Whether the expiry is enforced by default, self, or administratively
> is irrelevant to the mere fact that there is a limited lifetime in
> the distributed persistent queueing system that is Usenet.

Yeah, at 650GB/day for a full feed, it's called not having enough
disk space for an entire day's full feed.  At ~2GB/day for text only,
it's called not having enough disk space for a week's traffic.  And
you still lose messages that somehow never managed to flood over to
your system.  For USENET, this doesn't really matter.  But for
personal e-mail that needs some reasonable guarantees, this just
doesn't fly.

> This is a transport issue -- or, more properly, a queue management
> and data replication issue.  It would be very easy to envision a
> system that could handle this, with "merely" enough spindles to
> hold 650GB/day.

Two 320GB IDE disks are not going to cut it.  They cannot possibly
get the data in and out fast enough.
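Back-of-the-envelope arithmetic shows why.  The sketch below works
it out; the average article size, I/Os per article, and per-spindle
IOPS figures are my assumptions for 2003-era IDE disks, not
measurements:

    FEED_BYTES_PER_DAY = 650e9   # full feed, per the figure above
    SECONDS_PER_DAY = 86400.0
    AVG_ARTICLE_BYTES = 50e3     # assumed mix of text and binaries
    IOS_PER_ARTICLE = 3          # assumed: data write + directory + metadata
    SPINDLE_IOPS = 90            # assumed random IOPS of one 7200rpm IDE disk

    write_mb_s = FEED_BYTES_PER_DAY / SECONDS_PER_DAY / 1e6
    articles_s = FEED_BYTES_PER_DAY / SECONDS_PER_DAY / AVG_ARTICLE_BYTES
    write_iops = articles_s * IOS_PER_ARTICLE

    print("sustained writes: %.1f MB/s" % write_mb_s)              # ~7.5 MB/s
    print("articles/sec:     %.0f" % articles_s)                   # ~150
    print("spindles needed:  %.1f" % (write_iops / SPINDLE_IOPS))  # ~5

And that is only the daily average, for writes alone; reader traffic,
expiry, and peak-hour bursts multiply it, which is why raw capacity
per disk is beside the point.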
> An OS with a log structured or journalling FS,
> or even soft updates, which exported a transaction dependency
> interface to user space, could handle this, no problem.

Bullshit.  You have to have sufficient underlying I/O capacity to
move a given amount of data in a given amount of time, regardless of
what magic you try to work at a higher level.

> Surely, you aren't saying an Oracle Database would need a significant
> number of spindles in order to replicate another Oracle Database,
> when all bottlenecks between the two machines, down to the disks,
> are directly managed by a unified software set, written by Oracle?

Yup.  Indeed, this is *precisely* what is needed.  Just try doing
this on a single 320GB hard drive, or even a pair of 320GB hard
drives.

Large-capacity hard drives don't do us any good for applications
like this.  If they did, then companies like EMC, Hitachi Data
Systems, Auspex, Network Appliance, etc. wouldn't exist.  We need
enough drives with enough I/O capacity to handle the transaction
rates.  We worry about disk space secondarily, because we know that
we can always buy the next size up.

> I'm not positive that it matters, one way or the other, in the
> long run, if things are implemented correctly.  However, it is
> aesthetically pleasing, on many levels.

Aesthetically pleasing or not, it is not practical.  SIS causes
way too many problems and only solves issues that we don't really
care about.

>>> You do not need all the locking you state you need, at least not
>>> at that low a granularity.
>>
>> Really?  I'd like to hear your explanation for that claim.
>
> Why the heck are you locking at a mailbox granularity, instead
> of a message granularity, for either of these operations?

For IMAP, you need to lock at message granularity, but your
ability to do that will be dependent on your mailbox format.
Choosing a mailbox directory format has a whole host of associated
problems, as well understood and explained by Mark Crispin at .
Either way, locking is a very important issue that has to be solved,
one way or the other.

> Sorry, I was thinking of Compuserve, who had switched over to
> FreeBSD for some of its systems, at one point.

I don't have any direct knowledge of the Compuserve systems.  I
can tell you that the guys at Compuserve appeared to be blissfully
unaware of many scaling issues when they had one million customers
and AOL had five million.  I don't understand why, but somewhere
between those two numbers, a change in scale had become a change in
kind.

> I have read it.  The modifications he proposes are small ones,
> which deal with impedance issues.  They are low-hanging fruit,
> available to a system administrator, not an in-depth modification
> by a software engineer.

The point is that this low-hanging fruit was enough to get Nick to
a point where he could serve multiple millions of customers using
this technology, and he didn't need to go any further.  That same
design was used by Nick and the other consultants at Sendmail for a
number of early customers, the largest publicly known of which was
FlashNet, with about ten million customers.  There were others, even
larger, but their names have been withheld at their request.
Sendmail has since moved on to SAMS, which is much more
full-featured, scalable, etc.  But the original starting point was
all Nick's work, and it did quite a lot for how little was done.

>> True enough.  However, this would imply that the sort of thing
>> that Nick has done is not possible.  He has demonstrated that this
>> is not true.
> *You've* demonstrated it, or you would just adopt his solution
> wholesale.  The issue is that his solution doesn't scale nearly
> as well as is possible, it only scales "much better than Open
> Source on its own".

I can't adopt his solution.  He did POP3; I'm doing IMAP.  The
mailbox formats have to change, because we have to assume multiple
simultaneous processes accessing them (unlike POP3).  He did just
fine with mailbox locking (or methods to work around that problem);
I need message locking (or methods to work around that problem).
There are a whole series of other domino-effect changes that end up
making the final solution totally different.

Simply put, there just aren't that many medium-scale IMAP
implementations in the world, period.  Even after my LISA 2000 paper,
there still haven't been *any* truly large-scale IMAP
implementations, despite things like , , , , and .  Certainly, so
far as I can tell, none of them have used NFS as the underlying
mailbox storage method.

> Try an experiment for me: tune the bejesus out of a FreeBSD box
> with 4G of RAM.  Do everything you can think of doing to it, in
> the same time frame, and at the same level of depth of understanding
> that Nick applied to his system.  Then come back, and tell me two
> numbers: (1) Maximum number of new connections per second before
> and after, and (2) Total number of simultaneous connections, before
> and after.

Give me such a box, wait until I've gotten this project out of the
way, and I'll be glad to do this sort of thing.  I'm setting up my
own consulting business, and a large part of the work I want to do is
research on scaling issues.  This would be right up my alley.  But I
can't buy boxes like this for myself.

> It doesn't have to be, that's agreed, but it takes substantially
> more investment than it would cost to build out using multiple
> instances of commercial software, plus the machines to run it, to
> "brute force" the problem.  Or the resulting system ends up being
> fragile.

Operations and maintenance are going to cost significantly more,
that much I can guarantee you.

> UW is not the place you should look.  Stanford (as I said)
> already has a deployed system, and they are generally helpful
> when people want to copy what they have done.

I'll check and see what they've done.

> If you are looking at IMAP4, then Cyrus or a commercial product
> are your only options, IMO, and neither will work well enough, if
> used "as is".

Cyrus doesn't work on NFS.  Most of the commercial products I've
been able to find are based on Cyrus or Cyrus-like technology and
don't support NFS, either.  The ones I've found that would
(theoretically) support NFS are based on Courier-IMAP and run on
Linux on PCs.  One of the other can't-change criteria for this
system is that it has to run on SPARC/Solaris, so for example Bynari
Insight Server is not an option.

> How many maildrops does this need to support?  I will tell you if
> your project will fail.  8-(.

~1800 LAN e-mail clients initially, quickly growing to
~3000-4000, with possible growth to ~6,000-10,000.

Not counting headers, during one week of fairly typical activity
for the initial ~1800 users, message size distributions were
(in bytes):

	Minimum:		0
	5th Percentile:		328
	10th Percentile:	541
	25th Percentile:	623
	Median:			1424
	75th Percentile:	4266
	90th Percentile:	41743
	95th Percentile:	159314
	Maximum:		41915955
	Mean:			66502
	Sample Std. Deviation:	553042
For the initial ~1800 users, the mailbox size distributions are
(bytes):

	Minimum:		0
	5th Percentile:		318
	10th Percentile:	318
	25th Percentile:	318
	Median:			595430
	75th Percentile:	919726.25
	90th Percentile:	9026371
	95th Percentile:	25530278
	Maximum:		200702940
	Mean:			4.02673e+06
	Sample Std. Deviation:	1.32811e+07

For the initial ~1800 users, during the same sample time above,
message arrival rates per second were:

	Minimum:		1
	5th Percentile:		1
	10th Percentile:	1
	25th Percentile:	1
	Median:			1
	75th Percentile:	1
	90th Percentile:	2
	95th Percentile:	2
	Maximum:		28
	Mean:			1.20909
	Sample Std. Deviation:	0.577905

For the initial ~1800 users, during the same sample time above,
message arrival rates per minute were:

	Minimum:		1
	5th Percentile:		1
	10th Percentile:	2
	25th Percentile:	3
	Median:			6
	75th Percentile:	17
	90th Percentile:	25
	95th Percentile:	28
	Maximum:		419
	Mean:			10.4627
	Sample Std. Deviation:	11.2166

For the initial ~1800 users, during the same sample time above,
message arrival rates per hour were:

	Minimum:		113
	5th Percentile:		153
	10th Percentile:	186
	25th Percentile:	240
	Median:			360.5
	75th Percentile:	1134
	90th Percentile:	1388
	95th Percentile:	1498
	Maximum:		1844
	Mean:			614.102
	Sample Std. Deviation:	489.391

For the initial ~1800 users, during the same sample time above,
message arrival rates per day were:

	Minimum:		4883
	5th Percentile:		4883
	10th Percentile:	4883
	25th Percentile:	7763
	Median:			17047
	75th Percentile:	17467
	90th Percentile:	21458
	95th Percentile:	21458
	Maximum:		21458
	Mean:			14056.1
	Sample Std. Deviation:	6333.29

For the initial ~1800 users, during the same sample time above,
the distribution of the number of recipients per message was:

	Minimum:		0
	5th Percentile:		1
	10th Percentile:	1
	25th Percentile:	1
	Median:			1
	75th Percentile:	1
	90th Percentile:	2
	95th Percentile:	3
	Maximum:		294
	Mean:			1.33054
	Sample Std. Deviation:	3.03305

--
Brad Knowles, "They that can give up essential liberty to obtain a
little temporary safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---)
W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++)
5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++)
G+(++++) e++>++++ h--- r---(+++)* z(+++)