Date: Thu, 13 Feb 2003 14:13:55 +0100
From: Brad Knowles <brad.knowles@skynet.be>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Brad Knowles <brad.knowles@skynet.be>, Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Subject: Re: Email push and pull (was Re: matthew dillon)
Message-ID: <a05200f5bba7128081b43@[10.0.1.2]>
In-Reply-To: <3E4B11BA.A060AEFD@mindspring.com>
References: <20030211032932.GA1253@papagena.rockefeller.edu> <a05200f2bba6e8fc03a0f@[10.0.1.2]> <3E498175.295FC389@mindspring.com> <a05200f37ba6f50bfc705@[10.0.1.2]> <3E49C2BC.F164F19A@mindspring.com> <a05200f43ba6fe1a9f4d8@[10.0.1.2]> <3E4A81A3.A8626F3D@mindspring.com> <a05200f4cba70710ad3f1@[10.0.1.2]> <3E4B11BA.A060AEFD@mindspring.com>
At 7:32 PM -0800 2003/02/12, Terry Lambert wrote:
>> Under what circumstances are you not interested in I/O throughput?!?
>
> When the problem is recipient maildrop overflow, rather than
> inability to handle load. Since a single RS/6000 with 2 166MHz
> CPUs and a modified Sendmail can handle 500,000 average-sized
> email messages in a 24 hour period, load isn't really the problem:
> it's something you can "throw hardware at", and otherwise ignore.
Again, you're talking about the MTA. For this discussion, I
couldn't give a flying flip about the MTA. I care about the message
store and mailbox access methods. I know how to solve MTA problems.
Solving message store and mailbox access problems tends to be more
difficult, especially if they're dependent on an underlying technology
that you can't touch or change.
> The issue is not real limits, it is administrative limits, and, if
> you care about being DOS'ed, it's about aggregate limits not
> resulting in overcommit.
Quotas and making sure you have enough disk space are
well-understood problems with well-understood solutions.
> You are looking at the problem from the wrong end. A quota is good
> for you, but it sucks for your user, who loses legitimate traffic,
> if illegitimate traffic pushed them over their quota.
There's no way around this issue. If you don't set quotas then
the entire system can be trivially taken down by a DOS attack, and
this affects thousands, hundreds of thousands, or millions of other
users. If you do set quotas, the entire system can still be taken
down, but it takes a more concerted effort aimed at more than just
one user.
You have to have quotas. There simply is no other viable
alternative. The key is setting them high enough that 95-99% of
your users never hit them; the remainder that do would probably
have hit *any* quota that you set, and therefore need to be dealt
with in a different manner.
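As a back-of-the-envelope illustration (my own sketch, not
anything we actually run), you could derive that kind of quota directly
from observed mailbox sizes; the percentile method and the 2x headroom
factor here are assumptions:

    #!/usr/bin/env python
    # Hypothetical sketch: derive a quota from observed per-user
    # mailbox sizes so that roughly 99% of users never bump into it.

    def percentile(values, pct):
        """Nearest-rank percentile of a list of numbers."""
        ordered = sorted(values)
        rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
        return ordered[rank]

    def suggest_quota(mailbox_sizes, pct=99, headroom=2.0):
        # Headroom above the observed percentile keeps normal growth
        # from pushing legitimate users over the limit.
        return int(percentile(mailbox_sizes, pct) * headroom)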
DOS attacks that push a single user over their quota are a
separate issue, and have to be addressed by other means.
> What this comes down to is the level of service you are offering
> your customer. Your definition of "adequate" and their definition
> of "adequate" are likely not the same.
If 95-99% of all users never even notice that there is a quota,
then I've solved the part of the problem that is feasible to solve.
The remainder cannot possibly be solved with any quota at any level,
and these users need to be dealt with separately.
> If we take two examples: HotMail and Yahoo Mail (formerly Rocket
> Mail), it's fairly easy to see that the "quota squeeze" was
> originally intended to act as a circuit breaker for the disk space
> issue.
Right. A valid use of quotas, especially when you're talking
about a free service.
> However, we now see that it's being used as a lever to attempt to
> extract revenue from a broken business model ("buy more disk space
> for only $9.95/month!").
Another valid use, in this case allowing you to have an actual
sustainable business model.
Or would you prefer for everyone to offer all their services for
"free", only to go bankrupt six months later, forcing you to go
somewhere else for your next fix of "free" service? That way lies
madness.
> The user convenience being sold here lies in the ability for the
> user to request what is, in effect, a larger queue size, in
> exchange for money.
>
> If this queue size were not an issue, then we would not be having
> this discussion: it would not have value to users, and, not having
> any value, it would not have a market cost associated with its
> reduction.
You have to pay for storage somehow.
If you store it all on the sender's system, then you run into
SPOFs, overload when a billion people all check their e-mail and read
a copy of the same message, backup problems, etc....
If you use a flood-fill mechanism, then everyone pays to store
everyone's messages all the time, and you run into the problem of
insufficient shared storage space: old messages get tossed away very
quickly, and people just re-post them. Look at what's happening to
USENET today.
If you store them on the recipient system, you have what exists
today for e-mail. Of the three, this is the only one that has proved
sustainable (so far) and sufficiently reliable.
> Whether the expiry is enforced by default, self, or administratively
> is irrelevant to the mere fact that there is a limited lifetime in
> the distributed persistent queueing system that is Usenet.
Yeah, at 650GB/day for a full feed, it's called not having enough
disk space for an entire day's full feed. At ~2GB/day for text only,
it's called not having enough disk space for a week's traffic. And
you still lose messages that somehow never managed to flood over to
your system. For USENET, this doesn't really matter. But for
personal e-mail that needs some reasonable guarantees, this just
doesn't fly.
> This is a transport issue -- or, more properly, a queue management
> and data replication issue. It would be very easy to envision a
> system that could handle this, with "merely" enough spindles to
> hold 650GB/day.
Two IDE 320GB disks are not going to cut it. They cannot
possibly get the data in and out fast enough.
> An OS with a log structured or journalling FS,
> or even soft updates, which exported a transaction dependency
> interface to user space, could handle this, no problem.
Bullshit. You have to have sufficient underlying I/O capacity to
move a given amount of data in a given amount of time, regardless of
what magic you try to work at a higher level.
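To put rough numbers on that (all of these are assumed round
figures for illustration, not measurements from any real feed):

    # Back-of-the-envelope I/O arithmetic for a full USENET feed.
    FEED_BYTES_PER_DAY = 650e9   # assumed full-feed volume per day
    OUTBOUND_PEERS = 5           # assumed downstream fan-out
    AVG_ARTICLE_BYTES = 8e3      # assumed average article size
    SPINDLE_IOPS = 100           # assumed random I/Os per disk per sec

    write_rate = FEED_BYTES_PER_DAY / 86400          # ~7.5 MB/s in
    total_rate = write_rate * (1 + OUTBOUND_PEERS)   # ~45 MB/s in+out

    # News spools are seek-bound: millions of small articles accessed
    # at random, so the limit is I/Os per second, not raw bandwidth.
    iops_needed = total_rate / AVG_ARTICLE_BYTES     # ~5,600 IOPS
    print("spindles: %d" % round(iops_needed / SPINDLE_IOPS))  # ~56

Raw sequential bandwidth isn't the hard part; the seeks are,
which is why a pair of big IDE drives gets you nowhere.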
> Surely, you aren't saying an Oracle Database would need a significant
> number of spindles in order to replicate another Oracle Database,
> when all bottlenecks between the two machines, down to the disks,
> are directly managed by a unified software set, written by Oracle?
Yup. Indeed, this is *precisely* what is needed. Just try doing
this on a single 320GB hard drive. Or even a pair of 320GB hard
drives.
Large-capacity hard drives don't do us any good for applications
like this. If they did, then companies like EMC, Hitachi Data
Systems, Auspex, Network Appliance, etc... wouldn't exist.
We need enough drives with enough I/O capacity to handle the
transaction rates. We worry about disk space secondarily, because we
know that we can always buy the next size up.
> I'm not positive that it matters, one way or the other, in the
> long run, if things are implemented correctly. However, it is
> aesthetically pleasing, on many levels.
Aesthetically pleasing or not, it is not practical. SIS
(single-instance storage) causes way too many problems and only
solves issues that we don't really care about.
>> > You do not need all the locking you state you need, at least not
>> > at that low a granularity.
>>
>> Really? I'd like to hear your explanation for that claim.
>
> Why the heck are you locking at a mailbox granularity, instead
> of a message granularity, for either of these operations?
For IMAP, you need to lock at message granularity. But your
ability to do that will be dependent on your mailbox format.
Choosing a mailbox directory format has a whole host of associated
problems, as well understood and explained by Mark Crispin at
<http://www.washington.edu/imap/documentation/formats.txt.html>.
Either way, locking is a very important issue that has to be
solved, one way or the other.
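To make the format dependency concrete, here's a minimal sketch
of per-message locking (my own illustration, assuming a maildir-style
store with one file per message; none of this is from any real IMAP
server):

    import fcntl, os

    # With one file per message, each message can be locked on its
    # own.  With mbox, the only cheap lock is on the whole mailbox --
    # exactly the trade-off Crispin's paper describes.  POSIX locks
    # (lockf) are used rather than flock() because flock() does not
    # work over NFS, and even POSIX locks depend on a working
    # rpc.lockd on the server.

    def update_message(path, mutator):
        """Mutate one message file under an exclusive POSIX lock."""
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks other sessions
            return mutator(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
            os.close(fd)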
> Sorry, I was thinking of Compuserve, who had switched over to
> FreeBSD for some of its systems, at one point.
I don't have any direct knowledge of the Compuserve systems.
I can tell you that the guys at Compuserve appeared to be
blissfully unaware of many scaling issues when they had one million
customers and AOL had five million. I don't understand why, but
somewhere between those two numbers, a change in scale had become a
change in kind.
> I have read it. The modifications he proposes are small ones,
> which deal with impedance issues. They are low-hanging fruit,
> available to a system administrator, not an in-depth modification
> by a software engineer.
The point is that these low-hanging fruit were enough to get Nick
to a point where he could serve multiple millions of customers using
this technology, and he didn't need to go any further.
That same design was used by Nick and the other consultants at
Sendmail for a number of early customers, the largest publicly known
member of which was FlashNet with about ten million customers. There
were others, even larger, but their names have been withheld at their
request.
Sendmail has since moved on to SAMS, which is much more
full-featured, scalable, etc.... But the original starting point was
all Nick's work, and it accomplished quite a lot given how little was
changed.
>> True enough. However, this would imply that the sort of thing
>> that Nick has done is not possible. He has demonstrated that this is
>> not true.
>
> *You've* demonstrated it, or you would just adopt his solution
> wholesale. The issue is that his solution doesn't scale nearly
> as well as is possible, it only scales "much better than Open
> Source on its own".
I can't adopt his solution. He did POP3, I'm doing IMAP.
The mailbox formats have to change, because we have to assume
multiple simultaneous processes accessing it (unlike POP3). He did
just fine with mailbox locking (or methods to work around that
problem). I need message locking (or methods to work around that
problem). There are a whole series of other domino-effect changes
that end up making the end solution totally different.
Simply put, there just aren't that many medium-scale IMAP
implementations in the world, period. Even after my LISA 2000 paper,
there still haven't been *any* truly large-scale IMAP
implementations, despite things like
<http://www-1.ibm.com/servers/esdd/articles/sendmail/>,
<http://www.networkcomputing.com/1117/1117f1.html?ls=NCJS_1117bt>,
<http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.pdf>,
<http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.zseries.pdf>,
and <http://www.dell.com/downloads/global/topics/linux/sendmail.doc>.
Certainly, so far as I can tell, none of them have used NFS as
the underlying mailbox storage method.
> Try an experiment for me: tune the bejesus out of a FreeBSD box
> with 4G of RAM. Do everything you can think of doing to it, in
> the same time frame, and at the same level of depth of understanding
> that Nick applied to his system. Then come back, and tell me two
> numbers: (1) Maximum number of new connections per second before
> and after, and (2) Total number of simultaneous connections, before
> and after.
Give me such a box and wait until I've gotten this project out of
the way, and I'll be glad to do this sort of thing. I'm setting up
my own consulting business, and a large part of the work I want to do
is in relation to research on scaling issues. This would be right up
my alley.
But, I can't buy boxes like this for myself.
> It doesn't have to be, that's agreed, but it takes substantially
> more investment than it would cost to build out using multiple
> instances of commercial software, plus the machines to run it, to
> "brute force" the problem. Or the resulting system ends up being
> fragile.
Operations and maintenance costs are going to be significantly
higher, that much I can guarantee you.
> UW is not the place you should look. Stanford (as I said)
> already has a deployed system, and they are generally helpful
> when people want to copy what they have done.
I'll check and see what they've done.
> If you are looking at IMAP4, then Cyrus or a commercial product
> are your only options, IMO, and neither will work well enough, if
> used "as is".
Cyrus doesn't work on NFS. Most of the commercial products I've
been able to find are based on Cyrus or Cyrus-like technology and
don't support NFS, either. The ones I've been able to find that
would (theoretically) support NFS are based on Courier-IMAP, and run
on Linux on PCs.
One of the other can't-change criteria for this system is that it
has to run on SPARC/Solaris, so for example Bynari Insight Server is
not an option.
> How many maildrops does this need to support? I will tell you if
> your project will fail. 8-(.
~1800 LAN e-mail clients initially, quickly growing to
~3000-4000, with possible growth to ~6000-10,000.
Not counting headers, during one week of fairly typical activity
for the initial ~1800 users, message size distributions were
(measured in terms of bytes):
Minimum: 0
5th Percentile: 328
10th Percentile: 541
25th Percentile: 623
Median: 1424
75th Percentile: 4266
90th Percentile: 41743
95th Percentile: 159314
Maximum: 41915955
Mean: 66502
Sample Std. Deviation: 553042
For the initial ~1800 users, the mailbox distributions are (bytes):
Minimum: 0
5th Percentile: 318
10th Percentile: 318
25th Percentile: 318
Median: 595430
75th Percentile: 919726.25
90th Percentile: 9026371
95th Percentile: 25530278
Maximum: 200702940
Mean: 4.02673e+06
Sample Std. Deviation: 1.32811e+07
For the initial ~1800 users, during the same sample time above,
message arrival rates per second were:
Minimum: 1
5th Percentile: 1
10th Percentile: 1
25th Percentile: 1
Median: 1
75th Percentile: 1
90th Percentile: 2
95th Percentile: 2
Maximum: 28
Mean: 1.20909
Sample Std. Deviation: 0.577905
For the initial ~1800 users, during the same sample time above,
message arrival rates per minute were:
Minimum: 1
5th Percentile: 1
10th Percentile: 2
25th Percentile: 3
Median: 6
75th Percentile: 17
90th Percentile: 25
95th Percentile: 28
Maximum: 419
Mean: 10.4627
Sample Std. Deviation: 11.2166
For the initial ~1800 users, during the same sample time above,
message arrival rates per hour were:
Minimum: 113
5th Percentile: 153
10th Percentile: 186
25th Percentile: 240
Median: 360.5
75th Percentile: 1134
90th Percentile: 1388
95th Percentile: 1498
Maximum: 1844
Mean: 614.102
Sample Std. Deviation: 489.391
For the initial ~1800 users, during the same sample time above,
message arrival rates per day were:
Minimum: 4883
5th Percentile: 4883
10th Percentile: 4883
25th Percentile: 7763
Median: 17047
75th Percentile: 17467
90th Percentile: 21458
95th Percentile: 21458
Maximum: 21458
Mean: 14056.1
Sample Std. Deviation: 6333.29
For the initial ~1800 users, during the same sample time above,
the distribution of number of recipients per message was:
Minimum: 0
5th Percentile: 1
10th Percentile: 1
25th Percentile: 1
Median: 1
75th Percentile: 1
90th Percentile: 2
95th Percentile: 3
Maximum: 294
Mean: 1.33054
Sample Std. Deviation: 3.03305
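(For what it's worth, summaries like the above take only a few
lines to produce. A rough sketch, assuming a list of per-message or
per-mailbox byte counts pulled out of the maillog; the nearest-rank
percentile method is an assumption, and the interpolated medians above
suggest the real tool did something slightly fancier:)

    import math

    def percentile(v, p):
        """Nearest-rank percentile of a sorted list v."""
        return v[max(0, int(round(p / 100.0 * len(v))) - 1)]

    def summarize(values):
        v = sorted(values)
        n = len(v)
        mean = sum(v) / float(n)
        # Sample (n-1) standard deviation, as in the tables above.
        stddev = math.sqrt(sum((x - mean) ** 2 for x in v) / (n - 1))
        print("Minimum: %s" % v[0])
        for p in (5, 10, 25, 50, 75, 90, 95):
            print("%dth Percentile: %s" % (p, percentile(v, p)))
        print("Maximum: %s" % v[-1])
        print("Mean: %g" % mean)
        print("Sample Std. Deviation: %g" % stddev)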
--
Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message
