Date: Thu, 13 Feb 2003 14:13:55 +0100
From: Brad Knowles <brad.knowles@skynet.be>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Brad Knowles <brad.knowles@skynet.be>, Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Subject: Re: Email push and pull (was Re: matthew dillon)
Message-ID: <a05200f5bba7128081b43@[10.0.1.2]>
In-Reply-To: <3E4B11BA.A060AEFD@mindspring.com>
References: <20030211032932.GA1253@papagena.rockefeller.edu> <a05200f2bba6e8fc03a0f@[10.0.1.2]> <3E498175.295FC389@mindspring.com> <a05200f37ba6f50bfc705@[10.0.1.2]> <3E49C2BC.F164F19A@mindspring.com> <a05200f43ba6fe1a9f4d8@[10.0.1.2]> <3E4A81A3.A8626F3D@mindspring.com> <a05200f4cba70710ad3f1@[10.0.1.2]> <3E4B11BA.A060AEFD@mindspring.com>
At 7:32 PM -0800 2003/02/12, Terry Lambert wrote:
>> Under what circumstances are you not interested in I/O throughput?!?
>
> When the problem is recipient maildrop overflow, rather than
> inability to handle load. Since a single RS/6000 with 2 166MHz
> CPUs and a modified Sendmail can handle 500,000 average-sized
> email messages in a 24 hour period, load isn't really the problem:
> it's something you can "throw hardware at", and otherwise ignore.
Again, you're talking about the MTA. For this discussion, I
couldn't give a flying flip about the MTA. I care about the message
store and mailbox access methods. I know how to solve MTA problems.
Solving message store and mailbox access problems tends to be more
difficult, especially if they're dependent on an underlying technology
that you can't touch or change.
> The issue is not real limits, it is administrative limits, and, if
> you care about being DOS'ed, it's about aggregate limits not
> resulting in overcommit.
Quotas and making sure you have enough disk space are
well-understood problems with well-understood solutions.
> You are looking at the problem from the wrong end. A quota is good
> for you, but it sucks for your user, who loses legitimate traffic,
> if illegitimate traffic pushed them over their quota.
There's no way around this issue. If you don't set quotas then
the entire system can be trivially taken down by a DOS attack, and
this affects thousands, hundreds of thousands, or millions of other
users. If you do set quotas, the entire system can still be taken
down, but it takes a more concerted effort aimed at more than just
one user.
You have to have quotas. There simply is no other viable
alternative. The key is setting them high enough that 95-99% of
your users never hit them; the remainder that do would probably
have hit *any* quota that you set, and therefore need to be dealt
with in a different manner.
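As a back-of-the-envelope illustration (my own sketch, not
anything we actually run), you could derive that kind of quota directly
from observed mailbox sizes; the percentile method and the 2x headroom
factor here are assumptions:

    #!/usr/bin/env python
    # Hypothetical sketch: derive a quota from observed per-user
    # mailbox sizes so that roughly 99% of users never bump into it.

    def percentile(values, pct):
        """Nearest-rank percentile of a list of numbers."""
        ordered = sorted(values)
        rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
        return ordered[rank]

    def suggest_quota(mailbox_sizes, pct=99, headroom=2.0):
        # Headroom above the observed percentile keeps normal growth
        # from pushing legitimate users over the limit.
        return int(percentile(mailbox_sizes, pct) * headroom)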
DOS attacks that push a single user over their quota are a
separate issue, and have to be addressed by other means.
> What this comes down to is the level of service you are offering
> your customer. Your definition of "adequate" and their definition
> of "adequate" are likely not the same.
If 95-99% of all users never even notice that there is a quota,
then I've solved the part of the problem that is feasible to solve.
The remainder cannot possibly be solved with any quota at any level,
and these users need to be dealt with separately.
> If we take two examples: HotMail and Yahoo Mail (formerly Rocket
> Mail), it's fairly easy to see that the "quota squeeze" was
> originally intended to act as a circuit breaker for the disk space
> issue.
Right. A valid use of quotas, especially when you're talking
about a free service.
> However, we now see that it's being used as a lever to attempt to
> extract revenue from a broken business model ("buy more disk space
> for only $9.95/month!").
Another valid use, in this case allowing you to have an actual
sustainable business model.
Or would you prefer for everyone to offer all their services for
"free", only to go bankrupt six months later, forcing you to go
somewhere else for your next fix of "free" service? That way lies
madness.
> The user convenience being sold here lies in the ability for the
> user to request what is, in effect, a larger queue size, in
> exchange for money.
>
> If this queue size were not an issue, then we would not be having
> this discussion: it would not have value to users, and, not having
> any value, it would not have a market cost associated with its
> reduction.
You have to pay for storage somehow.
If you store it all on the sender's system, then you run into
SPOFs, overload when a billion people all check their e-mail and read
a copy of the same message, backup problems, etc....
If you use a flood-fill mechanism, then everyone pays to store
everyone's messages all the time, and you run into the problem of
insufficient shared storage space: old messages get tossed away very
quickly, and people just re-post them. Look at what's happening to
USENET today.
If you store them on the recipient system, you have what exists
today for e-mail. Of the three, this is the only one that has proved
sustainable (so far) and sufficiently reliable.
> Whether the expiry is enforced by default, self, or administratively
> is irrelevant to the mere fact that there is a limited lifetime in
> the distributed persistent queueing system that is Usenet.
Yeah, at 650GB/day for a full feed, it's called not having enough
disk space for an entire day's full feed. At ~2GB/day for text only,
it's called not having enough disk space for a week's traffic. And
you still lose messages that somehow never managed to flood over to
your system. For USENET, this doesn't really matter. But for
personal e-mail that needs some reasonable guarantees, this just
doesn't fly.
> This is a transport issue -- or, more properly, a queue management
> and data replication issue. It would be very easy to envision a
> system that could handle this, with "merely" enough spindles to
> hold 650GB/day.
Two IDE 320GB disks are not going to cut it. They cannot
possibly get the data in and out fast enough.
> An OS with a log structured or journalling FS,
> or even soft updates, which exported a transaction dependency
> interface to user space, could handle this, no problem.
Bullshit. You have to have sufficient underlying I/O capacity to
move a given amount of data in a given amount of time, regardless of
what magic you try to work at a higher level.
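To put rough numbers on that (all of these are assumed round
figures for illustration, not measurements from any real feed):

    # Back-of-the-envelope I/O arithmetic for a full USENET feed.
    FEED_BYTES_PER_DAY = 650e9   # assumed full-feed volume per day
    OUTBOUND_PEERS = 5           # assumed downstream fan-out
    AVG_ARTICLE_BYTES = 8e3      # assumed average article size
    SPINDLE_IOPS = 100           # assumed random I/Os per disk per sec

    write_rate = FEED_BYTES_PER_DAY / 86400          # ~7.5 MB/s in
    total_rate = write_rate * (1 + OUTBOUND_PEERS)   # ~45 MB/s in+out

    # News spools are seek-bound: millions of small articles accessed
    # at random, so the limit is I/Os per second, not raw bandwidth.
    iops_needed = total_rate / AVG_ARTICLE_BYTES     # ~5,600 IOPS
    print("spindles: %d" % round(iops_needed / SPINDLE_IOPS))  # ~56

Raw sequential bandwidth isn't the hard part; the seeks are,
which is why a pair of big IDE drives gets you nowhere.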
> Surely, you aren't saying an Oracle Database would need a significant
> number of spindles in order to replicate another Oracle Database,
> when all bottlenecks between the two machines, down to the disks,
> are directly managed by a unified software set, written by Oracle?
Yup. Indeed, this is *precisely* what is needed. Just try doing
this on a single 320GB hard drive. Or even a pair of 320GB hard
drives.
Large-capacity hard drives don't do us any good for applications
like this. If they did, then companies like EMC, Hitachi Data
Systems, Auspex, Network Appliance, etc... wouldn't exist.
We need enough drives with enough I/O capacity to handle the
transaction rates. We worry about disk space secondarily, because we
know that we can always buy the next size up.
> I'm not positive that it matters, one way or the other, in the
> long run, if things are implemented correctly. However, it is
> aesthetically pleasing, on many levels.
Aesthetically pleasing or not, it is not practical. SIS
(single-instance storage) causes way too many problems and only
solves issues that we don't really care about.
>> > You do not need all the locking you state you need, at least not
>> > at that low a granularity.
>>
>> Really? I'd like to hear your explanation for that claim.
>
> Why the heck are you locking at a mailbox granularity, instead
> of a message granularity, for either of these operations?
For IMAP, you need to lock at message granularity. But your
ability to do that will be dependent on your mailbox format.
Choosing a mailbox directory format has a whole host of associated
problems, as well understood and explained by Mark Crispin at
<http://www.washington.edu/imap/documentation/formats.txt.html>.
Either way, locking is a very important issue that has to be
solved, one way or the other.
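To make the format dependency concrete, here's a minimal sketch
of per-message locking (my own illustration, assuming a maildir-style
store with one file per message; none of this is from any real IMAP
server):

    import fcntl, os

    # With one file per message, each message can be locked on its
    # own.  With mbox, the only cheap lock is on the whole mailbox --
    # exactly the trade-off Crispin's paper describes.  POSIX locks
    # (lockf) are used rather than flock() because flock() does not
    # work over NFS, and even POSIX locks depend on a working
    # rpc.lockd on the server.

    def update_message(path, mutator):
        """Mutate one message file under an exclusive POSIX lock."""
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks other sessions
            return mutator(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
            os.close(fd)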
> Sorry, I was thinking of Compuserve, who had switched over to
> FreeBSD for some of its systems, at one point.
I don't have any direct knowledge of the Compuserve systems.
I can tell you that the guys at Compuserve appeared to be
blissfully unaware of many scaling issues when they had one million
customers and AOL had five million. I don't understand why, but
somewhere between those two numbers, a change in scale had become a
change in kind.
> I have read it. The modifications he proposes are small ones,
> which deal with impedance issues. They are low-hanging fruit,
> available to a system administrator, not an in-depth modification
> by a software engineer.
The point is that these low-hanging fruit were enough to get Nick
to a point where he could serve multiple millions of customers using
this technology, and he didn't need to go any further.
That same design was used by Nick and the other consultants at
Sendmail for a number of early customers, the largest publicly known
member of which was FlashNet with about ten million customers. There
were others, even larger, but their names have been withheld at their
request.
Sendmail has since moved on to SAMS, which is much more
full-featured, scalable, etc.... But the original starting point was
all Nick's work, and it accomplished quite a lot given how little was
changed.
>> True enough. However, this would imply that the sort of thing
>> that Nick has done is not possible. He has demonstrated that this is
>> not true.
>
> *You've* demonstrated it, or you would just adopt his solution
> wholesale. The issue is that his solution doesn't scale nearly
> as well as is possible, it only scales "much better than Open
> Source on its own".
I can't adopt his solution. He did POP3, I'm doing IMAP.
The mailbox formats have to change, because we have to assume
multiple simultaneous processes accessing it (unlike POP3). He did
just fine with mailbox locking (or methods to work around that
problem). I need message locking (or methods to work around that
problem). There are a whole series of other domino-effect changes
that end up making the end solution totally different.
Simply put, there just aren't that many medium-scale IMAP
implementations in the world, period. Even after my LISA 2000 paper,
there still haven't been *any* truly large-scale IMAP
implementations, despite things like
<http://www-1.ibm.com/servers/esdd/articles/sendmail/>,
<http://www.networkcomputing.com/1117/1117f1.html?ls=NCJS_1117bt>,
<http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.pdf>,
<http://store.sendmail.com/pdfs/whitepapers/wp_samscapacity.zseries.pdf>,
and <http://www.dell.com/downloads/global/topics/linux/sendmail.doc>.
Certainly, so far as I can tell, none of them have used NFS as
the underlying mailbox storage method.
> Try an experiment for me: tune the bejesus out of a FreeBSD box
> with 4G of RAM. Do everything you can think of doing to it, in
> the same time frame, and at the same level of depth of understanding
> that Nick applied to his system. Then come back, and tell me two
> numbers: (1) Maximum number of new connections per second before
> and after, and (2) Total number of simultaneous connections, before
> and after.
Give me such a box and wait until I've gotten this project out of
the way, and I'll be glad to do this sort of thing. I'm setting up
my own consulting business, and a large part of the work I want to do
is in relation to research on scaling issues. This would be right up
my alley.
But, I can't buy boxes like this for myself.
> It doesn't have to be, that's agreed, but it takes substantially
> more investment than it would cost to build out using multiple
> instances of commercial software, plus the machines to run it, to
> "brute force" the problem. Or the resulting system ends up being
> fragile.
Operations and maintenance costs are going to be significantly
higher, that much I can guarantee you.
> UW is not the place you should look. Stanford (as I said)
> already has a deployed system, and they are generally helpful
> when people want to copy what they have done.
I'll check and see what they've done.
> If you are looking at IMAP4, then Cyrus or a commercial product
> are your only options, IMO, and neither will work well enough, if
> used "as is".
Cyrus doesn't work on NFS. Most of the commercial products I've
been able to find are based on Cyrus or Cyrus-like technology and
don't support NFS, either. The ones I've been able to find that
would (theoretically) support NFS are based on Courier-IMAP, and run
on Linux on PCs.
One of the other can't-change criteria for this system is that it
has to run on SPARC/Solaris, so for example Bynari Insight Server is
not an option.
> How many maildrops does this need to support? I will tell you if
> your project will fail. 8-(.
~1800 LAN e-mail clients initially, quickly growing to
~3000-4000, with possible growth to ~6000-10,000.
Not counting headers, during one week of fairly typical activity
for the initial ~1800 users, message size distributions were
(measured in terms of bytes):
Minimum: 0
5th Percentile: 328
10th Percentile: 541
25th Percentile: 623
Median: 1424
75th Percentile: 4266
90th Percentile: 41743
95th Percentile: 159314
Maximum: 41915955
Mean: 66502
Sample Std. Deviation: 553042
For the initial ~1800 users, the mailbox distributions are (bytes):
Minimum: 0
5th Percentile: 318
10th Percentile: 318
25th Percentile: 318
Median: 595430
75th Percentile: 919726.25
90th Percentile: 9026371
95th Percentile: 25530278
Maximum: 200702940
Mean: 4.02673e+06
Sample Std. Deviation: 1.32811e+07
For the initial ~1800 users, during the same sample time above,
message arrival rates per second were:
Minimum: 1
5th Percentile: 1
10th Percentile: 1
25th Percentile: 1
Median: 1
75th Percentile: 1
90th Percentile: 2
95th Percentile: 2
Maximum: 28
Mean: 1.20909
Sample Std. Deviation: 0.577905
For the initial ~1800 users, during the same sample time above,
message arrival rates per minute were:
Minimum: 1
5th Percentile: 1
10th Percentile: 2
25th Percentile: 3
Median: 6
75th Percentile: 17
90th Percentile: 25
95th Percentile: 28
Maximum: 419
Mean: 10.4627
Sample Std. Deviation: 11.2166
For the initial ~1800 users, during the same sample time above,
message arrival rates per hour were:
Minimum: 113
5th Percentile: 153
10th Percentile: 186
25th Percentile: 240
Median: 360.5
75th Percentile: 1134
90th Percentile: 1388
95th Percentile: 1498
Maximum: 1844
Mean: 614.102
Sample Std. Deviation: 489.391
For the initial ~1800 users, during the same sample time above,
message arrival rates per day were:
Minimum: 4883
5th Percentile: 4883
10th Percentile: 4883
25th Percentile: 7763
Median: 17047
75th Percentile: 17467
90th Percentile: 21458
95th Percentile: 21458
Maximum: 21458
Mean: 14056.1
Sample Std. Deviation: 6333.29
For the initial ~1800 users, during the same sample time above,
the distribution of number of recipients per message was:
Minimum: 0
5th Percentile: 1
10th Percentile: 1
25th Percentile: 1
Median: 1
75th Percentile: 1
90th Percentile: 2
95th Percentile: 3
Maximum: 294
Mean: 1.33054
Sample Std. Deviation: 3.03305
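(For what it's worth, summaries like the above take only a few
lines to produce. A rough sketch, assuming a list of per-message or
per-mailbox byte counts pulled out of the maillog; the nearest-rank
percentile method is an assumption, and the interpolated medians above
suggest the real tool did something slightly fancier:)

    import math

    def percentile(v, p):
        """Nearest-rank percentile of a sorted list v."""
        return v[max(0, int(round(p / 100.0 * len(v))) - 1)]

    def summarize(values):
        v = sorted(values)
        n = len(v)
        mean = sum(v) / float(n)
        # Sample (n-1) standard deviation, as in the tables above.
        stddev = math.sqrt(sum((x - mean) ** 2 for x in v) / (n - 1))
        print("Minimum: %s" % v[0])
        for p in (5, 10, 25, 50, 75, 90, 95):
            print("%dth Percentile: %s" % (p, percentile(v, p)))
        print("Maximum: %s" % v[-1])
        print("Mean: %g" % mean)
        print("Sample Std. Deviation: %g" % stddev)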
--
Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message
