From owner-freebsd-chat  Thu Feb 13 19: 4:54 2003
Delivered-To: freebsd-chat@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A054037B40A
	for <freebsd-chat@freebsd.org>; Thu, 13 Feb 2003 19:04:46 -0800 (PST)
Received: from c3po.skynet.be (c3po.skynet.be [195.238.3.237])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4329943F85
	for <freebsd-chat@freebsd.org>; Thu, 13 Feb 2003 19:04:45 -0800 (PST)
	(envelope-from brad.knowles@skynet.be)
Received: from [10.0.1.2] (ip-26.shub-internet.org [194.78.144.26] (may be forged))
	by c3po.skynet.be (8.12.7/8.12.7/Skynet-OUT-2.21) with ESMTP id h1E33fYd018722;
	Fri, 14 Feb 2003 04:04:35 +0100 (MET)
	(envelope-from <brad.knowles@skynet.be>)
Mime-Version: 1.0
X-Sender: bs663385@pop.skynet.be
Message-Id: <a05200f07ba71ee8ee0b6@[10.0.1.2]>
In-Reply-To: <3E4BC32A.713AB0C4@mindspring.com>
References: <20030211032932.GA1253@papagena.rockefeller.edu>				
 <a05200f2bba6e8fc03a0f@[10.0.1.2]>				
 <3E498175.295FC389@mindspring.com>			
 <a05200f37ba6f50bfc705@[10.0.1.2]>			
 <3E49C2BC.F164F19A@mindspring.com>		
 <a05200f43ba6fe1a9f4d8@[10.0.1.2]>		
 <3E4A81A3.A8626F3D@mindspring.com>	
 <a05200f4cba70710ad3f1@[10.0.1.2]>	
 <3E4B11BA.A060AEFD@mindspring.com>
 <a05200f5bba7128081b43@[10.0.1.2]>
 <3E4BC32A.713AB0C4@mindspring.com>
Date: Fri, 14 Feb 2003 03:44:16 +0100
To: Terry Lambert <tlambert2@mindspring.com>
From: Brad Knowles <brad.knowles@skynet.be>
Subject: Re: Email push and pull (was Re: matthew dillon)
Cc: Brad Knowles <brad.knowles@skynet.be>,
	Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-chat.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-chat>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-chat>
X-Loop: FreeBSD.org

At 8:09 AM -0800 2003/02/13, Terry Lambert wrote:

>  OK, then why do you keep talking about I/O throughput?  Do you
>  mean *network I/O*?  Why the hell would you care about disk I/O
>  on a properly designed message store, when the bottleneck is
>  going to first be network I/O, followed closely by bus bandwidth?

	Disk I/O is many orders of magnitude slower than any other thing 
on the system.  Moreover, disk I/O suffers from issues with 
synchronous meta-data updates where entire directories must be locked 
for the entire period of time during which an update is occurring, 
thus reducing by many more orders of magnitude the number of small 
operations (e.g., file creation and deletion, renaming, updating of 
other file attributes, etc.) that we can perform in a given unit of 
time.
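
	One rough way to see the effect on a given filesystem is to time a burst of tiny create/rename/delete operations with and without forcing each file to stable storage.  The sketch below is only an illustration: the function name, the file naming and the default count are made up for this example, and the numbers it prints depend entirely on the disk, the filesystem and the mount options.

```python
import os
import tempfile
import time


def small_file_ops_per_second(directory, count=200, sync=True):
    """Create, rename and delete `count` one-byte files in `directory`.

    Returns the observed rate in operations per second, counting three
    operations (create, rename, unlink) per file.
    """
    start = time.perf_counter()
    for i in range(count):
        tmp_path = os.path.join(directory, f"msg-{i}.tmp")
        final_path = os.path.join(directory, f"msg-{i}")
        fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(fd, b"x")
            if sync:
                os.fsync(fd)  # ask the OS to push this file to stable storage
        finally:
            os.close(fd)
        os.rename(tmp_path, final_path)  # directory (meta-data) update
        os.unlink(final_path)            # another directory update
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (3 * count) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("with fsync:   ", round(small_file_ops_per_second(d, sync=True)), "ops/s")
        print("without fsync:", round(small_file_ops_per_second(d, sync=False)), "ops/s")
```

	Run it against the filesystem that would actually hold the queue or the message store; a temporary directory on a memory-backed filesystem will say nothing useful about disks.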

	This is an issue for MTAs, and is an issue for message stores, 
especially when the message stores use a meta-data intensive storage 
mechanism such as found in Maildir and Cyrus (to a lesser degree).

>  So what's the difference between not enforcing a quota, and ending
>  up with the email sitting on your disks in a user maildrop, or
>  enforcing a quota, and ending up with the email sitting on your
>  disks in an MTA queue?

	In "free" systems, quotas are frequently set ridiculously low. 
In systems with a sustainable business model, you pay for the storage 
you use.  If you want a higher quota, you pay for it (one way or 
another).  In those situations, quotas rarely need to be enforced, 
and this problem is not one that is faced very often.  In the case 
where you do have this issue, at the very least you can hold the 
message in the queue for a while, in the hope that the user will come 
clean out their mailbox.

>  Quotas are actually a strong argument for single image storage.

	SIS increases SPOFs, reduces reliability, increases complexity, 
increases the probability of hot-spots and other forms of contention, 
and all for very little possible benefit.

>  Obviously, unless setting the quota low on purpose is your revenue
>  model (HotMail, Yahoo Mail).

	As I said above, "free" systems frequently set quotas 
ridiculously low.  They are not of interest for this discussion.

>  How?  It's going to sit on your disks, no matter what, the only
>  choice you really have on it is *which* disk it's going to sit on.

	True, but it's easier for me to deal with multiple gigabytes of 
DoS crap in the mail queue than it is for the user to try to deal 
with multiple gigabytes of crap in their mailbox.  There are things 
that they need to be protected from, because they don't have the 
access or the power on their end.  If they did, they wouldn't need us.

>>          If 95-99% of all users never even notice that there is a quota,
>>  then I've solved the part of the problem that is feasible to solve.
>>  The remainder cannot possibly be solved with any quota at any level,
>>  and these users need to be dealt with separately.
>
>  Again, how?

	Outside of the DoS problem, they need education and proper 
management of their expectations.  TANSTAAFL.

>  Flood fill will only work as part of an individual infrastructure,
>  not as part of a shared infrastructure, if what you are trying to
>  sell is to be any different from what everyone else is giving away
>  for free.

	Ahh, something akin to the Yasushi model.  See 
<http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld038.htm>.

	When restricted to the network internal to the mail system, 
replicating the mailbox over multiple servers is not a bad idea, 
although I don't think it matters so much what replication model you 
use.

>>          If you store them on the recipient system, you have what exists
>>  today for e-mail.  Of the three, this is the only one that has proved
>>  sustainable (so far) and sufficiently reliable.
>
>  This argument is flawed.  Messages are not stored on recipient
>  systems, they are stored on the systems of the ISP that the
>  recipient subscribes to.

	That's what I was calling the "recipient system".  It is the 
system where the message was received.

>  Yet those same guarantees are specifically disclaimed by HotMail
>  and other "free" providers, even though there is no technological
>  difference between a POP3 maildrop hosted at EarthLink and accessed
>  via a mail client, and a POP3/IMAP4 maildrop hosted at HotMail and
>  accessed via a mail client.

	Again, you're referencing situations that I consider to be 
irrelevant to the discussion.  I don't give a flying flip about the 
poor business model they employ.  I care about real systems that are 
paid for by real people and real companies.

>  Who the hell uses IDE on servers?!?  Get real!  You can't detach an
>  IDE drive during the data transfer on a write, so tagged command
>  queueing only works for *reading* data.  For a server that does writes,
>  you use *SCSI* (or something else, but *not* IDE).

	Okay, so two 15kRPM SCSI hard drives, or FibreChannel.  The type 
of interface doesn't matter when you're talking about a number of 
disks that is grossly inadequate to the task.
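
	A back-of-the-envelope estimate makes the point about spindle count.  The sketch below uses the common approximation that one random I/O costs an average seek plus half a revolution; the 4 ms seek time and the target load are hypothetical figures chosen only for illustration, not measurements from any real system.

```python
import math


def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def random_iops_per_spindle(rpm, avg_seek_ms):
    """Approximate random I/Os per second for a single drive."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


def spindles_needed(target_iops, rpm, avg_seek_ms):
    """Drives required to sustain `target_iops`, ignoring caches and RAID overhead."""
    return math.ceil(target_iops / random_iops_per_spindle(rpm, avg_seek_ms))


if __name__ == "__main__":
    # Assumed figures: 15,000 RPM drive, 4 ms average seek, 1,600 IOPS target.
    per_drive = random_iops_per_spindle(15_000, 4.0)
    print(f"one 15kRPM drive: about {per_drive:.0f} random IOPS")
    print(f"two drives:       about {2 * per_drive:.0f} random IOPS")
    print(f"drives for 1,600 IOPS: {spindles_needed(1_600, 15_000, 4.0)}")
```

	Under those assumptions the interface (IDE, SCSI or FibreChannel) never enters the calculation; only the per-drive latency and the number of drives do.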

>  I think I see the misunderstanding here.  You think IDE disks are
>  server parts.  8-).

	No, not at all.  I think that focusing on disk storage capacity 
and not paying attention to disk I/O latency and I/O capacity is pure 
folly.

>  Use SCSI, or divide the load between a number of IDE spindles
>  equal to the tagged command queue depth for a single SCSI drive
>  (hmmm... should I buy five SCSI drives, or should I buy 500 IDE
>  drives?).

	See above.  Regardless of the drive interface technology, what's 
important is the I/O latency and the I/O capacity.

>  It gets rid of the quota problem.

	No, not at all.  You eliminate damn few duplicate messages, you 
greatly increase system complexity, you increase SPOFs, you increase 
system hot-spots, you reduce system reliability (and replication, 
something which you seem to be so fond of), and all for very, very 
little benefit.

	Try taking a real-world mail server and processing the logs. 
Count the number of recipients per message and see just how much 
space you'd actually save.  I did that, and included my numbers in 
the previous message -- an average of ~1.3 recipients per message.

	You want to do all this for about 30% savings?!?
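
	The arithmetic can be sketched as follows.  The input is simply a list of recipient counts, one per message, however you extract them from your logs; the function name is invented for this example, and it assumes that every message is the same size and that all recipients live on the same store, both of which flatter SIS.

```python
def sis_savings(recipients_per_message):
    """Return (average recipients per message, fraction of stored copies saved).

    Without SIS every recipient gets a copy; with SIS each message is
    stored exactly once.
    """
    if not recipients_per_message:
        raise ValueError("need at least one message")
    messages = len(recipients_per_message)
    copies = sum(recipients_per_message)
    average = copies / messages
    saved_fraction = 1.0 - messages / copies
    return average, saved_fraction


if __name__ == "__main__":
    # Made-up sample: 7 messages with one recipient, 3 with two (average 1.3).
    sample = [1] * 7 + [2] * 3
    average, saved = sis_savings(sample)
    print(f"average recipients per message: {average:.2f}")
    print(f"copies saved by SIS: {saved:.1%}")
```

	With an average of 1.3 recipients per message, per-recipient storage holds 30% more copies than single-instance storage would, which works out to roughly 23% of the copies on disk being removed.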

>  Heck, you could even store your indices on a SCSI drive, and then
>  store your SIS on an IDE drive, if you wanted.

	See above.  This is pointless.

>  Mark's wrong.  His assumptions are incorrect, and based on the
>  idea that metadata updates are not synchronous in all systems.

	Meta-data updates are at least partially synchronous on all 
systems I know of.  Well, unless you are running with asynchronous 
mounts, but if you're doing that then you shouldn't be running a mail 
system until you understand why that's a bad idea.

	Even if they're not synchronous, they're still bottlenecks to be 
avoided if possible.

>  Cyrus is much closer to commercial usability, but it has its own
>  set of problems, too.

	It is somewhat closer.  If you want real commercial usability, 
you have to start with the MessagingDirect code, which is based on 
Cyrus but with lots of bug fixes, increased reliability and 
robustness, etc.  Then you graduate to Sendmail Advanced Message 
Server, which takes that to the next level.

>>          Either way, locking is a very important issue that has to be
>>  solved, one way or the other.
>
>  No, it's a very important issue that has to be designed around,
>  rather than implemented.

	Somebody said that when they invented Maildir.  I didn't believe 
it then, and I don't believe it now.

>  Yes, and no.  It's very easy to paint a rosy picture in a technical
>  paper, particularly when you are in a position to need to obtain
>  funding.

	Nick didn't need any funding.  He was describing a project that 
was largely complete, and which he had already left by that time.  He 
definitely made use of that design at various customer sites while 
working for Sendmail, but he couldn't possibly have known that at the 
time.

>  You are unlikely to ever find someone using NFS in this capacity,
>  except as a back end for a single server message store.

	Show me an IMAP server that actually implements SIS.  I don't know of any.

>  The point was that, without making changes requiring an in depth
>  understanding of the code of the components involved, which Nick's
>  solution doesn't really demonstrate, you're never going to get more
>  than "marginally better" numbers.

	Could be.  In that case, we may have to find an alternative 
message store solution.  If I can prove that this really is a 
problem, then I'll try to help them find a suitable SAN solution and 
then drop in SAMS.  If not, I may end up writing a paper or doing 
another invited talk.

>  It works on NFS.  You just have to run the delivery agent on the
>  same machine that's running the access agent, and not try to mix
>  multiple hosts accessing the same data.

	Nope.  mmap on NFS doesn't work.

>  I understand you want a distributed, replicated message store, or
>  at least the appearance of one, but in order to get that, well,
>  you have to "write a distributed, replicated message store".

	A distributed, replicated message store would be nice, but is not 
strictly a requirement of this solution.  One thing that was 
originally given as an absolute requirement was to find a way to put 
an e-mail front end on NFS.  The distributed, replicated message 
store was a side-effect.

	Indeed, the architecture already has a concept of a primary 
server for a particular mailbox (as determined by LDAP), the only 
thing we'd have to change is whether or not that mailbox was also 
accessible from the other servers.  However, we do have only one 
message store mount point at the moment.

>  The part of Netscape that Sun bought used to provide an IMAP4
>  server (based on heavily modified UW IMAP code).  Is there a
>  reason you can't use that?  I guess the answer must be "I have
>  been directed to use Open Source".  8-).

	Actually, no.  They would much prefer commercial software. 
However, they don't have any money to spend on software, and I know 
from personal experience that the Netscape/iPlanet stuff doesn't 
scale.  Indeed, we're already in the process of scrapping all other 
Netscape/iPlanet software because we've had excessive problems with 
it.

>  This should be no problem.  You should be able to handle this
>  with a single machine, IMO, without worrying about locking, at
>  all.

	Remember, Maildir doesn't do locking.
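
	For reference, the usual Maildir delivery convention avoids locks by writing each message to a uniquely named file under tmp/ and then renaming it into new/.  The sketch below is a minimal illustration of that convention, not the code of any particular MTA or delivery agent; the function name and the exact form of the unique name are this example's own choices.

```python
import itertools
import os
import socket
import time

_delivery_counter = itertools.count()


def maildir_deliver(maildir, message_bytes):
    """Deliver one message into `maildir` without taking any lock.

    The message is written under tmp/ and then renamed into new/, so a
    reader scanning new/ never sees a partially written file.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Characters that are special in Maildir file names are escaped.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = f"{int(time.time())}.P{os.getpid()}Q{next(_delivery_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL makes the create fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(message_bytes)
        handle.flush()
        os.fsync(handle.fileno())

    os.rename(tmp_path, new_path)  # atomic within a single local filesystem
    return new_path
```

	Note what this costs: every delivery is at least one file creation and one rename, which is exactly the kind of meta-data traffic discussed earlier in this message.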

>        10,000 client machines is nothing.

	10,000 LAN clients?  With 44MB messages and 200MB mailboxes?  On 
NFS?  Sorry, my testing so far indicates that this is a significant 
load and we need to take care to make sure that it is handled 
properly.

>                                            At worst, you should
>  separate inbound and outbound SMTP servers,

	Already planned.

>                                              so you can treat the
>  inbound one as a bastion host, and keep the outbound entirely
>  inside, and the inbound server should use a transport protocol
>  for internal delivery to the machine running the IMAP4 server,
>  which makes locking go away.

	How does locking go away?  Through Maildir?  Or did you have 
something else in mind?

>                                At worst, you can limit the number
>  of bastion to internal server connections, which will make things
>  queue up at the bastion, if you get a large activity burst, and
>  let it drain out to the internal server, over time.

	I'm not worried about internal SMTP connections.  But we have to 
be careful to make sure we don't put any additional limits on POP3 or 
IMAP connections.

>                                                       At most,
>  you are well under 40,000 simultaneous TCP connections to the
>  IMAP4 server host, even if you are using OutLook, people have
>  two mailboxes open, each, and are monitoring incoming mail in
>  several folders.

	Sorry, I am still not convinced.

-- 
Brad Knowles, <brad.knowles@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
