Date:      Wed, 12 Feb 2003 09:17:23 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Brad Knowles <brad.knowles@skynet.be>
Cc:        Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Subject:   Re: Email push and pull (was Re: matthew dillon)
Message-ID:  <3E4A81A3.A8626F3D@mindspring.com>
References:  <20030211032932.GA1253@papagena.rockefeller.edu>	 <a05200f2bba6e8fc03a0f@[10.0.1.2]>	 <3E498175.295FC389@mindspring.com> <a05200f37ba6f50bfc705@[10.0.1.2]> <3E49C2BC.F164F19A@mindspring.com> <a05200f43ba6fe1a9f4d8@[10.0.1.2]>

Brad Knowles wrote:
> At 7:42 PM -0800 2003/02/11, Terry Lambert wrote:
> >  Actually, I disagree with this presentation.  There is an assumption
> >  of "sufficient storage" there in the first place, when in practice,
> >  sufficient storage is commonly a prime economic issue.  If people
> >  were satisfied with the job "the big providers" were doing, there
> >  would be no little providers.
> 
>         In practice, the sole limiting factor of mail systems is
> synchronous meta-data update latency and related I/O throughput.
> You're far, far better off getting large numbers of smaller disks and
> putting them in a RAID 1+0 environment (or even RAID 1+0+0), with a
> large enough stripe size that almost all transactions can be taken
> care of as one logical operation.

In terms of I/O throughput, you are right.

But in this case we are not interested in I/O throughput; we
are interested in minimizing dynamic pool size, for a given
pool retention time function, over a given input and output
volume.

It's best if you consider each POP3 maildrop as a queue.


>         This is very much like running a large-scale USENET news server.
> The total quantity of disk space is meaningless, if you don't have
> enough heads working for you in parallel.

The Usenet parallel is probably not that apt.  Usenet provides
an "Expires:" header, which bounds the pool retention time to a
fixed interval, regardless of volume.

In the POP3 case, the volume is bounded, regardless of time.  For
email servers which contain maildrops, it is a different problem
entirely.
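
To put rough numbers on the difference, here is a minimal sketch.
It assumes a steady arrival rate and a uniform per-user quota; the
function names and figures are made up for illustration, not taken
from any real server.

```python
def time_bounded_pool(arrival_bytes_per_day: float, expire_days: float) -> float:
    """Usenet-style pool: everything leaves after a fixed interval, so the
    steady-state size is arrival rate * retention time (Little's law)."""
    return arrival_bytes_per_day * expire_days


def volume_bounded_pool(maildrops: int, quota_bytes: int) -> int:
    """POP3-style pool: each maildrop is capped by its quota, so the worst
    case is fixed no matter how long the messages sit there."""
    return maildrops * quota_bytes


# Hypothetical figures: 50 GB/day of news kept 7 days,
# versus 50,000 maildrops with a 10 MB quota each.
news_pool = time_bounded_pool(50e9, 7)
mail_pool = volume_bounded_pool(50_000, 10 * 1024 * 1024)
```

Doubling the feed doubles the first number and leaves the second
alone; stretching the retention time does the same.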

I think the problem is that you are trying to apply transit server
mechanics to what is, effectively, a destination queue, as opposed
to a destination maildrop.

Even if it were a maildrop, if you enforce a quota, you can make
the argument for fixed time: once the pool fills (the maildrop
goes over quota), the messages pile up in the main mail
queue (without an MDA reservation protocol, anyway), and the
time a message may remain undelivered in the main queue is not
relevant to volume, but the volume is bound to the quota.  There
is just an added element of hysteresis in the main queue retention
limit (e.g. 4 days).  Even so, this is only effective if you can
guarantee that delivery will occur through the maildrop being brought
below quota in that interval -- which you cannot do, since that
is not under your control [ Actually, I'd argue that your queue
retention time would have to be in excess of twice the maximum
polling interval, plus one, for it to become a factor again ].
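
A toy simulation of that interaction, as a sketch only.  The
assumptions are all mine: one unit-sized message arrives per hour,
a poll empties the maildrop completely, and the names (simulate,
quota, poll_interval, retention) are hypothetical.

```python
from collections import deque
from typing import Optional, Tuple


def simulate(hours: int, quota: int, poll_interval: Optional[int],
             retention: int) -> Tuple[int, int, int]:
    """Return (delivered, bounced, peak maildrop size) after `hours` ticks."""
    maildrop = 0            # messages currently sitting in the maildrop
    main_queue = deque()    # arrival hour of each message awaiting delivery
    delivered = bounced = peak = 0
    for now in range(hours):
        # The client polls: it downloads and deletes everything.
        if poll_interval and now > 0 and now % poll_interval == 0:
            maildrop = 0
        main_queue.append(now)          # one new message this hour
        # Bounce anything held in the main queue past the retention limit.
        while main_queue and now - main_queue[0] > retention:
            main_queue.popleft()
            bounced += 1
        # Deliver from the main queue while the maildrop is under quota.
        while main_queue and maildrop < quota:
            main_queue.popleft()
            maildrop += 1
            delivered += 1
        peak = max(peak, maildrop)
    return delivered, bounced, peak


never_polled = simulate(hours=48, quota=5, poll_interval=None, retention=4)
polled_often = simulate(hours=48, quota=5, poll_interval=3, retention=4)
```

In both runs the maildrop never exceeds the quota.  What the
retention limit changes is how long the overflow waits in the main
queue before it bounces, and whether a poll arrives in time to
rescue it is up to the client, not the server.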



> So, to solve the disk space issue, you just buy some slightly larger disks.
> 
>         Been there, done that.  We learned this lesson a long time ago at
> AOL.  Doing single-instance-store is a false economy, and indeed is
> one of the key limiting factors for LAN e-mail packages like cc:Mail,
> Lotus Notes, Microsoft Mail, or Microsoft Exchange.  This is one of
> the primary reasons why they don't scale.

Again, I disagree.  Poor design is why they don't scale.


> >  As to the metadata updates: I'm not positive this is the case,
> >  though it is certainly the case in "sendmail" and certain other
> >  MTAs.  It's really implementation dependent, I think, and the
> >  slide assumes a particular implementation.
> 
>         These slides have absolutely nothing whatsoever to do with the
> MTA.  They have to do with the mailbox, mailbox delivery, mailbox
> access, etc....  You need message locking in some fashion, you may
> need mailbox locking, and most schemes for handling mailboxes involve
> either re-writing the mailbox file, or changing file names of
> individual messages (or changing their location), etc....  These are
> all synchronous meta-data operations.

You do not need all the locking you state you need, at least not
at that low a granularity.  If you want to talk AOL, AOL used to
use VMS systems, which used record locking.


>         See
> <http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld062.htm>
> through
> <http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld081.htm>.
> Then ask yourself how these issues relate to the operation of
> UW-IMAP, Courier-IMAP, and Cyrus.

I will have to examine these references at a later time.

>         If you want a detailed discussion of these points, I'll be more
> than happy to get into this.

...and then decide on this.


>         However, keep in mind that I've already had a deep discussion of
> these points with Nick Christenson (author of the book _Sendmail
> Performance Tuning_,

Sendmail performance tuning is not the issue, although if you
are a transit server for virtual domains, you should rewrite the
queueing algorithm.  See:

	ftp://ftp.whistle.com/pub/misc/sendmail/


> as well as the classic paper "A Highly Scalable
> Electronic Mail Service Using Open Systems" at
> <http://www.jetcafe.org/~npc/doc/mail_arch.html>, among others) and
> other large-scale Internet e-mail experts, and I don't think that
> we're likely to add much to this topic here.

The Open Source book is wrong.  You cannot build such a system
without significant modification.  My source tree, for example,
contains more than 6 million lines of code at this point, and
about 250,000 of those are mine, modifying Cyrus, modifying
OpenLDAP, modifying Sendmail, modifying BIND, etc.


> >  The scaling issue at 150 is an implementation detail specific to
> >  the implementations you are referencing.  My advice is "get some
> >  real software".  8-).  You can scale something like this from
> >  Open Source components and about 3 months of concentrated coding
> >  to at least 50,000 per indivisible component cluster, and then
> >  throw hardware at it.
> 
>         Concentrated coding?  Open source?  Where?  What specific pieces
> of software are you envisioning?  Who would be doing the coding?  If
> it's that simple, then why hasn't this code already been written?

Because Open Source projects are inherently incapable of doing
productization work.  It's antithetical to their nature.  They
are also generally incapable of doing systems integration.  That's
also antithetical to their nature, since systems integration
requires metacooperation between projects.

This is why commercial software really has nothing to fear from
pure Open Source.


>         I'm trying to build a LAN e-mail system today using open source
> software because certain constraints have been put on this project
> (e.g., they have literally $0 to spend on new hardware, mailboxes
> must be stored on NFS, the old multi-purpose machines must be
> replaced and a new mail system must be able to gradually take over),
> and I was painted into a corner before I ever became a member of this
> project.

$0 is not really true.  They are paying for you, in the hopes
that it will end up costing less than a prebuilt system.


>         I've done the best I can in the circumstances available, but I am
> still very, very concerned that given the best available solutions I
> can find for each of the components involved, the replacement still
> is not going to perform well enough to be considered anything other
> than "b0rken".

Contact Stanford, MIT, or other large institutions which have
already deployed such a system.


> >  The normal implementation to avoid single point of failure is
> >  replication of the data.  At that point, the replication
> >  protocol itself becomes (essentially) a transport.  The worst
> >  case failure mechanism, in that case, is a previously deleted
> >  message reappears.
> 
>         Correct, in theory.  Where's the practice?

Not in Open Source; Open Source does not perform productization or
systems integration.
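
For what it's worth, the worst-case failure quoted above (a
previously deleted message reappearing) is easy to show in a toy
model.  This is a sketch under my own assumptions: each replica is
just a set of message IDs, and the merge keeps no record of
deletions; the names are hypothetical.

```python
def merge_without_tombstones(a: set, b: set) -> set:
    """Naive replica merge: with no record of deletions, the only safe
    choice is the union, so anything either side still holds survives."""
    return a | b


def demo() -> set:
    replica_a = {"msg-1", "msg-2"}
    replica_b = {"msg-1", "msg-2"}
    # The user deletes msg-2 on replica A while B is unreachable.
    replica_a.discard("msg-2")
    # When the replicas resynchronize, msg-2 comes back from B.
    return merge_without_tombstones(replica_a, replica_b)


merged = demo()
```

No mail is lost this way; the user just sees msg-2 again.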



> >  I think they do it on the basis of their name, and on the basis
> >  of the idea that "you can put any SQL server behind MS Exchange,
> >  so use ours, it's better than Microsoft's".
> 
>         Uh, no.  Take a closer look at the pitch.  They are completely
> and totally replacing Exchange.  There are no Microsoft components
> left.  The architecture is totally unlike Exchange.  The only thing
> they're doing is pitching a solution that is compatible with Exchange
> at the highest protocol levels, enough to fool Outlook.

I was unaware of this from the pitch on the billboards and WSJ
advertisements.  I rather expect anyone buying it will make the
same assumptions I did, that when they were talking about making
"MS Exchange Rock Solid", they were talking about leaving it there.
8-).

-- Terry
