Date:      Sun, 16 Feb 2003 03:02:52 +0100
From:      Brad Knowles <brad.knowles@skynet.be>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Brad Knowles <brad.knowles@skynet.be>, Rahul Siddharthan <rsidd@online.fr>, freebsd-chat@freebsd.org
Subject:   Re: Email push and pull (was Re: matthew dillon)
Message-ID:  <a05200f02ba7492f1c777@[10.0.1.4]>
In-Reply-To: <3E4DB5DE.D3EA1FE4@mindspring.com>
References:  <20030211032932.GA1253@papagena.rockefeller.edu> <a05200f2bba6e8fc03a0f@[10.0.1.2]> <3E498175.295FC389@mindspring.com> <a05200f37ba6f50bfc705@[10.0.1.2]> <3E49C2BC.F164F19A@mindspring.com> <a05200f43ba6fe1a9f4d8@[10.0.1.2]> <3E4A81A3.A8626F3D@mindspring.com> <a05200f4cba70710ad3f1@[10.0.1.2]> <3E4B11BA.A060AEFD@mindspring.com> <a05200f5bba7128081b43@[10.0.1.2]> <3E4BC32A.713AB0C4@mindspring.com> <a05200f07ba71ee8ee0b6@[10.0.1.2]> <3E4CB9A5.645EC9C@mindspring.com> <a05200f14ba72aae77b18@[10.0.1.2]> <3E4D7702.8EE51F54@mindspring.com> <a05200f01ba734065f08e@[10.0.1.4]> <3E4DB5DE.D3EA1FE4@mindspring.com>

At 7:37 PM -0800 2003/02/14, Terry Lambert wrote:

>  I like to build systems that are inherently scalable, so that you
>  can just throw more resources at them.

	The overall architecture that has been delivered should scale to 
tens of thousands, hundreds of thousands, or millions of users.  It's 
the same architecture I delivered at LISA 2000.

	The one thing that probably won't scale is the NFS server.  But 
that's not something I can change.

>  You have to specify it on the NFS server system,

	Do you know how to do that with NetApp filers?

>                                                   or you specify it
>  on the NFS client system, so the transactions are not attempted,

	Already tried that.  However, mount_nfs complained that "noatime" 
was not a recognized option.

>                                                                   if
>  you care about the request going over the wire and being ignored,
>  rather than just being ignored.  Since the issue you are personally
>  fighting is write latency for requests you should probably not be
>  making (8-)), you ought to work hard on this.

	I agree that expensive operations should be avoided if at all 
possible, but re-writing the entire OS, the entire NFS server code, 
filesystem code, or other significant components of the system is not 
an option.  We have to deliver in a matter of just a very few weeks 
-- we can't afford to take five years to make this happen.

>  If your NFS client systems are FreeBSD, then it's a fairly minor
>  hack to add the option to the NFS client code.  If your NFS server
>  is FreeBSD, then turning it off on the exported mount is probably
>  sufficient.
>
>  If neither is FreeBSD... well, can you switch to FreeBSD?

	Regretfully not.  Moreover, even if we could, it would only be on 
the client.  That only solves part of the problem, and perhaps not 
the biggest part.

>  Engineering.

	How much?  How many person-decades?

>  It's not intended to "hurt them".  It's only intended to deal with
>  multiple recipients for a single message.  SPAM is almost the only
>  type of mail that's externally generated that gets multiple recipient
>  targets.

	We get almost no spam.  We do get some, and some people have 
complained.  The company took an internal survey and only something 
like 10% of the respondents felt that the company should spend any 
money to try to curb this problem, and most of the respondents felt 
that they'd be perfectly happy using client-side features to solve 
this issue.

>            The point is not to "hurt" them (if you wanted that, you
>  would run RBL or ORBS or SPEWS or ... and not accept connections from
>  their servers in the first place), but to mitigate their effect on
>  your storage costs.

	Spam doesn't come in 44MB messages.  The amount of disk storage 
required for spam is almost nothing.  My primary disk space problem 
is friggin' morons in marketing or product research that seem to feel 
that they have a God-given right to send the entire contents of their 
hard drives via e-mail, or people illegally sending entire video 
movies or illegal mp3s via e-mail.

	The worst thing of all is when these idiots send this kind of 
crap to someone and they are ignorant of where that person is 
located.  Many of the remote offices are behind very slow links (ISDN 
or 56k), and have ancient mail/everything else servers that have 
very, very little disk space for /var/spool/mqueue or /var/spool/mail.

	When some bloody VP in the US insists on sending a 50MB e-mail 
message on a daily basis to a guy in Asia (who is served through 
the European hub, which we run), and the mail server for the guy in 
Asia doesn't even have 50MB free in /var/spool/mqueue on a good day, 
that is a much, much bigger headache.


	Trust me, disk space problems due to spam are the least of my worries.

>                       Note that this is the same philosophy you've been
>  espousing all along, with quotas: you don't care if it causes a problem
>  for your users, only if it causes a problem for you.

	No, I do care.  However, I have to be practical.  Moreover, if 
management has taken an internal survey on a subject and almost none 
of the respondents support the idea of actually spending money to try 
to solve the problem, then I can't really take this as a mandate to 
go convince management that they have to delay the project (and other 
projects depending on ours freeing up resources that they need).

>  Internally, you have a higher connectedness between users, so you
>  get much larger than your 1.3 multiplier, and for email lists, it's
>  higher still.

	They don't really have any mailing lists.  I installed the 
majordomo server a few months ago, and there has been relatively 
little use of it.  Part of that is because the current mail system is 
so screwed up that it can only be used for one-way announcements, but 
even after our replacement is online, I don't expect them to make all 
that much use of mailing lists.

>                 In fact, I would go so far as to say that DJB's idea
>  of sending a reference is applicable to email list messages, only
>  the messages would be stored on the list server, instead of on the
>  sender machine.

	In essence, they do this already, and have been doing it for 
quite some time.  They have a very healthy community of internal 
USENET news participants, and a very broad array of internal news 
groups that they have set up.  So, we don't need to re-invent the 
entirety of Internet e-mail for them.

>  At the point that you no longer care which machine you send a user
>  connection to to retrieve their mail, then you no longer care where
>  you send the mail,

	No, that's precisely the problem.  I *do* care where I send them 
to get their mail, or where their mail is sent.  But what I want to 
do is to hide from them where this location is, because they *don't* 
care (or shouldn't have to).

>                     or if the mail is single instance multiple
>  time, a real replica, or a virtual replica (SIS).  It takes a small
>  amount of additional work.

	I think the BS meter just broke.

	Sorry, guy.  Re-writing the entire mail system just to implement 
SIS seems like the dumbest possible idea.  Until you've done this for 
UW-IMAP, Courier-IMAP, or Cyrus, and you're willing to release that 
code back to the public, I don't think you've got a leg to stand on.

>  The repeated mailing ("mail bombing") that started this thread is,
>  or should be, simple to detect.

	Could be.  How would you do it?  Statistical techniques to 
discern how often a particular account normally receives mail based 
on the last five minutes, the last hour, day, or comparing today 
against a similar time period from yesterday, or last week, or last 
month?

	The cfengine technique does have a lot going for it, but for a 
person who normally receives a lot of e-mail, there'd be a good 
chance that it wouldn't detect a "mail bombing" of this sort.  You'd 
have to go into scanning the content of the message for that -- too 
many messages in a given period of time that have essentially the 
same body content, or whatever.  But what period of time?
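
	The rate-comparison idea above can be sketched as follows.  This 
is only a minimal illustration, assuming a five-minute window 
compared against a one-day baseline.  The class name, window sizes, 
multiplier, and minimum count are all invented for the example, and 
it makes no claim to be the cfengine method.

```python
from collections import deque


class RateBaselineDetector:
    """Flag a mailbox whose recent arrival rate far exceeds its baseline.

    All defaults are hypothetical values chosen for illustration.
    """

    def __init__(self, short_window=300, long_window=86400,
                 factor=10.0, min_count=20):
        self.short_window = short_window  # "last five minutes", in seconds
        self.long_window = long_window    # baseline period, here one day
        self.factor = factor              # how far above baseline counts as a burst
        self.min_count = min_count        # never flag fewer messages than this
        self.arrivals = deque()           # arrival timestamps, oldest first

    def record(self, now):
        """Record one arrival at time `now`; return True if it looks like a burst."""
        self.arrivals.append(now)
        # Forget arrivals that have fallen out of the baseline period.
        while self.arrivals and self.arrivals[0] <= now - self.long_window:
            self.arrivals.popleft()
        recent = sum(1 for t in self.arrivals if t > now - self.short_window)
        # Expected messages per short window, averaged over the long window.
        baseline = len(self.arrivals) * self.short_window / self.long_window
        return recent >= self.min_count and recent > self.factor * baseline
```

	With these made-up numbers, a quiet account is flagged on the 
20th message of a one-per-second burst, while an account that already 
receives about 2000 messages a day is not flagged by an extra 30 
messages in five minutes, which is exactly the weakness described 
above.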

>  Yes, it's a trivial case, but it's the most common case.  You don't
>  have to go to a compute-intensive technique to deal with it.

	Once you start looking at the message body content, you've 
already started down a slippery slope.  You've told the user that you 
will guarantee that they don't ever get mail-bombed, but as the 
incoming messages are more and more diverse, at what point do you 
decide that they really are different?

	What can you say to the user when you've accidentally thrown away 
a legitimate mail message because it seemed to be too similar to a 
previous one that they had received?

>  You are storing the reference wrong.  Use an encapsulated reference,
>  not a hard link.

	Give me the file system to do that.  Then port it to Solaris and 
NetApp filer.

>                    That will permit the metadata operations to occur
>  simultaneously, instead of constraining them to occur serially, like
>  a link does.

	Hell, if it can do all this kind of magic, just give it to me for 
FreeBSD.  Who needs softupdates, or dirpref, or dirhash, or any of 
that other crap, when you've got TerrysMagicFS?

>  You keep saying this, and then you keep arranging the situation
>  (order of operations, FS backing store, network transport protocol,
>  etc.) so that it's true, instead of trying to arrange them so it
>  isn't.

	No, the system is what it is.  There are certain things that 
cannot be changed.  One of them is the NetApp filers for NFS. 
Another is Solaris/SPARC.  Another is that they absolutely, 
positively, do not want *ANY* source code changes made whatsoever to 
any of the programs or packages they are using.

	They're already looking at me funny for specifying that we should 
be using sendmail from sendmail.org (which we actually commit the 
mortal sin of compiling ourselves) as opposed to the vendor-supplied 
sendmail, or for actually going in and making source-code level 
changes to procmail (which are required if you want to use Maildir 
format mailboxes, or a hashed mailbox structure).
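
	For readers unfamiliar with the term, a hashed mailbox structure 
spreads mailboxes over several directory levels so that no single 
directory has to hold every user.  A minimal sketch, assuming two 
levels keyed on a SHA-256 digest of the user name.  The function 
name, root path, and hashing scheme are hypothetical, and are not 
necessarily what the procmail changes implement.

```python
import hashlib
import posixpath


def hashed_maildir(user, root="/var/mail/hashed", levels=2):
    """Return a mailbox path of the form <root>/<h1>/<h2>/<user>.

    h1, h2, ... are the leading hex digits of a digest of the user name,
    so mailboxes are spread roughly evenly over 16**levels directories.
    The root path and digest choice are assumptions for illustration.
    """
    digest = hashlib.sha256(user.encode("utf-8")).hexdigest()
    # Unpack the first `levels` hex digits as one directory component each.
    return posixpath.join(root, *digest[:levels], user)
```

	The same user name always maps to the same path, so delivery 
and retrieval can both compute the location without a lookup table.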


	I am honestly looking for advice on how I can make this work as 
well as I can, and you keep telling me that the only way to do 
anything at all is to completely rewrite the entire OS of both the 
client and the server.

	Granted, I shouldn't be looking for help on non-FreeBSD systems 
on a FreeBSD mailing list, but what you're giving me is not very 
helpful.

	If you want to tell me that I'm screwed and I shouldn't even 
bother, that's fine.  But don't waste my time and yours by continuing 
to harangue me and stress how vitally important it is for me to 
completely re-invent all the protocols being used, completely 
re-write all the OSes and filesystems being used, etc....

>  No, we are not.  The transport protocols are the transport protocols,
>  and you are constrained to implement to the transport protocols, no
>  matter what else you do.  But you are not constrained to depend on
>  rename-based two phase commits (for example), if your FS or data
>  store exports a transaction interface for use by applications: you
>  can use that transaction interface instead.

	At the ivory tower level, I'm sure that statement makes sense. 
But don't tell me to eat cake when there isn't even any bread around.
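
	For reference, the rename-based commit mentioned in the quoted 
text is what Maildir-style delivery relies on: write the message 
under tmp/, then rename it into new/, so that a reader never sees a 
partial file.  A minimal sketch, assuming a local POSIX filesystem. 
The function name and the file-naming scheme are illustrative only, 
and NFS adds complications that this ignores.

```python
import itertools
import os
import socket
import time

_sequence = itertools.count()  # keeps names unique within one process


def deliver(maildir, data):
    """Write `data` (bytes) into maildir/tmp, then rename it into maildir/new."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Illustrative unique name: time, pid, per-process counter, host name.
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    name = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(),
                             next(_sequence), host)
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # Phase 1: write the whole message where readers do not look.
    with open(tmp_path, "xb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Phase 2: a single rename makes the message visible all at once.
    os.rename(tmp_path, new_path)
    return new_path
```

	Each delivery costs a create, a write, and a rename, which is 
the metadata traffic the two sides of this argument are weighing 
against a transaction interface.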

>  I have to say, I've personally dealt with "help desk" escalations
>  for problems like this, and it's incredibly labor intensive.  You
>  should always design as if you were going to have to deal with
>  100,000 customers or more, so that you put yourself in a position
>  that manual processes will not scale, and then think about the
>  problem.

	They've got 6000 employees world-wide, and are already the 
world's largest company in their niche.  They were spun off years ago 
by Philips, because they felt that this wasn't a core part of their 
business.  I don't think they're ever going to get particularly 
large.  And if they do, the architecture will scale, even if the 
implementation won't.


	On the other hand, I don't have to run or pay for the help desk 
operation, and when it comes to quotas, not implementing SIS, not 
re-writing the entire filesystem or OS for both the clients and the 
server, etc..., well those aren't things that my customer has paid me 
to do.

	They have set out some pretty specific guidelines as to what they 
want, and I have tried to give them the best possible architecture 
and best possible implementation that I can, within those guidelines.

>  Something simple like recognizing repetitive size/sender/subject
>  pairing on the SMTP transit server.

	And you would do that how?  And you would make use of that 
information how?
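
	One possible answer to the first question, as a minimal sketch: 
count messages that share the same (sender, subject, size) triple 
within a sliding time window.  The class name, window, and threshold 
are assumptions for illustration, and how the result would be hooked 
into any particular MTA is left open.

```python
from collections import defaultdict, deque


class RepeatTracker:
    """Count near-identical messages seen recently on a transit server.

    `window` (seconds) and `threshold` are hypothetical tuning values.
    """

    def __init__(self, window=600, threshold=5):
        self.window = window
        self.threshold = threshold
        # (sender, subject, size) -> arrival times, oldest first.
        # Note: keys are never expired here; a real deployment would prune them.
        self.seen = defaultdict(deque)

    def observe(self, sender, subject, size, now):
        """Record one message; return True once the triple repeats too often."""
        key = (sender.lower(), subject.strip().lower(), size)
        times = self.seen[key]
        times.append(now)
        # Drop sightings that have aged out of the window.
        while times and times[0] <= now - self.window:
            times.popleft()
        return len(times) >= self.threshold
```

	What to do with a positive result is a policy choice.  Deferring 
the message or alerting an operator, rather than discarding anything, 
would at least avoid silently losing a legitimate message that 
happens to resemble an earlier one.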

>  Ugh.  Would you, as a user, bet your company on that level of service?

	Sure.  I've always tried to provide the same level of service 
that I would want for myself, and I can be a very demanding customer. 
Indeed, I am in the process of setting up my own consulting company, 
and I will have a co-lo, where I run my own mail server.  I won't be 
going quite as far out for myself as I have for them, because my own 
personal needs aren't quite so large.  But I cannot strive to provide 
myself any better level of service than I have done for them.

>  2.8.  It's not like OpenLDAP, which needs the transactioning interfaces,
>  it's pretty straight-forward code.

	From the cyrus-2.0.14 distribution, doc/install-prerequisites.html:

      * Berkeley DB, version 3.0.55 or higher. Berkeley DB can be obtained
        from Sleepycat. It is strongly recommended that libsasl be compiled
        with Berkeley DB support, using the same version of Berkeley DB. (If
        you have a Berkeley DB version mismatch, somewhat perplexing crashes
        result.)

	From the cyrus-2.0.14 distribution, README:

		The 2.0 release contains many new features over 1.6.

		No further development work will progress on 1.6 so we
		encourage you to work with the Cyrus 2.0 code wherever
		possible.  Support for 1.6 will also be extremely limited.


	Sorry, guy.  Not an option.

>>          Certainly, when it comes to SAMS, all this stuff is pre-compiled
>>  and you don't get the option of building Berkeley DB in a different
>>  manner, etc....
>
>  Yes, you end up having to compile things yourself.

	With SAMS, you don't get that option.  That's about the only 
negative thing that I can say about it.

>  If you have control over the clients, you can avoid making update
>  requests.  If you have no control over either, well, "Bad news, Clem".

	I have no control over the clients.

>  Though... you *could* allow any of the replicas to accept and queue
>  on behalf of the primary, but then deliver only to the primary;
>  presumably you'd be able to replace a primary in 7 days.

	Doesn't help.  The front-end servers will accept and queue on 
behalf of the users.  I don't need yet another level of queueing on 
the back-end servers as well.

-- 
Brad Knowles, <brad.knowles@skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message



