From owner-freebsd-chat Sat Feb 15 18:03:09 2003
Delivered-To: freebsd-chat@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 89D7A37B401
    for ; Sat, 15 Feb 2003 18:03:00 -0800 (PST)
Received: from picard.skynet.be (picard.skynet.be [195.238.3.88])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 6835343FA3
    for ; Sat, 15 Feb 2003 18:02:58 -0800 (PST)
    (envelope-from brad.knowles@skynet.be)
Received: from [10.0.1.4] (ip-26.shub-internet.org [194.78.144.26] (may be forged))
    by picard.skynet.be (8.12.7/8.12.7/Skynet-OUT-2.21) with ESMTP id h1G22nHh017181;
    Sun, 16 Feb 2003 03:02:54 +0100 (MET)
    (envelope-from )
Mime-Version: 1.0
X-Sender: bs663385@pop.skynet.be
Message-Id: 
In-Reply-To: <3E4DB5DE.D3EA1FE4@mindspring.com>
References: <20030211032932.GA1253@papagena.rockefeller.edu>
    <3E498175.295FC389@mindspring.com> <3E49C2BC.F164F19A@mindspring.com>
    <3E4A81A3.A8626F3D@mindspring.com> <3E4B11BA.A060AEFD@mindspring.com>
    <3E4BC32A.713AB0C4@mindspring.com> <3E4CB9A5.645EC9C@mindspring.com>
    <3E4D7702.8EE51F54@mindspring.com> <3E4DB5DE.D3EA1FE4@mindspring.com>
Date: Sun, 16 Feb 2003 03:02:52 +0100
To: Terry Lambert
From: Brad Knowles
Subject: Re: Email push and pull (was Re: matthew dillon)
Cc: Brad Knowles , Rahul Siddharthan , freebsd-chat@freebsd.org
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
List-ID: 
List-Archive:  (Web Archive)
List-Help:  (List Instructions)
List-Subscribe: 
List-Unsubscribe: 
X-Loop: FreeBSD.org

At 7:37 PM -0800 2003/02/14, Terry Lambert wrote:

> I like to build systems that are inherently scalable, so that you
> can just throw more resources at them.

The overall architecture that has been delivered should scale to tens of thousands, hundreds of thousands, or millions of users. It's the same architecture I delivered at LISA 2000. The one thing that probably won't scale is the NFS server.
But that's not something I can change.

> You have to specify it on the NFS server system,

Do you know how to do that with NetApp filers?

> or you specify it
> on the NFS client system, so the transactions are not attempted,

Already tried that. However, mount_nfs complained that "noatime" was not a recognized option.

> if
> you care about the request going over the wire and being ignored,
> rather than just being ignored. Since the issue you are personally
> fighting is write latency for requests you should probably not be
> making (8-)), you ought to work hard on this.

I agree that expensive operations should be avoided if at all possible, but re-writing the entire OS, the entire NFS server code, filesystem code, or other significant components of the system is not an option. We have to deliver in a matter of just a very few weeks -- we can't afford to take five years to make this happen.

> If your NFS client systems are FreeBSD, then it's a fairly minor
> hack to add the option to the NFS client code. If your NFS server
> is FreeBSD, then turning it off on the exported mount is probably
> sufficient.
>
> If neither is FreeBSD... well, can you switch to FreeBSD?

Regretfully not. Moreover, even if we could, it would only be on the client. That only solves part of the problem, and perhaps not the biggest part.

> Engineering.

How much? How many person-decades?

> It's not intended to "hurt them". It's only intended to deal with
> multiple recipients for a single message. SPAM is almost the only
> type of mail that's externally generated that gets multiple recipient
> targets.

We get almost no spam. We do get some, and some people have complained. The company took an internal survey and only something like 10% of the respondents felt that the company should spend any money to try to curb this problem, and most of the respondents felt that they'd be perfectly happy using client-side features to solve this issue.
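Terry's single-instance idea (store one copy of a multi-recipient message and give each recipient only a reference to it) can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions: the SHA-256 digest of the body is taken as the message's identity, and the class and method names are hypothetical. It is not taken from UW-IMAP, Courier-IMAP or Cyrus, and it says nothing about doing this on NFS-backed storage.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SingleInstanceStore:
    """Hypothetical single-instance store (SIS): each distinct message body is
    kept once, and recipient mailboxes hold only a reference (its digest)."""
    bodies: dict[str, bytes] = field(default_factory=dict)        # digest -> body
    refcounts: dict[str, int] = field(default_factory=dict)       # digest -> references
    mailboxes: dict[str, list[str]] = field(default_factory=dict) # user -> digests

    def deliver(self, body: bytes, recipients: list[str]) -> str:
        """Store the body once, then add one reference per recipient."""
        if not recipients:
            raise ValueError("a message needs at least one recipient")
        digest = hashlib.sha256(body).hexdigest()
        if digest not in self.bodies:
            self.bodies[digest] = body
            self.refcounts[digest] = 0
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
            self.refcounts[digest] += 1
        return digest

    def expunge(self, user: str, digest: str) -> None:
        """Drop one user's reference; free the body when no references remain."""
        self.mailboxes[user].remove(digest)
        self.refcounts[digest] -= 1
        if self.refcounts[digest] == 0:
            del self.bodies[digest]
            del self.refcounts[digest]

    def stored_bytes(self) -> int:
        """Bytes of message data actually held, regardless of recipient count."""
        return sum(len(body) for body in self.bodies.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    d = store.deliver(b"x" * 1000, ["alice", "bob", "carol"])
    print(store.stored_bytes())  # 1000, not 3000
    for user in ("alice", "bob", "carol"):
        store.expunge(user, d)
    print(store.stored_bytes())  # 0
```

In this sketch, deleting a message from one mailbox only decrements a shared reference count. The body is freed when the last recipient removes it, so the savings scale with the number of recipients per message.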
> The point is not to "hurt" them (if you wanted that, you
> would run RBL or ORBS or SPEWS or ... and not accept connections from
> their servers in the first place), but to mitigate their effect on
> your storage costs.

Spam doesn't come in 44MB messages. The amount of disk storage required for spam is almost nothing.

My primary disk space problem is friggin' morons in marketing or product research that seem to feel that they have a God-given right to send the entire contents of their hard drives via e-mail, or people illegally sending entire video movies or illegal mp3s via e-mail.

The worst thing of all is when these idiots send this kind of crap to someone and they are ignorant of where that person is located. Many of the remote offices are behind very slow links (ISDN or 56k), and have ancient mail/everything else servers that have very, very little disk space for /var/spool/mqueue or /var/spool/mail. When some bloody VP in the US insists on sending a 50MB e-mail message on a daily basis to a guy in Asia (which is served through the European hub, which we run), and the mail server for the guy in Asia doesn't even have 50MB free in /var/spool/mqueue on a good day, that is a much, much bigger headache.

Trust me, disk space problems due to spam are the least of my worries.

> Note that this is the same philosophy you've been
> espousing all along, with quotas: you don't care if it causes a problem
> for your users, only if it causes a problem for you.

No, I do care. However, I have to be practical. Moreover, if management has taken an internal survey on a subject and almost none of the respondents support the idea of actually spending money to try to solve the problem, then I can't really take this as a mandate to go convince management that they have to delay the project (and other projects depending on ours freeing up resources that they need).
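Terry's aside above mentions the RBL-style lists (RBL, ORBS, SPEWS) and refusing connections from listed servers. Such lists are commonly consulted through DNS: the receiving server reverses the connecting client's IPv4 octets, appends the list's zone, and treats any answer as "listed". Here is a minimal Python sketch under those assumptions. The zone name is a placeholder rather than any real list, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Return the DNS name a DNSBL-style check looks up for an IPv4 client:
    the address octets in reverse order, followed by the list's zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip: str, zone: str) -> bool:
    """Treat any A record for the query name as 'listed'.

    Resolver failures (no such name, no network, etc.) are raised as OSError
    subclasses and are treated as 'not listed', so an unreachable list
    lets mail through rather than rejecting everything.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # "dnsbl.example.org" is a placeholder zone, not a real blocklist.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
    # prints 1.2.0.192.dnsbl.example.org
```

Failing open on lookup errors is a deliberate choice in this sketch. A stricter site could reverse it, at the cost of bouncing legitimate mail whenever the list's servers are down.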
> Internally, you have a higher connectedness between users, so you
> get much larger than your 1.3 multiplier, and for email lists, it's
> higher still.

They don't really have any mailing lists. I installed the majordomo server a few months ago, and there has been relatively little use of it. Part of that is because the current mail system is so screwed up that it can only be used for one-way announcements, but even after our replacement is online, I don't expect them to make all that much use of mailing lists.

> In fact, I would go so far as to say that DJB's idea
> of sending a reference is applicable to email list messages, only
> the messages would be stored on the list server, instead of on the
> sender machine.

In essence, they do this already, and have been doing it for quite some time. They have a very healthy community of internal USENET news participants, and a very broad array of internal news groups that they have set up. So, we don't need to re-invent the entirety of Internet e-mail for them.

> At the point that you no longer care which machine you send a user
> connection to to retrieve their mail, then you no longer care where
> you send the mail,

No, that's precisely the problem. I *do* care where I send them to get their mail, or where their mail is sent. But what I want to do is to hide from them where this location is, because they *don't* care (or shouldn't have to).

> or if the mail is single instance multiple
> time, a real replica, or a virtual replica (SIS). It takes a small
> amount of additional work.

I think the BS meter just broke. Sorry, guy. Re-writing the entire mail system just to implement SIS seems like the dumbest possible idea. Until you've done this for UW-IMAP, Courier-IMAP, or Cyrus, and you're willing to release that code back to the public, I don't think you've got a leg to stand on.

> The repeated mailing ("mail bombing") that started this thread is,
> or should be, simple to detect.

Could be. How would you do it?
Statistical techniques to discern how often a particular account normally receives mail based on the last five minutes, the last hour, day, or comparing today against a similar time period from yesterday, or last week, or last month?

The cfengine technique does have a lot going for it, but for a person who normally receives a lot of e-mail, there'd be a good chance that it wouldn't detect a "mail bombing" of this sort. You'd have to go into scanning the content of the message for that -- too many messages in a given period of time that have essentially the same body content, or whatever. But what period of time?

> Yes, it's a trivial case, but it's the most common case. You don't
> have to go to a compute-intensive technique to deal with it.

Once you start looking at the message body content, you've already started down a slippery slope. You've told the user that you will guarantee that they don't ever get mail-bombed, but as the incoming messages are more and more diverse, at what point do you decide that they really are different? What can you say to the user when you've accidentally thrown away a legitimate mail message because it seemed to be too similar to a previous one that they had received?

> You are storing the reference wrong. Use an encapsulated reference,
> not a hard link.

Give me the file system to do that. Then port it to Solaris and NetApp filer.

> That will permit the metadata operations to occur
> simultaneously, instead of constraining them to occur serially, like
> a link does.

Hell, if it can do all this kind of magic, just give it to me for FreeBSD. Who needs softupdates, or dirprefs, or dirhash, or any of that other crap, when you've got TerrysMagicFS?

> You keep saying this, and then you keep arranging the situation
> (order of operations, FS backing store, network transport protocol,
> etc.) so that it's true, instead of trying to arrange them so it
> isn't.

No, the system is what it is. There are certain things that cannot be changed.
One of them is the NetApp filers for NFS. Another is Solaris/SPARC. Another is that they absolutely, positively, do not want *ANY* source code changes made whatsoever to any of the programs or packages they are using. They're already looking funny at me for specifying that we should be using sendmail from sendmail.org (which we actually commit the mortal sin of compiling ourselves) as opposed to the vendor-supplied sendmail, or for actually going in and making source-code level changes to procmail (which are required if you want to use Maildir format mailboxes, or a hashed mailbox structure).

I am honestly looking for advice on how I can make this work as well as I can, and you keep telling me that the only way to do anything at all is to completely rewrite the entire OS of both the client and the server. Granted, I shouldn't be looking for help on non-FreeBSD systems on a FreeBSD mailing list, but what you're giving me is not very helpful.

If you want to tell me that I'm screwed and I shouldn't even bother, that's fine. But don't waste my time and yours by continuing to harangue me and stress how vitally important it is for me to completely re-invent all the protocols being used, completely re-write all the OSes and filesystems being used, etc....

> No, we are not. The transport protocols are the transport protocols,
> and you are constrained to implement to the transport protocols, no
> matter what else you do. But you are not constrained to depend on
> rename-based two phase commits (for example), if your FS or data
> store exports a transaction interface for use by applications: you
> can use that transaction interface instead.

At the ivory tower level, I'm sure that statement makes sense. But don't tell me to eat cake when there isn't even any bread around.

> I have to say, I've personally dealt with "help desk" escalations
> for problems like this, and it's incredibly labor intensive.
> You should always design as if you were going to have to deal with
> 100,000 customers or more, so that you put yourself in a position
> that manual processes will not scale, and then think about the
> problem.

They've got 6000 employees world-wide, and are already the world's largest company in their niche. They were spun off years ago by Philips, because Philips felt that this wasn't a core part of its business. I don't think they're ever going to get particularly large. And if they do, the architecture will scale, even if the implementation won't.

On the other hand, I don't have to run or pay for the help desk operation, and when it comes to quotas, not implementing SIS, not re-writing the entire filesystem or OS for both the clients and the server, etc..., well those aren't things that my customer has paid me to do. They have set out some pretty specific guidelines as to what they want, and I have tried to give them the best possible architecture and best possible implementation that I can, within those guidelines.

> Something simple like recognizing repetitive size/sender/subject
> pairing on the SMTP transit server.

And you would do that how? And you would make use of that information how?

> Ugh. Would you, as a user, bet your company on that level of service?

Sure. I've always tried to provide the same level of service that I would want for myself, and I can be a very demanding customer. Indeed, I am in the process of setting up my own consulting company, and I will have a co-lo, where I run my own mail server. I won't be going quite as far out for myself as I have for them, because my own personal needs aren't quite so large. But I cannot strive to provide myself any better level of service than I have done for them.

> 2.8. It's not like OpenLDAP, which needs the transactioning interfaces,
> it's pretty straight-forward code.

From the cyrus-2.0.14 distribution, doc/install-prerequisites.html:

  * Berkeley DB, version 3.0.55 or higher.
    Berkeley DB can be obtained from Sleepycat. It is strongly recommended that libsasl be compiled with Berkeley DB support, using the same version of Berkeley DB. (If you have a Berkeley DB version mismatch, somewhat perplexing crashes result.)

From the cyrus-2.0.14 distribution, README:

    The 2.0 release contains many new features over 1.6. No further development work will progress on 1.6 so we encourage you to work with the Cyrus 2.0 code wherever possible. Support for 1.6 will also be extremely limited.

Sorry, guy. Not an option.

>> Certainly, when it comes to SAMS, all this stuff is pre-compiled
>> and you don't get the option of building Berkeley DB in a different
>> manner, etc....
>
> Yes, you end up having to compile things yourself.

With SAMS, you don't get that option. That's about the only negative thing that I can say about it.

> If you have control over the clients, you can avoid making update
> requests. If you have no control over either, well, "Bad news, Clem".

I have no control over the clients.

> Though... you *could* allow any of the replicas to accept and queue
> on behalf of the primary, but then deliver only to the primary;
> presumably you'd be able to replace a primary in 7 days.

Doesn't help. The front-end servers will accept and queue on behalf of the users. I don't need yet another level of queueing on the back-end servers as well.

-- 
Brad Knowles,

"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)