From: Terry Lambert
Date: Mon, 03 Mar 2003 23:16:04 -0800
To: Sean Chittenden
Cc: Hiten Pandya, arch@FreeBSD.ORG
Subject: Re: Should sendfile() to return ENOBUFS?
Message-ID: <3E6452B4.E87BEC2@mindspring.com>

Sean Chittenden wrote:
> > 2) You need to be damn sure you can guarantee a correct update
> >    of *sbytes; I believe this is very difficult in the case in
> >    question, which is why it blocks
>
> I'm not convinced of this. Have you poked through
> src/sys/kern/uipc_syscalls.c? It's not that ugly/hard, nothing's
> impossible with a bit of refactoring.

I've done this.

I've ported the -current sendfile external buffer code to FreeBSD
4.3, and again to FreeBSD 4.4, etc. I'm rather familiar with it,
actually...

> > 3) If sbytes is NULL, you should probably block, even on a
> >    non-blocking call. The reason for this is that there is
> >    no way for the application to restart without *sbytes
>
> This degrades terribly though and if you get a spike in traffic,
> degradation of performance is critical.

Sendfile degrades terribly under traffic spikes, period.

One thing sendfile fails to do is honor the so_snd size limits that
other things honor, as it goes through its loop. Technically,
sendfile should be an async interface so it can lock the so_snd
window to the buffers-in-flight. If it did this, it could
preallocate the memory at the time it's called, and then reuse it
internally until the operation has been completed. Then it could
write its completion status.

> Going from a non-blocking application to a blocking call simply
> because of high use is murderous and is justification in itself
> enough for me to move away from the really nice zero-copy sockets
> that sendfile() affords me, back to the sluggish writev() syscall.

For POP3 and SMTP, and most other RFC822 derived protocols, you end
up having to store your files with <CR><LF> line delimiters, instead
of <LF>. For FTP, you can only do binary transfers, etc.

The sendfile interface is just a bad design, period. That it
performs badly under load is just icing on the cake.

> If a system is busy, it's stuck in an sfbufa state and blocks the
> server from servicing thousands of connections.

I understand.

> The symptoms are common and synonymous with mbuf exhaustion or any
> other kind of buffer exhaustion... my point is that having this
> block is the worst way that sendfile() can degrade under
> high performance.

Dijkstra: preallocate your resources, and you do not have this
problem.
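
The "preallocate your resources" rule, and the idea of preallocating memory when the call is made and reusing it until the operation completes, can be sketched in user space as a fixed buffer pool. This is a minimal sketch under stated assumptions, not the kernel's sf_buf allocator: `buf_pool`, `pool_try_get()` and `pool_worst_case()` are hypothetical names, and the sizing rule (connections times buffers per connection, plus one spare) is an assumption. Acquisition never sleeps; an empty pool is reported as EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed pool: every buffer is allocated once, up front. */
struct buf_pool {
    void  **free_list;  /* stack of currently unused buffers */
    size_t  nfree;      /* entries currently on the stack */
    size_t  capacity;   /* total buffers owned by the pool */
};

/* Assumed worst case: every connection holds per_conn buffers at once,
 * plus one spare. Returns 0 if the product would overflow. */
size_t pool_worst_case(size_t max_conns, size_t per_conn)
{
    if (per_conn != 0 && max_conns > (SIZE_MAX - 1) / per_conn)
        return 0;
    return max_conns * per_conn + 1;
}

/* Frees the pool; all buffers must have been returned first. */
void pool_destroy(struct buf_pool *p)
{
    while (p->nfree > 0)
        free(p->free_list[--p->nfree]);
    free(p->free_list);
    p->free_list = NULL;
    p->capacity = 0;
}

/* Allocates all 'count' buffers now, so later requests never wait. */
int pool_init(struct buf_pool *p, size_t count, size_t bufsize)
{
    p->nfree = 0;
    p->capacity = count;
    p->free_list = calloc(count ? count : 1, sizeof *p->free_list);
    if (p->free_list == NULL)
        return -1;
    for (size_t i = 0; i < count; i++) {
        void *b = malloc(bufsize);
        if (b == NULL) {
            pool_destroy(p);   /* releases what was allocated so far */
            return -1;
        }
        p->free_list[p->nfree++] = b;
    }
    return 0;
}

/* Never sleeps: an exhausted pool fails with EAGAIN. */
void *pool_try_get(struct buf_pool *p)
{
    if (p->nfree == 0) {
        errno = EAGAIN;
        return NULL;
    }
    return p->free_list[--p->nfree];
}

/* Returns a buffer previously obtained from pool_try_get(). */
void pool_put(struct buf_pool *p, void *b)
{
    assert(p->nfree < p->capacity);
    p->free_list[p->nfree++] = b;
}
```

With, say, 1000 connections each allowed 16 buffers in flight, pool_worst_case(1000, 16) asks for 16001 buffers; the point is that the number is decided at startup, not discovered under load.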

In this case, set your tunable high enough that even were you to
use up all your available buffers, there are still NSFBUFS
available... and the problem goes away.

> > 4) If you get rid of the blocking with (sbytes == NULL), you
> >    better add a BUGS section to the manual page.
>
> There's nothing that says that sbytes can't be set to 0 if errno is
> EAGAIN; in fact, that's what it does right now.

If you send a non-zero amount of data, you need to know exactly what
was sent, in order to maintain connection state data pipe coherency
between the user space application requesting the send on a
connection basis, and the kernel space code that has done a partial
send.

Given your statement, though, we can say pretty surely that this is
HTTP... Any other approach, and your only option to recover your
state is to close the connection and make the client retry.

So in the situation where the resources are limited, you end up
*increasing* the overall load: instead of satisfying a client with a
single request, you convert that into 5 requests, all of which fail
to deliver the data to the client.

> > Frankly I'm really surprised that you are blocking in this place; it
> > indicates an inability to get a page in the kernel map in the sf
> > zone, which, in turn, indicates that your NSFBUFS is improperly
> > tuned; if you are using sendfile, and tune up your other kernel
> > parameters for your system, don't forget NSFBUFS.
>
> Well, it's set to 65535 at the moment. How much higher do you think
> I should set it? :-] At some point I have to say, "it's high enough
> and I just need to get the application to degrade gracefully." :-]

The sendfile interface does not degrade gracefully, period.

Even if you dealt with the issue by setting *sbytes correctly in all
cases, and returning the right value to user space, you've increased
the number of system calls, potentially significantly. So even if
you "correct" the behaviour, your degradation is going to be
exponential.
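
To make the bookkeeping point concrete, here is a minimal sketch of an application-side restart loop. It assumes a non-blocking socket and assumes, as the sendfile(2) manual page describes for EAGAIN, that *sbytes still reports how many bytes went out before the call gave up. The `xfer` struct, `xfer_step()`, the `send_fn` callback and the `fake_send()` test double are all hypothetical; the callback stands in for sendfile(2) so the logic can be exercised on any system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-connection state kept by the application. */
struct xfer {
    off_t offset;     /* next file byte to send */
    off_t remaining;  /* bytes still owed to the client */
};

/* Stand-in for the send primitive, shaped like FreeBSD sendfile(2):
 * returns 0, or -1 with errno set, and stores in *sent the bytes that
 * actually went out, including on EAGAIN. On FreeBSD an adapter might
 * forward to sendfile(file_fd, sock, off, len, NULL, sent, 0). */
typedef int (*send_fn)(void *ctx, off_t off, size_t len, off_t *sent);

/* One non-blocking step. Returns 1 when the transfer is complete,
 * 0 when the caller should wait for the socket to become writable,
 * and -1 on a hard error. */
int xfer_step(struct xfer *x, send_fn fn, void *ctx)
{
    while (x->remaining > 0) {
        off_t sent = 0;
        int rc = fn(ctx, x->offset, (size_t)x->remaining, &sent);

        /* Record partial progress before looking at rc: this is the
         * bookkeeping a NULL sbytes pointer would make impossible. */
        assert(sent >= 0 && sent <= x->remaining);
        x->offset += sent;
        x->remaining -= sent;

        if (rc == -1)
            return errno == EAGAIN ? (x->remaining == 0) : -1;
        if (sent == 0)
            return -1;    /* no progress: file shorter than expected */
    }
    return 1;
}

/* Test double: accepts at most 'window' more bytes, then reports
 * EAGAIN together with the short count. */
struct fake_sock {
    size_t window;
    size_t total;
};

int fake_send(void *ctx, off_t off, size_t len, off_t *sent)
{
    struct fake_sock *s = ctx;
    size_t n = len < s->window ? len : s->window;
    (void)off;
    *sent = (off_t)n;
    s->window -= n;
    s->total += n;
    if (n < len) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

A return of 0 means "wait for writability (for example with a kqueue EVFILT_WRITE filter) and call xfer_step() again with the same struct"; nothing is lost and nothing is resent.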

One potential solution is to go to using KSEs, so that the blocking
context is not your whole process. This allows you to write the
server as multithreaded. Another is to do what Apache does, and run
a process per connection.

My recommendation was (and is): get a sufficiently large NSFBUFS in
the first place, so you never encounter the situation that results
in the non-graceful degradation.

> > While you could *technically* make sf_buf_alloc() non-blocking, in
> > general this would be a bad idea, given that the one place it's
> > called is in an interior loop that can be the subject of a "goto"
> > (so it's an embedded interior loop) in sendfile() itself. I think
> > it would be very hard to satisfy #2, to allow it to be restartable
> > by the application, in the face of failure, and since *sbytes is not
> > a mandatory parameter, likely your application will end up barfing
> > (e.g. sending partial FTP files or HTML documents down, with no way
> > to recover from a failure, other than closing the client socket, and
> > hoping the client can recover).
>
> Frankly, if a developer is stupid enough to pass in NULL for sbytes,
> they get what they deserve. Returning -1 and setting errno to EAGAIN
> in the event that there aren't any sf_buf's available isn't what I'd
> call the programming exercise of the decade. :-P

Nevertheless, the sendfile interface appears to allow this situation;
it is a flaw in the API design. There are two ways to handle it:

1) Any time you call sendfile on a non-blocking fd with
   (sbytes == NULL), *immediately* return EINVAL or a similar
   error

2) Allow the API to be inconsistent, and then have the OS accept
   the blame for broken applications, since it permits known
   broken parameter values

> > In a "flash crowd" case on an HTTP server, this basically means that
> > you will continuously get retries, and the situation will worsen,
> > exponentially, as people retry getting the same page.
> > In the FTP case, or some other protocol without automatic retry on
> > session abandonment, of course, it will be fatal.
>
> Hrm, let me redefine "fatal" as "changing the behavior of a system
> call to go from returning in less than 0.001ms, to returning in 2-15s
> for every connection when trying to make over ~500K sendfile(2) calls
> a second." I'd call that a catastrophic failure to degrade
> successfully. -sc

"Fatal" in this context was intended to imply "the clients do not get
their data; they get partial data and closed descriptors instead, thus
breaking the contract between the client and the server".

And yeah, either way you look at it, it's a failure to degrade
gracefully... once again: the easy fix is to not put your system in
that position in the first place.

A less easy approach would be to maintain a count of active sendfile
instances in your application, and queue up requests above some high
watermark, rather than making system calls. Another would be to hard
limit the number of client connections you allow at once, etc.

The least ugly of these (to my mind) is to not overcommit NSFBUFS in
the first place by always having at least 1 more than you could ever
need, preconfigured into the kernel.

-- Terry
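
Option 1 from the list above can also be enforced by an application on its own calls. A minimal sketch, assuming only POSIX fcntl(2): the hypothetical `check_sendfile_args()` fails with EINVAL when a non-blocking socket is paired with a NULL sbytes pointer, before any data has been sent, so there is never a partial transfer the caller cannot account for. EINVAL is an illustrative choice; per this thread, sendfile(2) itself accepts the combination.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/* Returns 1 if fd is non-blocking, 0 if blocking, -1 (errno set) on error. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Hypothetical pre-flight check implementing option 1 in user space:
 * a non-blocking socket with no sbytes pointer is rejected before any
 * data moves. A wrapper would call this, then sendfile(2) itself. */
int check_sendfile_args(int sock, const off_t *sbytes)
{
    int nb = is_nonblocking(sock);
    if (nb == -1)
        return -1;              /* e.g. EBADF from fcntl() */
    if (nb && sbytes == NULL) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```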
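
The two "less easy" alternatives in the reply above can be combined in user space. A minimal sketch, assuming a single-threaded event loop: a counter of in-flight sendfile transfers capped at a high watermark, a FIFO for requests above it, and a full FIFO acting as the hard limit on waiting clients. All names, and the QCAP bound, are hypothetical; the watermark would have to be chosen to stay below the kernel's sf_buf budget.

```c
#include <stddef.h>

#define QCAP 64   /* hypothetical bound on waiting requests */

enum admit { ADMIT_START, ADMIT_QUEUED, ADMIT_REJECT };

/* Hypothetical admission gate for sendfile transfers. */
struct gate {
    size_t active;       /* transfers currently handed to sendfile */
    size_t high_water;   /* most transfers allowed in flight */
    int    queue[QCAP];  /* FIFO ring of waiting connection ids */
    size_t head;         /* index of the oldest waiter */
    size_t count;        /* number of waiters */
};

void gate_init(struct gate *g, size_t high_water)
{
    g->active = 0;
    g->high_water = high_water;
    g->head = 0;
    g->count = 0;
}

/* Start the transfer now, park it in the FIFO, or refuse it outright
 * when the FIFO is full (the hard limit on clients). */
enum admit gate_admit(struct gate *g, int conn)
{
    if (g->active < g->high_water) {
        g->active++;
        return ADMIT_START;
    }
    if (g->count == QCAP)
        return ADMIT_REJECT;
    g->queue[(g->head + g->count) % QCAP] = conn;
    g->count++;
    return ADMIT_QUEUED;
}

/* Call when a transfer finishes. If someone is waiting, its id is
 * returned and it inherits the freed slot (active is unchanged);
 * otherwise the slot is released and -1 is returned. */
int gate_done(struct gate *g)
{
    if (g->count == 0) {
        if (g->active > 0)
            g->active--;
        return -1;
    }
    int next = g->queue[g->head];
    g->head = (g->head + 1) % QCAP;
    g->count--;
    return next;
}
```

Handing the freed slot directly to the oldest waiter keeps the number of concurrent sendfile calls at or below the watermark, so excess demand waits in user space instead of sleeping in the kernel.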