From: Terry Lambert
Date: Wed, 28 May 2003 08:58:11 -0700
To: Igor Sysoev
cc: arch@freebsd.org
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
Message-ID: <3ED4DC93.42A44D09@mindspring.com>
List-Id: Discussion related to FreeBSD architecture

Igor Sysoev wrote:
> On Tue, 27 May 2003, Terry Lambert wrote:
> > NOTE: TCP_NOPUSH *specifically* mentions writev(2), which, like
> > sendfile(2), takes data from multiple discrete buffers and sends
> > it.
>
> I agree with you, but writev() takes data from the memory while
> sendfile() can read it from a disk - it's one of the cause of the partially
> filled packets in the middle of the file stream.  TF_NOPUSH (internal
> TCP_NOPUSH representation) can be used to avoid it.

The writev() takes it from memory... and sendfile() takes it from
memory.  The only difference is whether the memory referred to by
the mbuf headers comes from the program's address space and is
copied into an mbuf in the kernel's address space, or is an external
mbuf referred to by an sf_buf, already in the kernel's address
space because it's in the buffer cache.

> Suppose you have one page in VM and you need to read the next pages
> from a disk.  What would you do ?  If you send this single page - it
> will go as 1460, 1460 and 1176.

Only if I stupidly set TCP_NODELAY on the socket, which I have to
go out of my way to do.

If I can't read the next block off the disk, wire it, and set up
an EXT_SFBUF for it in 2MSL, there's something seriously wrong in
the OS.  2MSL is a *very* long time on modern systems.

The "problem" is the call to:

	error = (*so->so_proto->pr_usrreqs->pru_send)(so, 0, m, 0, 0, td);

in sendfile(2) in uipc_syscalls.c, in the case where it's not
true that:

	(sbspace(&so->so_snd) >= so->so_snd.sb_lowat)

...or, more specifically, that it's effectively sent TCP_NODELAY.

You'll notice that the page is only unwired when the external
mbuf is freed.

-- Terry
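
For reference, the "1460, 1460 and 1176" figures quoted above are what
you get when a single page is pushed out on its own in MSS-sized
pieces.  A minimal sketch of that arithmetic follows; it is not kernel
code.  The 4096-byte page size and the 1460-byte MSS are assumptions
inferred from the quoted numbers, and segment_sizes() is a
hypothetical helper named only for this illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not from the FreeBSD tree).
 *
 * Split `len` bytes into segments of at most `mss` bytes, as a sender
 * would if every chunk were pushed immediately instead of being
 * coalesced with data that arrives later.  Segment sizes are stored
 * in `out` (at most `max` entries); the return value is the number
 * of segments required.
 */
size_t
segment_sizes(size_t len, size_t mss, size_t *out, size_t max)
{
	size_t n = 0;

	if (mss == 0)
		return (0);

	while (len > 0) {
		/* A full segment, or whatever is left over. */
		size_t seg = (len < mss) ? len : mss;

		if (n < max)
			out[n] = seg;
		n++;
		len -= seg;
	}
	return (n);
}
```

Calling segment_sizes(4096, 1460, out, 8) yields three segments of
1460, 1460 and 1176 bytes; the short third segment is the partially
filled packet under discussion.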