From owner-freebsd-arch@FreeBSD.ORG Sun May 25 01:31:01 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1680C37B401; Sun, 25 May 2003 01:31:01 -0700 (PDT) Received: from mx.nsu.ru (mx.nsu.ru [212.192.164.5]) by mx1.FreeBSD.org (Postfix) with ESMTP id D7EEA43F75; Sun, 25 May 2003 01:30:59 -0700 (PDT) (envelope-from danfe@regency.nsu.ru) Received: from mail by mx.nsu.ru with drweb-scanned (Exim 3.36 #1 (Debian)) id 19JqvO-0004V6-00; Sun, 25 May 2003 15:32:14 +0700 Received: from regency.nsu.ru ([193.124.210.26]) by mx.nsu.ru with esmtp (Exim 3.36 #1 (Debian)) id 19JqvG-0004UW-00; Sun, 25 May 2003 15:32:07 +0700 Received: from regency.nsu.ru (localhost [127.0.0.1]) by regency.nsu.ru (8.12.8/8.12.8) with ESMTP id h4P8UnM5096687; Sun, 25 May 2003 15:30:49 +0700 (NOVST) (envelope-from danfe@regency.nsu.ru) Received: (from danfe@localhost) by regency.nsu.ru (8.12.8/8.12.8/Submit) id h4P8Um24096686; Sun, 25 May 2003 15:30:48 +0700 (NOVST) Date: Sun, 25 May 2003 15:30:48 +0700 From: Alexey Dokuchaev To: Hiten Pandya Message-ID: <20030525083048.GA96007@regency.nsu.ru> References: <20030525004855.GA67985@perrin.int.nxad.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030525004855.GA67985@perrin.int.nxad.com> User-Agent: Mutt/1.4i X-Envelope-To: hmp@freebsd.org, arch@freebsd.org, des@freebsd.org X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.11.1.4 X-Spam-Status: No, hits=-134.0 required=5.0 tests=BOGOFILTER_TEST_PASS,EMAIL_ATTRIBUTION,IN_REP_TO, QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES, USER_AGENT_MUTT,USER_IN_WHITELIST version=2.50 X-Spam-Level: X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) cc: arch@freebsd.org cc: des@freebsd.org Subject: Re: scheduler determination X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: 
Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 May 2003 08:31:01 -0000 On Sat, May 24, 2003 at 05:48:55PM -0700, Hiten Pandya wrote: > Hi Gang. > > It would be really nice if there was a way to find out the name of > the current scheduler in the system. I have attached a patch which adds > a sysctl called kern.sched.name, which does just that. > > Comments and suggestions welcome. Apart from what Jeff had already suggested, I think ``kern.scheduler'' is somewhat a better name. 8-) ./danfe From owner-freebsd-arch@FreeBSD.ORG Sun May 25 11:20:36 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EC84C37B401; Sun, 25 May 2003 11:20:36 -0700 (PDT) Received: from mail.auriga.ru (mail.auriga.ru [80.240.102.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id E61B843F3F; Sun, 25 May 2003 11:20:33 -0700 (PDT) (envelope-from alex.neyman@auriga.ru) Received: from mail.loopback.interface ([127.0.0.1] helo=vagabond.auriga.ru) by mail.auriga.ru with esmtp (Exim 4.14) id 19K09Q-0005DU-Pa; Sun, 25 May 2003 22:23:20 +0400 From: Alexey Neyman Organization: Auriga, Inc. 
To: Alexey Dokuchaev , Hiten Pandya Date: Sun, 25 May 2003 22:20:26 +0400 User-Agent: KMail/1.5.1 References: <20030525004855.GA67985@perrin.int.nxad.com> <20030525083048.GA96007@regency.nsu.ru> In-Reply-To: <20030525083048.GA96007@regency.nsu.ru> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200305252220.26861.alex.neyman@auriga.ru> cc: arch@freebsd.org Subject: Re: scheduler determination X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 May 2003 18:20:37 -0000 Hi, there! On Sunday 25 May 2003 12:30, Alexey Dokuchaev wrote: AD> Apart from what Jeff had already suggested, I think ``kern.scheduler'' AD> is somewhat a better name. 8-) OTOH, the scheduler may have some more things to report. It may even have runtime tunables - they could be added later to this kern.sched.* namespace. Regards, Alexey. -- A quoi ca sert d'etre sur la terre Si c'est pour faire nos vies a genoux? 
From owner-freebsd-arch@FreeBSD.ORG Sun May 25 21:15:33 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 686C737B401 for ; Sun, 25 May 2003 21:15:33 -0700 (PDT) Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100]) by mx1.FreeBSD.org (Postfix) with ESMTP id F1E4443F3F for ; Sun, 25 May 2003 21:15:32 -0700 (PDT) (envelope-from scott_long@btc.adaptec.com) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h4Q4B1Z26707 for ; Sun, 25 May 2003 21:11:01 -0700 Received: from btc.adaptec.com (hollin.btc.adaptec.com [10.100.253.56]) by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id VAA19816 for ; Sun, 25 May 2003 21:15:27 -0700 (PDT) Message-ID: <3ED194D8.4040706@btc.adaptec.com> Date: Sun, 25 May 2003 22:15:20 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030425 X-Accept-Language: en-us, en MIME-Version: 1.0 To: arch@freebsd.org Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Subject: New bootloader! X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 May 2003 04:15:33 -0000 All, I've written a nice little bootloader front-end script that allows one to enable/disable acpi, boot single users, etc. My primary motivation was to allow users to easily disable ACPI, since so many problems are popping up these days with it. Making the disabling of ACPI sticky is not that hard to do, probably a few lines of sh scripts in rcNG, but I haven't gotten to it yet. To use the new loader, grab it from http://people.freebsd.org/~scottl/beastie.4th and place it in /boot. 
Then copy frames.4th and screen.4th from /usr/share/examples/bootforth into /boot. Then edit /boot/loader.rc with the following diff. WARNING: if any mistakes are made and the script cannot start, it might leave your loader unable to load the kernel. In case of problems, either keep a fixit floppy handy, or recompile your kernel with embedded hints (so that things like the console will work) and load it from the boot1 loader. I'm really hoping to have this be on at least the i386 bootcd for 5.1, so any feedback is appreciated. Scott --- loader.rc.orig Sun May 25 22:10:03 2003 +++ loader.rc Sun May 25 22:10:47 2003 @@ -12,3 +12,14 @@ \ Unless set otherwise, autoboot is automatic at this point +\ Load our little menu +s" /boot/beastie.4th" O_RDONLY fopen dup fload fclose + +\ Initialize loader.4th stuff + +\ cr cr .( Initializing loader.4th...) +initialize drop + +\ Show the menu +\ cr +beastie-start From owner-freebsd-arch@FreeBSD.ORG Mon May 26 06:04:38 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B1FAE37B401; Mon, 26 May 2003 06:04:38 -0700 (PDT) Received: from mx.nsu.ru (mx.nsu.ru [212.192.164.5]) by mx1.FreeBSD.org (Postfix) with ESMTP id 97D0E43FCB; Mon, 26 May 2003 06:04:37 -0700 (PDT) (envelope-from danfe@regency.nsu.ru) Received: from mail by mx.nsu.ru with drweb-scanned (Exim 3.36 #1 (Debian)) id 19KHfv-0002Dy-00; Mon, 26 May 2003 20:06:03 +0700 Received: from regency.nsu.ru ([193.124.210.26]) by mx.nsu.ru with esmtp (Exim 3.36 #1 (Debian)) id 19KHfP-0002A9-00; Mon, 26 May 2003 20:05:31 +0700 Received: from regency.nsu.ru (localhost [127.0.0.1]) by regency.nsu.ru (8.12.8/8.12.8) with ESMTP id h4QD3wM5047405; Mon, 26 May 2003 20:03:58 +0700 (NOVST) (envelope-from danfe@regency.nsu.ru) Received: (from danfe@localhost) by regency.nsu.ru (8.12.8/8.12.8/Submit) id h4QD3stK047402; Mon, 26 May 2003 20:03:54 +0700 (NOVST) Date: Mon, 26 May 2003 
20:03:53 +0700 From: Alexey Dokuchaev To: Alexey Neyman Message-ID: <20030526130353.GA47084@regency.nsu.ru> References: <20030525004855.GA67985@perrin.int.nxad.com> <20030525083048.GA96007@regency.nsu.ru> <200305252220.26861.alex.neyman@auriga.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200305252220.26861.alex.neyman@auriga.ru> User-Agent: Mutt/1.4i X-Envelope-To: alex.neyman@auriga.ru, hmp@freebsd.org, arch@freebsd.org X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.11.1.4 X-Spam-Status: No, hits=-130.8 required=5.0 tests=BOGOFILTER_TEST_PASS,EMAIL_ATTRIBUTION,IN_REP_TO, REFERENCES,REPLY_WITH_QUOTES,USER_AGENT_MUTT, USER_IN_WHITELIST version=2.50 X-Spam-Level: X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) cc: Hiten Pandya cc: arch@freebsd.org Subject: Re: scheduler determination X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 May 2003 13:04:39 -0000 On Sun, May 25, 2003 at 10:20:26PM +0400, Alexey Neyman wrote: > Hi, there! > > On Sunday 25 May 2003 12:30, Alexey Dokuchaev wrote: > AD> Apart from what Jeff had already suggested, I think ``kern.scheduler'' > AD> is somewhat a better name. 8-) > > OTOH, the scheduler may have some more things to report. It may even have > runtime tunables - they could be added later to this kern.sched.* namespace. That is true; all that I'm saying is that that `name' part does not sound nice to [my] ear. I tend to believe that ``kern.sched.{type,flavor, whatever} serves us better since `name' does not really state anything than just a plain "name", while generally one would want something more fundamental for this type of identification IMHO. 
./danfe

From owner-freebsd-arch@FreeBSD.ORG Mon May 26 10:41:53 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 361F237B401 for ; Mon, 26 May 2003 10:41:53 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id AC4DF43F85 for ; Mon, 26 May 2003 10:41:51 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4QHfomF098110 for ; Mon, 26 May 2003 21:41:50 +0400 (MSD) Date: Mon, 26 May 2003 21:41:50 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: arch@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 May 2003 17:41:53 -0000

sendfile(2) now has two drawbacks:

1) it always sends the header, the file and the trailer in separate
   packets, even if their sizes would allow placing them all in one packet.
   For example, the typical HTTP response header is smaller than an Ethernet
   packet, and sendfile() sends it in a first small packet.

2) often enough it sends a 4K page in three packets: 1460, 1460 and 1176 bytes.

When I turn TCP_NOPUSH on just before sendfile(), it sends the header
and the first part of the file in one 1460-byte packet.
Besides, it sends the file pages in full 1460-byte Ethernet packets.
When sendfile() has completed or returned EAGAIN (I use non-blocking sockets),
I turn TCP_NOPUSH off and the remaining part of the file is flushed to the
client. Without turning it off, the remaining part of the file is delayed
for 5 seconds.
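The userland side of the pattern described above can be sketched in C. This is only an illustration and not code posted to the list: the helper name set_nopush() is hypothetical, and it assumes TCP_NOPUSH is exposed by <netinet/tcp.h>, as on FreeBSD. Where the option is not defined, the sketch simply reports it as unsupported.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>

/*
 * Hypothetical helper: set (on != 0) or clear (on == 0) TCP_NOPUSH on a
 * TCP socket. Returns 0 on success, -1 with errno set on failure.
 */
int
set_nopush(int fd, int on)
{
#ifdef TCP_NOPUSH
    int val = on ? 1 : 0;

    return (setsockopt(fd, IPPROTO_TCP, TCP_NOPUSH, &val, sizeof(val)));
#else
    /* Assumption: this platform has no TCP_NOPUSH; report it as unsupported. */
    (void)fd;
    (void)on;
    errno = ENOPROTOOPT;
    return (-1);
#endif
}
```

Following the description above, a caller would use set_nopush(s, 1) just before sendfile(2) and set_nopush(s, 0) once sendfile(2) has completed or returned EAGAIN. Those are the two extra system calls per sendfile() call that the message refers to.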

Surprisingly, turning TCP_NOPUSH off flushes the file part (I did not
try the trailer) even on FreeBSD 4.2 and 4.3, which do not have the
src/sys/netinet/tcp_usrreq.c fixes 1.53 and 1.51.2.11.

I looked in src/sys/netinet/tcp_usrreq.c to learn how to turn TCP_NOPUSH
on and off. It seems to be as simple as:

    struct inpcb *inp;
    struct tcpcb *tp;

    inp = sotoinpcb(so);
    tp = intotcpcb(inp);

turn on:

    tp->t_flags |= TF_NOPUSH;

turn off:

    tp->t_flags &= ~TF_NOPUSH;
    error = tcp_output(tp);

So here is a proposal. We could introduce a sendfile(2) flag, say SF_NOPUSH,
that turns TF_NOPUSH on before sending and turns it off just before
returning. This saves two syscalls on each sendfile() call, and it is
especially useful with non-blocking sockets, which can cause many
sendfile() calls.

Igor Sysoev
http://sysoev.ru/en/

From owner-freebsd-arch@FreeBSD.ORG Mon May 26 13:17:50 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2CB0037B401 for ; Mon, 26 May 2003 13:17:50 -0700 (PDT) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9236843F85 for ; Mon, 26 May 2003 13:17:48 -0700 (PDT) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])h4QKHkp9043994; Tue, 27 May 2003 06:17:46 +1000 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.8/8.12.8/Submit) id h4QKHf4c043993; Tue, 27 May 2003 06:17:41 +1000 (EST) Date: Tue, 27 May 2003 06:17:41 +1000 From: Peter Jeremy To: Igor Sysoev Message-ID: <20030526201740.GA22178@cirb503493.alcatel.com.au> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 May 2003 20:17:50 -0000 On Mon, May 26, 2003 at 09:41:50PM +0400, Igor Sysoev wrote: >sendfile(2) now has two drawbacks: [IP frames are not always full] ... >When I turn TCP_NOPUSH on just before sendfile() then it sends the header >and the first part of the file in one 1460 bytes packet. >Besides it sends file pages in the full ethernet 1460 bytes packets. >When sendfile() completed or returned EAGAIN (I use non-blocking sockets) >I turn TCP_NOPUSH off and the remaining file part is flushed to client. >Without turing off the remaining file part is delayed for 5 seconds. ... >So here is a proposal. We can introduce a sendfile(2) flag, i.e. SF_NOPUSH >that will turn TF_NOPUSH on before the sending and turn it off just >before return. It allows to save two syscalls on each sendfile() call >and it's especially useful with non-blocking sockets - they can cause many >sendfile() calls. I'm less certain of the benefits of this - particularly in the non- blocking case. As I understand your proposal, your patch would turn off TF_NOPUSH just before returning EAGAIN. At this point, the TCP send buffer is full so packets should start being sent immediately. The last data in the send buffer may not comprise a complete frame so it should not be sent, but left queued to be merged with the next sendfile(2). Once SO_SNDLOWAT bytes are available in the send buffer, the socket will become writable, allowing a further sendfile(2) call. As long as SO_SNDLOWAT is at least one frame smaller than SO_SNDBUF, there should not be any send delay caused by TF_NOPUSH being set. I believe TF_NOPUSH should be set at the beginning of a transaction (or when the socket is opened) and cleared at the end of a transaction (or implicitly by close()ing the socket). 
Peter From owner-freebsd-arch@FreeBSD.ORG Mon May 26 15:47:01 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1D01E37B401 for ; Mon, 26 May 2003 15:47:01 -0700 (PDT) Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 626BD43F75 for ; Mon, 26 May 2003 15:47:00 -0700 (PDT) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.8/8.12.3) with ESMTP id h4QMkvkA050348; Mon, 26 May 2003 16:46:57 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Mon, 26 May 2003 16:46:50 -0600 (MDT) Message-Id: <20030526.164650.10294010.imp@bsdimp.com> To: scott_long@btc.adaptec.com From: "M. Warner Losh" In-Reply-To: <3ED194D8.4040706@btc.adaptec.com> References: <3ED194D8.4040706@btc.adaptec.com> X-Mailer: Mew version 2.1 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: arch@freebsd.org Subject: Re: New bootloader! X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 May 2003 22:47:01 -0000 Do you envision this loader script as something that would be part of the installation process, or something people would use everyday? 
Warner From owner-freebsd-arch@FreeBSD.ORG Mon May 26 16:03:59 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1DA5637B401 for ; Mon, 26 May 2003 16:03:59 -0700 (PDT) Received: from magic.adaptec.com (magic-mail.adaptec.com [208.236.45.100]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9EFC743FA3 for ; Mon, 26 May 2003 16:03:58 -0700 (PDT) (envelope-from scott_long@btc.adaptec.com) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.11.6/8.11.6) with ESMTP id h4QMxNZ05414; Mon, 26 May 2003 15:59:24 -0700 Received: from btc.adaptec.com (hollin.btc.adaptec.com [10.100.253.56]) by redfish.adaptec.com (8.8.8p2+Sun/8.8.8) with ESMTP id QAA29418; Mon, 26 May 2003 16:03:51 -0700 (PDT) Message-ID: <3ED29D4E.8070606@btc.adaptec.com> Date: Mon, 26 May 2003 17:03:42 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3) Gecko/20030425 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "M. Warner Losh" References: <3ED194D8.4040706@btc.adaptec.com> <20030526.164650.10294010.imp@bsdimp.com> In-Reply-To: <20030526.164650.10294010.imp@bsdimp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: arch@freebsd.org Subject: Re: New bootloader! X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 May 2003 23:03:59 -0000 M. Warner Losh wrote: > Do you envision this loader script as something that would be part of > the installation process, or something people would use everyday? > > Warner I think that it has the potential to be an everyday feature. 
Disabling it is quite easy; just remove the patch that I posted in the previous message (or just remove the line that calls the entry point). I'm looking for it to be a start to something that enables one to easily load arbitrary kernels, control device/kernel variables, etc. Scott From owner-freebsd-arch@FreeBSD.ORG Mon May 26 17:21:00 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 433AC37B401 for ; Mon, 26 May 2003 17:21:00 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id A260B43F93 for ; Mon, 26 May 2003 17:20:59 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from dialup-67.30.96.194.dial1.sanjose1.level3.net ([67.30.96.194] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19KSD1-0002oY-00; Mon, 26 May 2003 17:20:56 -0700 Message-ID: <3ED2AF18.F5EB4FA5@mindspring.com> Date: Mon, 26 May 2003 17:19:36 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4f756173a69883f238a60c0f62127d40c387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 00:21:00 -0000 Igor Sysoev wrote: > sendfile(2) now has two drawbacks: Only two? ;^). > 1) it always sends the header, the file and the trailer in the separate > packets even their sizes allow to place all them in one packet. 
>    For example the typical HTTP response header is less then an ethernet
>    packet and sendfile() sends it in first small packet.
>
> 2) often enough it sends 4K page in three packets: 1460, 1460 and 1176 bytes.
>
> When I turn TCP_NOPUSH on just before sendfile() then it sends the header
> and the first part of the file in one 1460 bytes packet.
> Besides it sends file pages in the full ethernet 1460 bytes packets.
> When sendfile() completed or returned EAGAIN (I use non-blocking sockets)
> I turn TCP_NOPUSH off and the remaining file part is flushed to client.
> Without turing off the remaining file part is delayed for 5 seconds.

OK, basically what is happening is that the data is being pushed
out as it's made available, and it's being made available in
separate chunks.

The small file case is not really the optimum case for using the
sendfile interface at all. The problem here is that you have a
send queue depth limit on the sockets, and it's expected that the
file will end up exceeding this, so it's going to get buffered
anyway, due to a buffer size limit stall on the send side of the
socket.

> So here is a proposal. We can introduce a sendfile(2) flag, i.e. SF_NOPUSH
> that will turn TF_NOPUSH on before the sending and turn it off just
> before return. It allows to save two syscalls on each sendfile() call
> and it's especially useful with non-blocking sockets - they can cause many
> sendfile() calls.

I don't see this as being terrifically useful; small files
should probably just be mapped and written; the copy expense
is still there for the headers and trailers, no matter what,
and the file size itself is very small overhead, relatively
speaking, for files small enough for this to be an issue.

I also think your headers and trailers are very small, if
they are fitting with the file contents in a single packet.
I think this is atypical.

On the other hand, if you want to add a flag for this, I say "knock yourself out" -- go ahead and add the flag; it's not really going to benefit you that much, but it's not going to really hurt any of the rest of us either, so there's really no reason to make you not do it. 8-). BTW: if you go ahead with this, you should verify that it also works for the trailers, etc., and you should probably skip it if you headers > transmit queue depth, or file size > transmit queue depth, or trailers > transmit queue depth. -- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 00:57:23 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 50B5537B401 for ; Tue, 27 May 2003 00:57:23 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id D65C643F75 for ; Tue, 27 May 2003 00:57:21 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4R7vKmF012670; Tue, 27 May 2003 11:57:20 +0400 (MSD) Date: Tue, 27 May 2003 11:57:20 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Peter Jeremy In-Reply-To: <20030526201740.GA22178@cirb503493.alcatel.com.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 07:57:23 -0000 On Tue, 27 May 2003, Peter Jeremy wrote: > On Mon, May 26, 2003 at 09:41:50PM +0400, Igor Sysoev wrote: > >sendfile(2) now has two drawbacks: > [IP frames are not always full] > ... 
> >When I turn TCP_NOPUSH on just before sendfile() then it sends the header > >and the first part of the file in one 1460 bytes packet. > >Besides it sends file pages in the full ethernet 1460 bytes packets. > >When sendfile() completed or returned EAGAIN (I use non-blocking sockets) > >I turn TCP_NOPUSH off and the remaining file part is flushed to client. > >Without turing off the remaining file part is delayed for 5 seconds. > ... > >So here is a proposal. We can introduce a sendfile(2) flag, i.e. SF_NOPUSH > >that will turn TF_NOPUSH on before the sending and turn it off just > >before return. It allows to save two syscalls on each sendfile() call > >and it's especially useful with non-blocking sockets - they can cause many > >sendfile() calls. > > I'm less certain of the benefits of this - particularly in the non- > blocking case. As I understand your proposal, your patch would turn > off TF_NOPUSH just before returning EAGAIN. At this point, the TCP > send buffer is full so packets should start being sent immediately. > The last data in the send buffer may not comprise a complete frame so > it should not be sent, but left queued to be merged with the next > sendfile(2). Once SO_SNDLOWAT bytes are available in the send buffer, > the socket will become writable, allowing a further sendfile(2) call. > As long as SO_SNDLOWAT is at least one frame smaller than SO_SNDBUF, > there should not be any send delay caused by TF_NOPUSH being set. > > I believe TF_NOPUSH should be set at the beginning of a transaction > (or when the socket is opened) and cleared at the end of a transaction > (or implicitly by close()ing the socket). I thought about it more and I agree with you. TF_NOPUSH should be turned on at the start of a transaction and turned off at the end of a transaction. So I think there should be two flags: SF_NOPUSH - it turns TF_NOPUSH on before the sending. 
It's cheap:

    s = splnet();
    inp = sotoinpcb(so);
    if (inp != NULL) {
        tp = intotcpcb(inp);
        tp->t_flags |= TF_NOPUSH;
    }
    splx(s);

SF_PUSH - it turns TF_NOPUSH off after the sending has been completed.
If the sending returned EAGAIN, TF_NOPUSH would not be touched.
It's cheap too, especially if the send buffer has enough data to fill
one MSS:

    s = splnet();
    inp = sotoinpcb(so);
    if (inp != NULL) {
        tp = intotcpcb(inp);
        tp->t_flags &= ~TF_NOPUSH;
        if (so->so_snd.sb_cc < tp->t_maxseg) {
            error = tcp_output(tp);
        }
    }
    splx(s);

Igor Sysoev
http://sysoev.ru/en/

From owner-freebsd-arch@FreeBSD.ORG Tue May 27 01:25:45 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9594237B401 for ; Tue, 27 May 2003 01:25:45 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9063943F3F for ; Tue, 27 May 2003 01:25:44 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4R8PhmF013149; Tue, 27 May 2003 12:25:43 +0400 (MSD) Date: Tue, 27 May 2003 12:25:43 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED2AF18.F5EB4FA5@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 08:25:45 -0000

On Mon, 26 May 2003, Terry Lambert wrote:

> Igor Sysoev wrote:
> > sendfile(2) now has two drawbacks:
>
> Only two? ;^).

No, I know of some other drawbacks, but I do not currently see ways to
resolve them.

> > So here is a proposal.
We can introduce a sendfile(2) flag, i.e. SF_NOPUSH
> > that will turn TF_NOPUSH on before the sending and turn it off just
> > before return. It allows to save two syscalls on each sendfile() call
> > and it's especially useful with non-blocking sockets - they can cause many
> > sendfile() calls.
>
> I don't see this as being terrifically useful; small files
> should probably just be mapped and written; the copy expense
> is still there for the headers and trailers, no matter what,
> and the file size itself is very small overhead, relatively
> speaking, for files small enough for this to be an issue.

FreeBSD 4.x has no zero_copy(9), so mmap()ed files would be copied.
sendfile() lets us avoid this copy.

By the way, what do you call small files? 4K, 30K or 100K?

> I also think your headers and trailers are very small, if
> they are fitting with the file contents in a single packet.
> I think this is atypical.

If I ask Google:
----
HEAD /images/hp0.gif HTTP/1.0
Host: www.google.com

----
it returns 230 bytes:
----
HTTP/1.0 200 OK
Connection: Keep-Alive
Date: Tue, 27 May 2003 08:10:59 GMT
Content-Type: image/gif
Last-Modified: Tue, 22 Apr 2003 22:18:49 GMT
Expires: Sun, 17 Jan 2038 19:14:07 GMT
Content-length: 4277
Server: GWS/2.0

----
and that is a typical HTTP header for a static response (one that can be
handled with sendfile()).

> On the other hand, if you want to add a flag for this, I say
> "knock yourself out" -- go ahead and add the flag; it's not
> really going to benefit you that much, but it's not going to
> really hurt any of the rest of us either, so there's really
> no reason to make you not do it. 8-).
>
> BTW: if you go ahead with this, you should verify that it
> also works for the trailers, etc., and you should probably
> skip it if you headers > transmit queue depth, or file size >
> transmit queue depth, or trailers > transmit queue depth.

Currently sendfile() can send the file in not full packets even file is bigger then the transmit queue depth. It can send it in 2 x 1460 + 1176 or 5 x 1460 + 892 packets. Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 02:54:15 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5B33D37B401 for ; Tue, 27 May 2003 02:54:15 -0700 (PDT) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id BED7E43F75 for ; Tue, 27 May 2003 02:54:14 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjqh.dialup.mindspring.com ([165.247.207.81] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Kb9h-0004w4-00; Tue, 27 May 2003 02:54:05 -0700 Message-ID: <3ED33487.59C03DA9@mindspring.com> Date: Tue, 27 May 2003 02:48:55 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4dd3c9063920c09d43fc1e9f5b340bacea7ce0e8f8d31aa3f350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 09:54:15 -0000 Igor Sysoev wrote: > > I don't see this as being terrifically useful; small files > > should probably just be mapped and written; the copy expense > > is still there for the headers and trailers, no matter what, > > and the file size itself is very small overhead, relatively > > speaking, for 
files small enough for this to be an issue.
>
> FreeBSD 4.x has no zero_copy(9) so mmap()ed files would be copied.
> sendfile() allows to avoid this copy.

It's actually "one copy", from an external mbuf reference
to a VM buffer to the card, vs. two copies, from the mapped
file in the process address space to mbufs, to the card.

My point, though, was that for files small enough to fit in a
packet, you are probably going to spend more on the headers
and trailers copyin and the mbuf list assembly and the system
call overhead, than you'll end up spending on the extra data
copy for the very small file contents.

I would be really surprised if you were able to demonstrate a
measurable performance difference which was above the noise.

> By the way what do you call by small files ? 4K, 30K or 100K ?

You were talking about the file and the header living in the
same packet.

Actually, according to the literature on the subject, the
average web page returned by an HTTP server is ~8K. That
would fit in a single jumbogram; but you said 14??b, which
makes me believe you're using a standard MTU.

Without knowing what protocol you're using, and without any
knowledge of whether you are using pipelining for
server-to-client data, etc. (e.g. so back-to-back sendfiles are
a possibility), I really don't know what to call "small".

Certainly, I'd say that anything that went over 16K or 32K
would take a stall in the socket send buffer, which is limited
to around there, if you don't go out of your way to add RAM
and increase the mbufs and use the sysctl to make the send
buffer bigger.

The stall would be a *much* bigger performance issue, IMO,
since you will end up waiting for another NETISR (up to 100ms)
plus a potential page fault on the file data, plus... etc..

So like I said: I don't think you are going to get a huge
performance win out of doing it, no matter how you cook it.

> > I also think your headers and trailers are very small, if
> > they are fitting with the file contents in a single packet.
> > I think this is atypical.
>
> If I ask Google:
> ----
> HEAD /images/hp0.gif HTTP/1.0
> Host: www.google.com
>
> ----
> it will return me 230 bytes:

The "HEAD" is atypical, compared to the "GET"; the full Google
front page is larger than that, and consists of multiple files;
assuming you support HTTP/1.1 and pipelining, it's going to be
a back-to-back transfer involving multiple sendfile() calls.

If you don't, then your major expense is going to be protection
domain crossing and setting up and tearing down HTTP/1.0
connections, unless you set "Keep-Alive:". Even then, if you
get back something that's chunked, the only way to signal the
end of it will be to drop the connection (no MIME size).

> > BTW: if you go ahead with this, you should verify that it
> > also works for the trailers, etc., and you should probably
> > skip it if your headers > transmit queue depth, or file size
> > > transmit queue depth, or trailers > transmit queue depth.
>
> Currently sendfile() can send the file in not full packets even
> file is bigger than the transmit queue depth. It can send
> it in 2 x 1460 + 1176 or 5 x 1460 + 892 packets.

3 packets vs. 6. And using HTTP/1.0, there's also the three
handshake packets, SYN/SYN-ACK/ACK, and the three
teardown packets, FIN/FIN-ACK/ACK (or 4), plus the ACKs for
the packets you sent (should be one ACK, since that's below
the TCP window size).

Really: it's in the noise. Unless you are paying by packet
count, you probably shouldn't care.

-- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 03:28:13 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BC9A737B401 for ; Tue, 27 May 2003 03:28:13 -0700 (PDT) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8ACCB43F93 for ; Tue, 27 May 2003 03:28:12 -0700 (PDT) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])h4RASAp9044708; Tue, 27 May 2003 20:28:11 +1000 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.8/8.12.8/Submit) id h4RAS6Ub044707; Tue, 27 May 2003 20:28:06 +1000 (EST) Date: Tue, 27 May 2003 20:28:06 +1000 From: Peter Jeremy To: Igor Sysoev Message-ID: <20030527102806.GC44520@cirb503493.alcatel.com.au> References: <20030526201740.GA22178@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 10:28:14 -0000 On Tue, May 27, 2003 at 11:57:20AM +0400, Igor Sysoev wrote: >I thought about it more and I agree with you. TF_NOPUSH should be turned on >at the start of a transaction and turned off at the end of a transaction. > >So I think there should be two flags: >SF_NOPUSH - it turns TF_NOPUSH on before the sending. It's cheap: >SF_PUSH - it turns TF_NOPUSH off after the sending has been completed. 

I agree that the code appears trivial but in order to justify its
inclusion, you will need to demonstrate that there is some benefit to
FreeBSD to implement this code. Good justification would be:

1) The same API is implemented somewhere else (or there is agreement
   between multiple groups to implement it). I don't believe this
   functionality is implemented anywhere else and you've not provided
   any evidence that any other groups are considering such functionality.

2) The new feature provides significant performance benefit. In this
   case, I believe the overhead of calling setsockopt(2) is negligible
   so the performance gain would be negligible.

3) The new feature provides novel functionality that cannot be
   achieved using the existing API (eg kqueue(2)). The functionality
   is already available via setsockopt(2) so this isn't applicable.

At this stage, I would suggest that you need to do better than "the
change is cheap" to justify adding this feature. Can you quantify
the performance benefits, or provide some other justification?

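For reference, the existing setsockopt(2) route mentioned in point 3 might look roughly like the C sketch below. It is only an illustration under stated assumptions: set_nopush() and send_response() are hypothetical helper names, the socket is assumed to be blocking, and the TCP_CORK fallback is there purely so the fragment builds on systems that lack TCP_NOPUSH.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#if defined(__FreeBSD__)
#include <sys/uio.h>
#endif

/* TCP_NOPUSH is the FreeBSD option discussed in this thread. The
 * TCP_CORK fallback is an assumption added only for portability. */
#if defined(TCP_NOPUSH)
#define NOPUSH_OPT TCP_NOPUSH
#else
#define NOPUSH_OPT TCP_CORK
#endif

/* Hypothetical helper: switch the no-push behaviour on or off. */
int set_nopush(int s, int on)
{
    return setsockopt(s, IPPROTO_TCP, NOPUSH_OPT, &on, sizeof(on));
}

#if defined(__FreeBSD__)
/* Hypothetical helper: send a header plus a whole file on socket s,
 * bracketed by the two setsockopt() calls the proposed flags would
 * save. Assumes a blocking socket. */
int send_response(int s, int fd, void *hdr, size_t hdrlen)
{
    struct iovec iov = { hdr, hdrlen };
    struct sf_hdtr hdtr = { &iov, 1, NULL, 0 };
    off_t sent = 0;
    int rc;

    if (set_nopush(s, 1) == -1)
        return -1;
    rc = sendfile(fd, s, 0, 0, &hdtr, &sent, 0);  /* nbytes 0: to EOF */
    if (set_nopush(s, 0) == -1)
        return -1;
    return rc;
}
#endif
```

With a non-blocking socket the sendfile() call may have to be repeated several times, but the two set_nopush() calls would still be made once per response.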
Peter From owner-freebsd-arch@FreeBSD.ORG Tue May 27 03:49:39 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 23B8437B401 for ; Tue, 27 May 2003 03:49:39 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id D191443FAF for ; Tue, 27 May 2003 03:49:37 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RAnZmF015243; Tue, 27 May 2003 14:49:35 +0400 (MSD) Date: Tue, 27 May 2003 14:49:35 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED33487.59C03DA9@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 10:49:39 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > > I don't see this as being terrifically useful; small files > > > should probably just be mapped and written; the copy expense > > > is still there for the headers and trailers, no matter what, > > > and the file size itself is very small overhead, relatively > > > speaking, for files small enough for this to be an issue. > > > > FreeBSD 4.x has no zero_copy(9) so mmap()ed files would be copied. > > sendfile() allows to avoid this copy. > > It's actually "one copy", from an external mbuf reference > to a VM buffer to the card, vs. two copies, from the mapped > file in the process address space to mbufs, to the card. I know what "zero copy" means. 

> My point, though was that for files small enough to fit in a
> packet, you are probably going to spend more on the headers
> and trailers copyin and the mbuf list assembly and the system
> call overhead, than you'll end up spending on the extra data
> copy for the very small file contents.
>
> I would be really surprised if you were able to demonstrate a
> measurable performance difference which was above the noise.

I hope to demonstrate at least the CPU usage in the near future.

> > By the way what do you call by small files ? 4K, 30K or 100K ?
>
> You were talking about the file and the header living in the
> same packet.

I mean that if you have a 230-byte header, then sendfile() will send it
in a separate packet regardless of the size of the header and of the file.
Something like this - 230, 1460, 1460, ...

> > > I also think your headers and trailers are very small, if
> > > they are fitting with the file contents in a single packet.
> > > I think this is atypical.
> >
> > If I ask Google:
> > ----
> > HEAD /images/hp0.gif HTTP/1.0
> > Host: www.google.com
> >
> > ----
> > it will return me 230 bytes:
>
> The "HEAD" is atypical, compared to the "GET"; the full Google
> front page is larger than that, and consists of multiple files;
> assuming you support HTTP/1.1 and pipelining, it's going to be
> a back-to-back transfer involving multiple sendfile() calls.

I used HEAD to show you the size of the HTTP header.
HEAD is atypical, but such a small HTTP header is typical.

> > > BTW: if you go ahead with this, you should verify that it
> > > also works for the trailers, etc., and you should probably
> > > skip it if your headers > transmit queue depth, or file size
> > > > transmit queue depth, or trailers > transmit queue depth.
> >
> > Currently sendfile() can send the file in not full packets even
> > file is bigger than the transmit queue depth. It can send
> > it in 2 x 1460 + 1176 or 5 x 1460 + 892 packets.
>
> 3 packets vs. 6.
And using HTTP/1.0, there's also the three
> handshake packets, SYN/SYN-ACK/ACK, and the three
> teardown packets, FIN/FIN-ACK/ACK (or 4), plus the ACKs for
> the packets you sent (should be one ACK, since that's below
> the TCP window size).

Actually 6 vs. 6 for this 8K file. But I was talking about something else.
Take a 48K file and a 250-byte header. sendfile() usually sends
it in 4K or 8K hunks, so there are 48/8 * 6 + 1 (header) = 37 packets.
But 48K + 250 bytes = 33 * 1460 + 1222, i.e. 34 packets.
That is an 8% decrease in data packets. Add to this the possible
retransmissions.

> Really: it's in the noise. Unless you are paying by packet
> count, you probably shouldn't care.

So do you consider IP fragmentation a good thing?

Igor Sysoev
http://sysoev.ru/en/

From owner-freebsd-arch@FreeBSD.ORG Tue May 27 04:25:33 2003
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6292B37B401 for ; Tue, 27 May 2003 04:25:33 -0700 (PDT)
Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6ED5443F3F for ; Tue, 27 May 2003 04:25:32 -0700 (PDT) (envelope-from is@rambler-co.ru)
Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RBPVmF015830; Tue, 27 May 2003 15:25:31 +0400 (MSD)
Date: Tue, 27 May 2003 15:25:31 +0400 (MSD)
From: Igor Sysoev
X-Sender: is@is
To: Peter Jeremy
In-Reply-To: <20030527102806.GC44520@cirb503493.alcatel.com.au>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@freebsd.org
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 27 May 2003 11:25:33 -0000

On Tue,
27 May 2003, Peter Jeremy wrote:

> 2) The new feature provides significant performance benefit. In this
> case, I believe the overhead of calling setsockopt(2) is negligible
> so the performance gain would be negligible.

I think calling the setsockopt(TCP_NOPUSH, 1) syscall has huge overhead
compared to several C operators inside sendfile(2). Turning TF_NOPUSH
off has almost the same overhead as setsockopt(TCP_NOPUSH, 0) if you need
to call tcp_output(tp) inside sendfile(2), and has no overhead at all if
you do not need to call it.

> At this stage, I would suggest that you need to do better than "the
> change is cheap" to justify adding this feature. Can you quantify
> the performance benefits, or provide some other justification?

My point is not "the cheap change" but "the cheap overhead".

Igor Sysoev
http://sysoev.ru/en/

From owner-freebsd-arch@FreeBSD.ORG Tue May 27 04:31:45 2003
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2B93037B401 for ; Tue, 27 May 2003 04:31:45 -0700 (PDT)
Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id E864143F3F for ; Tue, 27 May 2003 04:31:43 -0700 (PDT) (envelope-from peterjeremy@optushome.com.au)
Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])h4RBVgp9044767; Tue, 27 May 2003 21:31:42 +1000 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au)
Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.8/8.12.8/Submit) id h4RBVdCH044766; Tue, 27 May 2003 21:31:39 +1000 (EST)
Date: Tue, 27 May 2003 21:31:38 +1000
From: Peter Jeremy
To: Igor Sysoev
Message-ID: <20030527113138.GD44520@cirb503493.alcatel.com.au>
References: <3ED33487.59C03DA9@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.1i
cc:
arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 11:31:45 -0000 On Tue, May 27, 2003 at 02:49:35PM +0400, Igor Sysoev wrote: >Actually 6 vs. 6 for this 8K file. But I said about another thing. >Let's see 48K file and 250 bytes header. sendfile() usually sends >it as 4K or 8K hunks so there are 48/8 * 6 + 1 (header) = 37 packets. >But (48K + 250) / 1460 = 33 * 1460 + 1270 i.e. 34 packets. >It's 8% decrease of data packets. Add here the possible retransmitions. Why is the number of data packets so important? If you repeat the calculation considering bytes across the wire (assuming Ethernet), then the saving is closer to 0.4% - this is in the noise. For that matter, have you considered the impact of Path MTU discovery? I think possible retransmissions are irrelevant here. If your packet loss is anything above negligible then you have other problems. If the retransmission is caused by transmission noise, then the smaller packets are less likely to get hit. And the sender is likely to retransmit a full packet rather than the small packet originally sent. >> Really: it's in the noise. Unless you are paying by packet >> count, you probably shouldn't care. > >So do you consider that IP fragmentation is the good thing ? "IP fragmentation" normally refers to a single IP packet being split up into multiple smaller packets by a router. It has nothing to do with the topic under discussion. If anything, transmitting smaller IP packets reduces the likelihood that an intervening router will need to fragment packets - so your patch actually increases the probability of IP fragmentation. 
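
The packet counts quoted at the top of this message, and the bytes-on-the-wire comparison, can be checked with a few lines of C. This is a back-of-the-envelope sketch, not a measurement: the function names are made up for illustration, the MSS of 1460 comes from the thread, and the 78 bytes of per-frame overhead (Ethernet preamble, header, FCS and inter-frame gap, plus IPv4 and TCP headers without options) is an assumption. Note that with a 250-byte header the last coalesced segment works out to 1222 bytes rather than the quoted 1270; the count of 34 is unaffected.

```c
#include <assert.h>
#include <stddef.h>

/* Segments needed for `bytes` of payload at `mss` bytes per segment
 * (ceiling division). */
size_t segments(size_t bytes, size_t mss)
{
    assert(mss > 0);
    return (bytes + mss - 1) / mss;
}

/* Header sent on its own, then the file pushed out hunk by hunk, each
 * hunk ending in its own short segment. */
size_t segments_chunked(size_t hdr, size_t file, size_t hunk, size_t mss)
{
    size_t n = segments(hdr, mss);

    assert(hunk > 0);
    while (file > 0) {
        size_t part = file < hunk ? file : hunk;
        n += segments(part, mss);
        file -= part;
    }
    return n;
}

/* Total bytes on the wire: payload plus a fixed overhead per frame.
 * The overhead value is supplied by the caller and is an assumption. */
size_t wire_bytes(size_t payload, size_t nsegs, size_t overhead)
{
    return payload + nsegs * overhead;
}
```

For a 250-byte header, a 48K file and 8K hunks, segments_chunked() gives 37 and segments() on the combined 49402 bytes gives 34. At 78 bytes of overhead per frame that is 52288 versus 52054 bytes on the wire, a difference of 234 bytes or about 0.45%, which is in line with the "closer to 0.4%" estimate above.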
Peter From owner-freebsd-arch@FreeBSD.ORG Tue May 27 04:41:12 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D9C4237B401 for ; Tue, 27 May 2003 04:41:12 -0700 (PDT) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id D315743F93 for ; Tue, 27 May 2003 04:41:11 -0700 (PDT) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])h4RBfAp9044784; Tue, 27 May 2003 21:41:10 +1000 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.8/8.12.8/Submit) id h4RBfAMf044783; Tue, 27 May 2003 21:41:10 +1000 (EST) Date: Tue, 27 May 2003 21:41:10 +1000 From: Peter Jeremy To: Igor Sysoev Message-ID: <20030527114110.GE44520@cirb503493.alcatel.com.au> References: <20030527102806.GC44520@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 11:41:13 -0000 On Tue, May 27, 2003 at 03:25:31PM +0400, Igor Sysoev wrote: >On Tue, 27 May 2003, Peter Jeremy wrote: > >> 2) The new feature provides significant performance benefit. In this >> case, I believe the overhead of calling setsockopt(2) is negligible >> so the performance gain would be negligible. > >I think the calling setsockopt(TCP_NOPUSH, 1) syscall has huge overhead >as compared to several C operators inside sendfile(2). Agreed. 
But needing another one or two relatively cheap system calls is
negligible compared to the total overhead of accept(), several
select()/poll()/kqueue() calls, several sendfile() calls and a close().
If you can produce some figures demonstrating that two setsockopt()
syscalls make a noticeable difference to the total cost of sending a
48K file then your change might be worth considering.

>My point is not "the cheap change" but "the cheap overhead".

Please quantify the improvement amortised over a complete transaction
or connection.

Peter

From owner-freebsd-arch@FreeBSD.ORG Tue May 27 05:03:07 2003
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A6A8237B401 for ; Tue, 27 May 2003 05:03:07 -0700 (PDT)
Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3CD2A43F3F for ; Tue, 27 May 2003 05:03:06 -0700 (PDT) (envelope-from is@rambler-co.ru)
Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RC34mF016479; Tue, 27 May 2003 16:03:04 +0400 (MSD)
Date: Tue, 27 May 2003 16:03:04 +0400 (MSD)
From: Igor Sysoev
X-Sender: is@is
To: Peter Jeremy
In-Reply-To: <20030527113138.GD44520@cirb503493.alcatel.com.au>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: arch@freebsd.org
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 27 May 2003 12:03:07 -0000

On Tue, 27 May 2003, Peter Jeremy wrote:

> On Tue, May 27, 2003 at 02:49:35PM +0400, Igor Sysoev wrote:
> >Actually 6 vs. 6 for this 8K file. But I said about another thing.
> >Let's see 48K file and 250 bytes header.
sendfile() usually sends > >it as 4K or 8K hunks so there are 48/8 * 6 + 1 (header) = 37 packets. > >But (48K + 250) / 1460 = 33 * 1460 + 1270 i.e. 34 packets. > >It's 8% decrease of data packets. Add here the possible retransmitions. > > Why is the number of data packets so important? If you repeat the > calculation considering bytes across the wire (assuming Ethernet), > then the saving is closer to 0.4% - this is in the noise. When there's the simple way to avoid the partially filled packets I do not see any reason not to use it. > For that matter, have you considered the impact of Path MTU discovery? What impact ? > I think possible retransmissions are irrelevant here. If your packet > loss is anything above negligible then you have other problems. If > the retransmission is caused by transmission noise, then the smaller > packets are less likely to get hit. And the sender is likely to > retransmit a full packet rather than the small packet originally sent. > > >> Really: it's in the noise. Unless you are paying by packet > >> count, you probably shouldn't care. > > > >So do you consider that IP fragmentation is the good thing ? > > "IP fragmentation" normally refers to a single IP packet being > split up into multiple smaller packets by a router. It has nothing > to do with the topic under discussion. If anything, transmitting I know what is IP fragmentation. But in terms of the packet overhead they are similar to not full packets. > smaller IP packets reduces the likelihood that an intervening router > will need to fragment packets - so your patch actually increases > the probability of IP fragmentation. Yes, I understand it. 
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 08:30:36 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9BF5037B401 for ; Tue, 27 May 2003 08:30:36 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id F0FC043FBF for ; Tue, 27 May 2003 08:30:35 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjqj.dialup.mindspring.com ([165.247.207.83] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19KgPF-0005u0-00; Tue, 27 May 2003 08:30:30 -0700 Message-ID: <3ED3844F.713FB360@mindspring.com> Date: Tue, 27 May 2003 08:29:19 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Peter Jeremy References: <20030526201740.GA22178@cirb503493.alcatel.com.au> <20030527102806.GC44520@cirb503493.alcatel.com.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a430e0f4f477a8cd8998973b7e8d6551c2a7ce0e8f8d31aa3f350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 15:30:37 -0000 Peter Jeremy wrote: > On Tue, May 27, 2003 at 11:57:20AM +0400, Igor Sysoev wrote: > >I thought about it more and I agree with you. TF_NOPUSH should be turned on > >at the start of a transaction and turned off at the end of a transaction. > > > >So I think there should be two flags: > >SF_NOPUSH - it turns TF_NOPUSH on before the sending. 
It's cheap:
> >SF_PUSH - it turns TF_NOPUSH off after the sending has been completed.
>
> I agree that the code appears trivial but in order to justify its
> inclusion, you will need to demonstrate that there is some benefit to
> FreeBSD to implement this code. Good justification would be:
>
> 1) The same API is implemented somewhere else (or there is agreement
> between multiple groups to implement it). I don't believe this
> functionality is implemented anywhere else and you've not provided
> any evidence that any other groups are considering such functionality.

Actually, the functionality can be implemented *without* going
and implementing the API. It should really be controlled already
by the TCP_NODELAY option *not* having been set by the user, and,
for last-block next-first-block coalescing, by TCP_NOPUSH
*having* been set.

Basically, the stack is minorly misbehaving on us in the sendfile
case; effectively, it's unintentionally fragging up to one packet
between the user-supplied header (if any) and the file content,
and the file content and the user-supplied trailer (if any).

It's nothing to be terrifically concerned about, unless you are
paying by the packet, you keep your connections open a very long
time (e.g. HTTP/1.1), such that the amortized packet count is
relatively high, and your files, headers, and trailers are tiny
enough that the frags constitute a significant portion of your
packet traffic. In other words, you have to win the lottery. 8-).

> 2) The new feature provides significant performance benefit. In this
> case, I believe the overhead of calling setsockopt(2) is negligible
> so the performance gain would be negligible.

The overhead of toggling it would be costly. However, I really
don't understand why he isn't just not setting TCP_NODELAY in
the first place, since it's an affirmative option, and then
leaving the socket alone to act like it's supposed to act.

> 3) The new feature provides novel functionality that cannot be > achieved using the existing API (eg kqueue(2)). The functionality > is already available via setsockopt(2) so this isn't applicable. Heck; I'd argue that it can be achieved with sendfile(2), if you leave the TCP options alone, and are willing to accept not setting TCP_NOPUSH for back-to-back potentially one packet worth of overhead, just by reorganizing the sendfile(2) implementation to comply with existing default conditionals. > At this stage, I would suggest that you need to do better than "the > change is cheap" to justify adding this feature. Can you quantify > the performance benefits, or provide some other justification? I'd also like to see a performance comparison; the issue is probably that, without a testbed that can drive traffic at full Gigabit speeds, he's probably not going to be able to show anything of statistical significance from this; at full Gigabit speed, he could probably show CPU copy overhead that's high enough to impact total top-end throughput, as he runs out of CPU to do the copies. IMO, that'd only be true if his data set was small enough to fit in cache after the first one or two sends. The mbuf allocator overhead shows the same level of overhead, though, and you could reclaim performance there, instead, if you were looking for low-hanging fruit. 
-- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 08:55:11 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3566937B404 for ; Tue, 27 May 2003 08:55:11 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6022B43FCB for ; Tue, 27 May 2003 08:55:09 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjqj.dialup.mindspring.com ([165.247.207.83] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Kgn2-0006qX-00; Tue, 27 May 2003 08:55:05 -0700 Message-ID: <3ED38A13.524529B2@mindspring.com> Date: Tue, 27 May 2003 08:53:55 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4edbce846b2e507d84d06f7bfba2784a2666fa475841a1c7a350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 15:55:11 -0000 Igor Sysoev wrote: > > I would be really surprised if you were able to demonstrate a > > measuarble performance difference which was above the noise. > > I hope I will demonstrate at least CPU usage in near future. See other post: that's the only place I expect there to be a potential win; however, unless you CPU power is relatively low, compared to memory and PCI bus bandwidth, I expect the limiting factor to be PCI bus bandwidth first, memory second, and CPU overhead a distant third. 
That changes if you are doing crypto, but then IPSEC changes
all your assumptions.

> > You were talking about the file and the header living in the
> > same packet.
>
> I mean that if you have 230 bytes header then sendfile() will send it
> in separate packet nevertheless the size of header and of the file.
> Something like this - 230, 1460, 1460, ...

Again, see other post: this is arguably a sendfile(2) bug, though
a really minor one; one which should be addressed in the sendfile(2)
implementation, and doesn't need options added to the API in
order to address it.

> > > it will return me 230 bytes:
> >
> > The "HEAD" is atypical, compared to the "GET"; the full Google
> > front page is larger than that, and consists of multiple files;
> > assuming you support HTTP/1.1 and pipelining, it's going to be
> > a back-to-back transfer involving multiple sendfile() calls.
>
> I use HEAD to show you the size of the HTTP header.
> The HEAD is atypical but such small HTTP header is typical.

Here is my problem: you are arguing both amortized cost and
total cost, depending on which is more supportive of your main
thesis. These arguments are separate and orthogonal to each
other: they don't support each other.

You can argue tiny files, and a relatively high total cost, or
you can argue large files and pipelining, and a relatively high
amortized cost, but you can't argue both tiny and large files
and many connections and one connection at the same time.

Personally, I'd step back and get the arguments straight, and
get an implementation that demonstrates statistically significant
performance differences, and then come back, if I wanted to
press the case for additional option flags. I have done this
several times in the past, e.g. with my soft interrupt coalescing
implementation that's now part of most of the ethernet drivers
people care about.

Actually, in this case, I'd just try to fix sendfile(2) to do
the packet coalescing I'd expect, given the relative state of
the TCP_NODELAY and TCP_NOPUSH option flags.

> > 3 packets vs. 6. And using HTTP/1.0, there's also the three
> > handshake packets, SYN/SYN-ACK/ACK, and the three
> > teardown packets, FIN/FIN-ACK/ACK (or 4), plus the ACKs for
> > the packets you sent (should be one ACK, since that's below
> > the TCP window size).
>
> Actually 6 vs. 6 for this 8K file. But I said about another thing.
> Let's see 48K file and 250 bytes header. sendfile() usually sends
> it as 4K or 8K hunks so there are 48/8 * 6 + 1 (header) = 37 packets.
> But (48K + 250) / 1460 = 33 * 1460 + 1270 i.e. 34 packets.
> It's 8% decrease of data packets.

Which may or may not be a win; it depends on how
close to the bandwidth limit you are capable of driving your
hardware. The bandwidth delay product between you and the
other end of the connection is probably going to be a much more
significant factor, when moving barely enough data to trigger
one window framing event (forced ACK).

> Add here the possible retransmissions.

Retransmissions are probably irrelevant; when you talk about
a retransmit, you are talking about data which is persisting
in your send sockbuf because it is outstanding unacknowledged
data. At that point, the mbuf chains are assembled.

The internal fragmentation you are complaining about here
happens because of the initial lack of a TF_NOPUSH flag on
the tcpcb when tcp_output() is called on it after the
headers have been enqueued, but before any file data has been
enqueued.

So when a retransmit, if any, is necessary, the packet stream
will not have the same decoalesced state: it will retransmit
exactly as you wanted it to transmit in the first place.

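The effect described above, where output runs after the header is queued but before any file data arrives, can be illustrated with a toy model in C. This is emphatically not the FreeBSD implementation: struct sendq and its functions are hypothetical, and the output routine ignores windows, Nagle and ACK timing. It only shows how a no-push flag changes the number of short segments.

```c
#include <assert.h>
#include <stddef.h>

/* Toy send queue: counts the segments a simplified output routine emits. */
struct sendq {
    size_t pending;  /* bytes queued but not yet sent */
    size_t mss;      /* maximum segment size */
    size_t sent;     /* segments emitted so far */
    int    nopush;   /* nonzero: hold back a trailing short segment */
};

/* Simplified stand-in for an output routine: always emits full
 * segments, and emits a short tail only when nopush is clear. */
void sendq_output(struct sendq *q)
{
    assert(q->mss > 0);
    q->sent += q->pending / q->mss;
    q->pending %= q->mss;
    if (!q->nopush && q->pending > 0) {
        q->sent++;
        q->pending = 0;
    }
}

/* Queue some bytes, then run the output routine, as each append would. */
void sendq_append(struct sendq *q, size_t bytes)
{
    q->pending += bytes;
    sendq_output(q);
}

/* Send a header followed by a file in hunks; return the segment count. */
size_t simulate(size_t hdr, size_t file, size_t hunk, size_t mss, int nopush)
{
    struct sendq q = { 0, mss, 0, nopush };

    assert(hunk > 0);
    sendq_append(&q, hdr);
    while (file > 0) {
        size_t part = file < hunk ? file : hunk;
        sendq_append(&q, part);
        file -= part;
    }
    q.nopush = 0;       /* end of transaction: flush whatever is left */
    sendq_output(&q);
    return q.sent;
}
```

With a 250-byte header, a 48K file, 8K hunks and an MSS of 1460, the model emits 37 segments with the flag clear and 34 with it set, matching the figures quoted earlier in this message.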
BTW: I'm still wary of the initial fault on the file data, if it's not already in cache: arguably, it's better to start sending the headers, and avoid the startup latency of delaying sending the headers until the fault is satisfied: part of the thing that's going to be eating your PCI bandwidth is the disk I/O, and your disks are going to be the slowest data sources/sinks in the whole equation. > > Really: it's in the noise. Unless you are paying by packet > > count, you probably shouldn't care. > > So do you consider that IP fragmentation is the good thing ? Depends; can I go end-to-end without any fragmentation that happens at all, or am I required to use frags to get packets through at all? If I have to use frags to get packets through, fragged data is *much* better than no data. 8-) 8-). In any case, I expect that this should be handled in the context of TCP_NODELAY and TCP_NOPUSH, rather than by adding options to work around an arguably broken sendfile(2). -- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 08:59:34 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 495C037B401 for ; Tue, 27 May 2003 08:59:34 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 78A0743F75 for ; Tue, 27 May 2003 08:59:32 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RFxUmF020137; Tue, 27 May 2003 19:59:30 +0400 (MSD) Date: Tue, 27 May 2003 19:59:30 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED3844F.713FB360@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list 
List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 15:59:34 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > Peter Jeremy wrote: > > On Tue, May 27, 2003 at 11:57:20AM +0400, Igor Sysoev wrote: > > >I thought about it more and I agree with you. TF_NOPUSH should be turned on > > >at the start of a transaction and turned off at the end of a transaction. > > > > > >So I think there should be two flags: > > >SF_NOPUSH - it turns TF_NOPUSH on before the sending. It's cheap: > > >SF_PUSH - it turns TF_NOPUSH off after the sending has been completed. > > > > I agree that the code appears trivial but in order to justify its > > inclusion, you will need to demonstrate that there is some benefit to > > FreeBSD to implement this code. Good justification would be: > > > > 1) The same API is implemented somewhere else (or there is agreement > > between multiple groups to implement it). I don't believe this > > functionality is implemented anywhere else and you've not provided > > any evidence that any other groups are considering such functionality. > > Actually, the functionality can be implemented *without* going > and implementing the API. It should really be contrlled already > by the TCP_NODELAY option *not* having been set by the user, and, > for last-block next-first-block coelescing, by TCP_NOPUSH *having* > been set. It's not an implementing the API. It's an addition to the already existed API - sendfile(2). sendfile(2) already has the flags parameter and this parameter is currently unused and should be zero. I propose two sendfile(2) flags - SF_NOPUSH and SF_PUSH. > > 2) The new feature provides significant performance benefit. In this > > case, I believe the overhead of calling setsockopt(2) is negligible > > so the performance gain would be negligible. > > The overhead of toggling it would be costly. 
However, I really > don't understand why he isn't just not setting TCP_NODELAY in > the first place, since it's an affirmative option, and then > leaaving the socket alone to act like it's supposed to act. TCP_NODELAY is not set. Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 09:04:07 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 39E3C37B404 for ; Tue, 27 May 2003 09:04:07 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4024F43FAF for ; Tue, 27 May 2003 09:04:06 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjqj.dialup.mindspring.com ([165.247.207.83] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Kgvh-0001Fw-00; Tue, 27 May 2003 09:04:02 -0700 Message-ID: <3ED38C2B.DEA23AB8@mindspring.com> Date: Tue, 27 May 2003 09:02:51 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a455a956b0f4a4b4331ed3b12870dfb34f350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 16:04:07 -0000 Igor Sysoev wrote: > On Tue, 27 May 2003, Peter Jeremy wrote: > > 2) The new feature provides significant performance benefit. 
In this > > case, I believe the overhead of calling setsockopt(2) is negligible > > so the performance gain would be negligible. > > I think the calling setsockopt(TCP_NOPUSH, 1) syscall has huge overhead > as compared to several C operators inside sendfile(2). But this call should not be necessary. Internally, the sendfile(2) implementation should treat the headers + file contents + trailers as a single stream. Your problem is that the implementation of sendfile(2) sucks and is not doing this, not that you need to set TCP_NOPUSH to avoid separation of three back-to-back transmits: you don't *have* three back-to-back transmits here, you have only *one* transmit. Would you expect a writev(2) operation to break up each of the chunks described by the vector into separate back-to-back transmits? If not, why do you expect sendfile(2) to do it? > The turning TF_NOPUSH off has almost the same overhead as > setsockopt(TCP_NOPUSH, 0) if you need to call tcp_output(tp) inside > sendfile(2) and has no overhead at all if you do not need to call it. The problem is that you need to break tcp_output() into a couple of routines, OR you need to not call it on the headers, file data, and trailers separately. > > At this stage, I would suggest that you need to do better than "the > > change is cheap" to justify adding this feature. Can you quantify > > the performance benefits, or provide some other justification? > > My point is not "the cheap change" but "the cheap overhead". I think we can all make up our own stories, where the overhead could become important enough for a specific application that we wouldn't complain about you eliminating it so you could do your application, as long as it doesn't negatively impact the rest of us (say, by adding non-standard sendfile(2) flags that no one else supports, if that isn't the only possible way to solve the problem). 
I don't think overhead is the issue, at this point: say we agree with you on overhead, for your particular application, and we are not against you solving your overhead problem: why exactly does the API have to change to fix the root cause of the problem? -- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 09:12:03 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0AD0E37B401 for ; Tue, 27 May 2003 09:12:03 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 80BB143F75 for ; Tue, 27 May 2003 09:12:02 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjqj.dialup.mindspring.com ([165.247.207.83] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Kh3N-0003RV-00; Tue, 27 May 2003 09:11:57 -0700 Message-ID: <3ED38E06.3FC82F09@mindspring.com> Date: Tue, 27 May 2003 09:10:46 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a455a956b0f4a4b4333d25c75bbab9e2b0666fa475841a1c7a350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 16:12:03 -0000 Igor Sysoev wrote: > When there's the simple way to avoid the partially filled packets > I do not see any reason not to use it. 
So set a flag when you would have called tcp_output() in between the headers and file data, or the file data and the trailers, and, if you have to early return, and the flag is set, call it before your early return. Problem solved. -- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 09:30:37 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8586237B401 for ; Tue, 27 May 2003 09:30:37 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id EBA9543F85 for ; Tue, 27 May 2003 09:30:36 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjqj.dialup.mindspring.com ([165.247.207.83] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19KhLM-0000SY-00; Tue, 27 May 2003 09:30:33 -0700 Message-ID: <3ED39260.27639328@mindspring.com> Date: Tue, 27 May 2003 09:29:20 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4964b9ff6710088d37d588dac50df4af3350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 16:30:37 -0000 Igor Sysoev wrote: > > Actually, the functionality can be implemented *without* going > > and implementing the API. 
It should really be controlled already > > by the TCP_NODELAY option *not* having been set by the user, and, > > for last-block next-first-block coalescing, by TCP_NOPUSH *having* > > been set. > > It's not an implementing the API. It's an addition to the already existed > API - sendfile(2). sendfile(2) already has the flags parameter and this > parameter is currently unused and should be zero. I propose two sendfile(2) > flags - SF_NOPUSH and SF_PUSH. Why? Why not just fix the broken sendfile(2) implementation, instead? > > > 2) The new feature provides significant performance benefit. In this > > > case, I believe the overhead of calling setsockopt(2) is negligible > > > so the performance gain would be negligible. > > > > The overhead of toggling it would be costly. However, I really > > don't understand why he isn't just not setting TCP_NODELAY in > > the first place, since it's an affirmative option, and then > > leaving the socket alone to act like it's supposed to act. > > TCP_NODELAY is not set. So there's no barrier to you fixing this by either breaking up tcp_output() into two functions, or lazy-calling tcp_output(), instead of aggressively calling it between headers and file data and file data and trailers in sendfile(2). Right? No API change necessary? 
-- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 10:46:57 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 825C937B401 for ; Tue, 27 May 2003 10:46:57 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 321E443FAF for ; Tue, 27 May 2003 10:46:56 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RHksmF022233; Tue, 27 May 2003 21:46:54 +0400 (MSD) Date: Tue, 27 May 2003 21:46:54 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED38A13.524529B2@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 17:46:57 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > I mean that if you have 230 bytes header then sendfile() will send it > > in separate packet nevertheless the size of header and of the file. > > Something like this - 230, 1460, 1460, ... > > Again, see other post: this is arguably a sendfile(2) bug, > though a reall minor one; one which should be addressed in > the sendfile(2) implementation, and doesn't need options > added to the API in order to address it. How do suppose to coelesce the file pages ? Wire two or more pages to mbuf's at once ? BTW I did not see how sendfile() work over jumbo ethernet. I suspect that without TCP_NOPUSH it sometimes sends 4096 or 8192 bytes packets instead of 9000. 
> > > > it will return me 230 bytes: > > > > > > The "HEAD" is atypical, compared to the "GET"; the full Google > > > front page is larger than that, and consists of multiple files; > > > assuming you support HTTP/1.1 and pipelining, it's going to be > > > a back-to-back transfer involving multiple sendfile() calls. > > > > I use HEAD to show you the size of the HTTP header. > > The HEAD is atypical but such small HTTP header is typical. > > Here is my problem: you are arguing both amortized cost and > total cost, depending on which is more supportive of your > main thesis. These arguments are seperate and orthogonal to > each other: they don't support each other. You can argue > tiny files, and a relatively high total cost, or you can argue > large files and pipelining, and a relatively high amortized > cost, but you can't argue both time and large files and > many connections and one connection at the same time. Terry, I do not understand you. My argument is simple - I want to avoid the partial packets because it decreases the number of packets. That's all. There's nothing about amortized cost or total cost. I do not even know what they are. > Personally, I'd step back and get the arguments straight, > and get an implementation that demonstrates statistically > significant performance differences, and then come back, if > I wanted to press the case for additional option flags. I > have done this several times in the past, e.g. with my soft > interrupt coelescing implementation that's now part of most > of the ethernet drivers people care about. > > Actually, in this case, I'd just try to fix sendfile(2) to > do the packet coelescing I'd expect, given the relative > state of the TCP_NODELAY and TCP_NOPUSH options flags. Actually, sendfile() already works according to TCP_NOPUSH flag. I do not know about TCP_NODELAY - I do not work with it. But if you turn TCP_NOPUSH on then sendfile() will send the full packets. 
If you turn TCP_NOPUSH off then sendfile() will send some packets partially filled. It's correct. > BTW: I'm still wary of the initial fault on the file data, if > it's not already in cache: arguably, it's better to start > sending the headers, and avoid the startup latency of delaying > sending the headers until the fault is satisfied: part of the > thing that's going to be eating your PCI bandwidth is the > disk I/O, and your disks are going to be the slowest data > sources/sinks in the whole equation. I agree but after all it's 20ms or so delay. > In any case, I expect that this should be handled in the > context of TCP_NODELAY and TCP_NOPUSH, rather than by adding > options to work around an arguably broken sendfile(2). sendfile() already works nice with TCP_NOPUSH. I propose only the flags that allow to turn TCP_NOPUSH (actually TF_NOPUSH) on/off inside sendfile(). Then in one syscall you can turn TCP_NOPUSH on, send the HTTP header, the file pages and turn TCP_NOPUSH off if all file pages are wired to mbuf's. And this TCP_NOPUSH state is not bound by sendfile() internals, you can control it via setsockopt/getsockopt(TCP_NOPUSH). 
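For comparison, a minimal userland sketch of the three-syscall pattern the other posters are weighing against the proposed flags: set TCP_NOPUSH, call sendfile(2) with the header in the sf_hdtr, then clear TCP_NOPUSH. The helper names set_nopush() and send_response() are hypothetical, the socket is assumed to be blocking, and the TCP_CORK fallback is an assumption added only so the sketch builds on systems that lack TCP_NOPUSH.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * TCP_NOPUSH is the BSD name.  Linux's closest equivalent is TCP_CORK;
 * the fallback is an assumption, not something from the thread.
 */
#if defined(TCP_NOPUSH)
#define NOPUSH_OPT TCP_NOPUSH
#elif defined(TCP_CORK)
#define NOPUSH_OPT TCP_CORK
#else
#error "no TCP_NOPUSH-like socket option on this platform"
#endif

/* Hypothetical helper: turn the no-push option on (1) or off (0). */
static int
set_nopush(int s, int on)
{
	assert(s >= 0);
	return (setsockopt(s, IPPROTO_TCP, NOPUSH_OPT, &on, sizeof(on)));
}

#ifdef __FreeBSD__
#include <sys/uio.h>

/*
 * Hypothetical helper: send an HTTP header (hdr/hdrcnt) followed by the
 * whole file fd on blocking socket s, bracketed by TCP_NOPUSH on/off.
 * Uses the FreeBSD sendfile(2) prototype; nbytes == 0 means "to EOF".
 */
static int
send_response(int fd, int s, struct iovec *hdr, int hdrcnt, off_t *sent)
{
	struct sf_hdtr hdtr = {
		.headers = hdr, .hdr_cnt = hdrcnt,
		.trailers = NULL, .trl_cnt = 0
	};
	int rc;

	if (set_nopush(s, 1) == -1)
		return (-1);
	rc = sendfile(fd, s, 0, 0, &hdtr, sent, 0);
	if (set_nopush(s, 0) == -1)	/* lets a partial last segment go out */
		return (-1);
	return (rc);
}
#endif
```

The proposal in this message would fold the two set_nopush() calls into the flags argument of the single sendfile() call; the sketch only shows what the application has to do without it.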
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 10:49:16 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8B3F037B401 for ; Tue, 27 May 2003 10:49:16 -0700 (PDT) Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63]) by mx1.FreeBSD.org (Postfix) with ESMTP id B0DE043F93 for ; Tue, 27 May 2003 10:49:15 -0700 (PDT) (envelope-from bmah@employees.org) Received: from bmah.dyndns.org (12-240-204-110.client.attbi.com[12.240.204.110]) by attbi.com (sccrmhc03) with ESMTP id <200305271749140030061fume>; Tue, 27 May 2003 17:49:14 +0000 Received: from intruder.bmah.org (localhost [127.0.0.1]) by bmah.dyndns.org (8.12.9/8.12.9) with ESMTP id h4RHnE9c070358; Tue, 27 May 2003 10:49:14 -0700 (PDT) (envelope-from bmah@intruder.bmah.org) Received: (from bmah@localhost) by intruder.bmah.org (8.12.9/8.12.9/Submit) id h4RHnD4t070357; Tue, 27 May 2003 10:49:13 -0700 (PDT) Date: Tue, 27 May 2003 10:49:13 -0700 From: "Bruce A. Mah" To: Scott Long Message-ID: <20030527174913.GA70249@intruder.bmah.org> References: <3ED194D8.4040706@btc.adaptec.com> <20030526.164650.10294010.imp@bsdimp.com> <3ED29D4E.8070606@btc.adaptec.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="fUYQa+Pmc3FrFX/N" Content-Disposition: inline In-Reply-To: <3ED29D4E.8070606@btc.adaptec.com> User-Agent: Mutt/1.4.1i X-Image-Url: http://www.employees.org/~bmah/Images/bmah-cisco-small.gif X-url: http://www.employees.org/~bmah/ cc: arch@freebsd.org Subject: Re: New bootloader! 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 17:49:16 -0000 --fUYQa+Pmc3FrFX/N Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable If memory serves me right, Scott Long wrote: > M. Warner Losh wrote: > >Do you envision this loader script as something that would be part of > >the installation process, or something people would use everyday? > > > >Warner >=20 > I think that it has the potential to be an everyday feature. Disabling > it is quite easy; just remove the patch that I posted in the previous > message (or just remove the line that calls the entry point). I'm > looking for it to be a start to something that enables one to easily > load arbitrary kernels, control device/kernel variables, etc. Cute. :-) I like this. This point probably isn't worthy of an arch@ discussion, but I'm concerned that it is not clear to the first-time user what happens when the count-down timer expires. (In other words, "[Space] to pause" *what*?!?) Maybe steal some of the wording from the old bootloader prompt? Bruce. 
--fUYQa+Pmc3FrFX/N Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (FreeBSD) iD8DBQE+06UZ2MoxcVugUsMRArElAJ9WRnOrKHM9ezN9Hj9bJ89y5chiCQCgwTTQ dtzUTZXq0KEy3of4SfKwRKA= =OWvR -----END PGP SIGNATURE----- --fUYQa+Pmc3FrFX/N-- From owner-freebsd-arch@FreeBSD.ORG Tue May 27 11:01:03 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5C2B437B401 for ; Tue, 27 May 2003 11:01:03 -0700 (PDT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id E248643FA3 for ; Tue, 27 May 2003 11:01:02 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.9/8.12.6) with ESMTP id h4RI12VI066865; Tue, 27 May 2003 11:01:02 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9/8.12.6/Submit) id h4RI12Qr066864; Tue, 27 May 2003 11:01:02 -0700 (PDT) Date: Tue, 27 May 2003 11:01:02 -0700 (PDT) From: Matthew Dillon Message-Id: <200305271801.h4RI12Qr066864@apollo.backplane.com> To: Igor Sysoev References: cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 18:01:03 -0000 :... :> have done this several times in the past, e.g. with my soft :> interrupt coelescing implementation that's now part of most :> of the ethernet drivers people care about. :> :> Actually, in this case, I'd just try to fix sendfile(2) to :> do the packet coelescing I'd expect, given the relative :> state of the TCP_NODELAY and TCP_NOPUSH options flags. 
: :Actually, sendfile() already works according to TCP_NOPUSH flag. :I do not know about TCP_NODELAY - I do not work with it. :But if you turn TCP_NOPUSH on then sendfile() will send the full packets. :If you turn TCP_NOPUSH off then sendfile() will send some packets partially :filled. It's correct. But considering the fairly high syscall overhead of sendfile() versus the 1uS or so it takes to do a setsockopt(), implementing additional flags in the sendfile() API to work around sendfile()'s inefficient implementation of the header sending code SOLELY to avoid the additional syscalls is not a good enough reason to change the API. It would just be adding one hack on top of another with the side effect of the new hack being visible in the API. This is bad. This (minor) problem *should* be solved by fixing the sendfile() implementation itself. It may well be that a reasonable solution would be to have sendfile() itself set TCP_NOPUSH internally to wrap the header sending writev() and the first data packet, then restore the previous state after queueing the first data packet. That would still be a hack, but at least it would be one that is not being made visible in the API. Visible changes in APIs create porting headaches between UNIXes and should be avoided whenever possible. 
-Matt From owner-freebsd-arch@FreeBSD.ORG Tue May 27 11:14:09 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D628837B401 for ; Tue, 27 May 2003 11:14:09 -0700 (PDT) Received: from rhombus.znep.com (sense-sea-MegaSub-1-507.oz.net [216.39.145.253]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0CA1943F3F for ; Tue, 27 May 2003 11:14:09 -0700 (PDT) (envelope-from marcs@znep.com) Received: by rhombus.znep.com (Postfix, from userid 1000) id 20D6A1A291; Tue, 27 May 2003 11:14:08 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by rhombus.znep.com (Postfix) with ESMTP id 190861AAA5 for ; Tue, 27 May 2003 11:14:08 -0700 (PDT) Date: Tue, 27 May 2003 11:14:07 -0700 (PDT) From: Marc Slemko To: arch@freebsd.org In-Reply-To: <3ED3844F.713FB360@mindspring.com> Message-ID: References: <20030526201740.GA22178@cirb503493.alcatel.com.au> <3ED3844F.713FB360@mindspring.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 18:14:10 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > The overhead of toggling it would be costly. However, I really > don't understand why he isn't just not setting TCP_NODELAY in > the first place, since it's an affirmative option, and then > leaving the socket alone to act like it's supposed to act. Given the bug in FreeBSD's sendfile() that results in it sending the headers in a separate segment, then just leaving Nagle enabled will destroy your performance if you are implementing a protocol such as HTTP with persistent (pipelined or not) connections due to the interaction between Nagle and delayed ack. 
However, that is somewhat of a moot point for HTTP anyway because if you support pipelined requests then for the most efficient implementation of HTTP on the network layer you need to ensure that the response headers and body from multiple requests can be coalesced into the same packet. When using sendfile(), this means you have to control TCP_NOPUSH yourself, even if sendfile coalesced headers, data, and trailers. sendfile() still should be fixed to properly coalesce them. From owner-freebsd-arch@FreeBSD.ORG Tue May 27 11:31:37 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A3F4637B49E for ; Tue, 27 May 2003 11:31:36 -0700 (PDT) Received: from rhombus.znep.com (sense-sea-MegaSub-1-507.oz.net [216.39.145.253]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9B27443FB1 for ; Tue, 27 May 2003 11:31:35 -0700 (PDT) (envelope-from marcs@znep.com) Received: by rhombus.znep.com (Postfix, from userid 1000) id 003551A326; Tue, 27 May 2003 11:31:34 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by rhombus.znep.com (Postfix) with ESMTP id EC3DF1AAA5 for ; Tue, 27 May 2003 11:31:34 -0700 (PDT) Date: Tue, 27 May 2003 11:31:34 -0700 (PDT) From: Marc Slemko To: arch@freebsd.org In-Reply-To: <20030527113138.GD44520@cirb503493.alcatel.com.au> Message-ID: References: <3ED33487.59C03DA9@mindspring.com> <20030527113138.GD44520@cirb503493.alcatel.com.au> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 18:31:38 -0000 On Tue, 27 May 2003, Peter Jeremy wrote: > On Tue, May 27, 2003 at 02:49:35PM +0400, Igor Sysoev wrote: > >Actually 6 vs. 6 for this 8K file. 
But I said about another thing. > >Let's see 48K file and 250 bytes header. sendfile() usually sends > >it as 4K or 8K hunks so there are 48/8 * 6 + 1 (header) = 37 packets. > >But (48K + 250) / 1460 = 33 * 1460 + 1270 i.e. 34 packets. > >It's 8% decrease of data packets. Add here the possible retransmissions. > > Why is the number of data packets so important? If you repeat the > calculation considering bytes across the wire (assuming Ethernet), > then the saving is closer to 0.4% - this is in the noise. I think Ethernet is pretty close to the best case here, other link layers have significantly higher overheads. Granted, it still is fairly low but not noise. > For that matter, have you considered the impact of Path MTU discovery? Without PMTU-D, the overhead in number of packets would normally be a smaller percent because your MTU will be lower... what are you suggesting? > I think possible retransmissions are irrelevant here. If your packet > loss is anything above negligible then you have other problems. If > the retransmission is caused by transmission noise, then the smaller > packets are less likely to get hit. And the sender is likely to > retransmit a full packet rather than the small packet originally sent. The reality is that there are many environments where packet loss is a fact of life, and TCP can and does deal with it. I think you will find that the majority of packet loss on the Internet is due to events that are independent of the size of the packet, such as many types of buffer overflow. The number of packets also matters from the perspective of a lot of networking equipment that has a heavy per packet overhead, then minimal per byte overhead after that. Regardless of all this, which gets more and more complex to analyze once you look at all the complex behaviours of what routers actually do these days on the net, it is just a "good thing"(tm) to do to ensure that your application uses the minimal number of packets practical. 
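The packet arithmetic quoted above can be checked with a short sketch. It takes the thread's figures (48K file, 250-byte header, 1460-byte MSS, 8K chunks); the function names and the 78 bytes of per-packet overhead (40 bytes of IP and TCP headers plus 38 bytes of Ethernet framing, preamble and inter-frame gap) are assumptions made for illustration, not values given in the thread.

```c
#include <assert.h>

/* Number of segments needed to carry `bytes` at `mss` bytes per segment. */
static unsigned long
packets_needed(unsigned long bytes, unsigned long mss)
{
	assert(mss > 0);
	return ((bytes + mss - 1) / mss);
}

/*
 * Segments sent when the header goes out on its own and the file is
 * pushed one chunk at a time, so each chunk ends in a partial segment.
 */
static unsigned long
chunked_packets(unsigned long file_bytes, unsigned long chunk,
    unsigned long header, unsigned long mss)
{
	assert(chunk > 0);
	return (packets_needed(header, mss) +
	    (file_bytes / chunk) * packets_needed(chunk, mss) +
	    packets_needed(file_bytes % chunk, mss));
}

/* Bytes on the wire: payload plus an assumed fixed overhead per packet. */
static unsigned long
wire_bytes(unsigned long payload, unsigned long packets,
    unsigned long per_packet_overhead)
{
	return (payload + packets * per_packet_overhead);
}
```

With these inputs the chunked case comes to 37 packets and the coalesced case to 34, a reduction of about 8% in packet count; under the assumed 78-byte overhead the difference in bytes on the wire is 234 out of roughly 52K, about 0.4%, which matches both figures argued over in the message.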
From owner-freebsd-arch@FreeBSD.ORG Tue May 27 11:36:27 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91A3C37B401 for ; Tue, 27 May 2003 11:36:27 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8406E43FA3 for ; Tue, 27 May 2003 11:36:26 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RIaPmF023358; Tue, 27 May 2003 22:36:25 +0400 (MSD) Date: Tue, 27 May 2003 22:36:25 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED38C2B.DEA23AB8@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 18:36:28 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > But this call should not be necessary. Internally, the > sendfile(2) implementation should treat the headers + > file contents + trailers as a single stream. Your problem > is that the implementation of sendfile(2) sucks and is not > doing this, not that you need to set TCP_NOPUSH to avoid > seperation of three back-to-back transmits: you don't *have* > three back-to-back transmits here, you have only *one* > transmit. > > Would you expect a writev(2) operation to break up each of > the chunks described by the vector into seperate back-to-back > transmits? If not, why do you expect sendfile(2) to do it? Yes, I agree that sendfile() should work as writev(). 
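The writev(2) comparison can be made concrete with a small sketch: header, body and trailer are described by one iovec array and handed to the kernel in a single call, so the caller sees one write rather than three. The helper name send_gathered() is hypothetical, and the sketch only shows the userland interface; it says nothing about how a particular TCP stack segments the data.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write header, body and trailer strings with one
 * writev(2) call.  Returns the byte count from writev(), or -1 on error.
 */
static ssize_t
send_gathered(int fd, const char *hdr, const char *body, const char *trl)
{
	struct iovec iov[3];

	/* iov_base is not const-qualified, hence the casts. */
	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = strlen(hdr);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = strlen(body);
	iov[2].iov_base = (void *)trl;
	iov[2].iov_len = strlen(trl);

	return (writev(fd, iov, 3));
}
```

On a pipe or a blocking socket with enough buffer space the call returns the combined length of the three pieces; a short count is possible and a real caller would have to loop on it.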
> > The turning TF_NOPUSH off has almost the same overhead as > > setsockopt(TCP_NOPUSH, 0) if you need to call tcp_output(tp) inside > > sendfile(2) and has no overhead at all if you do not need to call it. > > The problem is that you need to break tcp_output() into a > couple of routines, OR you need to not call it on the > headers, file data, and trailers separately. No, tcp_output() is called only once and only if the data in the send buffer is less than MSS:

sendfile()
{
    if (flags & SF_NOPUSH) {
        tp->t_flags |= TF_NOPUSH;
    }

    writev(header);
    send file pages;
    writev(trailer);

    if (error == 0 && flags & SF_PUSH) {
        tp->t_flags &= ~TF_NOPUSH;
        if (so->so_snd.sb_cc < tp->t_maxseg) {
            error = tcp_output(tp);
        }
    }
}

> I think we can all make up our own stories, where the overhead > could become important enough for a specific application that > we wouldn't complain about you eliminating it so you could do > your application, as long as it doesn't negatively impact the > rest of us (say, by adding non-standard sendfile(2) flags that > no one else supports, if that isn't the only possible way to > solve the problem). sendfile(2) is completely non-standard thing. Among FreeBSD, Linux, Solaris, HP/UX and AIX no one has even similar prototypes. And all of them have different functionality. > I don't think overhead is the issue, at this point: say we agree > with you on overhead, for your particular application, and we are > not against you solving your overhead problem: why exactly does > the API have to change to fix the root cause of the problem? I do not propose the change of the API, I propose the source and binary compatible addition. 
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 11:38:58 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 38DC037B401 for ; Tue, 27 May 2003 11:38:58 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 18FDF43FAF for ; Tue, 27 May 2003 11:38:57 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RIctmF023454; Tue, 27 May 2003 22:38:55 +0400 (MSD) Date: Tue, 27 May 2003 22:38:55 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED39260.27639328@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 18:38:58 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > Why? Why not just fix the broken sendfile(2) implementation, > instead? Well, how ? > So there's no barrier to you fixing this by either breaking > up tcp_output() into two functions, or lazy-calling tcp_output(), > instead of aggreesively calling it between headers and file > data and file data and trailers in sendfile(2). Right? No > API change necessary? Did you look inside sendfile() implementation ? There'are no tcp_output() calls at all. Header and trailer are written by writev() and file pages are written by so->so_proto->pr_usrreqs->pru_send(). 
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 13:05:27 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DBE7A37B401 for ; Tue, 27 May 2003 13:05:27 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 647A543F85 for ; Tue, 27 May 2003 13:05:26 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4RK5DmF025579; Wed, 28 May 2003 00:05:13 +0400 (MSD) Date: Wed, 28 May 2003 00:05:13 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Matthew Dillon In-Reply-To: <200305271801.h4RI12Qr066864@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 20:05:28 -0000 On Tue, 27 May 2003, Matthew Dillon wrote: > But considering the fairly high syscall overhead of sendfile() verses the > 1uS or so it takes to do a setsockopt(), implementing additional > flags in the sendfile() API to work around sendfile()'s inefficient > implementation of the header sending code SOLELY to avoid the additional > syscalls is not a good enough reason to change the API. It would just > be adding one hack on top of another with the side effect of the new > hack being visible in the API. This is bad. > > This (minor) problem *should* be solved by fixing the sendfile() > implementation itself. 
> It may well be that a reasonable solution would be to have sendfile()
> itself set TCP_NOPUSH internally to wrap the header sending writev()
> and the first data packet, then restore the previous state after
> queueing the first data packet. That would still be a hack, but at least
> it would be one that is not being made visible in the API.

TCP_NOPUSH is required not only to coalesce the header with the first
part of the file, or the last part of the file with the trailer. It is
also required to avoid partially filled packets of file data. As I said
already, the file pages can be sent as 2x1460+1176 or 5x1460+892 packets.
Implicitly setting TF_NOPUSH in sendfile() and restoring it before return
would not completely eliminate partially filled file packets in the case
of a non-blocking socket.

> Visible changes in APIs create porting headaches between UNIXes and
> should be avoided whenever possible.

I agree about the portability issues, but sendfile() is already a very
non-portable interface. It has different prototypes and functionality
on every Unix. For example, Linux's sendfile() cannot send headers or
trailers; HP/UX's sendfile() allows sending only one header and one
trailer (i.e. they are not iovecs); Solaris's sendfilev() allows sending
a combination of several files and iovecs, like WinXP's TransmitPackets().
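For what it's worth, the 2x1460+1176 and 5x1460+892 figures above fall
out of simple division, assuming 4096-byte pages and a 1460-byte MSS (a
1500-byte Ethernet MTU less 40 bytes of IP and TCP headers, no options).
A minimal sketch in C; split_into_segments() and struct seg_split are
hypothetical names used only for illustration, not anything in the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration type: how a byte count splits into segments. */
struct seg_split {
    size_t full;   /* number of completely filled MSS-sized segments */
    size_t tail;   /* bytes left over for a partially filled segment */
};

/*
 * Split nbytes of queued data into full segments of mss bytes plus a
 * partial tail.  This models only the arithmetic, not TCP itself.
 */
static struct seg_split
split_into_segments(size_t nbytes, size_t mss)
{
    struct seg_split s;

    assert(mss > 0);            /* avoid division by zero */
    s.full = nbytes / mss;
    s.tail = nbytes % mss;
    return s;
}
```

With these assumptions, one wired page (4096 bytes) gives 2 full segments
and a 1176-byte tail, and two pages (8192 bytes) give 5 full segments and
an 892-byte tail. The tail is the partially filled packet that goes out
whenever the send buffer is pushed before more file data has been queued.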
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue May 27 14:46:42 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5D6A337B401; Tue, 27 May 2003 14:46:42 -0700 (PDT) Received: from mail2.qc.uunet.ca (mail2.qc.uunet.ca [198.168.54.17]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5B0FE43F85; Tue, 27 May 2003 14:46:41 -0700 (PDT) (envelope-from anarcat@espresso-com.com) Received: from xtanbul.studio.espresso-com.com ([216.94.147.57]) by mail2.qc.uunet.ca (8.12.9/8.12.9) with ESMTP id h4RLkaU8032695; Tue, 27 May 2003 17:46:36 -0400 Received: from anarcat by xtanbul.studio.espresso-com.com with local (Exim 3.36 #1 (Debian)) id 19KmHG-0001F7-00; Tue, 27 May 2003 17:46:38 -0400 Date: Tue, 27 May 2003 17:46:38 -0400 From: The Anarcat To: "Bruce A. Mah" Message-ID: <20030527214637.GA461@xtanbul> References: <3ED194D8.4040706@btc.adaptec.com> <20030526.164650.10294010.imp@bsdimp.com> <3ED29D4E.8070606@btc.adaptec.com> <20030527174913.GA70249@intruder.bmah.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030527174913.GA70249@intruder.bmah.org> User-Agent: Mutt/1.5.4i Sender: The Anarcat cc: arch@freebsd.org cc: Scott Long Subject: Re: New bootloader! X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 21:46:42 -0000 On Tue May 27, 2003 at 10:49:13AM -0700, Bruce A. Mah wrote: > Cute. :-) I like this. > > This point probably isn't worthy of an arch@ discussion, but I'm > concerned that it is not clear to the first-time user what happens > when the count-down timer expires. (In other words, "[Space] to > pause" *what*?!?) Maybe steal some of the wording from the old > bootloader prompt? 
Or point to what is the default "choice". Also a quick thought... [space] does stop the counter but freezes the screen. It would be nice if it would change the display to acknowledge the fact that the counter is paused. Maybe changing the number of seconds to "paused"? Other than that, I must join the cries of "I like this"! :) A. From owner-freebsd-arch@FreeBSD.ORG Tue May 27 18:22:00 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 63D3737B401 for ; Tue, 27 May 2003 18:22:00 -0700 (PDT) Received: from hotmail.com (oe14.law9.hotmail.com [64.4.8.118]) by mx1.FreeBSD.org (Postfix) with ESMTP id F148743FB1 for ; Tue, 27 May 2003 18:21:59 -0700 (PDT) (envelope-from ccorayer@hotmail.com) Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Tue, 27 May 2003 18:21:59 -0700 Received: from 66.30.133.209 by oe14.law9.hotmail.com with DAV; Wed, 28 May 2003 01:21:58 +0000 X-Originating-IP: [66.30.133.209] X-Originating-Email: [ccorayer@hotmail.com] From: "chris corayer" To: Date: Tue, 27 May 2003 21:30:10 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Message-ID: X-OriginalArrivalTime: 28 May 2003 01:21:59.0371 (UTC) FILETIME=[8910F1B0:01C324B7] Subject: Re: Subject: New bootloader! X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 01:22:00 -0000 Would it be possible to also have the settings to disable ultra dma in this as well? I have a Shuttle SS51G and from what I was last aware, there were issues with the SIS 651 chipset in it. 
This may have been fixed, but it's likely that there are other unsupported chipsets out there that will cause problems with this. Overall it sounds very good. Keep up the good work! -Chris Corayer From owner-freebsd-arch@FreeBSD.ORG Tue May 27 22:56:12 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7555137B401 for ; Tue, 27 May 2003 22:56:12 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id C921243F75 for ; Tue, 27 May 2003 22:56:11 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38ldvsn.dialup.mindspring.com ([209.86.255.151] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Ktut-0000Gz-00; Tue, 27 May 2003 22:56:03 -0700 Message-ID: <3ED44F2D.DAF1FA08@mindspring.com> Date: Tue, 27 May 2003 22:54:53 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a44ca7e6f85b35fe7a1afad9e0c081d3a8350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 05:56:12 -0000 Igor Sysoev wrote: > How do suppose to coelesce the file pages ? Wire two or more pages > to mbuf's at once ? It's done by the network driver, using the network card's DMA's scatter/gather. > Terry, I do not understand you. 
> My argument is simple - I want to avoid the partial packets because it > decreases the number of packets. That's all. There's nothing about > amortized cost or total cost. I do not even know what they are. The total cost is the total overhead in packets to send a given amount of data. For a small amount of data, the total cost is small, compared to the overhead involved in sending the ethernet, IP, and TCP headers. The amortized cost is how much an extra packet costs you to send, relative to what you have to send anyway. If you have a lot of data to send, sending an extra packet or two is really not very costly, since it's just one more packet out of hundreds. If you argue there's a tiny amount of data, then the total cost is important. If you argue there's a lot of data, then the amortized cost is important. When you talk about extra packets being sent, you can't claim that the amortized cost is important for a small amount of data, or that the total cost is important for a huge amount of data. Your focus on number of packets, rather than your ability to move a total amount of data at or near the theoretical maximum, makes no sense. > > Actually, in this case, I'd just try to fix sendfile(2) to > > do the packet coelescing I'd expect, given the relative > > state of the TCP_NODELAY and TCP_NOPUSH options flags. > > Actually, sendfile() already works according to TCP_NOPUSH flag. > I do not know about TCP_NODELAY - I do not work with it. > But if you turn TCP_NOPUSH on then sendfile() will send the full packets. > If you turn TCP_NOPUSH off then sendfile() will send some packets partially > filled. It's correct. Sending some packets partially filled, instead of just the last packet in a series partially filled, is *wrong*, IMO. 
> > BTW: I'm still wary of the initial fault on the file data, if
> > it's not already in cache: arguably, it's better to start
> > sending the headers, and avoid the startup latency of delaying
> > sending the headers until the fault is satisfied: part of the
> > thing that's going to be eating your PCI bandwidth is the
> > disk I/O, and your disks are going to be the slowest data
> > sources/sinks in the whole equation.
>
> I agree but after all it's 20ms or so delay.

Plus the delay for the NETISR.

> > In any case, I expect that this should be handled in the
> > context of TCP_NODELAY and TCP_NOPUSH, rather than by adding
> > options to work around an arguably broken sendfile(2).
>
> sendfile() already works nice with TCP_NOPUSH. I propose only the flags
> that allow to turn TCP_NOPUSH (actually TF_NOPUSH) on/off inside sendfile().
> Then in one syscall you can turn TCP_NOPUSH on, send the HTTP header, the file
> pages and turn TCP_NOPUSH off if all file pages are wired to mbuf's.
> And this TCP_NOPUSH state is not bound by sendfile() internals, you
> can control it via setsockopt/getsockopt(TCP_NOPUSH).

You're wrong about what TCP_NOPUSH is for; it's only for the last
packet of one system call being concatenated with the first packet
of another, to save empty packets between separate system calls.

When you call sendfile with a file, headers, and trailers, you are
making *only one system call*.

"man 4 tcp" tells us:

     TCP_NOPUSH   By convention, the sender-TCP will set the ``push'' bit
                  and begin transmission immediately (if permitted) at the
                  end of every user call to write(2) or writev(2).  The
                  TCP_NOPUSH option is provided to allow servers to easily
                  make use of Transaction TCP (see ttcp(4)).  When the
                  option is set to a non-zero value, TCP will delay sending
                  any data at all until either the socket is closed, or the
                  internal send buffer is filled.
FWIW, here's what it tells us about TCP_NODELAY:

     TCP_NODELAY  Under most circumstances, TCP sends data when it is
                  presented; when outstanding data has not yet been
                  acknowledged, it gathers small amounts of output to be
                  sent in a single packet once an acknowledgement is
                  received.  For a small number of clients, such as
                  window systems that send a stream of mouse events which
                  receive no replies, this packetization may cause
                  significant delays.  The boolean option TCP_NODELAY
                  defeats this algorithm.

IMO, sendfile(2) should be acting the way you want it to act just by
you *NOT* setting TCP_NODELAY.

If you *do* set TCP_NOPUSH, then it should delay sending the last
partial packet until the timer goes, or until you write(2), writev(2),
sendfile(2), or send/sendto/sendmsg(2) more data.

NOTE: TCP_NOPUSH *specifically* mentions writev(2), which, like
sendfile(2), takes data from multiple discrete buffers and sends
it.

Make sense now?  You think sendfile(2) needs options; I think
sendfile(2) is broken.
-- Terry From owner-freebsd-arch@FreeBSD.ORG Tue May 27 23:05:02 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5E65E37B401 for ; Tue, 27 May 2003 23:05:02 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id CBA7843F3F for ; Tue, 27 May 2003 23:05:01 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38ldvsn.dialup.mindspring.com ([209.86.255.151] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Ku3V-0001RW-00; Tue, 27 May 2003 23:04:58 -0700 Message-ID: <3ED45145.5389980@mindspring.com> Date: Tue, 27 May 2003 23:03:49 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a44ca7e6f85b35fe7a8b164c831da3d180350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 06:05:02 -0000 Igor Sysoev wrote: > > I don't think overhead is the issue, at this point: say we agree > > with you on overhead, for your particular application, and we are > > not against you solving your overhead problem: why exactly does > > the API have to change to fix the root cause of the problem? > > I do not propose the change of the API, I propose the source and binary > compatible addition. The "Subject:" line says you want to add a flag. 
This is binary backward compatible, but it is not binary portable, and it is not source portable to systems that use your flag. What happens when you want to recompile or run your code that uses the new flag on NetBSD, Darwin, MacOS X, etc.? I'll tell you what happens: you get a compilation error with an undefined variable. -- Terry From owner-freebsd-arch@FreeBSD.ORG Wed May 28 01:08:32 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 07A6A37B404 for ; Wed, 28 May 2003 01:08:31 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 295BC43FA3 for ; Wed, 28 May 2003 01:08:30 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4S88SmF037161; Wed, 28 May 2003 12:08:28 +0400 (MSD) Date: Wed, 28 May 2003 12:08:27 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED45145.5389980@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 08:08:32 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > > I don't think overhead is the issue, at this point: say we agree > > > with you on overhead, for your particular application, and we are > > > not against you solving your overhead problem: why exactly does > > > the API have to change to fix the root cause of the problem? > > > > I do not propose the change of the API, I propose the source and binary > > compatible addition. 
> > The "Subject:" line says you want to add a flag. This is > binary backward compatible, but it is not binary portable, > and it is not source portable to systems that use your flag. > > What happens when you want to recompile or run your code > that uses the new flag on NetBSD, Darwin, MacOS X, etc.? > > I'll tell you what happens: you get a compilation error > with an undefined variable. Really ? I think that on NetBSD, Darwin, and MacOS X I would get: ----- warning: implicit declaration of function `sendfile' ----- and then: ----- /tmp/ccQ50515.o(.text+0x7): undefined reference to `sendfile' ----- On Solaris and Linux I will get: ----- implicit declaration of function `sendfile' ----- if I do not #include (it's not exist on FreeBSD) or ----- too many arguments to function `sendfile' ----- sendfile() is very and very unportable interface. If developer want to use the lowest common things then he should use #defines that emulate Linux's interface - it sends file only. And #define can hide any flags. 
If developer want to use maximum features that available on the platform then he should use a huge wrapper something like Apache 2.0's one: http://cvs.apache.org/viewcvs.cgi/apr/network_io/unix/sendrecv.c Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 02:11:36 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D9DC537B401 for ; Wed, 28 May 2003 02:11:36 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3158243F3F for ; Wed, 28 May 2003 02:11:36 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfkad.dialup.mindspring.com ([165.247.209.77] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Kwy0-0003jj-00; Wed, 28 May 2003 02:11:28 -0700 Message-ID: <3ED47CAA.30B03B8E@mindspring.com> Date: Wed, 28 May 2003 02:08:58 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4581dbb2020a7ccf73f2b9375a3d93cb3387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 09:11:37 -0000 Igor Sysoev wrote: > Really ? 
I think that on NetBSD, Darwin, and MacOS X I would get: > ----- > warning: implicit declaration of function `sendfile' I think on NetBSD and OpenBSD, a single search-engine query would show you three experimental implementations, all of which have the FreeBSD syntax. The Darwin/MacOS X is a no-brainer: someone will get around to it eventually; the big barrier is external mbufs, and those are really trivial to implement (IMO; I've done it on three separate occasions in different code bases, now). > On Solaris and Linux I will get: > ----- > too many arguments to function `sendfile' Yes, the argument lists aren't the same. AIX and MVS both have identical interfaces, though. > sendfile() is very and very unportable interface. I have no doubt that sendfile(2) will eventually be standardized by some well-intentioned standards body, and that the standard will not include implementation-bug-based flags definitions. > And #define can hide any flags. Code written that way is not portable, it has been ported. There is a big difference. Why are you so dead-set on adding crufty flags, when three people who have been in that code before (I back-ported the external mbuf code and sendfile to FreeBSD 4.2 and 4.3 at one point; Matt has lived in that code; Peter had his nose in for quite a while; etc.) say that it's broken, and the correct thing to do is to fix it, not add a bunch of kludge code to work around the bugs that shouldn't be there in the first place? 
-- Terry From owner-freebsd-arch@FreeBSD.ORG Wed May 28 02:46:37 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 31D9537B401 for ; Wed, 28 May 2003 02:46:37 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id B750843FA3 for ; Wed, 28 May 2003 02:46:35 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4S9kWmF038952; Wed, 28 May 2003 13:46:32 +0400 (MSD) Date: Wed, 28 May 2003 13:46:32 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED44F2D.DAF1FA08@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 09:46:37 -0000 On Tue, 27 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > How do suppose to coelesce the file pages ? Wire two or more pages > > to mbuf's at once ? > > It's done by the network driver, using the network card's DMA's > scatter/gather. Network driver already gathers file pages in current sendfile() implementation that wire one file page at once. And all the same sendfile() sometimes sends the partially filled packets. If you wire two or more pages at once then sendfile() would send the partially filled packets more rare but it will sends them all the same. > NOTE: TCP_NOPUSH *specifically* mentions writev(2), which, like > sendfile(2), takes data from multiple discrete buffers and sends > it. 
I agree with you, but writev() takes data from the memory while sendfile() can read it from a disk - it's one of the cause of the partially filled packets in the middle of the file stream. TF_NOPUSH (internal TCP_NOPUSH representation) can be used to avoid it. Suppose you have one page in VM and you need to read the next pages from a disk. What would you do ? If you send this single page - it will go as 1460, 1460 and 1176. Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 03:03:20 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7305337B401 for ; Wed, 28 May 2003 03:03:20 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 200E743F93 for ; Wed, 28 May 2003 03:03:17 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4SA3FmF039232; Wed, 28 May 2003 14:03:15 +0400 (MSD) Date: Wed, 28 May 2003 14:03:15 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED47CAA.30B03B8E@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 10:03:20 -0000 On Wed, 28 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > Really ? 
I think that on NetBSD, Darwin, and MacOS X I would get:
> > -----
> > warning: implicit declaration of function `sendfile'
>
> I think on NetBSD and OpenBSD, a single search-engine query
> would show you three experimental implementations, all of
> which have the FreeBSD syntax.

I did not find any.

> The Darwin/MacOS X is a no-brainer: someone will get around
> to it eventually; the big barrier is external mbufs, and
> those are really trivial to implement (IMO; I've done it on
> three separate occasions in different code bases, now).

If someone eventually implements a FreeBSD-compatible sendfile() on
NetBSD, OpenBSD or Darwin/MacOS X, he can simply ignore any unsupported
flags. Likewise, FreeBSD's rfork() implementation ignores some plan9 flags.

> > On Solaris and Linux I will get:
> > -----
> > too many arguments to function `sendfile'
>
> Yes, the argument lists aren't the same. AIX and MVS both
> have identical interfaces, though.

But different from FreeBSD's, right? It would be very strange if IBM
made different send_file() interfaces for its own systems.

> > sendfile() is very and very unportable interface.
>
> I have no doubt that sendfile(2) will eventually be standardized
> by some well-intentioned standards body, and that the standard
> will not include implementation-bug-based flags definitions.

So what? A developer would write yet another #define or wrapper for
the POSIX sendfile().

> > And #define can hide any flags.
>
> Code written that way is not portable, it has been ported.
> There is a big difference.

Well, that is the same thing I told you. If a developer wants to use
sendfile(), he should be ready to port his source. He can make an easy
port with a #define or a harder port with a wrapper.
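A minimal sketch of the "easy port with #define" option, assuming the
SF_NOPUSH and SF_PUSH names proposed in this thread, which a platform's
headers may well not define; sendfile_flags() is a hypothetical helper
used only for illustration:

```c
#include <assert.h>

/*
 * SF_NOPUSH and SF_PUSH are the flags proposed in this thread, not
 * flags known to exist in any system header.  Where the headers do
 * not provide them, define them to 0 so that OR-ing them into the
 * flags argument is a harmless no-op.
 */
#ifndef SF_NOPUSH
#define SF_NOPUSH 0
#endif
#ifndef SF_PUSH
#define SF_PUSH 0
#endif

/*
 * Hypothetical helper: build the flags word for one sendfile() call.
 * first_chunk: this call starts the response (headers go out here).
 * last_chunk:  this call ends the response (trailers go out here).
 */
static int
sendfile_flags(int first_chunk, int last_chunk)
{
    int flags = 0;

    if (first_chunk)
        flags |= SF_NOPUSH;   /* hold back partially filled packets */
    if (last_chunk)
        flags |= SF_PUSH;     /* flush whatever is left at the end */
    return flags;
}
```

On a system without the flags this compiles and always yields 0, so the
calling code is unchanged and simply loses the optimization.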
> Why are you so dead-set on adding crufty flags, when three > people who have been in that code before (I back-ported the > external mbuf code and sendfile to FreeBSD 4.2 and 4.3 at > one point; Matt has lived in that code; Peter had his nose > in for quite a while; etc.) say that it's broken, and the > correct thing to do is to fix it, not add a bunch of kludge > code to work around the bugs that shouldn't be there in the > first place? Well, how do your code handle the partially filled file packets ? Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 08:59:31 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DE5DA37B40A for ; Wed, 28 May 2003 08:59:31 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3CBE843F93 for ; Wed, 28 May 2003 08:59:31 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38ldt4s.dialup.mindspring.com ([209.86.244.156] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19L3Kp-0007XY-00; Wed, 28 May 2003 08:59:28 -0700 Message-ID: <3ED4DC93.42A44D09@mindspring.com> Date: Wed, 28 May 2003 08:58:11 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4ebd4ec01ae3a2897914024d8cf60a40e3ca473d225a0f487350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 28 May 2003 15:59:32 -0000 Igor Sysoev wrote: > On Tue, 27 May 2003, Terry Lambert wrote: > > NOTE: TCP_NOPUSH *specifically* mentions writev(2), which, like > > sendfile(2), takes data from multiple discrete buffers and sends > > it. > > I agree with you, but writev() takes data from the memory while > sendfile() can read it from a disk - it's one of the cause of the partially > filled packets in the middle of the file stream. TF_NOPUSH (internal > TCP_NOPUSH representation) can be used to avoid it. The writev() takes it from memory... and sendfile() takes it from memory. The only difference is whether the memory that is referred to by the mbuf headers is from the program's address space, and copied into an mbuf in the kernel's address space, or is an external mbuf referred to by an sf_buf, and in the kernel's address space because it's in the buffer cache. > Suppose you have one page in VM and you need to read the next pages > from a disk. What would you do ? If you send this single page - it > will go as 1460, 1460 and 1176. Only if I set stupidly set TCP_NODELAY on the socket, which I have to go out of my way to do. If I can't read the next block off the disk, wire it, and set up an EXT_SFBUF for it in 2MSL, there's something seriously wrong in the OS. 2MSL is a *very* long time on modern systems. The "problem" is the call to: error = (*so->so_proto->pr_usrreqs->pru_send)(so, 0, m, 0, 0, td); in sendfile(2) in uipc_syscalls.c, in the case where it's not true that: (sbspace(&so->so_snd) >= so->so_snd.sb_lowat) ...or, more specifically, that it's effectively sent TCP_NODELAY. You'll notice that the page is only unwired when the external mbuf is freed. 
-- Terry From owner-freebsd-arch@FreeBSD.ORG Wed May 28 09:13:49 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B100E37B401 for ; Wed, 28 May 2003 09:13:49 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 165EE43F75 for ; Wed, 28 May 2003 09:13:49 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38ldt4s.dialup.mindspring.com ([209.86.244.156] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19L3Yc-0003Hl-00; Wed, 28 May 2003 09:13:43 -0700 Message-ID: <3ED4DFF2.51E40894@mindspring.com> Date: Wed, 28 May 2003 09:12:34 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a41cf1fa1a29f39f5147b44993b752c63a548b785378294e88350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 16:13:50 -0000 Igor Sysoev wrote: > On Wed, 28 May 2003, Terry Lambert wrote: > > Igor Sysoev wrote: > > > Really ? I think that on NetBSD, Darwin, and MacOS X I would get: > > > ----- > > > warning: implicit declaration of function `sendfile' > > > > I think on NetBSD and OpenBSD, a single search-engine query > > would show you three experimental implementations, all of > > which have the FreeBSD syntax. > > I did not found any. 
Are you using an academically good search engine, or are you using a commercial one, like Google and Yahoo?

There are two projects on NetBSD, one is rather stale, but it's on the NetBSD "Projects Page"; the other was done by a Japanese researcher into zero copy, who has suggested that his work be used to implement a splice(2) call as well.

The OpenBSD work was done in the context of their zero-copy "zbuf" implementation, as a proof of concept.

I suggest the search terms "sendfile" and "openbsd", or the terms "sendfile", "netbsd", "splice". Even Google finds some mailing list chatter about those (the interesting one is by Jason Thorpe from Wasabi Systems, and is in Japanese).

> > The Darwin/MacOS X is a no-brainer: someone will get around
> > to it eventually; the big barrier is external mbufs, and
> > those are really trivial to implement (IMO; I've done it on
> > three separate occasions in different code bases, now).
>
> If someone will eventually implement on NetBSD, OpenBSD or Darwin/MacOS X
> the FreeBSD compatible sendfile() then he can simply ignore any unsupported
> flags.

And have any code that uses your new flags not compile for lack of a definition of those flags in a header file.

> As well as FreeBSD's rfork() implementation ignores some plan9 flags.

FreeBSD's rfork() predates plan9. Sequent's rfork() predates FreeBSD's.

> > Yes, the argument lists aren't the same. AIX and MVS both
> > have identical interfaces, though.
>
> But different with FreeBSD, right ?

FreeBSD is gratuitously different. You are planning on making it *more* gratuitously different. That's probably a bad thing.

> > > sendfile() is very and very unportable interface.
> >
> > I have no doubt that sendfile(2) will eventually be standardized
> > by some well-intentioned standards body, and that the standard
> > will not include implementation-bug-based flags definitions.
>
> So what ? Developer would wrote yet more #define or wrapper for
> POSIX sendfile().

Or... here's a thought...
people would change the FreeBSD implementation to comply with international standards, and programmers would write only to the POSIX interface.

> > Why are you so dead-set on adding crufty flags, when three
> > people who have been in that code before (I back-ported the
> > external mbuf code and sendfile to FreeBSD 4.2 and 4.3 at
> > one point; Matt has lived in that code; Peter had his nose
> > in for quite a while; etc.) say that it's broken, and the
> > correct thing to do is to fix it, not add a bunch of kludge
> > code to work around the bugs that shouldn't be there in the
> > first place?
>
> Well, how do your code handle the partially filled file packets ?

Here we get to the nub of things, then. You are being obstinate not because you really want the flags, but because you want someone else to do the actual work of fixing sendfile for you.

There's probably no reason to continue this conversation, unless you are going to insist on it. Five people have told you the correct approach is to fix sendfile(2), and you have ignored all of us. Three people, not all the same people as before, have told you that if you could measure a statistically significant performance improvement from your proposed changes, we wouldn't object to your code going in, even though it breaks source compatibility further.

Other than writing the correct code for you, there's little else we can do, and I, at least, have other code I need to write, to solve my own problems.
-- Terry From owner-freebsd-arch@FreeBSD.ORG Wed May 28 10:55:53 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9A2D737B401 for ; Wed, 28 May 2003 10:55:53 -0700 (PDT) Received: from linux.research.att.com (H-135-207-24-16.research.att.com [135.207.24.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 907E943FAF for ; Wed, 28 May 2003 10:55:44 -0700 (PDT) (envelope-from fenner@research.att.com) Received: from unixmail.research.att.com (unixmail.research.att.com [135.207.26.71])h4SHvLEI031720; Wed, 28 May 2003 13:57:21 -0400 Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46])h4SHsFXs021875; Wed, 28 May 2003 13:54:15 -0400 (EDT) From: Bill Fenner Received: (from fenner@localhost) by windsor.research.att.com (8.11.6+Sun/8.8.5) id h4SHtbu05504; Wed, 28 May 2003 10:55:37 -0700 (PDT) Message-Id: <200305281755.h4SHtbu05504@windsor.research.att.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII To: is@rambler-co.ru Date: Wed, 28 May 2003 10:55:37 -0700 Versions: dmail (solaris) 2.5a/makemail 2.9d cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 17:55:53 -0000 Why not set PRUS_MORETOCOME on all but the final pru_send() call? 
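
The suggestion above (flag every send except the last) can be modelled outside the kernel. This is a minimal user-space sketch, not the do_sendfile() code: SK_MORETOCOME, struct send_log, log_send() and send_in_chunks() are hypothetical names standing in for PRUS_MORETOCOME and the pru_send() call, and the chunk size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PRUS_MORETOCOME flag. */
#define SK_MORETOCOME 0x1

/* Records what the sender was asked to do; for illustration only. */
struct send_log {
	int calls;	/* number of send calls made */
	int flagged;	/* calls made with SK_MORETOCOME set */
	int last_flags;	/* flags passed on the most recent call */
};

/* Stand-in for pru_send(): only records the flags it was given. */
static void
log_send(struct send_log *log, size_t len, int flags)
{
	(void)len;
	log->calls++;
	if (flags & SK_MORETOCOME)
		log->flagged++;
	log->last_flags = flags;
}

/*
 * Send `total` bytes in pieces of at most `chunk` bytes, marking every
 * piece except the final one as "more to come".
 */
static void
send_in_chunks(struct send_log *log, size_t total, size_t chunk)
{
	assert(chunk > 0);
	while (total > 0) {
		size_t len = total < chunk ? total : chunk;
		int flags = (total > len) ? SK_MORETOCOME : 0;

		log_send(log, len, flags);
		total -= len;
	}
}
```

The point of the design is that only the sender knows which piece is the last one, so the hint travels with each call rather than being set once on the socket.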
Bill From owner-freebsd-arch@FreeBSD.ORG Wed May 28 11:20:31 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 074E737B401 for ; Wed, 28 May 2003 11:20:31 -0700 (PDT) Received: from gateway.posi.net (adsl-63-201-91-11.dsl.snfc21.pacbell.net [63.201.91.11]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1C6E543FA3 for ; Wed, 28 May 2003 11:20:28 -0700 (PDT) (envelope-from kbyanc@posi.net) Received: from localhost (localhost [127.0.0.1]) by gateway.posi.net (8.12.6/8.12.8) with ESMTP id h4SIKLYl035314; Wed, 28 May 2003 11:20:22 -0700 (PDT) (envelope-from kbyanc@posi.net) Date: Wed, 28 May 2003 11:20:21 -0700 (PDT) From: Kelly Yancey To: Terry Lambert In-Reply-To: <3ED4DFF2.51E40894@mindspring.com> Message-ID: <20030528110651.A35278-100000@gateway.posi.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 18:20:31 -0000 On Wed, 28 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > On Wed, 28 May 2003, Terry Lambert wrote: > > > Igor Sysoev wrote: > > > > Really ? I think that on NetBSD, Darwin, and MacOS X I would get: > > > > ----- > > > > warning: implicit declaration of function `sendfile' > > > > > > I think on NetBSD and OpenBSD, a single search-engine query > > > would show you three experimental implementations, all of > > > which have the FreeBSD syntax. > > > > I did not found any. > > Are you using an academically good search engine, or are you > using a commercial one, like Google and Yahoo? > Are you going to tell us where we can find this academically good search engine, or are we to play charades? 
> I suggest the search terms "sendfile" and "openbsd", or the > terms "sendfile", "netbsd", "splice". Even Google finds some > mailing list chatter about those (the interesting one is by > Jason Thorpe from Wasabi Systems, and is in Japanese). > Actually that isn't Japanese. Chinese (traditional) I would guess, but certainly not Japanese. Anyway, if you want a less obscure reference, perhaps the tech-kern mention of splice() with a reference to the USENIX abstract would have been more useful: http://mail-index.netbsd.org/tech-kern/1999/05/07/0008.html Kelly -- Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} -- kelly@nttmcl.com Join distributed.net Team FreeBSD: http://www.posi.net/freebsd/Team-FreeBSD/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 11:21:06 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C176B37B401 for ; Wed, 28 May 2003 11:21:06 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3134443FA3 for ; Wed, 28 May 2003 11:21:03 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4SIL0mF049551; Wed, 28 May 2003 22:21:00 +0400 (MSD) Date: Wed, 28 May 2003 22:21:00 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED4DFF2.51E40894@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 18:21:07 -0000 On Wed, 28 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > On Wed, 28 May 2003, Terry 
Lambert wrote:
> > > Why are you so dead-set on adding crufty flags, when three
> > > people who have been in that code before (I back-ported the
> > > external mbuf code and sendfile to FreeBSD 4.2 and 4.3 at
> > > one point; Matt has lived in that code; Peter had his nose
> > > in for quite a while; etc.) say that it's broken, and the
> > > correct thing to do is to fix it, not add a bunch of kludge
> > > code to work around the bugs that shouldn't be there in the
> > > first place?
> >
> > Well, how do your code handle the partially filled file packets ?
>
> Here we get to the nub of things, then. You are being
> obstinate not because you really want the flags, but
> because you want someone else to do the actual work of
> fixing sendfile for you.

No, I do want these flags because they resolve the problem of partially filled packets. I believe that this problem can be solved without fixing the sendfile() implementation.

The real argument was Peter Jeremy's: that the overhead of flags versus the overhead of setsockopt() is negligible after all. The portability argument is bogus because sendfile() portability is nonsense.

The drawback that really annoyed me is that sendfile() blocks on reading from a disk while sending to a non-blocking socket. Although I see three workarounds, it's much better to fix this inside sendfile().

> There's probably no reason to continue this conversation,
> unless you are going to insist on it. Five people have
> told you the correct approach is to fix sendfile(2), and
> you have ignored all of us. Three people, not all the same
> people as before, have told you that if you could measure a
> statistically significant performance improvement from your
> proposed changes, we wouldn't object to your code going in,
> even though it breaks source compatability further.

Five people ? I probably missed something - there were only four people besides me who discussed this subject.
> Other than writing the correct code for you, there's little > else we can do, and I, at least, have other code I need to > write, to solve my own problems. Well, Terry, write your code and I will write my own. Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 11:35:59 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7C4F837B401 for ; Wed, 28 May 2003 11:35:59 -0700 (PDT) Received: from rhombus.znep.com (sense-sea-MegaSub-1-507.oz.net [216.39.145.253]) by mx1.FreeBSD.org (Postfix) with ESMTP id D2F9F43F75 for ; Wed, 28 May 2003 11:35:58 -0700 (PDT) (envelope-from marcs@znep.com) Received: by rhombus.znep.com (Postfix, from userid 1000) id DD25E1A291; Wed, 28 May 2003 11:35:57 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by rhombus.znep.com (Postfix) with ESMTP id C2F7A1AAA5; Wed, 28 May 2003 11:35:57 -0700 (PDT) Date: Wed, 28 May 2003 11:35:57 -0700 (PDT) From: Marc Slemko To: Igor Sysoev In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 18:35:59 -0000 On Wed, 28 May 2003, Igor Sysoev wrote: > No, I do want these flags because they resolve the problem of partially > filled packets. I believe that this problem can be solved without a fixing > the sendfile() implementation. As people have said a few times now, making an API change to work around a bug in the implementation of sendfile() simply doesn't make any sense, especially when there are other workarounds you can use until it is fixed that impose a very low overhead. 
No one is saying it can't be solved without fixing sendfile(), we are just saying it _shouldn't_ be because any API changes will be around for a very long time. From owner-freebsd-arch@FreeBSD.ORG Wed May 28 12:23:17 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6138537B401 for ; Wed, 28 May 2003 12:23:17 -0700 (PDT) Received: from smtp1.server.rpi.edu (smtp1.server.rpi.edu [128.113.2.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9A65943F3F for ; Wed, 28 May 2003 12:23:16 -0700 (PDT) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp1.server.rpi.edu (8.12.9/8.12.9) with ESMTP id h4SJNFCS002189; Wed, 28 May 2003 15:23:15 -0400 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: References: Date: Wed, 28 May 2003 15:23:14 -0400 To: Marc Slemko , Igor Sysoev From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: MIMEDefang 2.28 cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 19:23:17 -0000 At 11:35 AM -0700 5/28/03, Marc Slemko wrote: >On Wed, 28 May 2003, Igor Sysoev wrote: > > > No, I do want these flags because they resolve the problem of > > partially filled packets. I believe that this problem can be > > solved without a fixing the sendfile() implementation. > >As people have said a few times now, making an API change to work >around a bug in the implementation of sendfile() simply doesn't >make any sense, especially when there are other workarounds you >can use until it is fixed that impose a very low overhead. 
No >one is saying it can't be solved without fixing sendfile(), we >are just saying it _shouldn't_ be because any API changes will >be around for a very long time. For what it's worth, the debate so far has not convinced me that there would be enough benefit from this API change to bother with it. If you (Igor) wanted to write up the code and some good benchmarks to prove a significant performance boost, that would probably help many of us who are just watching the debate go by. So far I've just seen that you really really want it, while some pretty reasonable arguments have been made against making an API change for this. At the moment, I would side with the people saying "do not make the API change", particularly if it's just to hide a bug in sendfile(). -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-arch@FreeBSD.ORG Wed May 28 12:56:27 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9073A37B401 for ; Wed, 28 May 2003 12:56:27 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2047643F3F for ; Wed, 28 May 2003 12:56:26 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4SJuOmF050926; Wed, 28 May 2003 23:56:24 +0400 (MSD) Date: Wed, 28 May 2003 23:56:24 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Bill Fenner In-Reply-To: <200305281755.h4SHtbu05504@windsor.research.att.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 19:56:27 -0000

On Wed, 28 May 2003, Bill Fenner wrote:
> Why not set PRUS_MORETOCOME on all but the final pru_send() call?

I think it's a good solution. We can

1) use TF_NOPUSH to coalesce the header and the first file part
   or the last file part and the trailer as Matthew Dillon suggested;
2) use the same TF_NOPUSH to postpone sending a partial packet while
   reading a file page from the disk;
3) and use PRUS_MORETOCOME to avoid a partial packet after EAGAIN
   on a non-blocking socket.

----------
sendfile()
{
    saved = tp->t_flags & TF_NOPUSH;

    if (header) {
        writev(header);
    }

    if (file) {
        send file pages with PRUS_MORETOCOME
    }

    if (trailer) {
        writev(trailer);
    }

done:

    if (sendfile completed) {
        tp->t_flags &= ~TF_MORETOCOME;
    }

    tp->t_flags |= saved;

    if (saved & TF_NOPUSH) {
        tcp_output(tp);
    }
}
----------

As I understand it, TF_MORETOCOME should postpone sending the partially filled packet after sendfile() returns EAGAIN.
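
The effect all three points rely on is that, while a "hold" flag is in force, only whole segments leave the send buffer and the short tail waits for more data. The following is a deliberately simplified model of that decision, not the actual tcp_output() logic: sendable_now() and its hold_partial parameter are hypothetical, and timers, window limits and Nagle are ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model: how many of `queued` bytes go out now.
 * With `hold_partial` set (the effect this thread attributes to
 * TF_NOPUSH / TF_MORETOCOME), only whole segments are sent and the
 * remainder stays queued until more data arrives or the hold is lifted.
 */
static size_t
sendable_now(size_t queued, size_t mss, int hold_partial)
{
	assert(mss > 0);
	if (hold_partial)
		return ((queued / mss) * mss);
	return (queued);
}
```

With one 4096-byte page queued and a 1460-byte segment size, holding sends two full segments and keeps the remaining 1176 bytes back; without the hold, all 4096 bytes go out and the last segment is short.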
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 13:02:34 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DA77137B401 for ; Wed, 28 May 2003 13:02:34 -0700 (PDT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 628F943FBF for ; Wed, 28 May 2003 13:02:32 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.9/8.12.6) with ESMTP id h4SK2WVI073965; Wed, 28 May 2003 13:02:32 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9/8.12.6/Submit) id h4SK2WUr073964; Wed, 28 May 2003 13:02:32 -0700 (PDT) Date: Wed, 28 May 2003 13:02:32 -0700 (PDT) From: Matthew Dillon Message-Id: <200305282002.h4SK2WUr073964@apollo.backplane.com> To: Bill Fenner References: <200305281755.h4SHtbu05504@windsor.research.att.com> cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 20:02:35 -0000 : :Why not set PRUS_MORETOCOME on all but the final pru_send() call? : : Bill An excellent idea, Bill, it would work. Some additional modifications would have to be done to the fo_write() and writev() interfaces but it looks quite reasonable (and non-hackish) to me. A new FOF_ flag would have to be added to allow the caller of fo_write() to specify that there is more data to come, e.g. FOF_MORETOCOME, which would be translated to PRUS_MORETOCOME in sosend(). 
writev() would have to be split into a writev() syscall and a do_writev() implementation instead of the two being combined like they are now. Then do_sendfile() could call the do_writev() implementation in order to pass additional flags (such as FOF_MORETOCOME) to it, rather than call the writev() syscall.

Additionally, the writev() implementation could set FOF_MORETOCOME for all but the last iovec under normal conditions (and use the passed flag for the last iovec). This would actually improve any C code that uses writev() on sockets regardless of whether sendfile() is fixed or not.

I'm afraid I do not have time to actually implement this right now, but I think it's simple enough that virtually any kernel programmer could do it in a day or less. I think these changes would be an excellent and non-hackish addition to FreeBSD.

-Matt
Matthew Dillon

From owner-freebsd-arch@FreeBSD.ORG Wed May 28 13:10:39 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 71A8237B401 for ; Wed, 28 May 2003 13:10:39 -0700 (PDT) Received: from mailman.research.att.com (H-135-207-24-32.research.att.com [135.207.24.32]) by mx1.FreeBSD.org (Postfix) with ESMTP id 82D4E43F3F for ; Wed, 28 May 2003 13:10:38 -0700 (PDT) (envelope-from fenner@research.att.com) Received: from unixmail.research.att.com (unixmail.research.att.com [135.207.26.71])h4SK6D3j011652; Wed, 28 May 2003 16:06:13 -0400 Received: from windsor.research.att.com (windsor.research.att.com [135.207.26.46])h4SK9CXs027865; Wed, 28 May 2003 16:09:12 -0400 (EDT) From: Bill Fenner Received: (from fenner@localhost) by windsor.research.att.com (8.11.6+Sun/8.8.5) id h4SKAZh07220; Wed, 28 May 2003 13:10:35 -0700 (PDT) Message-Id: <200305282010.h4SKAZh07220@windsor.research.att.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII To: is@rambler-co.ru Date: Wed, 28 May 2003 13:10:34 -0700 Versions: dmail (solaris)
2.5a/makemail 2.9d cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 May 2003 20:10:39 -0000 >1) use TF_NOPUSH to coalesce the header and the first file part > or the last file part and the trailer as Matthew Dillon suggested; Is this really necessary? >2) use the same TF_NOPUSH to postpone the sending partial packet while > a reading file page from the disk; PRUS_MORETOCOME does this too, and is quite a bit less hacky. >3) and use PRUS_MORETOCOME to avoid partial packet after EAGAIN > on non-blocking socket. Yup. Bill From owner-freebsd-arch@FreeBSD.ORG Wed May 28 22:11:09 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3437137B401 for ; Wed, 28 May 2003 22:11:09 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9F7E943FA3 for ; Wed, 28 May 2003 22:11:08 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0vo.dialup.mindspring.com ([209.86.3.248] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19LFgv-0001ch-00; Wed, 28 May 2003 22:11:05 -0700 Message-ID: <3ED5961E.5DE0F41B@mindspring.com> Date: Wed, 28 May 2003 22:09:50 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bill Fenner References: <200305281755.h4SHtbu05504@windsor.research.att.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a43029615e63b1f0affca9d6ea4bb7f791350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: 
arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 05:11:09 -0000 Bill Fenner wrote: > Why not set PRUS_MORETOCOME on all but the final pru_send() call? If the file is larger than `sysctl net.inet.tcp.sendspace`, then this code in do_sendfile(): if (sbspace(&so->so_snd) < so->so_snd.sb_lowat) { if (so->so_state & SS_NBIO) { m_freem(m); sbunlock(&so->so_snd); splx(s); error = EAGAIN; goto done; } error = sbwait(&so->so_snd); will result in you sleeping with PRUS_MORETOCOME set, but with no more being sent because the send buffer doesn't get emptied, as it's waiting for more data to send. -- Terry From owner-freebsd-arch@FreeBSD.ORG Wed May 28 22:12:40 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 637C837B401 for ; Wed, 28 May 2003 22:12:40 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id E097043F85 for ; Wed, 28 May 2003 22:12:38 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4T5CamF059940; Thu, 29 May 2003 09:12:36 +0400 (MSD) Date: Thu, 29 May 2003 09:12:36 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Bill Fenner In-Reply-To: <200305282010.h4SKAZh07220@windsor.research.att.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 05:12:40 -0000

On Wed, 28 May 2003, Bill Fenner wrote:
> >1) use TF_NOPUSH to coalesce the header and the first file part
> > or the last file part and the trailer as Matthew Dillon suggested;
> Is this really necessary?

What exactly ? The use of TF_NOPUSH, or the coalescing ? If the former, then it's a simple, hackish solution that does not require modifying writev().

> >2) use the same TF_NOPUSH to postpone the sending partial packet while
> > a reading file page from the disk;
> PRUS_MORETOCOME does this too, and is quite a bit less hacky.

Yes, of course both flags do this.

Igor Sysoev
http://sysoev.ru/en/

From owner-freebsd-arch@FreeBSD.ORG Wed May 28 22:22:57 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 043E937B401 for ; Wed, 28 May 2003 22:22:57 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 09AB343F3F for ; Wed, 28 May 2003 22:22:56 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4T5MsmF060090; Thu, 29 May 2003 09:22:54 +0400 (MSD) Date: Thu, 29 May 2003 09:22:54 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED5961E.5DE0F41B@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 05:22:57 -0000

On Wed, 28 May 2003, Terry Lambert wrote:
> Bill Fenner wrote:
> > Why not set PRUS_MORETOCOME on all but the final pru_send() call?
> > If the file is larger than `sysctl net.inet.tcp.sendspace`, then > this code in do_sendfile(): > > if (sbspace(&so->so_snd) < so->so_snd.sb_lowat) { > if (so->so_state & SS_NBIO) { > m_freem(m); > sbunlock(&so->so_snd); > splx(s); > error = EAGAIN; > goto done; > } > error = sbwait(&so->so_snd); > > will result in you sleeping with PRUS_MORETOCOME set, but with > no more being sent because the send buffer doesn't get emptied, > as it's waiting for more data to send. But as I understand PRUS_MORETOCOME is not set if socket is non-blocking. Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Wed May 28 22:31:14 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AC06537B401 for ; Wed, 28 May 2003 22:31:14 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2206643FAF for ; Wed, 28 May 2003 22:31:14 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0vo.dialup.mindspring.com ([209.86.3.248] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19LG0M-0004YA-00; Wed, 28 May 2003 22:31:10 -0700 Message-ID: <3ED59AD7.AA0CA6D5@mindspring.com> Date: Wed, 28 May 2003 22:29:59 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4213aeeae5ba80b876fdce354690a3ebd667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 05:31:15 -0000

Igor Sysoev wrote:
> No, I do want these flags because they resolve the problem of partially
> filled packets. I believe that this problem can be solved without a fixing
> the sendfile() implementation.

By kludging it, you mean.

> The portability argument is bogus because sendfile() portability is nonsense.

Darwin has sendfile. See the released source code: it matches the FreeBSD semantics, from what I can tell.

> The drawback that really annoyed me is that sendfile() blocks on a reading
> from a disk while a sending to non-blocking socket. Although I see three
> workarounds it's much better to fix this inside sendfile().

There's no workaround for the latency issue, which comes from the fact that a trap handles the request for more pages, and that blocks all callers. Threads have the same problem in libc_r.

> Five people ?

Bill Fenner, Matt Dillon, Peter Jeremy, Marc Slemko, Terry Lambert, Garance Drosihn.

> > Other than writing the correct code for you, there's little
> > else we can do, and I, at least, have other code I need to
> > write, to solve my own problems.
>
> Well, Terry, write your code and I will write my own.

Fine. Just don't ask us to run it.
-- Terry From owner-freebsd-arch@FreeBSD.ORG Wed May 28 22:57:38 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F61737B401 for ; Wed, 28 May 2003 22:57:38 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id C04CB43F85 for ; Wed, 28 May 2003 22:57:37 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0vo.dialup.mindspring.com ([209.86.3.248] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19LGPv-0007kO-00; Wed, 28 May 2003 22:57:35 -0700 Message-ID: <3ED5A105.26F528A6@mindspring.com> Date: Wed, 28 May 2003 22:56:21 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4f159af4a743a800703a6f20e5737617093caf27dac41a8fd350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 05:57:38 -0000 Igor Sysoev wrote: > > will result in you sleeping with PRUS_MORETOCOME set, but with > > no more being sent because the send buffer doesn't get emptied, > > as it's waiting for more data to send. > > But as I understand PRUS_MORETOCOME is not set if socket is non-blocking. Then the bug is still not fixed by setting it, since your total send size might be less than `sysctl net.inet.tcp.sendspace`. 
You guys should really just have a buffer finalize function that gets called on block/return cases, and impute the flags you need on the socket while it's being used for sendfile. -- Terry From owner-freebsd-arch@FreeBSD.ORG Thu May 29 00:53:08 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 89EF437B401 for ; Thu, 29 May 2003 00:53:08 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1A6B843F75 for ; Thu, 29 May 2003 00:53:07 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4T7r5mF062090; Thu, 29 May 2003 11:53:05 +0400 (MSD) Date: Thu, 29 May 2003 11:53:05 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED59AD7.AA0CA6D5@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 07:53:08 -0000 On Wed, 28 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > The portability argument is bogus because sendfile() portability is nonsense. > > Darwin has sendfile. See the released source code: it matches > the FreeBSD semantics, from what I can tell. So now FreeBSD/Darwin is second pair after AIX/MVS that has the same sendfile() prototype. It surely improves the sendfile() portability. Undoubtedly. > > The drawback that really annoyed me is that sendfile() blocks on a reading > > from a disk while a sending to non-blocking socket. 
> > Although I see three workarounds it's much better to fix this inside sendfile().
>
> There's no workaround for the latency issue, which comes from
> the fact that a trap handles the request for more pages, and
> that blocks all callers. Threads has the same problem in libc_r.

The workaround idea is simple: preloading. But the implementation at user
level is complex. In FreeBSD 4.x I see three ways:

*) using aio_read() to read single bytes;
*) using rfork()ed helper processes to read single bytes;
*) using a pool of rfork()ed processes to handle connections.

But all of them require significant changes to an application.

> > Five people ?
>
> Bill Fenner, Matt Dillon, Peter Jeremy, Marc Slemko, Terry Lambert,
> Garance Droshin.

At the time of your mail there were only 4 people, in order of appearance:
Peter Jeremy, you, Matt Dillon, and Marc Slemko. Bill Fenner's email
was sent an hour and a half after yours and just before my response.
Garance Droshin's mail was sent several hours later.
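
A minimal sketch of the preloading idea listed above, not code posted in this thread. It assumes a helper process that reads one byte from every page of the range, so that the file pages are likely to be resident before sendfile() is called on them. The name preload_range() and its return convention are hypothetical; plain pread() is used here instead of aio_read() to keep the example short, so the call blocks and belongs in a helper, not in the event loop.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for pread() under strict compiler modes */
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch one byte in every page that overlaps
 * [offset, offset + len) of the file open on fd.
 *
 * Returns the number of pages touched, or -1 if a read fails.
 * Stops early at end of file.
 */
long
preload_range(int fd, off_t offset, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t end = offset + (off_t)len;
	off_t pos;
	long touched = 0;
	char byte;

	assert(page > 0);

	/* Start at the page boundary at or below the requested offset. */
	for (pos = offset - (offset % page); pos < end; pos += page) {
		ssize_t n = pread(fd, &byte, 1, pos);

		if (n < 0) {
			perror("pread");
			return (-1);
		}
		if (n == 0)	/* past end of file, nothing more to load */
			break;
		touched++;
	}
	return (touched);
}
```

A helper would call preload_range(fd, offset, nbytes) and then tell the main process that the range is ready. Nothing guarantees the pages stay resident until sendfile() runs, so this only reduces the chance of blocking on disk.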
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Thu May 29 00:55:06 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 127E437B401 for ; Thu, 29 May 2003 00:55:06 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id BDED943F85 for ; Thu, 29 May 2003 00:55:04 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4T7t3mF062162; Thu, 29 May 2003 11:55:03 +0400 (MSD) Date: Thu, 29 May 2003 11:55:03 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED5A105.26F528A6@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 07:55:06 -0000 On Wed, 28 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > > will result in you sleeping with PRUS_MORETOCOME set, but with > > > no more being sent because the send buffer doesn't get emptied, > > > as it's waiting for more data to send. > > > > But as I understand PRUS_MORETOCOME is not set if socket is non-blocking. > > Then the bug is still not fixed by setting it, since your total > send size might be less than `sysctl net.inet.tcp.sendspace`. Why ? We can reset TF_MORETOCOME if the sending is completed. > You guys should really just have a buffer finalize function > that gets called on block/return cases, and impute the flags > you need on the socket while it's being used for sendfile. 
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Thu May 29 08:38:22 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 83D3B37B401 for ; Thu, 29 May 2003 08:38:22 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1354243F75 for ; Thu, 29 May 2003 08:38:22 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc1ds.dialup.mindspring.com ([209.86.5.188] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19LPTp-0000YD-00; Thu, 29 May 2003 08:38:14 -0700 Message-ID: <3ED6291B.36F382F@mindspring.com> Date: Thu, 29 May 2003 08:36:59 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a40d3be225e75d3443d62eddb25e663b4d350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 15:38:22 -0000 Igor Sysoev wrote: > On Wed, 28 May 2003, Terry Lambert wrote: > > Igor Sysoev wrote: > > > > will result in you sleeping with PRUS_MORETOCOME set, but with > > > > no more being sent because the send buffer doesn't get emptied, > > > > as it's waiting for more data to send. > > > > > > But as I understand PRUS_MORETOCOME is not set if socket is non-blocking. 
> > > > Then the bug is still not fixed by setting it, since your total > > send size might be less than `sysctl net.inet.tcp.sendspace`. > > Why ? We can reset TF_MORETOCOME if the sending is completed. It's called a "deadly embrace" deadlock. Look it up. -- Terry From owner-freebsd-arch@FreeBSD.ORG Thu May 29 08:46:21 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D577537B401 for ; Thu, 29 May 2003 08:46:21 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3F07843F85 for ; Thu, 29 May 2003 08:46:21 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc1ds.dialup.mindspring.com ([209.86.5.188] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19LPbd-0002Ma-00; Thu, 29 May 2003 08:46:18 -0700 Message-ID: <3ED62AFE.187A40F9@mindspring.com> Date: Thu, 29 May 2003 08:45:02 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4260130f6130a44fbd1d2699391779772350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 15:46:22 -0000 Igor Sysoev wrote: > On Wed, 28 May 2003, Terry Lambert wrote: > > Igor Sysoev wrote: > > > The portability argument is bogus because sendfile() portability is nonsense. > > > > Darwin has sendfile. 
See the released source code: it matches > > the FreeBSD semantics, from what I can tell. > > So now FreeBSD/Darwin is second pair after AIX/MVS that has the same sendfile() > prototype. It surely improves the sendfile() portability. Undoubtedly. FreeBSD, NetBSD, OpenBSD, Darwin. > > > The drawback that really annoyed me is that sendfile() blocks on a reading > > > from a disk while a sending to non-blocking socket. Although I see three > > > workarounds it's much better to fix this inside sendfile(). > > > > There's no workaround for the latency issue, which comes from > > the fact that a trap handles the request for more pages, and > > that blocks all callers. Threads has the same problem in libc_r. > > The workaround idea is simple - a preloading. But implementation on user > level is complex. In FreeBSD 4.x I see three ways: > > *) the use of aio_read() to read the single bytes; This does not fix the problem. See the extensive discussion, last month, about the differences between libthr and libc_r and libpthreads. Even when doing async I/O, you stall all threads in any model that isn't 1:1 when you fault for a user page in a system call. This is because a fault on a user page results in entering the trap handler, which suspends the calling program until such time as the fault is satisfied, or a SIGSEGV is raised to the caller because the fault cannot be satisfied. > *) the use rfork()ed helper processes to read the single bytes; > *) and the use the pool of rfork()ed processes to handle connections. We call this "libthr". > But all of them requires significant changes of an application. Yes, it does. > > > Five people ? > > > > Bill Fenner, Matt Dillon, Peter Jeremy, Marc Slemko, Terry Lambert, > > Garance Droshin. > > At time of your mail there were only 4 people, in order of appearance: > Peter Jeremy, you, Matt Dillon, and Marc Slemko. Bill Fenner's email > was sent one and a half hour after yours and just before my response. 
> Garance Droshin's mail was sent several hours later. Apparently, I received some mail that did not go to the list. However, if you are willing to include the contents of the "sendfile() broken" thread, I could add three more before your time deadline, including Bruce Evans. But more to the point, so far you are the only one who is not saying that sendfile() needs to be fixed, instead of kludged. -- Terry From owner-freebsd-arch@FreeBSD.ORG Thu May 29 09:20:52 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5C5B837B401 for ; Thu, 29 May 2003 09:20:52 -0700 (PDT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id D10BC43F85 for ; Thu, 29 May 2003 09:20:51 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.9/8.12.6) with ESMTP id h4TGKpVI078313; Thu, 29 May 2003 09:20:51 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9/8.12.6/Submit) id h4TGKobi078312; Thu, 29 May 2003 09:20:50 -0700 (PDT) Date: Thu, 29 May 2003 09:20:50 -0700 (PDT) From: Matthew Dillon Message-Id: <200305291620.h4TGKobi078312@apollo.backplane.com> To: Terry Lambert References: <200305281755.h4SHtbu05504@windsor.research.att.com> <3ED5961E.5DE0F41B@mindspring.com> cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 16:20:52 -0000 :Bill Fenner wrote: :> Why not set PRUS_MORETOCOME on all but the final pru_send() call? 
:
:If the file is larger than `sysctl net.inet.tcp.sendspace`, then
:this code in do_sendfile():
:
:    if (sbspace(&so->so_snd) < so->so_snd.sb_lowat) {
:        if (so->so_state & SS_NBIO) {
:            m_freem(m);
:            sbunlock(&so->so_snd);
:            splx(s);
:            error = EAGAIN;
:            goto done;
:        }
:        error = sbwait(&so->so_snd);
:
:will result in you sleeping with PRUS_MORETOCOME set, but with
:no more being sent because the send buffer doesn't get emptied,
:as it's waiting for more data to send.
:
:-- Terry

Not unless the send buffer is substantially near the size of a single
packet, which it isn't (it's far larger). PRUS_MORETOCOME is smarter
than that, Terry. tcp_output() just uses it as a hint; it doesn't
unconditionally hold off a flush.

The code to refer to is netinet/tcp_output.c around line 313 (in
stable), in tcp_output(): the section within the if (len) { ... }
sequence. This section does all tests related to sending a packet and,
as you can see, TF_MORETOCOME will not prevent the data from being
flushed the moment there is enough to fill a packet.
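
The behaviour described above can be modelled in a few lines. This is a simplified sketch written from that description, not a copy of tcp_output(): the function name, parameters, and flag values are hypothetical, and the real code has further send conditions (forced output, window size, retransmission) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; the kernel's differ. */
#define SK_NODELAY	0x1
#define SK_NOPUSH	0x2
#define SK_MORETOCOME	0x4

/*
 * Simplified model of the send tests inside "if (len) { ... }".
 *
 * len:    bytes that would go into the next segment
 * mss:    maximum segment size
 * queued: bytes remaining in the send buffer from the current offset
 * idle:   true if no sent data is still unacknowledged
 */
bool
should_send_now(long len, long mss, long queued, bool idle, int flags)
{
	assert(mss > 0);

	if (len <= 0)
		return (false);

	/* A full-sized segment always goes out; the hints are not consulted. */
	if (len >= mss)
		return (true);

	/*
	 * A short segment is sent only if nobody asked to hold it back,
	 * the connection is idle (or Nagle is disabled), and this segment
	 * would empty the send buffer.
	 */
	if ((flags & (SK_MORETOCOME | SK_NOPUSH)) == 0 &&
	    (idle || (flags & SK_NODELAY) != 0) &&
	    len >= queued)
		return (true);

	return (false);
}
```

Under this model a send buffer holding more than one segment's worth of data keeps draining even with the "more to come" hint set; only the final partial segment is held back.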
-Matt Matthew Dillon From owner-freebsd-arch@FreeBSD.ORG Thu May 29 10:33:06 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5A7AC37B401 for ; Thu, 29 May 2003 10:33:06 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id D308143F85 for ; Thu, 29 May 2003 10:33:04 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4THX3mF075818; Thu, 29 May 2003 21:33:03 +0400 (MSD) Date: Thu, 29 May 2003 21:33:03 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED6291B.36F382F@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 17:33:06 -0000 On Thu, 29 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > On Wed, 28 May 2003, Terry Lambert wrote: > > > Igor Sysoev wrote: > > > > > will result in you sleeping with PRUS_MORETOCOME set, but with > > > > > no more being sent because the send buffer doesn't get emptied, > > > > > as it's waiting for more data to send. > > > > > > > > But as I understand PRUS_MORETOCOME is not set if socket is non-blocking. > > > > > > Then the bug is still not fixed by setting it, since your total > > > send size might be less than `sysctl net.inet.tcp.sendspace`. > > > > Why ? We can reset TF_MORETOCOME if the sending is completed. > > It's called a "deadly embrace" deadlock. Look it up. I misread you. I thought that sbwait() set PRUS_MORETOCOME itself. 
Nevertheless there would not be a deadlock, because tcp_output() tests
TF_MORETOCOME and TF_NOPUSH only when the data is less than the MSS.
Otherwise tcp_output() can send it.

Igor Sysoev
http://sysoev.ru/en/

From owner-freebsd-arch@FreeBSD.ORG Thu May 29 10:35:30 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C17037B401 for ; Thu, 29 May 2003 10:35:30 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3C02543F85 for ; Thu, 29 May 2003 10:35:29 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4THZSmF075887; Thu, 29 May 2003 21:35:28 +0400 (MSD) Date: Thu, 29 May 2003 21:35:28 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED62AFE.187A40F9@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 17:35:30 -0000

On Thu, 29 May 2003, Terry Lambert wrote:
> Igor Sysoev wrote:
> > On Wed, 28 May 2003, Terry Lambert wrote:
> > > Igor Sysoev wrote:
> > > > The portability argument is bogus because sendfile() portability is nonsense.
> > >
> > > Darwin has sendfile. See the released source code: it matches
> > > the FreeBSD semantics, from what I can tell.
> >
> > So now FreeBSD/Darwin is second pair after AIX/MVS that has the same sendfile()
> > prototype. It surely improves the sendfile() portability. Undoubtedly.
>
> FreeBSD, NetBSD, OpenBSD, Darwin.

There's no sendfile() implementation in NetBSD and OpenBSD.
If you apply some experimental patch you can easy fix some non-portable issues. By the way what's about kqueue(2) ? Are you not confused that NetBSD does not support EVFILT_AIO and OpenBSD does not support EVFILT_AIO and EVFILT_TIMER ? Does this mean that FreeBSD should not introduce any new kqueue filters or flags ? > > > > The drawback that really annoyed me is that sendfile() blocks on a reading > > > > from a disk while a sending to non-blocking socket. Although I see three > > > > workarounds it's much better to fix this inside sendfile(). > > > > > > There's no workaround for the latency issue, which comes from > > > the fact that a trap handles the request for more pages, and > > > that blocks all callers. Threads has the same problem in libc_r. > > > > The workaround idea is simple - a preloading. But implementation on user > > level is complex. In FreeBSD 4.x I see three ways: > > > > *) the use of aio_read() to read the single bytes; > > This does not fix the problem. See the extensive discussion, > last month, about the differences between libthr and libc_r > and libpthreads. Even when doing async I/O, you stall all > threads in any model that isn't 1:1 when you fault for a > user page in a system call. > > This is because a fault on a user page results in entering > the trap handler, which suspends the calling program until > such time as the fault is satisfied, or a SIGSEGV is raised > to the caller because the fault cannot be satisfied. I agree but I told not about the blocking on a page fault but the blocking on the reading the file page from a disk by sendfile(). These pages can be preloaded. > > *) the use rfork()ed helper processes to read the single bytes; > > *) and the use the pool of rfork()ed processes to handle connections. > > We call this "libthr". I believe that rfork() and libthr are different things. I think that the rfork()ed process in FreeBSD 5.x is still the process but not the thread. > > > > Five people ? 
> > > > > > Bill Fenner, Matt Dillon, Peter Jeremy, Marc Slemko, Terry Lambert, > > > Garance Droshin. > > > > At time of your mail there were only 4 people, in order of appearance: > > Peter Jeremy, you, Matt Dillon, and Marc Slemko. Bill Fenner's email > > was sent one and a half hour after yours and just before my response. > > Garance Droshin's mail was sent several hours later. > > Apparently, I received some mail that did not go to the list. > > However, if you are willing to include the contents of the > "sendfile() broken" thread, I could add three more before > your time deadline, including Bruce Evans. Well, but then these "five people have told you [ but not me ] the correct approach is to fix sendfile(2)". > But more to the point, so far you are the only one who is not > saying that sendfile() needs to be fixed, instead of kludged. By the way I do not see that Peter Jeremy said that sendfile() needs to be fixed, he said only that my flags are needless and I should prove their usefulness. 
Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Thu May 29 10:44:25 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4C08937B401 for ; Thu, 29 May 2003 10:44:25 -0700 (PDT) Received: from mail.auriga.ru (mail.auriga.ru [80.240.102.102]) by mx1.FreeBSD.org (Postfix) with ESMTP id 45F9443F93 for ; Thu, 29 May 2003 10:44:24 -0700 (PDT) (envelope-from alex.neyman@auriga.ru) Received: from mail.loopback.interface ([127.0.0.1] helo=vagabond.auriga.ru) by mail.auriga.ru with esmtp (Exim 4.14) id 19LRUu-00052F-Ts for freebsd-arch@freebsd.org; Thu, 29 May 2003 21:47:28 +0400 From: Alexey Neyman Organization: Auriga, Inc To: freebsd-arch@freebsd.org Date: Thu, 29 May 2003 21:44:37 +0400 User-Agent: KMail/1.5.2 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200305292144.37793.alex.neyman@auriga.ru> Subject: different users for NNTP server X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 17:44:25 -0000 Hi, there! I just stumbled over this: while /etc/master.passwd includes the 'news' user, the /etc/inetd.conf suggests running NNTP server as 'usenet' user (missing from /etc/master.passwd). Shouldn't both these files refer to the same user, e.g. 'news'? Regards, Alexey. -- ,----------------------------------------, | A quoi ca sert d'etre sur la terre | Alexey V. Neyman | Si c'est pour faire nos vies a genoux! 
| mailto:alex.neyman@auriga.ru `------------------( Les Rois du Monde )-' From owner-freebsd-arch@FreeBSD.ORG Thu May 29 13:06:30 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8071537B401 for ; Thu, 29 May 2003 13:06:30 -0700 (PDT) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3FBA043F3F for ; Thu, 29 May 2003 13:06:29 -0700 (PDT) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])h4TK6Rp9048659; Fri, 30 May 2003 06:06:27 +1000 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.8/8.12.8/Submit) id h4TK6FaA048658; Fri, 30 May 2003 06:06:16 +1000 (EST) Date: Fri, 30 May 2003 06:06:15 +1000 From: Peter Jeremy To: Igor Sysoev Message-ID: <20030529200615.GC22178@cirb503493.alcatel.com.au> References: <3ED62AFE.187A40F9@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 May 2003 20:06:30 -0000 On Thu, May 29, 2003 at 09:35:28PM +0400, Igor Sysoev wrote: >> But more to the point, so far you are the only one who is not >> saying that sendfile() needs to be fixed, instead of kludged. > >By the way I do not see that Peter Jeremy said that sendfile() needs >to be fixed, he said only that my flags are needless and I should >prove their usefulness. 
I admit I never said sendfile() was broken - only because I wasn't familiar enough with either its current behaviour or with the most desirable behaviour. But instead of arguing semantics, how about providing some hard data on why your changes should be applied (as I requested in my first post). So far you have between 4 and 8 people who have argued against your changes (depending on how you count). You have yet to come up with any people or hard facts to back up your position. As far as I am concerned, unless and until you manage to produce something concrete to back up your assertions, this thread has outlived its usefulness. Peter From owner-freebsd-arch@FreeBSD.ORG Fri May 30 00:39:21 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A94D037B401 for ; Fri, 30 May 2003 00:39:21 -0700 (PDT) Received: from relay2.masterhost.ru (relay2.masterhost.ru [213.59.3.18]) by mx1.FreeBSD.org (Postfix) with SMTP id 1EB9D43F3F for ; Fri, 30 May 2003 00:39:20 -0700 (PDT) (envelope-from postmaster@apt-telecom.ru) Received: (qmail 77704 invoked from network); 30 May 2003 07:39:00 -0000 Received: from smtp.masterhost.ru (213.59.3.17) by relay2.masterhost.ru with SMTP; 30 May 2003 07:38:59 -0000 Received: (qmail 77696 invoked from network); 30 May 2003 07:38:59 -0000 Received: from unknown (HELO victor) (postmaster@apt-telecom.ru@213.247.145.106) by smtp.masterhost.ru with SMTP; 30 May 2003 07:38:59 -0000 Message-ID: <000f01c3267e$2b15f560$0400a8c0@ntl.ru> From: "Victor Bratsev" To: Date: Fri, 30 May 2003 11:36:21 +0400 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.1 Subject: RBEM10/100+56k at 32-bit cardBus X-BeenThere: 
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 May 2003 07:39:22 -0000 Greetings! Can anybody help me to solve this - I've got RBEM10/100+56k PCMCIA card and I can't get it work. Because my cardBus - it's 32-bit, and driver is 16-bit. I've got a message at startup pcmcia: 32-bit cardbus is unsupported and as result if_xe.ko is unloaded with strange fhdjklsah - if I try to kldload if_xe it says that it can't load module because it is already loaded, but kldstat says that there is NO if_xe loaded. FreeBSD is RELEASE-4.7 with custom kernel with device xe included. I've got little knowledge in PC-internals and haven't got experience in system programming so if anybody will guide me I should try to solve this by myself. And I don't want to change my laptop or pc-card:) Thanks. VBratsev@netscape.net ---------------------------------------- No trees were harmed in the composition of this message, although some electrons were mildly inconvenienced. 
(after Chris Dillon - cdillon(at)wolves.k12.mo.us) FreeBSD: The fastest and most stable server OS on the planet - Available for IA32 (Intel x86) and Alpha architectures - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development - http://www.freebsd.org From owner-freebsd-arch@FreeBSD.ORG Fri May 30 01:50:52 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 84F0E37B401 for ; Fri, 30 May 2003 01:50:52 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0217743F3F for ; Fri, 30 May 2003 01:50:52 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0lu.dialup.mindspring.com ([209.86.2.190] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19Lfb7-0000Et-00; Fri, 30 May 2003 01:50:50 -0700 Message-ID: <3ED71B19.339E8A95@mindspring.com> Date: Fri, 30 May 2003 01:49:29 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4b3a8a48ae4003698c0aed7412474a059350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 May 2003 08:50:52 -0000 Igor Sysoev wrote: > > FreeBSD, NetBSD, OpenBSD, Darwin. > > There's no sendfile() implementation in NetBSD and OpenBSD. If you > apply some experimental patch you can easy fix some non-portable issues. Or you could just fix sendfile. 8-). 
> By the way what's about kqueue(2) ? Are you not confused that NetBSD > does not support EVFILT_AIO and OpenBSD does not support EVFILT_AIO and > EVFILT_TIMER ? Does this mean that FreeBSD should not introduce any > new kqueue filters or flags ? These are incredibly trivial to support. I estimate the work at an hour each, including writing a unit test. It took me about an hour to write the SystemV IPC Message Queue KNOTE() code for FreeBSD. > I agree but I told not about the blocking on a page fault but the blocking > on the reading the file page from a disk by sendfile(). These pages > can be preloaded. It doesn't "read" it, per se: it creates a mapping, and it faults the pages; when they are in core, then they can be sent. [ removed 3 tangents not germane to this discussion; if you want to revive them, please revive them on -chat ] -- Terry From owner-freebsd-arch@FreeBSD.ORG Fri May 30 02:23:05 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C357137B401 for ; Fri, 30 May 2003 02:23:05 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5C57543F3F for ; Fri, 30 May 2003 02:23:04 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4U9N2mF089976; Fri, 30 May 2003 13:23:02 +0400 (MSD) Date: Fri, 30 May 2003 13:23:02 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED71B19.339E8A95@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 30 May 2003 09:23:06 -0000

On Fri, 30 May 2003, Terry Lambert wrote:

> Igor Sysoev wrote:
> > > FreeBSD, NetBSD, OpenBSD, Darwin.
> >
> > There's no sendfile() implementation in NetBSD and OpenBSD. If you
> > apply some experimental patch you can easy fix some non-portable issues.
>
> Or you could just fix sendfile. 8-).

I'm going to fix it as Matthew Dillon suggested if no one else is going to
do it in the near future.

> > By the way what's about kqueue(2) ? Are you not confused that NetBSD
> > does not support EVFILT_AIO and OpenBSD does not support EVFILT_AIO and
> > EVFILT_TIMER ? Does this mean that FreeBSD should not introduce any
> > new kqueue filters or flags ?
>
> These are incredibly trivial to support. I estimate the work
> at an hour each, including writing a unit test. It took me
> about an hour to write the SystemV IPC Message Queue KNOTE()
> code for FreeBSD.

Nevertheless, there's no support for EVFILT_AIO and EVFILT_TIMER.
By the way, I do not think that EVFILT_AIO is a trivial thing.
Actually, it requires at least a working AIO environment in the kernel.

Now we have a more portable kqueue() that exists in FreeBSD, NetBSD, and OpenBSD
(I do not know about Darwin and MacOS X) with the same prototype and
some unsupported filters. And we have a much less portable sendfile() that
exists in most modern unices but with different prototypes and
functionality.

> > I agree but I told not about the blocking on a page fault but the blocking
> > on the reading the file page from a disk by sendfile(). These pages
> > can be preloaded.
>
> It doesn't "read" it, per se: it creates a mapping, and it
> faults the pages; when they are in core, then they can be
> sent.

So what do these lines in /sys/kern/uipc_syscalls.c:sendfile() do:

    if (!pg->valid || !vm_page_is_valid(pg, pgoff, xfsize)) {
        ....
        error = VOP_READ(vp, &auio, IO_VMIO | ((MAXBSIZE / bsize) << 16),
                         p->p_ucred);
        ....
} Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Fri May 30 02:55:18 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 69FAA37B401 for ; Fri, 30 May 2003 02:55:18 -0700 (PDT) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id C220E43FA3 for ; Fri, 30 May 2003 02:55:17 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0lu.dialup.mindspring.com ([209.86.2.190] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19LgbN-0001NN-00; Fri, 30 May 2003 02:55:10 -0700 Message-ID: <3ED72A16.9CACD4C5@mindspring.com> Date: Fri, 30 May 2003 02:53:26 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Igor Sysoev References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a427793c588c70b29ca754cb1c6160fb0e350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 May 2003 09:55:18 -0000 Igor Sysoev wrote: > On Fri, 30 May 2003, Terry Lambert wrote: > > Or you could just fix sendfile. 8-). > > I'm going to fix it as Matthew Dillon suggested if no one else is going to > do it in the near future. I'm pretty sure it will deadlock on boundary conditions, but Matt has confidence it won't; I looked at the code he pointed to in -stable in -current, and I'm not so sure I agree, but I'm willing to be wrong. 
If it fixes the problem for you, and doesn't deadlock, then more
power to you. I would ask that you test with file sizes in 1-byte
increments, up to 32769 bytes, with headers of 0 bytes and 300
bytes for your test cases, so that the boundary that I'm worried
about ends up getting exercised.

> > > By the way what's about kqueue(2) ? Are you not confused that NetBSD
> > > does not support EVFILT_AIO and OpenBSD does not support EVFILT_AIO and
> > > EVFILT_TIMER ? Does this mean that FreeBSD should not introduce any
> > > new kqueue filters or flags ?
> >
> > These are incredibly trivial to support. I estimate the work
> > at an hour each, including writing a unit test. It took me
> > about an hour to write the SystemV IPC Message Queue KNOTE()
> > code for FreeBSD.
>
> Nevertheless there's no support for EVFILT_AIO and EVFILT_TIMER.
> By the way I do not think that EVFILT_AIO is a trivial thing.
> Actually it requires at least the working AIO environment in the kernel.

This is really a tangent again; however, I would point out that
aio can be implemented in the context of scheduler activations
and a spawned AIO kernel thread per request (the alternative is
to implement it entirely in user space, and then implement a
loopback "send" mechanism for the KNOTE()s). So implementing
aio is probably a 20-hour task (1/2 a man-week). More work, but
still all doable in a week's time or less.

In general, most of the things you are pointing at, including
the sendfile problem, don't take a lot of thinking to fix, only
the grunt-work to actually crank out the code.

> Now we have more portable kqueue() that exists in FreeBSD, NetBSD, and OpenBSD
> (I do not know about Darwin and MacOS X) with the same prototype and
> some unsupported filters. And we have much less portable sendfile() that
> exists in the most modern unices but with the different prototypes and
> functionality.

This illustrates my thesis that interfaces with the same names
tend to converge over time. Another example is select(), which
Linux initially implemented as updating the timeout struct with
the time which had elapsed; this was divergent, and broke a
lot of code, until they relented and fixed it to de facto standard
behaviour.

I'm confident the same thing will eventually happen with
kqueue/kevent. The main issue with Linux adoption of kqueue/kevent
is that they claim it's level triggered instead of edge triggered:
they want events, they don't want conditions raised.

To a small extent, they are right. But this is trivially
correctable, and needs to be corrected anyway, for EVFILT_PROC
to support a larger number of PIDs. Right now, the PID is OR'ed
in with the event, and so is limited to 20 bits. Another
parameter, a void * (in which the PID value can be cast and
recovered), would be enough to provide additional context.

With this context, it's possible to arrange a contract between
the user kn_data that was passed in and the filter routine, in
order to copy out arbitrary data, making the event edge rather
than level triggered.

With this single modification, you fix both the 20-bit PID limit
problem and the Linux objection to the adoption of the kevent
interface. In other words, you increase convergence. It's
natural over time for visible source bases to converge.

> > It doesn't "read" it, per se: it creates a mapping, and it
> > faults the pages; when they are in core, then they can be
> > sent.
>
> So what do these lines in /sys/kern/uipc_syscalls.c:sendfile():
>
> if (!pg->valid || !vm_page_is_valid(pg, pgoff, xfsize)) {
> ....
> error = VOP_READ(vp, &auio, IO_VMIO | ((MAXBSIZE / bsize) << 16),
> p->p_ucred);
> ....
> }

That's easy: they mean you aren't looking at version 1.147 of
the file, and that you're looking at RELENG_4, and not -CURRENT
(version 1.65.2.17, or earlier). You are 82 HEAD revisions
behind the state of the art.
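
The 20-bit PID limit mentioned in this message can be illustrated with a small sketch. It assumes only what the message states, namely that the PID shares one 32-bit word with the event flags and gets the low 20 bits. The SK_ names and helper functions are hypothetical and are not taken from the FreeBSD kernel or its headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 20 bits carry the PID, the rest carry flags. */
#define SK_PID_BITS 20u
#define SK_PID_MASK ((UINT32_C(1) << SK_PID_BITS) - 1u)

/* Pack event flags and a PID into one word; PID bits above bit 19 are lost. */
static uint32_t sk_pack(uint32_t event_flags, uint32_t pid)
{
    return (event_flags & ~SK_PID_MASK) | (pid & SK_PID_MASK);
}

/* Recover the PID portion of a packed word. */
static uint32_t sk_unpack_pid(uint32_t word)
{
    return word & SK_PID_MASK;
}

/* Returns 1 when the PID survives a pack/unpack round trip, 0 otherwise. */
static int sk_pid_fits(uint32_t pid)
{
    return sk_unpack_pid(sk_pack(UINT32_C(0x80000000), pid)) == pid;
}

/* Small self-check: the largest PID that fits is 2^20 - 1. */
static void sk_self_check(void)
{
    assert(sk_pid_fits(SK_PID_MASK));
    assert(!sk_pid_fits(SK_PID_MASK + 1u));
}
```

Carrying the PID in a separate pointer-sized field, as the message proposes, would avoid this truncation because the PID would no longer compete with the flag bits for space.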
-- Terry From owner-freebsd-arch@FreeBSD.ORG Fri May 30 03:00:02 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0DBC037B401 for ; Fri, 30 May 2003 03:00:02 -0700 (PDT) Received: from phoenix.infradead.org (carisma.slowglass.com [195.224.96.167]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3392B43FB1 for ; Fri, 30 May 2003 03:00:01 -0700 (PDT) (envelope-from hch@infradead.org) Received: from hch by phoenix.infradead.org with local (Exim 4.10) id 19Lgfz-0001iQ-00; Fri, 30 May 2003 10:59:55 +0100 Date: Fri, 30 May 2003 10:59:55 +0100 From: Christoph Hellwig To: Terry Lambert Message-ID: <20030530105955.A6562@infradead.org> References: <3ED72A16.9CACD4C5@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3ED72A16.9CACD4C5@mindspring.com>; from tlambert2@mindspring.com on Fri, May 30, 2003 at 02:53:26AM -0700 cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 May 2003 10:00:02 -0000 On Fri, May 30, 2003 at 02:53:26AM -0700, Terry Lambert wrote: > This illustrates my thesis that interfaces with the same names > tend to converge over time. Another example is select(), which > Linux initially implemented as updating the timeout struct with > the time which had elapsed; this was divergents, and broke a > lot of code, until they relented and fixed it to defacto standard It hasn't changed. And trying to change it broke perfectly working code. 
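
Whichever behaviour a given kernel has, portable callers can sidestep the disagreement above: POSIX permits select() to modify the timeout object, so its contents should be treated as unspecified after the call. A minimal sketch, assuming a hypothetical helper named wait_readable() that rebuilds the struct timeval on every call:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * The timeval is built fresh on each call and never reused, so the
 * result does not depend on whether select() rewrites it.
 */
static int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

A polling loop that calls wait_readable() repeatedly gets the full timeout on each iteration on any system, which is the property that code reusing a single struct timeval across calls cannot rely on.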
From owner-freebsd-arch@FreeBSD.ORG Fri May 30 03:09:28 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8E2E737B401 for ; Fri, 30 May 2003 03:09:28 -0700 (PDT) Received: from park.rambler.ru (park.rambler.ru [81.19.64.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 97B5043F75 for ; Fri, 30 May 2003 03:09:27 -0700 (PDT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by park.rambler.ru (8.12.6/8.12.6) with ESMTP id h4UA9QmF090562; Fri, 30 May 2003 14:09:26 +0400 (MSD) Date: Fri, 30 May 2003 14:09:26 +0400 (MSD) From: Igor Sysoev X-Sender: is@is To: Terry Lambert In-Reply-To: <3ED72A16.9CACD4C5@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: sendfile(2) SF_NOPUSH flag proposal X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 May 2003 10:09:28 -0000 On Fri, 30 May 2003, Terry Lambert wrote: > Igor Sysoev wrote: > > On Fri, 30 May 2003, Terry Lambert wrote: > > > It doesn't "read" it, per se: it creates a mapping, and it > > > faults the pages; when they are in core, then they can be > > > sent. > > > > So what do these lines in /sys/kern/uipc_syscalls.c:sendfile(): > > > > if (!pg->valid || !vm_page_is_valid(pg, pgoff, xfsize)) { > > .... > > error = VOP_READ(vp, &auio, IO_VMIO | ((MAXBSIZE / bsize) << 16), > > p->p_ucred); > > .... > > } > > That's easy: they mean you aren't looking at version 1.147 of > the file, and that you're looking at RELENG_4, and not -CURRENT > (version 1.65.2.17, or earlier). You are 82 HEAD revisions > behind the state of the art. Yes, I looked in FreeBSD 4.x. 
In the HEAD VOP_READ() was changed to: error = vn_rdwr(UIO_READ, vp, NULL, MAXBSIZE, trunc_page(off), UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((MAXBSIZE / bsize) << 16), td->td_ucred, NOCRED, &resid, td); What does vn_rdwr() ? Does it fault the page as you said or it reads pages ? Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Sat May 31 12:38:51 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 586E337B401 for ; Sat, 31 May 2003 12:38:51 -0700 (PDT) Received: from ns2.gnf.org (ns2.gnf.org [63.196.132.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 87F2143F85 for ; Sat, 31 May 2003 12:38:50 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from EXCHCLUSTER01.lj.gnf.org (exch01.lj.gnf.org [172.25.10.19]) by ns2.gnf.org (8.12.8p1/8.12.8) with ESMTP id h4VJcmRo008708 for ; Sat, 31 May 2003 12:38:48 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from roark.gnf.org ([172.25.24.15]) by EXCHCLUSTER01.lj.gnf.org with Microsoft SMTPSVC(5.0.2195.5329); Sat, 31 May 2003 12:38:50 -0700 Received: from roark.gnf.org (localhost [127.0.0.1]) by roark.gnf.org (8.12.9/8.12.9) with ESMTP id h4VJcojX093390 for ; Sat, 31 May 2003 12:38:50 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: (from gtetlow@localhost) by roark.gnf.org (8.12.9/8.12.9/Submit) id h4VJcn1g093389 for arch@FreeBSD.org; Sat, 31 May 2003 12:38:49 -0700 (PDT) (envelope-from gtetlow) Date: Sat, 31 May 2003 12:38:49 -0700 From: Gordon Tetlow To: arch@FreeBSD.org Message-ID: <20030531193849.GR87863@roark.gnf.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7vS62bsm3BVGCDKV" Content-Disposition: inline User-Agent: Mutt/1.4i X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender 
Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . X-OriginalArrivalTime: 31 May 2003 19:38:50.0264 (UTC) FILETIME=[42A78980:01C327AC] Subject: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 19:38:51 -0000 --7vS62bsm3BVGCDKV Content-Type: text/plain; charset=us-ascii Content-Disposition: inline To cut down on the size of a dynamically-linked root, I'd like to repo-copy the following utilities from src/sbin to src/usr.sbin: mount_portalfs mount_nwfs mount_smbfs natd ipnat Does anyone have any objections? -gordon --7vS62bsm3BVGCDKV Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+2QTJRu2t9DV9ZfsRAkOwAJ4tlDD0WoPCawCvqRwprt1qYH3YOQCfdqzC BnNgtGnxldzMvbNc/X+gvmw= =S7Et -----END PGP SIGNATURE----- --7vS62bsm3BVGCDKV-- From owner-freebsd-arch@FreeBSD.ORG Sat May 31 12:52:18 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9449B37B404 for ; Sat, 31 May 2003 12:52:18 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0180F43F93 for ; Sat, 31 May 2003 12:52:16 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-v8ldv6l.dsl.mindspring.com ([209.86.252.213] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19MCOj-0004iF-00; Sat, 31 
May 2003 12:52:14 -0700 Message-ID: <3ED90796.91188E84@mindspring.com> Date: Sat, 31 May 2003 12:50:46 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Gordon Tetlow References: <20030531193849.GR87863@roark.gnf.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4530f9e57a47276e8eed2b937d9bd5df2387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c cc: arch@FreeBSD.org Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 19:52:18 -0000

Gordon Tetlow wrote:
> To cut down on the size of a dynamically-linked root, I'd like to
> repo-copy the following utilities from src/sbin to src/usr.sbin:
>
> mount_portalfs
> mount_nwfs
> mount_smbfs
> natd
> ipnat
>
> Does anyone have any objections?

All the mount programs should be in the same directory.

I would actually be tempted to go farther, and to adopt the SVR4
layout for these types of programs, and the stub programs that
call them, and put them under /libexec; that probably would not
fly too well, even though it would mean you could drop in new
file systems, and the tools would "just know" about them.

If you do go ahead with your plan, make sure to inform the mount
program of the additional directory to look in, and (maybe) add
a "verbose" option, so that it can tell you where it found the
thing that it was exec'ing in order to do the mount, since if
the files are sprinkled all over the directory hierarchy like
pixie-dust, it's going to be difficult to track them down.
-- Terry From owner-freebsd-arch@FreeBSD.ORG Sat May 31 13:22:36 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DBAC237B4FC for ; Sat, 31 May 2003 13:22:34 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id D822843F75 for ; Sat, 31 May 2003 13:22:33 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.9/8.12.9) with ESMTP id h4VKMLVm022256; Sat, 31 May 2003 13:22:25 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.9/8.12.9/Submit) id h4VKMLdD022255; Sat, 31 May 2003 13:22:21 -0700 (PDT) Date: Sat, 31 May 2003 13:22:21 -0700 From: "David O'Brien" To: Gordon Tetlow Message-ID: <20030531202221.GA22056@dragon.nuxi.com> Mail-Followup-To: David O'Brien , Gordon Tetlow , arch@FreeBSD.org References: <20030531193849.GR87863@roark.gnf.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030531193849.GR87863@roark.gnf.org> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.1-BETA Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 cc: arch@FreeBSD.org Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: arch@FreeBSD.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 20:22:36 -0000 On Sat, May 31, 2003 at 12:38:49PM -0700, Gordon Tetlow wrote: > To cut down on the size of a dynamically-linked root, I'd like to > repo-copy the following utilities from src/sbin to src/usr.sbin: > > mount_portalfs > 
mount_nwfs > mount_smbfs > natd > ipnat > > Does anyone have any objections? yes to natd. -- -- David (obrien@FreeBSD.org) From owner-freebsd-arch@FreeBSD.ORG Sat May 31 13:48:06 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 373B437B401 for ; Sat, 31 May 2003 13:48:06 -0700 (PDT) Received: from ns1.gnf.org (ns1.gnf.org [63.196.132.67]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3E18743F75 for ; Sat, 31 May 2003 13:48:05 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from EXCHCLUSTER01.lj.gnf.org (exch01.lj.gnf.org [172.25.10.19]) by ns1.gnf.org (8.12.8p1/8.12.8) with ESMTP id h4VKm3tF085662 for ; Sat, 31 May 2003 13:48:03 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from roark.gnf.org ([172.25.24.15]) by EXCHCLUSTER01.lj.gnf.org with Microsoft SMTPSVC(5.0.2195.5329); Sat, 31 May 2003 13:48:04 -0700 Received: from roark.gnf.org (localhost [127.0.0.1]) by roark.gnf.org (8.12.9/8.12.9) with ESMTP id h4VKm4jX095553; Sat, 31 May 2003 13:48:04 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: (from gtetlow@localhost) by roark.gnf.org (8.12.9/8.12.9/Submit) id h4VKm45U095552; Sat, 31 May 2003 13:48:04 -0700 (PDT) (envelope-from gtetlow) Date: Sat, 31 May 2003 13:48:04 -0700 From: Gordon Tetlow To: Terry Lambert Message-ID: <20030531204804.GS87863@roark.gnf.org> References: <20030531193849.GR87863@roark.gnf.org> <3ED90796.91188E84@mindspring.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="FqExhoTtZ2UFWcPs" Content-Disposition: inline In-Reply-To: <3ED90796.91188E84@mindspring.com> User-Agent: Mutt/1.4i X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). 
The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . X-OriginalArrivalTime: 31 May 2003 20:48:04.0965 (UTC) FILETIME=[EF0C9950:01C327B5] cc: arch@FreeBSD.org Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 20:48:06 -0000 --FqExhoTtZ2UFWcPs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, May 31, 2003 at 12:50:46PM -0700, Terry Lambert wrote:
> Gordon Tetlow wrote:
> > To cut down on the size of a dynamically-linked root, I'd like to
> > repo-copy the following utilities from src/sbin to src/usr.sbin:
> >
> > mount_portalfs
> > mount_nwfs
> > mount_smbfs
> > natd
> > ipnat
> >
> > Does anyone have any objections?
>
> All the mount programs should be in the same directory.
>
> I would actually be tempted to go farther, and to adopt the SVR4
> layout for these types of programs, and the stub programs that
> call them, and put them under /libexec; that probably would not
> fly too well, even though it would mean you could drop in new
> file systems, and the tools would "just know" about them.

They already do. mount -t foo will try execing /sbin/mount_foo and then
/usr/sbin/mount_foo. You'd know that if you read the source.
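
The lookup order described here, together with the "verbose" report suggested earlier in the thread, can be sketched roughly as follows. This is an illustration under stated assumptions and not the mount(8) source: the function names find_in_dirs() and find_mount_helper() and the message formats are hypothetical, and only the /sbin then /usr/sbin order comes from the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Look for an executable called `name` in each directory, in order.
 * On success, writes the full path to `out` and returns the index of
 * the directory that matched. Returns -1 if none has it; the contents
 * of `out` are unspecified in that case.
 */
static int find_in_dirs(const char *name, const char *const *dirs,
                        size_t ndirs, char *out, size_t outlen)
{
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outlen)
            continue;               /* path would be truncated; skip it */
        if (access(out, X_OK) == 0)
            return (int)i;
    }
    return -1;
}

/*
 * Hypothetical helper lookup: try /sbin/mount_<fstype>, then
 * /usr/sbin/mount_<fstype>. With verbose set, report on stderr which
 * path was chosen, or that nothing was found.
 */
static int find_mount_helper(const char *fstype, int verbose,
                             char *out, size_t outlen)
{
    static const char *const dirs[] = { "/sbin", "/usr/sbin" };
    char name[64];
    int n, idx;

    assert(fstype != NULL && strlen(fstype) > 0);

    n = snprintf(name, sizeof name, "mount_%s", fstype);
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;

    idx = find_in_dirs(name, dirs, 2, out, outlen);
    if (verbose) {
        if (idx >= 0)
            fprintf(stderr, "mount: using %s\n", out);
        else
            fprintf(stderr, "mount: no helper found for %s\n", fstype);
    }
    return idx;
}
```

Because the search is an ordered list, a helper left in /sbin would always win over a copy in /usr/sbin, and a verbose report makes it visible which of the two was actually picked.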
> If you do go ahead with your plan, make sure to inform the mount > program of the additional directory to look in, and (maybe) add > a "verbose" option, so that it can tell you where it found the > thing that it was exec'ing in order to do the mount, since if > the files are sprinkled all over the directory hierarchy like > pixie-dust, it's going to be difficult to track them down. The directory search is already there. As for adding the debugging, that could be done trivially. -gordon --FqExhoTtZ2UFWcPs Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+2RUERu2t9DV9ZfsRAmWMAJ9dAELF++N+o2tIzyUFyykdGK+kQgCfU61z WKIzbaUzvBuws6TmHzuq5FQ= =uuLv -----END PGP SIGNATURE----- --FqExhoTtZ2UFWcPs-- From owner-freebsd-arch@FreeBSD.ORG Sat May 31 15:10:14 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1EDE037B404 for ; Sat, 31 May 2003 15:10:14 -0700 (PDT) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id DEB0543F75 for ; Sat, 31 May 2003 15:10:12 -0700 (PDT) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])h4VMAAp9054301; Sun, 1 Jun 2003 08:10:10 +1000 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.8/8.12.8/Submit) id h4VM9wL2054297; Sun, 1 Jun 2003 08:09:58 +1000 (EST) Date: Sun, 1 Jun 2003 08:09:57 +1000 From: Peter Jeremy To: "David O'Brien" , Gordon Tetlow , arch@FreeBSD.org Message-ID: <20030531220957.GA54163@cirb503493.alcatel.com.au> References: <20030531193849.GR87863@roark.gnf.org> <20030531202221.GA22056@dragon.nuxi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<20030531202221.GA22056@dragon.nuxi.com> User-Agent: Mutt/1.4.1i Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 22:10:14 -0000 On Sat, May 31, 2003 at 01:22:21PM -0700, David O'Brien wrote: >On Sat, May 31, 2003 at 12:38:49PM -0700, Gordon Tetlow wrote: >> To cut down on the size of a dynamically-linked root, I'd like to >> repo-copy the following utilities from src/sbin to src/usr.sbin: >> >> mount_portalfs >> mount_nwfs >> mount_smbfs >> natd >> ipnat >> >> Does anyone have any objections? > >yes to natd. David, would you like to go into a bit more detail please. The traditional justification for an object to be in the root partition is that it is required to either allow the system to boot to the point where /usr is mounted, or to restore the remaining filesystems (including /usr) from a backup. IMHO, it's reasonable to assume/require that /usr be a 'native' filesystem - so MS-DOS, NTFS, Netware and SMB are not needed - though a case could be made for requiring Netware and/or SMB to allow for a situation where backups are made to a Netware or SMB server. I can't foresee any requirement for portals before /usr is mounted. NAT is normally used at boundaries between different privilege zones (though this isn't its only use) and it would seem unusual to mount /usr from a different privilege zone to the local system. Normally, natd is started before ipfw rules are loaded, but I don't believe there is a requirement for a process to be bound to a divert socket before diversion rules are added. If Gordon is looking for programs to move from /sbin to /usr/sbin, mount_msdos, mount_ntfs, mountd, nfsd and maybe ipfstat all seem candidates. The first two are covered above. 
IMHO, there's no point in a machine becoming an NFS server before it has
/usr mounted - which covers the next two. Finally, ipfstat is not needed
to configure IPFilter - just to monitor it.

Peter

From owner-freebsd-arch@FreeBSD.ORG Sat May 31 15:28:07 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CA87837B401 for ; Sat, 31 May 2003 15:28:07 -0700 (PDT) Received: from dragon.nuxi.com (trang.nuxi.com [66.93.134.19]) by mx1.FreeBSD.org (Postfix) with ESMTP id E228043F75 for ; Sat, 31 May 2003 15:28:06 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: from dragon.nuxi.com (obrien@localhost [127.0.0.1]) by dragon.nuxi.com (8.12.9/8.12.9) with ESMTP id h4VMRpVm023440; Sat, 31 May 2003 15:27:55 -0700 (PDT) (envelope-from obrien@dragon.nuxi.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.12.9/8.12.9/Submit) id h4VMRlba023439; Sat, 31 May 2003 15:27:47 -0700 (PDT) Date: Sat, 31 May 2003 15:27:47 -0700 From: "David O'Brien" To: Peter Jeremy Message-ID: <20030531222747.GA23373@dragon.nuxi.com> Mail-Followup-To: David O'Brien , Peter Jeremy , arch@FreeBSD.org References: <20030531193849.GR87863@roark.gnf.org> <20030531202221.GA22056@dragon.nuxi.com> <20030531220957.GA54163@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030531220957.GA54163@cirb503493.alcatel.com.au> User-Agent: Mutt/1.4i X-Operating-System: FreeBSD 5.1-BETA Organization: The NUXI BSD Group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 cc: arch@FreeBSD.org Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: arch@FreeBSD.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe:
, X-List-Received-Date: Sat, 31 May 2003 22:28:08 -0000 On Sun, Jun 01, 2003 at 08:09:57AM +1000, Peter Jeremy wrote: > On Sat, May 31, 2003 at 01:22:21PM -0700, David O'Brien wrote: > >On Sat, May 31, 2003 at 12:38:49PM -0700, Gordon Tetlow wrote: > >> To cut down on the size of a dynamically-linked root, I'd like to > >> repo-copy the following utilities from src/sbin to src/usr.sbin: > >> > >> mount_portalfs > >> mount_nwfs > >> mount_smbfs > >> natd > >> ipnat > >> > >> Does anyone have any objections? > > > >yes to natd. > > David, would you like to go into a bit more detail please. ... > NAT is normally used at boundaries between different privilege zones > (though this isn't its only use) and it would seem unusual to mount > /usr from a different privilege zone to the local system. Normally, > natd is started before ipfw rules are loaded, but I don't believe > there is a requirement for a process to be bound to a divert socket > before diversion rules are added. Not really. Just to say that as a user of natd and one that knows how fragile ipfw & natd are to passing packets I don't want to disturb things. I want to see some people (other than me) experiment with this the natd issue before it is moved. > IMHO, it's reasonable to assume/require that /usr be a 'native' > filesystem - so MS-DOS, NTFS, Netware and SMB are not needed - though ... > If Gordon is looking for programs to move from /sbin to /usr/sbin, > mount_msdos, mount_ntfs, mountd, nfsd and maybe ipfstat all seem > candidates. The first two are covered above. IMHO, there's no point > a machine becomming a NFS server before it has /usr mounted - which > covers the next two. Finally, ipfstat is not needed to configure > IPFilter - just monitor it. Native also covers NFS mounted /usr and UFS /, and Gordon didn't mention that he had carefully looked at /etc/rc.d/* and the implications of moving things. 
--
-- David (obrien@FreeBSD.org)

From owner-freebsd-arch@FreeBSD.ORG Sat May 31 15:28:28 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D022B37B404 for ; Sat, 31 May 2003 15:28:28 -0700 (PDT) Received: from rwcrmhc11.attbi.com (rwcrmhc11.attbi.com [204.127.198.35]) by mx1.FreeBSD.org (Postfix) with ESMTP id 143A343F85 for ; Sat, 31 May 2003 15:28:28 -0700 (PDT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([12.232.168.4]) by attbi.com (rwcrmhc11) with ESMTP id <2003053122282701300sls0qe>; Sat, 31 May 2003 22:28:27 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id PAA67183; Sat, 31 May 2003 15:28:26 -0700 (PDT) Date: Sat, 31 May 2003 15:28:22 -0700 (PDT) From: Julian Elischer To: arch@FreeBSD.org In-Reply-To: <20030531202221.GA22056@dragon.nuxi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 22:28:29 -0000

On Sat, 31 May 2003, David O'Brien wrote:

> On Sat, May 31, 2003 at 12:38:49PM -0700, Gordon Tetlow wrote:
> > To cut down on the size of a dynamically-linked root, I'd like to
> > repo-copy the following utilities from src/sbin to src/usr.sbin:
> >
> > mount_portalfs
> > mount_nwfs
> > mount_smbfs
> > natd
> > ipnat
> >
> > Does anyone have any objections?

It would make it hard to mount an smbfs /usr, right?

I think it goes against POLA to move mount subtypes away from where they
are.

>
> yes to natd.
> > -- > -- David (obrien@FreeBSD.org) > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Sat May 31 15:50:41 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DCEF637B401 for ; Sat, 31 May 2003 15:50:41 -0700 (PDT) Received: from ns1.gnf.org (ns1.gnf.org [63.196.132.67]) by mx1.FreeBSD.org (Postfix) with ESMTP id E5E7043F85 for ; Sat, 31 May 2003 15:50:40 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from EXCHCLUSTER01.lj.gnf.org (exch01.lj.gnf.org [172.25.10.19]) by ns1.gnf.org (8.12.8p1/8.12.8) with ESMTP id h4VMoctF085891 for ; Sat, 31 May 2003 15:50:38 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from roark.gnf.org ([172.25.24.15]) by EXCHCLUSTER01.lj.gnf.org with Microsoft SMTPSVC(5.0.2195.5329); Sat, 31 May 2003 15:50:40 -0700 Received: from roark.gnf.org (localhost [127.0.0.1]) by roark.gnf.org (8.12.9/8.12.9) with ESMTP id h4VMoejX096871; Sat, 31 May 2003 15:50:40 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: (from gtetlow@localhost) by roark.gnf.org (8.12.9/8.12.9/Submit) id h4VMoeRx096870; Sat, 31 May 2003 15:50:40 -0700 (PDT) (envelope-from gtetlow) Date: Sat, 31 May 2003 15:50:40 -0700 From: Gordon Tetlow To: Julian Elischer Message-ID: <20030531225040.GV87863@roark.gnf.org> References: <20030531202221.GA22056@dragon.nuxi.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="AG41CdL1ZWVzkk/P" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email 
(SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . X-OriginalArrivalTime: 31 May 2003 22:50:40.0653 (UTC) FILETIME=[0F6183D0:01C327C7] cc: arch@FreeBSD.org Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 22:50:42 -0000

--AG41CdL1ZWVzkk/P
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, May 31, 2003 at 03:28:22PM -0700, Julian Elischer wrote:
>
>
> On Sat, 31 May 2003, David O'Brien wrote:
>
> > On Sat, May 31, 2003 at 12:38:49PM -0700, Gordon Tetlow wrote:
> > > To cut down on the size of a dynamically-linked root, I'd like to
> > > repo-copy the following utilities from src/sbin to src/usr.sbin:
> > >
> > > mount_portalfs
> > > mount_nwfs
> > > mount_smbfs
> > > natd
> > > ipnat
> > >
> > > Does anyone have any objections?
> It would make it hard to mount an smbfs /usr, right?
>
> I think it goes against POLA to move mount subtypes away from where they
> are.

mount_smbfs is dynamically linked currently. You can't use it to mount
/usr even if you wanted to. No POLA will be broken by moving it. And if
you are using nwfs or portalfs for /usr, may $DEITY have pity on your
soul.
-gordon --AG41CdL1ZWVzkk/P Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+2THARu2t9DV9ZfsRAmZXAJ9rTCv71bXDSZtg3sNOhyZri4Bz+wCfWp3S Av8UWFebay2hIZmSg0BJwYQ= =MKgV -----END PGP SIGNATURE----- --AG41CdL1ZWVzkk/P-- From owner-freebsd-arch@FreeBSD.ORG Sat May 31 16:16:55 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E2DEC37B401 for ; Sat, 31 May 2003 16:16:55 -0700 (PDT) Received: from ns1.gnf.org (ns1.gnf.org [63.196.132.67]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0521443F75 for ; Sat, 31 May 2003 16:16:55 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from EXCHCLUSTER01.lj.gnf.org (exch01.lj.gnf.org [172.25.10.19]) by ns1.gnf.org (8.12.8p1/8.12.8) with ESMTP id h4VNGqtF085964 for ; Sat, 31 May 2003 16:16:52 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: from roark.gnf.org ([172.25.24.15]) by EXCHCLUSTER01.lj.gnf.org with Microsoft SMTPSVC(5.0.2195.5329); Sat, 31 May 2003 16:16:54 -0700 Received: from roark.gnf.org (localhost [127.0.0.1]) by roark.gnf.org (8.12.9/8.12.9) with ESMTP id h4VNGsjX097093; Sat, 31 May 2003 16:16:54 -0700 (PDT) (envelope-from gtetlow@gnf.org) Received: (from gtetlow@localhost) by roark.gnf.org (8.12.9/8.12.9/Submit) id h4VNGr9v097092; Sat, 31 May 2003 16:16:53 -0700 (PDT) (envelope-from gtetlow) Date: Sat, 31 May 2003 16:16:53 -0700 From: Gordon Tetlow To: Peter Jeremy , arch@FreeBSD.org Message-ID: <20030531231653.GW87863@roark.gnf.org> References: <20030531193849.GR87863@roark.gnf.org> <20030531202221.GA22056@dragon.nuxi.com> <20030531220957.GA54163@cirb503493.alcatel.com.au> <20030531222747.GA23373@dragon.nuxi.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="KCLoHzx0Ylaw/v4x" Content-Disposition: inline In-Reply-To: 
<20030531222747.GA23373@dragon.nuxi.com> User-Agent: Mutt/1.4i X-Habeas-SWE-1: winter into spring X-Habeas-SWE-2: brightly anticipated X-Habeas-SWE-3: like Habeas SWE (tm) X-Habeas-SWE-4: Copyright 2002 Habeas (tm) X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this X-Habeas-SWE-6: email in exchange for a license for this Habeas X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this X-Habeas-SWE-9: mark in spam to . X-OriginalArrivalTime: 31 May 2003 23:16:54.0537 (UTC) FILETIME=[B97D2790:01C327CA] Subject: Re: Moving some items out of src/sbin to src/usr.sbin X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 May 2003 23:16:56 -0000 --KCLoHzx0Ylaw/v4x Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, May 31, 2003 at 03:27:47PM -0700, David O'Brien wrote: > On Sun, Jun 01, 2003 at 08:09:57AM +1000, Peter Jeremy wrote: > > On Sat, May 31, 2003 at 01:22:21PM -0700, David O'Brien wrote: > > >On Sat, May 31, 2003 at 12:38:49PM -0700, Gordon Tetlow wrote: > > >> To cut down on the size of a dynamically-linked root, I'd like to > > >> repo-copy the following utilities from src/sbin to src/usr.sbin: > > >>=20 > > >> mount_portalfs > > >> mount_nwfs > > >> mount_smbfs > > >> natd > > >> ipnat > > >>=20 > > >> Does anyone have any objections? > > > > > >yes to natd. > >=20 > > David, would you like to go into a bit more detail please. > ... > > NAT is normally used at boundaries between different privilege zones > > (though this isn't its only use) and it would seem unusual to mount > > /usr from a different privilege zone to the local system. 
Normally,
> > natd is started before ipfw rules are loaded, but I don't believe
> > there is a requirement for a process to be bound to a divert socket
> > before diversion rules are added.
>
> Not really. Just to say that as a user of natd and one that knows how
> fragile ipfw & natd are to passing packets I don't want to disturb things.
> I want to see some people (other than me) experiment with this natd
> issue before it is moved.

I agree testing needs to take place. I'm doing this sweep from the point
of view of wanting to minimize the libraries that need to be in /lib.
libalias (whose only consumer is natd) is only 48k. I can live with it,
but I was just trying to clean up some low-hanging fruit.

> > IMHO, it's reasonable to assume/require that /usr be a 'native'
> > filesystem - so MS-DOS, NTFS, Netware and SMB are not needed - though
> ...
> > If Gordon is looking for programs to move from /sbin to /usr/sbin,
> > mount_msdos, mount_ntfs, mountd, nfsd and maybe ipfstat all seem
> > candidates. The first two are covered above. IMHO, there's no point
> > a machine becoming an NFS server before it has /usr mounted - which
> > covers the next two. Finally, ipfstat is not needed to configure
> > IPFilter - just monitor it.
>
> Native also covers NFS mounted /usr and UFS /, and Gordon didn't mention
> that he had carefully looked at /etc/rc.d/* and the implications of
> moving things.

I didn't look too carefully at natd and ipnat, but I did look at the
implications of moving the various mount_* providers. The way the current
boot scripts work is that they mount local-type filesystems first (read:
anything that is not NFS, SMBFS, or PORTALFS; it's a bug that NWFS isn't
in this list). Then they mount NFS filesystems. Finally, all other
network-type filesystems are mounted. As such, if you have /usr mounted
via NFS (and only NFS), your other network filesystems will mount just
fine. This is the reason I didn't move things like mount_msdosfs and
other local-type filesystems.
They will be mounted before any network filesystem (including an NFS
/usr) has a chance to be mounted.

-gordon

--KCLoHzx0Ylaw/v4x
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+2TflRu2t9DV9ZfsRAqI/AJ42zB1UpfGbDYjipItNDWVHiVdc2gCgiS5a
+A360RjNf3MvUkoyG3l8Fv0=
=ZXYi
-----END PGP SIGNATURE-----

--KCLoHzx0Ylaw/v4x--
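
The three-pass mount order Gordon describes in the message above can be sketched roughly as follows. This is a toy model in Python, not the actual /etc/rc.d logic: the names NETWORK_TYPES, mount_pass and helper_may_live_in_usr are made up for illustration, and the set of network types mirrors only the list given in the message (NFS, SMBFS, PORTALFS, with NWFS missing).

```python
# Toy model of the boot-time mount order described in the message.
# All names here are hypothetical; this is not the real rc.d code.

# Types the boot scripts treat as "network" according to the message.
# NWFS is absent from that list, which the message calls a bug.
NETWORK_TYPES = {"nfs", "smbfs", "portalfs"}


def mount_pass(fstype):
    """Return the pass (1, 2 or 3) in which a filesystem type is mounted."""
    if fstype not in NETWORK_TYPES:
        return 1  # local-type filesystems are mounted first
    if fstype == "nfs":
        return 2  # NFS filesystems are mounted second
    return 3      # all other network-type filesystems come last


def helper_may_live_in_usr(fstype, usr_fstype):
    """A mount_* helper is safely reachable under /usr only if /usr
    itself is mounted in a strictly earlier pass than the filesystem
    the helper is needed for (a conservative assumption of this model)."""
    return mount_pass(usr_fstype) < mount_pass(fstype)


if __name__ == "__main__":
    for fstype in ("msdosfs", "nfs", "smbfs", "portalfs"):
        print(fstype, "pass", mount_pass(fstype),
              "helper ok in NFS /usr:", helper_may_live_in_usr(fstype, "nfs"))
```

Under these assumptions, a helper such as mount_smbfs is still reachable when /usr comes from NFS, while helpers for local-type filesystems are needed before any network /usr exists, which matches the reasoning given for leaving mount_msdosfs in /sbin.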