Date: Thu, 6 Nov 2003 13:31:38 +0300 (MSK)
From: Igor Sysoev
To: Mike Silbersack
Cc: freebsd-hackers@freebsd.org, Vivek Pai, Alan Cox
Subject: Re: Update: Debox sendfile modifications
List-Id: Technical Discussions relating to FreeBSD

On Wed, 5 Nov 2003, Igor Sysoev wrote:

> On Wed, 5 Nov 2003, Mike Silbersack wrote:
>
> > On Wed, 5 Nov 2003, Vivek Pai wrote:
> >
> > > If you were to have sendfile issue the disk reads, how would you signal
> > > completion? I guess one approach is to make the socket buffer appear to
> > > have no space while the sendfile-initiated read is in progress, but
> > > it seems to me that such an approach would be considered too ugly. It
> > > would cause the least modification to applications, because otherwise
> > > apps need to disable interest on the socket having space, and re-enable
> > > it after getting notified that the sendfile-initiated read (and
> > > transfer) completed. Am I missing something?
> > >
> > > -Vivek
> >
> > I'm not quite certain how I would do it yet. At this point in time I'm
> > just brainstorming. I have some other things I'd like to work on in the
> > next few weeks, I'll sit down and think about this more in late November /
> > early December if I'm still in the right mindset.
>
> I think it can be done in the following way: a socket should have a flag
> that says that sendfile() has started reading a page.
> select()/poll()/kevent() should check this flag before checking
> the socket buffer space. When the page has been read, this flag is reset.
> If there was an error while reading the page, then a second flag should be
> set and the first one should be reset. sendfile() should check the error
> flag before processing. If it is set, then sendfile() should do a blocking
> read() to learn errno. I think that this blocking read() would not occur
> under normal conditions and would not decrease performance. And if we have
> file errors, we should think not about performance but about the
> correctness of the whole server.
>
> I think it would be transparent to existing user applications
> that use select()/etc.

Here is a clearer (I hope) description of the above method.

Each socket buffer has two flags, SB_SFBUSY and SB_SFERR.
When sendfile() needs to read a file page, it sets SB_SFBUSY in
so->so_snd.sb_flags and initiates the read, either by starting a kthread
(probably easy to program, but not optimal) or by queueing an async disk
operation. Then sendfile() returns EWOULDBLOCK.

The application calls select()/poll()/kevent() to learn when the socket
is ready for writing. select()/etc. sees SB_SFBUSY set and decides that
the socket is not ready. I think this is correct behaviour, because I do
not think there is an application that wants to writev() after sendfile()
returns EWOULDBLOCK.

When the read has completed, the kthread or the aio completion procedure
clears the SB_SFBUSY flag. If there was an error during the read, it also
sets SB_SFERR.
Then it calls wakeup. select()/etc. sees that SB_SFBUSY is clear, checks
the buffer space, and reports readiness to the application.

The application calls sendfile() again. If sendfile() sees SB_SFERR, it
does a blocking read to learn the error code. We could save the error
code when the read fails instead, but I do not see a place to save it.
It certainly should not be a new field in the socket structures, because
that would increase their size.

Alternatively, we can save it in so->so_error (I think it would most
likely be EIO, so it cannot be confused with other socket errors). Then
kevent() can report it and sendfile() can see it too. In this case we do
not need the SB_SFERR flag.

Igor Sysoev
http://sysoev/ru/en/
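
The SB_SFBUSY / SB_SFERR protocol described in this message can be
modelled in userspace as the small sketch below. It is not kernel code
and not a patch: struct model_sockbuf, the model_* functions, and the
flag bit values are all hypothetical names chosen for illustration, and
only the flag names themselves come from the message. The blocking
read() that would fetch the real errno is replaced by a fixed EIO, and
the reply to a sendfile() call made while a read is still in flight is
an assumption, since the message does not cover that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names are from the message; the bit values are made up here. */
#define SB_SFBUSY 0x01 /* a sendfile-initiated page read is in flight */
#define SB_SFERR  0x02 /* that read failed */

/* Hypothetical stand-in for the send buffer (so->so_snd). */
struct model_sockbuf {
    int    sb_flags;
    size_t sb_space; /* free bytes in the send buffer */
};

/* What select()/poll()/kevent() would test: busy flag first, then space. */
static bool model_writable(const struct model_sockbuf *sb)
{
    if (sb->sb_flags & SB_SFBUSY)
        return false;
    return sb->sb_space > 0;
}

/*
 * Model of the sendfile() entry path. page_resident says whether the
 * file page is already in memory. Returns 0 or an errno value.
 */
static int model_sendfile(struct model_sockbuf *sb, bool page_resident)
{
    if (sb->sb_flags & SB_SFERR) {
        /* The message proposes a blocking read() here to learn the
         * real errno; this model simply reports EIO. */
        sb->sb_flags &= ~SB_SFERR;
        return EIO;
    }
    if (sb->sb_flags & SB_SFBUSY)
        return EWOULDBLOCK; /* assumption: not covered in the message */
    if (!page_resident) {
        /* The kernel would start a kthread or queue async I/O here. */
        sb->sb_flags |= SB_SFBUSY;
        return EWOULDBLOCK;
    }
    if (sb->sb_space == 0)
        return EWOULDBLOCK; /* ordinary "send buffer full" case */
    return 0;
}

/* Model of the read completion (kthread or aio completion procedure). */
static void model_read_done(struct model_sockbuf *sb, bool failed)
{
    assert(sb->sb_flags & SB_SFBUSY);
    sb->sb_flags &= ~SB_SFBUSY;
    if (failed)
        sb->sb_flags |= SB_SFERR;
    /* The kernel would call wakeup here. */
}
```

In this model an application sees the socket as not writable between
model_sendfile() returning EWOULDBLOCK and model_read_done(), and sees
the error only on its next model_sendfile() call. With the so->so_error
variant, SB_SFERR would go away and the completion routine would store
the error code directly.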