From: Alfred Perlstein
Date: Sun, 31 Aug 2014 13:17:38 -0700
To: freebsd-arch@freebsd.org, Gleb Smirnoff
Subject: Re: [CFT/review] new sendfile(2)
Message-ID: <540382E2.3040004@freebsd.org>
References: <20140529102054.GX50679@FreeBSD.org> <20140729232404.GF43962@funkthat.com> <20140831165022.GE7693@FreeBSD.org>
In-Reply-To: <20140831165022.GE7693@FreeBSD.org>

On 8/31/14 9:50 AM, Gleb Smirnoff wrote:
> John-Mark,
>
> On Tue, Jul 29, 2014 at 04:24:04PM -0700, John-Mark Gurney wrote:
> J> Gleb Smirnoff wrote this message on Thu, May 29, 2014 at 14:20 +0400:
> J> > One of the approaches we are experimenting with is new sendfile(2)
> J> > implementation, that doesn't block on the I/O done from the file
> J> > descriptor.
> J>
> J> I know this is a reply to an old message, but...
>
> I am also sorry for late reply on late reply :)
>
> J> How is this different from:
> J> SF_NODISKIO. This flag causes any sendfile() call which would
> J> block on disk I/O to instead return EBUSY. Busy servers may
> J> benefit by transferring requests that would block to a separate I/O
> J> worker thread.
>
> It is very different. New sendfile(2) simply doesn't block and returns
> success :) The I/O completes outside of syscall context.
>
> J> > 1) Split of socket buffer sb_cc field into sb_acc and sb_ccc. Where
> J> > sb_acc stands for "available character count" and sb_ccc is "claimed
> J> > character count". This allows us to write a data to a socket, that is
> J> > not ready yet. The data sits in the socket, consumes its space, and
> J> > keeps itself in the right order with earlier or later writes to socket.
> J> > But it can be sent only after it is marked as ready. This change is
> J> > split across many files.
> J>
> J> This change really should be split out and possibly committed separately
> J> after a review by the proper people...
>
> Of course. It actually makes 80% of the volume of the patch.

This change has high value, although it is a lot of changes for what
appears to be an interesting edge case.

As I read this it really confused me. Can't this be accomplished by
using the socket's upcall callback and its pointer argument instead?

Basically, you would put all that accounting inside a struct hung off of
so->so_snd.sb_upcallarg and set a callback to do your queuing. That is
how you can asynchronously drive a thread to queue more data, in fact by
using aio to read/write to the socket from a stream.

It should be relatively simple; the only tricky part is that you'll need
to watch your locks and sleeps inside the so->so_snd.sb_upcall function.

Basically, move sb_acc and all of that into a special struct hung off of
so->so_snd.sb_upcallarg and leverage so->so_snd.sb_upcall to queue more
data as space becomes available.
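
To make the suggestion concrete, here is a rough userspace model of the idea, not kernel code and not anything from the patch. Everything in it is a hypothetical stand-in: `struct model_sockbuf` only imitates a send buffer with an upcall hook and an opaque argument (modelling sb_upcall and sb_upcallarg), and `struct pending_queue`, `pending_upcall`, `pending_enqueue` and `sb_drain` are made-up names. It assumes a single thread, so the locking and sleeping concerns mentioned above are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: hypothetical stand-ins, not the kernel's structures. */
#define SB_HIWAT  8     /* toy send-buffer capacity in bytes */
#define PENDQ_MAX 64    /* toy capacity of the private pending queue */

struct model_sockbuf {
    char   data[SB_HIWAT];
    size_t cc;                                        /* bytes currently buffered */
    int  (*upcall)(struct model_sockbuf *, void *);   /* models sb_upcall */
    void  *upcallarg;                                 /* models sb_upcallarg */
};

/* Private accounting hung off upcallarg instead of new sockbuf fields. */
struct pending_queue {
    char   data[PENDQ_MAX];
    size_t head;    /* next byte to hand to the send buffer */
    size_t tail;    /* one past the last queued byte */
};

static size_t sb_space(const struct model_sockbuf *sb)
{
    return SB_HIWAT - sb->cc;
}

/* Upcall: copy as much pending data as currently fits; returns bytes moved. */
static int pending_upcall(struct model_sockbuf *sb, void *arg)
{
    struct pending_queue *pq = arg;
    size_t avail, n;

    assert(pq->head <= pq->tail);
    avail = pq->tail - pq->head;
    n = sb_space(sb);
    if (n > avail)
        n = avail;
    memcpy(sb->data + sb->cc, pq->data + pq->head, n);
    sb->cc += n;
    pq->head += n;
    return (int)n;
}

/* Queue data that has become ready (e.g. its disk read finished). */
static size_t pending_enqueue(struct pending_queue *pq, const char *buf, size_t len)
{
    size_t room = PENDQ_MAX - pq->tail;

    if (len > room)
        len = room;
    memcpy(pq->data + pq->tail, buf, len);
    pq->tail += len;
    return len;     /* bytes actually accepted */
}

/* Models the protocol freeing n bytes of buffer space, then firing the upcall. */
static void sb_drain(struct model_sockbuf *sb, size_t n)
{
    if (n > sb->cc)
        n = sb->cc;
    memmove(sb->data, sb->data + n, sb->cc - n);
    sb->cc -= n;
    if (sb->upcall != NULL)
        sb->upcall(sb, sb->upcallarg);
}
```

In this model a caller would zero both structs, point `upcall` and `upcallarg` at `pending_upcall` and the queue, call `pending_enqueue()` when data becomes ready, and let `sb_drain()` trigger refills as space opens up. The sketch says nothing about ordering against other writers to the same socket.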

At least that's how I would have tried to accomplish this... but maybe
you went down this path and hit a non-starter?

-Alfred