From: Peter Holm
To: Gleb Smirnoff
Cc: arch@freebsd.org
Date: Fri, 30 May 2014 10:51:37 +0200
Subject: Re: [CFT/review] new sendfile(2)
Message-ID: <20140530085137.GA11895@x2.osted.lan>
In-Reply-To: <20140529102054.GX50679@FreeBSD.org>
List-Id: Discussion related to FreeBSD architecture

On Thu, May 29, 2014 at 02:20:54PM +0400, Gleb Smirnoff wrote:
> Hello!
>
> At Netflix and Nginx we are experimenting with improving FreeBSD
> wrt sending large amounts of static data via HTTP.
>
> One of the approaches we are experimenting with is a new sendfile(2)
> implementation that doesn't block on the I/O done from the file
> descriptor.
>
> The problem with classic sendfile(2) is that if the request
> length is large enough, and file data is not cached in VM, then
> the sendfile(2) syscall does not return until it fills the socket
> buffer with data. On the modern Internet, socket buffers can be up
> to 1 MB, so the time taken by the syscall rises by an order of
> magnitude. All that time, the nginx worker is blocked in the syscall
> and doesn't process data from other clients. The best current
> practice to mitigate that is known as "sendfile(2) + aio_read(2)".
> This is a special mode of nginx operation on FreeBSD. The sendfile(2)
> call is issued with the SF_NODISKIO flag, which forbids the syscall
> from performing disk I/O, so it sends only data that is cached by VM.
> If sendfile(2) reports that I/O needs to be done (but is forbidden),
> then nginx does an aio_read() of a chunk of the file. The data read
> is cached by VM as a side effect. Then sendfile() is called again.
>
> Now for the new sendfile. The core idea is that sendfile()
> schedules the I/O, but doesn't wait for it to complete. It
> returns immediately to the process, and I/O completion is
> processed in kernel context. Unlike aio(4), no additional
> threads in kernel are created. The new sendfile is a drop-in
> replacement for the old one. Applications (like nginx) need
> neither recompilation nor configuration changes. The SF_NODISKIO
> flag is ignored.
>
> The patch for review is available at:
>
> https://phabric.freebsd.org/D102
>
> And for those who prefer email attachments, it is also attached.
> The patch has 3 logically separate changes in itself:
>
> 1) Split of the socket buffer sb_cc field into sb_acc and sb_ccc, where
> sb_acc stands for "available character count" and sb_ccc is "claimed
> character count". This allows us to write data that is not ready yet
> to a socket.
> The data sits in the socket, consumes its space, and
> keeps itself in the right order with earlier or later writes to the
> socket. But it can be sent only after it is marked as ready. This
> change is split across many files.
>
> 2) A new vnode operation: VOP_GETPAGES_ASYNC(). This one lives in sys/vm.
>
> 3) Actual implementation of the new sendfile(2). This one lives in
> kern/uipc_syscalls.c
>
> At Netflix, we already see improvements with the new sendfile(2).
> We can send more data utilizing the same amount of CPU, and we can
> push closer to 0% idle without experiencing short lags.
>
> However, we have a somewhat modified VM subsystem, which behaves
> optimally for our task, but suboptimally for an average FreeBSD system.
> I'd like someone from the community to try the new sendfile(2) on
> another setup and see how it works for you.
>
> To be an early tester you need to check out the projects/sendfile
> branch and build a kernel from it. The world from head/ would
> run fine with it.
>
> svn co http://svn.freebsd.org/base/projects/sendfile
> cd sendfile
> ... build kernel ...
>
> Limitations:
> - No testing was done on serving files on NFS.
> - No testing was done on serving files on ZFS.
>

I got this:

panic: sbready: sb 0xfffff801834219e8 NULL fnrdy

http://people.freebsd.org/~pho/stress/log/gleb007.txt

- Peter
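
For readers following the thread, the sb_acc/sb_ccc split described in
item 1 of the quoted proposal can be pictured with a small user-space toy.
This is only a sketch under stated assumptions, not code from the patch:
the struct toy_sockbuf, toy_append() and toy_ready() names are hypothetical,
and the rule that only the leading run of ready data counts as "available"
is inferred from the statement that data keeps its order and can be sent
only after it is marked as ready.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only. This is NOT the kernel's struct sockbuf. */
#define TOY_MAXCHUNKS 8

struct toy_chunk {
	size_t len;	/* bytes this chunk occupies in the buffer */
	int ready;	/* 0 while its I/O is still in flight */
};

struct toy_sockbuf {
	struct toy_chunk chunks[TOY_MAXCHUNKS];
	size_t nchunks;
	size_t acc;	/* "available": ready bytes at the head, sendable now */
	size_t ccc;	/* "claimed": all queued bytes, ready or not */
};

/* Recompute acc as the length of the leading run of ready chunks. */
static void
toy_recount(struct toy_sockbuf *sb)
{
	size_t i;

	sb->acc = 0;
	for (i = 0; i < sb->nchunks && sb->chunks[i].ready; i++)
		sb->acc += sb->chunks[i].len;
}

/* Queue a chunk; returns its index, or -1 if the toy buffer is full. */
static int
toy_append(struct toy_sockbuf *sb, size_t len, int ready)
{
	if (sb->nchunks == TOY_MAXCHUNKS)
		return (-1);
	sb->chunks[sb->nchunks].len = len;
	sb->chunks[sb->nchunks].ready = ready;
	sb->nchunks++;
	sb->ccc += len;		/* space is claimed immediately */
	toy_recount(sb);
	return ((int)(sb->nchunks - 1));
}

/* Mark a not-ready chunk as ready; returns -1 if there is no such chunk. */
static int
toy_ready(struct toy_sockbuf *sb, int idx)
{
	if (idx < 0 || (size_t)idx >= sb->nchunks || sb->chunks[idx].ready)
		return (-1);
	sb->chunks[idx].ready = 1;
	toy_recount(sb);
	return (0);
}
```

In this toy, queuing a ready 100-byte chunk, then a not-ready 4096-byte
chunk, then a ready 50-byte chunk claims 4246 bytes while only the first
100 are available; marking the middle chunk ready makes all 4246 available
at once, which preserves write order.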