Date: Fri, 30 May 2014 10:51:37 +0200
From: Peter Holm <peter@holm.cc>
To: Gleb Smirnoff <glebius@freebsd.org>
Cc: arch@freebsd.org
Subject: Re: [CFT/review] new sendfile(2)
Message-ID: <20140530085137.GA11895@x2.osted.lan>
In-Reply-To: <20140529102054.GX50679@FreeBSD.org>
References: <20140529102054.GX50679@FreeBSD.org>
On Thu, May 29, 2014 at 02:20:54PM +0400, Gleb Smirnoff wrote:
> Hello!
>
> At Netflix and Nginx we are experimenting with improving FreeBSD
> with respect to sending large amounts of static data via HTTP.
>
> One of the approaches we are experimenting with is a new sendfile(2)
> implementation that doesn't block on the I/O done from the file
> descriptor.
>
> The problem with the classic sendfile(2) is that if the request
> length is large enough and the file data is not cached in VM, the
> sendfile(2) syscall does not return until it has filled the socket
> buffer with data. On the modern Internet socket buffers can be up to
> 1 MB, so the time taken by the syscall rises by an order of magnitude.
> All that time the nginx worker is blocked in the syscall and doesn't
> process data from other clients. The current best practice to
> mitigate this is known as "sendfile(2) + aio_read(2)". This is a
> special mode of nginx operation on FreeBSD. The sendfile(2) call
> is issued with the SF_NODISKIO flag, which forbids the syscall to
> perform disk I/O, so it sends only data that is cached by VM. If
> sendfile(2) reports that I/O needs to be done (but is forbidden),
> nginx does an aio_read() of a chunk of the file. As a side effect,
> the data read is cached by VM. Then sendfile() is called again.
>
> Now for the new sendfile. The core idea is that sendfile()
> schedules the I/O but doesn't wait for it to complete. It
> returns immediately to the process, and I/O completion is
> processed in kernel context. Unlike aio(4), no additional
> kernel threads are created. The new sendfile is a drop-in
> replacement for the old one: applications (like nginx) need
> neither recompilation nor configuration changes. The SF_NODISKIO
> flag is ignored.
>
> The patch for review is available at:
>
> https://phabric.freebsd.org/D102
>
> And for those who prefer email attachments, it is also attached.
> The patch contains 3 logically separate changes:
>
> 1) A split of the socket buffer sb_cc field into sb_acc and sb_ccc,
> where sb_acc stands for "available character count" and sb_ccc for
> "claimed character count". This allows us to write data to a socket
> that is not ready yet. The data sits in the socket, consumes its space,
> and keeps its correct order relative to earlier and later writes to the
> socket, but it can be sent only after it is marked as ready. This change
> is spread across many files.
>
> 2) A new vnode operation: VOP_GETPAGES_ASYNC(). This one lives in sys/vm.
>
> 3) The actual implementation of the new sendfile(2). This one lives in
> kern/uipc_syscalls.c
>
> At Netflix, we already see improvements with the new sendfile(2).
> We can send more data using the same amount of CPU, and we can
> push closer to 0% idle without experiencing short lags.
>
> However, we have a somewhat modified VM subsystem that behaves
> optimally for our task but suboptimally for an average FreeBSD system.
> I'd like someone from the community to try the new sendfile(2) on
> another setup and see how it serves you.
>
> To be an early tester you need to check out the projects/sendfile
> branch and build a kernel from it. A world built from head/ will
> run fine with it.
>
> svn co http://svn.freebsd.org/base/projects/sendfile
> cd sendfile
> ... build kernel ...
>
> Limitations:
> - No testing was done on serving files from NFS.
> - No testing was done on serving files from ZFS.
>

I got this:

panic: sbready: sb 0xfffff801834219e8 NULL fnrdy

http://people.freebsd.org/~pho/stress/log/gleb007.txt

- Peter