From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 17 22:26:53 2009
Message-ID: <4AB2B7A1.5000601@delphij.net>
Date: Thu, 17 Sep 2009 15:26:41 -0700
From: Xin LI
Organization: The FreeBSD Project
User-Agent: Thunderbird 2.0.0.22 (X11/20090803)
MIME-Version: 1.0
To: Igor Sysoev
References: <20090917101526.GF57619@rambler-co.ru>
In-Reply-To: <20090917101526.GF57619@rambler-co.ru>
OpenPGP: id=18EDEBA0; url=http://www.delphij.net/delphij.asc
Content-Type: multipart/mixed;
 boundary="------------020507030401020101040202"
Cc: freebsd-hackers@freebsd.org
Subject: Re: fcntl(F_RDAHEAD)
Reply-To: d@delphij.net
List-Id: Technical Discussions relating to FreeBSD

This is a multi-part message in MIME format.
--------------020507030401020101040202
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Igor,

Igor Sysoev wrote:
> Hi,
>
> nginx-0.8.15 can use completely non-blocking sendfile() using the
> SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read()
> to read a single byte. The first aio_read() preloads the first 128K part
> of a file in the VM cache; however, all successive aio_read()s preload
> just 16K parts of the file. This makes non-blocking sendfile() usage
> ineffective for files larger than 128K.
>
> I've created a small patch for a Darwin-compatible F_RDAHEAD fcntl:
>
>     fcntl(fd, F_RDAHEAD, preload_size)
>
> There is a small incompatibility: Darwin's fcntl only allows enabling or
> disabling read ahead, while the proposed patch allows setting an exact
> preload size.
>
> Currently the preload size affects the vn_read() code path only and does
> not affect the sendfile() code path. However, it can easily be extended
> to the sendfile() part too. The preload size is still limited by the
> sysctl vfs.read_max.
>
> The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.

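For illustration only, the application-side call described above might
look like the sketch below. It assumes F_RDAHEAD is visible through
<fcntl.h> on the target system; set_preload_size() is a hypothetical
helper name, not something taken from nginx or from the patch. On
platforms without F_RDAHEAD it simply reports ENOTSUP.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper (name is an assumption, not from nginx or the
 * patch): ask the kernel to preload up to preload_size bytes on
 * sequential reads of fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
set_preload_size(int fd, int preload_size)
{
#ifdef F_RDAHEAD
	/* Darwin treats the argument as on/off; the proposed patch as a size. */
	return (fcntl(fd, F_RDAHEAD, preload_size) == -1 ? -1 : 0);
#else
	/* No F_RDAHEAD on this platform: report that instead of guessing. */
	(void)fd;
	(void)preload_size;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

A caller would presumably invoke set_preload_size(fd, 128 * 1024) once
after open(); passing 0 is meant to turn read ahead off, which matches
Darwin's on/off behaviour.
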
I have ported this to -HEAD (it should also apply to 8.0-R, but it is too
late for us to add a new feature there) and added a manual page entry
documenting the feature.

I've used F_READAHEAD as the name, but after reading Darwin's manual
page, could we just use F_RDAHEAD instead? Darwin seems to distinguish
only the 0 and != 0 cases, so programmers would not have to use #ifdef
or something else to get code working on different platforms.

Cheers,
- -- 
Xin LI	http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (FreeBSD)

iEYEARECAAYFAkqyt40ACgkQi+vbBBjt66AdKgCfXOo/Vn+zw0cCjS+gGJUgPo8t
WToAmgKIXaVKsKUcqVOqTwHl4eTFsbkM
=uP3m
-----END PGP SIGNATURE-----

--------------020507030401020101040202
Content-Type: text/plain;
 name="readahead.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="readahead.diff"

Index: lib/libc/sys/fcntl.2
===================================================================
--- lib/libc/sys/fcntl.2	(revision 197297)
+++ lib/libc/sys/fcntl.2	(working copy)
@@ -28,7 +28,7 @@
 .\"	@(#)fcntl.2	8.2 (Berkeley) 1/12/94
 .\" $FreeBSD$
 .\"
-.Dd March 8, 2008
+.Dd September 19, 2009
 .Dt FCNTL 2
 .Os
 .Sh NAME
@@ -241,6 +241,14 @@
 .Dv SA_RESTART
 (see
 .Xr sigaction 2 ) .
+.It Dv F_READAHEAD
+Set or clear the read ahead amount for sequential access to the third
+argument,
+.Fa arg ,
+which is rounded up to the nearest block size.
+A zero value in
+.Fa arg
+turns off read ahead.
 .El
 .Pp
 When a shared lock has been set on a segment of a file,
Index: sys/kern/kern_descrip.c
===================================================================
--- sys/kern/kern_descrip.c	(revision 197297)
+++ sys/kern/kern_descrip.c	(working copy)
@@ -421,6 +421,7 @@
 	struct vnode *vp;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -686,6 +687,31 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_READAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		fhold(fp);
+		FILEDESC_SUNLOCK(fdp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= O_READAHEAD;
+		} else {
+			fp->f_flag &= ~O_READAHEAD;
+		}
+		fdrop(fp, td);
+		break;
+
 	default:
 		error = EINVAL;
 		break;
Index: sys/kern/vfs_vnops.c
===================================================================
--- sys/kern/vfs_vnops.c	(revision 197297)
+++ sys/kern/vfs_vnops.c	(working copy)
@@ -312,6 +312,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & O_READAHEAD)
+		return (fp->f_seqcount << IO_SEQSHIFT);
+
 	/*
 	 * Offset 0 is handled specially.  open() sets f_seqcount to 1 so
 	 * that the first I/O is normally considered to be slightly
Index: sys/sys/fcntl.h
===================================================================
--- sys/sys/fcntl.h	(revision 197297)
+++ sys/sys/fcntl.h	(working copy)
@@ -112,7 +112,11 @@
 #if __BSD_VISIBLE
 /* Attempt to bypass buffer cache */
 #define	O_DIRECT	0x00010000
+#ifdef _KERNEL
+/* Read ahead */
+#define	O_READAHEAD	0x00020000
 #endif
+#endif
 
 /* Defined by POSIX Extended API Set Part 2 */
 #if __BSD_VISIBLE
@@ -218,6 +222,7 @@
 #define	F_SETLK		12	/* set record locking information */
 #define	F_SETLKW	13	/* F_SETLK; wait if blocked */
 #define	F_SETLK_REMOTE	14	/* debugging support for remote locks */
+#define	F_READAHEAD	15	/* read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define	FD_CLOEXEC	1	/* close-on-exec flag */

--------------020507030401020101040202--
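
To make the rounding in the F_READAHEAD case concrete, here is a small
userland sketch of the same arithmetic. The function names are
hypothetical, and SKETCH_IO_SEQSHIFT is an assumed stand-in for the
kernel's IO_SEQSHIFT (please check the value in your own tree). With a
16K f_iosize, a 128K request works out to 8 blocks, and a request of a
single byte still rounds up to 1 block.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel's IO_SEQSHIFT; verify against your tree. */
#define	SKETCH_IO_SEQSHIFT	16

/*
 * Number of bsize-byte blocks needed to cover arg bytes (ceiling
 * division), mirroring the f_seqcount computation in the patch.
 */
uint64_t
readahead_blocks(uint64_t arg, uint64_t bsize)
{
	return ((arg + bsize - 1) / bsize);
}

/*
 * Hint that the patched sequential_heuristic() would hand back while
 * O_READAHEAD is set: the block count shifted into the sequential bits.
 */
uint64_t
readahead_hint(uint64_t blocks)
{
	return (blocks << SKETCH_IO_SEQSHIFT);
}
```

This is only meant to show the arithmetic; the kernel code works on the
fields of struct file rather than on plain integers like these.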