Date: Thu, 17 Sep 2009 15:26:41 -0700
From: Xin LI <delphij@delphij.net>
To: Igor Sysoev <is@rambler-co.ru>
Cc: freebsd-hackers@freebsd.org
Subject: Re: fcntl(F_RDAHEAD)
Message-ID: <4AB2B7A1.5000601@delphij.net>
In-Reply-To: <20090917101526.GF57619@rambler-co.ru>
References: <20090917101526.GF57619@rambler-co.ru>
This is a multi-part message in MIME format.
--------------020507030401020101040202
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Igor,

Igor Sysoev wrote:
> Hi,
>
> nginx-0.8.15 can use completely non-blocking sendfile() using the
> SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read()
> to read a single byte. The first aio_read() preloads the first 128K part
> of a file in VM cache; however, all successive aio_read()s preload just
> 16K parts of the file. This makes non-blocking sendfile() usage
> ineffective for files larger than 128K.
>
> I've created a small patch for a Darwin-compatible F_RDAHEAD fcntl:
>
> fcntl(fd, F_RDAHEAD, preload_size)
>
> There is a small incompatibility: Darwin's fcntl only allows enabling or
> disabling read ahead, while the proposed patch allows setting an exact
> preload size.
>
> Currently the preload size affects the vn_read() code path only and does
> not affect the sendfile() code path. However, it can easily be extended
> to the sendfile() part too. The preload size is still limited by sysctl
> vfs.read_max.
>
> The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.

I have ported this as a patch against -HEAD (it should also apply to
8.0-R, but it's too late for us to add a new feature there), plus a
manual page entry documenting the feature.

I've used F_READAHEAD as the name, but reading the manual page, could we
just use F_RDAHEAD instead? Darwin seems to distinguish only the 0 and
!= 0 cases, so programmers would not need #ifdef or anything else to get
code working on different platforms.

Cheers,
- -- 
Xin LI <delphij@delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (FreeBSD)

iEYEARECAAYFAkqyt40ACgkQi+vbBBjt66AdKgCfXOo/Vn+zw0cCjS+gGJUgPo8t
WToAmgKIXaVKsKUcqVOqTwHl4eTFsbkM
=uP3m
-----END PGP SIGNATURE-----

--------------020507030401020101040202
Content-Type: text/plain;
 name="readahead.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="readahead.diff"

Index: lib/libc/sys/fcntl.2
===================================================================
--- lib/libc/sys/fcntl.2	(revision 197297)
+++ lib/libc/sys/fcntl.2	(working copy)
@@ -28,7 +28,7 @@
 .\"	@(#)fcntl.2	8.2 (Berkeley) 1/12/94
 .\" $FreeBSD$
 .\"
-.Dd March 8, 2008
+.Dd September 19, 2009
 .Dt FCNTL 2
 .Os
 .Sh NAME
@@ -241,6 +241,14 @@
 .Dv SA_RESTART
 (see
 .Xr sigaction 2 ) .
+.It Dv F_READAHEAD
+Set or clear the read ahead amount for sequential access to the third
+argument,
+.Fa arg ,
+which is rounded up to the nearest block size.
+A zero value in
+.Fa arg
+turns off read ahead.
 .El
 .Pp
 When a shared lock has been set on a segment of a file,
Index: sys/kern/kern_descrip.c
===================================================================
--- sys/kern/kern_descrip.c	(revision 197297)
+++ sys/kern/kern_descrip.c	(working copy)
@@ -421,6 +421,7 @@
 	struct vnode *vp;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -686,6 +687,31 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_READAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		fhold(fp);
+		FILEDESC_SUNLOCK(fdp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= O_READAHEAD;
+		} else {
+			fp->f_flag &= ~O_READAHEAD;
+		}
+		fdrop(fp, td);
+		break;
+
 	default:
 		error = EINVAL;
 		break;
Index: sys/kern/vfs_vnops.c
===================================================================
--- sys/kern/vfs_vnops.c	(revision 197297)
+++ sys/kern/vfs_vnops.c	(working copy)
@@ -312,6 +312,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & O_READAHEAD)
+		return (fp->f_seqcount << IO_SEQSHIFT);
+
 	/*
 	 * Offset 0 is handled specially.  open() sets f_seqcount to 1 so
 	 * that the first I/O is normally considered to be slightly
Index: sys/sys/fcntl.h
===================================================================
--- sys/sys/fcntl.h	(revision 197297)
+++ sys/sys/fcntl.h	(working copy)
@@ -112,7 +112,11 @@
 #if __BSD_VISIBLE
 /* Attempt to bypass buffer cache */
 #define	O_DIRECT	0x00010000
+#ifdef _KERNEL
+/* Read ahead */
+#define	O_READAHEAD	0x00020000
 #endif
+#endif
 
 /* Defined by POSIX Extended API Set Part 2 */
 #if __BSD_VISIBLE
@@ -218,6 +222,7 @@
 #define	F_SETLK		12	/* set record locking information */
 #define	F_SETLKW	13	/* F_SETLK; wait if blocked */
 #define	F_SETLK_REMOTE	14	/* debugging support for remote locks */
+#define	F_READAHEAD	15	/* read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define	FD_CLOEXEC	1	/* close-on-exec flag */
--------------020507030401020101040202--
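For readers following the naming question (F_READAHEAD vs. F_RDAHEAD), here is a rough userland sketch of how a caller might cope with either name. It is only an illustration under assumptions: readahead_blocks() and set_readahead() are hypothetical helper names that do not appear in the patch, the bsize parameter stands in for the kernel's mnt_stat.f_iosize, and the fallback branch simply reports ENOTSUP on systems that define neither command.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's rounding:
 *   f_seqcount = (arg + bsize - 1) / bsize
 * Returns the number of blocks needed to cover `bytes`, rounded up.
 * `bsize` stands in for mnt_stat.f_iosize and must be non-zero.
 */
static uint64_t
readahead_blocks(uint64_t bytes, uint64_t bsize)
{
	assert(bsize > 0);
	return ((bytes + bsize - 1) / bsize);
}

/*
 * Hypothetical wrapper: request `bytes` of read ahead on fd.
 * With the patch's F_READAHEAD the size is passed through; with a
 * Darwin-style F_RDAHEAD only on/off is expressed.  Where neither
 * command exists, fail with ENOTSUP (an assumption of this sketch).
 */
static int
set_readahead(int fd, int bytes)
{
#if defined(F_READAHEAD)
	return (fcntl(fd, F_READAHEAD, bytes));
#elif defined(F_RDAHEAD)
	return (fcntl(fd, F_RDAHEAD, bytes != 0));
#else
	(void)fd;
	(void)bytes;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

A caller wanting the 128K preload discussed above would call set_readahead(fd, 128 * 1024); with a 16K block size the patch's rounding would turn that into 8 blocks. If a single F_RDAHEAD name were adopted on both systems, the #if ladder would collapse to one fcntl() call.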