Date: Tue, 22 Sep 2009 16:33:55 +0400
From: Igor Sysoev <is@rambler-co.ru>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: freebsd-hackers@freebsd.org, d@delphij.net
Subject: Re: fcntl(F_RDAHEAD)
Message-ID: <20090922123355.GA30679@rambler-co.ru>
In-Reply-To: <20090918074027.GI47688@deviant.kiev.zoral.com.ua>
References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua>

--n8g4imXOkfNTN/H1
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline

On Fri, Sep 18, 2009 at 10:40:27AM +0300, Kostik Belousov wrote:

> On Thu, Sep 17, 2009 at 03:26:41PM -0700, Xin LI wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi, Igor,
> >
> > Igor Sysoev wrote:
> > > Hi,
> > >
> > > nginx-0.8.15 can use completely non-blocking sendfile() using the
> > > SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read()
> > > to read a single byte. The first aio_read() preloads the first 128K part
> > > of a file in the VM cache; however, all successive aio_read()s preload
> > > just 16K parts of the file. This makes non-blocking sendfile() usage
> > > ineffective for files larger than 128K.
> > >
> > > I've created a small patch for a Darwin compatible F_RDAHEAD fcntl:
> > >
> > >    fcntl(fd, F_RDAHEAD, preload_size)
> > >
> > > There is a small incompatibility: Darwin's fcntl only allows enabling or
> > > disabling read ahead, while the proposed patch allows setting the exact
> > > preload size.
> > >
> > > Currently the preload size affects the vn_read() code path only and does
> > > not affect the sendfile() code path. However, it can easily be extended
> > > to the sendfile() part too. The preload size is still limited by sysctl
> > > vfs.read_max.
> > >
> > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.
> >
> > I have ported this as a patch against -HEAD (should apply on 8.0-R but
> > it's too late for us to add a new feature) plus a manual page entry
> > documenting the feature.
> >
> > I've used F_READAHEAD as the name, but reading the manual page, it looks
> > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0
> > and !=0 case so that programmers won't have to use #ifdef or something
> > else to get code working on different platform?
>
> What I dislike about the patch is the new kernel-private flag that is
> eaten from the open(2) flags namespace.
> We do already have FHASLOCK,
> so far the only such flag.

The new patch version against 7.2 is attached.
Changes:

1) two fcntl's: F_READAHEAD and Darwin compatible F_RDAHEAD,
2) FREADAHEAD uses O_CREAT bit.


-- 
Igor Sysoev
http://sysoev.ru/en/

--n8g4imXOkfNTN/H1
Content-Type: text/plain; charset=koi8-r
Content-Disposition: attachment; filename="patch.readahead"

--- /sys/sys/fcntl.h	2009-06-02 19:05:17.000000000 +0400
+++ /sys/sys/fcntl.h	2009-09-22 16:28:52.000000000 +0400
@@ -132,7 +132,7 @@
 /* bits to save after open */
 #define	FMASK	(FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK|O_DIRECT)
 /* bits settable by fcntl(F_SETFL, ...) */
-#define	FCNTLFLAGS	(FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|O_DIRECT)
+#define	FCNTLFLAGS	(FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|FRDAHEAD|O_DIRECT)
 #endif
 
 /*
@@ -163,6 +163,9 @@
  * implemented as plain files).
  */
 #define	FPOSIXSHM	O_NOFOLLOW
+
+/* Read ahead */
+#define	FRDAHEAD	O_CREAT
 #endif
 
 /*
@@ -187,6 +190,8 @@
 #define	F_SETLK		12	/* set record locking information */
 #define	F_SETLKW	13	/* F_SETLK; wait if blocked */
 #define	F_SETLK_REMOTE	14	/* debugging support for remote locks */
+#define	F_READAHEAD	15	/* read ahead */
+#define	F_RDAHEAD	16	/* Darwin compatible read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define	FD_CLOEXEC	1	/* close-on-exec flag */
--- /sys/kern/vfs_vnops.c	2009-06-02 19:05:00.000000000 +0400
+++ /sys/kern/vfs_vnops.c	2009-09-22 14:08:03.000000000 +0400
@@ -305,6 +305,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & FRDAHEAD)
+		return(fp->f_seqcount << IO_SEQSHIFT);
+
 	if ((uio->uio_offset == 0 && fp->f_seqcount > 0) ||
 	    uio->uio_offset == fp->f_nextoff) {
 		/*
--- /sys/kern/kern_descrip.c	2009-08-28 18:50:11.000000000 +0400
+++ /sys/kern/kern_descrip.c	2009-09-22 14:17:47.000000000 +0400
@@ -411,6 +411,7 @@
 	u_int newmin;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -694,6 +695,35 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_RDAHEAD:
+		arg = arg ? 128 * 1024: 0;
+		/* FALLTHROUGH F_READAHEAD */
+
+	case F_READAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		FILE_LOCK(fp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= FRDAHEAD;
+		} else {
+			fp->f_flag &= ~FRDAHEAD;
+		}
+		FILE_UNLOCK(fp);
+		FILEDESC_SUNLOCK(fdp);
+		break;
+
 	default:
 		error = EINVAL;
 		break;

--n8g4imXOkfNTN/H1--
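
Editorial sketch (not part of the original message): the following userland C fragment illustrates how the two fcntl commands described above might be called, and reproduces the patch's round-up arithmetic for f_seqcount. The helper names readahead_blocks() and set_readahead() are hypothetical and do not appear in the patch. The 16K block size in the usage note is only an illustrative value; the kernel takes the real one from the mount's f_iosize. On systems whose <fcntl.h> defines neither command, the wrapper simply fails with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/*
 * Same rounding as the patch's F_READAHEAD case: convert a byte count
 * into a number of filesystem I/O blocks, rounding up.
 */
static uint64_t
readahead_blocks(uint64_t nbytes, uint64_t bsize)
{
	assert(bsize > 0);
	return ((nbytes + bsize - 1) / bsize);
}

/*
 * Hypothetical wrapper (an assumption, not part of the patch).
 * Prefers F_READAHEAD, which takes an exact size in bytes; falls back
 * to the Darwin-style F_RDAHEAD, which only distinguishes zero from
 * non-zero.  A size of 0 disables the forced read-ahead.
 */
static int
set_readahead(int fd, int nbytes)
{
#if defined(F_READAHEAD)
	return (fcntl(fd, F_READAHEAD, nbytes));
#elif defined(F_RDAHEAD)
	return (fcntl(fd, F_RDAHEAD, nbytes != 0));
#else
	(void)fd;
	(void)nbytes;
	errno = ENOTSUP;	/* neither command is defined here */
	return (-1);
#endif
}
```

With a 16K I/O block size, a 128K request gives readahead_blocks(128 * 1024, 16 * 1024) == 8, which is the value the patch would store in f_seqcount. A caller such as the nginx use case described above would presumably invoke set_readahead(fd, 128 * 1024) once after open(2) and ignore an ENOTSUP failure.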