Date:      Thu, 17 Sep 2009 14:15:26 +0400
From:      Igor Sysoev <is@rambler-co.ru>
To:        freebsd-hackers@freebsd.org
Subject:   fcntl(F_RDAHEAD)
Message-ID:  <20090917101526.GF57619@rambler-co.ru>

--VS++wcV0S1rZb1Fb
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline

Hi,

nginx-0.8.15 can use a completely non-blocking sendfile() by passing the
SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read() to
read a single byte. The first aio_read() preloads the first 128K of the file
into the VM cache; however, all successive aio_read()s preload just 16K each.
This makes non-blocking sendfile() ineffective for files larger than 128K.
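
To put a rough number on this, here is a small model. It is not taken from
nginx or the kernel: it assumes a cold cache, that every EBUSY from
sendfile() costs exactly one aio_read(), and that each aio_read() preloads
exactly the sizes quoted above. The function name and its parameters are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: count the aio_read() calls needed to push a file of
 * file_size bytes through non-blocking sendfile(), assuming the first
 * aio_read() preloads first_chunk bytes and every later one preloads
 * next_chunk bytes.
 */
size_t
aio_reads_needed(size_t file_size, size_t first_chunk, size_t next_chunk)
{
	size_t rest;

	assert(first_chunk > 0 && next_chunk > 0);

	if (file_size == 0)
		return (0);
	if (file_size <= first_chunk)
		return (1);

	rest = file_size - first_chunk;
	/* Round up: a partial trailing chunk still costs one aio_read(). */
	return (1 + (rest + next_chunk - 1) / next_chunk);
}
```

Under these assumptions a 1M file needs 57 aio_read() calls (1 + 896K/16K),
against 8 if every call preloaded 128K.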

I've created a small patch that adds a Darwin-compatible F_RDAHEAD fcntl:

   fcntl(fd, F_RDAHEAD, preload_size)

There is a small incompatibility: Darwin's fcntl only allows read ahead to be
enabled or disabled, while the proposed patch allows setting an exact preload
size.
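
From user space the call might be wrapped roughly as follows. This is a
sketch, not nginx code: set_preload() is a hypothetical helper, the #ifdef
guard is there because F_RDAHEAD is only defined on systems that provide it,
and passing the size as an int is an assumption. Where F_RDAHEAD follows
Darwin's semantics, the argument only switches read ahead on or off.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper: ask the kernel to preload preload_size bytes on each
 * read of fd.  A preload_size of 0 turns the forced read ahead off again.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
set_preload(int fd, size_t preload_size)
{
#ifdef F_RDAHEAD
	/* The size is passed as an int here; that type is an assumption. */
	if (fcntl(fd, F_RDAHEAD, (int) preload_size) == -1)
		return (-1);
	return (0);
#else
	/* No F_RDAHEAD on this system: report it instead of guessing. */
	(void) fd;
	(void) preload_size;
	errno = ENOSYS;
	return (-1);
#endif
}
```

A caller would typically invoke set_preload(fd, 131072) once after open()
and treat a failure as non-fatal, since read ahead is only an optimization.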

Currently the preload size affects only the vn_read() code path and does not
affect the sendfile() code path. However, it can easily be extended to the
sendfile() path too. The preload size is still limited by the vfs.read_max
sysctl.
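
The F_RDAHEAD case in the attached patch turns the requested size into a
block count with a round-up division by the filesystem I/O size. The
user-space sketch below mirrors that arithmetic. preload_blocks() and its
read_max parameter are hypothetical, and treating vfs.read_max as a plain
minimum in blocks is an assumption about how the limit takes effect, not
something the patch itself does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's arithmetic: round the requested
 * preload size up to whole blocks of bsize bytes, then cap the result at
 * read_max blocks (assumed here to be a simple minimum).
 */
uint64_t
preload_blocks(uint64_t preload_size, uint64_t bsize, uint64_t read_max)
{
	uint64_t blocks;

	assert(bsize > 0);

	/* Same round-up as (arg + bsize - 1) / bsize in the F_RDAHEAD case. */
	blocks = (preload_size + bsize - 1) / bsize;

	return (blocks < read_max ? blocks : read_max);
}
```

With a 16K I/O size, a 128K request becomes 8 blocks and a 1-byte request
still becomes 1 block. With a hypothetical read_max of 8, a 1M request is cut
from 64 blocks to 8.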

The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.


-- 
Igor Sysoev
http://sysoev.ru/en/

--VS++wcV0S1rZb1Fb
Content-Type: text/plain; charset=koi8-r
Content-Disposition: attachment; filename="patch.rdahead"

--- sys/sys/fcntl.h	2009-06-02 19:05:17.000000000 +0400
+++ sys/sys/fcntl.h	2009-09-12 20:29:34.000000000 +0400
@@ -118,6 +118,10 @@
 #if __BSD_VISIBLE
 /* Attempt to bypass buffer cache */
 #define O_DIRECT	0x00010000
+#ifdef _KERNEL
+/* Read ahead */
+#define O_RDAHEAD	0x00020000
+#endif
 #endif
 
 /*
@@ -187,6 +191,7 @@
 #define	F_SETLK		12		/* set record locking information */
 #define	F_SETLKW	13		/* F_SETLK; wait if blocked */
 #define	F_SETLK_REMOTE	14		/* debugging support for remote locks */
+#define	F_RDAHEAD	15		/* read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define	FD_CLOEXEC	1		/* close-on-exec flag */
--- sys/kern/vfs_vnops.c	2009-06-02 19:05:00.000000000 +0400
+++ sys/kern/vfs_vnops.c	2009-09-12 20:24:00.000000000 +0400
@@ -305,6 +305,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & O_RDAHEAD)
+		return(fp->f_seqcount << IO_SEQSHIFT);
+
 	if ((uio->uio_offset == 0 && fp->f_seqcount > 0) ||
 	    uio->uio_offset == fp->f_nextoff) {
 		/*
--- sys/kern/kern_descrip.c	2009-08-28 18:50:11.000000000 +0400
+++ sys/kern/kern_descrip.c	2009-09-12 20:23:36.000000000 +0400
@@ -411,6 +411,7 @@
 	u_int newmin;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -694,6 +695,31 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_RDAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		FILE_LOCK(fp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= O_RDAHEAD;
+		} else {
+			fp->f_flag &= ~O_RDAHEAD;
+		}
+		FILE_UNLOCK(fp);
+		FILEDESC_SUNLOCK(fdp);
+		break;
+
 	default:
 		error = EINVAL;
 		break;

--VS++wcV0S1rZb1Fb--