From: John Baldwin
To: arch@freebsd.org
Date: Wed, 14 Dec 2011 11:41:41 -0500
Subject: Changing lseek() to KNOTE on the vnode when seeking on a file
Message-Id: <201112141141.41168.jhb@freebsd.org>
List-Id: Discussion related to FreeBSD architecture

A co-worker ran into an issue with using an EVFILT_READ kevent on a regular
file recently.  Specifically, in the manpage it says:

     EVFILT_READ    Takes a descriptor as the identifier, and returns
                    whenever there is data available to read.  The
                    behavior of the filter is slightly different
                    depending on the descriptor type.

                    ...

                    Vnodes
                        Returns when the file pointer is not at the end of file.
                        data contains the offset from current position to
                        end of file, and may be negative.

He was working on a program that read to EOF and then seeked back into the
file.  He expected to get a new kevent after seeking back, since after the
lseek() it is true for his file descriptor that "there is data available to
read" and that "the file pointer is not at the end of file".

I have a patch to fix this by doing a KNOTE() on the vnode after a successful
seek.  I checked OS X and it looks like they added this to their lseek() in
Snow Leopard
(http://fxr.watson.org/fxr/source/bsd/vfs/vfs_syscalls.c?v=xnu-1699.24.8#L4182).

The one-line patch to fix this is below, along with a test.  Note that unlike
OS X I did not add a new NOTE_NONE for this case.  OS X has logic in their VFS
filter operations that makes special assumptions about a hint value of 0, so
they had to add NOTE_NONE as a hack.  We do not have the same special
assumptions about a hint of 0, so we can just use "0".

Without this fix the test below complains about missing events for the
"after seek" and "after third read" cases.

Index: vfs_syscalls.c
===================================================================
--- vfs_syscalls.c	(revision 228311)
+++ vfs_syscalls.c	(working copy)
@@ -2049,6 +2049,7 @@ sys_lseek(td, uap)
 	if (error != 0)
 		goto drop;
 	fp->f_offset = offset;
+	VFS_KNOTE_UNLOCKED(vp, 0);
 	*(off_t *)(td->td_retval) = fp->f_offset;
 drop:
 	fdrop(fp, td);

/*-
 * Test to see if lseek(2) provokes an updated kevent on a regular
 * file descriptor.
 */

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char template[] = "/tmp/kevent_lseek.XXXXXX";

static int fd, kq;

/*
 * Poll the kqueue without blocking and compare the result against
 * what the caller expects at this point in the test.
 */
static void
check_event(bool expected, off_t offset, const char *desc)
{
	struct timespec ts = { 0, 0 };
	struct kevent ev;
	int retval;

	retval = kevent(kq, NULL, 0, &ev, 1, &ts);
	if (retval < 0)
		err(1, "kevent");
	if (!expected) {
		if (retval != 0)
			printf("Unexpected kevent: %s\n", desc);
	} else {
		if (retval == 0)
			printf("Missing kevent: %s\n", desc);
		else if (ev.data != offset)
			printf("Wrong offset (%jd vs %jd): %s\n",
			    (intmax_t)ev.data, (intmax_t)offset, desc);
	}
}

/* Read exactly 'count' bytes from the test file or exit. */
static void
readn(size_t count)
{
	char buf[count];
	ssize_t nread;

	nread = read(fd, buf, count);
	if (nread < 0)
		err(1, "read");
	if ((size_t)nread != count)
		errx(1, "short read: %zd vs %zu", nread, count);
}

int
main(int ac, char **av)
{
	struct kevent ev;

	kq = kqueue();
	if (kq < 0)
		err(1, "kqueue");

	fd = mkstemp(template);
	if (fd < 0)
		err(1, "mkstemp");
	if (unlink(template) < 0)
		err(1, "unlink");

	EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) < 0)
		err(1, "kevent(EV_ADD)");
	check_event(false, 0, "initial check");

	if (ftruncate(fd, 2048) < 0)
		err(1, "ftruncate(grow)");
	check_event(true, 2048, "after grow");

	readn(512);
	check_event(true, 2048 - 512, "after read");

	readn(2048 - 512);
	check_event(false, 0, "after read to EOF");

	if (lseek(fd, 1024, SEEK_SET) < 0)
		err(1, "lseek");
	check_event(true, 2048 - 1024, "after seek");

	readn(512);
	check_event(true, 2048 - 1024 - 512, "after third read");

	readn(512);
	check_event(false, 0, "after fourth read to EOF");

	close(fd);
	close(kq);
	return (0);
}

-- 
John Baldwin
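
For anyone checking the offsets the test expects: the manpage text quoted
above says that data for a vnode is the distance from the current position
to end of file.  A rough userland equivalent is sketched below.  The helper
name bytes_to_eof() is hypothetical (it is not part of the patch or of any
library), and it only assumes that the kernel's value matches file size
minus current offset, which is what the manpage wording suggests.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): store in *left the number
 * of bytes between the descriptor's current offset and end of file.  The
 * result is negative when the offset lies past EOF.  Returns 0 on success,
 * or -1 with errno set by fstat(2) or lseek(2).
 */
static int
bytes_to_eof(int fd, off_t *left)
{
	struct stat sb;
	off_t pos;

	if (fstat(fd, &sb) < 0)
		return (-1);
	/* Query the current offset without moving it. */
	pos = lseek(fd, 0, SEEK_CUR);
	if (pos == (off_t)-1)
		return (-1);
	*left = sb.st_size - pos;
	return (0);
}
```

With the 2048-byte file from the test, this yields 1024 right after
lseek(fd, 1024, SEEK_SET), which is the value the "after seek" case checks
for in ev.data.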