Date:      Wed, 14 Dec 2011 11:41:41 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        arch@freebsd.org
Subject:   Changing lseek() to KNOTE on the vnode when seeking on a file
Message-ID:  <201112141141.41168.jhb@freebsd.org>

A co-worker ran into an issue with using an EVFILT_READ kevent on a regular 
file recently.  Specifically, in the manpage it says:

     EVFILT_READ    Takes a descriptor as the identifier, and returns whenever
                    there is data available to read.  The behavior of the fil-
                    ter is slightly different depending on the descriptor
                    type.

                    ...

                    Vnodes
                        Returns when the file pointer is not at the end of
                        file.  data contains the offset from current position
                        to end of file, and may be negative.

He was then working on a program that read to EOF, then did an lseek() back into the
file.  He was expecting to get a new kevent after seeking back into the file
since for his file descriptor after the lseek "there is data available to 
read" and "the file pointer is not at the end of file".  I have a patch to fix 
this by doing a KNOTE() on a vnode after a successful seek.  I checked OS X 
and it looks like they added this to their lseek() in Snow Leopard
(http://fxr.watson.org/fxr/source/bsd/vfs/vfs_syscalls.c?v=xnu-1699.24.8#L4182).

The one-line patch to fix this is below along with a test.  Note that unlike OS X
I did not add a new NOTE_NONE for this case.  OS X has logic in their VFS
filter operations that makes special assumptions about a hint value of 0, so
they had to add NOTE_NONE as a hack.  We do not have the same special 
assumptions about a hint of 0, so we can just use "0".  Without this fix the
test below complains about missing events for the "after seek" and "after 
third read" cases.
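
To illustrate why the seek goes unnoticed without a notification, here is a
rough userland model, not kernel code.  It assumes only the manpage rule quoted
above (data is the distance from the file pointer to end of file, and the
filter fires when that is non-zero) plus the simplification that a knote
becomes active only when something notifies it.  Every name in it
(model_knote, model_filter, model_knote_notify, model_poll, model_lseek) is
hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for a knote attached to a regular file. */
struct model_knote {
	off_t file_size;	/* size of the file */
	off_t offset;		/* file pointer of the descriptor */
	off_t data;		/* value the filter would report in ev.data */
	bool active;		/* queued for delivery by a poll */
};

/* Manpage rule: fire when the file pointer is not at end of file. */
static bool
model_filter(struct model_knote *kn)
{
	kn->data = kn->file_size - kn->offset;
	return (kn->data != 0);
}

/* Models a KNOTE on the vnode: run the filter, activate if it fires. */
static void
model_knote_notify(struct model_knote *kn)
{
	if (model_filter(kn))
		kn->active = true;
}

/* Models a kevent() poll: revalidate an active knote before reporting it. */
static bool
model_poll(struct model_knote *kn)
{
	if (kn->active && !model_filter(kn))
		kn->active = false;
	return (kn->active);
}

/* Models lseek(); 'notify' selects behaviour with or without the patch. */
static void
model_lseek(struct model_knote *kn, off_t offset, bool notify)
{
	assert(offset >= 0);
	kn->offset = offset;
	if (notify)
		model_knote_notify(kn);
}
```

In this model, once a poll at EOF has deactivated the knote, calling
model_lseek() with notify set to false leaves it inactive even though data is
available again; with notify set to true the next poll reports the event.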

Index: vfs_syscalls.c
===================================================================
--- vfs_syscalls.c	(revision 228311)
+++ vfs_syscalls.c	(working copy)
@@ -2049,6 +2049,7 @@ sys_lseek(td, uap)
 	if (error != 0)
 		goto drop;
 	fp->f_offset = offset;
+	VFS_KNOTE_UNLOCKED(vp, 0);
 	*(off_t *)(td->td_retval) = fp->f_offset;
 drop:
 	fdrop(fp, td);


/*-
 * Test to see if lseek(2) provokes an updated kevent on a regular
 * file descriptor.
 */

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char template[] = "/tmp/kevent_lseek.XXXXXX";
static int fd, kq;

static void
check_event(bool expected, off_t offset, const char *desc)
{
	struct timespec ts = { 0, 0 };
	struct kevent ev;
	int retval;

	retval = kevent(kq, NULL, 0, &ev, 1, &ts);
	if (retval < 0)
		err(1, "kevent");
	if (!expected) {
		if (retval != 0)
			printf("Unexpected kevent: %s\n", desc);
	} else {
		if (retval == 0)
			printf("Missing kevent: %s\n", desc);
		else if (ev.data != offset)
			printf("Wrong offset (%jd vs %jd): %s\n",
			    (intmax_t)ev.data, (intmax_t)offset, desc);
	}
}

static void
readn(size_t count)
{
	char buf[count];
	ssize_t nread;

	nread = read(fd, buf, count);
	if (nread < 0)
		err(1, "read");
	if ((size_t)nread != count)
		errx(1, "short read: %zd vs %zu", nread, count);
}

int
main(int ac, char **av)
{
	struct kevent ev;

	kq = kqueue();
	if (kq < 0)
		err(1, "kqueue");
	fd = mkstemp(template);
	if (fd < 0)
		err(1, "mkstemp");
	if (unlink(template) < 0)
		err(1, "unlink");
	EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) < 0)
		err(1, "kevent(EV_ADD)");

	check_event(false, 0, "initial check");

	if (ftruncate(fd, 2048) < 0)
		err(1, "ftruncate(grow)");
	check_event(true, 2048, "after grow");

	readn(512);
	check_event(true, 2048 - 512, "after read");

	readn(2048 - 512);
	check_event(false, 0, "after read to EOF");

	if (lseek(fd, 1024, SEEK_SET) < 0)
		err(1, "lseek");
	check_event(true, 2048 - 1024, "after seek");

	readn(512);
	check_event(true, 2048 - 1024 - 512, "after third read");

	readn(512);
	check_event(false, 0, "after fourth read to EOF");

	close(fd);
	close(kq);
	return (0);
}
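
The offsets the test expects can be cross-checked without kqueue at all.  The
sketch below assumes only POSIX fstat(2) and lseek(2); bytes_to_eof() and
walk_offsets() are hypothetical helper names, and the scratch file path is
arbitrary.  It computes the same "offset from current position to end of file"
that the manpage says EVFILT_READ reports for a vnode.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: store the distance from the descriptor's file
 * pointer to end of file in *remaining.  Returns 0 on success, -1 on error.
 */
static int
bytes_to_eof(int fd, off_t *remaining)
{
	struct stat sb;
	off_t pos;

	assert(remaining != NULL);
	if (fstat(fd, &sb) < 0)
		return (-1);
	pos = lseek(fd, 0, SEEK_CUR);
	if (pos == (off_t)-1)
		return (-1);
	*remaining = sb.st_size - pos;
	return (0);
}

/*
 * Hypothetical driver: grow a scratch file to 2048 bytes, read 512 bytes,
 * then seek to offset 1024, recording the distance to EOF after each step.
 */
static int
walk_offsets(off_t results[3])
{
	char path[] = "/tmp/bytes_to_eof.XXXXXX";
	char buf[512];
	int error = -1;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	(void)unlink(path);
	if (ftruncate(fd, 2048) == 0 &&
	    bytes_to_eof(fd, &results[0]) == 0 &&
	    read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
	    bytes_to_eof(fd, &results[1]) == 0 &&
	    lseek(fd, 1024, SEEK_SET) == 1024 &&
	    bytes_to_eof(fd, &results[2]) == 0)
		error = 0;
	close(fd);
	return (error);
}
```

On success the three recorded values are 2048, 2048 - 512 and 2048 - 1024,
matching the "after grow", "after read" and "after seek" checks in the test.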

-- 
John Baldwin


