Date: Fri, 27 Mar 2015 15:26:54 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Jilles Tjoelker <jilles@stack.nl> Cc: freebsd-hackers@FreeBSD.org, Ivan Radovanovic <radovanovic@gmail.com> Subject: Re: kevent behavior Message-ID: <20150327132654.GJ2379@kib.kiev.ua> In-Reply-To: <20150326225826.GA97319@stack.nl> References: <550A6DA2.1070004@gmail.com> <20150324221541.GA67584@stack.nl> <20150325090041.GY2379@kib.kiev.ua> <20150325223530.GA79065@stack.nl> <20150326214551.GG2379@kib.kiev.ua> <20150326225826.GA97319@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Mar 26, 2015 at 11:58:27PM +0100, Jilles Tjoelker wrote:
> On Thu, Mar 26, 2015 at 11:45:51PM +0200, Konstantin Belousov wrote:
> > On Wed, Mar 25, 2015 at 11:35:30PM +0100, Jilles Tjoelker wrote:
> 
> > > POSIX does not say anything about kevent(), including whether it should
> > > be a cancellation point or not. Given that kevent() may block for a long
> > > time, making it a cancellation point seems to make sense.
> 
> > But POSIX language is very explicit for its own set of interfaces, and
> > that makes sense.  This is why I think that cancellability, after the
> > 15+ years of kqueue existence, should be opt in.
> 
> Hmm, I think most code written is cancel-unsafe. It is unlikely to have
> cancel-safe code using kevent() and relying on it not being a
> cancellation point, but still possible.
Ok, I re-considered my opinion after re-reading your responses.
> 
> This also means we shouldn't wait too long with adding missing
> cancellation points like ppoll().
> 
> > > The kevent() cancellation point implementation would be much like most
> > > other cancellation points such as pselect(), with the difference that
> > > the call may have committed even if it fails, in the case that
> > > nchanges > nevents and in the case that nchanges > 0 && errno == EINTR.
> > > If cancellation is requested while blocking in the latter case, libthr
> > > will have to return -1/EINTR to indicate to the application that the
> > > changes were applied successfully while allowing the thread to be
> > > cancelled at the next cancellation point, even though there may not be
> > > any signal handler.
I do not quite understand this.
If any error (except things like EFAULT) occured while processing the
changelist, kevent(2) returns error count, and not the -1, regarless of
the length of eventlist.  In this case, we do not cancel.
Syscall cannot return n/EINTR, where n >= 0.
If -1/EINTR is returned, this means that the call blocked and sleep
was interrupted.  The changes from the changelist were applied, 
do you suggested that we must not cancel if nchanges > 0 ?
> 
> > > If nevents == 0, kevent() does not block and need not be a cancellation
> > > point. This special case may simplify things slightly for application
> > > programmers.
> 
> > > A kqueue() flag for cancellation does not seem very useful: since such a
> > > flag would be a kernel thing and pthread cancellation is a libthr thing,
> > > requesting the flag from the kernel would slow down all kevent() calls.
> 
> > Yes, I thought about adding some (small) userspace table which would
> > track cancellable kqueues.
> 
> I think the interaction with dup() and similar calls and with exec makes
> this too nasty.
Hm, yes.
> 
> > But if we make the cancellation state per-call, and not a state for kqueue
> > descriptor, we might indicate the cancellability by some event type, which
> > is not passed to the kernel at all.  The libthr wrapper for kevent(2) would
> > need to scan the changelist and zero-out the cancellation indicator.
> 
> This seems a bit hackish. However, enabling cancellation per-call seems
> better than per-kqueue.
Patch below always considers kevent(2) as cancellable, unless nevents == 0.
I will correct the call to thr_cancel_leave() if I misunderstood you.
diff --git a/lib/libc/include/libc_private.h b/lib/libc/include/libc_private.h
index e4bf4a6..ceaa2a3 100644
--- a/lib/libc/include/libc_private.h
+++ b/lib/libc/include/libc_private.h
@@ -221,6 +221,7 @@ enum {
 	INTERPOS__pthread_mutex_init_calloc_cb,
 	INTERPOS_spinlock,
 	INTERPOS_spinunlock,
+	INTERPOS_kevent,
 	INTERPOS_MAX
 };
 
@@ -293,6 +294,7 @@ void *	__sys_freebsd6_mmap(void *, __size_t, int, int, int, int, __off_t);
 struct aiocb;
 struct fd_set;
 struct iovec;
+struct kevent;
 struct msghdr;
 struct pollfd;
 struct rusage;
@@ -315,6 +317,8 @@ int		__sys_fsync(int);
 __pid_t		__sys_fork(void);
 int		__sys_ftruncate(int, __off_t);
 int		__sys_gettimeofday(struct timeval *, struct timezone *);
+int		__sys_kevent(int, const struct kevent *, int, struct kevent *,
+		    int, const struct timespec *);
 __off_t		__sys_lseek(int, __off_t, int);
 void	       *__sys_mmap(void *, __size_t, int, int, int, __off_t);
 int		__sys_msync(void *, __size_t, int);
diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index 0edf644b..7745b2a 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -51,6 +51,7 @@ INTERPOSED = \
 	fcntl \
 	fsync \
 	fork \
+	kevent \
 	msync \
 	nanosleep \
 	open \
diff --git a/lib/libc/sys/interposing_table.c b/lib/libc/sys/interposing_table.c
index 0fd6c75..4290bc6 100644
--- a/lib/libc/sys/interposing_table.c
+++ b/lib/libc/sys/interposing_table.c
@@ -75,6 +75,7 @@ interpos_func_t __libc_interposing[INTERPOS_MAX] = {
 	SLOT(_pthread_mutex_init_calloc_cb, _pthread_mutex_init_calloc_cb_stub),
 	SLOT(spinlock, __libc_spinlock_stub),
 	SLOT(spinunlock, __libc_spinunlock_stub),
+	SLOT(kevent, __sys_kevent),
 };
 #undef SLOT
 
diff --git a/lib/libc/sys/kevent.c b/lib/libc/sys/kevent.c
new file mode 100644
index 0000000..5f84ef8
--- /dev/null
+++ b/lib/libc/sys/kevent.c
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2015 The FreeBSD Foundation.
+ * All rights reserved.
+ *
+ * Portions of this software were developed by Konstantin Belousov
+ * under sponsorship from the FreeBSD Foundation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice(s), this list of conditions and the following disclaimer as
+ *    the first lines of this file unmodified other than the possible
+ *    addition of one or more copyright notices.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice(s), this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+ * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/types.h>
+#include <sys/event.h>
+#include <sys/time.h>
+#include "libc_private.h"
+
+__weak_reference(__sys_kevent, __kevent);
+
+#pragma weak kevent
+int
+kevent(int kq, const struct kevent *changelist, int nchanges,
+    struct kevent *eventlist, int nevents, const struct timespec *timeout)
+{
+
+	return (((int (*)(int, const struct kevent *, int,
+	    struct kevent *, int, const struct timespec *))
+	    __libc_interposing[INTERPOS_kevent])(kq, changelist, nchanges,
+	   eventlist, nevents, timeout));
+}
diff --git a/lib/libc/sys/kqueue.2 b/lib/libc/sys/kqueue.2
index 93223f1..fa76c54 100644
--- a/lib/libc/sys/kqueue.2
+++ b/lib/libc/sys/kqueue.2
@@ -24,7 +24,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd July 18, 2014
+.Dd March 27, 2015
 .Dt KQUEUE 2
 .Os
 .Sh NAME
@@ -41,7 +41,7 @@
 .Fn kqueue "void"
 .Ft int
 .Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
-.Fn EV_SET "&kev" ident filter flags fflags data udata
+.Fn EV_SET "kev" ident filter flags fflags data udata
 .Sh DESCRIPTION
 The
 .Fn kqueue
@@ -550,6 +550,16 @@ On return,
 .Va fflags
 contains the users defined flags in the lower 24 bits.
 .El
+.Sh CANCELLATION BEHAVIOUR
+If
+.Fa nevents
+is non-zero, i.e. the function is potentially blocking, the call
+is the cancellation point.
+Otherwise, i.e. if
+.Fa nevents
+is zero, the call is not cancellable.
+Cancellation can only occur when the call was blocked and no changes
+to the queue were requested.
 .Sh RETURN VALUES
 The
 .Fn kqueue
diff --git a/lib/libthr/thread/thr_syscalls.c b/lib/libthr/thread/thr_syscalls.c
index 10fbad4..e71bf4a 100644
--- a/lib/libthr/thread/thr_syscalls.c
+++ b/lib/libthr/thread/thr_syscalls.c
@@ -341,6 +341,29 @@ __thr_pselect(int count, fd_set *rfds, fd_set *wfds, fd_set *efds,
 	return (ret);
 }
 
+static int
+__thr_kevent(int kq, const struct kevent *changelist, int nchanges,
+    struct kevent *eventlist, int nevents, const struct timespec *timeout)
+{
+	struct pthread *curthread;
+	int ret;
+
+	if (nevents == 0) {
+		/*
+		 * No blocking, do not make the call cancellable.
+		 */
+		return (__sys_kevent(kq, changelist, nchanges, eventlist,
+		    nevents, timeout));
+	}
+	curthread = _get_curthread();
+	_thr_cancel_enter(curthread);
+	ret = __sys_kevent(kq, changelist, nchanges, eventlist, nevents,
+	    timeout);
+	_thr_cancel_leave(curthread, ret == -1 && nchanges == 0);
+
+	return (ret);
+}
+
 /*
  * Cancellation behavior:
  *   Thread may be canceled at start, but if the system call got some data, 
@@ -599,6 +622,7 @@ __thr_interpose_libc(void)
 	SLOT(writev);
 	SLOT(spinlock);
 	SLOT(spinunlock);
+	SLOT(kevent);
 #undef SLOT
 	*(__libc_interposing_slot(
 	    INTERPOS__pthread_mutex_init_calloc_cb)) =
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150327132654.GJ2379>
