Date: Sun, 17 Jan 2016 22:18:53 +0100
From: Jilles Tjoelker
To: Konstantin Belousov
Cc: Boris Astardzhiev, net@freebsd.org
Subject: Re: Does FreeBSD have sendmmsg or recvmmsg system calls?
Message-ID: <20160117211853.GA37847@stack.nl>
In-Reply-To: <20160116202534.GK3942@kib.kiev.ua>

On Sat, Jan 16, 2016 at 10:25:34PM +0200, Konstantin Belousov wrote:
> On Sat, Jan 16, 2016 at 09:56:57PM +0200, Konstantin Belousov wrote:
> > After thinking some more, I believe I managed to construct a possible
> > way to implement this, in libc, with some libthr extensions. Basically,
> > the idea is to have function pthread_cancel_is_pending_np(), which
> > would return the state of pending cancel. For some time I thought that
> > this cannot work, because cancellation could happen between call to
> > the cancel_is_pending() and next recvmmsg(). But, libc has a privilege
> > of having access to the syscalls without libthr interposing, just
> > call __sys_recvmmsg(), which would give EINTR on the cancel attempt.
> > This is an implementation detail, but we can rely on it in implementation.
> > In other words, the structure of the code would be like this
> > for (i = 0; i < vlen; i++) {
> > 	if (pthread_cancel_is_pending_np())
> > 		goto out;
> Right after writing the text and hitting send, I realized that the
> pthread_cancel_is_pending_np() is not needed at all. You get EINTR
> from __sys_recvmsg() on the cancel attempt, so everything would just
> work without the function.
> The crucial part is to use __sys_recvmsg instead of interposable
> _recvmsg().

This will typically work (if the cancellation occurs while blocked
inside __sys_recvmsg()) but has the usual problem of relying on
[EINTR]: lost wakeups. This is certainly less bad than using the
interposable recvmsg(), though, which would discard the already
received data.

As a slight modification, the first recvmsg could use the interposable
version, since there is no pending data at that point. This avoids
needing to call pthread_testcancel() manually.

The regular cancellation code closes this race window using the
undocumented thr_wake() system call, on the thread itself, in the
signal handler for the cancellation signal. This causes the next
attempt to sleep(9) to fail with [EINTR].

(On another note, it appears to be possible for user code (cleanup
handlers and pthread_key_create() destructors) to be called with
thr_wake() still active, if the cancellation signal handler is called
immediately after the cancellation point system call returns.)

The race in recvmmsg could be removed using this mechanism but it
requires either a separate version of recvmmsg in libthr or a new
interface in libthr. I imagine the new interface as a new cancellation
type which causes cancellation point functions such as recvmsg() to
fail with a new errno when cancelled while leaving cancellation
pending. This seems conceptually possible but adds some code to the
common path for cancellation points. A new version of
pthread_testcancel() with a return value would be needed.

-- 
Jilles Tjoelker
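
For illustration, the loop discussed above could be sketched roughly as
follows. This is only a sketch under stated assumptions, not committed
code: recv_many(), struct msg_slot, slot_init() and the RAW_RECVMSG
macro are hypothetical names. RAW_RECVMSG stands in for the
non-interposed __sys_recvmsg() stub, which is private to libc, so the
sketch falls back to plain recvmsg() in order to compile outside libc.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: inside FreeBSD libc this would be __sys_recvmsg(), the
 * system call stub that libthr does not interpose.  Outside libc that
 * symbol is not available, so fall back to recvmsg().
 */
#ifndef RAW_RECVMSG
#define	RAW_RECVMSG(s, msg, flags)	recvmsg((s), (msg), (flags))
#endif

/* Hypothetical stand-in for struct mmsghdr. */
struct msg_slot {
	struct msghdr	hdr;	/* caller-prepared message header */
	ssize_t		len;	/* bytes received into this slot */
};

/* Point a slot at a single receive buffer. */
void
slot_init(struct msg_slot *slot, struct iovec *iov, void *buf, size_t len)
{
	memset(slot, 0, sizeof(*slot));
	iov->iov_base = buf;
	iov->iov_len = len;
	slot->hdr.msg_iov = iov;
	slot->hdr.msg_iovlen = 1;
}

/*
 * Receive up to vlen messages.  Returns the number of slots filled, or
 * -1 if the very first receive fails.
 */
ssize_t
recv_many(int s, struct msg_slot *vec, size_t vlen, int flags)
{
	ssize_t n;
	size_t i;

	if (vlen == 0)
		return (0);

	/*
	 * Nothing has been received yet, so the first call can be the
	 * ordinary interposable recvmsg(): acting on a cancellation
	 * here loses no data.
	 */
	n = recvmsg(s, &vec[0].hdr, flags);
	if (n == -1)
		return (-1);
	vec[0].len = n;

	for (i = 1; i < vlen; i++) {
		/*
		 * Received data is now waiting to be handed back, so
		 * later calls use the raw stub.  Any failure, including
		 * EINTR from a cancel attempt, ends the loop and the
		 * messages collected so far are returned.
		 */
		n = RAW_RECVMSG(s, &vec[i].hdr, flags);
		if (n == -1)
			break;
		vec[i].len = n;
	}
	return ((ssize_t)i);
}
```

Note that this still has the lost wakeup problem: a cancellation
request that arrives between two RAW_RECVMSG calls, rather than while
blocked in one, goes unnoticed until the next cancellation point.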