From owner-freebsd-arch@freebsd.org Thu Feb 11 00:21:36 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DC969AA3A06 for ; Thu, 11 Feb 2016 00:21:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9D609CC5 for ; Thu, 11 Feb 2016 00:21:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 31688B91F; Wed, 10 Feb 2016 19:21:35 -0500 (EST) From: John Baldwin To: freebsd-arch@freebsd.org Cc: Jilles Tjoelker Subject: Re: Refactoring asynchronous I/O Date: Wed, 10 Feb 2016 16:21:30 -0800 Message-ID: <2604669.yblYTI1T32@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: <1897781.JF7dmmKAT1@ralph.baldwin.cx> References: <2793494.0Z1kBV82mT@ralph.baldwin.cx> <20160210225844.GA89743@stack.nl> <1897781.JF7dmmKAT1@ralph.baldwin.cx> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 10 Feb 2016 19:21:35 -0500 (EST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 00:21:37 -0000 On Wednesday, February 10, 2016 03:19:12 PM John Baldwin wrote: > On Wednesday, February 10, 2016 11:58:44 PM Jilles Tjoelker wrote: > > On Mon, Feb 08, 2016 at 11:22:25AM -0800, John Baldwin wrote: > > > On Friday, February 05, 2016 10:32:12 PM Jilles Tjoelker wrote: > > > > On Fri, Feb 05, 2016 at 12:21:52PM -0800, John Baldwin wrote: > > > > > On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote: > > > > > > On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote: > > > > > > > Note that binding the AIO support to a new fileop does mean that > > > > > > > the AIO code now becomes mandatory (rather than optional). We > > > > > > > could perhaps make the system calls continue to be optional if > > > > > > > people really need that, but the guts of the code will now need to > > > > > > > always be in the kernel. > > > > > > > > Enabling this by default is OK with me as long as the easy ways to get a > > > > > > stuck process are at least disabled by default. Currently, a process > > > > > > gets stuck forever if it has an AIO request from or to a pipe that will > > > > > > never complete. An AIO daemon should not be allowed to perform an > > > > > > unbounded sleep such as for a pipe (NFS server should be OK). > > > > > > > One thing I could do is split vfs_aio.c into two files: kern_aio.c that > > > > > holds the "library" such as aio_aqueue() / aio_complete(), etc. and a > > > > > sys_aio.c that is just the system calls. kern_aio.c would be standard, > > > > > but sys_aio.c could still be optional and aio.ko would contain it. > > > > > This would still make AIO optional, though aio.ko would be fairly small, > > > > > so not having it probably wouldn't save much in terms of size. > > > > > > > Does this seem reasonable or is a trivial aio.ko not worth it? > > > > > > It is one possible option. Another option is to refuse AIO for kinds of > > > > file that have not been vetted to avoid blocking an AIO daemon for too > > > > long. "Kinds of file" is about the code that will be executed to do I/O, > > > > so that, for example, /dev/klog and /dev/ttyv0 are different kinds. > > > > Depending on whether this restriction breaks existing code, it may need > > > > to be a sysctl. > > > > > This doesn't seem to be very easy to implement, especially if it should > > > vary by character device. It sounds like what you would do today is > > > to allow mlock() and AIO on sockets and maybe the fast physio path > > > and disable everything else by default for now? This would be slightly > > > more feasible than something more fine-grained. A sysctl would allow > > > "full" AIO use that would disable these checks? > > > > It does not seem that complicated to me. A check can be inserted before > > placing a job into aio_jobs (in aio_queue_file() in the patched code). > > If the code can detect some safe cases, they could be allowed; > > otherwise, the AIO daemons will not be used. I'm assuming that the code > > in the new fo_aio_queue file ops behaves properly and will not cause AIO > > daemons to get stuck. > > > > Note that this means there is no easy way to do kernel AIO on a kind of > > file without specific support in the code for that kind of file. I think > > the risk of a stuck process (that may even cause shutdown to hang > > indefinitely) is bad enough to forbid it by default. > > Ok. One issue is that some software may assume that aio will "work" if > modfind("aio") works and might be surprised by it not working. I'm not sure > how real that is. I don't plan on merging this to 10 so we will have some > testing time in 11 to figure that out. If it does prove problematic we can > revert to splitting the syscalls out into aio.ko. For now I think the two > cases to enable by default would be sockets (which use a fo_aio_queue method) > and physio. We could let the VFS_AIO option change the sysctl's default > value to allow all AIO for compat, though that seems a bit clumsy as the > name doesn't really make sense for that. > > (I also still don't like the vfs_aio.c filename as aio is not VFS-specific.) I've pushed a new version with a vfs.aio.enable_unsafe sysctl (defaults to off). If you think this is ok then I'll work on cleaning up some of the module bits (removing the unload support mostly). I figure marking the syscalls as STD and removing all the helper stuff can be done as a followup commit to reduce extra noise in this diff (which is large enough on its own). https://github.com/freebsd/freebsd/compare/master...bsdjhb:aio_rework (Also, for the inital commit I will leave out the socket protocol hook. That can be added later.) -- John Baldwin