From: Konstantin Belousov <kostikbel@gmail.com>
To: arch@freebsd.org
Date: Mon, 21 May 2012 13:45:18 +0300
Subject: Re: Prefaulting for i/o buffers
Message-ID: <20120521104518.GU2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>

On Fri, Feb 03, 2012 at 09:37:19PM +0200, Konstantin Belousov wrote:
> The FreeBSD I/O infrastructure has a well-known deadlock caused by
> vnode lock order reversal when buffers supplied to the read(2) or
> write(2) syscalls are backed by an mmapped file.
>
> I previously published patches to convert the i/o path to use VMIO,
> based on Jeff Roberson's proposal, see
> http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
> deadlock. Since that work is very intrusive and did not get any
> follow-up, it stalled.
>
> Below is a very lightweight patch whose only goal is to fix the
> deadlock in the least intrusive way. This became possible after
> FreeBSD got the vm_fault_quick_hold_pages(9) and
> vm_fault_disable_pagefaults(9) KPIs.
> http://people.freebsd.org/~kib/misc/vm1.3.patch
>
> The theory of operation is described in the patched
> sys/kern/vfs_vnops.c, see the preamble comment for vn_io_fault().
> The patch borrows the rangelocks implementation from VM6, which was
> discussed and improved together with Attilio Rao.
>
> I was not able to reproduce the deadlock in the targeted test running
> for several hours, while stock HEAD deadlocks in the first iteration.
>
> Below is the benchmark for the worst-case situation for the patched
> system, reading 1 byte from a file in a loop. The value is the time in
> seconds to execute read(2) for a single byte and lseek back to the
> start of the file. The loop is executed 100,000,000 times. The machine
> has a 3.4GHz Core i7 2600K and ran HEAD@230866 with debugging options
> turned off.
>
> As you see, the rangelock overhead for the worst (but uncontended)
> case is less than 10%.
>
> x stock-1-byte.txt
> + vm1-1-byte.txt
> +--------------------------------------------------------------------------+
> |xx                                                                      ++|
> |xxx                                                                    +++|
> ||A                                                                     |A||
> +--------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   5  1.063206e-06  1.065569e-06  1.064172e-06  1.064109e-06 9.8031959e-10
> +   5  1.167145e-06  1.170244e-06  1.168939e-06 1.1690444e-06 1.2477022e-09
> Difference at 95.0% confidence
>         1.04935e-07 +/- 1.63638e-09
>         9.86134% +/- 0.153779%
>         (Student's t, pooled s = 1.122e-09)
>

I am reviving this thread. Since the patch was first published, it has
received quite intensive review and testing from several people, which
I appreciate very much. The tagline for the commit would include

Reviewed by:	attilio, mdf, pjd, rmacklem (nfs client bits)
Tested by:	pho, flo, Gustau Pérez

The latest version of the patch is at
http://people.freebsd.org/~kib/misc/vm1.13.patch

The main change compared with the previous publicly discussed version
is the handling of the user buffers after vm_fault_quick_hold_pages().
In the previous patch I did uiomove() over the region, but apparently
the VM does not guarantee that the corresponding pte entries are not
removed, or that writeable access is kept. So the new version of the
patch uses uiomove_fromphys() to avoid touching the usermode buffer,
and operates on the held pages. I should note that the issue was never
observed in real life.

This requires trivial modifications of the filesystem code, namely the
replacement of uiomove() with the new helper function
vn_io_fault_uiomove(), which handles the details of held-page access
transparently for the filesystem.

Again, comments and testers are welcome. I consider the patch ready to
be committed.
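
The quoted worst-case measurement (read(2) of a single byte followed by
lseek(2) back to the start of the file, repeated in a loop) can be
sketched in userland roughly as below. This is only an illustration and
not the program that produced the numbers above: the helper names
make_one_byte_file() and time_read_lseek(), the temporary file path and
the use of clock_gettime(CLOCK_MONOTONIC) are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary one-byte file and return a
 * descriptor positioned at offset 0, or -1 on error.
 */
int
make_one_byte_file(void)
{
	char path[] = "/tmp/read1.XXXXXX";	/* assumed location */
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	unlink(path);		/* the file lives until fd is closed */
	if (write(fd, "x", 1) != 1 || lseek(fd, 0, SEEK_SET) != 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/*
 * Hypothetical helper: run `iterations` pairs of read(2) for one byte
 * and lseek(2) back to offset 0. Returns the average time per pair in
 * seconds, or -1.0 on error.
 */
double
time_read_lseek(int fd, long iterations)
{
	struct timespec start, end;
	double elapsed;
	char c;
	long i;

	if (iterations <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return (-1.0);
	for (i = 0; i < iterations; i++) {
		if (read(fd, &c, 1) != 1 || lseek(fd, 0, SEEK_SET) != 0)
			return (-1.0);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return (-1.0);
	elapsed = (double)(end.tv_sec - start.tv_sec) +
	    (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return (elapsed / (double)iterations);
}
```

Calling time_read_lseek(make_one_byte_file(), 100000000) on a stock and
on a patched kernel, five times each, would give two sets of samples
that ministat(1) can compare in the same way as the table quoted above.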