From: Attilio Rao
To: Konstantin Belousov
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
Date: Fri, 3 Feb 2012 19:40:37 +0000

2012/2/3 Konstantin Belousov:
> FreeBSD I/O infrastructure
> has a well-known issue with a deadlock caused
> by a vnode lock order reversal when buffers supplied to the read(2) or
> write(2) syscalls are backed by an mmaped file.
>
> I previously published patches to convert the i/o path to use VMIO,
> based on Jeff Roberson's proposal, see
> http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
> deadlock. Since that work is very intrusive and did not get any
> follow-up, it stalled.
>
> Below is a very lightweight patch whose only goal is to fix the deadlock
> in the least intrusive way. This became possible once FreeBSD gained the
> vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs.
> http://people.freebsd.org/~kib/misc/vm1.3.patch
>
> The theory of operation is described in the patched sys/kern/vfs_vnops.c,
> see the preamble comment for vn_io_fault(). The patch borrows the
> rangelocks implementation from VM6, which was discussed and improved
> together with Attilio Rao.
>
> I was not able to reproduce the deadlock in the targeted test running
> for several hours, while stock HEAD deadlocks in the first iteration.
>
> Below is the benchmark for the worst-case situation for the patched
> system, reading 1 byte from a file in a loop. The value is the time in
> seconds to execute read(2) for a single byte and lseek back to the start
> of the file. The loop is executed 100,000,000 times. The machine has a
> 3.4GHz Core i7 2600K and ran HEAD@230866 with debugging options
> turned off.
>
> As you see, the rangelock overhead for the worst (but uncontended)
> case is less than 10%.
>
> x stock-1-byte.txt
> + vm1-1-byte.txt
> +--------------------------------------------------------------------------+
> |xx                                                                      ++|
> |xxx                                                                    +++|
> ||A                                                                     |A||
> +--------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   5  1.063206e-06  1.065569e-06  1.064172e-06  1.064109e-06 9.8031959e-10
> +   5  1.167145e-06  1.170244e-06  1.168939e-06 1.1690444e-06 1.2477022e-09
> Difference at 95.0% confidence
>         1.04935e-07 +/- 1.63638e-09
>         9.86134% +/- 0.153779%
>         (Student's t, pooled s = 1.122e-09)

Do you have an ETA for reviews? When do you plan to commit this?
It would be valuable to get a better grasp of the benchmark and narrow
down the performance difference as much as possible.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
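The worst-case benchmark described in the quoted message (read(2) a single byte, lseek(2) back to offset 0, repeat, report seconds per iteration) can be approximated in userland with the sketch below. It is a hypothetical reconstruction, not the harness that produced the quoted numbers: the function name bench_read1, the use of clock_gettime(CLOCK_MONOTONIC), and returning -1.0 on error are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the 1-byte read benchmark described above:
 * read(2) one byte, lseek(2) back to offset 0, repeat.
 * Returns the average number of seconds per read+lseek pair,
 * or -1.0 if the file cannot be opened or a call fails.
 */
double
bench_read1(const char *path, long iterations)
{
	struct timespec t0, t1;
	char c;
	int fd;

	assert(iterations > 0);
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1.0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iterations; i++) {
		/* Read a single byte, then rewind so the next read hits it again. */
		if (read(fd, &c, 1) != 1 || lseek(fd, 0, SEEK_SET) != 0) {
			close(fd);
			return (-1.0);
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);
	/* Time the whole loop once and divide, keeping clock overhead out. */
	return (((double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9) / (double)iterations);
}
```

A caller would create a small non-empty file, run bench_read1(path, 100000000) several times on both the stock and the patched kernel, and feed the per-iteration times to ministat(1). Timing the whole loop and dividing, rather than timing each call, matters here because one iteration takes only about a microsecond.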
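The "Difference at 95.0% confidence" lines in the quoted ministat(1) output can be checked by hand from the two summary rows. Assuming the pooled-variance Student's t comparison that the output names (n = 5 runs per kernel, hence 8 degrees of freedom and a two-sided 95% critical value of about 2.306), with "stock" for the x rows and "vm1" for the + rows:

```latex
\begin{aligned}
\Delta &= \bar{t}_{\mathrm{vm1}} - \bar{t}_{\mathrm{stock}}
  = 1.1690444\times10^{-6} - 1.064109\times10^{-6}
  \approx 1.04935\times10^{-7}\ \mathrm{s} \\
s_p &= \sqrt{\frac{(n-1)\,s_{\mathrm{stock}}^2 + (n-1)\,s_{\mathrm{vm1}}^2}{2n-2}}
  = \sqrt{\frac{(9.8031959\times10^{-10})^2 + (1.2477022\times10^{-9})^2}{2}}
  \approx 1.122\times10^{-9}\ \mathrm{s} \\
h &= t_{0.975,\,8}\; s_p \sqrt{\tfrac{2}{n}}
  \approx 2.306 \times 1.122\times10^{-9} \times 0.6325
  \approx 1.636\times10^{-9}\ \mathrm{s} \\
\frac{\Delta}{\bar{t}_{\mathrm{stock}}} &\approx 9.861\%, \qquad
\frac{h}{\bar{t}_{\mathrm{stock}}} \approx 0.154\%
\end{aligned}
```

The recomputed values agree with ministat's figures to the printed precision, so the roughly 9.9% slowdown in this uncontended 1-byte case is far larger than its ±0.15% confidence half-width and is not run-to-run noise.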