Date: Fri, 3 Feb 2012 19:40:37 +0000
From: Attilio Rao <attilio@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
Message-ID: <CAJ-FndDyFBQvmg1sBXfdZij6jC=WvWoYDBBurAOg=q36mdcPYw@mail.gmail.com>
In-Reply-To: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>

2012/2/3 Konstantin Belousov <kostikbel@gmail.com>:
> The FreeBSD I/O infrastructure has a well-known deadlock caused
> by vnode lock order reversal when buffers supplied to the read(2) or
> write(2) syscalls are backed by an mmapped file.
>
> I previously published patches to convert the i/o path to use VMIO,
> based on the Jeff Roberson proposal, see
> http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
> deadlock. Since that work is very intrusive and did not get any
> follow-up, it stalled.
>
> Below is a very lightweight patch whose only goal is to fix the deadlock
> in the least intrusive way. This is possible now that FreeBSD has the
> vm_fault_quick_hold_pages(9) and vm_fault_disable_pagefaults(9) KPIs.
> http://people.freebsd.org/~kib/misc/vm1.3.patch
>
> The theory of operation is described in the patched sys/kern/vfs_vnops.c,
> see the preamble comment for vn_io_fault(). The patch borrows the
> rangelocks implementation from VM6, which was discussed and improved
> together with Attilio Rao.
>
> I was not able to reproduce the deadlock in the targeted test running
> for several hours, while stock HEAD deadlocks in the first iteration.
>
> Below is the benchmark for the worst-case situation for the patched
> system, reading 1 byte from a file in a loop. The value is the time in
> seconds to execute read(2) for a single byte and lseek back to the start
> of the file. The loop is executed 100,000,000 times. The machine has a
> 3.4GHz Core i7 2600K and used HEAD@230866 with debugging options
> turned off.
>
> As you see, the rangelock overhead for the worst (but uncontended)
> case is less than 10%.
>
> x stock-1-byte.txt
> + vm1-1-byte.txt
> +--------------------------------------------------------------------------+
> |xx                                                                      ++|
> |xxx                                                                    +++|
> ||A                                                                     |A||
> +--------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   5  1.063206e-06  1.065569e-06  1.064172e-06  1.064109e-06 9.8031959e-10
> +   5  1.167145e-06  1.170244e-06  1.168939e-06 1.1690444e-06 1.2477022e-09
> Difference at 95.0% confidence
>         1.04935e-07 +/- 1.63638e-09
>         9.86134% +/- 0.153779%
>         (Student's t, pooled s = 1.122e-09)

Do you have an ETA for reviews? When do you plan to commit this?
It would be valuable to get a grasp on the benchmark and refine the
performance difference as much as possible.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
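
The benchmark program itself was not posted in this message. As a rough
illustration of the loop described in the quoted text (read one byte, seek
back to the start, repeat) and of how the two reported averages give the
9.86% figure, a minimal Python sketch follows. The function names, the
temporary file and the iteration count are assumptions made for this
example, and interpreter overhead means the absolute timings will be much
larger than the numbers in the ministat output above.

```python
import os
import tempfile
import time


def time_read_seek(path, iterations):
    """Return the average seconds per (1-byte read + lseek to offset 0) pair."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.read(fd, 1)                # read(2) of a single byte
            os.lseek(fd, 0, os.SEEK_SET)  # seek back to the start of the file
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / iterations


def relative_overhead_percent(baseline, patched):
    """Relative slowdown of `patched` over `baseline`, in percent."""
    return (patched - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical test file: one byte in a temporary file (an assumption,
    # the original message does not say which file was read).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x")
        path = tmp.name
    try:
        # Far fewer iterations than the 100,000,000 used in the message.
        per_iter = time_read_seek(path, 100_000)
        print(f"{per_iter:.3e} s per read+lseek")
    finally:
        os.unlink(path)

    # Averages taken from the ministat table quoted above.
    overhead = relative_overhead_percent(1.064109e-06, 1.1690444e-06)
    print(f"rangelock overhead: {overhead:.3f}%")
```

Running the timing function once on a stock kernel and once on a patched
kernel, then passing the two averages to relative_overhead_percent(), gives
the same kind of comparison as the table; with the averages quoted above it
comes out at about 9.86%.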