Date: Sun, 26 Feb 2012 15:22:04 +0100
From: Florian Smeets <flo@FreeBSD.org>
To: Attilio Rao <attilio@FreeBSD.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, arch@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: Prefaulting for i/o buffers
Message-ID: <4F4A400C.1030606@FreeBSD.org>
In-Reply-To: <CAJ-FndAme7Joe1hd05VbvmA4C7_9q26ZQncBKtdBBjGawHqrHQ@mail.gmail.com>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <CAJ-FndABi21GfcCRTZizCPc_Mnxm1EY271BiXcYt9SD_zXFpXw@mail.gmail.com> <20120225151334.GH1344@garage.freebsd.pl> <CAJ-FndBBKHrpB1MNJTXx8gkFXR2d-O6k5-HJeOAyv2DznpN-QQ@mail.gmail.com> <20120225210339.GM55074@deviant.kiev.zoral.com.ua> <CAJ-FndDZpDXqDRR=kT_eQcHbeg3vdiUjnygy1=QLvVuumUsgBw@mail.gmail.com> <20120226141334.GU55074@deviant.kiev.zoral.com.ua> <CAJ-FndAme7Joe1hd05VbvmA4C7_9q26ZQncBKtdBBjGawHqrHQ@mail.gmail.com>
On 26.02.12 15:16, Attilio Rao wrote:
> On 26 February 2012 14:13, Konstantin Belousov <kostikbel@gmail.com> wrote:
>> On Sun, Feb 26, 2012 at 03:02:54PM +0100, Attilio Rao wrote:
>>> On 25 February 2012 22:03, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>>> On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
>>>>> On 25 February 2012 16:13, Pawel Jakub Dawidek <pjd@freebsd.org> wrote:
>>>>>> On Sat, Feb 25, 2012 at 01:01:32PM +0000, Attilio Rao wrote:
>>>>>>> On 3 February 2012 19:37, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>>>>>>> The FreeBSD I/O infrastructure has a well-known deadlock caused
>>>>>>>> by vnode lock order reversal when the buffers supplied to the
>>>>>>>> read(2) or write(2) syscalls are backed by an mmaped file.
>>>>>>>>
>>>>>>>> I previously published patches to convert the i/o path to use VMIO,
>>>>>>>> based on the Jeff Roberson proposal, see
>>>>>>>> http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
>>>>>>>> deadlock. Since that work is very intrusive and did not get any
>>>>>>>> follow-up, it got stalled.
>>>>>>>>
>>>>>>>> Below is a very lightweight patch whose only goal is to fix the
>>>>>>>> deadlock in the least intrusive way. This is possible now that
>>>>>>>> FreeBSD has the vm_fault_quick_hold_pages(9) and
>>>>>>>> vm_fault_disable_pagefaults(9) KPIs.
>>>>>>>> http://people.freebsd.org/~kib/misc/vm1.3.patch
>>>>>>>
>>>>>>> Hi,
>>>>>>> I was reviewing:
>>>>>>> http://people.freebsd.org/~kib/misc/vm1.11.patch
>>>>>>>
>>>>>>> and I think it is great. It is simple enough and I don't have
>>>>>>> further comments on it.
>>>> Thank you.
>>>>
>>>> This spoiled an announcement I intended to send this weekend :)
>>>>
>>>>>>>
>>>>>>> However, as a side note, I was wondering whether we could one day
>>>>>>> get to the point of integrating rangelocks into the vnode lockmgr
>>>>>>> directly. It would be a huge patch, likely rewriting the locking of
>>>>>>> several vnode members, but I think it would be worth it in terms of
>>>>>>> cleanness of the interface and less overhead. Also, it would be
>>>>>>> interesting to consider merging the rangelock implementation with
>>>>>>> ZFS's one at some point.
>>>>>>
>>>>>> My personal opinion about rangelocks and many other VFS features we
>>>>>> currently have is that they are a good idea in theory, but in
>>>>>> practice they tend to overcomplicate VFS.
>>>>>>
>>>>>> I am of the opinion that we should move as much stuff as we can to
>>>>>> individual file systems. We try to implement everything in VFS
>>>>>> itself in the hope that this will simplify the file systems we have.
>>>>>> It then turns out that only one file system really uses this stuff
>>>>>> (most of the time it is UFS) and it is a PITA for all the other file
>>>>>> systems as well as for maintaining VFS. VFS has become so
>>>>>> complicated over the years that there are maybe a few people who can
>>>>>> understand it, and every single change to VFS is a huge risk of
>>>>>> breaking some unrelated part.
>>>>>
>>>>> I think this is questionable for the following reasons:
>>>>> - If the problem is filesystem writers having trouble understanding
>>>>> the necessary locking, we should really provide cleaner and more
>>>>> complete documentation. One could say the same about our VM
>>>>> subsystem, but at least in that case there are plenty of comments
>>>>> that help in understanding how to deal with vm_object and vm_page
>>>>> locking during their lifetimes.
>>>>> - Our primitives may be more complicated than the
>>>>> 'all-in-the-filesystem' ones, but at least they offer a complete and
>>>>> centralized view of the resources allocated in the whole system, and
>>>>> they allow building better policies about how to manage them. One
>>>>> problem I see here is that those policies are not fully implemented,
>>>>> are not tuned, or simply got outdated, which takes away one of the
>>>>> biggest benefits of making vnodes so generic.
>>>>>
>>>>> About the thing I mentioned myself:
>>>>> - As long as the same path now has both range-locking and vnode
>>>>> locking, I don't see it as a good idea to keep the two separate
>>>>> forever. Merging them seems to me an important evolution, not only
>>>>> shrinking the number of primitives themselves but also introducing
>>>>> less overhead and likely revamped scalability for vnodes (though I
>>>>> think this needs a deep investigation).
>>>> The proper direction to move in is to designate the vnode lock for
>>>> protection of the vnode structure, and have the range lock protect
>>>> i/o atomicity. This is somewhat done in the proposed patch (since the
>>>> vnode lock now does not protect the whole i/o operation, but only the
>>>> chunked i/o transactions inside the operation).
>>>>
>>>> Jeff's idea of using the page cache as the source of i/o data
>>>> (implemented in the VM6 patchset) pushes this much further. E.g., a
>>>> write typically does not obtain the exclusive vnode lock (though
>>>> sometimes it has to, in order to extend the vnode).
>>>>
>>>> Probably, I will revive VM6 after this change lands.
>>>
>>> About that, I guess we need to be careful.
>>> The first thing would be having a very scalable VM subsystem, and
>>> recent benchmarks have shown that this is not yet the case (Florian,
>>> CC'ed, can share some pmc/LOCK_PROFILE analysis on pgsql that, even
>>> with the vmcontention patch, shows a lot of contention on the
>>> vm_object, pmap and vm_page_queue_lock locks. We have some plans for
>>> each of them; we can discuss them in a separate thread if you prefer).
>>> This is just to say that we may need more work in the underlying areas
>>> to bring VM6 to the point where it will really make a difference.
>>
>> The benchmarks that were done at that time demonstrated that VM6 does
>> not cause regressions for e.g. buildworld time, and gives marginal
>> improvements, around 10%, for some postgresql loads.
>>
>> The main benefit of VM6 on UFS is that writers no longer block readers
>> for separate i/o ranges. Also, due to the vm_page flags locking
>> improvements, I suspect the VM6 backpressure code might be simplified
>> and give an even larger benefit right now.
>>
>> Anyway, I do not think that VM6 can be put into HEAD quickly, and I
>> want to finish with VM1/prefaulting right now.
>
> I was speaking about a different benchmark.
> Florian did a lock_profile/hwpmc analysis on stock + the vmcontention
> patch to verify where the biggest bottlenecks are.
> Of course, it turns out that the most contended locks are all the ones
> involved in VM, which is not surprising at all.
>
> He can share the numbers and insight, I guess.

All I have done so far is run PostgreSQL with 128 client threads under
lock_profiling [1] and hwpmc [2]. I haven't spent any time analyzing
this yet.
[1] http://people.freebsd.org/~flo/vmc-lock-profiling-postgres-128-20120208.txt
[2] http://people.freebsd.org/~flo/vmc-hwpmc-gprof-postgres-128-20120208.txt
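
For illustration only, the prefault-and-hold step described in the quoted
announcement could look roughly like the sketch below. This is not the code
from the vm1.11 patch: hold_user_buffer() and IO_HOLD_CHUNK are invented
names, and only vm_fault_quick_hold_pages(9) and vm_page_unhold_pages(9) are
existing KPIs.

/*
 * Rough sketch, not the vm1.11 patch itself: fault in and hold the pages
 * backing the current uio chunk before the vnode lock is taken, so that a
 * page fault on an mmaped region of the same file cannot recurse into the
 * vnode lock.  hold_user_buffer() and IO_HOLD_CHUNK are invented names.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/uio.h>

#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_extern.h>
#include <vm/vm_map.h>
#include <vm/vm_page.h>

#define IO_HOLD_CHUNK   (128 * 1024)    /* hypothetical per-transaction limit */

static int
hold_user_buffer(struct uio *uio, vm_page_t *ma, int maxpages, int *held)
{
        vm_map_t map;
        vm_offset_t addr;
        vm_size_t len;
        vm_prot_t prot;
        int count;

        map = &uio->uio_td->td_proc->p_vmspace->vm_map;
        addr = (vm_offset_t)uio->uio_iov->iov_base;
        len = MIN(uio->uio_iov->iov_len, IO_HOLD_CHUNK);
        /* read(2) stores into the user buffer, write(2) only reads it. */
        prot = uio->uio_rw == UIO_READ ? VM_PROT_WRITE : VM_PROT_READ;

        /* Fault the pages in if needed and hold them. */
        count = vm_fault_quick_hold_pages(map, addr, len, prot, ma, maxpages);
        if (count == -1)
                return (EFAULT);
        *held = count;

        /*
         * The caller performs the chunked i/o with the vnode lock held and
         * then releases the pages with vm_page_unhold_pages(ma, *held).
         */
        return (0);
}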
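
Likewise, the split Konstantin describes above, vnode lock for the vnode
structure and range lock for i/o atomicity, could be sketched roughly as
follows. ranged_write() is an invented helper and the vn_rangelock_*()
wrappers are assumed to match the proposed patch, so their exact names and
signatures may differ.

/*
 * Minimal sketch of the "vnode lock for the vnode structure, range lock
 * for i/o atomicity" idea.  The range lock serializes only overlapping
 * i/o, while the vnode lock is held just around the chunked transaction.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/ucred.h>
#include <sys/uio.h>
#include <sys/vnode.h>

static int
ranged_write(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
{
        void *cookie;
        int error;

        /* Serialize only against i/o overlapping this byte range. */
        cookie = vn_rangelock_wlock(vp, uio->uio_offset,
            uio->uio_offset + uio->uio_resid);

        /* The vnode lock protects the vnode, not the whole i/o operation. */
        error = vn_lock(vp, LK_EXCLUSIVE);
        if (error == 0) {
                error = VOP_WRITE(vp, uio, ioflag, cred);
                VOP_UNLOCK(vp, 0);
        }

        vn_rangelock_unlock(vp, cookie);
        return (error);
}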
