Date:      Sun, 26 Feb 2012 14:16:12 +0000
From:      Attilio Rao <attilio@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        arch@freebsd.org, Florian Smeets <flo@freebsd.org>, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject:   Re: Prefaulting for i/o buffers
Message-ID:  <CAJ-FndAme7Joe1hd05VbvmA4C7_9q26ZQncBKtdBBjGawHqrHQ@mail.gmail.com>
In-Reply-To: <20120226141334.GU55074@deviant.kiev.zoral.com.ua>
References:  <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <CAJ-FndABi21GfcCRTZizCPc_Mnxm1EY271BiXcYt9SD_zXFpXw@mail.gmail.com> <20120225151334.GH1344@garage.freebsd.pl> <CAJ-FndBBKHrpB1MNJTXx8gkFXR2d-O6k5-HJeOAyv2DznpN-QQ@mail.gmail.com> <20120225210339.GM55074@deviant.kiev.zoral.com.ua> <CAJ-FndDZpDXqDRR=kT_eQcHbeg3vdiUjnygy1=QLvVuumUsgBw@mail.gmail.com> <20120226141334.GU55074@deviant.kiev.zoral.com.ua>

On 26 February 2012 at 14:13, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Sun, Feb 26, 2012 at 03:02:54PM +0100, Attilio Rao wrote:
>> On 25 February 2012 at 22:03, Konstantin Belousov <kostikbel@gmail.com> wrote:
>> > On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
>> >> On 25 February 2012 at 16:13, Pawel Jakub Dawidek <pjd@freebsd.org> wrote:
>> >> > On Sat, Feb 25, 2012 at 01:01:32PM +0000, Attilio Rao wrote:
>> >> >> On 3 February 2012 at 19:37, Konstantin Belousov <kostikbel@gmail.com> wrote:
>> >> >> > The FreeBSD I/O infrastructure has a well-known deadlock caused
>> >> >> > by a vnode lock order reversal when the buffers supplied to the
>> >> >> > read(2) or write(2) syscalls are backed by an mmaped file.
>> >> >> >
>> >> >> > I previously published patches to convert the i/o path to use
>> >> >> > VMIO, based on Jeff Roberson's proposal; see
>> >> >> > http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
>> >> >> > deadlock. Since that work is very intrusive and did not get any
>> >> >> > follow-up, it stalled.
>> >> >> >
>> >> >> > Below is a very lightweight patch whose only goal is to fix the
>> >> >> > deadlock in the least intrusive way. This became possible after
>> >> >> > FreeBSD got the vm_fault_quick_hold_pages(9) and
>> >> >> > vm_fault_disable_pagefaults(9) KPIs.
>> >> >> > http://people.freebsd.org/~kib/misc/vm1.3.patch
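
To double-check my reading of the mechanism, here is roughly how I
picture the two KPIs combining. This is only a sketch, not the actual
vm1.11 code: do_io_chunk() is a made-up helper standing for "lock the
vnode and uiomove() one chunk", and the chunk size is arbitrary.

	/* Needs sys/param.h, sys/proc.h, sys/uio.h, sys/vnode.h,
	   vm/vm.h, vm/vm_extern.h, vm/vm_map.h, vm/vm_page.h. */
	static int
	io_chunk_prefault(struct vnode *vp, struct uio *uio)
	{
		vm_page_t ma[8];	/* arbitrary chunk size */
		vm_prot_t prot;
		int count, error, save;

		/* read(2) writes to the user buffer and vice versa. */
		prot = uio->uio_rw == UIO_READ ? VM_PROT_WRITE : VM_PROT_READ;
		for (;;) {
			/*
			 * With pagefaults disabled, a fault in uiomove()
			 * returns EFAULT instead of being resolved while
			 * the vnode lock is held.
			 */
			save = vm_fault_disable_pagefaults();
			error = do_io_chunk(vp, uio);	/* made-up helper */
			vm_fault_enable_pagefaults(save);
			if (error != EFAULT)
				return (error);

			/* Fault the buffer in with no vnode lock held. */
			count = vm_fault_quick_hold_pages(
			    &curproc->p_vmspace->vm_map,
			    (vm_offset_t)uio->uio_iov->iov_base,
			    MIN(uio->uio_iov->iov_len, 8 * PAGE_SIZE),
			    prot, ma, 8);
			if (count == -1)
				return (EFAULT);  /* buffer is really bad */

			/* Retry: the held pages cannot be paged out. */
			error = do_io_chunk(vp, uio);
			vm_page_unhold_pages(ma, count);
			if (error != EFAULT)
				return (error);
		}
	}
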
>> >> >>
>> >> >> Hi,
>> >> >> I was reviewing:
>> >> >> http://people.freebsd.org/~kib/misc/vm1.11.patch
>> >> >>
>> >> >> and I think it is great. It is simple enough and I don't have further
>> >> >> comments on it.
>> > Thank you.
>> >
>> > This spoiled an announcement I intended to send this weekend :)
>> >
>> >> >>
>> >> >> However, as a side note, I was wondering if we could one day get
>> >> >> to the point of integrating rangelocks directly into the vnode
>> >> >> lockmgr. It would be a huge patch, likely rewriting the locking of
>> >> >> several vnode members, but I think it would be worth it in terms
>> >> >> of cleanliness of the interface and reduced overhead. Also, it
>> >> >> would be interesting to consider merging our rangelock
>> >> >> implementation with ZFS's at some point.
>> >> >
>> >> > My personal opinion about rangelocks and many other VFS features
>> >> > we currently have is that they are a good idea in theory, but in
>> >> > practice they tend to overcomplicate VFS.
>> >> >
>> >> > I'm of the opinion that we should move as much stuff as we can
>> >> > into the individual file systems. We try to implement everything
>> >> > in VFS itself in the hope that this will simplify the file systems
>> >> > we have. It then turns out that only one file system really uses
>> >> > this stuff (most of the time it is UFS), and it is a PITA for all
>> >> > the other file systems as well as for maintaining VFS. VFS has
>> >> > become so complicated over the years that maybe only a few people
>> >> > can understand it, and every single change to VFS carries a huge
>> >> > risk of breaking some unrelated part.
>> >>
>> >> I think this is questionable, for the following reasons:
>> >> - If the problem is filesystem writers having trouble understanding
>> >> the necessary locking, we should really provide cleaner and more
>> >> complete documentation. One could say the same about our VM
>> >> subsystem, but at least there we have plenty of comments that help
>> >> in understanding how to deal with vm_object and vm_page locking
>> >> over their lifetimes.
>> >> - Our primitives may be more complicated than the
>> >> 'all-in-the-filesystem' approach, but at least they offer a
>> >> complete and centralized view of the resources allocated in the
>> >> whole system, and they allow building better policies for managing
>> >> them. One problem I see here is that those policies are not fully
>> >> implemented or tuned, or have simply become outdated, which removes
>> >> one of the biggest benefits we get from making vnodes so generic.
>> >>
>> >> About the thing I mentioned:
>> >> - As long as the same path now has both range locking and vnode
>> >> locking, I don't see keeping them separate forever as a good idea.
>> >> Merging them seems to me an important evolution, not only shrinking
>> >> the number of primitives themselves but also introducing less
>> >> overhead and likely revamped scalability for vnodes (though I think
>> >> this needs a deep investigation).
>> > The proper direction to move in is to designate the vnode lock for
>> > protecting the vnode structure, and have the range lock protect i/o
>> > atomicity. This is somewhat done in the proposed patch (since now
>> > the vnode lock does not protect the whole i/o operation, but only
>> > the chunked i/o transactions inside the operation).
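
Right; and if I read the patch correctly, in pseudo-C the split looks
roughly like this (the vn_rangelock_*() names are as in your patch, if
I read them right; chunking details and error handling omitted):

	/* Sketch of the locking split, not the actual code. */
	void *cookie;

	/* Range lock: i/o atomicity across the whole operation. */
	cookie = vn_rangelock_wlock(vp, uio->uio_offset,
	    uio->uio_offset + uio->uio_resid);
	while (uio->uio_resid > 0) {
		/* Vnode lock: structure protection, per chunk only. */
		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
		/* ... transfer one chunk ... */
		VOP_UNLOCK(vp, 0);
		/* user-buffer pagefaults can be resolved here, unlocked */
	}
	vn_rangelock_unlock(vp, cookie);
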
>> >
>> > Jeff's idea of using the page cache as the source of i/o data
>> > (implemented in the VM6 patchset) pushes this much further. E.g.,
>> > the write path typically does not obtain the vnode lock exclusively
>> > (though sometimes it has to, in order to extend the vnode).
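
The shape I picture for such a write, to make sure I follow (a very
rough sketch of my reading, not the actual VM6 code; read-before-write
for partial pages, busying rules and error handling are glossed over):

	vm_object_t obj = vp->v_object;
	vm_page_t m;
	int error, n, off;

	off = uio->uio_offset & PAGE_MASK;
	n = MIN(PAGE_SIZE - off, uio->uio_resid);

	/* The vnode lock is needed only to grow the file. */
	if (uio->uio_offset + n > obj->un_pager.vnp.vnp_size) {
		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
		/* ... extend the vnode and update vnp_size ... */
		VOP_UNLOCK(vp, 0);
	}

	/* Grab the page busy; VM_ALLOC_RETRY sleeps until we get it. */
	VM_OBJECT_LOCK(obj);
	m = vm_page_grab(obj, OFF_TO_IDX(uio->uio_offset),
	    VM_ALLOC_NORMAL | VM_ALLOC_RETRY);
	VM_OBJECT_UNLOCK(obj);

	/*
	 * Copy straight between the user buffer and the page cache.
	 * A fault on the user buffer is safe here: no vnode lock held.
	 */
	error = uiomove_fromphys(&m, off, n, uio);

	VM_OBJECT_LOCK(obj);
	vm_page_dirty(m);
	vm_page_wakeup(m);	/* unbusy */
	VM_OBJECT_UNLOCK(obj);
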
>> >
>> > Probably, I will revive VM6 after this change is landed.
>>
>> About that, I guess we should be careful.
>> The first requirement would be a very scalable VM subsystem, and
>> recent benchmarks have shown that this is not yet the case. (Florian,
>> CC'ed, can share some pmc/LOCK_PROFILE analysis of pgsql that, even
>> with the vmcontention patch, shows a lot of contention on the
>> vm_object, pmap and vm_page_queue_lock locks. We have plans for each
>> of them; we can discuss them in a separate thread if you prefer.)
>> This is just to say that we may need more work in the underlying
>> areas to bring VM6 to the point where it will really make a
>> difference.
>
> The benchmarks that were done at that time demonstrated that VM6 does
> not cause regressions for e.g. buildworld times, and gives marginal
> improvements, around 10%, for some postgresql loads.
>
> The main benefit of VM6 on UFS is that writers no longer block readers
> on separate i/o ranges. Also, due to the vm_page flags locking
> improvements, I suspect the VM6 backpressure code could be simplified
> and give an even larger benefit right now.
>
> Anyway, I do not think that VM6 can be put into HEAD quickly, and I
> want to finish VM1/prefaulting first.

I was speaking about a different benchmark.
Florian did a lock_profile/hwpmc analysis of stock HEAD plus the
vmcontention patch to verify where the biggest bottlenecks are.
Of course, it turns out that the most contended locks are all the ones
involved in the VM, which is not surprising at all.

He can share the numbers and insights, I guess.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


