Date: Mon, 28 Mar 2016 20:19:14 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>, Maxim Sobolev <sobomax@sippysoft.com>
Cc: freebsd-fs@FreeBSD.org, Kirk McKusick <mckusick@mckusick.com>, stable@FreeBSD.org, kib@FreeBSD.org
Subject: Re: Process stuck in "vnread"
Message-ID: <56F96792.2010800@FreeBSD.org>
In-Reply-To: <20160328162310.GJ1741__41334.1269981631$1459182219$gmane$org@kib.kiev.ua>
References: <CAH7qZfs3EwT8jnKyodHxF_5nK18MeLSaB_F-qqOfwV0MJMD7Vg@mail.gmail.com> <CAH7qZfssCPxc_uuMoxwAqa6qdi1y=VCqRT6hk-=mTU15RwOCAg@mail.gmail.com> <CAH7qZftHP0b30AnF4Fds9%2BotY0Y24HMFuO=RmkqcBJD3wFNkHg@mail.gmail.com> <20160328162310.GJ1741__41334.1269981631$1459182219$gmane$org@kib.kiev.ua>
On 28/03/2016 19:23, Konstantin Belousov wrote:
> On Mon, Mar 28, 2016 at 08:52:03AM -0700, Maxim Sobolev wrote:
>> Done some head scratching, it looks like it's got page fault in the
>> copyin() (cp(1) AFAIK mmaps source file). There might be some interlock
>> issue between competing write to the same ZFS, the md0 device is locked
>> forever waiting for the write operation to complete at the very same time.
>> I am curious as to whether we are allowed to sleep in the dmu_write_uio_dbuf(),
>> AFAIK dmu is ZFS's transaction layer, so maybe copyin() should be done
>> earlier to avoid possible page fault in there?

Maxim, is this copy from UFS to ZFS?
It looks like that because the copyin() fault goes to
vnode_pager_generic_getpages() -> bwait()...

> No idea about ZFS, but if the issue is due to copyin(9) recursing into
> VM and then VFS while owning file system locks, it is well-known and
> long-standing issue. I sometimes call it 'ups deadlock', for some
> reasons, see tools/test/upsdl/ for the distilled test case.
>
> It is handled for UFS and NFS, read the long comment starting with 'The
> vn_io_fault() is a wrapper' in sys/kern/vfs_vnops.c, which describes the
> deadlock in details and explains the mechanism which is used to prevent
> it. Filesystems must opt-in into it by specifiying MNTK_NO_IOPF flag,
> and then being ready to get an array of pages for io instead of the buffer
> KVA.

I don't have any idea why the thread would be stuck in bwait() and what locks
and threads are involved here.
But, as Kostik said, there is a general problem and I have a patch for ZFS:
https://reviews.freebsd.org/D2790

-- 
Andriy Gapon
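
To illustrate the ordering problem discussed above (touching user memory while a file system lock is held, versus doing the copy "earlier" as Maxim suggests), here is a minimal user-space sketch. It is an analogy only, not kernel code, and it is not how vn_io_fault() or the patch in D2790 works. All names (fs_obj, user_copy, write_naive, write_staged, OBJ_CAP) are hypothetical; the "lock" is a plain flag and the "page fault" is simulated by counting user-memory accesses made while that flag is set.

```c
#include <stddef.h>
#include <string.h>

#define OBJ_CAP 64

/* Hypothetical stand-in for a file object protected by a file system lock. */
struct fs_obj {
    int locked;             /* 1 while the simulated lock is held */
    int faults_under_lock;  /* user-memory accesses made while locked */
    char data[OBJ_CAP];
    size_t len;
};

/*
 * Stand-in for copyin(9). Reading user memory may page-fault, and in the
 * kernel such a fault can recurse into VM and VFS. Here we only record
 * whether the access happened while the lock was held.
 */
static void user_copy(struct fs_obj *o, void *dst, const void *usrc, size_t n)
{
    if (o->locked)
        o->faults_under_lock++;
    memcpy(dst, usrc, n);
}

/* Problematic ordering: user memory is touched with the lock held. */
static int write_naive(struct fs_obj *o, const void *usrc, size_t n)
{
    if (n > OBJ_CAP)
        return -1;
    o->locked = 1;
    user_copy(o, o->data, usrc, n);
    o->len = n;
    o->locked = 0;
    return 0;
}

/* "Copy earlier": stage the user data first, then lock and copy privately. */
static int write_staged(struct fs_obj *o, const void *usrc, size_t n)
{
    char stage[OBJ_CAP];

    if (n > OBJ_CAP)
        return -1;
    user_copy(o, stage, usrc, n);   /* any fault happens with no lock held */
    o->locked = 1;
    memcpy(o->data, stage, n);      /* source is private, already-touched memory */
    o->len = n;
    o->locked = 0;
    return 0;
}
```

The staging copy costs an extra pass over the data. The mechanism Kostik describes avoids the fault differently, by having the file system receive an array of pages instead of the buffer KVA; the sketch only shows why the order of "copy" and "lock" matters.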