Date: Mon, 28 Mar 2016 19:23:10 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Maxim Sobolev <sobomax@sippysoft.com>
Cc: stable@freebsd.org, freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>, Kirk McKusick <mckusick@mckusick.com>, kib@freebsd.org
Subject: Re: Process stuck in "vnread"
Message-ID: <20160328162310.GJ1741@kib.kiev.ua>
In-Reply-To: <CAH7qZftHP0b30AnF4Fds9%2BotY0Y24HMFuO=RmkqcBJD3wFNkHg@mail.gmail.com>
References: <CAH7qZfs3EwT8jnKyodHxF_5nK18MeLSaB_F-qqOfwV0MJMD7Vg@mail.gmail.com> <CAH7qZfssCPxc_uuMoxwAqa6qdi1y=VCqRT6hk-=mTU15RwOCAg@mail.gmail.com> <CAH7qZftHP0b30AnF4Fds9%2BotY0Y24HMFuO=RmkqcBJD3wFNkHg@mail.gmail.com>

On Mon, Mar 28, 2016 at 08:52:03AM -0700, Maxim Sobolev wrote:
> Done some head scratching, it looks like it's got page fault in the
> copyin() (cp(1) AFAIK mmaps source file). There might be some interlock
> issue between competing write to the same ZFS, the md0 device is locked
> forever waiting for the write operation to complete at the very same time.
> I am curious as to whether we are allowed to sleep in the dmu_write_uio_dbuf(),
> AFAIK dmu is ZFS's transaction layer, so maybe copyin() should be done
> earlier to avoid possible page fault in there?

No idea about ZFS, but if the issue is due to copyin(9) recursing into VM
and then VFS while owning file system locks, it is a well-known and
long-standing issue. I sometimes call it the 'ups deadlock', for some
reason; see tools/test/upsdl/ for the distilled test case.

It is handled for UFS and NFS. Read the long comment starting with
'The vn_io_fault() is a wrapper' in sys/kern/vfs_vnops.c, which describes
the deadlock in detail and explains the mechanism used to prevent it.
Filesystems must opt in by specifying the MNTK_NO_IOPF flag, and must then
be ready to get an array of pages for I/O instead of the buffer KVA.
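
To make the access pattern concrete, here is a minimal userland sketch of the kind of I/O that can force copyin(9) to page-fault while the file system is already inside a write: the buffer passed to pwrite(2) is a never-touched mmap(2) of the very file being written. This is not the code in tools/test/upsdl/ and is not taken from the thread. The function name write_from_own_mapping and the /tmp path template are assumptions made for illustration. On a file system covered by the vn_io_fault() mechanism the call should simply complete.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write `len` bytes into a
 * temporary file, using as the source buffer a shared mapping of that
 * same file. The mapping is never touched beforehand, so the kernel's
 * copy from user space has to fault the pages in from the file while
 * the write to that file is in progress.
 *
 * Returns 0 if the full write completed, -1 on any error.
 */
int
write_from_own_mapping(size_t len)
{
	char path[] = "/tmp/iofault.XXXXXX";	/* assumed scratch location */
	void *buf;
	ssize_t n;
	int fd;

	assert(len > 0);

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	(void)unlink(path);		/* file disappears once fd is closed */

	/* Extend the file so that the whole mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) == -1) {
		(void)close(fd);
		return (-1);
	}

	buf = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		(void)close(fd);
		return (-1);
	}

	/* Source buffer and destination are the same file. */
	n = pwrite(fd, buf, len, 0);

	(void)munmap(buf, len);
	(void)close(fd);
	return (n == (ssize_t)len ? 0 : -1);
}
```

Calling write_from_own_mapping(65536) from a small test program and checking for a return value of 0 is enough to exercise the path. A single thread like this only shows the recursion from the write into a page fault on the same file; an actual hang generally also needs competing lockers, which this sketch does not attempt to provide.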