From: Kostik Belousov
To: Andriy Gapon
Cc: freebsd-fs@freebsd.org
Date: Sun, 10 Oct 2010 15:15:32 +0300
Subject: Re: Locked up processes after upgrade to ZFS v15
Message-ID: <20101010121532.GG2392@deviant.kiev.zoral.com.ua>
In-Reply-To: <4CB18BC6.70305@freebsd.org>
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <201010061732.o96HW2Vi005945@higson.cam.lispworks.com> <4CAF45A8.3020401@icyb.net.ua> <4CB18BC6.70305@freebsd.org>

On Sun, Oct 10, 2010 at 12:47:50PM +0300, Andriy Gapon wrote:
> on 08/10/2010 19:24 Andriy Gapon said the following:
> > on 06/10/2010 21:51 Kai Gallasch said the following:
> >>
> >> On 06.10.2010 at 19:32, Martin Simmons wrote:
> >>
> >>>>>>>> On Wed, 6 Oct 2010 14:28:31 +0200, Kai Gallasch said:
> >>>>
> >>>> How can I debug this and get further information?
> >>>
> >>> procstat -k -k $pid will generate a backtrace (or replace $pid by -a for all
> >>> processes).
> >>
> >> procstat for process 12111 (state: zfs)
> >> sonnenkraft:~ # procstat -k -k 12111
> >>   PID    TID COMM             TDNAME           KSTACK
> >> 12111 102385 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d __lockmgr_args+0x7ae vop_stdlock+0x39 VOP_LOCK1_APV+0x9b _vn_lock+0x57 vget+0x7b cache_lookup+0x4e0 vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0xb7 lookup+0x3d3 namei+0x457 vn_open_cred+0x1e3 kern_openat+0x181 syscall+0x102 Xfast_syscall+0xe2
> >>
> >> procstat for process 24731 (state: zfsmrb)
> >> # procstat -k -k 24731
> >>   PID    TID COMM             TDNAME           KSTACK
> >> 24731 102273 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x369 zfs_freebsd_read+0x2a6 VOP_READ_APV+0xaf vnode_pager_generic_getpages+0x3ea VOP_GETPAGES_APV+0xb5 vnode_pager_getpages+0x8c vm_fault+0x685 trap_pfault+0x128 trap+0x52c calltrap+0x8
>
> Hm, I think that we actually shouldn't see a stack like that.
> vm_fault sets VPO_BUSY on a page before calling vnode_pager_generic_getpages, so
> the thread gets stuck forever in zfs mappedread.
> It seems like the page that was seen as invalid in vm_fault becomes valid while
> call flow reaches mappedread.

The vnode is share-locked, and the vm object lock is dropped and reacquired
several times before control reaches zfs_mappedread. This indeed allows
a window during which the page might be read by another thread.

There are two possible routes to solve the issue:
1. Provide a zfs-specific VOP_GETPAGES().
2. Use my vm6 patch.

Sigh.

>
> >> In my original post I wrote that only apache httpd processes would lock up..
> >> This is wrong. Several other non-httpd processes also got stuck in state zfs or zfsmrb.
> >
> > Interesting.
> > It's possible that TID 102385 might be waiting on a vnode lock held by TID 102273.
> > But TID 102273 seems to be waiting on a vnode's page lock.
> > It would be very interesting to learn what process has that page busy, for how
> > long and why.
> > Perhaps there is a code path that busies a page, but never un-busies it...
> >
>
>
> -- 
> Andriy Gapon
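
For readers following the thread, the window discussed above can be modelled in a few lines of user-space Python. This is only a sketch under stated assumptions, not kernel code: `Page`, `fault_begin`, `other_thread_fills`, `mappedread_step` and `simulate` are hypothetical names invented for illustration, and the model assumes that the read path sleeps on any valid page that is marked busy, which is what the zfsmrb stack above suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Toy stand-in for a VM page (hypothetical, not the kernel struct)."""
    valid: bool = False
    busy_owner: Optional[int] = None  # TID that marked the page busy, if any


def fault_begin(page: Page, tid: int) -> None:
    """Fault path: the page looks invalid, so the faulting thread busies it."""
    assert not page.valid
    page.busy_owner = tid


def other_thread_fills(page: Page) -> None:
    """The window: while the object lock is dropped, the page becomes valid."""
    page.valid = True


def mappedread_step(page: Page, tid: int) -> str:
    """Read path reached from getpages, as assumed in this model."""
    if not page.valid:
        return "read from disk"
    if page.busy_owner is None:
        return "copy from page"
    if page.busy_owner == tid:
        # The only thread that could clear the busy flag is the sleeper itself.
        return "sleep forever: waiting on a page this thread busied"
    return "sleep until the owner un-busies the page"


def simulate(race: bool, tid: int = 102273) -> str:
    """Run the fault path once, with or without the racing fill."""
    page = Page()
    fault_begin(page, tid)
    if race:
        other_thread_fills(page)
    return mappedread_step(page, tid)


if __name__ == "__main__":
    print("no race:", simulate(race=False))
    print("race:   ", simulate(race=True))
```

Without the racing fill the page is still invalid when the read path looks at it, so the model never reaches the sleep. With the fill, the thread ends up waiting on its own busy flag, which matches the "stuck forever" description in the quoted text.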