From owner-freebsd-stable@freebsd.org Wed Mar 2 11:04:43 2016
Date: Wed, 2 Mar 2016 03:04:40 -0800
From: Maxim Sobolev
To: Konstantin Belousov
Cc: stable@freebsd.org, freebsd-fs@freebsd.org, Kirk McKusick
Subject: Re: Process stuck in "vnread"
References: <20160302095339.GB67250@kib.kiev.ua>

Sorry, Gmail hit send too early.
Backtrace from the md worker:

[Switching to thread 357 (Thread 101131)]#0  0xffffffff8095244e in sched_switch ()
(kgdb) bt
#0  0xffffffff8095244e in sched_switch ()
#1  0xffffffff809313b1 in mi_switch ()
#2  0xffffffff8097089a in sleepq_wait ()
#3  0xffffffff808d344d in _cv_wait ()
#4  0xffffffff81a42185 in ?? ()
#5  0xfffff803096d3960 in ?? ()
#6  0x0000000000000000 in ?? ()

On Wed, Mar 2, 2016 at 3:02 AM, Maxim Sobolev wrote:

> Thanks, Konstantin.
>
> Re: md(4) state:
>
>     0 88688     0   0  -8  0      0     16 tx->tx_s DL    -    0:45.43 [md0]
>
> Its backtrace:
>
>
> About the backtrace: indeed, it looks like you are right and some portion
> of it is not decoded properly, because the code is loaded as a kernel
> module. The setup is even more complicated: /usr/ports is mounted via
> NULLFS, so in this command:
>
> cp /usr/local/share/automake-1.15/compile ./compile
>
> the target (i.e. ./compile) is a path on ZFS that is exported via NULLFS,
> while the source is a file on UFS2->md->ZFS. This is probably the reason
> the stack trace is incomplete: both zfs.ko and nullfs.ko are loaded as
> modules, and the next few frames point into them. Unfortunately, I cannot
> get kgdb to read symbols from those .ko's and decode them.
>
> #13 0xffffffff80cb36f1 in copyin ()
> #14 0xffffffff80977ddf in uiomove_faultflag ()
> #15 0xffffffff819f699c in ?? ()
> #16 0xfffffe0468a861a0 in ?? ()
> #17 0xfffff80000000000 in ?? ()
> #18 0xfffffe0468a861a0 in ?? ()
> #19 0xfffff80176b39420 in ?? ()
> #20 0x0000000000000001 in ?? ()
>
> $ kldstat | grep 0xffffffff819
>  2    1 0xffffffff819bd000 aef8     nullfs.ko
>  3    1 0xffffffff819c8000 2fd2f0   zfs.ko
>
>
> On Wed, Mar 2, 2016 at 1:53 AM, Konstantin Belousov wrote:
>
>> On Wed, Mar 02, 2016 at 01:12:31AM -0800, Maxim Sobolev wrote:
>> > Hi, I've encountered a cp(1) process stuck in the vnread state on one
>> > of my build machines that was recently upgraded to 10.3.
>> >
>> >     0 79596     1   0  20  0  17092  1396 wait     I     1    0:00.00 /bin/sh /usr/local/bin/autoreconf -f -i
>> >     0 79602 79596   0  52  0  41488  9036 wait     I     1    0:00.07 /usr/local/bin/perl -w /usr/local/bin/autoreconf-2.69 -f -i
>> >     0 79639 79602   0  72  0      0     0 -        Z     1    0:00.27
>> >     0 79762 79602   0  20  0  17092  1396 wait     I     1    0:00.00 /bin/sh /usr/local/bin/automake --add-missing --copy --force-missing
>> >     0 79768 79762   0  52  0  49736 13936 wait     I     1    0:00.11 /usr/local/bin/perl -w /usr/local/bin/automake-1.15 --add-missing --copy --force-missing
>> >     0 79962 79768   0  20  0  12368  1024 vnread   DL    1    0:00.00 cp /usr/local/share/automake-1.15/compile ./compile
>> >
>> > I am not sure if it's related to that OS version upgrade, but I have
>> > not seen any such issues on the same machine in 2-3 years of running
>> > essentially the same build process with versions 9.x, 10.0, 10.1 and
>> > 10.2.
>> >
>> > $ uname -a
>> > FreeBSD van01.sippysoft.com 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1 80de3e2(master)-dirty: Tue Feb 2 12:19:57 PST 2016 sobomax@abc.sippysoft.com:/usr/obj/usr/home/sobomax/projects/freebsd103/sys/ABC amd64
>> >
>> > The kernel stack trace is:
>> >
>> > (kgdb) thread 360
>> > [Switching to thread 360 (Thread 100515)]#0  0xffffffff8095244e in sched_switch ()
>> > (kgdb) bt
>> > #0  0xffffffff8095244e in sched_switch ()
>> > #1  0xffffffff809313b1 in mi_switch ()
>> > #2  0xffffffff8097089a in sleepq_wait ()
>> > #3  0xffffffff80930dd7 in _sleep ()
>> > #4  0xffffffff809b230e in bwait ()
>> > #5  0xffffffff80b511f3 in vnode_pager_generic_getpages ()
>> > #6  0xffffffff80dd1607 in VOP_GETPAGES_APV ()
>> > #7  0xffffffff80b4f59a in vnode_pager_getpages ()
>> > #8  0xffffffff80b30031 in vm_fault_hold ()
>> > #9  0xffffffff80b2f797 in vm_fault ()
>> > #10 0xffffffff80cb5a75 in trap_pfault ()
>> > #11 0xffffffff80cb51dd in trap ()
>> > #12 0xffffffff80c9b122 in calltrap ()
>> > #13 0xffffffff80cb36f1 in copyin ()
>> > #14 0xffffffff80977ddf in uiomove_faultflag ()
>> The backtrace indicates, with 99% certainty, that the issue is in the
>> requested read never finishing. But the backtrace is obviously not
>> complete, and there might be something more happening. At least, we
>> have not handled page-ins during uiomove() on user I/O for quite some
>> time.
>>
>> If the vnode whose I/O hung is on UFS over md, you should look at the md
>> worker thread state.
>>
>> >
>> > The FS stack configuration is somewhat unique, so I am not sure if I am
>> > hitting some rare race condition or lock ordering issue specific to
>> > it. It's basically ZFS (ZRAID) on top of a pair of SATA SSDs, with a
>> > big file on that FS attached via md(4) and UFS2 on that md(4). The
>> > build itself runs in a chroot with that UFS2 fs as its primary root.
>> >
>> > As an additional bit of info, attempting to list the directory with
>> > that UFS image also got my bash process stuck in the "zfs" state. The
>> > backtrace from that is:
>> A deadlock in the underlying I/O layer is consistent with this
>> (secondary) observation.
>>
>> >
>> > (kgdb) thread 353
>> > [Switching to thread 353 (Thread 100508)]#0  0xffffffff8095244e in sched_switch ()
>> > (kgdb) bt
>> > #0  0xffffffff8095244e in sched_switch ()
>> > #1  0xffffffff809313b1 in mi_switch ()
>> > #2  0xffffffff8097089a in sleepq_wait ()
>> > #3  0xffffffff809069ad in sleeplk ()
>> > #4  0xffffffff809060e0 in __lockmgr_args ()
>> > #5  0xffffffff809b8b7c in vop_stdlock ()
>> > #6  0xffffffff80dd0a3b in VOP_LOCK1_APV ()
>> > #7  0xffffffff809d6d23 in _vn_lock ()
>> > #8  0xffffffff81a8c9cd in ?? ()
>> > #9  0x0000000000000000 in ?? ()
>>
>
>
> --
> Maksym Sobolyev
> Sippy Software, Inc.
> Internet Telephony (VoIP) Experts
> Tel (Canada): +1-778-783-0474
> Tel (Toll-Free): +1-855-747-7779
> Fax: +1-866-857-6942
> Web: http://www.sippysoft.com
> MSN: sales@sippysoft.com
> Skype: SippySoft
>
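
The message notes that the undecoded "??" frames probably belong to zfs.ko or nullfs.ko. A rough way to check that by hand is to compare each unresolved address against the load ranges printed by kldstat. The following is only a minimal sketch, not part of the original thread: the helper names parse_kldstat and locate are hypothetical, and it assumes the two numeric columns shown in the quoted kldstat output are the load address and the size, both in hexadecimal.

```python
import re

# kldstat lines copied from the message (columns assumed: Id, Refs, Address, Size, Name).
KLDSTAT = """\
 2    1 0xffffffff819bd000 aef8     nullfs.ko
 3    1 0xffffffff819c8000 2fd2f0   zfs.ko
"""

# One kldstat row: id, refs, 0x-prefixed hex address, bare hex size, module name.
ROW = re.compile(r"\s*\d+\s+\d+\s+0x([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(\S+)")


def parse_kldstat(text):
    """Return a list of (name, base, size) tuples; non-matching lines are skipped."""
    modules = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            modules.append((m.group(3), int(m.group(1), 16), int(m.group(2), 16)))
    return modules


def locate(addr, modules):
    """Return (module name, offset from load address), or None if no module contains addr."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name, addr - base
    return None


if __name__ == "__main__":
    mods = parse_kldstat(KLDSTAT)
    # Unresolved frames taken from the three backtraces in the thread.
    for addr in (0xffffffff819f699c, 0xffffffff81a42185, 0xffffffff81a8c9cd):
        hit = locate(addr, mods)
        if hit is None:
            print(f"{addr:#x}: not inside a listed module")
        else:
            print(f"{addr:#x}: {hit[0]} + {hit[1]:#x}")
```

Under those assumptions, frame #15 of the cp thread, frame #4 of the md worker, and frame #8 of the bash thread all fall inside the zfs.ko range. Values such as 0xfffffe0468a861a0 match no module, which suggests they are stack data rather than return addresses. The offset is measured from the module's load address only and is not claimed to map directly onto symbol offsets in the .ko file.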