From: Boris Samorodov
To: "O. Hartmann"
Cc: Thierry Herbelot, freebsd-current@freebsd.org, peter@FreeBSD.org
Date: Sun, 05 Apr 2009 14:27:41 +0400
Subject: Re: Stuck kernel while cleaning up the object tree
Message-ID: <82316530@h30.sp.ipt.ru>
In-Reply-To: <49D88435.30900@mail.zedat.fu-berlin.de>
References: <200904041050.28932.thierry.herbelot@free.fr> <200904041151.18209.thierry.herbelot@free.fr> <10969763@bb.ipt.ru> <49D88435.30900@mail.zedat.fu-berlin.de>
List-Id: Discussions about the use of FreeBSD-current

On Sun, 05 Apr 2009 12:13:09 +0200 O.
Hartmann wrote:

> Boris Samorodov wrote:
> > On Sat, 4 Apr 2009 11:51:17 +0200 Thierry Herbelot wrote:
> >
> >> On Saturday 04 April 2009, Thierry Herbelot wrote:
> >>
> >>> Hello,
> >>>
> >>> On recent -current machines, I have seen a common pattern, with the machine
> >>> being frozen (still responsive to pings, though) in the initial phases of
> >>> the buildworld procedure:
> >>>
> >>> example freeze:
> >>> --------------------------------------------------------------
> >>> >>> stage 2.1: cleaning up the object tree
> >>> --------------------------------------------------------------
> >>> cd /usr/src; MAKEOBJDIRPREFIX=/usr/obj MACHINE_ARCH=i386 MACHINE=i386
> >>> CPUTYPE= GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin
> >>> GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font
> >>> GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac
> >>> _SHLIBDIRPREFIX=/usr/obj/usr/src/tmp VERSION="FreeBSD 8.0-CURRENT i386 800074"
> >>> INSTALL="sh /usr/src/tools/install.sh"
> >>> PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/usr/games:/usr/obj/usr/src/tmp/usr/sbin:/usr/obj/usr/src/tmp/usr/bin:/usr/obj/usr/src/tmp/usr/games:/sbin:/bin:/usr/sbin:/usr/bin
> >>> NO_CTF=1 make -f Makefile.inc1 DESTDIR=/usr/obj/usr/src/tmp par-cleandir
> >>> ===> share/info (cleandir)
> >>> ===> lib (cleandir)
> >>> ===> lib/csu/i386-elf (cleandir)
> >>> [type ^T in the console]
> >>> load: 0.00 cmd: sh 24587 [*Name Cache] 0.01u 0.00s 0% 1584k
> >>>
> >>> The other machines also froze while "cleaning up the object tree".
> >>>
> >>> The machines are configured with serial consoles: I have no kernel stack
> >>> backtrace to aid in pinpointing the cause of this freeze.
> >>>
> >>> Cheers
> >>>
> >>> TfH
> >>
> >> With a bit more investigation:
> >>
> >> on a separate ssh session, top is still live and shows processes stuck as:
> >> 24523 root 1 76 0 1888K 764K *Name 1 0:00 0.00% make
> >>
> >> on still another machine, running Witnesses (all other machines run with a
> >> lean GENERIC, with most of the debugging features commented out):
> >> System call __getcwd returning with the following locks held:
> >> shared rw Name Cache (Name Cache) r = 0 (0xc0ee7e1c) locked
> >> @ /usr/src/sys/kern/vfs_cache.c:974
> >
> > This is definitely related to:
> > SVN rev 190655 on 2009-04-02 21:16:20Z by peter
> > (peter@ CCed)
> >
> >> panic: witness_warn
> >> cpuid = 0
> >> KDB: enter: panic
>
> Is there a fix in sight soon? I now have this error/fault/lockup on
> ALL of my FreeBSD 8.0-CURRENT/amd64 machines.

I've reverted SVN rev 190655 and it's been OK for half a day now.

WBR
-- 
Boris Samorodov (bsam)
Research Engineer, http://www.ipt.ru Telephone & Internet SP
FreeBSD committer, http://www.FreeBSD.org The Power To Serve