From: John Baldwin
To: freebsd-stable@freebsd.org
Subject: Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Date: Fri, 31 May 2013 10:51:03 -0400

On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> Each day at 5:15 we are generating snapshots on various machines.
> This used to work perfectly under 7-STABLE for years but since
> we started to use 9.1-STABLE the machine reboots in about 10%
> of all cases.
>
> After rebooting we find a new snapshot file which is a bit
> smaller than the good ones and with different permissions.
> It does not pass fsck.
> In this example it is the one
> whose name begins with s3:
>
> -r--r----- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> -r-------- 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
>
> After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> I see the following LORs (mksnap_ffs starts exactly at 5:15):
>
> May 29 05:15:00 palveli kernel: lock order reversal:
> May 29 05:15:00 palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> May 29 05:15:00 palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> May 29 05:15:04 palveli kernel: lock order reversal:
> May 29 05:15:04 palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> May 29 05:15:04 palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
>
> Unfortunately no corefiles are being generated ;-(.
>
> I have checked and even rebuilt the (UFS1) fs in question
> from scratch. I have also seen this happen on a UFS2 on
> another machine and on a third one when running "dump -L"
> on a root fs.
>
> Any hints of how to proceed?

Would it be possible to set up a serial console that is logged on this machine
to see if it is panicking but failing to write out a crashdump?

-- 
John Baldwin
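
For the logging side of a serial console, a rough sketch (not something
from this thread): assuming the console output of the affected machine
reaches a second machine as a text stream, for example piped from a
terminal program attached to the serial port, something like the following
could timestamp each line and flush it to a file so the last messages
before a reboot are kept. The function name log_console and its arguments
are hypothetical, chosen only for illustration.

```python
import time


def log_console(stream, log, clock=time.time):
    """Copy each line of `stream` to `log`, prefixed with a UTC timestamp.

    `stream` is any iterable of text lines (e.g. sys.stdin) and `log` is any
    writable text file object. Returns the number of lines written.
    """
    count = 0
    for line in stream:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(clock()))
        # Drop the trailing newline/CR and any other trailing whitespace.
        log.write(f"{stamp} {line.rstrip()}\n")
        # Flush after every line so output just before a reboot is not lost.
        log.flush()
        count += 1
    return count
```

It could be driven with sys.stdin as the stream and a file opened in append
mode as the log; how the serial output is fed to stdin depends on the local
setup and is left open here.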