Date: Sat, 15 Jun 2013 23:54:41 -0700
From: Jeremy Chadwick
To: Andre Albsmeier
Cc: "freebsd-stable@freebsd.org", John Baldwin
Subject: Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Message-ID: <20130616065441.GA15175@icarus.home.lan>
References: <20130531122611.GA6607@bali> <201305311051.03157.jhb@freebsd.org> <20130531172523.GA9188@bali>
In-Reply-To: <20130531172523.GA9188@bali>

On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > Each day at 5:15 we are generating snapshots on various machines.
> > > This used to work perfectly under 7-STABLE for years but since
> > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > of all cases.
> > >
> > > After rebooting we find a new snapshot file which is a bit
> > > smaller than the good ones and with different permissions
> > > It does not succeed a fsck. In this example it is the one
> > > whose name is beginning with s3:
> > >
> > > -r--r----- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > -r-------- 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > >
> > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > >
> > > May 29 05:15:00 palveli kernel: lock order reversal:
> > > May 29 05:15:00 palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > May 29 05:15:00 palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > May 29 05:15:04 palveli kernel: lock order reversal:
> > > May 29 05:15:04 palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > May 29 05:15:04 palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > >
> > > Unfortunatley no corefiles are being generated ;-(.
> > >
> > > I have checked and even rebuilt the (UFS1) fs in question
> > > from scratch. I have also seen this happen on an UFS2 on
> > > another machine and on a third one when running "dump -L"
> > > on a root fs.
> > >
> > > Any hints of how to proceed?
> >
> > Would it be possible to setup a serial console that is logged on this machine
> > to see if it is panic'ing but failing to write out a crashdump?
>
> I'll try to arrange that. It'll take a bit since this
> box is 200 km away...
>
> Maybe I'll find another one nearby to reproduce it...

SPECIFICALLY regarding "lack of crash dumps":

I need to see the following:

* cat /etc/rc.conf
* cat /etc/fstab

I may need output from other commands, but shall deal with that when I
see output from the above.

Thanks.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |