Date: Sun, 16 Jun 2013 10:02:39 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: Jeremy Chadwick <jdc@koitsu.org>
Cc: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, John Baldwin <jhb@freebsd.org>
Subject: Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Message-ID: <20130616080239.GA73100@bali>
In-Reply-To: <20130616065441.GA15175@icarus.home.lan>
References: <20130531122611.GA6607@bali> <201305311051.03157.jhb@freebsd.org> <20130531172523.GA9188@bali> <20130616065441.GA15175@icarus.home.lan>

On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote:
> On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > This used to work perfectly under 7-STABLE for years but since
> > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > of all cases.
> > > >
> > > > After rebooting we find a new snapshot file which is a bit
> > > > smaller than the good ones and with different permissions
> > > > It does not succeed a fsck. In this example it is the one
> > > > whose name is beginning with s3:
> > > >
> > > > -r--r----- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > -r-------- 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > >
> > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > >
> > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > >
> > > > Unfortunately no corefiles are being generated ;-(.
> > > >
> > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > from scratch. I have also seen this happen on an UFS2 on
> > > > another machine and on a third one when running "dump -L"
> > > > on a root fs.
> > > >
> > > > Any hints of how to proceed?
> > >
> > > Would it be possible to setup a serial console that is logged on this machine
> > > to see if it is panic'ing but failing to write out a crashdump?
> >
> > I'll try to arrange that. It'll take a bit since this
> > box is 200 km away...
> >
> > Maybe I'll find another one nearby to reproduce it...
>
> SPECIFICALLY regarding "lack of crash dumps": I need to see the
> following:
>
> * cat /etc/rc.conf
> * cat /etc/fstab
>
> I may need output from other commands, but shall deal with that when I
> see output from the above. Thanks.

No problem, see below...

To make a long story short, the machine dumps core perfectly
(tested that a while ago), but not when dealing with _this_
issue... I dump on da1s1b and savecore fetches it from there
and puts it on /var (sitting on da0), that's faster.

rc.conf (beware, rc.conf.local exists):
---------------------------------------
rcshutdown_timeout=180
tmpmfs=YES
tmpsize="$(( `/sbin/sysctl -n hw.usermem` / 3000000 ))m"
tmpmfs_flags="$tmpmfs_flags -v 1 -n"
background_fsck=NO
nisdomainname=ofw.tld
pflog_flags=-S
syslogd_flags=-svv
inetd_enable=YES
inetd_flags=-l
named_flags="-S 1000"
named_chrootdir=""
rwhod_enable=YES
sshd_enable=YES
amd_enable=YES
amd_flags="-F /etc/amd.conf"
nfs_client_enable=YES
nfs_access_cache=2
mountd_flags=-n
rpcbind_enable=YES
ntpdate_enable=YES
ntpdate_hosts=ntp
ntpd_enable=YES
ntpd_flags="-p /var/run/ntpd.pid"
nis_client_enable=YES
nis_client_flags="-s -S ofw.tld,nis-16-1,nis-16-2"
nis_server_flags=-n
nis_yppasswdd_flags="-t /var/yp/src/master.passwd -f -v"
defaultrouter=192.168.16.2
keyrate=fast
sendmail_flags="-bd -q5m"
sendmail_submit_flags="$sendmail_flags -ODaemonPortOptions=Addr=localhost"
sendmail_msp_queue_flags="-Ac -q30m"
sendmail_rebuild_aliases=NO
lpd_enable=YES
lpd_flags=-s
chkprintcap_enable=YES
dumpdev=AUTO
clear_tmp_X=NO
ldconfig_paths=/usr/local/lib
ldconfig_paths_aout=""
entropy_file=/boot/entropy-file

rc.conf.local:
--------------
hostname=typhon.ofw.tld
ifconfig_msk0="inet 192.168.24.1/21"
ifconfig_msk0_alias0="inet 192.168.24.10/32"
named_enable=YES
nfs_server_enable=YES
nis_client_flags="-s -S ofw.tld,nis-24-1,nis-24-2"
nis_server_enable=YES
defaultrouter=192.168.24.2
lpd_flags=-l
dumpdev=/dev/da1s1b
quota_enable=YES

fstab:
------
/dev/da0s1a   /         ufs     noatime,rw                                 0 1
/dev/da0s1b   none      swap    sw                                         0 0
proc          /proc     procfs  rw                                         0 0
/dev/da0s1d   /usr      ufs     noatime,rw                                 0 2
/dev/da0s1e   /var      ufs     noatime,nosuid,rw                          0 2
/dev/da10p1   /share2   ufs     suiddir,groupquota,noatime,nosuid,rw       0 2
/dev/da10p2   /raid2    ufs     userquota,noatime,nosuid,rw                0 2