Date: Thu, 4 Jul 2013 08:24:40 +0300
From: Konstantin Belousov
To: Andre Albsmeier
Cc: freebsd-stable@freebsd.org, John Baldwin
Subject: Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Message-ID: <20130704052440.GG91021@kib.kiev.ua>
In-Reply-To: <20130704051409.GA22021@bali>
References: <20130531122611.GA6607@bali> <201305311051.03157.jhb@freebsd.org> <20130616063942.GA72803@bali> <201306171530.31208.jhb@freebsd.org> <20130704051409.GA22021@bali>

On Thu, Jul 04, 2013 at 07:14:09AM +0200, Andre Albsmeier wrote:
> On Mon, 17-Jun-2013 at 21:30:31 +0200, John Baldwin wrote:
> > On Sunday, June 16, 2013 2:39:42 am Andre Albsmeier wrote:
> > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > > This used to work perfectly under 7-STABLE for years but since
> > > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > > of all cases.
> > > > >
> > > > > After rebooting we find a new snapshot file which is a bit
> > > > > smaller than the good ones and with different permissions
> > > > > It does not succeed a fsck.
> > > > > In this example it is the one whose name is beginning with s3:
> > > > >
> > > > > -r--r----- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > > -r-------- 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > >
> > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > >
> > > > > May 29 05:15:00 palveli kernel: lock order reversal:
> > > > > May 29 05:15:00 palveli kernel:  1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > > May 29 05:15:00 palveli kernel:  2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > > May 29 05:15:04 palveli kernel: lock order reversal:
> > > > > May 29 05:15:04 palveli kernel:  1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > > May 29 05:15:04 palveli kernel:  2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > >
> > > > > Unfortunately no corefiles are being generated ;-(.
> > > > >
> > > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > > from scratch. I have also seen this happen on an UFS2 on
> > > > > another machine and on a third one when running "dump -L"
> > > > > on a root fs.
> > > > >
> > > > > Any hints of how to proceed?
> > > >
> > > > Would it be possible to setup a serial console that is logged on this machine
> > > > to see if it is panic'ing but failing to write out a crashdump?
> > >
> > > Couldn't attach the serial console yet ;-(.
> > > But I had people attach a KVMoverIP switch and enabled the various
> > > KDB options in the kernel. Now we can see a bit more (see below) -- no
> > > crashdump is being generated though.
> >
> > :( Unfortunately these LORs don't really help with discerning the cause of
> > the reboot. If you have remote power access (and still wanted to test this)
> > one option would be to change KDB to drop into the debugger on a panic.
> > Then you could connect over the KVM and take images of the original panic
> > along with a stack trace.
>
> After a few days of no problems, the box decided to crash
> during mksnap_ffs today ;-(. But now I have a crashdump,
> see below. Unfortunately, I cannot upload the dump somewhere
> but if you ask me to check anything, I'll be happy to help.
>
> kgdb /usr/obj/src/src-9/sys/palveli/kernel.debug vmcore.4
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0xcfb5e000
> fault code              = supervisor write, page not present
> instruction pointer     = 0x20:0xc07cb2fe
> stack pointer           = 0x28:0xd83545d0
> frame pointer           = 0x28:0xd835490c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12929 (mksnap_ffs)
> trap number             = 12
> panic: page fault
> KDB: stack backtrace:
> db_trace_self_wrapper(c08207eb,d835441c,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd83543ec
> kdb_backtrace(c081df13,c08a82e0,c0801bfa,d8354428,d8354428,...) at kdb_backtrace+0x29/frame 0xd83543f8
> panic(c0801bfa,c0845a01,c2bafae4,1,1,...) at panic+0xc9/frame 0xd835441c
> trap_fatal(c0ff6000,cfb5e000,2,0,265abf,...) at trap_fatal+0x353/frame 0xd835445c
> trap_pfault(140da,0,c2baf930,c08b6a40,c282145c,...) at trap_pfault+0x2d7/frame 0xd83544a4
> trap(d8354590) at trap+0x41a/frame 0xd8354584
> calltrap() at calltrap+0x6/frame 0xd8354584
> --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd83545d0, ebp = 0xd835490c ---
> bcopy(c2b36548,c2f194e0,0,0,0,...) at bcopy+0x1a/frame 0xd835490c
> ffs_mount(c2b36548,c2db9000,ff,d8354c08,c2b665e4,...) at ffs_mount+0x15ee/frame 0xd8354a3c

From the crash dump in kgdb, do
    list *ffs_mount+0x15ee

> vfs_donmount(c2baf930,10313108,0,c2b8ba80,c2b8ba80,...) at vfs_donmount+0x196b/frame 0xd8354c2c
> sys_nmount(c2baf930,d8354ccc,c2bafc18,d8354c6c,c0605015,...) at sys_nmount+0x63/frame 0xd8354c50
> syscall(d8354d08) at syscall+0x2ce/frame 0xd8354cfc
> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd8354cfc
> --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
> Uptime: 2d21h49m21s
> Physical memory: 503 MB
> Dumping 108 MB: 93 77 61 45 29 13
>
> No symbol "stopped_cpus" in current context.
> No symbol "stoppcbs" in current context.
> #0  doadump (textdump=1) at pcpu.h:249
> 249     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) where
> #0  doadump (textdump=1) at pcpu.h:249
> #1  0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
> #2  0xc05fe028 in panic (fmt=) at /src/src-9/sys/kern/kern_shutdown.c:637
> #3  0xc07cd1d3 in trap_fatal (frame=0xd8354590, eva=3484803072)
>     at /src/src-9/sys/i386/i386/trap.c:1044
> #4  0xc07cd4b7 in trap_pfault (frame=0xd8354590, usermode=0, eva=3484803072)
>     at /src/src-9/sys/i386/i386/trap.c:957
> #5  0xc07ce05a in trap (frame=0xd8354590) at /src/src-9/sys/i386/i386/trap.c:555
> #6  0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
> #7  0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:196
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> -Andre
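
The numbers in the quoted trap report and kgdb frames can be cross-checked
with a little arithmetic. The following is only a minimal sketch in Python,
using values copied from the output above; the variable names and the 4 KB
page size constant (the standard i386 page size) are illustrative choices,
and the remark about the page boundary is an interpretation, not something
confirmed in the thread.

```python
# Values copied from the panic message and the kgdb frames quoted above.
fault_va = 0xcfb5e000       # "fault virtual address"
eva_decimal = 3484803072    # "eva=" as printed by kgdb in trap_fatal/trap_pfault
eip = 0xc07cb2fe            # instruction pointer at the time of the trap
bcopy_offset = 0x1a         # from "bcopy+0x1a" in the KDB backtrace

PAGE_SIZE = 4096            # i386 base page size (assumed here, not from the log)

# kgdb prints eva in decimal; it should be the same address as the fault VA.
same_address = (eva_decimal == fault_va)

# Subtracting the offset from eip gives the start of bcopy in this kernel image.
bcopy_start = eip - bcopy_offset

# The faulting address sits exactly on a page boundary, which would be
# consistent with a copy running off the end of a mapped page (interpretation).
on_page_boundary = (fault_va % PAGE_SIZE == 0)

print(hex(eva_decimal), same_address)   # 0xcfb5e000 True
print(hex(bcopy_start))                 # 0xc07cb2e4
print(on_page_boundary)                 # True
```

This does not replace the requested "list *ffs_mount+0x15ee" output, which is
what identifies the source line that called bcopy.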