From: Konstantin Belousov
To: Andre Albsmeier
Cc: Kirk McKusick, "freebsd-stable@freebsd.org", John Baldwin
Date: Fri, 12 Jul 2013 09:35:33 +0300
Subject: Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Message-ID: <20130712063533.GZ91021@kib.kiev.ua>
In-Reply-To: <20130712060527.GA483@bali>

On Fri, Jul 12, 2013 at 08:05:27AM +0200, Andre Albsmeier wrote:
> On Fri, 12-Jul-2013 at 08:01:12 +0200, Konstantin Belousov wrote:
> > On Fri, Jul 12, 2013 at 07:24:40AM +0200, Andre Albsmeier wrote:
> > > On Thu, 04-Jul-2013 at 19:25:28 +0200, Konstantin Belousov wrote:
> > > > On Thu, Jul 04, 2013 at 04:29:19PM +0200, Andre Albsmeier wrote:
> > > > > OK, patch is applied. I will reboot the machine later
> > > > > and see what happens tomorrow in the morning. However,
> > > > > it might take a few days since the last 2 weeks all was
> > > > > fine.
> > > > >
> > > > > BTW, should this patch be used in general or is it just
> > > > > for debugging? My understanding is that it is something
> > > > > which could stay in the code...
> > > >
> > > > Patch is to improve debugging.
> > > >
> > > > I probably commit it after the issue is closed. Arguments against
> > > > the commit is that the change imposes small performance penalty
> > > > due to save and restore of the %ebp (I doubt that this is measureable
> > > > by any means). Also, arguably, such change should be done for all
> > > > functions in support.s, but bcopy() is the hot spot.
> > >
> > > Got a new one, 2 hours old ;-)
> > >
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "i386-marcel-freebsd"...
> > >
> > > Unread portion of the kernel message buffer:
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > fault virtual address = 0xcd5ec000
> > > fault code = supervisor write, page not present
> > > instruction pointer = 0x20:0xc07cb2fe
> > > stack pointer = 0x28:0xd82e45cc
> > > frame pointer = 0x28:0xd82e45d4
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > = DPL 0, pres 1, def32 1, gran 1
> > > processor eflags = interrupt enabled, resume, IOPL = 0
> > > current process = 18714 (mksnap_ffs)
> > > trap number = 12
> > > panic: page fault
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper(c08207eb,d82e4418,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd82e43e8
> > > kdb_backtrace(c081df13,c08a82e0,c0801bfa,d82e4424,d82e4424,...) at kdb_backtrace+0x29/frame 0xd82e43f4
> > > panic(c0801bfa,c0845a01,c2b067d4,1,1,...) at panic+0xc9/frame 0xd82e4418
> > > trap_fatal(c0ff6000,cd5ec000,2,0,c08b6bf4,...) at trap_fatal+0x353/frame 0xd82e4458
> > > trap_pfault(baa8454b,21510,0,c2b06620,c08b6bf0,...) at trap_pfault+0x2d7/frame 0xd82e44a0
> > > trap(d82e458c) at trap+0x41a/frame 0xd82e4580
> > > calltrap() at calltrap+0x6/frame 0xd82e4580
> > > --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd82e45cc, ebp = 0xd82e45d4 ---
> > > bcopy(c36ed000,cd5e6000,8000,8000,c281b980,...) at bcopy+0x1a/frame 0xd82e45d4
> > > ffs_snapshot(c2b35a90,c2ed0400,0,0,0,...) at ffs_snapshot+0x2933/frame 0xd82e490c
> > > ffs_mount(c2b35a90,c322e200,ff,d82e4c08,c2ccbc8c,...) at ffs_mount+0x15ee/frame 0xd82e4a3c
> > > vfs_donmount(c2b06620,10313108,0,c2b74d80,c2b74d80,...) at vfs_donmount+0x196b/frame 0xd82e4c2c
> > > sys_nmount(c2b06620,d82e4ccc,c2b06908,d82e4c6c,c0605015,...) at sys_nmount+0x63/frame 0xd82e4c50
> > > syscall(d82e4d08) at syscall+0x2ce/frame 0xd82e4cfc
> > > Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd82e4cfc
> > > --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
> > > Uptime: 4d20h0m44s
> > > Physical memory: 503 MB
> > > Dumping 104 MB: 89 73 57 41 25 9
> > >
> > > No symbol "stopped_cpus" in current context.
> > > No symbol "stoppcbs" in current context.
> > > #0 doadump (textdump=1) at pcpu.h:249
> > > 249 pcpu.h: No such file or directory.
> > > in pcpu.h
> > > (kgdb) where
> > > #0 doadump (textdump=1) at pcpu.h:249
> > > #1 0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
> > > #2 0xc05fe028 in panic (fmt=) at /src/src-9/sys/kern/kern_shutdown.c:637
> > > #3 0xc07cd1d3 in trap_fatal (frame=0xd82e458c, eva=3445538816)
> > > at /src/src-9/sys/i386/i386/trap.c:1044
> > > #4 0xc07cd4b7 in trap_pfault (frame=0xd82e458c, usermode=0, eva=3445538816)
> > > at /src/src-9/sys/i386/i386/trap.c:957
> > > #5 0xc07ce05a in trap (frame=0xd82e458c) at /src/src-9/sys/i386/i386/trap.c:555
> > > #6 0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
> > > #7 0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:198
> > > #8 0xc072be13 in ffs_snapshot (mp=0xc2b35a90, snapfile=0xc2ed0400 "s5-2013.07.12-03.15.01")
> > > at /src/src-9/sys/ufs/ffs/ffs_snapshot.c:793
> > > #9 0xc0748e8e in ffs_mount (mp=0xc2b35a90) at /src/src-9/sys/ufs/ffs/ffs_vfsops.c:483
> > > #10 0xc068a72b in vfs_donmount (td=0xc2b06620, fsflags=271659272, fsoptions=0xc2b74d80)
> > > at /src/src-9/sys/kern/vfs_mount.c:948
> > > #11 0xc068a8e3 in sys_nmount (td=0xc2b06620, uap=0xd82e4ccc) at /src/src-9/sys/kern/vfs_mount.c:417
> > > #12 0xc07cd7ae in syscall (frame=0xd82e4d08) at subr_syscall.c:135
> > > #13 0xc07ba8f1 in Xint0x80_syscall () at /src/src-9/sys/i386/i386/exception.s:270
> > > #14 0x00000033 in ?? ()
> > > Previous frame inner to this frame (corrupt stack?)
> >
> > Please show me the first 100 lines of the output of dumpfs(8) on the
> > filesystem where snapshot creation caused the panic.
>
> OK, dumpfs /dev/stripe/p | head -100:
>
> magic 11954 (UFS1) time Fri Jul 12 08:02:40 2013
> id [ 517fa356 4ecc9335 ]
> ncg 82 size 17774144 blocks 17737399
> bsize 32768 shift 15 mask 0xffff8000
> fsize 4096 shift 12 mask 0xfffff000
> frag 8 shift 3 fsbtodb 3
> minfree 8% optim time symlinklen 60
> maxbpg 4096 maxcontig 4 contigsumsize 4
> nbfree 1958555 ndir 695 nifree 1123668 nffree 5395
> cpg 1 bpg 27415 fpg 219320 ipg 13824
> nindir 8192 inopb 256 nspf 8 maxfilesize 18016597801566207
> sbsize 4096 cgsize 32768 cgoffset 0 cgmask 0xffffffff
> csaddr 456 cssize 4096
> rotdelay 0ms rps 60 trackskew 0 interleave 1
> nsect 1754560 npsect 1754560 spc 1754560
> sblkno 8 cblkno 16 iblkno 24 dblkno 456
> cgrotor 50 fmod 0 ronly 0 clean 0
> metaspace 0 avgfpdir 64 avgfilesize 16384
> flags soft-updates
> fsmnt /palveli
> volname swuid 0 providersize 17774144

UFS1, weird.

I believe I see the problem. The UFS1 superblock is not aligned on an
fs block boundary, and the bcopy() call tried to copy a full block.
In fact, when the snapshot operation did not trap, you probably got
data corruption in an unrelated buffer.

Please try the patch below.

diff --git a/sys/ufs/ffs/ffs_snapshot.c b/sys/ufs/ffs/ffs_snapshot.c
index ad157aa..c37706b 100644
--- a/sys/ufs/ffs/ffs_snapshot.c
+++ b/sys/ufs/ffs/ffs_snapshot.c
@@ -792,7 +792,7 @@ out1:
 			brelse(nbp);
 		} else {
 			loc = blkoff(fs, fs->fs_sblockloc);
-			bcopy((char *)copy_fs, &nbp->b_data[loc], fs->fs_bsize);
+			bcopy((char *)copy_fs, &nbp->b_data[loc], (u_int)fs->fs_sbsize);
 			bawrite(nbp);
 		}
 		/*
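
As a rough check of the diagnosis, the overrun arithmetic can be sketched in
userland C. This is not kernel code: the struct and function names are
hypothetical. bsize and sbsize come from the dumpfs output; the superblock
offset of 8192 bytes is an assumption (the usual UFS1 location, SBLOCK_UFS1),
since dumpfs does not print it. The sketch also assumes the buffer's data
area is exactly one fs block long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few superblock fields involved. */
struct fs_geom {
	int64_t sblockloc;	/* byte offset of the superblock (assumed) */
	int32_t bsize;		/* fs block size, from dumpfs */
	int32_t sbsize;		/* superblock size, from dumpfs */
};

/* Offset of the superblock within its fs block; same idea as blkoff(). */
int32_t
blk_offset(const struct fs_geom *g)
{
	/* The mask trick only works for power-of-two block sizes. */
	assert(g->bsize > 0 && (g->bsize & (g->bsize - 1)) == 0);
	return ((int32_t)(g->sblockloc & (int64_t)(g->bsize - 1)));
}

/*
 * Number of bytes written past the end of a bsize-long buffer when
 * len bytes are copied to offset blk_offset() within it; 0 if it fits.
 */
int32_t
copy_overrun(const struct fs_geom *g, int32_t len)
{
	int64_t end = (int64_t)blk_offset(g) + len;

	return (end > g->bsize ? (int32_t)(end - g->bsize) : 0);
}

/*
 * Given the bcopy() destination address (&b_data[loc]), return the
 * first address past the end of the buffer's data area.
 */
uint32_t
buffer_end(uint32_t dst, const struct fs_geom *g)
{
	return (dst - (uint32_t)blk_offset(g) + (uint32_t)g->bsize);
}
```

With sblockloc 8192, bsize 32768 and sbsize 4096, the full-block copy runs
8192 bytes past the buffer, while a copy of sbsize bytes stays in bounds.
Feeding the bcopy() destination from the backtrace (0xcd5e6000) to
buffer_end() gives 0xcd5ec000, which matches the fault virtual address.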