From owner-freebsd-current@freebsd.org Sat Jul 8 16:45:11 2017 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA287D9DD92 for ; Sat, 8 Jul 2017 16:45:11 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-62.reflexion.net [208.70.210.62]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7A7ABF90 for ; Sat, 8 Jul 2017 16:45:10 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 21409 invoked from network); 8 Jul 2017 16:45:09 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 8 Jul 2017 16:45:09 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.1) with SMTP; Sat, 08 Jul 2017 12:45:09 -0400 (EDT) Received: (qmail 25895 invoked from network); 8 Jul 2017 16:45:08 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 8 Jul 2017 16:45:08 -0000 Received: from [192.168.1.114] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id EA529EC8FB9; Sat, 8 Jul 2017 09:45:07 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: dump trying to access incorrect block numbers? Date: Sat, 8 Jul 2017 09:45:07 -0700 References: To: peter@rulingia.com, FreeBSD Current In-Reply-To: Message-Id: <73E92033-CB2F-404C-8B3F-736EF09AA9F7@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Jul 2017 16:45:11 -0000 [I add notes about a problem that happens after the "fsck -B". Also forgot to mention: production style kernel world builds were in use. And a tried a powerpc64 build and it works the same.] On 2017-Jul-7, at 11:09 PM, Mark Millard wrote: > [This note has more information than one sent with extra text > in the subject but with a partially different "to" list.] >=20 > Peter Jeremy peter at rulingia.com wrote on > Sat Jul 8 02:00:47 UTC 2017 : >=20 >> When did you first notice this (what SVN revision)? >> Do you know what the last good SVN revision was? >> Is this a new or old filesystem? >> Is the filesystem mounted/active or not when you dump it? >> What are the relevant parameters for the filesystem on ada0s3a? >> Are you running softupdates, journalling etc? >> Which dump(8) phase is reporting the errors? >> What are the exact dump and fsck commands you ran? >=20 > I can add a little information with some contrast > and only "fsck -B" in use (with an unclean file > system from a prior crash), no dump use. Still: > a snapshot is involved in the below. >=20 > Unfortunately two problems with major consequences > for my involved context limit the svn range that I > can cover for the activity, the problem version > ranges being: >=20 > -r319722 through -r320651 (fixed by -r320652) > (actually this is why I had used "boot -s"=20 > in what I report later: I could get to a > shell prompt that way instead of crashing > before any login prompt; the crashes left > the file system in need of repair) >=20 > -r320509 through -r320561 (fixed by -r320570) >=20 > So I was using -r320570 to avoid one of the > two problems. >=20 >=20 >=20 > Context: 32-bit powerpc FreeBSD used on PowerMac G5 > so-called "Quad-core". (So big-endian as well.) > Softupdates, no journalling. Long-in-use file > system having lots of FreeBSD versions updates > and port rebuilds over the time. >=20 > The following is from now, not from the time of the > example messages: >=20 > # dumpfs / | more > magic 19540119 (UFS2) time Fri Jul 7 22:53:34 2017 > superblock location 65536 id [ ] > ncg 158 size 25165823 blocks 24372006 > bsize 32768 shift 15 mask 0xffff8000 > fsize 4096 shift 12 mask 0xfffff000 > frag 8 shift 3 fsbtodb 3 > minfree 8% optim time symlinklen 120 > maxbsize 32768 maxbpg 4096 maxcontig 4 contigsumsize 4 > nbfree 2130375 ndir 65518 nifree 11769796 nffree 425065 > bpg 20032 fpg 160256 ipg 80128 unrefs 0 > nindir 4096 inopb 128 maxfilesize 2252349704110079 > sbsize 4096 cgsize 32768 csaddr 5048 cssize 4096 > sblkno 24 cblkno 32 iblkno 40 dblkno 5048 > cgrotor 127 fmod 0 ronly 0 clean 0 > metaspace 6408 avgfpdir 64 avgfilesize 16384 > flags soft-updates trim=20 > fsmnt / > volname FBSDG4Srootfs swuid 0 providersize 25165823 > . . . >=20 >=20 >=20 > What I had done that produced the messages was: >=20 > leaves root (only) file system not marked clean > so fsck -B will actually do something below> >=20 > boot -s (so: single user mode) > # The next 3 lines are the content of a generic, manually-run script. > mount -u / > mount -a -t ufs (but there is no other file system) > swapon -a (there is a swap partition) > # > fsck -B >=20 > That "fsck -B" caused the same kinds of lines > reported by Michael Butler, happening as fsck > makes a snapshot for the background processing > to use. (I have camera pictures and could type > in some of the lines if needed.) >=20 > After those lines was text like (typed in from > an example camera picture): >=20 > ** //.snap/fsck_snapshot > ** Last Mount on / > ** Root file system > ** Phase 1 - Check Blocks and Sizes > ** Phase 2 - Check Pathnames > ** Phase 3 - Check Connectivity > ** Phase 4 - Check Reference Counts > ** Phase 5 - Check Cyl groups > Reclaimed: 0 directories, 1 files, 22680 fragments > 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 = blocks, 1.8% fragmentation) >=20 > ***** FILE SYSTEM MARKED CLEAN ***** [I forgot or mention that the context was a production style kernel and world build, no invariants or other such.] Since I'm running a patched -r320570 for the issue: -r319722 through -r320651 (fixed by -r320652) I went back and forced a power-off without shutdown and did the sequence: boot -s (so: single user mode) # The next 3 lines are the content of a generic, manually-run script. mount -u / mount -a -t ufs (but there is no other file system) swapon -a (there is a swap partition) # fsck -B but always waited briefly after the fsck -B finished. Like before the following happens as it tries to trim: (typed in from camera picture) panic: ffs_blkfree_cq: freeing free block cpuid =3D 2 (varies, of course) time =3D (varies) KDB: stack backtrace (stack addresses can vary: just an example here) 0xd23b17e0: at kdb_backtrace+0x5c 0xd23b1850: at vpanic+0x1e8 0xd23b18c0: at panic+0x54 0xd23b1910: at ffs_blkfree_cq+0x278 0xd23b1980: at ffs_blkfree_trim_task+0x60 0xd23b19b0: at taskqueue_run_locked+0x10 0xd23b1a10: at taskqueue_thread_loop+0x174 0xd23b1a50: at fork_exit+0xf4 0xd23b1a80: at fork_trampoline+0xc KDB: enter: panic [ thread pid 0 tid 1000082 ] Stopped at kdb_enter_0x70: addi r0,r0,0x0 I've tried this on a powerpc64 and it works the same, complete with the "freeing free block" issue. =3D=3D=3D Mark Millard markmi at dsl-only.net