From owner-freebsd-current@freebsd.org Sat Jul 8 17:45:44 2017 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 98D50D9F464 for ; Sat, 8 Jul 2017 17:45:44 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-62.reflexion.net [208.70.210.62]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 53FDF3800 for ; Sat, 8 Jul 2017 17:45:43 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 14815 invoked from network); 8 Jul 2017 17:45:42 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 8 Jul 2017 17:45:42 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.1) with SMTP; Sat, 08 Jul 2017 13:45:42 -0400 (EDT) Received: (qmail 16849 invoked from network); 8 Jul 2017 17:45:42 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 8 Jul 2017 17:45:42 -0000 Received: from [192.168.1.114] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id A0857EC8E6A; Sat, 8 Jul 2017 10:45:41 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: dump trying to access incorrect block numbers? Date: Sat, 8 Jul 2017 10:45:41 -0700 References: <73E92033-CB2F-404C-8B3F-736EF09AA9F7@dsl-only.net> To: peter@rulingia.com, FreeBSD Current In-Reply-To: <73E92033-CB2F-404C-8B3F-736EF09AA9F7@dsl-only.net> Message-Id: X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 08 Jul 2017 17:45:44 -0000 [A normal multi-user boot's fsck activity can do fsck -B activity that gets the problem.] On 2017-Jul-8, at 9:45 AM, Mark Millard wrote: > [I add notes about a problem that happens after the > "fsck -B". Also forgot to mention: production style > kernel world builds were in use. And a tried a > powerpc64 build and it works the same.] >=20 > On 2017-Jul-7, at 11:09 PM, Mark Millard = wrote: >=20 >> [This note has more information than one sent with extra text >> in the subject but with a partially different "to" list.] >>=20 >> Peter Jeremy peter at rulingia.com wrote on >> Sat Jul 8 02:00:47 UTC 2017 : >>=20 >>> When did you first notice this (what SVN revision)? >>> Do you know what the last good SVN revision was? >>> Is this a new or old filesystem? >>> Is the filesystem mounted/active or not when you dump it? >>> What are the relevant parameters for the filesystem on ada0s3a? >>> Are you running softupdates, journalling etc? >>> Which dump(8) phase is reporting the errors? >>> What are the exact dump and fsck commands you ran? >>=20 >> I can add a little information with some contrast >> and only "fsck -B" in use (with an unclean file >> system from a prior crash), no dump use. Still: >> a snapshot is involved in the below. >>=20 >> Unfortunately two problems with major consequences >> for my involved context limit the svn range that I >> can cover for the activity, the problem version >> ranges being: >>=20 >> -r319722 through -r320651 (fixed by -r320652) >> (actually this is why I had used "boot -s"=20 >> in what I report later: I could get to a >> shell prompt that way instead of crashing >> before any login prompt; the crashes left >> the file system in need of repair) >>=20 >> -r320509 through -r320561 (fixed by -r320570) >>=20 >> So I was using -r320570 to avoid one of the >> two problems. >>=20 >>=20 >>=20 >> Context: 32-bit powerpc FreeBSD used on PowerMac G5 >> so-called "Quad-core". (So big-endian as well.) >> Softupdates, no journalling. Long-in-use file >> system having lots of FreeBSD versions updates >> and port rebuilds over the time. >>=20 >> The following is from now, not from the time of the >> example messages: >>=20 >> # dumpfs / | more >> magic 19540119 (UFS2) time Fri Jul 7 22:53:34 2017 >> superblock location 65536 id [ ] >> ncg 158 size 25165823 blocks 24372006 >> bsize 32768 shift 15 mask 0xffff8000 >> fsize 4096 shift 12 mask 0xfffff000 >> frag 8 shift 3 fsbtodb 3 >> minfree 8% optim time symlinklen 120 >> maxbsize 32768 maxbpg 4096 maxcontig 4 contigsumsize 4 >> nbfree 2130375 ndir 65518 nifree 11769796 nffree = 425065 >> bpg 20032 fpg 160256 ipg 80128 unrefs 0 >> nindir 4096 inopb 128 maxfilesize 2252349704110079 >> sbsize 4096 cgsize 32768 csaddr 5048 cssize 4096 >> sblkno 24 cblkno 32 iblkno 40 dblkno 5048 >> cgrotor 127 fmod 0 ronly 0 clean 0 >> metaspace 6408 avgfpdir 64 avgfilesize 16384 >> flags soft-updates trim=20 >> fsmnt / >> volname FBSDG4Srootfs swuid 0 providersize 25165823 >> . . . >>=20 >>=20 >>=20 >> What I had done that produced the messages was: >>=20 >> > leaves root (only) file system not marked clean >> so fsck -B will actually do something below> >>=20 >> boot -s (so: single user mode) >> # The next 3 lines are the content of a generic, manually-run script. >> mount -u / >> mount -a -t ufs (but there is no other file system) >> swapon -a (there is a swap partition) >> # >> fsck -B >>=20 >> That "fsck -B" caused the same kinds of lines >> reported by Michael Butler, happening as fsck >> makes a snapshot for the background processing >> to use. (I have camera pictures and could type >> in some of the lines if needed.) >>=20 >> After those lines was text like (typed in from >> an example camera picture): >>=20 >> ** //.snap/fsck_snapshot >> ** Last Mount on / >> ** Root file system >> ** Phase 1 - Check Blocks and Sizes >> ** Phase 2 - Check Pathnames >> ** Phase 3 - Check Connectivity >> ** Phase 4 - Check Reference Counts >> ** Phase 5 - Check Cyl groups >> Reclaimed: 0 directories, 1 files, 22680 fragments >> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 = blocks, 1.8% fragmentation) >>=20 >> ***** FILE SYSTEM MARKED CLEAN ***** >=20 > [I forgot or mention that the context was a > production style kernel and world build, > no invariants or other such.] >=20 > Since I'm running a patched -r320570 for the > issue: >=20 > -r319722 through -r320651 (fixed by -r320652) >=20 > I went back and forced a power-off without > shutdown and did the sequence: >=20 > boot -s (so: single user mode) > # The next 3 lines are the content of a generic, manually-run script. > mount -u / > mount -a -t ufs (but there is no other file system) > swapon -a (there is a swap partition) > # > fsck -B >=20 > but always waited briefly after the fsck -B finished. >=20 > Like before the following happens as it tries to trim: > (typed in from camera picture) >=20 > panic: ffs_blkfree_cq: freeing free block > cpuid =3D 2 (varies, of course) > time =3D (varies) > KDB: stack backtrace > (stack addresses can vary: just an example here) > 0xd23b17e0: at kdb_backtrace+0x5c > 0xd23b1850: at vpanic+0x1e8 > 0xd23b18c0: at panic+0x54 > 0xd23b1910: at ffs_blkfree_cq+0x278 > 0xd23b1980: at ffs_blkfree_trim_task+0x60 > 0xd23b19b0: at taskqueue_run_locked+0x10 > 0xd23b1a10: at taskqueue_thread_loop+0x174 > 0xd23b1a50: at fork_exit+0xf4 > 0xd23b1a80: at fork_trampoline+0xc > KDB: enter: panic > [ thread pid 0 tid 1000082 ] > Stopped at kdb_enter_0x70: addi r0,r0,0x0 >=20 >=20 > I've tried this on a powerpc64 and it works > the same, complete with the "freeing free > block" issue. I tried a sequence using a normal boot to multi-user that was not clean but did a automatic fsck -B and I got the messages and the later "freeing free block" crash. It appears that having mksnap_ffs (and code equivalents in other programs) broken in turn breaks fsck -B fairly majorly. (Michael Butler did the mksnap_ffs test at Rodney W. Grimes request.) I've been using the following to clean things up when I'm done with an experimental sequence that leaves things needing a fsck: boot -s (a single user boot) fsck -F So far it has resulted in a clean file system. With that status fsck -B then has no such problem: apparently it then does not create a snaphot by default. So then a multi-user boot works okay for its automatic fsck use. =3D=3D=3D Mark Millard markmi at dsl-only.net