Date:      Wed, 12 Jul 2017 22:12:25 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 220693] head -r320570 & -r320760 (e.g.): ufs snapshot creation broken & leads to fsck -B related SSD-trim "freeing free block" panics; more
Message-ID:  <bug-220693-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220693

            Bug ID: 220693
           Summary: head -r320570 & -r320760 (e.g.):  ufs snapshot
                    creation broken & leads to fsck -B related SSD-trim
                    "freeing free block" panics; more
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: markmi@dsl-only.net

See also the exchange of list submittals associated
with:

https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066505.html
and:
https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066508.html

I freely quote material from these without attribution here. . .


Basic context material . . .

As I remember, the people reporting were using
non-debug/non-invariant kernel builds. Multiple
TARGET_ARCH's were involved: 32-bit and 64-bit,
little-endian and big-endian.

The basic create-snapshot test that fails:

After a short pause with disk activity, the same sorts of errors are
logged when using "mksnap_ffs /.snap2" where .snap2 did not previously
exist

The type of messages was (e.g.):

g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel

Note the huge offset: such is true of the messages in general.
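
To put the offset in perspective, here is a rough sh sketch that
compares it with an estimate of the file system size. The fragment
counts are taken from the fsck summary quoted later in this report.
The 4096-byte fragment size is an assumption (it is not stated
anywhere in this report), so treat the size estimate as approximate.

```sh
# Offset and length copied from the g_vfs_done() line quoted above.
offset=6050375794688
length=32768

# Fragment counts copied from the fsck summary later in this report.
used_frags=4797127
free_frags=19552199

# ASSUMPTION: 4096-byte fragments (not stated in this report).
frag_size=4096

fs_bytes=$(( (used_frags + free_frags) * frag_size ))
offset_gib=$(( offset / 1024 / 1024 / 1024 ))
fs_gib=$(( fs_bytes / 1024 / 1024 / 1024 ))

echo "read offset:       ${offset_gib} GiB"
echo "estimated fs size: ${fs_gib} GiB"
# Prints 1 when the offset is an exact multiple of the read length.
echo "offset is a multiple of length: $(( offset % length == 0 ))"
```

Under that assumption the read offset lies at about 5634 GiB, while
the file system is only about 92 GiB, so the read is far beyond the
end of the file system. The offset is an exact multiple of the
32768-byte read length.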

Also the messages are from the kernel and its nmount related
snapshot creation activity, not from the user-space program.

The original list-notice was about dump (and its snapshot
creation) but the issue is not specific to dump.


fsck -B related panic material. . .

My original context for this: 32-bit powerpc.

<Prior failed multi-user boot from system problem
leaves root (only) file system not marked clean
so fsck -B will actually do something below>

boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a       (there is a swap partition)
#
fsck -B

That "fsck -B" caused the same kinds of lines
reported by Michael Butler, happening as fsck
makes a snapshot for the background processing
to use.

After the g_vfs_done lines was text like (typed
in from an example camera picture):

** //.snap/fsck_snapshot
** Last Mount on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
Reclaimed: 0 directories, 1 files, 22680 fragments
780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8%
fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

But waiting a while always leads to a panic
that looks like the following example:
(Note: context is an SSD with trim enabled)
(typed in from camera picture)

panic: ffs_blkfree_cq: freeing free block
cpuid = 2 (varies, of course)
time = (varies)
KDB: stack backtrace
(stack addresses can vary: just an example here)
0xd23b17e0: at kdb_backtrace+0x5c
0xd23b1850: at vpanic+0x1e8
0xd23b18c0: at panic+0x54
0xd23b1910: at ffs_blkfree_cq+0x278
0xd23b1980: at ffs_blkfree_trim_task+0x60
0xd23b19b0: at taskqueue_run_locked+0x10
0xd23b1a10: at taskqueue_thread_loop+0x174
0xd23b1a50: at fork_exit+0xf4
0xd23b1a80: at fork_trampoline+0xc
KDB: enter: panic
[ thread pid 0 tid 1000082 ]
Stopped at kdb_enter_0x70: addi r0,r0,0x0


I've tried this on a powerpc64 and it works
the same, complete with the "freeing free
block" issue.

I've also had the problem with a normal multi-user
boot that initiated a fsck -B automatically in a
context where the SSD had not been marked clean.

To avoid this and fix such file systems I've been
booting with "boot -s" and using "fsck -F" from
the single-user command prompt.


Unfortunately, two other problems with major
consequences for my context limit the svn range
that I can cover for this testing. The problem
version ranges are:

-r319722 through -r320651 (fixed by -r320652)
(actually this is why I had originally used
"boot -s"  in what I report above: I could get
to a shell prompt that way instead of crashing
before any login prompt; the crashes left
the file system in need of repair)

-r320509 through -r320561 (fixed by -r320570)

So I was using -r320570 to avoid one of the
two problems, with a trial patch for what
was later fixed in -r320652.

I do not know if the problem was present back
before -r319722 or before -r320509.
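
As a small illustration of how the two ranges overlap, here is a sh
sketch that reports which of the problem ranges listed above contain
a given svn revision. The function name check_rev is hypothetical
and the range bounds are simply copied from the list above.

```sh
# check_rev is a hypothetical helper, not part of any FreeBSD tool.
# It prints which of the two problem ranges contain the given revision.
check_rev() {
    rev=$1
    hits=""
    # Range fixed by -r320652
    if [ "$rev" -ge 319722 ] && [ "$rev" -le 320651 ]; then
        hits="$hits r319722-r320651"
    fi
    # Range fixed by -r320570
    if [ "$rev" -ge 320509 ] && [ "$rev" -le 320561 ]; then
        hits="$hits r320509-r320561"
    fi
    # Falls back to " none" when the revision is in neither range.
    echo "r$rev:${hits:- none}"
}

check_rev 320570
check_rev 320760
```

For -r320570 this reports only the first range, which is why a
patch was still needed there. For -r320760 it reports neither.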

-- 
You are receiving this mail because:
You are the assignee for the bug.


