From owner-freebsd-bugs@freebsd.org  Mon Jul 11 17:56:05 2016
Return-Path: <owner-freebsd-bugs@freebsd.org>
Delivered-To: freebsd-bugs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AB49B92AA6
 for <freebsd-bugs@mailman.ysv.freebsd.org>;
 Mon, 11 Jul 2016 17:56:05 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EE68E1E17
 for <freebsd-bugs@FreeBSD.org>; Mon, 11 Jul 2016 17:56:04 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u6BHu4sO059694
 for <freebsd-bugs@FreeBSD.org>; Mon, 11 Jul 2016 17:56:04 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics
 machine
Date: Mon, 11 Jul 2016 17:56:05 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.0-BETA1
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: karl@denninger.net
X-Bugzilla-Status: New
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
 op_sys bug_status bug_severity priority component assigned_to reporter
Message-ID: <bug-211013-8@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs/>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 11 Jul 2016 17:56:05 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211013

            Bug ID: 211013
           Summary: Write error to UFS filesystem with softupdates panics
                    machine
           Product: Base System
           Version: 11.0-BETA1
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: karl@denninger.net

The machine in question had mounted a UFS filesystem that had
softupdates enabled (on an SD card; I was updating a system that runs FreeBSD
on a Raspberry Pi2 by plugging the card into a different machine) and I/O to
the card hit an unrecoverable write error.

The result was a kernel panic; this is apparently considered expected behavior
at present if softupdates are turned on for the filesystem because it's
possible that the filesystem has now been corrupted and there is no way to be
sure with the machine running.  Thus the choice to panic() when this situation
occurs.

But it appears that the choice to panic() is too broad and unnecessary in that
in many cases a less-severe action is effective while not exposing the system
to the risk of unknown filesystem corruption.

Yes, if there are working-set pages on that volume and it is corrupt, the
system is no longer stable (this is especially true if the system is *running*
from that volume.)  It is also true that in the case of a solid-state device of
some kind the impact of a write error may cross a filesystem boundary, so it's
insufficient to invalidate the filesystem (on an SSD or similar device the
read/erase/write cycle for a data re-write may involve many megabytes of data,
and that data may not be entirely local to the mounted filesystem if there
is more than one on the physical volume.)
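
To illustrate the boundary point with arithmetic, here is a small userland
sketch.  The helper name and the sizes used with it are assumptions for
illustration only; real devices do not generally report their erase-block
geometry this simply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a FreeBSD interface.  Returns true when the
 * erase block containing byte offset `off` also covers bytes outside the
 * partition [part_start, part_end), meaning a failed read/erase/write
 * cycle could damage data belonging to a neighbouring filesystem.
 */
static bool
erase_block_leaves_partition(uint64_t off, uint64_t erase_size,
    uint64_t part_start, uint64_t part_end)
{
	uint64_t blk_start, blk_end;

	assert(erase_size > 0);
	blk_start = off - (off % erase_size);	/* round down to block start */
	blk_end = blk_start + erase_size;	/* one past the last byte */
	return (blk_start < part_start || blk_end > part_end);
}
```

With an assumed 4 MiB erase block and a partition spanning 1 MiB to 9 MiB, a
write at 5 MiB stays inside the partition, while writes at 2 MiB or 8.5 MiB
touch an erase block that extends past it.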

HOWEVER, forcibly detaching the volume in question instead of calling panic()
*should* be effective in preventing the possibility of propagating a corrupted
filesystem.  While this will lead to a panic in the event that executing RSS
(or consumed page file space) is present on that volume, in the case where the
device holds only data the detach will *not* panic the machine.
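
A minimal userland model of the policy I am suggesting follows.  It is a
sketch under stated assumptions, not kernel code: every model_* name is
hypothetical, the IOSTARTED value is illustrative rather than copied from the
softdep headers, and the "panic" outcome is modelled as an explicit decision
even though in practice it would follow from losing the backing store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOSTARTED 0x0200	/* illustrative value only */

struct model_mount {		/* hypothetical stand-in for a mounted volume */
	bool detached;		/* set once the volume is forcibly detached */
	bool holds_system_pages;/* executing RSS or swap lives on this volume */
};

struct model_inodedep {		/* hypothetical stand-in for struct inodedep */
	unsigned int id_state;
	struct model_mount *mp;
};

enum model_outcome { MODEL_OK, MODEL_DETACHED, MODEL_PANIC };

/*
 * Model of starting a write on an inode block.  Where the current code
 * panics unconditionally on an inconsistent state, this version detaches
 * a data-only volume and reserves the panic for volumes the running
 * system depends on.
 */
static enum model_outcome
model_initiate_write(struct model_inodedep *dep)
{
	assert(dep != NULL && dep->mp != NULL);

	if (dep->mp->detached)
		return (MODEL_DETACHED);	/* refuse all further I/O */
	if (dep->id_state & IOSTARTED) {
		if (dep->mp->holds_system_pages)
			return (MODEL_PANIC);	/* cannot survive losing it */
		dep->mp->detached = true;	/* data-only: detach instead */
		return (MODEL_DETACHED);
	}
	dep->id_state |= IOSTARTED;
	return (MODEL_OK);
}
```

In this model a second write attempt on the same dependency detaches a
data-only volume and every later request on it is refused, while a volume
flagged as holding system pages still ends in the panic outcome.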

This appears to be a situation where a less-severe "remedy" for a failed I/O is
certainly called for.

The following backtrace was captured from the panic itself:

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s


(kgdb) where
#0  doadump (textdump=<value optimized out>) at pcpu.h:221
#1  0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad6efb in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80dc16ad in softdep_disk_io_initiation (bp=<value optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5  0xffffffff80de61eb in ffs_geom_strategy (bo=<value optimized out>,
    bp=<value optimized out>) at buf.h:412
#6  0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7  0xffffffff80b8ac6a in vfs_bio_awrite (bp=<value optimized out>)
    at buf.h:393
#8  0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9  0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 <sched_sync>,
    arg=0x0, frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul  8 14:37:27 CDT 2016
karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code line:

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
        struct inodedep *inodedep;
        struct buf *bp;                 /* The inode block */
{
        struct allocdirect *adp, *lastadp;
        struct ufs2_dinode *dp;
        struct ufs2_dinode *sip;
        struct inoref *inoref;
        struct ufsmount *ump;
        struct fs *fs;
        ufs_lbn_t i;
#ifdef INVARIANTS
        ufs_lbn_t prevlbn = 0;
#endif
        int deplist;

        if (inodedep->id_state & IOSTARTED)
                panic("initiate_write_inodeblock_ufs2: already started");
        inodedep->id_state |= IOSTARTED;


-- End capture
