From owner-freebsd-bugs@freebsd.org Mon Jul 11 17:56:05 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 211013] Write error to UFS filesystem with softupdates panics machine
Date: Mon, 11 Jul 2016 17:56:05 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211013

            Bug ID: 211013
           Summary: Write error to UFS filesystem with softupdates panics
                    machine
           Product: Base System
           Version: 11.0-BETA1
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: karl@denninger.net

The machine in question had a UFS filesystem mounted with softupdates
enabled (on an SD card; I was updating a system that runs FreeBSD on a
Raspberry Pi2 by plugging the card into a different machine) and the
card took an unrecoverable write error.

The result was a kernel panic; this is apparently considered expected
behavior at present if softupdates are turned on for the filesystem,
because the filesystem may now be corrupted and there is no way to be
sure with the machine running. Thus the choice to panic() when this
situation occurs.

But the choice to panic() appears to be too broad and unnecessary, in
that in many cases a less-severe action is effective while not exposing
the system to the risk of unknown filesystem corruption.

Yes, if there are working-set pages on that volume and it is corrupt,
the system is no longer stable (this is especially true if the system is
*running* from that volume.) It is also true that for a solid-state
device of some kind the impact of a write error may cross a filesystem
boundary, so it's insufficient to invalidate the filesystem (on an SSD
or similar device the read/erase/write cycle for a data re-write may
involve many megabytes of data, and that may not be entirely local to
the mounted filesystem if there is more than one on the physical
volume.)

HOWEVER, forcibly detaching the volume in question instead of calling
panic() *should* be effective in preventing the possibility of
propagating a corrupted filesystem.
While this will still lead to a panic if executing RSS (or in-use page
file space) resides on that volume, in the case where the device holds
only data the detach will *not* panic the machine.

This appears to be a situation where a less-severe "remedy" for a failed
I/O is called for.

The following backtrace was captured from the panic itself:

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s

(kgdb) where
#0  doadump (textdump=) at pcpu.h:221
#1  0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad6efb in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80dc16ad in softdep_disk_io_initiation (bp=)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5  0xffffffff80de61eb in ffs_geom_strategy (bo=, bp=) at buf.h:412
#6  0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7  0xffffffff80b8ac6a in vfs_bio_awrite (bp=) at buf.h:393
#8  0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9  0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=, a=) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 ,
    arg=0x0, frame=0xfffffe034f481c00)
    at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul  8 14:37:27 CDT 2016
    karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code line:

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
	struct inodedep *inodedep;
	struct buf *bp;			/* The inode block */
{
	struct allocdirect *adp, *lastadp;
	struct ufs2_dinode *dp;
	struct ufs2_dinode *sip;
	struct inoref *inoref;
	struct ufsmount *ump;
	struct fs *fs;
	ufs_lbn_t i;
#ifdef INVARIANTS
	ufs_lbn_t prevlbn = 0;
#endif
	int deplist;

	if (inodedep->id_state & IOSTARTED)
		panic("initiate_write_inodeblock_ufs2: already started");
	inodedep->id_state |= IOSTARTED;

-- End capture
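
To make the proposal above concrete, here is a minimal user-space sketch
in C of the less-severe policy. It is an illustration only and is NOT
FreeBSD kernel code: the model_* types and functions, the value given to
IOSTARTED, and the holds_swap_or_text flag are all assumptions made for
this example. It models the IOSTARTED check from the quoted function as
a reported condition rather than a panic(), and then picks an action
based on whether the failed volume backs running programs or paging
space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of this is the kernel's code. */
#define IOSTARTED 0x0001u	/* flag value chosen for this sketch only */

struct model_inodedep {		/* stand-in for the kernel's struct inodedep */
	unsigned int id_state;
};

struct model_volume {		/* hypothetical description of the failed device */
	bool holds_swap_or_text;	/* paging space or running binaries live here */
	bool detached;			/* set once the volume has been force-detached */
};

enum write_result { WRITE_STARTED, WRITE_ALREADY_STARTED };
enum failure_action { ACTION_NONE, ACTION_DETACH, ACTION_PANIC };

/*
 * Same test as the quoted code, but the "already started" condition is
 * returned to the caller instead of calling panic().
 */
enum write_result
model_initiate_write(struct model_inodedep *dep)
{
	assert(dep != NULL);
	if (dep->id_state & IOSTARTED)
		return (WRITE_ALREADY_STARTED);
	dep->id_state |= IOSTARTED;
	return (WRITE_STARTED);
}

/*
 * Policy sketched in the report: a data-only volume is detached; a
 * volume holding paging space or executing pages still ends in a panic.
 */
enum failure_action
model_handle_failure(struct model_volume *vol, enum write_result res)
{
	assert(vol != NULL);
	if (res == WRITE_STARTED)
		return (ACTION_NONE);
	if (vol->holds_swap_or_text)
		return (ACTION_PANIC);
	vol->detached = true;
	return (ACTION_DETACH);
}
```

In this model, calling model_initiate_write() twice on the same
model_inodedep reports WRITE_ALREADY_STARTED on the second call, and
model_handle_failure() then chooses ACTION_DETACH for a data-only volume
and ACTION_PANIC otherwise. Whether the real softupdates code can unwind
safely from this point is a separate question that the sketch does not
address.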