From: Ian Lepore
To: Karl Denninger, freebsd-stable@freebsd.org
Date: Mon, 11 Jul 2016 11:39:49 -0600
Subject: Re: Not-so stable if you take a CAM error....
Message-ID: <1468258789.72182.122.camel@freebsd.org>

On Mon, 2016-07-11 at 12:30 -0500, Karl Denninger wrote:
> On 7/11/2016 11:32, Ian Lepore wrote:
> > On Mon, 2016-07-11 at 09:50 -0400, Brandon Allbery wrote:
> > > On Mon, Jul 11, 2016 at 9:46 AM, Karl Denninger <karl@denninger.net>
> > > wrote:
> > >
> > > > Here's the backtrace ... sounds like expected behavior, which is
> > > > not-so good all-in for a situation like this. I guess the strategy
> > > > is to turn off softupdates before attempting such an update so as
> > > > not to crash the host machine if there's a problem with the card.
> > > >
> > > I would tend to assume that removable media should not have
> > > softupdates enabled. Even with properly working media, it's
> > > practically begging for corruption.
> > >
> > Writing to an sdcard without softupdates enabled will be an exercise
> > in patience. Like, come back next week and maybe it'll be done.
> >
> > The only thing that comes to mind with this is maybe some sort of
> > mount flag to say you're willing to live with any amount of
> > filesystem corruption in lieu of panicking. I'm not sure how
> > easy/practical that would be to implement, though.
> >
> > -- Ian
> Why not force-detach the volume that takes the error instead of a
> panic()?
>

Patches welcome.

-- Ian

> That would lead to a panic if the detached volume was the system
> volume (obviously), but for a data volume it would simply result in
> it being forcibly unmounted (and dirty, so if it's corrupt it will
> get caught when reattached).
>
> It seems that the current paradigm of saying "screw you, panic the
> machine" violates the principle of least astonishment and is overly
> punitive vis-a-vis necessity. Refusing further I/O because the volume
> may now have a corrupt filesystem appears to be facially reasonable,
> but that doesn't necessarily wind up being fatal to the system itself.
> It is if that's the system volume and is not covered by some sort of
> redundancy, obviously, but it's not in all cases.
>
> (Note that you can't just unmount the filesystem involved in the
> error; it has to be the volume that gets forcibly detached, and
> whatever flows through from that you have to live with. The reason is
> that on any sort of solid-state media the OS has zero control over
> zoning, and write amplification means far more than the data you were
> actually modifying may have been lost. It's entirely possible that
> *several megabytes* of data just got trashed by the write error, and
> it's even possible that the block(s) involved cross a filesystem
> boundary!)
>