Date: Thu, 20 Apr 2017 21:02:14 +0100
From: Edward Tomasz Napierala <trasz@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r316941 - head/sys/kern
Message-ID: <20170420200214.GA1717@brick>
In-Reply-To: <20170415064658.L4428@besplex.bde.org>
References: <201704142015.v3EKFYWA017623@repo.freebsd.org> <20170415064658.L4428@besplex.bde.org>

On 0415T0736, Bruce Evans wrote:
> On Fri, 14 Apr 2017, Edward Tomasz Napierala wrote:
>
> > Log:
> >   Don't try to write out bufs that have already failed with ENXIO.
> >   This fixes some panics after disconnecting mounted disks.
> >
> >   Submitted by:	imp (slightly different version, which I've then lost)
> >   Reviewed by:	kib, imp, mckusick
> >   MFC after:	2 weeks
> >   Differential Revision:	https://reviews.freebsd.org/D9674
> >
> > Modified:
> >   head/sys/kern/vfs_bio.c
> >
> > Modified: head/sys/kern/vfs_bio.c
> > ==============================================================================
> > --- head/sys/kern/vfs_bio.c	Fri Apr 14 20:15:17 2017	(r316940)
> > +++ head/sys/kern/vfs_bio.c	Fri Apr 14 20:15:34 2017	(r316941)
> > @@ -2290,18 +2290,28 @@ brelse(struct buf *bp)
> >  			bdirty(bp);
> >  	}
> >  	if (bp->b_iocmd == BIO_WRITE && (bp->b_ioflags & BIO_ERROR) &&
> > +	    (bp->b_error != ENXIO || !LIST_EMPTY(&bp->b_dep)) &&
> >  	    !(bp->b_flags & B_INVAL)) {
> >  		/*
> > -		 * Failed write, redirty.  Must clear BIO_ERROR to prevent
> > -		 * pages from being scrapped.
> > +		 * Failed write, redirty.  All errors except ENXIO (which
> > +		 * means the device is gone) are expected to be potentially
> > +		 * transient - underlying media might work if tried again
> > +		 * after EIO, and memory might be available after an ENOMEM.
> > +		 *
> > +		 * Do this also for buffers that failed with ENXIO, but have
> > +		 * non-empty dependencies - the soft updates code might need
> > +		 * to access the buffer to untangle them.
> > +		 *
> > +		 * Must clear BIO_ERROR to prevent pages from being scrapped.
> >  		 */
>
> This is hard to fix, but I have used a version that only retries after
> EIO for 15-20 years.  I didn't think of ENOMEM.
>
> The media is unlikely to come back after EIO too.  For removable media,
> you might be able to get the write done to new media, but a panic reading
> from the new media is just as likely.  Geom "tasting" might prevent the
> new media being used.
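
The condition added in the quoted diff can be read as a standalone predicate. The following is only a user-space sketch of that logic, not the kernel code: `struct sk_buf`, the `SK_*` constants, the `has_deps` field (standing in for `!LIST_EMPTY(&bp->b_dep)`) and `sk_should_redirty()` are all hypothetical names with illustrative values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; the values are illustrative. */
#define SK_BIO_WRITE	0x02
#define SK_BIO_ERROR	0x01
#define SK_B_INVAL	0x2000

/* Hypothetical, reduced model of the fields the quoted condition inspects. */
struct sk_buf {
	int	b_iocmd;
	int	b_ioflags;
	int	b_error;
	int	b_flags;
	bool	has_deps;	/* models !LIST_EMPTY(&bp->b_dep) */
};

/*
 * Return true when a failed write should be redirtied and retried later:
 * any error other than ENXIO is treated as potentially transient, and an
 * ENXIO failure is still kept if the buffer has dependencies attached.
 */
static bool
sk_should_redirty(const struct sk_buf *bp)
{
	assert(bp != NULL);
	return (bp->b_iocmd == SK_BIO_WRITE &&
	    (bp->b_ioflags & SK_BIO_ERROR) != 0 &&
	    (bp->b_error != ENXIO || bp->has_deps) &&
	    (bp->b_flags & SK_B_INVAL) == 0);
}
```

Under this model a write that failed with EIO is redirtied, one that failed with ENXIO and has no dependencies is not, and an invalidated buffer is never redirtied.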

I think media that actually disappeared will eventually result in ENXIO.
That's what GEOMs return when they "wither".

> ENXIO is actually the one error that can often be recovered from.  I
> wrote a form of "tasting" in a toy OS 30-35 years ago.  It handled
> removal of "mounted" disks with pending writes too well, in a way that
> made recovery from non-transient I/O errors almost impossible without
> turning off the system.  ENXIO was treated as a transient I/O error.
> It was recovered from perfectly if the user could find the original
> media and unremove it.  The "tasting" usually worked to detect different
> media and disallow writing cached data to a different disk.  Media
> errors were common, and when one occurred for writing the method of
> replacing the disk by a garbage one didn't work since it was a different
> disk.  The most common one was writing to a write protected disk, and
> that was recoverable by removing the write protection.  But often you
> really didn't want to write to that disk, but wanted to write somewhere.
> The only way to continue was to reboot to discard the write.

Hah.  I actually wrote something similar for FreeBSD: gmountver(8).
It's a GEOM class that simply passes BIOs to the lower layer, except
when the lower layer returns EIO or ENXIO.  When that happens it puts
the BIO on its own queue and closes the provider; when the provider
comes back, it reattaches and resubmits the queued BIOs.

It might actually be useful again, because the SD cards one might use
for rootfs on a Raspberry Pi, for example, are not always reliable.
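
The queue-and-resubmit behaviour described above can be modelled in user space roughly as follows. This is only a sketch of the idea as described in this message, not the gmountver code: every `mv_*` name, the fixed-size queue, and the fake lower layer are hypothetical, and a real GEOM class would also have to deliver completions to the upper layer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MV_QUEUE_MAX	16

struct mv_req { int id; };	/* hypothetical stand-in for a BIO */

/* Hypothetical lower layer: returns 0 on success or an errno value. */
typedef int (*mv_lower_fn)(void *ctx, const struct mv_req *req);

struct mv_state {
	mv_lower_fn	lower;
	void		*ctx;
	bool		attached;	/* false models a closed provider */
	struct mv_req	queue[MV_QUEUE_MAX];
	size_t		nqueued;
};

static int
mv_enqueue(struct mv_state *s, const struct mv_req *req)
{
	if (s->nqueued == MV_QUEUE_MAX)
		return (ENOMEM);
	s->queue[s->nqueued++] = *req;
	return (0);
}

/* Pass the request down; on EIO or ENXIO, detach and hold it for later. */
static int
mv_submit(struct mv_state *s, const struct mv_req *req)
{
	int error;

	if (!s->attached)
		return (mv_enqueue(s, req));
	error = s->lower(s->ctx, req);
	if (error == EIO || error == ENXIO) {
		s->attached = false;
		return (mv_enqueue(s, req));
	}
	return (error);
}

/* The provider came back: replay held requests in their original order. */
static int
mv_reattach(struct mv_state *s)
{
	size_t i, n = s->nqueued;

	s->attached = true;
	for (i = 0; i < n; i++) {
		int error = s->lower(s->ctx, &s->queue[i]);

		if (error == EIO || error == ENXIO) {
			/* Gone again: keep the unsent tail and detach. */
			memmove(&s->queue[0], &s->queue[i],
			    (n - i) * sizeof(s->queue[0]));
			s->nqueued = n - i;
			s->attached = false;
			return (error);
		}
		/* Other errors are dropped here; real code would report them. */
	}
	s->nqueued = 0;
	return (0);
}

/* Fake device used only to exercise the sketch. */
struct mv_fake_dev { bool present; int completed; };

static int
mv_fake_lower(void *ctx, const struct mv_req *req)
{
	struct mv_fake_dev *d = ctx;

	(void)req;
	if (!d->present)
		return (ENXIO);
	d->completed++;
	return (0);
}
```

With the fake device, a request submitted while `present` is false is held rather than failed, and a later `mv_reattach()` call delivers it once `present` is true again.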