From owner-freebsd-bugs@FreeBSD.ORG Thu Aug 16 19:20:09 2012
Message-Id: <201208161910.q7GJA7iB050158@red.freebsd.org>
Date: Thu, 16 Aug 2012 19:10:07 GMT
From: Dieter
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/170675: ata(4) hangs system, causing data loss
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Thu, 16 Aug 2012 19:20:09 -0000

>Number:         170675
>Category:       kern
>Synopsis:       ata(4) hangs system, causing data loss
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Aug 16 19:20:08 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Dieter
>Release:        FreeBSD 8.2 amd64
>Organization:
>Environment:
FreeBSD 8.2 amd64
>Description:
FreeBSD 8.2 amd64
ad6 is a vanilla sata drive

/var/log/messages contains:

ad6: FAILURE - device detached

No other clues are provided. It would be useful if ata(4) told us *why* it decided to detach the drive.

Over 24 hours later, the system suddenly hung, for no obvious reason. Thinking that perhaps ata(4) was having some new problem with ad6, I unplugged ad6's data cable. The system then recovered. However, the system was completely hung for 19 minutes, and perhaps would have remained hung forever without manual intervention.

THIS RESULTED IN THE UNNECESSARY LOSS OF INCOMING DATA! COMPLETELY UNACCEPTABLE!

Other than the device detached message, ata(4) did not output any info at all about this problem.

There is no reason that ata(4) should have to hang the entire system for even a millisecond, much less 19 minutes, just because it is having some problem with one disk drive. (ad6 contained only user data, no system partitions or swap)

News Flash: hardware isn't perfect and never will be. Hardware sometimes hiccups or fails altogether. FreeBSD needs to deal with failures gracefully and continue servicing the remaining hardware. The phrase "can't walk and chew gum at the same time" comes to mind.
I suspect that ata(4) turned off ALL interrupts (why all of them? why not just turn off interrupts for the device being serviced?) and then went into an infinite loop.
>How-To-Repeat:
>Fix:
(1) Find the offending infinite loop (or whatever) in ata(4) and fix it.

(2) Don't turn off all interrupts, just turn off interrupts for the device being serviced.
>Release-Note:
>Audit-Trail:
>Unformatted:
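To make the two suggestions under >Fix: concrete, here is a minimal userspace sketch in C. It is not taken from ata(4), and the report does not confirm where the hang actually is. It only illustrates the general idea: bound every wait on a device, and when one device times out, give up on that device alone and keep servicing the rest. All names (wait_not_busy, service_devices, dev_slot, fake_dev, the DEV_BUSY value) are made up for this example. A real driver would presumably bound the wait by elapsed time rather than by a poll count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Busy flag in the status byte (illustrative value, an assumption). */
#define DEV_BUSY 0x80u

/* Hypothetical callback that reads one device's status byte. */
typedef uint8_t (*read_status_fn)(void *ctx);

enum poll_result { POLL_READY = 0, POLL_TIMEOUT = -1 };

/*
 * Poll one device until its busy flag clears, but never more than
 * max_polls times.  The bound is what prevents an infinite loop when
 * the device never answers.
 */
enum poll_result
wait_not_busy(read_status_fn read_status, void *ctx, unsigned max_polls)
{
    assert(read_status != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & DEV_BUSY) == 0)
            return POLL_READY;
    }
    return POLL_TIMEOUT;
}

/* One entry per device; "detached" is per device, not global. */
struct dev_slot {
    void *ctx;
    bool detached;
};

/*
 * Service every attached device.  A device that times out is marked
 * detached and skipped; the others are still serviced.  Returns the
 * number of devices that became ready.
 */
size_t
service_devices(struct dev_slot *slots, size_t n,
    read_status_fn read_status, unsigned max_polls)
{
    size_t serviced = 0;

    for (size_t i = 0; i < n; i++) {
        if (slots[i].detached)
            continue;
        if (wait_not_busy(read_status, slots[i].ctx, max_polls) ==
            POLL_TIMEOUT) {
            slots[i].detached = true;  /* give up on this device only */
            continue;
        }
        serviced++;
    }
    return serviced;
}

/* Simulated device: reports busy for a fixed number of reads. */
struct fake_dev {
    unsigned busy_reads_left;
    unsigned reads;
};

uint8_t
fake_read_status(void *ctx)
{
    struct fake_dev *d = ctx;

    d->reads++;
    if (d->busy_reads_left > 0) {
        d->busy_reads_left--;
        return DEV_BUSY;
    }
    return 0;
}
```

With one simulated device that stays busy far longer than the poll budget and one that answers after a few reads, service_devices() marks only the first as detached and still services the second, which is the behaviour asked for above.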