From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 204233] 'Livelocked' Geom mirror when one PATA provider experienced write timeouts
Date: Mon, 02 Nov 2015 23:12:54 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204233

Bug ID:    204233
Summary:   'Livelocked' Geom mirror when one PATA provider experienced write timeouts
Product:   Base System
Version:   10.2-STABLE
Hardware:  amd64
OS:        Any
Status:    New
Severity:  Affects Only Me
Priority:  ---
Component: kern
Assignee:  freebsd-bugs@FreeBSD.org
Reporter:  antiduh@csh.rit.edu

I had a geom mirror made of two older 160 GB Western Digital PATA HDDs, ada0 and ada1, combined into /dev/mirror/gm0. One of the drives failed in such a way that it answered writes only with a timeout:

> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 0f a3 64 40 11 00 00 00 20 00
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): CAM status: Command timeout
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): Retrying command
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 e1 07 6d 40 12 00 00 00 20 00
> Oct 30 12:38:21 angst (ada0:ata0:0:0:0): CAM status: Command timeout
> Oct 30 12:38:21 angst (ada0:ata0:0:0:0): Retrying command

(The drive was old, so I suspect hardware failure.)

Unfortunately, this slowed most of the OS to a crawl for many hours: gmirror was blocking all I/O to the gm0 provider while it waited for each write to time out, which took many minutes. The system wasn't completely dead, however. Queued block reads seemed to complete whenever they could slip in between the blocked writes, each time a write timeout elapsed.

Eventually, I was able to log into the system and manually remove the dead disk from the mirror, after which the system came back to life.

Why didn't gmirror drop the disk automatically?
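
For illustration only, here is a minimal sketch of how log lines like the ones quoted above could be scanned for repeated command timeouts on a single device. It is a hypothetical userland helper, not part of gmirror or CAM: the function names, the threshold, and the regular expression are all assumptions derived solely from the excerpt in this report.

```python
import re
from collections import Counter

# Matches the device name in lines such as
#   "Oct 30 12:34:20 angst (ada0:ata0:0:0:0): CAM status: Command timeout"
# The pattern is an assumption based only on the log excerpt in this report.
TIMEOUT_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\): CAM status: Command timeout"
)


def count_timeouts(lines):
    """Return a Counter mapping device name to its number of command timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


def suspect_devices(lines, threshold=2):
    """List devices with at least `threshold` timeouts.

    The default threshold is a hypothetical value chosen for this example.
    """
    counts = count_timeouts(lines)
    return sorted(dev for dev, n in counts.items() if n >= threshold)


# The six log lines quoted in the report, copied verbatim.
SAMPLE = [
    "Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 0f a3 64 40 11 00 00 00 20 00",
    "Oct 30 12:34:20 angst (ada0:ata0:0:0:0): CAM status: Command timeout",
    "Oct 30 12:34:20 angst (ada0:ata0:0:0:0): Retrying command",
    "Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 e1 07 6d 40 12 00 00 00 20 00",
    "Oct 30 12:38:21 angst (ada0:ata0:0:0:0): CAM status: Command timeout",
    "Oct 30 12:38:21 angst (ada0:ata0:0:0:0): Retrying command",
]
```

Calling `suspect_devices(SAMPLE)` on the excerpt returns `['ada0']`. A check like this only flags a suspect provider; taking it out of the mirror would still be a manual step, as described in the report.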