From: Nikola Pavlović
Date: Sat, 29 Jun 2013 00:28:05 +0200
To: Adam Vande More
Cc: FreeBSD Questions
Subject: Re: Troubleshooting a gmirror disk marked broken
Message-ID: <20130628222805.GA15414@sputnjik.localdomain>
References: <20130627023837.GA7685@sputnjik.localdomain>

On Wed, Jun 26, 2013 at 10:09:33PM -0500, Adam Vande More wrote:
> On Wed, Jun 26, 2013 at 9:38 PM, Nikola Pavlović wrote:
>
> > Hi,
> >
> > Last night during a massive (~1 year worth :| )
> > portsnap fetch
> >
> > the server went unresponsive and ssh eventually disconnected.
> > I decided to leave it during the night, and, sure enough, the situation
> > was the same in the morning, so I had to do a hard reset. It came back
> > up, but one of the two gmirror components was marked as broken and
> > deactivated.
> >
> > The hang happened during the 'fetching new files or ports' (~24000 of
> > them, there are currently ~10000 snapshots in /var/db/portsnap) phase
> > of portsnap fetch.
> >
> > /var/log/messages was completely silent during the period between the
> > hang and the reset.
> >
> > Googling around I found a mention that it's possible to sometimes get a
> > 'blip'[*] during busy periods, so I decided to just bite the bullet and
> > reinsert the component with
> > # gmirror forget gm0
> > # gmirror clean ad4
> > # gmirror insert gm0 ad4
> >
> > Currently it's syncing and things *seem* OK. My question is how much
> > should I be worried and what could be the cause of this? Is it possible
> > that ports snapshot fetching caused this, or that perhaps it was the other
> > way around (a failing disk causing the machine to choke during the huge
> > portsnap fetch)? How to proceed? :)
> >
>
> The messages log definitely shows problems with your I/O. The SMART logs of
> the disks are also at least mildly concerning and indicate the drives are
> in a preliminary stage of death. Some HD deaths take years to complete.
> Expect random glitches and intermittent reduced performance as a continuous
> degradation. You might be able to alleviate some of this by switching to
> the AHCI driver and bumping up timeouts but at the end of the day 2 flaky
> disks in a mirror don't inspire confidence.
>

About AHCI, it didn't attach after setting ahci_load="YES" in loader.conf, so
I assumed it wasn't enabled in the BIOS. As I don't have physical access to
the machine, I asked support to enable it, and presumably they did (that's
what they said, and the machine was rebooted when they said they did). But
still no luck.
It's a VIA 6420 controller and maybe it doesn't support AHCI (I couldn't find
anything definitive on the net about that). If that's the case, is it even
possible that the BIOS would have an option to enable it? I'm confused
because they didn't say the controller doesn't support it, but explicitly
that they enabled it. It's possible to request KVM-over-IP, so I can look for
myself, but I don't want to waste time (and install Java just for this) if
it's useless.

-- 
To criticize the incompetent is easy; it is more difficult to criticize the
competent.