From: Harald Schmalzbauer
Date: Tue, 23 Feb 2010 18:44:41 +0100
To: Alexander Motin
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ahcich timeouts, only with ahci, not with ataahci

Alexander Motin wrote on 23.02.2010 18:35 (localtime):
...
>> One question for my understanding: If the drive doesn't complete a
>> command, regardless of whether it's due to a firmware bug, a disk
>> surface error or whatever, is there no way for the driver to terminate
>> the request and take the drive offline after some time? This would be
>> a very important behaviour for me. It doesn't make sense to build
>> RAIDz storage when a failing drive hangs the complete machine, even if
>> the system partitions are on a completely different SSD.
>
> That's what timeouts are used for. When a timeout is detected, the
> driver resets the device and reports an error to the upper layer. After
> receiving the error, CAM reinitializes the device. If the device is
> completely dead, reinitialization will fail and the device will be
> dropped immediately. If the device is still alive, the reinit succeeds
> and CAM will retry the command. If all retries fail, the error is
> reported to the GEOM layer and then possibly to the file system. I have
> no idea how RAIDZ behaves in such a case. Maybe after a few such errors
> it should drop that device from the array.
>
> A timeout is the worst possible case for any device, as it takes too
> much time and doesn't give any recovery information. The half-dead case
> is the worst possible kind of timeout. It is difficult to say which way
> is better: drop the last drive from a degraded array and lose all info,
> or retry forever. There is probably no right answer.

I see. Thanks a lot for the clarification.

Before bringing the machine on site I ran some ZFS tests, such as
removing one disk while a cvs checkout was running. I remember that ZFS
did not show the removed drive as offline, but there was no hang. The
pool was degraded, and after reinserting the disk and rebooting I could
resilver the pool. I couldn't get it consistent without rebooting, but I
accepted that, since I would have to go on site to change the drive
anyway.

I'll restore the default vfs.zfs.txg.timeout=30 so that the hang can be
easily reproduced, and then see whether I can 'camcontrol stop' the
drive. Do you think I can get useful information from that test?

Thanks,

-Harry
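
For reference, the recovery sequence described in the quoted reply
(timeout, reset, reinitialization, retry, then an error passed up to
GEOM) can be modelled roughly as below. This is only a sketch of the
control flow as described in the thread, not the actual ahci(4) or CAM
code. FakeDrive, Outcome, run_command and the max_retries value are all
hypothetical names and numbers chosen for illustration; the thread does
not say how many retries CAM performs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Outcome(Enum):
    COMPLETED = "completed"            # command finished on some attempt
    DEVICE_DROPPED = "device dropped"  # reinit failed, device removed
    ERROR_TO_GEOM = "error to GEOM"    # retries exhausted, error passed up


@dataclass
class FakeDrive:
    """Hypothetical stand-in for a disk; not a real driver object."""
    # One entry per command attempt: True = completes, False = times out.
    attempts: List[bool]
    # Whether the drive still answers reinitialization after a reset.
    survives_reset: bool = True
    log: List[str] = field(default_factory=list)

    def execute(self) -> bool:
        # Attempts beyond the scripted list are treated as timeouts.
        return self.attempts.pop(0) if self.attempts else False


def run_command(drive: FakeDrive, max_retries: int = 3) -> Outcome:
    """Model of timeout -> reset -> reinit -> retry -> error to GEOM."""
    # max_retries is an assumed value, not taken from CAM.
    for attempt in range(1 + max_retries):
        if drive.execute():
            drive.log.append(f"attempt {attempt}: completed")
            return Outcome.COMPLETED
        # Timeout: the driver resets the device and reports an error upward.
        drive.log.append(f"attempt {attempt}: timeout, reset")
        # The upper layer (CAM in the thread) reinitializes the device.
        if not drive.survives_reset:
            drive.log.append("reinit failed, dropping device")
            return Outcome.DEVICE_DROPPED
        drive.log.append("reinit ok")
    # All attempts timed out: the error goes to the next layer (GEOM).
    return Outcome.ERROR_TO_GEOM
```

In this model the "half-dead" drive is one with survives_reset=True whose
commands never complete: every request then costs 1 + max_retries full
timeouts before the error reaches the layer above, which matches the
long stalls discussed in the thread.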