From owner-freebsd-stable@FreeBSD.ORG Tue Dec 9 10:09:03 2014 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DF8F5C2A for ; Tue, 9 Dec 2014 10:09:03 +0000 (UTC) Received: from smtp.free.de (smtp.free.de [91.204.6.103]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5ABC0C53 for ; Tue, 9 Dec 2014 10:09:03 +0000 (UTC) Received: (qmail 39241 invoked from network); 9 Dec 2014 11:08:59 +0100 Received: from smtp.free.de (HELO orwell) (k@free.de@[91.204.4.103]) (envelope-sender ) by smtp.free.de (qmail-ldap-1.03) with AES128-SHA encrypted SMTP for ; 9 Dec 2014 11:08:59 +0100 Date: Tue, 9 Dec 2014 11:08:57 +0100 From: Kai Gallasch To: freebsd-stable@freebsd.org Subject: Re: 10.1 RC4 r273903 - zpool scrub on ssd mirror - ahci command timeout Message-ID: <20141209110857.6bb9bcbd@orwell> In-Reply-To: <20141209090157.M14881@martymac.org> References: <20141106003240.344dedf6@orwell> <545AB64F.1060502@multiplay.co.uk> <20141106012739.509b96b5@orwell> <545ACCEF.5000300@multiplay.co.uk> <20141209093405.6dd2c268@orwell> <20141209090157.M14881@martymac.org> Organization: FREE! X-Mailer: Claws Mail 3.9.3 (GTK+ 2.24.23; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; boundary="Sig_/4ki=HT6Fq8P1xA3z3vESpzF"; protocol="application/pgp-signature" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 10:09:04 -0000 --Sig_/4ki=HT6Fq8P1xA3z3vESpzF Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Am Tue, 9 Dec 2014 09:04:26 +0000 (UTC) schrieb "Ganael LAPLANCHE" : > On Tue, 9 Dec 2014 09:34:05 +0100, Kai Gallasch wrote >=20 > Hi Kai, >=20 > > Any ideas (left) ? >=20 > There is a PR for AHCI timeouts : >=20 > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D195349 >=20 > I don't know if it is related to your problem but maybe you can try > the suggested workaround ? Thank you for this information. But no. My problem seems to be unrelated.. K. echo 'hint.ahci.0.msi=3D"0"' >> /boot/loader.conf After reboot: # zpool scrub ssdpool # zpool status ssdpool pool: ssdpool state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub in progress since Tue Dec 9 10:36:24 2014 5.36G scanned out of 115G at 166M/s, 0h11m to go 24.5K repaired, 4.65% done config: NAME STATE READ WRITE CKSUM ssdpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gpt/ssdpool0 ONLINE 0 0 13 (repairing) gpt/ssdpool1 ONLINE 0 0 0 errors: No known data errors After the zpool scrub finished: # zpool status ssdpool pool: ssdpool state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 38.5K in 0h9m with 0 errors on Tue Dec 9 10:45:58 2014 config: NAME STATE READ WRITE CKSUM ssdpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gpt/ssdpool0 ONLINE 0 0 15 gpt/ssdpool1 ONLINE 0 0 4 # zpool clear ssdpool # zpool scrub ssdpool This "zpool scrub" run one SSD drive is lost during the scrub :-/ A "camcontrol rescan all" does not bring the missing ssd drive back.. # zpool status ssdpool pool: ssdpool state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://illumos.org/msg/ZFS-8000-2Q scan: scrub canceled on Tue Dec 9 10:58:42 2014 config: NAME STATE READ WRITE CKSUM ssdpool DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 gpt/ssdpool0 ONLINE 0 0 0 2481016284460057031 UNAVAIL 297 215 47 was /dev/gpt/ssdpool1 ahcich3: Timeout on slot 24 port 0 ahcich3: is 00000000 cs fc00001f ss ff00001f rs ff00001f tfd 40 serr 000000= 00 cmd 0024d917 (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 24 1b 4a c6 40 1e 00 00 00= 00 00 (ada3:ahcich3:0:0:0): CAM status: Command timeout (ada3:ahcich3:0:0:0): Retrying command ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0= 0 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Retrying command ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0= 0 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0= 0 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked ada3 at ahcich3 bus 0 scbus3 target 0 lun 0 ada3: s/n S1SXNSAFA06835A detached ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0= 0 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Retrying command ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0= 0 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Poll timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (ada3:ahcich3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 = 00 00 00 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Poll timeout on slot 4 port 0 ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000= 00 cmd 0024c417 (aprobe0:ahcich3:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted ahcich3: Timeout on slot 4 port 0 ahcich3: is 00000000 cs 0001fff0 ss 0001fff0 rs 0001fff0 tfd 80 serr 000000= 00 cmd 0024c417 (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 24 1b 4a c6 40 1e 00 00 00= 00 00 (ada3:ahcich3:0:0:0): CAM status: Command timeout (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 15 3f 4a c6 40 1e 00 00 00= 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 ed c4 21 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 71 19 d3 40 04 00 00 01= 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 f0 bb 43 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 45 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 46 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 47 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 48 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 49 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 4a 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 4b 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 4c 23 40 20 00 00 0= 0 00 00 (ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request (ada3:ahcich3:0:0:0): Error 5, Periph was invalidated (ada3:ahcich3:0:0:0): Periph destroyed ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080) ahcich3: Poll timeout on slot 16 port 0 ahcich3: is 00000000 cs 00010000 ss 00000000 rs 00010000 tfd 80 serr 000000= 00 cmd 0024d017 (aprobe0:ahcich3:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 (aprobe0:ahcich3:0:0:0): CAM status: Command timeout (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted --=20 PGP-KeyID =3D 0xE401B671927D4A5C I am a robot. --Sig_/4ki=HT6Fq8P1xA3z3vESpzF Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBCgAGBQJUhso5AAoJEHBlTXxPsfWIUfAP/3PBpfvPXY/eCbXtF5Kh+08+ pTASCmwlAjyB2HGQG3MkknyhcDzLU9Anaht+R4qAiUF579i5Zzn/2Vg/516h+Fon HSIZPQUOrrvKbAqY5gAaAwLWiIBtphL1jt1tZBtLlDLA0cZ6DsRa7XWlsCrtoraV G54kTGczCc3gZR7m7QJr0b9CPA8DNp0dEUZVUpIN3fq7NRWXkp88TY7AJyHclkPz LHH8DMK8/uLwwCFkrmUIpYRdAsu1VdvWvYMDiFa36v+ZbbG+f1HPyJa6GiSIHX3C Tdrb5KWij8eNdRXFsoil4dTIQXfqdKOfgcVZVXI7cztI5LBqUrHNMp7hxgHvF8QT u7XPn+GhBMkbPCGZJCu5cl37gz7Mom2mHa96eMTj8Ezcbg8OJCtPCdkluDwEpaNK mxE8OprktNDRD9jcCUShC6ZzisEp0phWunobaO74pyZ3GSOwHnJBx8YOR56Gb7x9 KwF033fi3TRGvf32P0GlmQbJc2U1/q9tNmaoeubOsIjOliUE8EBKP/vu+zSj+3sY iwpvIjJFhhs42iAPrmESJzk3kwjZl2ZSHMLrQHLRlX07q1GVg6a5xKGJY1DyXtLR QA0XWbcKabXEb+Yt/AkMI1CKQp6fvJoJVcFij03Ln8iQ96Iiu8tlxNnkbsEsqRQ4 NRaJ8rchIfPs0HlF0Hqv =MglT -----END PGP SIGNATURE----- --Sig_/4ki=HT6Fq8P1xA3z3vESpzF--