From: "Vladislav Prodan"
To: "Beeblebrox"
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 30 Jan 2013 23:51:45 +0200
Subject: Re[2]: Re[2]: AHCI timeout when using ZFS + AIO + NCQ
Message-Id: <87448.1359582705.624376220320202752@ffe17.ukr.net>
In-Reply-To: <1359317924363-5781425.post@n5.nabble.com>
List-Id: Discussions about the use of FreeBSD-current

> I once ran into a very severe AHCI timeout problem. After months of trying to
> figure it out and insane "Hardware_ECC_Recovered" error values, I found that
> the error was with the power connector plug / sata HDD interface. All errors
> disappeared after replacing that cable. Since you have error on more than 1
> HDD, I suggest:
> 1. Check smartctl output for each AND all HDD
> 2. Check whether your power supply unit is still healthy or if it is
> supplying inconsistent power.
> 3. Check the main power supply line and whether it shows any voltage
> fluctuations or if there is a new heavy consumer of amps on the same power
> line as the server is plugged to.
>

I deliberately chose a different server with a different chipset, one on which there had been no problems with the HDDs.
I added AHCI support to the kernel:

device          ahci            # AHCI-compatible SATA controllers

And now, after 2.5 days, one HDD has dropped off.

[3:14]beastie:root->/root# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/disk0            ONLINE       0     0     0
            gpt/disk2            ONLINE       0     0     0
          mirror-1               DEGRADED     0     0     0
            gpt/disk1            ONLINE       0     0     0
            4931885954389536913  REMOVED      0     0     0  was /dev/gpt/disk3

errors: No known data errors

Jan 30 09:49:28 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:49:28 beastie kernel: ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd c0 serr 00000000 cmd 0004dd17
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:49:28 beastie kernel: (ada3:ahcich3:0:0:0): Retrying command
Jan 30 09:51:31 beastie kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 30 09:51:31 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:51:31 beastie kernel: ahcich3: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 80 serr 00000000 cmd 0004dd17
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Jan 30 09:51:31 beastie kernel: ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 30 09:51:31 beastie kernel: ahcich3: Timeout on slot 29 port 0
Jan 30 09:51:31 beastie kernel: ahcich3: is 00000000 cs 00000000 ss 00000000 rs 20000000 tfd 58 serr 00000000 cmd 0004dd17
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Jan 30 09:51:31 beastie kernel: (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Jan 30 09:51:31 beastie kernel: (ada3:ahcich3:0:0:0): lost device
Jan 30 09:51:31 beastie kernel: (pass3:ahcich3:0:0:0): passdevgonecb: devfs entry is gone

-- 
Vladislav V. Prodan
System & Network Administrator
http://support.od.ua
+380 67 4584408, +380 99 4060508
VVP88-RIPE
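
Editorial note on reading the ahcich3 register lines in the log: the sketch below is not part of the original message and is only a rough aid, written under assumptions. It assumes that "cs" and "ss" are 32-bit per-slot bitmasks (command issue and SATA active), that "rs" is the driver's own mask of running slots, and that the low byte of "tfd" is the standard ATA status register. The function names (parse_dump, active_slots, decode_tfd) are hypothetical helpers, not anything provided by FreeBSD.

```python
# Standard ATA status register bits, checked from the high bit down.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "bit4",  # meaning depends on the command set, left unnamed here
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error
}


def parse_dump(line):
    """Turn 'is 00000000 cs 20000000 ...' into a dict of name -> int."""
    tokens = line.split()
    return {tokens[i]: int(tokens[i + 1], 16) for i in range(0, len(tokens), 2)}


def active_slots(mask):
    """List the slot numbers whose bits are set in a 32-bit slot mask."""
    return [slot for slot in range(32) if (mask >> slot) & 1]


def decode_tfd(tfd):
    """Name the ATA status bits set in the low byte of a tfd value."""
    status = tfd & 0xFF
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    # Register dump copied from the first timeout in the log above.
    regs = parse_dump(
        "is 00000000 cs 20000000 ss 00000000 rs 20000000 "
        "tfd c0 serr 00000000 cmd 0004dd17"
    )
    print("slots in cs:", active_slots(regs["cs"]))
    print("slots in rs:", active_slots(regs["rs"]))
    print("tfd status bits:", decode_tfd(regs["tfd"]))
```

Under these assumptions, the mask 20000000 has only bit 29 set, which agrees with the "Timeout on slot 29" lines, and the tfd values c0, 80 and 58 all have either BSY or DRQ set at the time of each timeout. This only restates what the log already shows and does not by itself point to the drive, cable, power or controller as the cause.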