Date: Mon, 14 Sep 2009 11:51:39 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: Mike Tancsa <mike@sentex.net>
Cc: Miroslav Lachman <000.fbsd@quip.cz>, FreeBSD Current <current@FreeBSD.org>
Subject: Re: ata timeouts under load
Message-ID: <alpine.BSF.2.00.0909141126580.38475@thor.farley.org>
In-Reply-To: <200909141526.n8EFQwuG021801@lava.sentex.ca>
References: <4AAD4E51.5060908@FreeBSD.org> <4AAD5365.5000902@FreeBSD.org> <4AAD5DD2.4030104@FreeBSD.org> <20090914100941.0adc00aa@Nokia-N810-43-7> <4AAE5F7E.4050103@quip.cz> <200909141526.n8EFQwuG021801@lava.sentex.ca>
On Mon, 14 Sep 2009, Mike Tancsa wrote:

> At 11:21 AM 9/14/2009, Miroslav Lachman wrote:
>
>> I have very similar problem with one disk in gmirror, but it is on 7.2
>> not current.
>
>> Sep 14 04:48:29 jimi kernel: ad6: timeout waiting to issue command
>> Sep 14 04:48:29 jimi kernel: ad6: error issuing FLUSHCACHE command
>> Sep 14 04:48:29 jimi kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=447001516
>> Sep 14 04:48:29 jimi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=447001516
>
> Are you sure this is not just a bad cable ? I have had similar symptoms
> which was a result of a bad cable. If possible, swap the cable between
> the 2 disks and see if it follows the cable.

I also have the same/similar problem with 7.2 (and earlier). I have
replaced the cable and the drive. Replacing the drive resulted in the
LBA changing, but otherwise the LBA never changes. Extended offline
tests complete without errors.

Timeout message:
kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=43471743

I do use this in /boot/loader.conf to help (I hope) prevent the timeout
from breaking the mirror:
kern.geom.mirror.timeout=45

Reading that region with dd does not produce the timeout, but it may be
because of this just noticed error:

Error 9 occurred at disk power-on lifetime: 13578 hours (565 days + 18 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 11 8e 53 97 e2  Error: UNC 17 sectors at LBA = 0x0297538e = 43471758

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 02 20 7f 53 97 e2 97      00:04:48.074  READ DMA
  c8 02 20 5f 53 97 e2 97      00:04:48.062  READ DMA
  c8 02 20 3f 53 97 e2 97      00:04:48.050  READ DMA
  c8 02 04 43 6e c5 e2 c5      00:04:48.029  READ DMA
  c8 02 20 ff d6 8b e2 8b      00:04:48.016  READ DMA

Would this error mean that the drive has remapped the block? However,
remapping should only occur when the block has a write operation applied
to it, yes? Is there a safe way of writing to a specific block? Would
it be safe to read a block with dd and write it back? Of course, the
drive would not be in the mirror at the time.

Sean
-- 
scf@FreeBSD.org
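
The LBA in the SMART error log above can be cross-checked by hand against
the LBA in the kernel timeout message. What follows is only a minimal
sketch in Python, assuming the usual 28-bit LBA layout of the ATA
task-file registers (SN = bits 7:0, CL = bits 15:8, CH = bits 23:16, low
nibble of DH = bits 27:24). The function name lba28 and the variable
names are made up for illustration; the register values are copied from
the log above.

```python
def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes LBA28 addressing: only the low nibble of DH carries
    address bits; its high nibble holds mode/device flags.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command (ER ST SC SN CL CH DH = 40 59 11 8e 53 97 e2).
error_lba = lba28(0x8E, 0x53, 0x97, 0xE2)
remaining = 0x11  # SC register after the error: sectors not transferred

# Most recent READ DMA before the error (CR FR SC SN CL CH DH = c8 02 20 7f 53 97 e2).
read_start = lba28(0x7F, 0x53, 0x97, 0xE2)
read_count = 0x20  # SC register of the command: sectors requested

# How far into the 32-sector read the uncorrectable sector sits.
offset_in_read = error_lba - read_start

print(f"error LBA      = {error_lba} (0x{error_lba:08x})")
print(f"read start LBA = {read_start}, count = {read_count}")
print(f"error is {offset_in_read} sectors into the read, "
      f"{read_count - offset_in_read} sectors were left")
```

Under that assumption the READ DMA starts at exactly the LBA the kernel
reported in the timeout (43471743), and the uncorrectable sector lies
inside that same 32-sector request, so both messages appear to describe
the same spot on the disk.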