FreeBSD Mail Archives

Date:      Mon, 14 Sep 2009 11:51:39 -0500 (CDT)
From:      "Sean C. Farley" <scf@FreeBSD.org>
To:        Mike Tancsa <mike@sentex.net>
Cc:        Miroslav Lachman <000.fbsd@quip.cz>, FreeBSD Current <current@FreeBSD.org>
Subject:   Re: ata timeouts under load
Message-ID:  <alpine.BSF.2.00.0909141126580.38475@thor.farley.org>
In-Reply-To: <200909141526.n8EFQwuG021801@lava.sentex.ca>
References:  <4AAD4E51.5060908@FreeBSD.org> <4AAD5365.5000902@FreeBSD.org> <4AAD5DD2.4030104@FreeBSD.org> <20090914100941.0adc00aa@Nokia-N810-43-7> <4AAE5F7E.4050103@quip.cz> <200909141526.n8EFQwuG021801@lava.sentex.ca>

On Mon, 14 Sep 2009, Mike Tancsa wrote:

> At 11:21 AM 9/14/2009, Miroslav Lachman wrote:
>
>> I have very similar problem with one disk in gmirror, but it is on 7.2 
>> not current.
>
>> Sep 14 04:48:29 jimi kernel: ad6: timeout waiting to issue command
>> Sep 14 04:48:29 jimi kernel: ad6: error issuing FLUSHCACHE command
>> Sep 14 04:48:29 jimi kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=447001516
>> Sep 14 04:48:29 jimi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=447001516
>
> Are you sure this is not just a bad cable ? I have had similar symptoms 
> which was a result of a bad cable.  If possible, swap the cable between 
> the 2 disks and see if it follows the cable.

I see the same (or a very similar) problem on 7.2 and earlier.  I have 
replaced both the cable and the drive.  The reported LBA changed when I 
replaced the drive, but otherwise it is always the same.  Extended 
offline tests complete without errors.

Timeout message:
kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=43471743

I do use this in /boot/loader.conf to help (I hope) prevent the timeout 
from breaking the mirror:
kern.geom.mirror.timeout=45

Reading that region with dd does not reproduce the timeout, but the 
timeout may be caused by this error, which I just noticed:

Error 9 occurred at disk power-on lifetime: 13578 hours (565 days + 18 hours)
   When the command that caused the error occurred, the device was in an unknown state.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 59 11 8e 53 97 e2  Error: UNC 17 sectors at LBA = 0x0297538e = 43471758

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 02 20 7f 53 97 e2 97      00:04:48.074  READ DMA
   c8 02 20 5f 53 97 e2 97      00:04:48.062  READ DMA
   c8 02 20 3f 53 97 e2 97      00:04:48.050  READ DMA
   c8 02 04 43 6e c5 e2 c5      00:04:48.029  READ DMA
   c8 02 20 ff d6 8b e2 8b      00:04:48.016  READ DMA
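
As a cross-check of the log above, here is a minimal Python sketch that 
rebuilds the LBAs from the register dump.  It assumes the usual 28-bit LBA 
layout (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = 
bits 24-27); the function name is made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA (assumed layout)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Registers after the failed command: SN=8e CL=53 CH=97 DH=e2
error_lba = lba28_from_registers(0x8E, 0x53, 0x97, 0xE2)

# Most recent READ DMA before the error: SC=20 SN=7f CL=53 CH=97 DH=e2
start_lba = lba28_from_registers(0x7F, 0x53, 0x97, 0xE2)
sector_count = 0x20

# Sectors not transferred when the error was reported
remaining = sector_count - (error_lba - start_lba)

print("error LBA:", error_lba)
print("command start LBA:", start_lba)
print("sectors remaining:", remaining)
```

If that layout holds, the READ DMA that failed starts at the same LBA as 
the kernel timeout message, and the failing sector sits 15 sectors into 
that 32-sector request.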

Does this error mean that the drive has remapped the block?  As I 
understand it, remapping should only happen when the block is written 
to, yes?  Is there a safe way to write to a specific block?  Would it be 
safe to read a block with dd and write it back?  Of course, the drive 
would be out of the mirror at the time.
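
To make the read-and-write-back idea concrete, here is a rough Python 
sketch of the operation I have in mind.  It is only an illustration: the 
function name and the 512-byte sector size are assumptions, and it has only 
been thought through against an ordinary file, not a real disk device.  If 
the drive cannot read the sector, the read raises an error and nothing is 
written.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def rewrite_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one sector at `lba` from `path` and write the same bytes back.

    Hypothetical helper for illustration only.  Raises OSError if the
    read or write fails, so an unreadable sector is never overwritten.
    """
    offset = lba * sector_size
    fd = os.open(path, os.O_RDWR)
    try:
        data = os.pread(fd, sector_size, offset)
        if len(data) != sector_size:
            raise OSError("short read at LBA %d" % lba)
        written = os.pwrite(fd, data, offset)
        if written != sector_size:
            raise OSError("short write at LBA %d" % lba)
        os.fsync(fd)  # ask the OS to push the write out to the device
    finally:
        os.close(fd)
    return data

# Example call (the device path is a placeholder, not a recommendation):
# rewrite_sector("/path/to/device", 43471758)
```

Whether such a rewrite would actually make the drive remap anything is 
exactly what I do not know.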

Sean
-- 
scf@FreeBSD.org
