Date:      Sat, 01 Sep 2012 00:12:22 +0100
From:      Kaya Saman <kayasaman@gmail.com>
To:        freebsd-fs@FreeBSD.org
Subject:   DMA Error on disk FreeBSD won't boot
Message-ID:  <504144D6.5030800@gmail.com>

Hi,

I'm pretty sure this means a "bad" disk, which is surprising considering
how new the disk is.

I run FreeBSD 8.2 x64 on a little Mini-ITX-based server which has one SSD
for root (/), two 2TB disks used as a ZFS pool called ZFS_POOL_1, and two
4TB disks used as a second ZFS pool called ZFS_POOL_2.

I haven't created a separate ZFS file system on the disks, as the pools
were built using only the 'zpool' command, so data is basically being
stored on a raw (pseudo) RAID 0.


Just now one of the disks started reporting DMA timeouts. I've seen this
before with a JMicron-based controller from STARTECH; however, the disk in
question was connected directly to the motherboard.

These are the errors I was getting:

  WARNING:  Kernel Errors Present
     ad13: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND ...:  26 Time(s)

  1 Time(s): ad12: 3815447MB <Hitachi HDS724040ALE640 MJAOA250> at ata6-master UDMA100 SATA 3Gb/s
  1 Time(s): ad13: 3815447MB <Hitachi HDS724040ALE640 MJAOA250> at ata6-slave UDMA100 SATA 3Gb/s
  1 Time(s): ad13: FAILURE - READ_DMA48 timed out LBA=1330838659
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=1086500736
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=1330838659
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1086500736
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1330838659
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1409316096
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1409317248
  10 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1681236467
  4 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1681236723
  8 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1681236979
  1 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1747452672
  1 Time(s): ad14: 1907729MB <WDC WD2001FASS-00U0B0 01.00101> at ata7-master UDMA100 SATA 3Gb/s
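To see where on ad13 the failures cluster, the summary above can be tallied with a short script. This is a minimal Python sketch, not an existing tool: it assumes the lines keep exactly the "N Time(s): adX: KIND - READ_DMA48 ... LBA=n" shape shown above and that each LBA counts 512-byte logical sectors. The names `summarize` and `byte_offset` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: LBAs reported by the ad(4) driver count 512-byte sectors.
SECTOR_SIZE = 512

# Matches summary lines such as
# "10 Time(s): ad13: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1681236467"
LINE_RE = re.compile(
    r"(?P<count>\d+) Time\(s\): (?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - "
    r"READ_DMA48 .*?LBA=(?P<lba>\d+)"
)


def summarize(log_text):
    """Return {device: {lba: total_events}} for READ_DMA48 errors."""
    result = defaultdict(lambda: defaultdict(int))
    for match in LINE_RE.finditer(log_text):
        # Timeouts and the final failure for the same LBA are summed together.
        result[match["dev"]][int(match["lba"])] += int(match["count"])
    return {dev: dict(lbas) for dev, lbas in result.items()}


def byte_offset(lba, sector_size=SECTOR_SIZE):
    """Convert a sector address into a byte offset from the start of the disk."""
    return lba * sector_size
```

Under the 512-byte assumption, the LBAs in the log above fall between roughly 556 GB (LBA=1086500736) and 895 GB (LBA=1747452672) into the 4TB disk. Lines without an LBA, such as the status=51 summary or the device probe lines, are simply skipped by the pattern.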


I'm not sure where to go from here.


My usual response would be to dock the drives via USB and see if they
mounted. Since this is a zpool, the only thing I could think of was to
run 'zpool import ZFS_POOL_2' on my laptop running FreeBSD 9 x64 and see
if that worked.

Unfortunately, the disks registered in the system but, due to some SCSI
errors, didn't actually show up under /dev. I can't be more specific
about the type of error because my laptop, a Lenovo X220, doesn't have a
GUI on its FreeBSD partition; for whatever reason, X won't start even
with the kernel patch. I guess I could have sent the errors to myself
using the mail command, since FreeBSD won't mount the ext4 file system on
my Linux partition either.


Anyhow, the disks were not using AHCI, as the server's system board
wouldn't detect them in that mode, so they were running in ATA mode. It
worked fine until now.


Of course, all my data is backed up, bar a few gigs if not tens of gigs;
however, I would really like to recover from this if at all possible.


In Linux I always used to be able to dd a failing disk to an image and
then mount the image using a loop device. As this drive is so large, my
aim now is to do a direct dd copy from the failing drive to a new drive.
The problem is that FreeBSD won't start with the drive in place, I can't
add the drive once booted (via hotswap; I'm not really sure how to do
that, and what little I've read didn't turn up anything substantial for
non-AHCI mode), and docking it via USB to dd from another *NIX-based OS
hasn't worked either.
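A plain dd stops at the first unreadable sector unless told otherwise; `dd conv=noerror,sync` keeps going and pads failed blocks with zeros. The same idea is sketched below in Python so the logic is explicit. It is a minimal sketch, not a recovery tool: `rescue_copy` is a hypothetical name, and it assumes the source can be opened read-only once the OS actually exposes the disk under /dev.

```python
import os


def rescue_copy(src_path, dst_path, size=None, block_size=64 * 1024):
    """Copy src_path to dst_path block by block, in the spirit of
    `dd conv=noerror,sync`: unreadable blocks are replaced with zeros
    instead of aborting the copy.

    Returns the byte offsets of blocks that could not be read.
    """
    bad_offsets = []
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        if size is None:
            # Assumption: seeking to the end reports the size. This holds for
            # regular files but may not for raw disk devices on every OS, in
            # which case pass `size` explicitly.
            size = os.lseek(src_fd, 0, os.SEEK_END)
        # Open an existing target (e.g. the new disk) without truncating it;
        # create a fresh image file otherwise.
        mode = "r+b" if os.path.exists(dst_path) else "wb"
        with open(dst_path, mode) as dst:
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                try:
                    data = os.pread(src_fd, length, offset)
                except OSError:
                    # Read error (e.g. EIO from a bad sector): record and skip.
                    data = b""
                    bad_offsets.append(offset)
                # Pad errors and short reads with zeros so every later block
                # lands at the same offset it had on the source.
                dst.seek(offset)
                dst.write(data.ljust(length, b"\0"))
                offset += length
    finally:
        os.close(src_fd)
    return bad_offsets
```

The block size is a trade-off: a whole block is zeroed when any sector in it fails, so a smaller `block_size` loses less readable data around each bad sector at the cost of a slower copy. Dedicated tools such as GNU ddrescue automate this by retrying failed regions in finer passes.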

Could anyone suggest anything?


I guess I could boot a Linux recovery CD with the drive inserted and see
if I can 'dd' from there, though I have an uncanny feeling that it won't
recognize the drive at boot either.


Regards,


Kaya


