Date: Tue, 08 Jan 2008 17:28:46 -0500
From: "Stephen M. Rumble" <stephen.rumble@utoronto.ca>
To: freebsd-stable@freebsd.org
Subject: RELENG_7: zfs mirror causes ata timeout
Message-ID: <20080108172846.2lglrcvo0qsk88o0@webmail.utoronto.ca>

Hi all,

I'm having a bit of trouble with a new machine running the latest RELENG_7 code. I have two 500GB WD Caviar GP disks on a mini-ITX GM965-based board (MSI "Fuzzy") running amd64 with 4GB of RAM. The disks are:

ad4: 476940MB <WDC WD5000AACS-00ZUB0 01.01B01> at ata2-master SATA150
ad6: 476940MB <WDC WD5000AACS-00ZUB0 01.01B01> at ata3-master SATA150

Both appear to work great alone, with UFS or ZFS on separate filesystems/pools. However, soon after I create a ZFS mirror between the two, I run into the following sort of trouble:

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: FAILURE - READ_DMA timed out LBA=xxxxxxxx

Usually these continue ad infinitum. Sometimes the machine recovers, only to fail again soon after. These errors also aren't trivial to reproduce. They seem to happen at random, though most often when the system is lightly utilised. Sometimes, however, they occur immediately upon boot.

I've tried different power supplies and cables. I've enabled and disabled spread spectrum clocking and tried both SATA300 and SATA150 rates. I've also tried swapping the drives between ports, so that what was ad4 is now ad6 and vice versa. The problems persist, but they follow the same physical drive (ad6 originally, then ad4 after the swap). That points to a drive problem, yet the drive works great on its own, even when both disks are exercised simultaneously. SMART reports no problems, and ZFS reports no issues when ad6 is used on its own outside of a ZFS mirror. Everything points to the drive, except that it only fails when mirrored. I'm stumped. Any ideas?

The only interesting bit of evidence I could find is that when these errors occur, smartctl reports an increase in the Start_Stop_Count attribute on ad6. ad4, which appears to work fine, doesn't show this and has a much lower value.

I've tried disabling ACPI, but then the kernel cannot find the controller (ICH8M). I'm using AHCI, but switching to compatibility mode doesn't appear to alter the behaviour. I don't know if it's important, but I'm not using ZFS on the whole drive, just ad{4,6}s1d.

Any help would be appreciated.

Thanks,
Steve

P.S. Please cc me on replies as I'm not subscribed.