Date: Fri, 20 Apr 2012 22:03:08 +0200
From: Gijs <gijsje@heteigenwijsje.nl>
To: freebsd-hardware@freebsd.org
Subject: AHCI Time-outs while doing scrub or sysctl off line tests
Message-ID: <4F91C0FC.9040000@heteigenwijsje.nl>

Hey all,

I'm running 9-STABLE with a ZFS pool containing two sets of three disks in raidz1. I added the second set (3x 1.5 TB Samsung F2EG) after the first three filled up (3x 1 TB with only 50 GB left; it's bad, I know).

After adding the second set I noticed that during scrubs (which would basically be the highest load the system receives, besides some BitTorrent traffic and file serving) I started receiving AHCI timeouts on ports 3-5, the newly added disks. On top of that, scrub performance is incredibly bad: it dropped to below 900 kB/s. This might, however, be a result of ZFS fragmentation, since the first set was filled well above the advised 80% and was filled by torrent clients.

After a port starts reporting AHCI errors, the connection is dropped after some time. When this happens I have to physically disconnect and reconnect the drive or do a full system halt (reboot/reset does not help) to get it working again.

The motherboard is an Asus M4A89GTD Pro, which has an 890GX northbridge and an SB850 southbridge.

I've searched around a lot and did not find anything conclusive on how to permanently fix this problem. Some posts suggest that a "cheap" controller like the onboard one might suffer under the strain of heavy ZFS workloads and start randomly dropping connections. Others suggest that the problem lies in AHCI/NCQ and that disabling those resolves it.

At boot the drives are indeed configured with NCQ turned on; camcontrol shows both NCQ and tagged queuing enabled for the three Samsung drives that fail, and the minimum tagged queue depth is set at 32. Disabling AHCI did indeed make the AHCI timeouts disappear, but unfortunately, of course, at the cost of a performance drop and the loss of hot-swap capability.

The Samsung F3EG drives had problems with NCQ in combination with the SB850 southbridge, so this might also be a cause of the problem; unfortunately Seagate has not yet responded to my support question.

As losing AHCI functionality is kind of a big deal, I would like to know whether toggling NCQ per drive is possible, and whether it resolves the problem. Any advice on this?

Cheers,
Gijs