Date:      Fri, 20 Apr 2012 22:03:08 +0200
From:      Gijs <gijsje@heteigenwijsje.nl>
To:        freebsd-hardware@freebsd.org
Subject:   AHCI Time-outs while doing scrub or sysctl off line tests
Message-ID:  <4F91C0FC.9040000@heteigenwijsje.nl>

Hey all,

I'm running 9-STABLE with a ZFS pool made of two raidz1 vdevs of 3 disks each.
I added the second set (3x 1.5 TB Samsung F2EG) after the first set filled
up (3x 1 TB with only 50 GB left; it's baaad, I know).
After adding the second set I noticed that during scrubs (which are
basically the highest load the system sees, apart from some BitTorrent
traffic and file serving) I started getting AHCI timeouts on ports 3-5,
the newly added disks.
Scrub performance is also incredibly bad: it has dropped below 900 KB/s.
This might, however, be a result of ZFS fragmentation, since the first
set was filled well beyond the advised 80% and was filled by torrent
clients.
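
For a sense of scale, the numbers above can be worked through roughly. This is a back-of-the-envelope sketch, not a measurement: it assumes a raidz1 vdev's usable space is about (disks - 1) x disk size, decimal units, and, for the scrub estimate, a hypothetical ~1.95 TB of allocated data (the first vdev only) read at a constant 900 KB/s. The function names are illustrative only.

```python
# Rough arithmetic only: decimal units (1 TB = 1000 GB = 10**12 bytes,
# 1 KB = 1000 bytes); ZFS metadata and allocation overhead are ignored.

def raidz1_fill_percent(disks: int, disk_tb: float, free_gb: float) -> float:
    """Fill level of a raidz1 vdev, taking usable space as (disks - 1) * disk size."""
    usable_gb = (disks - 1) * disk_tb * 1000
    return 100.0 * (usable_gb - free_gb) / usable_gb


def scrub_days(data_tb: float, rate_kb_s: float) -> float:
    """Days needed to read data_tb terabytes at a constant rate of rate_kb_s KB/s."""
    seconds = data_tb * 10**12 / (rate_kb_s * 1000)
    return seconds / 86400


# First vdev from the message: 3 x 1 TB with about 50 GB free.
print(f"first vdev: {raidz1_fill_percent(3, 1.0, 50):.1f}% full")  # 97.5% full
# Hypothetical lower bound: only the ~1.95 TB on the first vdev, at 900 KB/s.
print(f"scrub time: {scrub_days(1.95, 900):.1f} days")  # 25.1 days
```

Under these assumptions the first vdev sits far above the 80% guideline, and even a lower-bound scrub at that rate would take weeks.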

Once a port starts logging AHCI errors, the connection is dropped after
some time. When that happens I have to physically disconnect and reconnect
the drive, or fully halt the system (a reboot/reset does not help), to get
it working again.
The motherboard is an Asus M4A89GTD Pro, which has an 890GX northbridge
and an SB850 southbridge.

I've searched a lot and have not found anything conclusive on how to fix
this problem permanently.
Some posts suggest that a "cheap" controller like the onboard one may not
cope with the strain of heavy ZFS workloads, and therefore starts dropping
connections at random. Others suggest the problem lies in AHCI/NCQ and
that disabling them resolves it. At boot the drives are indeed configured
with NCQ enabled: camcontrol shows both NCQ and tagged queuing enabled for
the three failing Samsung drives, with the minimum tagged queue depth set
at 32.
Disabling AHCI did indeed make the AHCI timeouts disappear, but of course
it also caused a performance drop and the loss of hot-swap capability.
The Samsung F3EG drives had problems with NCQ in combination with the
SB850 southbridge, so this might also be a cause of the problem.
Unfortunately, Seagate has not yet responded to my support question.

As losing AHCI functionality is a big trade-off, I would like to find out
whether NCQ can be toggled per drive, and whether doing so resolves the
problem. Any advice on this?
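
One way to try per-drive toggling, sketched here under assumptions: camcontrol(8) has a `tags` subcommand whose `-N` option sets the number of tagged openings for a device, and limiting a drive to a single opening should effectively stop it from queuing commands. Whether this behaves as intended for ada(4) devices on 9-STABLE is untested here, the device names `ada3` to `ada5` for ports 3-5 are hypothetical, and the setting is presumably not persistent across reboots. The helper names are illustrative; by default the sketch only prints the commands (a dry run).

```python
import subprocess


def ncq_limit_commands(devices, tags=1):
    """Build camcontrol argv lists that set the tagged openings per device.

    tags=1 allows one outstanding command per drive, which is assumed
    to effectively disable NCQ for that drive.
    """
    return [["camcontrol", "tags", dev, "-N", str(tags)] for dev in devices]


def run_commands(commands, execute=False):
    """Print each command; actually run it only when execute=True."""
    for argv in commands:
        print(" ".join(argv))
        if execute:
            # Needs root; raises CalledProcessError if camcontrol fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical mapping of AHCI ports 3-5 to ada3-ada5.
    run_commands(ncq_limit_commands(["ada3", "ada4", "ada5"]))
```

Limiting only the three new drives leaves NCQ enabled on the original disks, so the effect on the timeouts can be compared without giving up AHCI entirely.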

Cheers,

Gijs



