From: Gijs
Date: Fri, 20 Apr 2012 22:03:08 +0200
To: freebsd-hardware@freebsd.org
Subject: AHCI Time-outs while doing scrub or sysctl off line tests
Message-ID: <4F91C0FC.9040000@heteigenwijsje.nl>

Hey all,

I'm running 9-STABLE with a ZFS pool containing 2 sets of 3 disks in raidz1. I added the second set of disks (three 1.5 TB Samsung F2EG drives) after my first 3 filled up (3x1 TB, only 50 GB left; it's baaad, I know).

After adding the second set of disks I noticed that during scrubs (which would basically be the highest load the system receives besides some BitTorrent traffic and file serving) I would start receiving AHCI timeouts on ports 3-5, the newly added disks. On top of that, scrub performance is incredibly bad: it dropped to below 900 kB/s. This might, however, be a result of ZFS fragmentation, since the first set was filled way above the advised 80% and was filled by torrent clients.

After a port starts reporting AHCI errors, the connection is dropped after some time. If this happens I have to physically disconnect and reconnect the drive or do a full system halt (reboot/reset does not help) to get the functionality back.

The motherboard is an Asus M4A89GTD Pro, which has an 890GX northbridge and an SB850 southbridge.

I've been searching around a lot and did not find anything conclusive on how to permanently fix this problem. Some posts seem to suggest that a "cheap" controller like the onboard one might suffer from the strain put on it by heavy ZFS workloads, and thus start randomly dropping connections.
Some posts suggest that the problem is in AHCI/NCQ and that disabling those resolves the problems.

At boot time the drives are indeed configured with NCQ turned on: camcontrol shows both NCQ and tagged queueing enabled for the 3 Samsung drives which fail, and the minimum queue depth for tagged queueing is set at 32.

Disabling AHCI did indeed make the AHCI time-outs disappear, but unfortunately it also caused a performance drop and the loss of hot-swap capability.

The Samsung F3EG drives had problems with NCQ in combination with the SB850 southbridge, so this might also be a cause of the problem. Unfortunately, Seagate has not yet responded to my support question.

As losing AHCI functionality is kind of a big deal, I would like to know whether toggling NCQ per drive is possible, and whether it resolves the problem.

Any advice on this?

Cheers,

Gijs
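
On the per-drive question, one approach that could be tried is limiting the number of outstanding tags per device with the `tags` subcommand of camcontrol(8), which leaves AHCI itself enabled. The sketch below is untested on this hardware and only prints the commands instead of running them. The device names ada3, ada4 and ada5 are assumptions (ports 3-5 do not necessarily map to those names), and the helper function name is made up for illustration. Whether a queue depth of 1 actually avoids the time-outs is not confirmed here.

```shell
#!/bin/sh
# Sketch only: print the camcontrol(8) commands that would limit the
# number of tagged openings (queued commands) for each given device.
# With a limit of 1 the drive never has more than one command queued.
# Nothing is executed; review the output before running it as root.

# print_tag_limit_cmds is a hypothetical helper name.
# Usage: print_tag_limit_cmds <tags> <device>...
print_tag_limit_cmds() {
    tags="$1"
    shift
    for dev in "$@"; do
        printf 'camcontrol tags %s -N %s\n' "$dev" "$tags"
    done
}

# ada3..ada5 are assumed names for the three newly added drives.
print_tag_limit_cmds 1 ada3 ada4 ada5
```

The current setting for a drive can be inspected with `camcontrol tags <device> -v`. Settings made this way would presumably need to be reapplied after a reboot or after the drive reattaches.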