Date: Mon, 15 Oct 2012 10:21:06 -0700
From: nate keegan
To: freebsd-hardware@freebsd.org
Subject: Re: ahcich Timeouts SATA SSD
References: <20121015095858.GC33428@server.rulingia.com>

I took a look at the DDB man page and I am not able to do this when the issue happens as the system is
completely blown up (no keyboard input on the IPMI console, no response from existing SSH sessions, etc.).

No changes have been seen in the ZFS load on the system. Because this is a backup system, the heaviest load comes in roughly the first week after it goes online: we use rsync to copy files down from our Windows servers, and during that period the system has to 'seed' the initial copies. That is much heavier on I/O than the relatively constant load after the first week.

I have 48 GB of Crucial memory that I will put in this system today to replace the roughly 24 GB of Kingston memory currently installed. If the issue happens again after the memory change, I plan on replacing both SSDs (Crucial M4) with two non-SSD SATA disks, on the theory that the Crucial firmware on the disks (002 on both) is somehow the culprit. If neither change solves the issue, I will move on to 9.1-RC2, or 9.1-RELEASE if it is out by then, and add the requested kernel options.

The amount of monkeying I have had to do via /boot/loader.conf and the camcontrol script I run tells me that the SSDs, their firmware, etc. are somehow causing the issue, as we have plenty of other FreeBSD 8.x and 9.x systems using non-SSD SATA drives without this issue popping up in their daily workloads.
My /boot/loader.conf currently looks like this:

# AHCI is set in the BIOS as well
ahci_load="YES"
# Should be auto-negotiated in FreeBSD 9.x; sata_rev=1 forces
# SATA revision 1 (1.5 Gb/s). See ahci(4)
hint.ahcich.0.sata_rev=1
hint.ahcich.1.sata_rev=1
hint.ahcich.0.pm_level=1
hint.ahcich.1.pm_level=1

And /usr/local/etc/rc.d/camcontrol:

#!/bin/sh
CAMCONTROL=/sbin/camcontrol

# Disable NCQ
$CAMCONTROL tags ada0 -N 1 > /dev/null
$CAMCONTROL tags ada1 -N 1 > /dev/null

# Disable APM
$CAMCONTROL cmd ada0 -a "EF 85 00 00 00 00 00 00 00 00 00 00" > /dev/null
$CAMCONTROL cmd ada1 -a "EF 85 00 00 00 00 00 00 00 00 00 00" > /dev/null

Without both of these shims in place I get maybe 1.5 to 2 hours before the system goes kablooie, and that is without the system doing any real I/O work: it is just running FreeBSD during the business day plus a few cron scripts that check for data and shuffle it around.
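For anyone wanting to apply the same two shims to more disks, they can be sketched as a loop. This is a hypothetical variant, not the script actually running on this system: the function name apply_ssd_shims, the CAMCONTROL environment override and the existence check are illustrative additions. The camcontrol invocations are the same ones used above (tags -N 1 to drop to a single tag, and the raw ATA SET FEATURES 0xEF / 0x85 command to disable APM).

```sh
#!/bin/sh
# Hypothetical loop-based variant of the camcontrol shim above.
# apply_ssd_shims, the CAMCONTROL override and the -x check are
# illustrative additions, not part of the original setup.

CAMCONTROL=${CAMCONTROL:-/sbin/camcontrol}

# Apply both shims to every disk named in the arguments.
apply_ssd_shims() {
    for disk in "$@"; do
        # Limit the device to one tag, effectively disabling NCQ
        "$CAMCONTROL" tags "$disk" -N 1
        # ATA SET FEATURES (0xEF), subcommand 0x85: disable APM
        "$CAMCONTROL" cmd "$disk" -a "EF 85 00 00 00 00 00 00 00 00 00 00"
    done
}

# Only touch the disks when camcontrol is actually present (i.e. on FreeBSD)
if [ -x "$CAMCONTROL" ]; then
    apply_ssd_shims ada0 ada1 > /dev/null
fi
```

Setting CAMCONTROL=echo before calling apply_ssd_shims gives a dry run that prints the commands instead of sending them to the disks.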