From owner-freebsd-scsi@FreeBSD.ORG Tue Nov 1 20:32:04 2011
Date: Tue, 1 Nov 2011 13:32:01 -0700
From: Jason Wolfe
To: Peter Maloney
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags
In-Reply-To: <4EAEF431.7090108@brockmann-consult.de>
References: <4EAEF431.7090108@brockmann-consult.de>

On Mon, Oct 31, 2011 at 12:17 PM, Peter Maloney
<peter.maloney@brockmann-consult.de> wrote:

> Dear Jason,
>
> I get a similar problem on a system with an LSI 9211-8i with 20 SATA
> disks attached (2 SSDs and 18 spinning disks). My system doesn't hang,
> panic, or reset though. I just lose access to one disk, which is then
> considered FAULTED in my zpool status (with the ZFS file system). If I
> physically remove the FAULTED disk and run "gpart recover da0", I get a
> panic. Otherwise, the system keeps running in a degraded state. When I
> reboot and resilver, some data is found damaged and repaired, not just
> refreshed with the latest state. The server has 1 HBA and 2 backplanes,
> and I have the 2 mirrored root disks on different backplanes. Maybe that
> is why mine runs degraded and yours hangs.
>
> This has happened twice so far (in around a month or two), and both
> times it was one of the mirrored root disks (SSDs) that faulted.
>
> My tags are set to 255. I will try reproducing it as you said, and then,
> if it fails, rebooting and trying again with tags set to 2 as you
> suggested.
>
> And *thank you very much for this information*. This is the last
> outstanding issue with this server. I hope this workaround helps.
>
> # camcontrol tags /dev/da0
> (pass0:mps0:0:7:0): device openings: 255
>

Peter,

Does this happen 'randomly' for you, or do you have some automated
process running smartctl that trips the drives up occasionally?

The way I'm getting around it currently is to move
/usr/local/sbin/smartctl elsewhere and replace it with a wrapper that
simply drops the tags to 1, executes smartctl from its new location with
the options passed, then sets the tags back to whatever you prefer.
There will obviously be a small detriment while the tags are lowered,
but it should be fairly quick and hopefully not even noticeable in your
case.

If smartctl is not triggering these events for you, any idea what is?

Jason
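
For reference, the wrapper described above could look roughly like the
sketch below. This is not the script actually in use, only a minimal
illustration under some assumptions: the relocated binary path
(/usr/local/libexec/smartctl.real) is hypothetical, the restore value of
255 is taken from Peter's camcontrol output, and the device is assumed to
be the last argument and to be a /dev/daN node. The function name
run_with_low_tags is made up for illustration.

```sh
#!/bin/sh
# Sketch: lower the tag depth, run the real smartctl, restore the depth.
# All paths and defaults below are assumptions, overridable via environment.

CAMCONTROL="${CAMCONTROL:-camcontrol}"
REAL_SMARTCTL="${REAL_SMARTCTL:-/usr/local/libexec/smartctl.real}"  # hypothetical path
LOW_TAGS="${LOW_TAGS:-1}"          # depth used while smartctl runs
NORMAL_TAGS="${NORMAL_TAGS:-255}"  # depth restored afterwards (assumed)

run_with_low_tags() {
    # Assume the device is the last argument, e.g. "smartctl -a /dev/da0".
    dev=""
    for arg in "$@"; do dev="$arg"; done

    case "$dev" in
        /dev/da[0-9]*) ;;
        *)
            # No da device found: pass straight through, tags untouched.
            "$REAL_SMARTCTL" "$@"
            return $?
            ;;
    esac

    "$CAMCONTROL" tags "$dev" -N "$LOW_TAGS" >/dev/null
    "$REAL_SMARTCTL" "$@"
    status=$?
    # Restore the tags even if smartctl reported an error.
    "$CAMCONTROL" tags "$dev" -N "$NORMAL_TAGS" >/dev/null
    return "$status"
}
```

To use it as the replacement /usr/local/sbin/smartctl, the script would
end with a call such as: run_with_low_tags "$@". Anything that invokes
smartctl with a different device naming scheme (pass devices, -d options
after the device) would need the device detection adjusted.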