From owner-freebsd-fs@freebsd.org Fri Aug 31 02:58:06 2018
Date: Thu, 30 Aug 2018 19:58:05 -0700
From: "M. Casper Lewis"
To: Grant Gray
Cc: freebsd-fs@freebsd.org
Subject: Re: Failing ZFS log devices/panic
Message-ID: <20180831025805.GY1473@genomecenter.ucdavis.edu>
In-Reply-To: <1121631966.257797.1535679312447.JavaMail.zimbra@grantgray.id.au>

On Fri, Aug 31, 2018 at 11:35:12AM +1000, Grant Gray wrote:

> I'm going to defer to people with more experience with this hardware
> combination than myself, but I will say I've had compatibility issues with
> SATA SSD's on LSI SAS controllers in the past. In my case this manifested
> as the SSD's disappearing off the bus and not returning.

And after a reboot they are back? Interesting.
This is not what we are seeing; rather, we see an accumulation of errors
until the device is faulted, despite both SMART and the vendor utility
reporting a healthy drive.

> I've currently got an issue between some HGST SATA disks and a SAS3008 HBA
> where mixing SAS and SATA disks on the same port results in intermittent
> I/O errors on the HGST SATA disks, but not other SAS devices.

This sounds like what we are seeing, but we're not mixing SAS and SATA.

> If feasible, it may be useful to move the SSD's onto a plain SATA controller
> as a diagnostic step.

Certainly a step to consider if our controller swap does not improve the
situation. That said, this is a production fileserver with about half a
petabyte of Important Data(tm) on it, so experimentation is not exactly
the soup of the day. Which is to say, we'd like to take this machine down
as infrequently as possible. Diagnostic steps sans downtime are preferable.

-- 
M. Casper Lewis | mclewis@ucdavis.edu
Systems Administrator | Voice: (530) 754-7978
Genome Center |
University of California, Davis |