From: "M. Casper Lewis" <mclewis@genomecenter.ucdavis.edu>
To: freebsd-fs@freebsd.org
Date: Thu, 30 Aug 2018 17:34:36 -0700
Subject: Failing ZFS log devices/panic
Message-ID: <20180831003436.GW1473@genomecenter.ucdavis.edu>

Greetings,

We are having stability problems on one of our ZFS fileservers. The
system will run fine for a few days, but gradually reports the log
devices as failing, and then eventually panics. After several rounds of
this, we finally removed the log devices, and the machine has not
panicked since.

We have tried several different types of SSD (both datacenter and
consumer grade) and the issue happens with all of them. When queried
with the vendor tools, the drives all report themselves healthy, and
after a reboot they all report healthy as well. The same SSDs are
serving as cache devices without issue.
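For reference, we removed the log devices with the standard zpool
commands, along these lines (pool and device names here are
placeholders, not our actual configuration):

```shell
# Check pool health; failing log devices show up as FAULTED or DEGRADED
# in the "logs" section of the output
zpool status -v tank

# Remove a standalone log device from the pool
zpool remove tank da1

# For a mirrored log, remove the whole mirror by the name shown
# in zpool status (e.g. mirror-1)
zpool remove tank mirror-1
```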
This is FreeBSD 11.2-RELEASE-p2 #2 r337991.

Here is a backtrace:

KDB: stack backtrace:
#0  0xffffffff80b3d3c7 at kdb_backtrace+0x67
#1  0xffffffff80af6a37 at vpanic+0x177
#2  0xffffffff80af68b3 at panic+0x43
#3  0xffffffff80deabea at vm_fault_hold+0x244a
#4  0xffffffff80de8755 at vm_fault+0x75
#5  0xffffffff80f7810c at trap_pfault+0x14c
#6  0xffffffff80f777d7 at trap+0x2c7
#7  0xffffffff80f5740c at calltrap+0x8
#8  0xffffffff823442b9 at zfs_log_write+0x169
#9  0xffffffff82350a30 at zfs_freebsd_write+0xb50
#10 0xffffffff810faea3 at VOP_WRITE_APV+0x103
#11 0xffffffff80a32ffb at nfsvno_write+0x12b
#12 0xffffffff80a2af45 at nfsrvd_write+0x4a5
#13 0xffffffff80a1866b at nfsrvd_dorpc+0x11bb
#14 0xffffffff80a287e7 at nfssvc_program+0x557
#15 0xffffffff80d6bcd9 at svc_run_internal+0xe09
#16 0xffffffff80d6c18b at svc_thread_start+0xb
#17 0xffffffff80aba073 at fork_exit+0x83

Any suggestions on what to try next? We are at a loss as to why the
devices are being marked failed when they clearly are not.

-- 
M. Casper Lewis         | mclewis@ucdavis.edu
Systems Administrator   | Voice: (530) 754-7978
Genome Center           | University of California, Davis