From: "M. Casper Lewis" <mclewis@genomecenter.ucdavis.edu>
To: freebsd-fs@freebsd.org
Date: Thu, 30 Aug 2018 17:34:36 -0700
Subject: Failing ZFS log devices/panic
Message-ID: <20180831003436.GW1473@genomecenter.ucdavis.edu>

Greetings,

We are having stability problems on one of our ZFS fileservers. The
system will run fine for a few days, but gradually reports the log
devices as failing, and then eventually panics. After several rounds of
this, we finally removed the log devices, and the machine has not
panicked since.

We have tried several different types of SSD (both datacenter and
consumer grade) and the issue happens with all of them. When queried
with the vendor tools, the drives all report themselves healthy, and
after a reboot they all report healthy as well. The same SSDs are
serving as cache devices without issue.
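For reference, we removed the log devices with the standard zpool
commands, along these lines (pool and device names here are
placeholders, not our actual configuration):

```shell
# Check pool health; failing log devices show up as FAULTED or DEGRADED
# in the "logs" section of the output
zpool status -v tank

# Remove a standalone log device from the pool
zpool remove tank da1

# For a mirrored log, remove the whole mirror by the name shown
# in zpool status (e.g. mirror-1)
zpool remove tank mirror-1
```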
This is FreeBSD 11.2-RELEASE-p2 #2 r337991.

Here is a backtrace:

KDB: stack backtrace:
#0  0xffffffff80b3d3c7 at kdb_backtrace+0x67
#1  0xffffffff80af6a37 at vpanic+0x177
#2  0xffffffff80af68b3 at panic+0x43
#3  0xffffffff80deabea at vm_fault_hold+0x244a
#4  0xffffffff80de8755 at vm_fault+0x75
#5  0xffffffff80f7810c at trap_pfault+0x14c
#6  0xffffffff80f777d7 at trap+0x2c7
#7  0xffffffff80f5740c at calltrap+0x8
#8  0xffffffff823442b9 at zfs_log_write+0x169
#9  0xffffffff82350a30 at zfs_freebsd_write+0xb50
#10 0xffffffff810faea3 at VOP_WRITE_APV+0x103
#11 0xffffffff80a32ffb at nfsvno_write+0x12b
#12 0xffffffff80a2af45 at nfsrvd_write+0x4a5
#13 0xffffffff80a1866b at nfsrvd_dorpc+0x11bb
#14 0xffffffff80a287e7 at nfssvc_program+0x557
#15 0xffffffff80d6bcd9 at svc_run_internal+0xe09
#16 0xffffffff80d6c18b at svc_thread_start+0xb
#17 0xffffffff80aba073 at fork_exit+0x83

Any suggestions on what to try next? We are at a loss as to why the
devices are being marked failed when they clearly are not.

-- 
M. Casper Lewis         | mclewis@ucdavis.edu
Systems Administrator   | Voice: (530) 754-7978
Genome Center           | University of California, Davis