Date: Fri, 13 Jul 2018 12:10:50 -0700
From: Jim Long
To: Mike Tancsa
Cc: freebsd-questions@freebsd.org
Subject: Re: Disk/ZFS activity crash on 11.2-STABLE [SOLVED]
Message-ID: <20180713191050.GA98371@g5.umpquanet.com>
In-Reply-To: <20180712214248.GA98578@g5.umpquanet.com>

On Thu, Jul 12, 2018 at 02:42:48PM -0700, Jim Long wrote:
> On Thu, Jul 12, 2018 at 02:49:53PM -0400, Mike Tancsa wrote:
> > --snip--
> >
> > I would try and set a ceiling. On RELENG_11 you don't need to reboot.
> >
> > Try
> >
> > sysctl -w vfs.zfs.arc_max=77946198016
> >
> > which shaves off 20G from what ARC can gobble up. Not sure if that's
> > your issue, but it is an issue for some users.
> >
> > If you are still hurting for caching, add an SSD or NVMe drive and
> > make it a caching device for your pool.
> >
> > And what does
> >
> > zpool status
> >
> > show?
>
> I set the limit to the value you suggested, and the next test ran less
> than three minutes before the machine rebooted, with no crash dump
> produced.
>
> I further reduced the limit to 50G and it's been running for about 50
> minutes so far. Fingers crossed. I do have L2ARC I can add if need be.
>
> I'll keep you posted on how this run goes.
>
> Thank you,
>
> Jim

It appears that limiting the ARC size did it. The 'zfs send -R' was able
to complete with ARC limited to 50G, and a second run with a 60G ARC
limit also completed.

That is a very handy tunable to know about: being able to reduce the
cache size on a running system, when needed, to free up RAM.

I was curious to find the answer to your query about the average size of
files on the system, so I ran a 'zdb -b' on the pool. That process began
paging large amounts of RAM out to swap, which made the system rather
sluggish, especially once I decided to kill the zdb process.
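For anyone who finds this thread in the archives later, the tuning itself
is just a couple of sysctl invocations. This is only a rough sketch: the
50G figure is simply what worked here, and you should size the cap to
your own machine's RAM:

    # Current ARC size and current ceiling, in bytes:
    sysctl kstat.zfs.misc.arcstats.size
    sysctl vfs.zfs.arc_max

    # 50 GiB expressed in bytes:
    echo $((50 * 1024 * 1024 * 1024))    # 53687091200

    # Cap the ARC on the running system (no reboot needed on RELENG_11):
    sysctl -w vfs.zfs.arc_max=53687091200

    # To apply the cap at every boot, the traditional place is
    # /boot/loader.conf:
    #   vfs.zfs.arc_max="53687091200"

In my experience the ARC does not hand the memory back the instant the
ceiling is lowered; eviction takes a little while to catch up.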
By dropping the ARC size limit, I was able to temporarily free some RAM
so that the process could succumb to the SIGKILL signal.

Thank you very much for your advice in guiding me to this resolution!

Jim
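P.S. For anyone curious about the 'zdb -b' run mentioned above: it
traverses the block pointers in the pool and prints block count and size
statistics, which gets at the average block size, if not quite the
average file size. A sketch, with a hypothetical pool name:

    # Summarize block statistics for the pool "tank" (name illustrative).
    # Beware: on a large pool this walks a lot of metadata and can
    # consume a great deal of RAM, as I found out the hard way.
    zdb -b tank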