Date: Fri, 13 Jul 2018 12:10:50 -0700
From: Jim Long
To: Mike Tancsa
Cc: freebsd-questions@freebsd.org
Subject: Re: Disk/ZFS activity crash on 11.2-STABLE [SOLVED]
Message-ID: <20180713191050.GA98371@g5.umpquanet.com>
In-Reply-To: <20180712214248.GA98578@g5.umpquanet.com>

On Thu, Jul 12, 2018 at 02:42:48PM -0700, Jim Long wrote:
> On Thu, Jul 12, 2018 at 02:49:53PM -0400, Mike Tancsa wrote:
> > --snip--
> >
> > I would try and set a ceiling. On RELENG_11 you don't need to reboot.
> >
> > Try
> >
> > sysctl -w vfs.zfs.arc_max=77946198016
> >
> > which shaves off 20G from what ARC can gobble up. Not sure if that's
> > your issue, but it is an issue for some users.
> >
> > If you are still hurting for caching, add an SSD or NVMe drive and
> > make it a caching device for your pool.
> >
> > And what does
> >
> > zpool status
> >
> > show?
>
> I set the limit to the value you suggested, and the next test ran less
> than three minutes before the machine rebooted, with no crash dump
> produced.
>
> I further reduced the limit to 50G and it's been running for about 50
> minutes so far. Fingers crossed. I do have L2ARC I can add if need be.
>
> I'll keep you posted on how this run goes.
>
> Thank you,
>
> Jim

It appears that limiting the ARC size did it. The 'zfs send -R' was able
to complete with ARC limited to 50G, and a second run with a 60G ARC
limit also completed.

That is a very handy tunable to know about: being able to reduce the
cache size on a running system, when needed, to free up RAM.

I was curious to find the answer to your query about the average size of
files on the system, so I ran a 'zdb -b' on the pool. That process began
paging large amounts of RAM out to swap, which made the system rather
sluggish, especially once I decided to kill the zdb process.
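For anyone who finds this thread in the archives later, the tuning itself
is just a couple of sysctl invocations. This is only a rough sketch: the
50G figure is simply what worked here, and you should size the cap to
your own machine's RAM:

    # Current ARC size and current ceiling, in bytes:
    sysctl kstat.zfs.misc.arcstats.size
    sysctl vfs.zfs.arc_max

    # 50 GiB expressed in bytes:
    echo $((50 * 1024 * 1024 * 1024))    # 53687091200

    # Cap the ARC on the running system (no reboot needed on RELENG_11):
    sysctl -w vfs.zfs.arc_max=53687091200

    # To apply the cap at every boot, the traditional place is
    # /boot/loader.conf:
    #   vfs.zfs.arc_max="53687091200"

In my experience the ARC does not hand the memory back the instant the
ceiling is lowered; eviction takes a little while to catch up.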
By dropping the ARC size limit, I was able to temporarily free some RAM
so that the process could succumb to the SIGKILL signal.

Thank you very much for your advice in guiding me to this resolution!

Jim
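P.S. For anyone curious about the 'zdb -b' run mentioned above: it
traverses the block pointers in the pool and prints block count and size
statistics, which gets at the average block size, if not quite the
average file size. A sketch, with a hypothetical pool name:

    # Summarize block statistics for the pool "tank" (name illustrative).
    # Beware: on a large pool this walks a lot of metadata and can
    # consume a great deal of RAM, as I found out the hard way.
    zdb -b tank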