From: Josh Howard <bsd@zeppelin.net>
To: freebsd-arm@freebsd.org
Date: Tue, 04 Aug 2020 15:37:48 -0700
Subject: Re: ARM64 hosts eventually lockup running net-mgmt/unifi

On Mon, 03 Aug 2020 10:44:01 -0700, Ronald Klop wrote:
>
> On Mon, 03 Aug 2020 18:59:26 +0200, Josh Howard wrote:
>
> > This one has been sort of a pain to narrow down, but on any of
> > RockPro64, RockPi4b, or RPI4, if I run net-mgmt/unifi, eventually the
> > host just hard locks. Nothing over serial, nothing interesting in the
> > logs, no other hints, so it's not clear what precisely is causing it.
> > For those unfamiliar, unifi runs both a Java app and a mongodb server.
> > I've tried with openjdk8 (their only supported version) and openjdk11;
> > neither one made any difference. I'm not sure how a userland app like
> > this could cause a hard lock, but it consistently ends up killing my
> > host eventually.
> >
> > Any ideas or hints would be great!
>
> I had the same problem. The default amount of nmbclusters is too low,
> and when they are all in use the OS becomes very unresponsive.
>
> I run this script hourly. It doubles the amount of nmbclusters if more
> than half are in use.
>
> @hourly bin/nmbclustercheck.sh
>
> [root@rpi3 ~]# more bin/nmbclustercheck.sh
> #!/bin/sh
>
> # Parse the "current/cache/total/max" field of the mbuf clusters line
> LINE=$( netstat -m | grep "mbuf clusters" | cut -d ' ' -f 1 )
> CURRENT=$( echo $LINE | cut -d '/' -f 1 )
> MAX=$( echo $LINE | cut -d '/' -f 4 )
>
> # Double the limit once more than half of the clusters are in use
> if test $CURRENT -gt $(( $MAX / 2 ))
> then
>     NEW_MAX=$(( $MAX * 2 ))
>     echo Increase kern.ipc.nmbclusters from $MAX to $NEW_MAX
>     sysctl kern.ipc.nmbclusters=$NEW_MAX
> fi
>
> Current amount after 14 days of uptime:
>
> [root@rpi3 ~]# sysctl kern.ipc.nmbclusters
> kern.ipc.nmbclusters: 19250

Thanks for the lead! I did try this, but sadly it didn't change anything.
I graphed the nmbcluster usage over about 12 hours, but at some point the
system simply hung, and there was no recovering short of a hard reboot.
The number of clusters in use did increase gradually, but it never got
close to the limit. I agree it seems likely to be related to some kind of
resource exhaustion; I'm just not getting any indication of what it is.
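For reference, graphing the usage only takes a small cron job along these
lines (a minimal sketch, not exactly what I ran; the interval, log path,
and script name are arbitrary):

*/5 * * * * /root/bin/mbuflog.sh

#!/bin/sh
# Append a timestamped sample of the same "current/cache/total/max"
# mbuf clusters field that nmbclustercheck.sh parses, for graphing later.
LOG=/var/log/mbufclusters.log
echo "$( date '+%Y-%m-%dT%H:%M:%S' ) $( netstat -m | grep 'mbuf clusters' | cut -d ' ' -f 1 )" >> $LOG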