From: Josh Howard <bsd@zeppelin.net>
To: freebsd-arm@freebsd.org
Date: Tue, 04 Aug 2020 15:37:48 -0700
Subject: Re: ARM64 hosts eventually lockup running net-mgmt/unifi

On Mon, 03 Aug 2020 10:44:01 -0700, Ronald Klop wrote:
>
> On Mon, 03 Aug 2020 18:59:26 +0200, Josh Howard wrote:
>
> > This one has been sort of a pain to narrow down, but on any of
> > RockPro64, RockPi4b, or RPI4, if I run net-mgmt/unifi, eventually the
> > host just hard locks. Nothing over serial, nothing interesting in the
> > logs, no other hints, so it's not clear what precisely is causing it.
> > For those unfamiliar, unifi runs both a Java app and a mongodb server.
> > I've tried with openjdk8 (their only supported version) and openjdk11;
> > neither one made any difference. I'm not sure how a userland app like
> > this could cause a hard lock, but it consistently ends up killing my
> > host eventually.
> >
> > Any ideas or hints would be great!
>
> I had the same problem. The default amount of nmbclusters is too low,
> and when they are all in use the OS becomes very unresponsive.
>
> I run this script hourly. It doubles the amount of nmbclusters if more
> than half are in use.
>
> @hourly bin/nmbclustercheck.sh
>
> [root@rpi3 ~]# more bin/nmbclustercheck.sh
> #!/bin/sh
>
> # Parse the "current/cache/total/max" field of the mbuf clusters line
> LINE=$( netstat -m | grep "mbuf clusters" | cut -d ' ' -f 1 )
> CURRENT=$( echo $LINE | cut -d '/' -f 1 )
> MAX=$( echo $LINE | cut -d '/' -f 4 )
>
> # Double the limit once more than half of the clusters are in use
> if test $CURRENT -gt $(( $MAX / 2 ))
> then
>     NEW_MAX=$(( $MAX * 2 ))
>     echo Increase kern.ipc.nmbclusters from $MAX to $NEW_MAX
>     sysctl kern.ipc.nmbclusters=$NEW_MAX
> fi
>
> Current amount after 14 days of uptime:
>
> [root@rpi3 ~]# sysctl kern.ipc.nmbclusters
> kern.ipc.nmbclusters: 19250

Thanks for the lead! I did try this, but sadly it didn't change anything.
I graphed the nmbcluster usage over about 12 hours, but at some point the
system simply hung, and there was no recovering short of a hard reboot.
The number of clusters in use did increase gradually, but it never got
close to the limit. I agree it seems likely to be related to some kind of
resource exhaustion; I'm just not getting any indication of what it is.
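For reference, graphing the usage only takes a small cron job along these
lines (a minimal sketch, not exactly what I ran; the interval, log path,
and script name are arbitrary):

*/5 * * * * /root/bin/mbuflog.sh

#!/bin/sh
# Append a timestamped sample of the same "current/cache/total/max"
# mbuf clusters field that nmbclustercheck.sh parses, for graphing later.
LOG=/var/log/mbufclusters.log
echo "$( date '+%Y-%m-%dT%H:%M:%S' ) $( netstat -m | grep 'mbuf clusters' | cut -d ' ' -f 1 )" >> $LOG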