From owner-freebsd-stable@freebsd.org Mon Jul 23 15:13:03 2018
Date: Mon, 23 Jul 2018 17:12:56 +0200
From: Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To: stable@FreeBSD.org
Subject: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64
Message-ID: <1a039af7758679ba1085934b4fb81b57@ijs.si>
List-Id: Production branch of FreeBSD source code

After upgrading an older AMD host from FreeBSD 10.3 to 11.1-RELEASE-p11 (amd64), ZFS has been gradually eating up all memory, so the machine crashes every few days once memory is completely exhausted (after swapping heavily for a couple of hours). This machine has only 4 GB of memory.

After capping the ZFS ARC at 1.8 GB the machine now stays up a bit longer, but in four days all the memory is used up again. The machine is lightly loaded: it runs a BIND resolver and a lightly used web server, and the ps output does not show excessive memory use by any process.
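For reference, the ARC cap mentioned above is normally applied as a loader tunable. The exact byte value below is my own rendering of 1.8 GB, not copied from the affected host:

```
# /boot/loader.conf -- cap the ZFS ARC (value in bytes, takes effect at boot)
vfs.zfs.arc_max="1932735283"   # ~1.8 GB
```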
During the last survival period I ran vmstat -m every second and logged the results. What caught my eye was the 'solaris' entry, which seems to account for all of the exhaustion. The MemUse of the solaris entry starts modestly, e.g. after a few hours of uptime:

$ vmstat -m
         Type   InUse  MemUse HighUse  Requests  Size(s)
      solaris 3141552 225178K       -  12066929  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768

but this number keeps growing steadily. After about four days, shortly before a crash, it had grown to 2.5 GB, dangerously close to all the available memory:

      solaris 39359484 2652696K      - 234986296  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768

Plotting the 'solaris' MemUse against wall-clock time in seconds shows steady linear growth of about 25 MB per hour. At fine resolution the growth proceeds in one small step roughly every 6 seconds; all steps are small, but not all are the same size.

The only thing (to my mind) that distinguishes this host from others running 11.1 is that one of its two ZFS pools is down because its disk is broken. That is a scratch data pool, not otherwise in use; the pool with the OS is healthy. The syslog periodically shows entries like the following:

Jul 23 16:48:49 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
Jul 23 16:49:09 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
Jul 23 16:55:34 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354

'zpool status -v' on this pool shows:

  pool: stuff
 state: UNAVAIL
status: One or more devices could not be opened.
        There are insufficient replicas for the pool to
        continue functioning.
action: Attach the missing device and online it using 'zpool online'.
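The per-second logging and the growth-rate estimate described above can be sketched as follows. The sampler loop is FreeBSD-specific (it assumes the stock vmstat -m column layout) and is left commented out; the rate computation is shown running on two illustrative log samples, not real data from the host:

```shell
#!/bin/sh
# Sampler sketch: one "<epoch-seconds> <MemUseK>" line per second.
# Uncomment to run on the affected host:
# while :; do
#     printf '%s ' "$(date +%s)"
#     vmstat -m | awk '$1 == "solaris" { print $3 }'   # MemUse column, e.g. "225178K"
#     sleep 1
# done > solaris-memuse.log

# Illustrative log: 25600 K of growth over one hour.
cat > solaris-memuse.log <<'EOF'
1532300000 225178K
1532303600 250778K
EOF

# Convert first/last samples into MB per hour.
awk '
    { sub(/K$/, "", $2) }
    NR == 1 { t0 = $1; m0 = $2 }
    { t1 = $1; m1 = $2 }
    END { printf "%.1f MB/hour\n", (m1 - m0) / 1024 * 3600 / (t1 - t0) }
' solaris-memuse.log
```

On the sample data this prints 25.0 MB/hour, matching the growth rate observed on the host.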
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        stuff                   UNAVAIL      0     0     0
          11732693005294113354  UNAVAIL      0     0     0  was /dev/da2

The same machine with this same broken pool could previously stay up indefinitely under FreeBSD 10.3. So, could this be the reason for the memory depletion? Are there any fixes for it? Any further tests I should perform before I try to get rid of this pool?

Mark
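One cheap check before destroying the pool would be to see whether the leak rate tracks the rate of the "vdev state changed" events. A sketch of counting those events per hour; the sample lines below are the three from the syslog excerpt above, and on the real host one would feed /var/log/messages through the same awk instead:

```shell
#!/bin/sh
# Count "vdev state changed" syslog events per hour.
# Sample data is the excerpt quoted above, for illustration only.
cat > messages.sample <<'EOF'
Jul 23 16:48:49 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
Jul 23 16:49:09 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
Jul 23 16:55:34 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
EOF

# Key each event by "Month Day Hour" and print the per-hour counts.
awk '/vdev state changed/ { split($3, t, ":"); count[$1 " " $2 " " t[1] "h"]++ }
     END { for (h in count) print h, count[h] }' messages.sample
```

On the sample excerpt this prints "Jul 23 16h 3". If the hourly event count is roughly constant while MemUse grows linearly, that would be consistent with each failed vdev-reopen attempt leaking a little 'solaris' memory.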