From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 230704] All the memory eaten away by ZFS 'solaris' malloc
Date: Fri, 17 Aug 2018 13:33:29 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230704

            Bug ID: 230704
           Summary: All the memory eaten away by ZFS 'solaris' malloc
           Product: Base System
           Version: 11.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: Mark.Martinec@ijs.si

Affected: ZFS on 11.2-RELEASE and 11.1-RELEASE-p11 (but not on 10.3).

Running commands like 'zpool list', 'zpool status' or 'zpool iostat'
on a defunct pool (with broken underlying disks) leaks memory.

When such a command runs frequently (for example, from a monitoring tool
such as 'telegraf'), the system runs out of memory within a couple of
days: applications start swapping and eventually everything grinds to a
standstill, requiring a forced reboot.

After a few days, shortly before a freeze, 'vmstat -m' shows the 'solaris'
malloc approaching the total size of the memory (prior to that this
number was growing steadily and linearly):

  $ vmstat -m :
         Type    InUse   MemUse HighUse  Requests  Size(s)
      solaris 39359484 2652696K       - 234986296  ...

How to repeat:

  # create a test pool on md
  mdconfig -a -t swap -s 1Gb
  gpart create -s gpt /dev/md0
  gpart add -t freebsd-zfs -a 4k /dev/md0
  zpool create test /dev/md0p1
  # destroy the disk underneath the pool, making it "unavailable"
  mdconfig -d -u 0 -o force

Reboot now (the trouble does not start until after a reboot).

Now run 'zpool list' periodically, monitoring the growth of the
'solaris' malloc zone:

  (while true; do zpool list >/dev/null; vmstat -m | \
     fgrep solaris; sleep 0.5; done) | awk '{print $2-a; a=$2}'
  12224540
  2509
  3121
  5022
  2507
  1834
  2508
  2505

The first number is the absolute InUse count; each following number is
the increase in InUse since the previous sample.

As suggested by Mark Johnston, here is a dtrace output

  https://www.ijs.si/usr/mark/tmp/dtrace-cmd.out.bz2

from the following command:

  # dtrace -c "zpool list -Hp" -x temporal=off -n '
     dtmalloc::solaris:malloc
       /pid == $target/{@allocs[stack(), args[3]] = count()}
     dtmalloc::solaris:free
       /pid == $target/{@frees[stack(), args[3]] = count();}'

This will record all allocations and frees from a single instance
of "zpool list".

Andriy Gapon wrote on the mailing list:

I see one memory leak, not sure if it's the only one.
It looks like vdev_geom_read_config() leaks all parsed vdev nvlist-s
but the last. The problem seems to come from r316760. Before that
commit the function would return upon finding the first valid config,
but now it keeps iterating.
The memory leak should not be a problem when vdev-s are probed
sufficiently rarely, but it appears that with an unhealthy pool the
probing can happen much more frequently (e.g., every time pools are
listed).

The whole discussion leading to the above findings is on the
stable@freebsd.org mailing list, 2018-07-23 to 2018-08-14, subject:
"All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64"