Date: Mon, 16 Mar 2015 18:59:33 -0400
Sender: jdavidlists@gmail.com
Subject: Significant memory leak in 9.3p10?
From: J David
To: freebsd-stable

Recently we have seen a large-scale memory leak on amd64 machines
running FreeBSD 9.3-RELEASE-p10. This was first observed on 9.3p2 but
has since shown up all the way through p10.

Here's what the header of top shows:

last pid: 32329;  load averages:  0.00,  0.01,  0.21  up 3+15:37:29  22:34:04
25 processes:  2 running, 22 sleeping, 1 waiting
CPU: % user, % nice, % system, % interrupt, % idle
Mem: 4072M Active, 895M Inact, 1284M Wired, 125M Cache, 826M Buf, 1521M Free
Swap: 1024M Total, 874M Used, 149M Free, 85% Inuse

About 4G is actively in use, another 895M is inactive, and another 874M
is in swap. So this looks like a user-space leak rather than a
kernel-space leak.

At the time of measurement, this machine was not doing anything, and
every process that could be killed had been killed while trying to find
a culprit.

The entire output of "ps axlww" is:

UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN   STAT TT       TIME COMMAND
  0     0     0   0 -52  0     0  224 -        DLs  ??    0:00.82 [kernel]
  0     1     0   0  20  0  6280  556 wait     SLs  ??    0:00.57 /sbin/init --
  0     2     0   0 -16  0     0   16 pftm     DL   ??    0:00.85 [pfpurge]
  0     3     0   0 -16  0     0   16 waiting_ DL   ??    0:00.00 [sctp_iterator]
  0     4     0   0 -16  0     0   16 -        DL   ??    0:00.00 [xpt_thrd]
  0     5     0   0 -16  0     0   16 psleep   DL   ??    0:28.85 [pagedaemon]
  0     6     0   0 -16  0     0   16 psleep   DL   ??    0:45.03 [vmdaemon]
  0     7     0   0 -16  0     0   16 pollid   DL   ??    0:00.23 [idlepoll]
  0     8     0   0 155  0     0   16 pgzero   DL   ??    0:00.00 [pagezero]
  0     9     0   0 -16  0     0   16 psleep   DL   ??    0:00.83 [bufdaemon]
  0    10     0   0 -16  0     0   16 audit_wo DL   ??    0:00.00 [audit]
  0    11     0   0 155  0     0   32 -        RL   ?? 8317:13.37 [idle]
  0    12     0   0 -76  0     0  240 -        WL   ??  301:43.54 [intr]
  0    13     0   0  -8  0     0   48 -        DL   ??    0:09.89 [geom]
  0    14     0   0 -16  0     0   16 -        DL   ??    2:58.88 [yarrow]
  0    15     0   0 -68  0     0   64 -        DL   ??    0:02.32 [usb]
  0    16     0   0 -16  0     0   16 vlruwt   DL   ??    0:06.35 [vnlru]
  0    17     0   0  16  0     0   16 syncer   DL   ??    5:28.89 [syncer]
  0    18     0   0 -16  0     0   16 sdflush  DL   ??    0:10.27 [softdepflush]
  0    19     0   0 -16  0     0   16 -        DL   ??    0:55.09 [racctd]
  0   830     1   0  20  0 45348 2396 wait     Is   u0    0:00.07 login [pam] (login)
500 32269   830   0  20  0 14556 2428 wait     S    u0    0:00.09 -sh (sh)
500 32340 32269   0  20  0 16296 1908 -        R+   u0    0:00.00 ps axlww

Since the issue doesn't seem related to kernel memory usage, I have
left out the output of vmstat -m and vmstat -z. Nothing in either one
jumps out as using gigs of RAM; both appear consistent with 1284M of
wired memory, which is not unreasonable for the affected machines'
tuning and workload.

The only user-space processes running are init, login, sh, and ps. So
where did more than 5.5G of user-space RAM go?

The only other potentially useful information is that when this
happens, shutting down the system hangs for about ten minutes:

$ sudo halt -p
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
All buffers synced.      <----- 10 MINUTE HANG AFTER PRINTING THIS
Uptime: 3d15h56m32s
usbus0: Controller shutdown
uhub0: at usbus0, port 1, addr 1 (disconnected)
usbus0: controller did not stop
usbus0: Controller shutdown complete
acpi0: Powering system off
Connection closed by foreign host.

So it seems that somewhere between printing "All buffers synced" and
printing the uptime, the system is very slowly unwinding whatever is
using up all that RAM and swap.

Does anyone have any idea what might be causing this or how to
fix/prevent it?

Thanks in advance for any advice!
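
For anyone who wants to check the accounting, here is a minimal sketch in Python that uses only the figures quoted above (the top header and the RSS column of the ps listing). The names TOP_MB, USER_RSS_KB and unexplained_mb are made up for this illustration, and comparing RSS against Active + Inact + swap is only a rough proxy, not an exact accounting of how the VM system attributes pages.

```python
# Figures copied from the top header quoted in this message (megabytes).
TOP_MB = {"active": 4072, "inact": 895, "swap_used": 874}

# RSS in kilobytes for the user-space processes in the ps listing.
USER_RSS_KB = {"init": 556, "login": 2396, "sh": 2428, "ps": 1908}


def unexplained_mb(top_mb, rss_kb):
    """Return (claimed_mb, resident_mb, gap_mb).

    claimed_mb  : Active + Inact + swap in use, as reported by top
    resident_mb : sum of the listed processes' RSS, converted to MB
    gap_mb      : memory that no listed process accounts for
    """
    claimed = top_mb["active"] + top_mb["inact"] + top_mb["swap_used"]
    resident = sum(rss_kb.values()) / 1024.0
    return claimed, resident, claimed - resident


if __name__ == "__main__":
    claimed, resident, gap = unexplained_mb(TOP_MB, USER_RSS_KB)
    print(f"reported by top : {claimed} MB")
    print(f"process RSS sum : {resident:.1f} MB")
    print(f"unaccounted for : {gap:.1f} MB")
```

By this measure the listed processes hold only a few megabytes resident, so essentially all of the Active, Inact and swap usage is unattributed.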