Date: Tue, 17 Mar 2015 01:24:04 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: J David <j.david.lists@gmail.com>
Cc: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: Significant memory leak in 9.3p10?
Message-ID: <20150316232404.GM2379@kib.kiev.ua>
In-Reply-To: <CABXB=RRhynY5FWvw3tHrLFRyitTemavXYLBpev5Mjs_kPqimXA@mail.gmail.com>

On Mon, Mar 16, 2015 at 06:59:33PM -0400, J David wrote:
> Recently we have seen a large-scale memory leak on amd64 machines
> running FreeBSD 9.3-RELEASE-p10.
>
> This was first observed on 9.3p2 but has since shown up all the way through p10.
>
> Here's what the header of top shows:
>
> last pid: 32329; load averages: 0.00, 0.01, 0.21 up 3+15:37:29 22:34:04
> 25 processes: 2 running, 22 sleeping, 1 waiting
> CPU: % user, % nice, % system, % interrupt, % idle
> Mem: 4072M Active, 895M Inact, 1284M Wired, 125M Cache, 826M Buf, 1521M Free
> Swap: 1024M Total, 874M Used, 149M Free, 85% Inuse
>
> About 4G actively being used, another 895M inactive, and another 874M
> in swap. So it seems like this is a user-space leak, rather than a
> kernel-space leak.
>
> At the time of measurement, this machine was not doing anything and
> every possible process had been killed trying to find a culprit. The
> entire output of "ps axlww" is:
>
>   UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN   STAT TT       TIME COMMAND
>     0     0     0   0 -52  0     0  224 -        DLs  ??    0:00.82 [kernel]
>     0     1     0   0  20  0  6280  556 wait     SLs  ??    0:00.57 /sbin/init --
>     0     2     0   0 -16  0     0   16 pftm     DL   ??    0:00.85 [pfpurge]
>     0     3     0   0 -16  0     0   16 waiting_ DL   ??    0:00.00 [sctp_iterator]
>     0     4     0   0 -16  0     0   16 -        DL   ??    0:00.00 [xpt_thrd]
>     0     5     0   0 -16  0     0   16 psleep   DL   ??    0:28.85 [pagedaemon]
>     0     6     0   0 -16  0     0   16 psleep   DL   ??    0:45.03 [vmdaemon]
>     0     7     0   0 -16  0     0   16 pollid   DL   ??    0:00.23 [idlepoll]
>     0     8     0   0 155  0     0   16 pgzero   DL   ??    0:00.00 [pagezero]
>     0     9     0   0 -16  0     0   16 psleep   DL   ??    0:00.83 [bufdaemon]
>     0    10     0   0 -16  0     0   16 audit_wo DL   ??    0:00.00 [audit]
>     0    11     0   0 155  0     0   32 -        RL   ?? 8317:13.37 [idle]
>     0    12     0   0 -76  0     0  240 -        WL   ??  301:43.54 [intr]
>     0    13     0   0  -8  0     0   48 -        DL   ??    0:09.89 [geom]
>     0    14     0   0 -16  0     0   16 -        DL   ??    2:58.88 [yarrow]
>     0    15     0   0 -68  0     0   64 -        DL   ??    0:02.32 [usb]
>     0    16     0   0 -16  0     0   16 vlruwt   DL   ??    0:06.35 [vnlru]
>     0    17     0   0  16  0     0   16 syncer   DL   ??    5:28.89 [syncer]
>     0    18     0   0 -16  0     0   16 sdflush  DL   ??    0:10.27 [softdepflush]
>     0    19     0   0 -16  0     0   16 -        DL   ??    0:55.09 [racctd]
>     0   830     1   0  20  0 45348 2396 wait     Is   u0    0:00.07 login [pam] (login)
>   500 32269   830   0  20  0 14556 2428 wait     S    u0    0:00.09 -sh (sh)
>   500 32340 32269   0  20  0 16296 1908 -        R+   u0    0:00.00 ps axlww
>
> Since the issue doesn't seem related to kernel memory usage, vmstat -m
> and -z have been skipped, but nothing jumps out as using gigs of RAM;
> they do appear consistent with 1284M of wired memory, which is not
> unreasonable for the affected machines' tuning and workload.
>
> The only user-space processes running are login, sh, and ps. So where
> did 5.5G of userspace RAM go?
>
> The only other potentially useful information is that when this
> happens, shutting down the system will hang for about ten minutes.
>
> $ sudo halt -p
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
> All buffers synced. <----- 10 MINUTE HANG AFTER PRINTING THIS
> Uptime: 3d15h56m32s
> usbus0: Controller shutdown
> uhub0: at usbus0, port 1, addr 1 (disconnected)
> usbus0: controller did not stop
> usbus0: Controller shutdown complete
> acpi0: Powering system off
> Connection closed by foreign host.
>
> So it seems like somewhere after "All buffers synced" and printing the
> uptime, it's very slowly unwinding whatever is using up all that RAM
> and swap.
>
> Does anyone have any idea what might be causing this or how to fix/prevent it?

There are many ways to create persistent anonymous shared memory
objects. An incomplete list: tmpfs mounts, swap-backed md disks, SysV
shared memory, and possibly POSIX shared memory (I do not remember which
implementation is used in stable/9). I quite possibly missed some
object types.
Also note that the active/inactive numbers can be explained by cached
file pages; only the swap usage suggests that it might be something
persistent from the list above.
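
The object types listed above can each be inspected from a shell. What follows is only a rough sketch, not a procedure taken from this thread: the helper names `run_if_present` and `list_persistent_memory_objects` are made up for illustration, and the choice of commands (`df -t tmpfs`, `mdconfig -l -v`, `ipcs -m -b`, `swapinfo -h`) is an assumption about what is installed on the affected machines. POSIX shared memory is left out because I am not sure which tool, if any, lists it on stable/9.

```shell
#!/bin/sh
# Sketch: list the persistent, swap-backed objects named in the reply.
# Helper names are hypothetical; each command is skipped when it is not
# installed, and a failing command does not abort the script.

run_if_present() {
    # $1 is the command name; "$@" is the full command line.
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || true
    else
        echo "($1 not available)"
    fi
}

list_persistent_memory_objects() {
    echo "== tmpfs mounts =="
    run_if_present df -h -t tmpfs

    echo "== md(4) disks (look for type 'swap') =="
    run_if_present mdconfig -l -v

    echo "== SysV shared memory segments =="
    run_if_present ipcs -m -b

    echo "== swap usage =="
    run_if_present swapinfo -h

    # POSIX shared memory is deliberately not covered here (see above).
}

list_persistent_memory_objects
```

If one of these sections shows objects whose sizes add up to something close to the 874M of swap in use, that object type is a reasonable first suspect.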
