From: Doug Hardie
Subject: Re: how to measure what is consuming swap space
Date: Sun, 23 Jul 2017 13:02:42 -0700
To: tech-lists
Cc: freebsd-questions@freebsd.org

> On 22 July 2017, at 02:43, tech-lists wrote:
>
> Hello list,
>
> What's the best command to see what processes are consuming swap space?

I don't see anyone else responding. I won't claim to be an expert, or even to understand all of this, but here is what I went through to resolve a similar issue over a couple of years.

I had a system that would occasionally kill a specific process. The system did a lot of logging, so I could never find the cause of the problem in the logs. Eventually it happened while I was tailing the messages log, and I saw the message that the system had run out of swap space and had "somehow" picked a process to kill. I started monitoring swap space and found that the system used very little swap for a couple of weeks and then started using it quite quickly, eventually running out. Once the process was killed, swap usage went back down to somewhere between 0 and 1 percent. Unfortunately, killing that process also killed the usefulness of the system. So I used Nagios to check swap usage, and when it got over 50% I would pick a convenient time to restart that process.

After a bunch of discussion here, someone told me about procstat -v. I ran procstat -v on that process right after starting it up and there were only a few entries, and most of them seemed reasonable. Once swap usage got to 50%, there were a huge number of entries, most of them of type df. Somewhere I read that df entries are generally created by mmaps that are not file backed; they use swap space to back that segment. However, sw entries also appear to be swap backed, and I don't know what the difference between the two is. procstat -va works, but it generates far too much information to analyze. It's much easier if you can first find the process causing the problem by other approaches.
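To make that concrete, here is a minimal sh sketch of the kind of check I mean; it is an illustration, not the exact command I ran. The PID is hypothetical, and it assumes the usual procstat -v layout where anonymous sw/df mappings print no PATH, leaving the type as the last field on the line; verify against your own procstat -v output first:

    #!/bin/sh
    # Count the swap-backed mappings (types "sw" and "df") for one process.
    # Anonymous mappings have no PATH, so the type token ends the line.
    pid=1234    # hypothetical: PID of the suspect process
    procstat -v "$pid" | awk '$NF == "sw" || $NF == "df"' | wc -l

Run it once right after the process starts and again as swap usage climbs; a count that balloons in step with swap usage points at that process.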
For example, I took a system running only an incoming mail server and ran top to get the current swap usage. It was about 3 percent. Dividing the amount of swap in use by the 4 KB page size gives you the number of swap pages in use, and there could be that many entries in the procstat -va output with types sw and df. It's not likely to be that large, as many of the allocations are bigger than one page, but the number is daunting. (A sketch of this arithmetic is in the P.S. below.)

I went through the source for the process causing the issue and there were no non-file-backed mmaps; there were a bunch of file-backed mmaps. What's even more interesting is that this process runs on a large number of systems and only one ever showed the problem. I never found the real cause. The developer of the process did a major restructuring of the code and released a new version, and it no longer has the problem. He has no idea what he could have done that would have fixed it either.

I know this is not a good cookbook solution, but that's what I went through.

-- Doug
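P.S. A quick sh sketch of that page arithmetic, for anyone who wants to try it. It assumes swapinfo -k output with a Used column in 1K blocks and the page size from sysctl hw.pagesize (normally 4096); the figures on your system will differ:

    #!/bin/sh
    # Rough upper bound on how many sw/df entries procstat -va could show,
    # derived from the amount of swap currently in use.
    pagesize=$(sysctl -n hw.pagesize)     # usually 4096
    # The last line of swapinfo -k is the single device or the Total line;
    # field 3 is the Used column, in 1K blocks.
    used_kb=$(swapinfo -k | awk 'END { print $3 }')
    echo "swap pages in use: $((used_kb * 1024 / pagesize))"

Since many mappings cover more than one page, the real sw/df entry count will be lower, as noted above.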