From: Shrikanth Kamath <shrikanth07@gmail.com>
Date: Fri, 4 Aug 2023 01:31:04 -0700
Subject: Re: How to watch Active pagequeue transitions with DTrace in the vm layer
To: Mark Johnston <markj@freebsd.org>
Cc: freebsd-hackers@freebsd.org
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks and appreciate your response Mark, a follow up query, so the system was probably at some point in the state where there were no pages in the laundry or even had pages backed by swap (refer the top snapshot below) . The two heavy applications with 12G resident + Wired + Buf already caused the Free to drop close to the minimum threshold, any further memory demand would have the pages of these applications move to laundry or swap, then would transition to Inactive or Laundry, later when these pages were referenced back the pagedaemon would move them back to the Active? Is that a correct understanding? last pid: 20494; load averages: 0.38, 0.73, 0.80 up 0+01:49:05 21:14:49 Mem: 9439M Active, 3638M Inact, 2644M Wired, 888M Buf, 413M Free Swap: 8192M Total, 8192M Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 12043 root 5 22 0 9069M 7752M kqread 2 49:37 6.25% app1 12051 root 1 20 0 2704M 1964M select 3 0:41 0.00% app2 So if I run DTrace probe on vm_page_enqueue I will probably see that pagedaemon might be the thread that moved all those pages to Active? Is there a way to associate these to the process which referenced these pages Regards, -- Shrikanth R K On Thu, Aug 3, 2023 at 8:14=E2=80=AFAM Mark Johnston wr= ote: > On Thu, Aug 03, 2023 at 12:32:22AM -0700, Shrikanth Kamath wrote: > > A background on the query, > > > > Trying to catch a memory =E2=80=9Cspike=E2=80=9D trigger using DTrace, = refer here two > =E2=80=9Ctop=E2=80=9D > > snapshots captured during a 2 minute window, > > > > last pid: 89900; load averages: 0.75, 0.91, 0.94 up 39+00:37:30 > > 20:03:14 > > > > Mem: 5575M Active, 2152M Inact, 4731M Laundry, 3044M Wired, 1151M Buf, > 382M > > Free > > > > Swap: 8192M Total, 1058M Used, 7134M Free, 12% Inuse > > > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMA= ND > > > > 12043 root 5 35 0 11G 9747M kqread 3 > > 128.8H 23.34% app1 > > > > 12051 root 1 20 0 3089M 2274M select 1 > 22:51 > > 0.00% app2 > > > > last pid: 90442; load averages: 1.50, 1.12, 1.02 up 39+00:39:37 > > 20:05:21 > > > > Mem: 8549M Active, 631M Inact, 3340M Laundry, 3159M Wired, 1252M Buf, > 359M > > Free > > > > Swap: 8192M Total, 1894M Used, 6298M Free, 23% Inuse > > > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU > COMMAND > > > > 12043 root 5 24 0 11G 9445M kqread 2 > > 128.8H 10.45% app1 > > > > 12051 root 1 20 0 3089M 2173M select 3 > 22:51 > > 0.00% app2 > > > > The spike is ~3G in Active pages, the two large applications have a > > combined resident size of ~12G. The resident size of the applications > > hasn=E2=80=99t changed between these 2 readings, however there is a tar= archive > and > > gzip on a large directory during that window likely causing a reshuffle= . > If > > I count the page allocs and dequeue by execname with DTrace, I see > > tar/vmstat which probably alloc and quickly dequeue, along with a large > > dequeue being undertaken by bufdaemon and pagedaemon. > > > > fbt::vm_page_alloc*:entry > > > > { > > > > @cnt[execname] =3D count(); > > > > } > > > > fbt::vm_page_dequeue:entry > > > > { > > > > @dcnt[execname] =3D count(); > > > > } > > > > Page Alloc > > > > vmstat > > 20222 > > > > tar 2128= 4 > > > > Page Dequeue > > > > vmstat 20114 > > > > bufdaemon 21402 > > > > tar 216= 35 > > > > pagedaemon 360387 > > > > Since the tar / vmstat will not hold the pages in Active, I need to fin= d > > out what application had its pages queued in Active page queue. 
> > One possibility is that the inactive and laundry queue had previously > contained many referenced pages. Then, when some memory pressure > occurred, the pagedaemon scanned the queues and moved a large number of > pages into the active queue. > > > Is it possible that the system is just moving the LRU pages of these tw= o > > large applications into the inactive queue prior to addressing memory > > pressure? Do these applications need to activate those pages later and > > hence it brings it back into the Active queue? How do I watch this in > > action by using DTrace? Will the following probe catch this trigger? > > > > fbt::vm_page_activate:entry > > > > { > > > > @cnt[execname, pid] =3D count(); > > > > } > > > > tick-10sec > > > > { > > > > printa(@cnt); > > > > printf("ACTIVE[%d] pages\n", `vm_dom[0].vmd_pagequeues[1].pq_cnt= ); > > > > } > > > > *** This system is running only one vmdomain (# sysctl vm.ndomains =E2= =80=93> > > vm.ndomains: 1). > > > > *** running release 12.1, on an amd64 kernel. The physical memory > installed > > is 16G. > > In 12.1, you'd probably want something like: > > fbt::vm_page_enqueue:entry > /args[1] =3D=3D 1 /* PQ_ACTIVE *// > { > ... > } > > since vm_page_unwire(m, PQ_ACTIVE) will also move a page into the active > queue, but your D script above won't catch that. > > I would also look at the "pages reactivated by the page daemon" counter > that appears in vmstat -s output. That'll tell you how many times the > page daemon moved a page from PQ_INACTIVE/PQ_LAUNDRY to PQ_ACTIVE > because it found that the page had been referenced. > --=20 Shrikanth R K --000000000000bd4d49060214baf9 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks, and I appreciate your response, Mark. A follow-up query: the system was probably at some point in a state where there were no pages in the laundry queue and no pages backed by swap (refer to the top snapshot below). The two heavy applications, with ~12G resident, plus Wired and Buf, had already pushed Free close to the minimum threshold; any further memory demand would move pages of these applications into the Inactive or Laundry queues (and eventually out to swap), and later, when those pages were referenced again, the pagedaemon would move them back to Active. Is that a correct understanding? (I put a rough sketch of how I might watch these transitions right after the snapshot.)

last pid: 20494;  load averages:  0.38,  0.73,  0.80    up 0+01:49:05  21:14:49
Mem: 9439M Active, 3638M Inact, 2644M Wired, 888M Buf, 413M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
12043 root          5  22    0  9069M  7752M kqread   2  49:37   6.25% app1
12051 root          1  20    0  2704M  1964M select   3   0:41   0.00% app2
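
If it helps to verify that understanding, I was thinking of watching the
flow between the queues directly while this happens. A rough sketch (it
assumes, as in your example below, that the second argument of
vm_page_enqueue is the target queue index, and that on this 12.1 kernel
indices 0/1/2 correspond to PQ_INACTIVE/PQ_ACTIVE/PQ_LAUNDRY):

fbt::vm_page_enqueue:entry
{
        /* Count enqueues per thread name and target queue index. */
        @flow[execname, args[1]] = count();
}

tick-10sec
{
        printa("%-16s -> queue %d: %@d\n", @flow);
        trunc(@flow);
}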

So if I run a DTrace probe on vm_page_enqueue, will I probably see that pagedaemon is the thread that moved all those pages to Active? And is there a way to associate those transitions with the process that actually referenced the pages?
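
For the attribution part, my first guess (building on your vm_page_enqueue
suggestion below) is to also aggregate on execname/pid at vm_page_activate,
on the assumption that a fault-triggered activation fires in the context of
the thread that touched the page, while pagedaemon-driven moves show up
under pagedaemon itself. A rough sketch, again assuming PQ_ACTIVE is queue
index 1 on this kernel:

/* Activations, attributed to the thread performing them; for a page
   fault this should be the process that referenced the page. */
fbt::vm_page_activate:entry
{
        @act[execname, pid] = count();
}

/* All moves into the active queue, whichever function performed them. */
fbt::vm_page_enqueue:entry
/args[1] == 1/ /* PQ_ACTIVE, assumed */
{
        @enq[execname] = count();
}

tick-10sec
{
        printa("activated by %s (pid %d): %@d\n", @act);
        printa("enqueued to PQ_ACTIVE by %s: %@d\n", @enq);
        trunc(@act);
        trunc(@enq);
}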

Regards,
--
Shrikanth R K

On Thu, Aug 3, 2023 at 8:14 AM Mark Johnston <markj@freebsd.org> wrote:
> On Thu, Aug 03, 2023 at 12:32:22AM -0700, Shrikanth Kamath wrote:
> > A background on the query,
> >
> > Trying to catch a memory "spike" trigger using DTrace; refer to the two
> > "top" snapshots below, captured during a 2-minute window.
> >
> > last pid: 89900;  load averages:  0.75,  0.91,  0.94  up 39+00:37:30  20:03:14
> > Mem: 5575M Active, 2152M Inact, 4731M Laundry, 3044M Wired, 1151M Buf, 382M Free
> > Swap: 8192M Total, 1058M Used, 7134M Free, 12% Inuse
> >
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C     TIME    WCPU COMMAND
> > 12043 root          5  35    0    11G  9747M kqread   3   128.8H  23.34% app1
> > 12051 root          1  20    0  3089M  2274M select   1    22:51   0.00% app2
> >
> > last pid: 90442;  load averages:  1.50,  1.12,  1.02  up 39+00:39:37  20:05:21
> > Mem: 8549M Active, 631M Inact, 3340M Laundry, 3159M Wired, 1252M Buf, 359M Free
> > Swap: 8192M Total, 1894M Used, 6298M Free, 23% Inuse
> >
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C     TIME    WCPU COMMAND
> > 12043 root          5  24    0    11G  9445M kqread   2   128.8H  10.45% app1
> > 12051 root          1  20    0  3089M  2173M select   3    22:51   0.00% app2
> >
> > The spike is ~3G in Active pages; the two large applications have a
> > combined resident size of ~12G. The resident size of the applications
> > hasn't changed between these two readings; however, a tar archive and
> > gzip ran over a large directory during that window, likely causing a
> > reshuffle. If I count the page allocs and dequeues by execname with
> > DTrace, I see tar/vmstat, which probably alloc and quickly dequeue,
> > along with a large dequeue being undertaken by bufdaemon and pagedaemon.
> >
> > fbt::vm_page_alloc*:entry
> > {
> >         @cnt[execname] = count();
> > }
> >
> > fbt::vm_page_dequeue:entry
> > {
> >         @dcnt[execname] = count();
> > }
> >
> > Page Alloc
> >   vmstat        20222
> >   tar           21284
> >
> > Page Dequeue
> >   vmstat        20114
> >   bufdaemon     21402
> >   tar           21635
> >   pagedaemon   360387
> >
> > Since tar / vmstat will not hold the pages in Active, I need to find
> > out which application had its pages queued in the Active page queue.

> One possibility is that the inactive and laundry queues had previously
> contained many referenced pages.  Then, when some memory pressure
> occurred, the pagedaemon scanned the queues and moved a large number of
> pages into the active queue.

> > Is it possible that the system is just moving the LRU pages of these two
> > large applications into the inactive queue prior to addressing memory
> > pressure?  Do these applications need to activate those pages later, and
> > hence bring them back into the Active queue? How do I watch this in
> > action by using DTrace? Will the following probe catch this trigger?
> >
> > fbt::vm_page_activate:entry
> > {
> >         @cnt[execname, pid] = count();
> > }
> >
> > tick-10sec
> > {
> >         printa(@cnt);
> >         printf("ACTIVE[%d] pages\n", `vm_dom[0].vmd_pagequeues[1].pq_cnt);
> > }
> >
> > *** This system is running only one vmdomain (# sysctl vm.ndomains ->
> > vm.ndomains: 1).
> >
> > *** Running release 12.1 on an amd64 kernel. The physical memory
> > installed is 16G.

> In 12.1, you'd probably want something like:
>
> fbt::vm_page_enqueue:entry
> /args[1] == 1 /* PQ_ACTIVE *//
> {
> ...
> }
>
> since vm_page_unwire(m, PQ_ACTIVE) will also move a page into the active
> queue, but your D script above won't catch that.
>
> I would also look at the "pages reactivated by the page daemon" counter
> that appears in vmstat -s output.  That'll tell you how many times the
> page daemon moved a page from PQ_INACTIVE/PQ_LAUNDRY to PQ_ACTIVE
> because it found that the page had been referenced.


--
Shrikanth R K