Date:      Tue, 21 Mar 2023 11:11:30 -0400
From:      Ken Merry <ken@freebsd.org>
To:        hackers@freebsd.org
Subject:   Getting v_wire_count from a kernel core dump?
Message-ID:  <66742036-C8DF-4A13-9D4A-CDA71217E574@freebsd.org>


I have kernel core dumps from several machines out in the field (customer sites) that hit out-of-memory panics, and I'm trying to figure out, from the kernel core dumps, whether we're dealing with a potential page leak.

For context, these machines are running stable/13 from April 2021, but they do have the fix for this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256507

Which is this commit in stable/13:

https://cgit.freebsd.org/src/commit/?id=6094749a1a5dafb8daf98deab23fc968070bc695

On a running system, I can get a rough idea whether there is a page leak by looking at the VM system page counters:

# sysctl vm.stats |grep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 0
vm.stats.vm.v_laundry_count: 991626
vm.stats.vm.v_inactive_count: 39733216
vm.stats.vm.v_active_count: 11821309
vm.stats.vm.v_wire_count: 11154113
vm.stats.vm.v_free_count: 1599981
vm.stats.vm.v_page_count: 65347213

So the five non-zero counts (laundry, inactive, active, wire, and free) add up to 65300245 in this case, for a difference of 46968 from v_page_count.
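As a sanity check, those five counters from the sysctl output above can be re-summed with plain POSIX awk; this is just arithmetic on the numbers pasted above, so no FreeBSD box is needed to reproduce it:

```shell
# Re-add the laundry, inactive, active, wire, and free counts from the
# sysctl output above, and show how far they fall short of v_page_count.
printf '%s\n' 991626 39733216 11821309 11154113 1599981 |
    awk -v total=65347213 '{s += $1} END {print s, total - s}'
# prints "65300245 46968"
```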

Am I off base here as far as the various counts adding up to the page count?  (e.g. is the wire count just an additional attribute of a page and not a separate state like active, inactive, or laundry?)

Looking at the kernel core dump for one of the systems I see:

(kgdb) print vm_cnt
$1 = {v_swtch = 0xfffffe022158f2f8, v_trap = 0xfffffe022158f2f0,
  v_syscall = 0xfffffe022158f2e8, v_intr = 0xfffffe022158f2e0,
  v_soft = 0xfffffe022158f2d8, v_vm_faults = 0xfffffe022158f2d0,
  v_io_faults = 0xfffffe022158f2c8, v_cow_faults = 0xfffffe022158f2c0,
  v_cow_optim = 0xfffffe022158f2b8, v_zfod = 0xfffffe022158f2b0,
  v_ozfod = 0xfffffe022158f2a8, v_swapin = 0xfffffe022158f2a0,
  v_swapout = 0xfffffe022158f298, v_swappgsin = 0xfffffe022158f290,
  v_swappgsout = 0xfffffe022158f288, v_vnodein = 0xfffffe022158f280,
  v_vnodeout = 0xfffffe022158f278, v_vnodepgsin = 0xfffffe022158f270,
  v_vnodepgsout = 0xfffffe022158f268, v_intrans = 0xfffffe022158f260,
  v_reactivated = 0xfffffe022158f258, v_pdwakeups = 0xfffffe022158f250,
  v_pdpages = 0xfffffe022158f248, v_pdshortfalls = 0xfffffe022158f240,
  v_dfree = 0xfffffe022158f238, v_pfree = 0xfffffe022158f230,
  v_tfree = 0xfffffe022158f228, v_forks = 0xfffffe022158f220,
  v_vforks = 0xfffffe022158f218, v_rforks = 0xfffffe022158f210,
  v_kthreads = 0xfffffe022158f208, v_forkpages = 0xfffffe022158f200,
  v_vforkpages = 0xfffffe022158f1f8, v_rforkpages = 0xfffffe022158f1f0,
  v_kthreadpages = 0xfffffe022158f1e8, v_wire_count = 0xfffffe022158f1e0,
  v_page_size = 4096, v_page_count = 65342843, v_free_reserved = 85343,
  v_free_target = 1392195, v_free_min = 412056, v_inactive_target = 2088292,
  v_pageout_free_min = 136, v_interrupt_free_min = 8, v_free_severe = 248698}
(kgdb) print vm_ndomains
$2 = 4
(kgdb) print vm_dom[0].vmd_pagequeues[0].pq_cnt
$3 = 6298704
(kgdb) print vm_dom[0].vmd_pagequeues[1].pq_cnt
$4 = 3423939
(kgdb) print vm_dom[0].vmd_pagequeues[2].pq_cnt
$5 = 629834
(kgdb) print vm_dom[0].vmd_pagequeues[3].pq_cnt
$6 = 0
(kgdb) print vm_dom[1].vmd_pagequeues[0].pq_cnt
$7 = 2301793
(kgdb) print vm_dom[1].vmd_pagequeues[1].pq_cnt
$8 = 7130193
(kgdb) print vm_dom[1].vmd_pagequeues[2].pq_cnt
$9 = 701495
(kgdb) print vm_dom[1].vmd_pagequeues[3].pq_cnt
$10 = 0
(kgdb) print vm_dom[2].vmd_pagequeues[0].pq_cnt
$11 = 464429
(kgdb) print vm_dom[2].vmd_pagequeues[1].pq_cnt
$12 = 9123532
(kgdb) print vm_dom[2].vmd_pagequeues[2].pq_cnt
$13 = 1037423
(kgdb) print vm_dom[2].vmd_pagequeues[3].pq_cnt
$14 = 0
(kgdb) print vm_dom[3].vmd_pagequeues[0].pq_cnt
$15 = 5444946
(kgdb) print vm_dom[3].vmd_pagequeues[1].pq_cnt
$16 = 4466782
(kgdb) print vm_dom[3].vmd_pagequeues[2].pq_cnt
$17 = 785195
(kgdb) print vm_dom[3].vmd_pagequeues[3].pq_cnt
$18 = 0
(kgdb)


Adding up the page queue counts with dc(1) (p prints the running total):

6298704
3423939
629834
++p
10352477
2301793
7130193
701495
++p
10133481
+p
20485958
464429
9123532
1037423
++p
10625384
+p
31111342
5444946
4466782
785195
++p
10696923
+p
41808265

So, about 23M pages short of v_page_count.
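The same running totals from the dc session can be reproduced with awk, feeding it the twelve pq_cnt values that kgdb printed above and comparing against v_page_count (65342843) from the vm_cnt dump; again, this is just arithmetic on the pasted numbers:

```shell
# Sum the twelve per-domain page queue counts and show the shortfall
# relative to v_page_count from the vm_cnt dump above.
printf '%s\n' 6298704 3423939 629834 2301793 7130193 701495 \
    464429 9123532 1037423 5444946 4466782 785195 |
    awk -v total=65342843 '{s += $1} END {print s, total - s}'
# prints "41808265 23534578"
```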

v_wire_count is a per-CPU counter, and on a running system it gets added up.  But trying to access it in the kernel core dump yields:

(kgdb) print vm_cnt.v_wire_count
$2 = (counter_u64_t) 0xfffffe022158f1e0
(kgdb) print *$2
Cannot access memory at address 0xfffffe022158f1e0

Anyone have any ideas on how I can figure out, from the core dump, whether there is a page leak?

Thanks,

Ken
-- 
Ken Merry
ken@FreeBSD.ORG






