Date: Tue, 21 Mar 2023 11:11:30 -0400
From: Ken Merry <ken@freebsd.org>
To: hackers@freebsd.org
Subject: Getting v_wire_count from a kernel core dump?
Message-ID: <66742036-C8DF-4A13-9D4A-CDA71217E574@freebsd.org>

I have kernel core dumps from several machines out in the field (customer
sites) that hit out-of-memory panics, and I'm trying to figure out, from
the kernel core dumps, whether we're dealing with a potential page leak.

For context, these machines are running stable/13 from April 2021, but they
do have the fix for this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256507

Which is this commit in stable/13:

https://cgit.freebsd.org/src/commit/?id=6094749a1a5dafb8daf98deab23fc968070bc695

On a running system, I can get a rough idea whether there is a page leak by
looking at the VM system page counters:

# sysctl vm.stats | grep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 0
vm.stats.vm.v_laundry_count: 991626
vm.stats.vm.v_inactive_count: 39733216
vm.stats.vm.v_active_count: 11821309
vm.stats.vm.v_wire_count: 11154113
vm.stats.vm.v_free_count: 1599981
vm.stats.vm.v_page_count: 65347213

So the five "state" counts (laundry, inactive, active, wire, free) add up
to 65300245 in this case, 46968 short of v_page_count.

Am I off base here as far as the various counts adding up to the page
count? (e.g. is the wire count just an additional attribute of a page, and
not a separate state like active, inactive, or laundry?)
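Spelled out, the check I'm doing is just this (plain Python, numbers pasted
from the sysctl output above; it treats wired as a fifth mutually exclusive
state, which is exactly the assumption I'm unsure about):

# Do the five per-state counters account for every page?
counts = {
    "laundry":  991626,
    "inactive": 39733216,
    "active":   11821309,
    "wire":     11154113,
    "free":     1599981,
}
total = sum(counts.values())       # 65300245
page_count = 65347213
print(page_count - total)          # 46968 pages unaccounted for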

Looking at the kernel core dump for one of the systems, I see:

(kgdb) print vm_cnt
$1 = {v_swtch = 0xfffffe022158f2f8, v_trap = 0xfffffe022158f2f0,
  v_syscall = 0xfffffe022158f2e8, v_intr = 0xfffffe022158f2e0,
  v_soft = 0xfffffe022158f2d8, v_vm_faults = 0xfffffe022158f2d0,
  v_io_faults = 0xfffffe022158f2c8, v_cow_faults = 0xfffffe022158f2c0,
  v_cow_optim = 0xfffffe022158f2b8, v_zfod = 0xfffffe022158f2b0,
  v_ozfod = 0xfffffe022158f2a8, v_swapin = 0xfffffe022158f2a0,
  v_swapout = 0xfffffe022158f298, v_swappgsin = 0xfffffe022158f290,
  v_swappgsout = 0xfffffe022158f288, v_vnodein = 0xfffffe022158f280,
  v_vnodeout = 0xfffffe022158f278, v_vnodepgsin = 0xfffffe022158f270,
  v_vnodepgsout = 0xfffffe022158f268, v_intrans = 0xfffffe022158f260,
  v_reactivated = 0xfffffe022158f258, v_pdwakeups = 0xfffffe022158f250,
  v_pdpages = 0xfffffe022158f248, v_pdshortfalls = 0xfffffe022158f240,
  v_dfree = 0xfffffe022158f238, v_pfree = 0xfffffe022158f230,
  v_tfree = 0xfffffe022158f228, v_forks = 0xfffffe022158f220,
  v_vforks = 0xfffffe022158f218, v_rforks = 0xfffffe022158f210,
  v_kthreads = 0xfffffe022158f208, v_forkpages = 0xfffffe022158f200,
  v_vforkpages = 0xfffffe022158f1f8, v_rforkpages = 0xfffffe022158f1f0,
  v_kthreadpages = 0xfffffe022158f1e8, v_wire_count = 0xfffffe022158f1e0,
  v_page_size = 4096, v_page_count = 65342843, v_free_reserved = 85343,
  v_free_target = 1392195, v_free_min = 412056, v_inactive_target = 2088292,
  v_pageout_free_min = 136, v_interrupt_free_min = 8, v_free_severe = 248698}
(kgdb) print vm_ndomains
$2 = 4
(kgdb) print vm_dom[0].vmd_pagequeues[0].pq_cnt
$3 = 6298704
(kgdb) print vm_dom[0].vmd_pagequeues[1].pq_cnt
$4 = 3423939
(kgdb) print vm_dom[0].vmd_pagequeues[2].pq_cnt
$5 = 629834
(kgdb) print vm_dom[0].vmd_pagequeues[3].pq_cnt
$6 = 0
(kgdb) print vm_dom[1].vmd_pagequeues[0].pq_cnt
$7 = 2301793
(kgdb) print vm_dom[1].vmd_pagequeues[1].pq_cnt
$8 = 7130193
(kgdb) print vm_dom[1].vmd_pagequeues[2].pq_cnt
$9 = 701495
(kgdb) print vm_dom[1].vmd_pagequeues[3].pq_cnt
$10 = 0
(kgdb) print vm_dom[2].vmd_pagequeues[0].pq_cnt
$11 = 464429
(kgdb) print vm_dom[2].vmd_pagequeues[1].pq_cnt
$12 = 9123532
(kgdb) print vm_dom[2].vmd_pagequeues[2].pq_cnt
$13 = 1037423
(kgdb) print vm_dom[2].vmd_pagequeues[3].pq_cnt
$14 = 0
(kgdb) print vm_dom[3].vmd_pagequeues[0].pq_cnt
$15 = 5444946
(kgdb) print vm_dom[3].vmd_pagequeues[1].pq_cnt
$16 = 4466782
(kgdb) print vm_dom[3].vmd_pagequeues[2].pq_cnt
$17 = 785195
(kgdb) print vm_dom[3].vmd_pagequeues[3].pq_cnt
$18 = 0

Adding up the page queue counts per domain (I originally fed these through
dc(1); the per-domain sums and running total are):

domain 0: 6298704 + 3423939 +  629834 + 0 = 10352477
domain 1: 2301793 + 7130193 +  701495 + 0 = 10133481   (running: 20485958)
domain 2:  464429 + 9123532 + 1037423 + 0 = 10625384   (running: 31111342)
domain 3: 5444946 + 4466782 +  785195 + 0 = 10696923   (running: 41808265)

So the grand total is 41808265 pages, about 23.5M short of v_page_count
(65342843).
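For what it's worth, the queue walk can be scripted from inside kgdb rather
than printed by hand. A sketch using kgdb's embedded Python, assuming the
stable/13 layout where each domain has four queues (inactive, active,
laundry, unswappable):

(kgdb) python
# Sum pq_cnt over every page queue in every vm domain.
total = 0
for dom in range(int(gdb.parse_and_eval("vm_ndomains"))):
    for pq in range(4):                       # PQ_COUNT == 4 on stable/13
        total += int(gdb.parse_and_eval(
            "vm_dom[%d].vmd_pagequeues[%d].pq_cnt" % (dom, pq)))
print(total)                                  # 41808265 on this dump
end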

v_wire_count is a per-CPU counter(9) counter; on a running system the
per-CPU values get added up when it is fetched. But trying to access it in
the kernel core dump yields:

(kgdb) print vm_cnt.v_wire_count
$2 = (counter_u64_t) 0xfffffe022158f1e0
(kgdb) print *$2
Cannot access memory at address 0xfffffe022158f1e0
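If I understand counter(9) correctly, a counter_u64_t is just a base
address into per-CPU storage, and counter_u64_fetch() sums one uint64_t per
CPU at a fixed stride (PAGE_SIZE on amd64). So something like the following
might reconstruct the sum by hand. This is a sketch only: the 4096-byte
stride is an amd64 assumption, and if the "Cannot access memory" above
means the per-CPU pages never made it into the minidump, this will fail the
same way:

(kgdb) python
# Emulate counter_u64_fetch(): one uint64_t per CPU, PAGE_SIZE apart,
# starting at the counter's base address (amd64 assumption: stride 4096).
base = int(gdb.parse_and_eval("(unsigned long)vm_cnt.v_wire_count"))
total = 0
for cpu in range(int(gdb.parse_and_eval("mp_maxid")) + 1):
    total += int(gdb.parse_and_eval(
        "*(uint64_t *)0x%x" % (base + cpu * 4096)))
print(total)
end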

Anyone have any ideas on how I can figure out, from the core dump, whether
there is a page leak?

Thanks,

Ken
--
Ken Merry
ken@FreeBSD.ORG