From owner-freebsd-hackers@freebsd.org Fri Apr 28 05:26:24 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A75DBD54374 for ; Fri, 28 Apr 2017 05:26:24 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-51.reflexion.net [208.70.210.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6737B7A2 for ; Fri, 28 Apr 2017 05:26:23 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 7428 invoked from network); 28 Apr 2017 05:29:35 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 28 Apr 2017 05:29:35 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Fri, 28 Apr 2017 01:26:22 -0400 (EDT) Received: (qmail 18704 invoked from network); 28 Apr 2017 05:26:22 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 28 Apr 2017 05:26:22 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 6D413EC904C; Thu, 27 Apr 2017 22:26:21 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mdconfig -d not effectively releasing memory?] Date: Thu, 27 Apr 2017 22:26:20 -0700 References: <4BC7E6BC-4BF9-4F5E-9851-E022AC9A3082@dsl-only.net> <51050A07-B951-45C0-82CE-73BB342F012E@dsl-only.net> To: freebsd-arm , freebsd-hackers@freebsd.org, FreeBSD Current In-Reply-To: <51050A07-B951-45C0-82CE-73BB342F012E@dsl-only.net> Message-Id: <302D1255-4D34-4C1B-8F3A-9180A6AF8768@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Apr 2017 05:26:24 -0000 [As the text does not really follow from the earlier text I'd sent directly I'm top posting a hypothesis about where so much active memory came from to be so low in available memory to get an reclaim_pv_chunk attempt.] My hypothesis is that the "mdconfig -d"s from vm_copy_base (from /usr/src/release/tools/vmimage.subr ) did not actually release the memory resources involved (from vnode backed mdconfig use): vm_copy_base() { # Creates a new UFS root filesystem and copies the contents of = the # current root filesystem into it. This produces a "clean" disk # image without any remnants of files which were created = temporarily # during image-creation and have since been deleted (e.g., = downloaded # package archives). mkdir -p ${DESTDIR}/old mdold=3D$(mdconfig -f ${VMBASE}) mount /dev/${mdold} ${DESTDIR}/old truncate -s ${VMSIZE} ${VMBASE}.tmp mkdir -p ${DESTDIR}/new mdnew=3D$(mdconfig -f ${VMBASE}.tmp) newfs -L rootfs /dev/${mdnew} mount /dev/${mdnew} ${DESTDIR}/new tar -cf- -C ${DESTDIR}/old . | tar -xUf- -C ${DESTDIR}/new umount_loop /dev/${mdold} rmdir ${DESTDIR}/old mdconfig -d -u ${mdold} umount_loop /dev/${mdnew} rmdir ${DESTDIR}/new tunefs -n enable /dev/${mdnew} mdconfig -d -u ${mdnew} mv ${VMBASE}.tmp ${VMBASE} } Without such prior mdconfig activity the "cp -p" and following "xz -T 0" do not get the large Meme:Active figure in top, "xz -T 0" getting more like 781M Mem:Active with a xz:SIZE of 791M and xz:RES < 800M (varying). Zero xz:SWAP. xz also gets all the cores going, so well over 300% in top always (4 cores) instead of < 100%. In this context both the cp and the xz finish just fine. In other words: no low memory problem without first having the vnode backed mdconfig use. =46rom the prior top report for the failure, partially repeated here: . . . Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free Swap: 6144M Total, 34M Used, 6110M Free, 348K Out PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME = CPU COMMAND 48988 root 4 31 0 651M 27048K 0K RUN 0 0:03 = 87.60% xz -T 0 = /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT- . . . The combination 1618M Mem:Active but 34M Swap:Used and 651M xz:SIZE but 27M xz:RES and 0K xz:SWAP just seems very odd, like it should not happen. The 17M Mem:Inact is odd for the context as well. (Mem:Wired, Mem:Buf, and Mem:Free look normal.) An alternate hypothesis would be the memory "leak" is from mkimg not having it memory-use cleaned up. This happens after vm_copy_base but before the cp/xz sequence and is what produces vm.raw. For reference of what worked just fine after the post-panic reboot, using the already existing vm.raw (sparse) file as a starting place: # cp -p vm.raw = /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarc= h64.raw # xz -T 0 = /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarc= h64.raw # ls -lTt *raw* -rw-r--r-- 1 root wheel 34360566272 Apr 27 18:40:24 2017 vm.raw -rw-r--r-- 1 root wheel 34359746560 Apr 27 18:37:29 2017 = vm.raw.nested_bsd -rw-r--r-- 1 root wheel 27917287424 Apr 27 18:34:45 2017 raw.img # du -sm *raw* 1762 raw.img 1583 vm.raw 1583 vm.raw.nested_bsd (Before the .xz replaces the .raw:) # ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/ total 33820032 -rw-r--r-- 1 root wheel 34360566272 Apr 27 18:40:24 2017 = FreeBSD-12.0-CURRENT-arm64-aarch64.raw # du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/* 32777 = /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarc= h64.raw (After xz:) # ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/ total 258208 -rw-r--r-- 1 root wheel 264275808 Apr 27 18:40:24 2017 = FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz # du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/* 253 = /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarc= h64.raw.xz (Mem:Active returned to 10M when xz finished.) Prior reports from capturing text: On 2017-Apr-27, at 7:31 PM, Mark Millard wrote: > [Another example panic. Again no dump. But I have what > a top -PCwaopid froze at this time.] >=20 > On 2017-Apr-27, at 4:22 PM, Mark Millard = wrote: >=20 >> Unfortunately for this FYI the attempt to produce a dump >> failed and so all the information that I have is what I >> first captured from the console output: a backtrace. >>=20 >> The context was head -r317015 on a Pine64+ 2GB. At the >> time I was experimenting with trying to build a vm.raw >> from my own build of FreeBSD. The (root) file system >> is on a USB SSD off of a powered USB hub. >>=20 >> panic: ARM64TODO: reclaim_pv_chunk >> cpuid =3D 1 >> time =3D 1493332968 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self_wrapper+0x28 >> pc =3D 0xffff000000605cc0 lr =3D 0xffff0000000869cc >> sp =3D 0xffff000083ba4f00 fp =3D 0xffff000083ba5110 >>=20 >> db_trace_self_wrapper() at vpanic+0x164 >> pc =3D 0xffff0000000869cc lr =3D 0xffff00000031d464 >> sp =3D 0xffff000083ba5120 fp =3D 0xffff000083ba5190 >>=20 >> vpanic() at panic+0x4c >> pc =3D 0xffff00000031d464 lr =3D 0xffff00000031d2fc >> sp =3D 0xffff000083ba51a0 fp =3D 0xffff000083ba5220 >>=20 >> panic() at reclaim_pv_chunk+0x10 >> pc =3D 0xffff00000031d2fc lr =3D 0xffff00000061a234 >> sp =3D 0xffff000083ba5230 fp =3D 0xffff000083ba5230 >>=20 >> reclaim_pv_chunk() at get_pv_entry+0x240 >> pc =3D 0xffff00000061a234 lr =3D 0xffff000000616184 >> sp =3D 0xffff000083ba5240 fp =3D 0xffff000083ba5260 >>=20 >> get_pv_entry() at pmap_enter+0x694 >> pc =3D 0xffff000000616184 lr =3D 0xffff0000006156a0 >> sp =3D 0xffff000083ba5270 fp =3D 0xffff000083ba5300 >>=20 >> pmap_enter() at vm_fault_hold+0x28c >> pc =3D 0xffff0000006156a0 lr =3D 0xffff0000005b9740 >> sp =3D 0xffff000083ba5310 fp =3D 0xffff000083ba5460 >>=20 >> vm_fault_hold() at vm_fault+0x70 >> pc =3D 0xffff0000005b9740 lr =3D 0xffff0000005b9464 >> sp =3D 0xffff000083ba5470 fp =3D 0xffff000083ba54a0 >>=20 >> vm_fault() at data_abort+0xe0 >> pc =3D 0xffff0000005b9464 lr =3D 0xffff00000061ad94 >> sp =3D 0xffff000083ba54b0 fp =3D 0xffff000083ba5560 >>=20 >> data_abort() at handle_el1h_sync+0x70 >> pc =3D 0xffff00000061ad94 lr =3D 0xffff000000607870 >> sp =3D 0xffff000083ba5570 fp =3D 0xffff000083ba5680 >>=20 >> handle_el1h_sync() at kern_select+0x9fc >> pc =3D 0xffff000000607870 lr =3D 0xffff00000037db3c >> sp =3D 0xffff000083ba5690 fp =3D 0xffff000083ba58f0 >>=20 >> kern_select() at sys_select+0x5c >> pc =3D 0xffff00000037db3c lr =3D 0xffff00000037dc58 >> sp =3D 0xffff000083ba5900 fp =3D 0xffff000083ba5930 >>=20 >> sys_select() at do_el0_sync+0xa48 >> pc =3D 0xffff00000037dc58 lr =3D 0xffff00000061b91c >> sp =3D 0xffff000083ba5940 fp =3D 0xffff000083ba5a70 >>=20 >> do_el0_sync() at handle_el0_sync+0x6c >> pc =3D 0xffff00000061b91c lr =3D 0xffff0000006079e8 >> sp =3D 0xffff000083ba5a80 fp =3D 0xffff000083ba5b90 >>=20 >> handle_el0_sync() at 0x4948c >> pc =3D 0xffff0000006079e8 lr =3D 0x000000000004948c >> sp =3D 0xffff000083ba5ba0 fp =3D 0x0000ffffffffd960 >=20 >=20 > This time I got to record from top: > (swap is on a swap partition) > (pid 49888's SIZE vs. RES and SWAP might be interesting) > (as might the Active figure) >=20 > last pid: 48988; load averages: 0.64, 0.44, 0.38 = = up 0+04:21:01 19:19:50 > 32 processes: 2 running, 30 sleeping > CPU 0: 13.2% user, 0.0% nice, 23.2% system, 0.3% interrupt, 63.3% = idle > CPU 1: 4.6% user, 0.0% nice, 23.9% system, 0.0% interrupt, 71.5% = idle > CPU 2: 2.1% user, 0.0% nice, 23.2% system, 0.0% interrupt, 74.8% = idle > CPU 3: 3.3% user, 0.0% nice, 23.8% system, 0.0% interrupt, 72.8% = idle > Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free > Swap: 6144M Total, 34M Used, 6110M Free, 348K Out >=20 > PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME = CPU COMMAND > 48988 root 4 31 0 651M 27048K 0K RUN 0 0:03 = 87.60% xz -T 0 = /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarc= h64.raw > 11983 root 1 22 0 5068K 0K 0K wait 3 0:00 = 0.00% make vm-image vm-install DESTDIR=3D/usr/obj/DESTDIRs/vmimage-aarch64= () > 11981 root 1 42 0 7320K 0K 1516K wait 1 0:00 = 0.00% sh = /root/sys_build_scripts.aarch64-host/make_noscript_aarch64_nodebug_clang_b= ootstrap-aarch64-host.sh vm-image vm-install=20 > 11980 root 1 20 0 6656K 1548K 0K select 0 0:02 = 0.00% [script] > 11977 root 1 30 0 7320K 0K 1516K wait 3 0:00 = 0.00% /bin/sh = /root/sys_build_scripts.aarch64-host/make_aarch64_nodebug_clang_bootstrap-= aarch64-host.sh vm-image vm-install DEST > 2694 root 1 20 0 8804K 2072K 0K CPU2 2 0:07 = 0.17% top -PCwaopid > 827 root 1 20 0 7320K 0K 360K wait 0 0:00 = 0.00% su () > 826 markmi 1 22 0 10372K 0K 1532K wait 3 0:00 = 0.00% su () > 820 markmi 1 24 0 7320K 0K 1516K wait 1 0:00 = 0.00% -sh () > 819 markmi 1 20 0 18416K 1152K 0K select 1 0:21 = 0.00% sshd: markmi@pts/1 (sshd) > 816 root 1 20 0 18416K 3276K 0K select 0 0:00 = 0.00% sshd: markmi [priv] (sshd) > 765 root 1 20 0 7320K 0K 224K wait 2 0:00 = 0.00% su () > 764 markmi 1 23 0 10372K 0K 1532K wait 0 0:00 = 0.00% su () > 758 markmi 1 31 0 7320K 0K 1516K wait 1 0:00 = 0.00% -sh () > 757 markmi 1 20 0 18416K 228K 904K select 3 0:01 = 0.01% sshd: markmi@pts/0 (sshd) > 754 root 1 25 0 18416K 3276K 0K select 1 0:00 = 0.00% sshd: markmi [priv] (sshd) > 746 root 1 27 0 7320K 1532K 0K ttyin 0 0:00 = 0.00% -sh (sh) > 745 root 1 20 0 10372K 0K 1532K wait 1 0:00 = 0.00% login [pam] () > 700 root 1 20 0 6948K 0K 168K nanslp 1 0:00 = 0.00% /usr/sbin/cron -s () > 696 smmsp 1 20 0 10460K 0K 184K pause 0 0:00 = 0.00% sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue = () > 693 root 1 20 0 10460K 1392K 0K select 1 0:00 = 0.03% sendmail: accepting connections (sendmail) > 690 root 1 20 0 15800K 968K 0K select 2 0:00 = 0.00% /usr/sbin/sshd > 661 root 1 20 0 6656K 344K 0K select 2 0:01 = 0.00% /usr/sbin/powerd > 658 root 2 20 0 12788K 12672K 0K select 0 0:02 = 0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f = /var/db/ntpd.drift > 620 root 32 52 0 6384K 1100K 0K rpcsvc 1 0:00 = 0.00% nfsd: server (nfsd) > 619 root 1 52 0 6384K 704K 0K select 1 0:00 = 0.00% nfsd: master (nfsd) > 617 root 1 20 0 6684K 688K 0K select 1 0:00 = 0.00% /usr/sbin/mountd -r > 478 root 1 20 0 6676K 596K 0K select 3 0:00 = 0.00% /usr/sbin/rpcbind > 469 root 1 20 0 6680K 572K 0K select 2 0:00 = 0.00% /usr/sbin/syslogd -s > 396 root 1 20 0 9580K 32K 0K select 0 0:00 = 0.00% /sbin/devd > 308 _dhcp 1 20 0 6800K 532K 0K select 2 0:00 = 0.00% dhclient: awg0 (dhclient) > 307 root 1 52 0 6800K 424K 0K select 2 0:00 = 0.00% dhclient: awg0 [priv] (dhclient) >=20 > And here is the backtrace: >=20 > timeout stopping cpus > panic: ARM64TODO: reclaim_pv_chunk > cpuid =3D 0 > time =3D 1493345992 > KDB: stack backtrace: > db_trace_self() at db_trace_self_wrapper+0x28 > pc =3D 0xffff000000605cc0 lr =3D 0xffff0000000869cc > sp =3D 0xffff000083d301d0 fp =3D 0xffff000083d303e0 >=20 > db_trace_self_wrapper() at vpanic+0x164 > pc =3D 0xffff0000000869cc lr =3D 0xffff00000031d464 > sp =3D 0xffff000083d303f0 fp =3D 0xffff000083d30460 >=20 > vpanic() at panic+0x4c > pc =3D 0xffff00000031d464 lr =3D 0xffff00000031d2fc > sp =3D 0xffff000083d30470 fp =3D 0xffff000083d304f0 >=20 > panic() at reclaim_pv_chunk+0x10 > pc =3D 0xffff00000031d2fc lr =3D 0xffff00000061a234 > sp =3D 0xffff000083d30500 fp =3D 0xffff000083d30500 >=20 > reclaim_pv_chunk() at get_pv_entry+0x240 > pc =3D 0xffff00000061a234 lr =3D 0xffff000000616184 > sp =3D 0xffff000083d30510 fp =3D 0xffff000083d30530 >=20 > get_pv_entry() at pmap_enter+0x694 > pc =3D 0xffff000000616184 lr =3D 0xffff0000006156a0 > sp =3D 0xffff000083d30540 fp =3D 0xffff000083d305d0 >=20 > pmap_enter() at vm_fault_hold+0x28c > pc =3D 0xffff0000006156a0 lr =3D 0xffff0000005b9740 > sp =3D 0xffff000083d305e0 fp =3D 0xffff000083d30730 >=20 > vm_fault_hold() at vm_fault+0x70 > pc =3D 0xffff0000005b9740 lr =3D 0xffff0000005b9464 > sp =3D 0xffff000083d30740 fp =3D 0xffff000083d30770 >=20 > vm_fault() at data_abort+0xe0 > pc =3D 0xffff0000005b9464 lr =3D 0xffff00000061ad94 > sp =3D 0xffff000083d30780 fp =3D 0xffff000083d30830 >=20 > data_abort() at handle_el0_sync+0x6c > pc =3D 0xffff00000061ad94 lr =3D 0xffff0000006079e8 > sp =3D 0xffff000083d30840 fp =3D 0xffff000083d30950 >=20 > handle_el0_sync() at 0x400a3de4 > pc =3D 0xffff0000006079e8 lr =3D 0x00000000400a3de4 > sp =3D 0xffff000083d30960 fp =3D 0x0000ffffbfdfcd30 >=20 > KDB: enter: panic > [ thread pid 48988 tid 100230 ] > Stopped at kdb_enter+0x44: undefined d4200000 >=20 =3D=3D=3D Mark Millard markmi at dsl-only.net