From owner-freebsd-current@freebsd.org Tue Nov 8 06:42:11 2016 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8D8E3C3648F for ; Tue, 8 Nov 2016 06:42:11 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-54.reflexion.net [208.70.210.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 53EC7825 for ; Tue, 8 Nov 2016 06:42:10 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 16485 invoked from network); 8 Nov 2016 06:43:07 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 8 Nov 2016 06:43:07 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.10.1) with SMTP; Tue, 08 Nov 2016 01:42:13 -0500 (EST) Received: (qmail 21188 invoked from network); 8 Nov 2016 06:42:13 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 8 Nov 2016 06:42:13 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id A2C15EC8AB5; Mon, 7 Nov 2016 22:42:08 -0800 (PST) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.1 \(3251\)) Subject: Re: FYI: a Pine64+ 2G aarch64 head -r308247 crash and some information about it Date: Mon, 7 Nov 2016 22:42:08 -0800 References: To: freebsd-arm , FreeBSD Current In-Reply-To: Message-Id: <7561E656-3EFC-4338-A419-C35BBA40D07A@dsl-only.net> X-Mailer: Apple Mail (2.3251) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Nov 2016 06:42:11 -0000 On 2016-Nov-7, at 8:54 PM, Mark Millard wrote: > This is just in case any of the information happens to prove = useful/interesting. > I'm not expecting any assistance. >=20 > Note: After the crash ddb was not responding to input so this is all = that I have. >=20 > Note: This was an experiment with head -r308247 but was built like = stable for > performance issues (GENERIC included and then overridden, not = GENERIC-UP > based). >=20 > The below was found via dmesg and/or /var/log/messages content while = the Pine64 > was busy building lang/gcc6 and its prerequisites (as a stability = test). >=20 > It got lots of spurious interrupt notices per second, such as: >=20 >> gic0: Spurious interrupt detected: last irq: 27 on CPU0 >> gic0: Spurious interrupt detected: last irq: 27 on CPU2 >> gic0: Spurious interrupt detected: last irq: 106 on CPU3 >> gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> Spurious interrupt detected: last irq: 27 on CPU3 >> gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU0 >> Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: Spurious interrupt detected: last irq: 27 on CPU2 >> gic0: Spurious interrupt detected: last irq: 27 on CPU3 >=20 > 27 happened the most by far. 106 was fairly rare. I'd not noticed any = other > figures. =46rom what I saw all were "gic0". >=20 > sh had a few signal 11's and one signal 4 as of when I had last = checked: >=20 >> pid 13900 (sh), uid 0: exited on signal 11 (core dumped) >> pid 19325 (sh), uid 0: exited on signal 11 (core dumped) >> pid 49697 (sh), uid 0: exited on signal 11 (core dumped) >> pid 68390 (sh), uid 0: exited on signal 4 (core dumped) >> pid 81149 (sh), uid 0: exited on signal 11 (core dumped) >=20 > I did not notice any other core dumps. >=20 > And the following happened once that I noticed: >=20 >> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 16 a3 a4 80 00 00 40 00=20= >> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an = error >> (da0:umass-sim0:0:0:0): Retrying command >=20 > The root filesystem was on a USB SSD. >=20 > The above was all from a ssh session history. The below is from the = serial > console. . . >=20 > Later it got the crash: >=20 >> panic: ARM64TODO: reclaim_pv_chunk >> cpuid =3D 2 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self_wrapper+0x28 >> pc =3D 0xffff00000068b430 lr =3D 0xffff0000000631dc >> sp =3D 0xffff000083758080 fp =3D 0xffff000083758290 >>=20 >> db_trace_self_wrapper() at vpanic+0x170 >> pc =3D 0xffff0000000631dc lr =3D 0xffff000000335f10 >> sp =3D 0xffff0000837582a0 fp =3D 0xffff000083758320 >>=20 >> vpanic() at panic+0x4c >> pc =3D 0xffff000000335f10 lr =3D 0xffff000000335d9c >> sp =3D 0xffff000083758330 fp =3D 0xffff0000837583b0 >>=20 >> panic() at reclaim_pv_chunk+0x10 >> pc =3D 0xffff000000335d9c lr =3D 0xffff0000006a13d4 >> sp =3D 0xffff0000837583c0 fp =3D 0xffff0000837583c0 >>=20 >> reclaim_pv_chunk() at get_pv_entry+0x2bc >> pc =3D 0xffff0000006a13d4 lr =3D 0xffff00000069c270 >> sp =3D 0xffff0000837583d0 fp =3D 0xffff000083758400 >>=20 >> get_pv_entry() at pmap_enter+0x68c >> pc =3D 0xffff00000069c270 lr =3D 0xffff00000069b41c >> sp =3D 0xffff000083758410 fp =3D 0xffff0000837584b0 >>=20 >> pmap_enter() at vm_fault_hold+0x2f0 >> pc =3D 0xffff00000069b41c lr =3D 0xffff000000641eb8 >> sp =3D 0xffff0000837584c0 fp =3D 0xffff000083758600 >>=20 >> vm_fault_hold() at vm_fault_quick_hold_pages+0x120 >> pc =3D 0xffff000000641eb8 lr =3D 0xffff000000645004 >> sp =3D 0xffff000083758610 fp =3D 0xffff000083758670 >>=20 >> vm_fault_quick_hold_pages() at vn_io_fault1+0x250 >> pc =3D 0xffff000000645004 lr =3D 0xffff00000042b788 >> sp =3D 0xffff000083758680 fp =3D 0xffff0000837587c0 >>=20 >> vn_io_fault1() at vn_io_fault+0x170 >> pc =3D 0xffff00000042b788 lr =3D 0xffff0000004297a4 >> sp =3D 0xffff0000837587d0 fp =3D 0xffff000083758840 >>=20 >> vn_io_fault() at dofilewrite+0xbc >> pc =3D 0xffff0000004297a4 lr =3D 0xffff0000003a35e4 >> sp =3D 0xffff000083758850 fp =3D 0xffff000083758890 >>=20 >> dofilewrite() at kern_writev+0x6c >> pc =3D 0xffff0000003a35e4 lr =3D 0xffff0000003a32d4 >> sp =3D 0xffff0000837588a0 fp =3D 0xffff0000837588e0 >>=20 >> kern_writev() at sys_write+0x7c >> pc =3D 0xffff0000003a32d4 lr =3D 0xffff0000003a3258 >> sp =3D 0xffff0000837588f0 fp =3D 0xffff000083758930 >>=20 >> sys_write() at do_el0_sync+0x6fc >> pc =3D 0xffff0000003a3258 lr =3D 0xffff0000006a2778 >> sp =3D 0xffff000083758940 fp =3D 0xffff000083758a70 >>=20 >> do_el0_sync() at handle_el0_sync+0x64 >> pc =3D 0xffff0000006a2778 lr =3D 0xffff00000068d1d0 >> sp =3D 0xffff000083758a80 fp =3D 0xffff000083758b90 >>=20 >> handle_el0_sync() at 0x696ff0 >> pc =3D 0xffff00000068d1d0 lr =3D 0x0000000000696ff0 >> sp =3D 0xffff000083758ba0 fp =3D 0x0000ffffffffc610 >>=20 >> KDB: enter: panic >> [ thread pid 850 tid 100149 ] >> Stopped at kdb_enter+0x40: undefined d4200000 >=20 >=20 > Side notes: >=20 > To do the lang/gcc6 test I adjusted lang/gcc6/Makefile to (hopefully) = allow > an aarch64 build attempt: >=20 >> # svnlite diff /usr/ports/lang/gcc6 >> Index: /usr/ports/lang/gcc6/Makefile >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- /usr/ports/lang/gcc6/Makefile (revision 424540) >> +++ /usr/ports/lang/gcc6/Makefile (working copy) >> @@ -35,7 +35,7 @@ >> DISTVERSION=3D ${PORTVERSION:C/([0-9]+).*\.s([0-9]+)/\1-\2/} >> GCC_VERSION=3D ${PORTVERSION:C/(.+)\.s[0-9]{8}/\1/} >> SUFFIX=3D ${PORTVERSION:C/([0-9]+).*/\1/} >> -ONLY_FOR_ARCHS=3D amd64 i386 powerpc powerpc64 sparc64 arm armv6 = armv6hf >> +ONLY_FOR_ARCHS=3D amd64 i386 powerpc powerpc64 sparc64 arm armv6 = armv6hf aarch64 >> USES=3D compiler cpe gmake iconv libtool makeinfo perl5 = tar:bzip2 >> USE_BINUTILS=3D yes >> USE_PERL5=3D build >> @@ -73,7 +73,7 @@ >> USE_GCC=3D 4.9+ >> .endif >>=20 >> -.if ${ARCH} =3D=3D "armv6" || ${ARCH} =3D=3D "armv6hf" >> +.if ${ARCH} =3D=3D "armv6" || ${ARCH} =3D=3D "armv6hf" || ${ARCH} =3D=3D= "aarch64" >> . if ${COMPILER_TYPE} =3D=3D clang >> . if empty(PORT_OPTIONS:MBOOTSTRAP) >> MAKE_ARGS+=3DCXXFLAGS=3D-fbracket-depth=3D512 >=20 > It did not make it to lang/gcc6 itself but did build several things. >=20 > (I did not care if lang/gcc6 really worked or not: Environment = stability > test.) >=20 > =3D=3D=3D > Mark Millard > markmi at dsl-only.net There was fsck activity after cutting and applying power but luckily the = SSD content (largely) survived. The compressed message log shows the following as an example of when the = "gic0" Spurious interrupts start: > Nov 7 18:54:28 pine64 kernel: ugen3.2: at usbus3 > Nov 7 18:54:28 pine64 kernel: umass0 on uhub3 > Nov 7 18:54:28 pine64 kernel: umass0: on usbus3 > Nov 7 18:54:28 pine64 kernel: umass0: SCSI over Bulk-Only; quirks =3D = 0x0100 > Nov 7 18:54:28 pine64 kernel: umass0:0:0: Attached to scbus0 > Nov 7 18:54:28 pine64 kernel: mountroot: waiting for device = /dev/ufs/PINE642Grootfs... > Nov 7 18:54:28 pine64 kernel: da0 at umass-sim0 bus 0 scbus0 target 0 = lun 0 > Nov 7 18:54:28 pine64 kernel: da0: Fixed = Direct Access SPC-4 SCSI device > Nov 7 18:54:28 pine64 kernel: da0: Serial Number > Nov 7 18:54:28 pine64 kernel: da0: 40.000MB/s transfers > Nov 7 18:54:28 pine64 kernel: da0: 228936MB (468862128 512 byte = sectors) > Nov 7 18:54:28 pine64 kernel: da0: quirks=3D0x2 > Nov 7 18:54:28 pine64 kernel: awg0: link state changed to DOWN > Nov 7 18:54:28 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU2 > Nov 7 18:54:28 pine64 kernel: awg0: link state changed to UP > Nov 7 18:54:28 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU3 > Nov 7 18:54:31 pine64 kernel: . > Nov 7 18:54:31 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU2 > Nov 7 18:54:31 pine64 ntpd[602]: ntpd 4.2.8p8-a (1): Starting > Nov 7 18:54:31 pine64 ntpd[603]: leapsecond file = ('/var/db/ntpd.leap-seconds.list'): good hash signature > Nov 7 18:54:31 pine64 ntpd[603]: leapsecond file = ('/var/db/ntpd.leap-seconds.list'): loaded, expire=3D2017-06-01T00:00:00Z = last=3D2017-01-01T00:00:00Z ofs=3D37 > Nov 7 19:36:17 pine64 kernel: gic0: gic0: Spurious interrupt = detected: last irq: 27 on CPU1 > Nov 7 19:36:17 pine64 kernel: Spurious interrupt detected: last irq: = 106 on CPU3 > Nov 7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU1 > Nov 7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU2 > Nov 7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU0 > Nov 7 19:36:17 pine64 last message repeated 2 times > Nov 7 19:36:17 pine64 kernel: gic0: gic0: Spurious interrupt = detected: last irq: 27 on CPU0 > Nov 7 19:36:17 pine64 kernel: Spurious interrupt detected: last irq: = 27 on CPU3 Always shortly after awg0's link state changed to down after the da0 = information. In looking at the logs I've seen a 32 and some 114's and 92's as well as = the 27's and 106's. There were a few brief periods over which non-27's were fairly frequent. There are a few of the CAM status message blocks, about 7 over = everything recorded, including a few prior boots of the Pine64. An interesting point is that somewhat before and while the lang/gcc6 = related build activity was active the "gic0" notices stopped: > Nov 8 02:44:40 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 27 on CPU3 > Nov 8 03:01:15 pine64 kernel: pid 13900 (sh), uid 0: exited on signal = 11 (core dumped) > Nov 8 03:01:39 pine64 kernel: pid 19325 (sh), uid 0: exited on signal = 11 (core dumped) > Nov 8 03:02:02 pine64 pkg-static: indexinfo-0.2.5 installed > Nov 8 03:02:04 pine64 pkg-static: gettext-runtime-0.19.8.1 installed > Nov 8 03:04:42 pine64 kernel: pid 49697 (sh), uid 0: exited on signal = 11 (core dumped) > Nov 8 03:05:26 pine64 kernel: pid 68390 (sh), uid 0: exited on signal = 4 (core dumped) > Nov 8 03:06:10 pine64 kernel: pid 81149 (sh), uid 0: exited on signal = 11 (core dumped) > Nov 8 03:08:36 pine64 pkg-static: gettext-tools-0.19.8.1 installed > Nov 8 03:09:07 pine64 pkg-static: gmake-4.2.1_1 installed > Nov 8 03:27:01 pine64 pkg-static: perl5-5.24.1.r4 installed > Nov 8 03:27:33 pine64 pkg-static: p5-Locale-gettext-1.06 installed > Nov 8 03:27:42 pine64 pkg-static: help2man-1.43.3_1 installed > Nov 8 03:30:22 pine64 pkg-static: texinfo-6.1.20160425,1 installed > Nov 8 03:31:59 pine64 pkg-static: m4-1.4.17_1,1 installed > Nov 8 03:34:04 pine64 pkg-static: bison-2.7.1,1 installed > Nov 8 03:37:44 pine64 pkg-static: gmp-5.1.3_3 installed > Nov 8 03:39:14 pine64 pkg-static: mpfr-3.1.5 installed > Nov 8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: = 2a 00 16 bd 19 c0 00 00 80 00=20 > Nov 8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): CAM status: CCB = request completed with an error > Nov 8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): Retrying = command > Nov 8 03:44:55 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 106 on CPU3 > Nov 8 03:44:55 pine64 kernel: gic0: Spurious interrupt detected: last = irq: 106 on CPU3 Somewhat after the mpfr installation above the "gic0" notices return. devel/binutils started to build but did not complete. =3D=3D=3D Mark Millard markmi at dsl-only.net