From owner-freebsd-hackers@freebsd.org Wed Dec 13 01:46:03 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3CFF2E8CB7F for ; Wed, 13 Dec 2017 01:46:03 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-133.reflexion.net [208.70.210.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EEC1E7CAA0 for ; Wed, 13 Dec 2017 01:46:02 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 31472 invoked from network); 13 Dec 2017 01:46:00 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 13 Dec 2017 01:46:00 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.3) with SMTP; Tue, 12 Dec 2017 20:46:00 -0500 (EST) Received: (qmail 1238 invoked from network); 13 Dec 2017 01:46:00 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 13 Dec 2017 01:46:00 -0000 Received: from [192.168.1.25] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id CD280EC86EF; Tue, 12 Dec 2017 17:45:59 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot [backtraces added] From: Mark Millard In-Reply-To: Date: Tue, 12 Dec 2017 17:45:59 -0800 Cc: FreeBSD Hackers , Freebsd-arm , FreeBSD Current Content-Transfer-Encoding: quoted-printable Message-Id: <21626363-B6F9-44CE-82C2-91BF0A9F5E4F@dsl-only.net> References: <32649011-1CFA-4215-BB37-00E4493882CD@dsl-only.net> <4b2420e8899.5992ee2b@mail.schwarzes.net> <241F43AF-23B2-4C54-9B77-A5A7CE3F8E57@dsl-only.net> To: jeff@FreeBSD.org X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Dec 2017 01:46:03 -0000 [Just adding back traces. ffs_mount use of uma_zcreate is involved in the kernel thread, as is uma_reclaim_worker's use of uma_reclaim_locked in the uma thread.] On 2017-Dec-12, at 4:20 PM, Mark Millard wrote: > On 2017-Dec-12, at 3:19 PM, Mark Millard = wrote: >=20 >> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz wrote: >>=20 >>> On 12.12.17, Mark Millard wrote: >>>=20 >>>> I initially jumped from -r326192 to -r326726 and >>>> ended up with a rpi2 that would normally hang >>>> somewhere around release APs being displayed. >>>> (I have had a couple of completed boots but many >>>> dozens of hung-up attempts.) Both a debug kernel >>>> and a non-debug kernel hang the same way. >>>>=20 >>>> Bisecting the kernel (holding world -r326726 >>>> constant) showed: >>>>=20 >>>> -r326346 did not hang (nor did before) >>>> -r326347 and later hung. >>>=20 >>> JFYI, the latest kernel (and world) running at one of my=20 >>> RPI2-B is r326631, without any issues. >>=20 >> Interesting. (By the way: My context >> is with a V1.1 Cortex-A7 based rpi2, >> not V1.2 and Cortex-A53.) >>=20 >> I've almost always run the root file >> system being on a USB SSD instead of >> on mmcsd0 . I wonder if that is >> somehow involved since it may be >> unusual. UFS file system. >>=20 >> The USB SSD is on a powered hub that >> is in turn plugged into the rpi2. >>=20 >> [I had the hang problem before the >> following and after.] >>=20 >> The mechanism for holding mmcsd0 in >> failed recently but the ejection >> mechanism still works. So I hold >> in mmcsd0 until after I get a USB >> SSD boot now. (Interrupt boot, unload, >> boot/autoboot, picks up the kernel >> from the USB SSD.) >>=20 >> This means that I effectively can >> not avoid the USB SSD any more >> unless I get my hands on a different >> V1.1 rpi2. >=20 > Looks like I'll get my hands on a different > rpi2 V1.1 in a few days. So I should then > be able to do reasonable mmcsd0-only > experiments. At least once I find the time. >=20 > FYI, in case boot details are involved > in reproducing the problem. . . >=20 > On the mmcsd0 I have /boot/loader.conf with: >=20 > kern.cam.boot_delay=3D"10000" > vfs.mountroot.timeout=3D"10" >=20 > and /etc/fstab with: >=20 > /dev/ufs/RPI2rootfs / ufs rw,noatime 1 1 > /dev/label/RPI2Aswap none swap sw 0 0 > /dev/label/RPI2Aboot /boot/msdos msdosfs rw,noatime 0 0 >=20 > where the /dev/ufs/RPI2rootfs was the USB SSD. >=20 > However, I interrupt the loader and unload and > then boot or autoboot. (But the hangs started > before this extra sequence was involved.) >=20 > On the USB SSD I have /boot/loader.conf with: >=20 > kern.cam.boot_delay=3D"10000" > vfs.mountroot.timeout=3D"10" >=20 > and /etc/fstab with: >=20 > /dev/da0p1 / ufs rw,noatime 1 1 > /dev/da0p2 none swap sw 0 0 >=20 >=20 > What db> showed does point to things that > -r326347 involve: >=20 > chain 32: > thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK > thread 100077 (pid 23, uma) is on a run queue >=20 > But for all I know -r326347 could be depending > on something required to be true but not correct > elsewhere in the rpi2 support. I'm not claiming > that -r326347 is wrong, just that it is involved. > I've way to little knowledge to claim to know > what is wrong on the evidence that I have. >=20 > I've not yet tried a bpi-m3 Cortex-A7 context or > a pine64+ 2GB or rpi3 Cortex-A53 context. Nor > powerpc64 nor powerpc. At some point I'll get the > time for one or more of these. I've not had amd64 > problems in this area. >=20 > I may not be able to test the bpi-m3: its barrel > connector for power seems flaky and it is > difficult to keep the board powered for long > periods in recent times. For this new example -r326347 kernel based boot-hang: chain 35: thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK thread 100073 (pid 23, uma) is on a run queue db> bt Tracing pid 1 tid 100001 td 0xd429f000 cpu_switch() at cpu_switch+0x18 pc =3D 0xc0584f80 lr =3D 0xc0299ad8 (sched_switch+0x5c0) sp =3D 0xd42a56b8 fp =3D 0xd42a56e8 sched_switch() at sched_switch+0x5c0 pc =3D 0xc0299ad8 lr =3D 0xc0275f28 (mi_switch+0x258) sp =3D 0xd42a56f0 fp =3D 0xd42a5718 r4 =3D 0x0001f0f5 r5 =3D 0x00000000 r6 =3D 0xd429f000 r7 =3D 0x0006612d r8 =3D 0x00000000 r9 =3D 0x00000104 r10 =3D 0xc08073f0 mi_switch() at mi_switch+0x258 pc =3D 0xc0275f28 lr =3D 0xc02c1508 (sleepq_switch+0x1c0) sp =3D 0xd42a5720 fp =3D 0xd42a5748 r4 =3D 0xd429f000 r5 =3D 0xc0947dc4 r6 =3D 0xc09a9c68 r7 =3D 0xc0947dc0 r8 =3D 0x00000000 r9 =3D 0xc09440c0 r10 =3D 0x000000f4 sleepq_switch() at sleepq_switch+0x1c0 pc =3D 0xc02c1508 lr =3D 0xc02c1308 (sleepq_wait+0x48) sp =3D 0xd42a5750 fp =3D 0xd42a5760 r4 =3D 0xd429f000 r5 =3D 0x00000000 r6 =3D 0xc09a9c68 r7 =3D 0xc06b415c r8 =3D 0x00000001 r9 =3D 0xc09a9c78 r10 =3D 0x00000000 sleepq_wait() at sleepq_wait+0x48 pc =3D 0xc02c1308 lr =3D 0xc02748b0 (_sx_slock_hard+0x298) sp =3D 0xd42a5768 fp =3D 0xd42a57c8 r4 =3D 0xc09a9c68 r5 =3D 0xc0823ac0 r6 =3D 0xc06ae146 r7 =3D 0x00000000 _sx_slock_hard() at _sx_slock_hard+0x298 pc =3D 0xc02748b0 lr =3D 0xc027457c (_sx_slock_int+0x140) sp =3D 0xd42a57d0 fp =3D 0xd42a57f8 r4 =3D 0x00000078 r5 =3D 0x00000765 r6 =3D 0xc09a9c68 r7 =3D 0x00000765 r8 =3D 0xc09a9c78 r9 =3D 0xd7ad8740 r10 =3D 0x00000000 _sx_slock_int() at _sx_slock_int+0x140 pc =3D 0xc027457c lr =3D 0xc052b024 (uma_zcreate+0x10c) sp =3D 0xd42a5800 fp =3D 0xd42a5850 r4 =3D 0x00000078 r5 =3D 0xc09a9c68 r6 =3D 0xc06df86e r7 =3D 0xc06dd153 r8 =3D 0x00000000 r9 =3D 0x00000000 r10 =3D 0x00000000 uma_zcreate() at uma_zcreate+0x10c pc =3D 0xc052b024 lr =3D 0xc050cc90 (ffs_mount+0x80) sp =3D 0xd42a5858 fp =3D 0xd42a5980 r4 =3D 0xd7844000 r5 =3D 0x00000000 r6 =3D 0x00000003 r7 =3D 0xc09a99c0 r8 =3D 0x00000000 r9 =3D 0xd429f000 r10 =3D 0xd828c400 ffs_mount() at ffs_mount+0x80 pc =3D 0xc050cc90 lr =3D 0xc032e1d4 (vfs_donmount+0xeec) sp =3D 0xd42a5988 fp =3D 0xd42a5b38 r4 =3D 0xffffffff r5 =3D 0xd74ec120 r6 =3D 0xd429f000 r7 =3D 0x00000000 r8 =3D 0x00000000 r9 =3D 0xc425acd0 r10 =3D 0xd828c400 vfs_donmount() at vfs_donmount+0xeec pc =3D 0xc032e1d4 lr =3D 0xc0330ea8 (kernel_mount+0x70) sp =3D 0xd42a5b40 fp =3D 0xd42a5b78 r4 =3D 0xd82a9000 r5 =3D 0x00000000 r6 =3D 0x00004000 r7 =3D 0x00000000 r8 =3D 0xd87e2050 r9 =3D 0xd87e2040 r10 =3D 0x00000000 kernel_mount() at kernel_mount+0x70 pc =3D 0xc0330ea8 lr =3D 0xc03335dc (parse_mount+0x458) sp =3D 0xd42a5b80 fp =3D 0xd42a5c60 r4 =3D 0xd4263500 r5 =3D 0xd87e2054 r6 =3D 0xd82a9000 r7 =3D 0x00000000 parse_mount() at parse_mount+0x458 pc =3D 0xc03335dc lr =3D 0xc0331a44 ($a.2+0x28) sp =3D 0xd42a5c68 fp =3D 0xd42a5dc8 r4 =3D 0xd7845018 r5 =3D 0x00000000 r6 =3D 0xd7845018 r7 =3D 0xd78404da r8 =3D 0xc06bfbd4 r9 =3D 0xfffffff7 r10 =3D 0xd7836380 $a.2() at $a.2+0x28 pc =3D 0xc0331a44 lr =3D 0xc020a82c (start_init+0x5c) sp =3D 0xd42a5dd0 fp =3D 0xd42a5e20 r4 =3D 0xc06a370c r5 =3D 0xc0823ad0 r6 =3D 0x00000000 r7 =3D 0x00000000 r8 =3D 0xd42a5e48 r9 =3D 0x00000000 r10 =3D 0xd429b000 start_init() at start_init+0x5c pc =3D 0xc020a82c lr =3D 0xc0230f10 (fork_exit+0xa0) sp =3D 0xd42a5e28 fp =3D 0xd42a5e40 r4 =3D 0xd429f000 r5 =3D 0xd429b000 r6 =3D 0xc020a7d0 r7 =3D 0x00000000 r8 =3D 0xd42a5e48 r9 =3D 0x00000000 r10 =3D 0x00000000 fork_exit() at fork_exit+0xa0 pc =3D 0xc0230f10 lr =3D 0xc0564c8c (swi_exit) sp =3D 0xd42a5e48 fp =3D 0x00000000 r4 =3D 0xc020a7d0 r5 =3D 0x00000000 r6 =3D 0x00000000 r7 =3D 0x00000000 r8 =3D 0x00000000 r10 =3D 0x00000000 swi_exit() at swi_exit pc =3D 0xc0564c8c lr =3D 0xc0564c8c (swi_exit) sp =3D 0xd42a5e48 fp =3D 0x00000000 db> bt Tracing pid 23 tid 100073 td 0xd7ad8740 cpu_switch() at cpu_switch+0x18 pc =3D 0xc0584f80 lr =3D 0xc0299ad8 (sched_switch+0x5c0) sp =3D 0xd87dad58 fp =3D 0xd87dad88 sched_switch() at sched_switch+0x5c0 pc =3D 0xc0299ad8 lr =3D 0xc0275f28 (mi_switch+0x258) sp =3D 0xd87dad90 fp =3D 0xd87dadb8 r4 =3D 0x00000025 r5 =3D 0x00000000 r6 =3D 0xd7ad8740 r7 =3D 0x000240ea r8 =3D 0x00000000 r9 =3D 0x00000100 r10 =3D 0xc08073f0 mi_switch() at mi_switch+0x258 pc =3D 0xc0275f28 lr =3D 0xc052d9f8 (uma_reclaim_locked+0x200) sp =3D 0xd87dadc0 fp =3D 0xd87dade8 r4 =3D 0x00000003 r5 =3D 0xc08073f0 r6 =3D 0xc08073f0 r7 =3D 0x00000000 r8 =3D 0x00000000 r9 =3D 0xc0824310 r10 =3D 0xc06df86e uma_reclaim_locked() at uma_reclaim_locked+0x200 pc =3D 0xc052d9f8 lr =3D 0xc052de70 (uma_reclaim_worker+0x4c) sp =3D 0xd87dadf0 fp =3D 0xd87dae20 r4 =3D 0xc09a9c68 r5 =3D 0xc4242d80 r6 =3D 0xc06df86e r7 =3D 0x00000000 r8 =3D 0x00000100 r9 =3D 0xc09b3d08 r10 =3D 0xc09a9c7c uma_reclaim_worker() at uma_reclaim_worker+0x4c pc =3D 0xc052de70 lr =3D 0xc0230f10 (fork_exit+0xa0) sp =3D 0xd87dae28 fp =3D 0xd87dae40 r4 =3D 0xd7ad8740 r5 =3D 0xd70ce000 r6 =3D 0xc052de24 r7 =3D 0x00000000 r8 =3D 0xd87dae48 r9 =3D 0xd7ada3a0 r10 =3D 0xc0824e8c fork_exit() at fork_exit+0xa0 pc =3D 0xc0230f10 lr =3D 0xc0564c8c (swi_exit) sp =3D 0xd87dae48 fp =3D 0x00000000 r4 =3D 0xc052de24 r5 =3D 0x00000000 r6 =3D 0x7ff6d83f r7 =3D 0xd6f583a0 r8 =3D 0xc0824dcc r10 =3D 0xc0824e8c swi_exit() at swi_exit pc =3D 0xc0564c8c lr =3D 0xc0564c8c (swi_exit) sp =3D 0xd87dae48 fp =3D 0x00000000 db> show thread 100001 Thread 100001 at 0xd429f000: proc (pid 1): 0xd429b000 name: kernel stack: 0xd42a4000-0xd42a5fff flags: 0x4 pflags: 0x20000000 state: INHIBITED: {SLEEPING} wmesg: umadrain wchan: 0xc09a9c68 sleeptimo 0. 0 (curr 26. = e3fd787f00000000) priority: 84 container lock: sleepq chain (0xc0947dc4) last voluntary switch: 26523 ms ago last involuntary switch: 26695 ms ago db> show thread 100073 Thread 100073 at 0xd7ad8740: proc (pid 23): 0xd70ce000 name: uma stack: 0xd87d9000-0xd87dafff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 84 container lock: sched lock 3 (0xc0942ec0) last voluntary switch: 26694 ms ago db> show lock 0xc0942ec0 class: spin mutex name: sched lock 3 flags: {SPIN, RECURSE} state: {UNOWNED} =3D=3D=3D Mark Millard markmi at dsl-only.net