From owner-freebsd-current@freebsd.org Tue Dec 12 11:10:44 2017 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B460E946C0 for ; Tue, 12 Dec 2017 11:10:44 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-169.reflexion.net [208.70.210.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5C09876985 for ; Tue, 12 Dec 2017 11:10:43 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 2325 invoked from network); 12 Dec 2017 11:10:36 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 12 Dec 2017 11:10:36 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.3) with SMTP; Tue, 12 Dec 2017 06:10:36 -0500 (EST) Received: (qmail 11924 invoked from network); 12 Dec 2017 11:10:36 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 12 Dec 2017 11:10:36 -0000 Received: from [192.168.1.25] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id D0258EC8B7B; Tue, 12 Dec 2017 03:10:35 -0800 (PST) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot Message-Id: <32649011-1CFA-4215-BB37-00E4493882CD@dsl-only.net> Date: Tue, 12 Dec 2017 03:10:34 -0800 To: jeff@FreeBSD.org, Freebsd-arm , FreeBSD Hackers , FreeBSD Current X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Dec 2017 11:10:44 -0000 I initially jumped from -r326192 to -r326726 and ended up with a rpi2 that would normally hang somewhere around release APs being displayed. (I have had a couple of completed boots but many dozens of hung-up attempts.) Both a debug kernel and a non-debug kernel hang the same way. Bisecting the kernel (holding world -r326726 constant) showed: -r326346 did not hang (nor did before) -r326347 and later hung. Conclusion: -r326347 changes are involved in the boot hangups. Context: head -r326726 based (from before I did the bisect). My knowledge is limited so I may not have picked optimal material to get from the db> prompt. It appears that the messages about "random" occur during the hang (indefinite wait). As I remember, I have seen examples where the "Trying to mount" did not show up --but normally it did. And example from a hung boot: . . . Release APs Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]... random: unblocking device. arc4random: no preloaded entropy cache arc4random: no preloaded entropy cache arc4random: no preloaded entropy cache The hang never seems to time out but just sits there, even for hours. It allows the ~^B sequence to get to the db> prompt. I've looked around a little a couple of times. One common point is that show allchains has everything listed as sleeping except: chain 32: thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK thread 100077 (pid 23, uma) is on a run queue chain 33: (Note: uma's thread number has varied, as has the one for [pagedaemon].) bd> ps pid ppid pgrp uid state wmesg wchan cmd 28 0 0 0 DL syncer 0xc095a31c [syncer] 27 0 0 0 DL vlruwt 0xd7417730 [vnlru] 26 0 0 0 DL psleep 0xc0959c00 [bufdaemon] 25 0 0 0 RL [bufspacedaemon] 24 0 0 0 DL psleep 0xc095e8f8 [vmdaemon] 23 0 0 0 RL (threaded) [pagedaemon] 100065 RunQ [pagedaemon] 100076 D launds 0xc095e804 [laundry: dom0] 100077 RunQ [uma] . . . 1 0 0 0 DL umadrai 0xc095e528 [kernel] 0 0 0 0 RLs (threaded) [kernel] 100000 D swapin 0xc09673c8 [swapper] . . . 100072 D - 0xd75b7e80 [if_io_tqg_1] 100073 RunQ [if_io_tqg_2] 100074 D - 0xd75b7d80 [if_io_tqg_3] 100075 D - 0xd75b7d00 [if_config_tqg_0] 100078 D - 0xd83dc100 [softirq_0] 100079 D - 0xd75b7c00 [softirq_1] 100080 RunQ [softirq_2] 100081 D - 0xd75b7b00 [softirq_3] (Which if_io_tqg_ and softirq_ pair has RunQ varies.) All RunQ's are shown above. One or two [idle: CPU]'s have state CanRun and the other [idle: CPU]'s have state RUN. They are the only items with those states. Example from the same hangup: 10 0 0 0 RL (threaded) [idle] 100002 Run CPU 0 [idle: cpu0] 100003 Run CPU 1 [idle: cpu1] 100004 CanRun [idle: cpu2] 100005 CanRun [idle: cpu3] db> show lock 0xc095e528 class: sx name: umadrain state: XLOCK: 0xd6cbe740 (tid 100077, pid 23, "uma") waiters: shared db> show thread 100001 Thread 100001 at 0xd40a7000: proc (pid 1): 0xd40a3000 name: kernel stack: 0xd40ac000-0xd40adfff flags: 0x4 pflags: 0x20000000 state: INHIBITED: {SLEEPING} wmesg: umadrain wchan: 0xc095e528 sleeptimo 0. 0 (curr 51d. = 5eac6a0400000000) priority: 84 container lock: sleepq chain (0xc0957244) last voluntary switch: 1297717 ms ago last involuntary switch: 1297809 ms ago db> show thread 100077 Thread 100077 at 0xd6cbe740: proc (pid 23): 0xd6cab000 name: uma stack: 0xd83ca000-0xd83cbfff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 84 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297815 ms ago last involuntary switch: 1297815 ms ago db> show thread 100073 Thread 100073 at 0xd7406740: proc (pid 0): 0xc09673c8 name: if_io_tqg_2 stack: 0xd742a000-0xd742bfff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 24 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297818 ms ago db> show thread 100080 Thread 100080 at 0xd7431ae0: proc (pid 0): 0xc09673c8 name: softirq_2 stack: 0xd83f5000-0xd83f6fff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 24 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297816 ms ago db> show lock 0xc0952640 class: spin mutex name: sched lock 2 flags: {SPIN, RECURSE} state: {UNOWNED} db> show lock 0xc0957244 class: spin mutex name: sleepq chain flags: {SPIN, RECURSE} state: {UNOWNED} db> show thread 100065 Thread 100065 at 0xd6cbb000: proc (pid 23): 0xd6cab000 name: pagedaemon stack: 0xd7403000-0xd7404fff flags: 0x14 pflags: 0x20200000 state: RUNQ priority: 84 container lock: sched lock 1 (0xc0951f80) last voluntary switch: 1029 ms ago last involuntary switch: 28606 ms ago db> show lock 0xc0951f80 class: spin mutex name: sched lock 1 flags: {SPIN, RECURSE} state: {UNOWNED} =3D=3D=3D Mark Millard markmi at dsl-only.net