Date: Tue, 12 Dec 2017 03:10:34 -0800 From: Mark Millard <markmi@dsl-only.net> To: jeff@FreeBSD.org, Freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org> Subject: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot Message-ID: <32649011-1CFA-4215-BB37-00E4493882CD@dsl-only.net>
next in thread | raw e-mail | index | archive | help
I initially jumped from -r326192 to -r326726 and ended up with a rpi2 that would normally hang somewhere around release APs being displayed. (I have had a couple of completed boots but many dozens of hung-up attempts.) Both a debug kernel and a non-debug kernel hang the same way. Bisecting the kernel (holding world -r326726 constant) showed: -r326346 did not hang (nor did before) -r326347 and later hung. Conclusion: -r326347 changes are involved in the boot hangups. Context: head -r326726 based (from before I did the bisect). My knowledge is limited so I may not have picked optimal material to get from the db> prompt. It appears that the messages about "random" occur during the hang (indefinite wait). As I remember, I have seen examples where the "Trying to mount" did not show up --but normally it did. And example from a hung boot: . . . Release APs Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]... random: unblocking device. arc4random: no preloaded entropy cache arc4random: no preloaded entropy cache arc4random: no preloaded entropy cache The hang never seems to time out but just sits there, even for hours. It allows the <CR>~^B sequence to get to the db> prompt. I've looked around a little a couple of times. One common point is that show allchains has everything listed as sleeping except: chain 32: thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK thread 100077 (pid 23, uma) is on a run queue chain 33: (Note: uma's thread number has varied, as has the one for [pagedaemon].) bd> ps pid ppid pgrp uid state wmesg wchan cmd 28 0 0 0 DL syncer 0xc095a31c [syncer] 27 0 0 0 DL vlruwt 0xd7417730 [vnlru] 26 0 0 0 DL psleep 0xc0959c00 [bufdaemon] 25 0 0 0 RL [bufspacedaemon] 24 0 0 0 DL psleep 0xc095e8f8 [vmdaemon] 23 0 0 0 RL (threaded) [pagedaemon] 100065 RunQ [pagedaemon] 100076 D launds 0xc095e804 [laundry: dom0] 100077 RunQ [uma] . . . 1 0 0 0 DL umadrai 0xc095e528 [kernel] 0 0 0 0 RLs (threaded) [kernel] 100000 D swapin 0xc09673c8 [swapper] . . . 100072 D - 0xd75b7e80 [if_io_tqg_1] 100073 RunQ [if_io_tqg_2] 100074 D - 0xd75b7d80 [if_io_tqg_3] 100075 D - 0xd75b7d00 [if_config_tqg_0] 100078 D - 0xd83dc100 [softirq_0] 100079 D - 0xd75b7c00 [softirq_1] 100080 RunQ [softirq_2] 100081 D - 0xd75b7b00 [softirq_3] (Which if_io_tqg_<?> and softirq_<?> pair has RunQ varies.) All RunQ's are shown above. One or two [idle: CPU<?>]'s have state CanRun and the other [idle: CPU<?>]'s have state RUN. They are the only items with those states. Example from the same hangup: 10 0 0 0 RL (threaded) [idle] 100002 Run CPU 0 [idle: cpu0] 100003 Run CPU 1 [idle: cpu1] 100004 CanRun [idle: cpu2] 100005 CanRun [idle: cpu3] db> show lock 0xc095e528 class: sx name: umadrain state: XLOCK: 0xd6cbe740 (tid 100077, pid 23, "uma") waiters: shared db> show thread 100001 Thread 100001 at 0xd40a7000: proc (pid 1): 0xd40a3000 name: kernel stack: 0xd40ac000-0xd40adfff flags: 0x4 pflags: 0x20000000 state: INHIBITED: {SLEEPING} wmesg: umadrain wchan: 0xc095e528 sleeptimo 0. 0 (curr 51d. = 5eac6a0400000000) priority: 84 container lock: sleepq chain (0xc0957244) last voluntary switch: 1297717 ms ago last involuntary switch: 1297809 ms ago db> show thread 100077 Thread 100077 at 0xd6cbe740: proc (pid 23): 0xd6cab000 name: uma stack: 0xd83ca000-0xd83cbfff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 84 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297815 ms ago last involuntary switch: 1297815 ms ago db> show thread 100073 Thread 100073 at 0xd7406740: proc (pid 0): 0xc09673c8 name: if_io_tqg_2 stack: 0xd742a000-0xd742bfff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 24 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297818 ms ago db> show thread 100080 Thread 100080 at 0xd7431ae0: proc (pid 0): 0xc09673c8 name: softirq_2 stack: 0xd83f5000-0xd83f6fff flags: 0x4 pflags: 0x200000 state: RUNQ priority: 24 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297816 ms ago db> show lock 0xc0952640 class: spin mutex name: sched lock 2 flags: {SPIN, RECURSE} state: {UNOWNED} db> show lock 0xc0957244 class: spin mutex name: sleepq chain flags: {SPIN, RECURSE} state: {UNOWNED} db> show thread 100065 Thread 100065 at 0xd6cbb000: proc (pid 23): 0xd6cab000 name: pagedaemon stack: 0xd7403000-0xd7404fff flags: 0x14 pflags: 0x20200000 state: RUNQ priority: 84 container lock: sched lock 1 (0xc0951f80) last voluntary switch: 1029 ms ago last involuntary switch: 28606 ms ago db> show lock 0xc0951f80 class: spin mutex name: sched lock 1 flags: {SPIN, RECURSE} state: {UNOWNED} =3D=3D=3D Mark Millard markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?32649011-1CFA-4215-BB37-00E4493882CD>