Date: Thu, 7 Jan 2016 14:16:16 -0800
From: Mark Millard <markmi@dsl-only.net>
To: Hans Petter Selasky <hps@selasky.org>, Ian Lepore <ian@freebsd.org>
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Message-ID: <D44C4EF3-0976-45E7-944A-A8F23D3D89BF@dsl-only.net>
In-Reply-To: <B7E8D0FD-B3A9-40DF-B0ED-9D3041F8B2A2@dsl-only.net>
References: <E0379BE9-308A-4219-A8AE-A5FFE828BA93@dsl-only.net>
 <1452183170.1215.4.camel@freebsd.org>
 <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net>
 <1452196099.1215.12.camel@freebsd.org>
 <568EC4D8.7010106@selasky.org>
 <8B728C93-9C90-4821-A607-5D157F028812@dsl-only.net>
 <568ED810.8010309@selasky.org>
 <568ED92C.9070602@selasky.org>
 <B7E8D0FD-B3A9-40DF-B0ED-9D3041F8B2A2@dsl-only.net>

I'm top posting this change of information about the hang status seen via gstat:

After a long time the gstat -cod is showing a non-zero value in one place: L(q) for md0 is showing 4 now. (I've no clue when it changed. I do not expect that I missed the 4 before.)

md0 is for the file-system based page file. That file is on the SSD, not the sdcard.

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-7, at 2:04 PM, Mark Millard <markmi@dsl-only.net> wrote:

>
> On 2016-Jan-7, at 1:31 PM, Hans Petter Selasky <hps at selasky.org> wrote:
>>
>> On 01/07/16 22:26, Hans Petter Selasky wrote:
>>> On 01/07/16 21:20, Mark Millard wrote:
>>>>
>>>> On 2016-Jan-7, at 12:04 PM, Hans Petter Selasky <hps at selasky.org>
>>>> wrote:
>>>>>
>>>>> On 01/07/16 20:48, Ian Lepore wrote:
>>>>>> If the filesystems and swap space are on a usb drive, then maybe it's
>>>>>> the usb subsystem that's hanging. The wait states you showed for those
>>>>>> processes are consistent with what I've seen when all buffers get
>>>>>> backed up in a queue on one non-responsive or slow device. It may be
>>>>>> that there's a way to get the system deadlocked when it's low on
>>>>>> buffers and there is memory pressure causing the swap to be used (I
>>>>>> generally run arm systems without any swap configured).
>>>>>>
>>>>>> Running gstat in another window while this is going on may give you
>>>>>> some insight into the situation. Beyond that I don't know what to look
>>>>>> at, especially since you generally can't launch any new tools once the
>>>>>> system gets into this kind of state.
>>>>>>
>>>>>> -- Ian
>>>>>
>>>>> Hi,
>>>>>
>>>>> All USB transfers towards disk devices have timeouts, so if something
>>>>> is hanging at USB level, you'll get a printout eventually.
>>>>
>>>> What sort of timescale after deadlock/live-lock is observed to
>>>> apparently have started does one have to wait in order to conclude
>>>> that the timeouts would have happened and so they do not apply to the
>>>> deadlock/live-lock?
>>>>
>>>>> The USB kernel processes needed for doing I/O transfers are not
>>>>> pinned to RAM. Can it happen if a USB process is swapped to disk,
>>>>> that the system cannot wake up a swapped out process to get more swap?
>>>>>
>>>>> --HPS
>>>>
>>>
>>> Hi,
>>>
>>>> Wow. Could I use ddb to somehow check on the "USB kernel processes"
>>>> swap status when the overall context is deadlocked/live-locked?
>>>
>>> Are you able to run something like:
>>>
>>> ps auxwwH | grep usb
>>>
>>>> If yes, how? Otherwise something in top or some such display that I'd
>>>> left running over the serial console would have to present useful
>>>> information on the subject. Is there anything that would?
>>>
>>
>> Are you able to SSH into the box or ping it?
>>
>> --HPS
>
> Once the live-lock condition is reached no new processes can be created as far as I can tell: the attempt will hang any process that attempts the creation.
>
> I'd need "ps auxwwH" to be internally repeating to even get that much: I'd have to start it before the live-lock happened and it would have to be still running when the hang occurs, no on-going process creations involved.
>
> I'm not so sure that two communicating processes (ps and grep over a pipe) would work but I can not get to even one new process so far.
>
> ssh sessions also hang, input and output stop for them fairly generally. (Sometimes the context is such that ^t still works but shows no progress in what it reports.) No new ssh connections are possible: "Operation timed out".
>
> ping does respond normally: it is more of a live-lock status than a true deadlock one overall.
>
> The serial console still outputs what it was already running if that process does nothing that locks up. Changing what it is doing generally locks it up too.
>
> Doing something like unplugging a usb keyboard or mouse or plugging one in does show the expected messages via the console: it is more of a live-lock status than a true deadlock one overall.
>
> I can get to ddb after the hang. But I do not know what I'd do with it to find any useful information.
>
>
> As noted in another message: I used gstat instead of top on the serial console:
>
>> gstat shows everything zero during a hang, even L(q) column. (Length of queue?)
>>
>> I used:
>>
>> gstat -cod
>>
>> and had it running over the serial console port during the attempted portmaster activity.
>
>

===
Mark Millard
markmi at dsl-only.net