From owner-freebsd-arm@freebsd.org Fri Jan 8 05:49:56 2016
Subject: Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
From: Mark Millard
Date: Thu, 7 Jan 2016 21:49:52 -0800
Cc: freebsd-arm
Message-Id: <5D7C239F-6D43-4C7A-B7F2-88E7488418B2@dsl-only.net>
In-Reply-To: <97E0840E-987C-4893-9E63-EA51741CFC75@dsl-only.net>
References: <1452183170.1215.4.camel@freebsd.org>
 <1452196099.1215.12.camel@freebsd.org> <568EC4D8.7010106@selasky.org>
 <8B728C93-9C90-4821-A607-5D157F028812@dsl-only.net>
 <568ED810.8010309@selasky.org> <568ED92C.9070602@selasky.org>
 <97E0840E-987C-4893-9E63-EA51741CFC75@dsl-only.net>
To: Hans Petter Selasky, Ian Lepore, Warner Losh
List-Id: "Porting FreeBSD to ARM processors."

Top post of a major conclusion:

I have isolated a working vs. failing context for the hangs issue:
(Note that everything for world and ports and such is on the SSD root partition in my examples.)

A) Using a swap file on the root partition as the swap space leads to hangs when the space gets sufficient activity

vs.

B) Using a swap partition as the swap space works without hangs

It is the same SSD as before both ways. (I had to dump, repartition, restore since I'd not provided space for a swap partition earlier.)

A swap partition on the sdcard as the swap space also works.

So it appears there is a problem with using swapfiles --at least when they are on otherwise sometimes-also-busy file systems but possibly more generally. (As the SSD has a USB SSD interface to the RPI2, swapfiles do not provide trim support any more than swap partitions would.)

The SSD is likely noticeably faster in various respects and so may be more of a challenge for swapfile handling in some way (via extra file-system/IO/resource load with less time between various activities), at least on rpi2's.
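
A minimal sketch for telling case A) from case B) by looking at swapinfo output. It assumes that a swap device named /dev/md* is a memory disk backing a swap file (as md0 was in the earlier reports in this thread) and that any other device is a swap partition. The function name classify_swap is hypothetical and is not part of any FreeBSD tool.

```sh
#!/bin/sh
# Classify swap devices from swapinfo-style output read on stdin.
# Assumption: /dev/md* means file-backed swap (case A); any other
# device name is treated as a swap partition (case B).
classify_swap() {
    while read -r dev blocks used avail cap; do
        case "$dev" in
            Device|Total|"") continue ;;        # header, summary row, blank line
            /dev/md*) echo "$dev file-backed" ;;
            *)        echo "$dev partition" ;;
        esac
    done
}

# Sample input copied from the swapinfo output in this message.
printf '%s\n' \
    'Device          1K-blocks     Used    Avail Capacity' \
    '/dev/gpt/RPI2swap   3282756   905876  2376880    28%' | classify_swap
```

On a live system the same function could be fed directly with "swapinfo | classify_swap". The check only looks at device names, so it says nothing about which file system a swap file lives on.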

I now have for the SSD context:

$ df -m
Filesystem          1M-blocks  Used  Avail Capacity  Mounted on
/dev/ufs/RPI2rootfs    440365 11179 393957     3%    /
devfs                       0     0      0   100%    /dev
/dev/mmcsd0s1              49     7     42    15%    /boot/msdos

$ swapinfo
Device            1K-blocks   Used   Avail Capacity
/dev/gpt/RPI2swap   3282756 905876 2376880    28%

===
Mark Millard
markmi at dsl-only.net

On 2016-Jan-7, at 3:24 PM, Mark Millard wrote:

>
> On 2016-Jan-7, at 2:28 PM, Warner Losh wrote:
>>
>> 4 page requests shouldn't hang the whole system. That should be more like hundreds or thousands depending on the tuning you've done.
>>
>> Warner
>>
>
> FYI: I do not remember doing any explicit tuning. Other than having a SSD for the root file system (via fstab content) and using cortex-a7 related compile options things are default with ssh and little else enabled as I remember. I'm even currently running KERNCONF=RPI2 instead of my RPI2-NODBG variant.
>
> For my note about L(q)==4 for md0: "SWAP/swap/md0" showed 0. The only "name" showing a non-zero value was "md0" --and only for L(q).
>
> It does look like the latest hang finally produced some messages: 3 copies of
>
> smsc0: warning: failed to create new mbuf
>
> but these messages do not normally appear.
>
>> On Thu, Jan 7, 2016 at 3:16 PM, Mark Millard wrote:
>> I'm top posting this change of information about the hang status seen via gstat:
>>
>> After a long time the gstat -cod is showing a non-zero value in one place:
>>
>> L(q) for md0 is showing 4 now.
>>
>> (I've no clue when it changed. I do not expect that I missed the 4 before.)
>>
>> md0 is for the file-system based page file. That file is on the SSD, not the sdcard.
>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net
>>
>> On 2016-Jan-7, at 2:04 PM, Mark Millard wrote:
>>
>>>
>>> On 2016-Jan-7, at 1:31 PM, Hans Petter Selasky wrote:
>>>>
>>>> On 01/07/16 22:26, Hans Petter Selasky wrote:
>>>>> On 01/07/16 21:20, Mark Millard wrote:
>>>>>>
>>>>>> On 2016-Jan-7, at 12:04 PM, Hans Petter Selasky
>>>>>> wrote:
>>>>>>>
>>>>>>> On 01/07/16 20:48, Ian Lepore wrote:
>>>>>>>> If the filesystems and swap space are on a usb drive, then maybe it's
>>>>>>>> the usb subsystem that's hanging. The wait states you showed for those
>>>>>>>> processes are consistent with what I've seen when all buffers get
>>>>>>>> backed up in a queue on one non-responsive or slow device. It may be
>>>>>>>> that there's a way to get the system deadlocked when it's low on
>>>>>>>> buffers and there is memory pressure causing the swap to be used (I
>>>>>>>> generally run arm systems without any swap configured).
>>>>>>>>
>>>>>>>> Running gstat in another window while this is going on may give you
>>>>>>>> some insight into the situation. Beyond that I don't know what to look
>>>>>>>> at, especially since you generally can't launch any new tools once the
>>>>>>>> system gets into this kind of state.
>>>>>>>>
>>>>>>>> -- Ian
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> All USB transfers towards disk devices have timeouts, so if something
>>>>>>> is hanging at USB level, you'll get a printout eventually.
>>>>>>
>>>>>> What sort of timescale after deadlock/live-lock is observed to
>>>>>> apparently have started does one have to wait in order to conclude
>>>>>> that the timeouts would have happened and so they do not apply to the
>>>>>> deadlock/live-lock?
>>>>>>
>>>>>>> The USB kernel processes needed for doing I/O transfers are not
>>>>>>> pinned to RAM. Can it happen if a USB process is swapped to disk,
>>>>>>> that the system cannot wakeup a swapped out process to get more swap?
>>>>>>>
>>>>>>> --HPS
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>> Wow. Could I use ddb to somehow check on the "USB kernel processes"
>>>>>> swap status when the overall context is deadlocked/live-locked?
>>>>>
>>>>> Are you able to run something like:
>>>>>
>>>>> ps auxwwH | grep usb
>>>>>
>>>>>> If yes, how? Otherwise something in top or some such display that I'd
>>>>>> left running over the serial console would have to present useful
>>>>>> information on the subject. Is there anything that would?
>>>>>
>>>>
>>>> Are you able to SSH into the box or ping it?
>>>>
>>>> --HPS
>>>
>>> Once the live-lock condition is reached no new processes can be created as far as I can tell: the attempt will hang any process that attempts the creation.
>>>
>>> I'd need "ps auxwwH" to be internally repeating to even get that much: I'd have to start it before the live-lock happened and it would have to be still running when the hang occurs, no on-going process creations involved.
>>>
>>> I'm not so sure that two communicating processes (ps and grep over a pipe) would work but I can not get to even one new process so far.
>>>
>>> ssh sessions also hang, input and output stop for them fairly generally. (Sometimes the context is such that ^t still works but shows no progress in what it reports.) No new ssh connections are possible: "Operation timed out".
>>>
>>> ping does respond normally: it is more of a live-lock status than a true deadlock one overall.
>>>
>>> The serial console still outputs what it was already running if that process does nothing that locks up. Changing what it is doing generally locks it up too.
>>>
>>> Doing something like unplugging a usb keyboard or mouse or plugging one in does show the expected messages via the console: it is more of a live-lock status than a true deadlock one overall.
>>>
>>> I can get to ddb after the hang.
>>> But I do not know what I'd do with it to find any useful information.
>>>
>>>
>>> As noted in another message: I used gstat instead of top on the serial console:
>>>
>>>> gstat shows everything zero during a hang, even L(q) column. (Length of queue?)
>>>>
>>>> I used:
>>>>
>>>> gstat -cod
>>>>
>>>> and had it running over the serial console port during the attempted portmaster activity.
>>>
>>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net
>>
>

===
Mark Millard
markmi at dsl-only.net