Date:      Thu, 07 Jan 2016 12:48:19 -0700
From:      Ian Lepore <ian@freebsd.org>
To:        Mark Millard <markmi@dsl-only.net>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Message-ID:  <1452196099.1215.12.camel@freebsd.org>
In-Reply-To: <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net>
References:  <E0379BE9-308A-4219-A8AE-A5FFE828BA93@dsl-only.net> <1452183170.1215.4.camel@freebsd.org> <FB0D5486-AD27-44A7-86CA-68989AE08EC7@dsl-only.net>

On Thu, 2016-01-07 at 11:24 -0800, Mark Millard wrote:
> On 2016-Jan-7, at 8:12 AM, Ian Lepore <ian@freebsd.org> wrote:
> > 
> > On Thu, 2016-01-07 at 02:19 -0800, Mark Millard wrote:
> > > I've had various hangs when the rpi2 was busy over longish periods,
> > > both debug buildkernel/buildworld builds of the arm and non-debug
> > > variants. No log files or console messages produced.
> > > 
> > > I've not had any analogous issues with powerpc64 (PowerMac G5) or
> > > with amd64 (Virtual Box used on Mac OS X).
> > > 
> > > I've finally discovered that if I have, say, top running on the rpi2
> > > serial console that top continues to update its display so long as I
> > > leave it alone during the hang. (Otherwise it hangs too.) So I
> > > finally have a little window for seeing some of what is happening.
> > > 
> > > An example top display showed after the hang:
> > > 
> > > Mem: 764M Active 12M Inact 141M Wired 98M Buf 8k free
> > > Swap: 2048M Total 29M Used 2019 Free 1% in use
> > > 
> > > (Yep: Just 8K free Mem.)
> > > 
> > 
> > That's not a problem.
> > 
> > > The unusual STATEs for processes seemed to be (for the specific hang):
> > > 
> > > STATE   COMMANDs
> > > pfault  [ld] [ld] /usr/sbin/syslogd
> > > vmwait  [ld] [md0] [kernel]
> > > wswbuf  [pagedaemon]
> > > 
> > > Those same 3 states seem to always be involved. Some of the processes
> > > vary from one hang to the next: the prior hang had build/genautoma,
> > > /usr/sbin/moused, and /usr/sbin/ntpd instead of 3 [ld]'s.
> > > 
> > > /usr/sbin/syslogd, [md0], [kernel], and [pagedaemon] and their states
> > > do not seem to vary (so far).
> > > 
> > > 
> > 
> > Everything is backed up waiting for slow sdcard IO.  You can get an
> > amd64 system with many cores and gigabytes of ram into the same state
> > with an sdcard (or any other storage device that takes literally
> > seconds for any individual IO to complete).  All the available buffers
> > get queued up to the one slow device, then you can't do anything that
> > requires IO (even launch tools to try to figure out what's going on).
> > 
> > -- Ian
> 
> This is not the (or a) sdcard for the root file system, it is a fast,
> 400GB+ SSD, USB 3.0 capable (not that rpi2 uses it that way). Note
> below the "da0" and the size and such (other than /boot/msdos):
> 
> ugen0.5: <Other World Computing> at usbus0
> umass0: <Other World Computing Envoy Pro, class 0/0, rev 2.10/1.00, addr 5> on usbus0
> umass0:  SCSI over Bulk-Only; quirks = 0x0100
> umass0:0:0: Attached to scbus0
> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> da0: <ASMT 2105 0> Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number XXXXXXXXXXXX
> Release APs
> da0: 40.000MB/s transfers
> da0: 457862MB (937703088 512 byte sectors)
> da0: quirks=0x2<NO_6_BYTE>
> Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
> . . .
> Starting file system checks:
> /dev/ufs/RPI2rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/ufs/RPI2rootfs: clean, 109711666 free (14002 frags, 13712208 blocks, 0.0% fragmentation)
> Mounting local file systems:.
> . . .
> 
> > Filesystem          1M-blocks  Used  Avail Capacity  Mounted on
> > /dev/ufs/RPI2rootfs    443473 16791 391203     4%    /
> > devfs                       0     0      0   100%    /dev
> > /dev/mmcsd0s1              49     7     42    15%    /boot/msdos
> 
> 
> In USB 3.0 contexts I have never observed seconds for an IO for these
> types of SSDs and I use them that way extensively. Nor for USB 2.0
> uses, though that is not as common of a context for me. Nor have I
> had any problems with the type of USB 3.0 capable hub messing up IO.
> 
> I use this type of SSD to hold my Virtual Box virtual machine(s) that
> I run amd64 FreeBSD in on Mac OS X. No problems there. But it is true
> that I've never directly booted amd64 FreeBSD from one of these SSDs
> in a non-virtual amd64 context.
> 
> Ignoring that for a moment, so this is an acceptable/expected FreeBSD
> behavior when a "disk" device is slow? Interesting. I've let it sit
> for hours and the hangup does not clear: it is effectively deadlocked
> for overall usage. The rpi2 never will be able to buildworld,
> buildkernel, ports, etc. reliably if this is the sort of behavior
> that results.
> 
> Back to this context: Is there a way for me to confirm the queuing of
> buffers to the SSD? Or at least some detail about its buffer usage?
> Can I get some information from ddb that would confirm/deny/provide
> insight?
> 
> 

If the filesystems and swap space are on a usb drive, then maybe it's
the usb subsystem that's hanging.  The wait states you showed for those
processes are consistent with what I've seen when all buffers get
backed up in a queue on one non-responsive or slow device.  It may be
that there's a way to get the system deadlocked when it's low on
buffers and there is memory pressure causing the swap to be used (I
generally run arm systems without any swap configured).
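
To make the "all buffers backed up on one slow device" idea concrete,
here is a minimal toy model in Python.  It is only a sketch: the
function name, the pool size, and the tick counts are all made-up
assumptions, and it does not reflect how the FreeBSD buffer cache or
pager are actually implemented.  It just shows how a fixed pool of
buffers can end up entirely queued to one slow device, leaving nothing
for any other IO.

```python
def simulate(num_buffers, slow_ticks_per_io, ticks):
    """Toy model of a shared buffer pool feeding one slow device.

    All names and numbers here are illustrative assumptions; this is not
    how the FreeBSD buffer cache is implemented.
    """
    free = num_buffers      # buffers not currently queued to any device
    slow_queue = 0          # buffers waiting on the slow device
    slow_timer = 0          # ticks spent on the I/O now being serviced
    other_stalled = 0       # other requests that found no free buffer

    for _ in range(ticks):
        # The slow device finishes one I/O every slow_ticks_per_io ticks.
        if slow_queue > 0:
            slow_timer += 1
            if slow_timer >= slow_ticks_per_io:
                slow_queue -= 1
                free += 1
                slow_timer = 0

        # A writer aimed at the slow device takes a buffer if one is free.
        if free > 0:
            free -= 1
            slow_queue += 1

        # Any other I/O (say, paging in a tool to debug the hang) also
        # needs a buffer; it borrows one and returns it within the tick.
        if free == 0:
            other_stalled += 1

    return slow_queue, other_stalled


if __name__ == "__main__":
    for ticks_per_io in (1, 10):
        queued, stalled = simulate(num_buffers=4,
                                   slow_ticks_per_io=ticks_per_io,
                                   ticks=100)
        print(f"ticks/IO={ticks_per_io}: queued={queued} stalled={stalled}")
```

With these made-up numbers, the device that completes an IO every tick
never starves the other requests, while the device that needs 10 ticks
per IO soon holds every buffer and almost every other request stalls.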

Running gstat in another window while this is going on may give you
some insight into the situation.  Beyond that I don't know what to look
at, especially since you generally can't launch any new tools once the
system gets into this kind of state.

-- Ian



