From owner-freebsd-arm@FreeBSD.ORG  Thu Sep 12 15:53:40 2013
Subject: Re: Panic mounting root on BeagleBone Black
From: Ian Lepore <ian@FreeBSD.org>
To: Warner Losh <imp@bsdimp.com>
In-Reply-To: <F85C2A12-21DC-41C6-9037-15AFD0B1AD7E@bsdimp.com>
References: <47E403AE-01A2-4AC8-8028-41F0298FAC3E@freebsd.org>
 <1378997738.1111.631.camel@revolution.hippie.lan>
 <F85C2A12-21DC-41C6-9037-15AFD0B1AD7E@bsdimp.com>
Content-Type: text/plain; charset="windows-1251"
Date: Thu, 12 Sep 2013 09:53:36 -0600
Message-ID: <1379001216.1111.633.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by damnhippie.dyndns.org
 id r8CFraix008520
Cc: "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>

On Thu, 2013-09-12 at 09:44 -0600, Warner Losh wrote:
> On Sep 12, 2013, at 8:55 AM, Ian Lepore wrote:
>
> > On Wed, 2013-09-11 at 06:43 -0700, Tim Kientzle wrote:
> >> Just built a new image for BBB from SVN r255438.
> >>
> >> At the second boot, I got this:
> >>
> >> Mounting local file systems:.
> >> mmcsd0: Error indicated: 1 Timeout
> >> g_vfs_done():mmcsd0s2a[READ(offset=2016903168, length=4096)]error = 5
> >> vnode_pager_getpages: I/O read error
> >> vm_fault: pager read error, pid 126 (ps)
> >> mmcsd0: Error indicated: 1 Timeout
> >> g_vfs_done():mmcsd0s2a[READ(offset=131072, length=32768)]error = 5
> >> sdhci_ti0-slot0: Got data interrupt 0x00000010, but there is no active command.
> >> sdhci_ti0-slot0: ============== REGISTER DUMP ==============
> >> sdhci_ti0-slot0: Sys addr: 0x00000000 | Version:  0x00003101
> >> sdhci_ti0-slot0: Blk size: 0x00000200 | Blk cnt:  0x00000010
> >> sdhci_ti0-slot0: Argument: 0x0024679e | Trn mode: 0x0000193a
> >> sdhci_ti0-slot0: Present:  0x01f70000 | Host ctl: 0x00000006
> >> sdhci_ti0-slot0: Power:    0x0000000d | Blk gap:  0x00000000
> >> sdhci_ti0-slot0: Wake-up:  0x00000000 | Clock:    0x00000007
> >> sdhci_ti0-slot0: Timeout:  0x0000000d | Int stat: 0x00000000
> >> sdhci_ti0-slot0: Int enab: 0x017f00fb | Sig enab: 0x017f00fb
> >> sdhci_ti0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
> >> sdhci_ti0-slot0: Caps:     0x06e10080 | Max curr: 0x00000000
> >> sdhci_ti0-slot0: ===========================================
> >>
> >> …. few more similar messages, then ….
> >>
> >> mmcsd0: Error indicated: 1 Timeout
> >> g_vfs_done():mmcsd0s2a[WRITE(offset=20808192, length=512)]error = 5
> >> g_vfs_done():mmcsd0s2a[WRITE(offset=1276346368, length=24576)]error = 5
> >> panic: brelse: inappropriate B_PAGING or B_CLUSTER bp 0xcd148778
> >> [bt snipped]
> >>
> >
> > This was a single occurrence, right?  Like you're not dead in the water
> > or anything?
> >
> > There's insanity in that info... the register dump shows a multi-block
k
> > write (8kbytes) was set up, but the command that timed out was a read.
> > If a prior write had timed out why isn't there a g_vfs_done() error
> > logged for it?
> >
> > I think what we really need is some better error recovery in the mmc and
> > sd layers.  Retrying a failed IO is cheap and easy.  More complex
> > recovery is possible too (power cycling and re-initializing the card
> > and/or controller).  But that has its own difficulties -- what if the
> > nature of the problem was that the user swapped cards? -- you don't want
> > to retry a write under those conditions.
>
> I'd disagree with this...  Retrying often is the wrong thing to do. If
> the write didn't work the first time, why would it work the second?
> Looks like a programming bug here in controlling the sdhci controller
> since we got errors, then we got an interrupt with no pending commands.
> This suggests that our timeout isn't quite right...
>
> Warner
>

Retrying too often or endlessly is wrong, but IMO so is not retrying at
all, especially when the standard specifies error recovery strategies.
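
To make "not endlessly, but not zero" concrete, here is a rough sketch of a
bounded retry.  Everything in it is made up for illustration: the io_result
codes, the io_op_fn callback type, io_with_retry() and the flaky_op() stand-in
are not the real mmc/sd layer interfaces, and this is not a recovery sequence
taken from the standard.  It only shows the shape: retry a timeout a fixed
number of times, and give up immediately on an error where a retry would be
wrong (the swapped-card case).

```c
#include <stddef.h>

/* Hypothetical result codes for illustration; not real mmc(4) error values. */
enum io_result {
	IO_OK = 0,
	IO_TIMEOUT,		/* transient: worth a bounded retry */
	IO_MEDIA_CHANGED	/* card was swapped: never retry */
};

/* Hypothetical callback standing in for "issue one request to the card". */
typedef enum io_result (*io_op_fn)(void *ctx);

/*
 * Call op up to max_attempts times.  Stop as soon as the result is anything
 * other than a timeout (success or a non-retryable error).  If attempts_out
 * is not NULL it receives the number of calls actually made.  With
 * max_attempts <= 0 the op is never called and IO_TIMEOUT is returned.
 */
static enum io_result
io_with_retry(io_op_fn op, void *ctx, int max_attempts, int *attempts_out)
{
	enum io_result res = IO_TIMEOUT;
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		res = op(ctx);
		if (res != IO_TIMEOUT)
			break;
	}
	if (attempts_out != NULL)
		*attempts_out = attempts;
	return (res);
}

/* Stand-in device: times out failures_left times, then returns final. */
struct flaky {
	int failures_left;
	enum io_result final;
};

static enum io_result
flaky_op(void *ctx)
{
	struct flaky *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return (IO_TIMEOUT);
	}
	return (f->final);
}
```

With a limit of 3, a request that times out twice and then succeeds is
reported as a success after three calls, one that keeps timing out is reported
as a timeout after three calls, and a media-changed result is passed straight
up after a single call.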

-- Ian