From owner-freebsd-arm@FreeBSD.ORG  Fri Sep 13 03:31:18 2013
Subject: Re: Panic mounting root on BeagleBone Black
From: Tim Kientzle <kientzle@freebsd.org>
In-Reply-To: <1378997738.1111.631.camel@revolution.hippie.lan>
Date: Thu, 12 Sep 2013 20:31:09 -0700
Message-Id: <07BF0EC3-D6A0-4CFD-9D79-EE83862A81A9@freebsd.org>
References: <47E403AE-01A2-4AC8-8028-41F0298FAC3E@freebsd.org>
 <1378997738.1111.631.camel@revolution.hippie.lan>
To: Ian Lepore <ian@freebsd.org>
Cc: "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>


On Sep 12, 2013, at 7:55 AM, Ian Lepore <ian@FreeBSD.org> wrote:

> On Wed, 2013-09-11 at 06:43 -0700, Tim Kientzle wrote:
>> Just built a new image for BBB from SVN r255438.
>>
>> At the second boot, I got this:
>>
>> Mounting local file systems:.
>> mmcsd0: Error indicated: 1 Timeout
>> g_vfs_done():mmcsd0s2a[READ(offset=2016903168, length=4096)]error = 5
>> vnode_pager_getpages: I/O read error
>> vm_fault: pager read error, pid 126 (ps)
>> mmcsd0: Error indicated: 1 Timeout
>> g_vfs_done():mmcsd0s2a[READ(offset=131072, length=32768)]error = 5
>> sdhci_ti0-slot0: Got data interrupt 0x00000010, but there is no active command.
>> sdhci_ti0-slot0: ============== REGISTER DUMP ==============
>> sdhci_ti0-slot0: Sys addr: 0x00000000 | Version:  0x00003101
>> sdhci_ti0-slot0: Blk size: 0x00000200 | Blk cnt:  0x00000010
>> sdhci_ti0-slot0: Argument: 0x0024679e | Trn mode: 0x0000193a
>> sdhci_ti0-slot0: Present:  0x01f70000 | Host ctl: 0x00000006
>> sdhci_ti0-slot0: Power:    0x0000000d | Blk gap:  0x00000000
>> sdhci_ti0-slot0: Wake-up:  0x00000000 | Clock:    0x00000007
>> sdhci_ti0-slot0: Timeout:  0x0000000d | Int stat: 0x00000000
>> sdhci_ti0-slot0: Int enab: 0x017f00fb | Sig enab: 0x017f00fb
>> sdhci_ti0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
>> sdhci_ti0-slot0: Caps:     0x06e10080 | Max curr: 0x00000000
>> sdhci_ti0-slot0: ===========================================
>>
>> ... few more similar messages, then ...
>>
>> mmcsd0: Error indicated: 1 Timeout
>> g_vfs_done():mmcsd0s2a[WRITE(offset=20808192, length=512)]error = 5
>> g_vfs_done():mmcsd0s2a[WRITE(offset=1276346368, length=24576)]error = 5
>> panic: brelse: inappropriate B_PAGING or B_CLUSTER bp 0xcd148778
>> [bt snipped]
>>
>
> This was a single occurrence, right?  Like you're not dead in the water
> or anything?

Here's the scenario:
  * New image built.
  * Booted.
  * Had power pulled almost immediately upon attaining multi-user (I got confused)
  * Rebooted.
  * Hit the above on mount root at the second boot.
  * Rebooted again and everything seems fine.

> There's insanity in that info... the register dump shows a multi-block
> write (8kbytes) was set up, but the command that timed out was a read.
> If a prior write had timed out why isn't there a g_vfs_done() error
> logged for it?
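
The 8 kbyte figure can be checked against the dump with a little
arithmetic.  This is only a sketch: the register values are copied from
the dump above, and masking the block size to its low 12 bits assumes the
standard SD host controller register layout (not confirmed here for the
AM335x controller).  The helper name transfer_bytes is made up for
illustration.

```python
# Values copied from the register dump quoted above.
BLK_SIZE = 0x00000200  # "Blk size" field
BLK_CNT = 0x00000010   # "Blk cnt" field

# Lengths of the two READ requests that g_vfs_done() reported
# before the dump was printed.
LOGGED_READ_LENGTHS = [4096, 32768]


def transfer_bytes(blk_size: int, blk_cnt: int) -> int:
    """Total bytes moved by a multi-block transfer.

    Assumption: the low 12 bits of the block-size register hold the
    block length in bytes (standard SD host controller layout).
    """
    return (blk_size & 0x0FFF) * blk_cnt


total = transfer_bytes(BLK_SIZE, BLK_CNT)
print(total, "bytes =", total // 1024, "KiB")  # 8192 bytes = 8 KiB

# None of the logged reads has this length, which is why the dump
# looks like it belongs to a different request.
matching = [n for n in LOGGED_READ_LENGTHS if n == total]
print(matching)  # []
```

512-byte blocks times 16 blocks is 8192 bytes, which matches neither of
the failed reads logged just before the dump.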

I may have over-trimmed.  There were several of these timeout
errors reported all at once.

> I think what we really need is some better error recovery in the mmc and
> sd layers.  Retrying a failed IO is cheap and easy.  More complex
> recovery is possible too (power cycling and re-initializing the card
> and/or controller).  But that has its own difficulties -- what if the
> nature of the problem was that the user swapped cards? -- you don't want
> to retry a write under those conditions.
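
To make the retry idea concrete, here is a rough user-space model of the
policy being described.  It is not the FreeBSD mmc/mmcsd code and does
not use any kernel API; Request, CardChanged, submit_with_retry, do_io
and card_id are all hypothetical names invented for this sketch.  The
assumption is that the driver has some way to read back a card identity
that changes when the card is swapped.

```python
import errno
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for one block I/O request."""
    is_write: bool
    offset: int
    length: int


class CardChanged(Exception):
    """Raised when the card identity changed under a pending write."""


def submit_with_retry(req: Request,
                      do_io: Callable[[Request], bool],
                      card_id: Callable[[], str],
                      max_retries: int = 3) -> int:
    """Run do_io(req), retrying a bounded number of times.

    Returns how many retries were needed.  A failed write is not
    retried if card_id() no longer matches the identity seen before
    the first attempt (the "user swapped cards" case).
    """
    original = card_id()
    for attempt in range(1 + max_retries):
        if do_io(req):
            return attempt
        if req.is_write and card_id() != original:
            raise CardChanged("card identity changed; write not retried")
    # Same error the log reports as "error = 5".
    raise OSError(errno.EIO, "I/O failed after retries")
```

A read that times out twice and then succeeds would return 2 here; a
write that fails after the identity changes raises CardChanged on the
first failure instead of being replayed onto the new card.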

I seem to recall seeing something in the AM335x TRM about
an indicator for card removal.  Does that help?
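
For what it's worth, the "Present" value in the dump above can be read
for card-detect state.  This sketch assumes the standard SD host
controller present-state layout (bit 16 card inserted, bit 17 card state
stable, bit 18 card-detect pin level); I have not checked that the
AM335x controller or the BBB wiring actually follows it, and the
card_present helper is just an illustration.

```python
# "Present" field copied from the register dump earlier in the thread.
PRESENT_STATE = 0x01F70000

# Assumed bit positions (standard SD host controller layout).
CARD_INSERTED = 1 << 16
CARD_STATE_STABLE = 1 << 17
CARD_DETECT_PIN = 1 << 18


def card_present(present_state: int) -> bool:
    """True if the controller reports a card inserted and stable."""
    need = CARD_INSERTED | CARD_STATE_STABLE
    return (present_state & need) == need


print(card_present(PRESENT_STATE))               # True
print(bool(PRESENT_STATE & CARD_DETECT_PIN))     # True
```

Under that assumption the controller still saw a stable card when the
dump was taken, so a check like this before retrying a write could
separate a swapped or removed card from a plain timeout.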

Tim