Date: Thu, 12 Sep 2013 10:05:12 -0600
From: Warner Losh <imp@bsdimp.com>
To: Ian Lepore <ian@FreeBSD.org>
Cc: "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>
Subject: Re: Panic mounting root on BeagleBone Black
Message-ID: <01C5A0CC-A0FC-4635-8370-EAFDC8E8A854@bsdimp.com>
In-Reply-To: <1379001216.1111.633.camel@revolution.hippie.lan>
References: <47E403AE-01A2-4AC8-8028-41F0298FAC3E@freebsd.org> <1378997738.1111.631.camel@revolution.hippie.lan> <F85C2A12-21DC-41C6-9037-15AFD0B1AD7E@bsdimp.com> <1379001216.1111.633.camel@revolution.hippie.lan>
On Sep 12, 2013, at 9:53 AM, Ian Lepore wrote:

> On Thu, 2013-09-12 at 09:44 -0600, Warner Losh wrote:
>> On Sep 12, 2013, at 8:55 AM, Ian Lepore wrote:
>>
>>> On Wed, 2013-09-11 at 06:43 -0700, Tim Kientzle wrote:
>>>> Just built a new image for BBB from SVN r255438.
>>>>
>>>> At the second boot, I got this:
>>>>
>>>> Mounting local file systems:.
>>>> mmcsd0: Error indicated: 1 Timeout
>>>> g_vfs_done():mmcsd0s2a[READ(offset=2016903168, length=4096)]error = 5
>>>> vnode_pager_getpages: I/O read error
>>>> vm_fault: pager read error, pid 126 (ps)
>>>> mmcsd0: Error indicated: 1 Timeout
>>>> g_vfs_done():mmcsd0s2a[READ(offset=131072, length=32768)]error = 5
>>>> sdhci_ti0-slot0: Got data interrupt 0x00000010, but there is no active command.
>>>> sdhci_ti0-slot0: ============== REGISTER DUMP ==============
>>>> sdhci_ti0-slot0: Sys addr: 0x00000000 | Version: 0x00003101
>>>> sdhci_ti0-slot0: Blk size: 0x00000200 | Blk cnt: 0x00000010
>>>> sdhci_ti0-slot0: Argument: 0x0024679e | Trn mode: 0x0000193a
>>>> sdhci_ti0-slot0: Present: 0x01f70000 | Host ctl: 0x00000006
>>>> sdhci_ti0-slot0: Power: 0x0000000d | Blk gap: 0x00000000
>>>> sdhci_ti0-slot0: Wake-up: 0x00000000 | Clock: 0x00000007
>>>> sdhci_ti0-slot0: Timeout: 0x0000000d | Int stat: 0x00000000
>>>> sdhci_ti0-slot0: Int enab: 0x017f00fb | Sig enab: 0x017f00fb
>>>> sdhci_ti0-slot0: AC12 err: 0x00000000 | Slot int: 0x00000000
>>>> sdhci_ti0-slot0: Caps: 0x06e10080 | Max curr: 0x00000000
>>>> sdhci_ti0-slot0: ===========================================
>>>>
>>>> ... few more similar messages, then ...
>>>>
>>>> mmcsd0: Error indicated: 1 Timeout
>>>> g_vfs_done():mmcsd0s2a[WRITE(offset=20808192, length=512)]error = 5
>>>> g_vfs_done():mmcsd0s2a[WRITE(offset=1276346368, length=24576)]error = 5
>>>> panic: brelse: inappropriate B_PAGING or B_CLUSTER bp 0xcd148778
>>>> [bt snipped]
>>>>
>>>
>>> This was a single occurrence, right? Like you're not dead in the water
>>> or anything?
>>>
>>> There's insanity in that info... the register dump shows a multi-block
>>> write (8kbytes) was set up, but the command that timed out was a read.
>>> If a prior write had timed out why isn't there a g_vfs_done() error
>>> logged for it?
>>>
>>> I think what we really need is some better error recovery in the mmc and
>>> sd layers. Retrying a failed IO is cheap and easy. More complex
>>> recovery is possible too (power cycling and re-initializing the card
>>> and/or controller). But that has its own difficulties -- what if the
>>> nature of the problem was that the user swapped cards? -- you don't want
>>> to retry a write under those conditions.
>>
>> I'd disagree with this... Retrying often is the wrong thing to do. If
>> the write didn't work the first time, why would it work the second?
>> Looks like a programming bug here in controlling the sdhci controller
>> since we got errors, then we got an interrupt with no pending commands.
>> This suggests that our timeout isn't quite right...
>>
>> Warner
>>
>
> Retrying too often or endlessly is wrong, but IMO so is not retrying at
> all, especially when the standard specifies error recovery strategies.

I thought we followed the standard's error recovery stuff... Maybe newer
versions have more extensive retry things than in the past... But the
retry should be at the transaction to the card level, not any higher...

Warner
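The two positions in this thread (retry a failed transaction, but only a bounded number of times and only at the card-transaction level, and never blindly if the card may have been swapped) can be illustrated with a small sketch. This is not the FreeBSD mmc/sdhci code and not what any SD specification mandates; `TransientIOError`, `run_with_retries`, `same_card`, and `make_flaky_command` are all hypothetical names made up for the illustration, and the retry limit of 3 is an arbitrary assumption.

```python
class TransientIOError(Exception):
    """Hypothetical error: a single card-level command timed out."""


class CardChangedError(Exception):
    """Hypothetical error: the card was swapped, so a retry is unsafe."""


def run_with_retries(issue_command, same_card=lambda: True, max_retries=3):
    """Issue one card-level transaction with a bounded number of retries.

    issue_command: callable performing exactly one transaction (assumed).
    same_card:     callable returning False if the card may have changed.
    Returns (result, attempts). Re-raises the last error once the retry
    budget is exhausted, so the failure still reaches upper layers.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return issue_command(), attempts
        except TransientIOError:
            if attempts > max_retries:
                raise  # bounded: never retry endlessly
            if not same_card():
                # Do not replay a write against a different card.
                raise CardChangedError("card changed, not retrying")


def make_flaky_command(failures):
    """Build a fake command that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def command():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientIOError("timeout")
        return "ok"

    return command
```

With this shape the caller above the transaction layer sees either one success or one final error; it does not retry on its own, which is the "not any higher" constraint from the last reply.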