Date:      Sat, 23 Sep 2006 18:36:22 +0200
From:      Hans Petter Selasky <hselasky@c2i.net>
To:        Juergen Lock <nox@jelal.kn-bremen.de>
Cc:        freebsd-usb@freebsd.org
Subject:   Re: umass0: BBB reset failed, TIMEOUT (again)
Message-ID:  <200609231836.23889.hselasky@c2i.net>
In-Reply-To: <20060923145704.GA1087@saturn.kn-bremen.de>
References:  <20060920011107.GA9379@saturn.kn-bremen.de> <200609220834.30428.hselasky@c2i.net> <20060923145704.GA1087@saturn.kn-bremen.de>

On Saturday 23 September 2006 16:57, Juergen Lock wrote:
> On Fri, Sep 22, 2006 at 08:34:29AM +0200, Hans Petter Selasky wrote:
> > On Friday 22 September 2006 00:04, Juergen Lock wrote:
> > > On Wed, Sep 20, 2006 at 11:18:32AM +0200, Hans Petter Selasky wrote:
> > > > On Wednesday 20 September 2006 03:11, Juergen Lock wrote:
> > > > > Today for the first time since this box got a new board I tried to
> > > > > copy data onto the usb cardreader, and after copying for a while it
> > > > > suddenly stopped (led stopped flashing, no further io), and after
> > > > > some time i had the above in dmesg.  And that was it, cp process
> > > > > hung, no way to kill it.  Unplugged the thing, and got the expected
> > > > > panic: vinvalbuf: dirty bufs.  Tried the same thing from linux
> > > > > (after dosfsck), and there copying stopped for a while too, but it
> > > > > then continued and finished.  Is this some kind of new hardware
> > > > > quirk of the new board's ehci controller, that linux recovers from?
> > > > >  (via, there already is a `dropped interrupt' fix for it, which
> > > > > helped with my last board...)
> >
> > We can easily check for dropped interrupts. If you run:
> >
> > sysctl hw.usb.ehci.debug=15
> > sysctl hw.usb.umass.debug=-1
> >
> > When your device hangs. And then send me the log again.
> >
> > > Ok.  This time writing worked, but reading back to verify (cmp) seemed
> > > to hang.  Did the sysctl (see below), then a while later I got an IO
> > > error. Tried to umount, got another IO error, tried umount -f, got a
> > > panic (probably expected.)  I have now installed mtools and won't mount
> > > umass devices on this box anymore... :/  (Btw when I later tried to
> > > mcopy the file off the thing using the original kernel I noticed the
> > > led was off after it hung, dunno if that also was the case when I tried
> > > it with the new code but I would suspect so.  At least this time, since
> > > it wasnt mounted, I could unplug it without getting a panic...)
> > >
> > >  Oh, one thing that occurred to me: Even when you may be able to get
> > > around (what appears to be) hardware quirks like this by retrying IO
> > > or resetting the device, that probably wont work when you have an
> > > umass tape drive (sa), since with tape you can't just retry a
> > > read/write, and resetting it may even rewind, with the next write
> > > erasing everything on the tape.  Just a thought...
> > >
> > >  Anyway, here's the syslog of the `experiment', beginning after the
> > > sysctl:
> >
> > From the log I see that it looks like the state machine of your device has
> > locked up. Even the reset command is timing out. That should not happen.
> > We could try to reconfigure the device, when reset fails.
>
> OK. I applied the umass_transfer_start(sc, UMASS_T_BBB_STATUS);
> patch in your other message and tried mcopy'ing off it again.
> This time I got a bunch of errors when first connecting it
> (well, more than usual) and /dev/da2s1 didnt appear, so I had to
> replug. (This may be a quirk of the device not of the board, since,
> unlike the IO problems, it also happened sometimes with my old board.)
> Anyway, I have left logs of that in, just in case...
> The first read this time hung with the led on, I did the sysctls,
> and soon after that (after messages were logged) the mcopy command
> exited without any error, leaving a truncated copy!  Just in case,
> I did an fsck_msdos, but it found nothing wrong.  Changed the sysctls
> back to 0 and tried another copy, this time it hung with the led off.
> Turned the sysctls back on and waited until mcopy exited, this time
> with an IO error (led was still off.)  Unplugged, and bzip2'd the log
> (its big, probably because I left the sysctls on while doing the fsck.)
>
>  I guess the usb controller on this board is just weird... :/

From what I can see, your device stops responding:

Sep 23 16:26:55 saturn kernel: QTD(0xc54d55c0) at 0x2b9895c0:
Sep 23 16:26:55 saturn kernel: next=0x2b989580<> altnext=0x00000001<T>
Sep 23 16:26:55 saturn kernel: status=0x00001100: toggle=0 bytes=0x0 ioc=0 c_page=0x1
Sep 23 16:26:55 saturn kernel: cerr=0 pid=1 stat=NOT_ACTIVE
Sep 23 16:26:55 saturn kernel: buffer[0]=0x2bd15600
Sep 23 16:26:55 saturn kernel: buffer[1]=0x2bd98000
Sep 23 16:26:55 saturn kernel: buffer[2]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer[3]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer[4]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[0]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[1]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[2]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[3]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[4]=0x00000000


Sep 23 16:26:55 saturn kernel: QTD(0xc54d5580) at 0x2b989580:
Sep 23 16:26:55 saturn kernel: next=0x2b989540<> altnext=0x00000001<T>
Sep 23 16:26:55 saturn kernel: status=0x10000180: toggle=0 bytes=0x1000 ioc=0 c_page=0x0
Sep 23 16:26:55 saturn kernel: cerr=0 pid=1 stat=ACTIVE
Sep 23 16:26:55 saturn kernel: buffer[0]=0x2bd98600
Sep 23 16:26:55 saturn kernel: buffer[1]=0x2c077000
Sep 23 16:26:55 saturn kernel: buffer[2]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer[3]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer[4]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[0]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[1]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[2]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[3]=0x00000000
Sep 23 16:26:55 saturn kernel: buffer_hi[4]=0x00000000

You can see here that the second QTD is still ACTIVE, i.e. a transfer is
still waiting to be carried out. That is why it times out. If you had a
missing interrupt problem, the "stat=ACTIVE" transfer would at least be
"HALTED", or "stat=ACTIVE" would be cleared and bytes != 0, to indicate
a short transfer. Neither of these is the case, so either:

a) Your EHCI controller stopped.
b) Your USB device got some kind of unrecoverable error.
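
The reasoning above can be checked against the two "status=" words in
the log. What follows is only a sketch, assuming the qTD token bit
layout from the EHCI 1.0 specification; the function names
decode_qtd_token and classify are hypothetical helpers for
illustration, not part of the FreeBSD driver.

```python
# Sketch: decode an EHCI qTD token word as printed in the "status=" log field.
# Bit layout assumed from the EHCI 1.0 specification (qTD token).

def decode_qtd_token(token):
    """Split a 32-bit qTD token into the fields shown in the debug log."""
    return {
        "toggle": (token >> 31) & 0x1,     # data toggle
        "bytes":  (token >> 16) & 0x7FFF,  # bytes still to transfer
        "ioc":    (token >> 15) & 0x1,     # interrupt on complete
        "c_page": (token >> 12) & 0x7,     # current buffer page
        "cerr":   (token >> 10) & 0x3,     # error counter
        "pid":    (token >> 8) & 0x3,      # 0=OUT, 1=IN, 2=SETUP
        "active": bool(token & 0x80),      # status bit 7
        "halted": bool(token & 0x40),      # status bit 6
    }


def classify(token):
    """Hypothetical helper: name the qTD state, following the mail's reasoning."""
    fields = decode_qtd_token(token)
    if fields["halted"]:
        return "halted"      # transfer stopped on an error
    if fields["active"]:
        return "pending"     # controller has not executed it yet
    if fields["bytes"] != 0:
        return "short"       # finished, but with bytes left over
    return "complete"


if __name__ == "__main__":
    # The two status words taken from the log above.
    for token in (0x00001100, 0x10000180):
        print(hex(token), decode_qtd_token(token), classify(token))
```

With the values from the log, the first qTD decodes as finished with no
bytes left, and the second as still pending with 0x1000 bytes to go,
which matches the NOT_ACTIVE and ACTIVE annotations printed by the
kernel.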

Right now I am a little short of time, but I have some more patches that
you can try. For example, I want to see what happens when you try to
re-set the configuration value.
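
For reference, re-setting the configuration value corresponds on the
wire to a standard SET_CONFIGURATION control request. The patch itself
is not shown in this thread, so the following is only a sketch of the
8-byte SETUP packet as defined in chapter 9 of the USB 2.0
specification; the helper name and the configuration value 1 are
assumptions made for illustration.

```python
import struct

USB_REQ_SET_CONFIGURATION = 0x09  # standard request code (USB 2.0, chapter 9)


def set_configuration_setup(config_value):
    """Build the 8-byte SETUP packet for a standard SET_CONFIGURATION request."""
    if not 0 <= config_value <= 0xFF:
        raise ValueError("configuration value must fit in one byte")
    bm_request_type = 0x00  # host-to-device, standard, recipient = device
    # Fields: bmRequestType, bRequest, wValue, wIndex, wLength (little-endian).
    return struct.pack("<BBHHH", bm_request_type,
                       USB_REQ_SET_CONFIGURATION, config_value, 0, 0)


if __name__ == "__main__":
    # Assumed configuration value 1; a real device reports its own value
    # in its configuration descriptor.
    print(set_configuration_setup(1).hex())
```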

--HPS


