From: Juergen Lock
Date: Sat, 23 Sep 2006 21:27:14 +0200
To: Hans Petter Selasky
Cc: freebsd-usb@freebsd.org
Subject: Re: umass0: BBB reset failed, TIMEOUT (again)
Message-ID: <20060923192714.GA11310@saturn.kn-bremen.de>
In-Reply-To: <200609231836.23889.hselasky@c2i.net>

On Sat, Sep 23, 2006 at 06:36:22PM +0200, Hans Petter Selasky wrote:
> On Saturday 23 September 2006 16:57, Juergen Lock wrote:
> > On Fri, Sep 22, 2006 at 08:34:29AM +0200, Hans Petter Selasky wrote:
> > > On Friday 22 September 2006 00:04, Juergen Lock wrote:
> > > > On Wed, Sep 20, 2006 at 11:18:32AM +0200, Hans Petter Selasky wrote:
> > > > > On Wednesday 20 September 2006 03:11, Juergen Lock wrote:
> > > > > > Today for the first time since this box got a new board I tried to
> > > > > > copy data onto the usb cardreader, and after copying for a while it
> > > > > > suddenly stopped (led stopped flashing, no further io), and after
> > > > > > some time I had the above in dmesg. And that was it, cp process
> > > > > > hung, no way to kill it. Unplugged the thing, and got the expected
> > > > > > panic: vinvalbuf: dirty bufs. Tried the same thing from linux
> > > > > > (after dosfsck), and there copying stopped for a while too, but it
> > > > > > then continued and finished. Is this some kind of new hardware
> > > > > > quirk of the new board's ehci controller, that linux recovers from?
> > > > > > (via, there already is a `dropped interrupt' fix for it, which
> > > > > > helped with my last board...)
> > >
> > > We can easily check for dropped interrupts. Please run:
> > >
> > > sysctl hw.usb.ehci.debug=15
> > > sysctl hw.usb.umass.debug=-1
> > >
> > > when your device hangs, and then send me the log again.
> > >
> > > > Ok. This time writing worked, but reading back to verify (cmp) seemed
> > > > to hang. Did the sysctl (see below), then a while later I got an IO
> > > > error. Tried to umount, got another IO error, tried umount -f, got a
> > > > panic (probably expected.) I have now installed mtools and won't mount
> > > > umass devices on this box anymore... :/ (Btw when I later tried to
> > > > mcopy the file off the thing using the original kernel I noticed the
> > > > led was off after it hung, dunno if that also was the case when I tried
> > > > it with the new code but I would suspect so. At least this time, since
> > > > it wasn't mounted, I could unplug it without getting a panic...)
> > > >
> > > > Oh, one thing that occurred to me: Even if you may be able to get
> > > > around (what appears to be) hardware quirks like this by retrying IO
> > > > or resetting the device, that probably won't work when you have a
> > > > umass tape drive (sa), since with tape you can't just retry a
> > > > read/write, and resetting it may even rewind, with the next write
> > > > erasing everything on the tape. Just a thought...
> > > >
> > > > Anyway, here's the syslog of the `experiment', beginning after the
> > > > sysctl:
> > >
> > > From the log I see that it looks like the state machine of your device has
> > > locked up. Even the reset command is timing out. That should not happen.
> > > We could try to reconfigure the device, when reset fails.
> >
> > OK. I applied the umass_transfer_start(sc, UMASS_T_BBB_STATUS);
> > patch in your other message and tried mcopy'ing off it again.
> > This time I got a bunch of errors when first connecting it
> > (well, more than usual) and /dev/da2s1 didn't appear, so I had to
> > replug. (This may be a quirk of the device not of the board, since,
> > unlike the IO problems, it also happened sometimes with my old board.)
> > Anyway, I have left logs of that in, just in case...
> > The first read this time hung with the led on, I did the sysctls,
> > and soon after that (after messages were logged) the mcopy command
> > exited without any error, leaving a truncated copy! Just in case,
> > I did an fsck_msdos, but it found nothing wrong. Changed the sysctls
> > back to 0 and tried another copy, this time it hung with the led off.
> > Turned the sysctls back on and waited until mcopy exited, this time
> > with an IO error (led was still off.) Unplugged, and bzip2'd the log
> > (it's big, probably because I left the sysctls on while doing the fsck.)
> >
> > I guess the usb controller on this board is just weird... :/
>
> From what I can see your device stops responding:
>
> Sep 23 16:26:55 saturn kernel: QTD(0xc54d55c0) at 0x2b9895c0:
> Sep 23 16:26:55 saturn kernel: next=0x2b989580<> altnext=0x00000001
> Sep 23 16:26:55 saturn kernel: status=0x00001100: toggle=0 bytes=0x0 ioc=0
> c_page=0x1
> Sep 23 16:26:55 saturn kernel: cerr=0 pid=1 stat=NOT_ACTIVE
> Sep 23 16:26:55 saturn kernel: buffer[0]=0x2bd15600
> Sep 23 16:26:55 saturn kernel: buffer[1]=0x2bd98000
> Sep 23 16:26:55 saturn kernel: buffer[2]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer[3]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer[4]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[0]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[1]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[2]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[3]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[4]=0x00000000
>
>
> Sep 23 16:26:55 saturn kernel: QTD(0xc54d5580) at 0x2b989580:
> Sep 23 16:26:55 saturn kernel: next=0x2b989540<> altnext=0x00000001
> Sep 23 16:26:55 saturn kernel: status=0x10000180: toggle=0 bytes=0x1000 ioc=0
> c_page=0x0
> Sep 23 16:26:55 saturn kernel: cerr=0 pid=1 stat=ACTIVE
> Sep 23 16:26:55 saturn kernel: buffer[0]=0x2bd98600
> Sep 23 16:26:55 saturn kernel: buffer[1]=0x2c077000
> Sep 23 16:26:55 saturn kernel: buffer[2]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer[3]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer[4]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[0]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[1]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[2]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[3]=0x00000000
> Sep 23 16:26:55 saturn kernel: buffer_hi[4]=0x00000000
>
> You see here that a transfer is waiting to be transferred. That is why it
> times out. If you had a missing interrupt problem, the "stat=ACTIVE" transfer
> would be "HALTED" at least, or "stat=ACTIVE" cleared and bytes != 0,
> to indicate a short transfer. Neither of these is the case so:
>
> a) Your EHCI controller stopped.
> b) Your USB device got some kind of unrecoverable error.

I suspect the former, since with the previous board I never had these
problems. (well, except having to apply the `dropped interrupt' patch...)

> Right now I am a little short of time, but I have some more patches that you
> can try. For example I want to see what happens when you try to re-set the
> configuration value.

OK, send patches whenever you find the time...

thanx,
	Juergen
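
For anyone who wants to check the two "status=" words in the quoted QTD dumps by hand, here is a minimal sketch in Python. It assumes the qTD token layout from the EHCI 1.0 specification (toggle in bit 31, bytes in bits 30:16, ioc in bit 15, c_page in bits 14:12, cerr in bits 11:10, pid in bits 9:8, status flags in bits 7:0). The function names and the flag labels other than ACTIVE and HALTED are made up for illustration and are not taken from the FreeBSD driver. The classify() helper is only a rough paraphrase of the reasoning quoted above, not a real diagnostic.

```python
# Status flags in the low byte of the qTD token (labels are illustrative).
STATUS_BITS = {
    0x80: "ACTIVE",
    0x40: "HALTED",
    0x20: "BUFERR",
    0x10: "BABBLE",
    0x08: "XACTERR",
    0x04: "MISSED",
    0x02: "SPLIT",
    0x01: "PING",
}


def decode_qtd_token(token):
    """Split a 32-bit qTD token into the fields shown in the debug log."""
    flags = [name for bit, name in STATUS_BITS.items() if token & bit]
    return {
        "toggle": (token >> 31) & 0x1,
        "bytes": (token >> 16) & 0x7FFF,  # bytes still to be transferred
        "ioc": (token >> 15) & 0x1,
        "c_page": (token >> 12) & 0x7,
        "cerr": (token >> 10) & 0x3,
        "pid": (token >> 8) & 0x3,
        # Hypothetical convention: report NOT_ACTIVE when no flag is set.
        "flags": flags or ["NOT_ACTIVE"],
    }


def classify(token):
    """Rough reading of a qTD, following the argument in the quoted mail."""
    fields = decode_qtd_token(token)
    if "HALTED" in fields["flags"]:
        return "halted"      # controller gave up on this transfer
    if "ACTIVE" in fields["flags"]:
        return "pending"     # controller has not executed it yet
    if fields["bytes"] != 0:
        return "short"       # finished early, bytes left over
    return "complete"


if __name__ == "__main__":
    for token in (0x00001100, 0x10000180):
        print(hex(token), decode_qtd_token(token), classify(token))
```

With the two values from the log, the first qTD decodes as not active with no bytes left, and the second as still active with 0x1000 bytes outstanding, which agrees with the fields the kernel printed.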