From: Hans Petter Selasky
To: CDP
Cc: mav@freebsd.org, freebsd-usb@freebsd.org
Date: Mon, 24 Jan 2011 12:27:36 +0100
Subject: Re: System lockups caused by USB external HDD
Message-Id: <201101241227.36923.hselasky@c2i.net>
In-Reply-To: <4D3D5DBF.3080600@gmail.com>

On Monday 24 January 2011 12:08:47 CDP wrote:
> On 01/24/11 11:34, Hans Petter Selasky wrote:
> > On Monday 24 January 2011 10:00:53 CDP wrote:
> >> On 01/24/11 01:56, Daniel O'Connor wrote:
> >>> On 24/01/2011, at 9:10, CDP wrote:
> >>>> g_vfs_done():da0s2[WRITE(offset=xxxxxxxxxxxx, length=16384)]error = 5
> >>>> [several more lines similar to the above]
> >>>> panic: softdep_move_dependencies: need merge code
> >>>> cpuid = 0
> >>>> KDB: stack backtrace:
> >>>> #0 0x... at kdb_backtrace+0x5e
> >>>> #1 0x... at panic+0x182
> >>>
> >>> It looks like the disk is dying, or the FS is corrupt (the former might
> >>> cause the later).
> >>>
> >>> Can you run smartctl on the disk? Unfortunately a lot of enclosures
> >>> reject SMART commands so you might not be able to :(
> >>
> >> I have attached the output of smartctl -d sat -a /dev/da0. I didn't yet
> >> run a SMART long test for the simple reason that the disk is going into
> >> sleep mode and interrupts it. Haven't bothered to keep it alive for a
> >> long test but I might just do that.
> >>
> >> Although, I doubt it's a disk failure, since I do backups on it without
> >> problems by using FreeBSD 7.3, on the same space where FreeBSD 8.x
> >> fails. And I am talking about over 150GB of data in one run, while
> >> 8.2-RC2 crashes after 5-10GB. I have experienced disk failure in the
> >> past, on SATA, and a few read/write errors never caused a system lockup.
> >>
> >> My feeling is that enough traffic on USB causes the problem, and that
> >> this problem is only present in the new USB stack.
> >> Unfortunately downgrading to 7.x is not an option because there are
> >> things that won't work on this notebook.
> >
> > If you run a simple test like this:
> >
> > dd if=/dev/da0 of=/dev/null bs=65536
> > dd if=/dev/da0 of=/dev/null bs=16384
> >
> > Do you then see any errors?
> >
> > Do you have a spare USB memory stick which you could run similar write
> > tests on?
>
> Both reads fail with I/O error, while writes to an unused partition seem
> to be fine (I interrupted the writes after a while):
>
> % dd if=/dev/da0 of=/dev/null bs=65536
> dd: /dev/da0: Input/output error
> 191732+0 records in
> 191732+0 records out
> 12565348352 bytes transferred in 429.999272 secs (29221790 bytes/sec)
>
> % dd if=/dev/da0 of=/dev/null bs=16384
> dd: /dev/da0: Input/output error
> 126427+0 records in
> 126427+0 records out
> 2071379968 bytes transferred in 169.431766 secs (12225452 bytes/sec)
>
> # dd if=/dev/random of=/dev/da0s3 bs=65536
> ^C329378+0 records in
> 329377+0 records out
> 21586051072 bytes transferred in 1003.020293 secs (21521051 bytes/sec)
>
> # dd if=/dev/random of=/dev/da0s3 bs=16384
> ^C679571+0 records in
> 679571+0 records out
> 11134091264 bytes transferred in 690.135793 secs (16133189 bytes/sec)
>
> This is what I get in /var/log/messages when the I/O error occurs:
> (da0:umass-sim0:0:0:0): AutoSense failed
>
> However, I experience no lockup. Maybe this situation is not handled
> correctly at another level ?

I haven't looked into the CAM or GEOM code much, so I won't say too much
about that. I believe the USB/umass layer is not to blame.

What you could do is add a conditional error printout in
"umass_t_bbb_status_callback()" in /sys/dev/usb/storage/umass.c that
fires when the error happens. If the error is not a USB transport error,
then we are most likely seeing a SCSI issue in the layers above umass.

Alternatively, if you have access to a USB analyser, use that. There is
now also the option to trace USB from the kernel itself, but that feature
is still in early development.

--HPS
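
A possible follow-up to the dd read tests quoted above, offered only as a
minimal sketch: the "records in" count multiplied by the block size gives the
byte offset at which each read stopped, and re-reading a small window at that
offset shows whether the failure is tied to a fixed location on the disk. The
helper names records_to_offset and reread_window are hypothetical and are not
part of FreeBSD or of this thread; only standard dd operands (if, of, bs,
skip, count) are assumed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of FreeBSD or the thread.

# records_to_offset RECORDS BS
# Print the byte offset at which dd stopped, given the "records in" count
# and the block size that was used for the run.
records_to_offset() {
    echo $(( $1 * $2 ))
}

# reread_window DEV OFFSET BS COUNT
# Re-read COUNT blocks of BS bytes, starting at the block that contains
# byte OFFSET. Returns dd's exit status: non-zero means the read failed.
reread_window() {
    dd if="$1" of=/dev/null bs="$3" skip=$(( $2 / $3 )) count="$4"
}

# Example (device name taken from the thread; adjust as needed):
#   off=$(records_to_offset 191732 65536)    # 12565348352
#   reread_window /dev/da0 "$off" 65536 16
```

In the quoted output the two runs stopped at different offsets (12565348352
and 2071379968 bytes). If a re-read of either window succeeds, that would
hint that the error is not caused by one fixed bad sector, although it would
not by itself say which layer is at fault.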