From owner-freebsd-usb@FreeBSD.ORG Wed Jan 26 08:48:24 2011
Message-ID: <4D3FDFD4.7090002@gmail.com>
Date: Wed, 26 Jan 2011 10:48:20 +0200
From: CDP
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.13) Gecko/20101210 Thunderbird/3.1.7
MIME-Version: 1.0
To: freebsd-usb@freebsd.org
References: <4D3CAE4E.2040407@gmail.com>
 <201101241034.07591.hselasky@c2i.net> <4D3D5DBF.3080600@gmail.com>
 <201101241227.36923.hselasky@c2i.net>
In-Reply-To: <201101241227.36923.hselasky@c2i.net>
X-Enigmail-Version: 1.1.2
Content-Type: multipart/mixed; boundary="------------090901020702040209060401"
Cc: mav@freebsd.org
Subject: Re: System lockups caused by USB external HDD
X-BeenThere: freebsd-usb@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD support for USB
X-List-Received-Date: Wed, 26 Jan 2011 08:48:24 -0000

This is a multi-part message in MIME format.
--------------090901020702040209060401
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit

On 01/24/11 13:27, Hans Petter Selasky wrote:
> On Monday 24 January 2011 12:08:47 CDP wrote:
>> On 01/24/11 11:34, Hans Petter Selasky wrote:
>>> On Monday 24 January 2011 10:00:53 CDP wrote:
>>>> On 01/24/11 01:56, Daniel O'Connor wrote:
>>>>> On 24/01/2011, at 9:10, CDP wrote:
>>>>>> g_vfs_done():da0s2[WRITE(offset=xxxxxxxxxxxx, length=16384)]error = 5
>>>>>> [several more lines similar to the above]
>>>>>> panic: softdep_move_dependencies: need merge code
>>>>>> cpuid = 0
>>>>>> KDB: stack backtrace:
>>>>>> #0 0x... at kdb_backtrace+0x5e
>>>>>> #1 0x... at panic+0x182
>>>>>
>>>>> It looks like the disk is dying, or the FS is corrupt (the former might
>>>>> cause the later).
>>>>>
>>>>> Can you run smartctl on the disk? Unfortunately a lot of enclosures
>>>>> reject SMART commands so you might not be able to :(
>>>>
>>>> I have attached the output of smartctl -d sat -a /dev/da0. I didn't yet
>>>> run a SMART long test for the simple reason that the disk is going into
>>>> sleep mode and interrupts it. Haven't bothered to keep it alive for a
>>>> long test but I might just do that.
>>>>
>>>> Although, I doubt it's a disk failure, since I do backups on it without
>>>> problems by using FreeBSD 7.3, on the same space where FreeBSD 8.x
>>>> fails. And I am talking about over 150GB of data in one run, while
>>>> 8.2-RC2 crashes after 5-10GB. I have experienced disk failure in the
>>>> past, on SATA, and a few read/write errors never caused a system lockup.
>>>>
>>>> My feeling is that enough traffic on USB causes the problem, and that
>>>> this problem is only present in the new USB stack.
>>>> Unfortunately downgrading to 7.x is not an option because there are
>>>> things that won't work on this notebook.
>>>
>>> If you run a simple test like this:
>>>
>>> dd if=/dev/da0 of=/dev/null bs=65536
>>> dd if=/dev/da0 of=/dev/null bs=16384
>>>
>>> Do you then see any errors?
>>>
>>> Do you have a spare USB memory stick which you could run similar write
>>> tests on?
>>
>> Both reads fail with I/O error, while writes to an unused partition seem
>> to be fine (I interrupted the writes after a while):
>>
>> % dd if=/dev/da0 of=/dev/null bs=65536
>> dd: /dev/da0: Input/output error
>> 191732+0 records in
>> 191732+0 records out
>> 12565348352 bytes transferred in 429.999272 secs (29221790 bytes/sec)
>>
>> % dd if=/dev/da0 of=/dev/null bs=16384
>> dd: /dev/da0: Input/output error
>> 126427+0 records in
>> 126427+0 records out
>> 2071379968 bytes transferred in 169.431766 secs (12225452 bytes/sec)
>>
>> # dd if=/dev/random of=/dev/da0s3 bs=65536
>> ^C329378+0 records in
>> 329377+0 records out
>> 21586051072 bytes transferred in 1003.020293 secs (21521051 bytes/sec)
>>
>> # dd if=/dev/random of=/dev/da0s3 bs=16384
>> ^C679571+0 records in
>> 679571+0 records out
>> 11134091264 bytes transferred in 690.135793 secs (16133189 bytes/sec)
>>
>> This is what I get in /var/log/messages when the I/O error occurs:
>> (da0:umass-sim0:0:0:0): AutoSense failed
>>
>> However, I experience no lockup. Maybe this situation is not handled
>> correctly at another level ?
>
> I haven't looked into the code of CAM or GEOM that much so I won't say too
> much about that. I believe the USB/umass is not to blame. What you could do is
> to add a conditional error printout in "umass_t_bbb_status_callback()" in
> /sys/dev/usb/storage/umass.c when the error happens. If that error is not a
> USB transport error, then we are most likely seeing a SCSI issue in layers
> above umass. Or if you have access to USB analyser use that. There is now also
> the option to trace USB from the kernel itself, but the feature is in its
> early development.

You are right. I've tracked the problem down to CAM (cam_periph.c:
camperiphsensedone()). I changed the code to behave as it did in 7.3, and
that mitigates the problem: I no longer get "AutoSense failed" errors, and
I get no lockups or crashes, not even when using softupdates on the
external HDD.

The pauses in disk operations still happen, but they don't seem to cause
any further issues. I haven't looked into them.

I've attached a patch. I don't know whether this behavior is correct, and I
hope someone who knows CAM can take a look at this issue.

Claudiu.

--------------090901020702040209060401
Content-Type: text/plain; name="cam_periph.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cam_periph.patch"

--- sys/cam/cam_periph.c.orig	2011-01-26 09:38:21.000000000 +0200
+++ sys/cam/cam_periph.c	2011-01-26 09:38:02.000000000 +0200
@@ -1024,7 +1024,9 @@
 	int frozen = 0;
 	u_int sense_key;
 	int depth = done_ccb->ccb_h.recovery_depth;
+	int xpt_done_ccb;
 
+	xpt_done_ccb = FALSE;
 	status = done_ccb->ccb_h.status;
 	if (status & CAM_DEV_QFRZN) {
 		frozen = 1;
@@ -1049,14 +1051,22 @@
 			if (sense_key != SSD_KEY_NO_SENSE) {
 				saved_ccb->ccb_h.status |=
 				    CAM_AUTOSNS_VALID;
-			} else {
+
+				xpt_done_ccb = TRUE;
+			} /*else {
 				saved_ccb->ccb_h.status &=
 				    ~CAM_STATUS_MASK;
 				saved_ccb->ccb_h.status |=
 				    CAM_AUTOSENSE_FAIL;
-			}
+			}*/
 			bcopy(saved_ccb, done_ccb, sizeof(union ccb));
 			xpt_free_ccb(saved_ccb);
+
+			periph->flags &= ~CAM_PERIPH_RECOVERY_INPROG;
+
+			if (xpt_done_ccb == FALSE)
+				xpt_action(done_ccb);
+
 			break;
 		}
 		default:
@@ -1084,7 +1094,9 @@
 	 */
 	if (frozen != 0)
 		done_ccb->ccb_h.status |= CAM_DEV_QFRZN;
-	(*done_ccb->ccb_h.cbfcnp)(periph, done_ccb);
+
+	if (xpt_done_ccb == TRUE)
+		(*done_ccb->ccb_h.cbfcnp)(periph, done_ccb);
 }
 
 static void

--------------090901020702040209060401--
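
For readers skimming the patch, the following is a minimal, self-contained sketch of the one decision it changes. It is not the CAM code. It assumes only what the diff shows for the CAM_REQ_CMP case of camperiphsensedone(): the stock 8.x path marks a NO SENSE result as CAM_AUTOSENSE_FAIL and then calls the completion callback, while the patched path skips the callback and resubmits the original CCB with xpt_action(). The enum and function names below are hypothetical and exist only for illustration.

```c
/*
 * Illustrative model only: these types and functions are hypothetical
 * stand-ins, not definitions from sys/cam.
 */

/* Outcome of the REQUEST SENSE that CAM issued after a failed command. */
enum sense_key {
	KEY_NO_SENSE = 0,	/* device reported nothing useful */
	KEY_OTHER = 1		/* any other sense key */
};

/* What the sense-done handler does next with the original CCB. */
enum next_step {
	COMPLETE_WITH_SENSE,	 /* flag sense data valid, call the callback */
	COMPLETE_AUTOSENSE_FAIL, /* flag autosense failure, call the callback */
	REISSUE_COMMAND		 /* resubmit the original CCB, no callback yet */
};

/* Stock 8.x behaviour, per the else-branch the patch comments out. */
static enum next_step
stock_policy(enum sense_key key)
{
	if (key != KEY_NO_SENSE)
		return (COMPLETE_WITH_SENSE);
	return (COMPLETE_AUTOSENSE_FAIL);
}

/* Patched behaviour: NO SENSE leads to a resubmission instead. */
static enum next_step
patched_policy(enum sense_key key)
{
	if (key != KEY_NO_SENSE)
		return (COMPLETE_WITH_SENSE);
	return (REISSUE_COMMAND);
}
```

The two policies differ only for KEY_NO_SENSE: stock_policy() reports an autosense failure to the caller, while patched_policy() retries the command. Whether resubmitting is the correct recovery is the open question raised in the message.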