From owner-freebsd-current@FreeBSD.ORG Sat Mar 13 03:33:47 2010
Date: Fri, 12 Mar 2010 19:33:46 -0800
Message-ID: <7d6fde3d1003121933s4ba7b57fw6542628c16edf723@mail.gmail.com>
In-Reply-To: <7d6fde3d1003102158o7834ca67lce3eca23aa723fd1@mail.gmail.com>
From: Garrett Cooper
To: Giovanni Trematerra
Cc: Tom Couch, FreeBSD Current
Subject: Re: Removing USB keyboard after filesystems synced causes panic with destroyed mutex twa(4)?
List-Id: Discussions about the use of FreeBSD-current

On Wed, Mar 10, 2010 at 9:58 PM, Garrett Cooper wrote:
> On Wed, Mar 10, 2010 at 2:07 PM, Tom Couch wrote:
>> Hi FreeBSD-current,
>>     My name is Tom Couch,
>> I am part of the 3ware driver team recently acquired by LSI.
>> I believe Giovanni's patch, below, is the correct fix for this bug.
>>
>> I am available to maintain the twa driver, now that I am on this list.
>> Let me know how I can help,
>>
>> Tom
>>
>>
>> On Wed, Mar 10, 2010 at 1:56 PM, Giovanni Trematerra <
>> giovanni.trematerra@gmail.com> wrote:
>>
>>> On Sun, Mar 7, 2010 at 11:24 AM, Garrett Cooper wrote:
>>> > On Sun, Mar 7, 2010 at 2:07 AM, Garrett Cooper wrote:
>>> >> Hi Alexander and Hans,
>>> >>    I recently did the following which generated a panic on a
>>> >> 9-CURRENT kernel compiled on the 26th:
>>> >>
>>> >> 1. Executed reboot
>>> >> 2. Removed keyboard.
>>> >> 3. Some time after `All buffers synced\nUptime: ...' was displayed,
>>> >> the keyboard was registered disconnected.
>>> >> 4. The interrupt was delivered to my twa(4) enabled card and the
>>> >> kernel panicked, like so:
>>> >>
>>> >> ugen2.2: at usbus2 (disconnected)
>>> >> uhub8: at uhub2, port 1, addr 2 (disconnected)
>>> >> ugen2.3: at usbus2 (disconnected)
>>> >> ukbd0: at uhub8, port 3, addr 3 (disconnected)
>>> >> uhid0: at uhub8, port 3, addr 3 (disconnected)
>>> >> panic: mtx_lock_spin() of destroyed mutex @ /usr/src/sys/dev/twa/tw_cl_intr.c:88
>>> >>
>>> >> cpuid = 1
>>> >> KDB: enter: panic
>>> >> [thread pid 12 tid 100025 ]
>>> >> Stopped at         kdb_enter+0x3d: movq     $0,0x40289c(%rip)
>>> >> db>
>>> >>
>>> >>    I wish I could provide you with more details, but unfortunately I
>>> >> the USB bus isn't registering the fact that I'm reattaching the
>>> >> keyboard right now and the box won't reboot automatically :( (didn't
>>> >> set the right sysctl beforehand to panic automatically). I'll try and
>>> >> reproduce the issue again, but I was just wondering whether or not you
>>> >> guys had seen this problem before.
>>> >
>>> >    Phew... it's reproducible with that kernel. Here's what I did
>>> > exactly (because my original directions were incorrect):
>>> >    1. Hit power button (for S5).
>>> >    2. Disconnect keyboard RIGHT as `Uptime: ...' is displayed.
>>> >    Kernel panicked on my system again. Now to figure out if it still
>>> > exists with a kernel compiled today, and also how to debug it if it
>>> > still does exist :/...
>>> > Thanks,
>>> > -Garrett
>>>
>>> Hi Garrett,
>>> Could you please try the patch below and report back?
>>>
>>> Thank you
>>>
>>> diff -r cab6489de66d sys/dev/twa/tw_cl_intr.c
>>> --- a/sys/dev/twa/tw_cl_intr.c        Wed Mar 03 04:51:13 2010 -0500
>>> +++ b/sys/dev/twa/tw_cl_intr.c        Wed Mar 10 06:29:05 2010 -0500
>>> @@ -75,9 +75,12 @@ tw_cl_interrupt(struct tw_cl_ctlr_handle
>>>      if (ctlr == NULL)
>>>               goto out;
>>>
>>> -     /* If we get an interrupt while resetting, it is a shared
>>> -        one for another device, so just bail */
>>> -     if (ctlr->state & TW_CLI_CTLR_STATE_RESET_IN_PROGRESS)
>>> +     /*
>>> +      *  If we get an interrupt while resetting or shutting down,
>>> +      *  it is a shared one for another device, so just bail
>>> +      */
>>> +     if (ctlr->state & TW_CLI_CTLR_STATE_RESET_IN_PROGRESS ||
>>> +                     (ctrl->state & TW_CLI_CTLR_STATE_ACTIVE) == 0)
>>>              goto out;
>>>
>>>      /*

    Apart from the typo above (s/ctrl/ctlr/), reboot now works
properly. The only problem is that bootup is really wonky now: the
RAID had a LOT of trouble attaching to cam(4) (it failed in 2 of 3
cold boot attempts). If this change didn't take that into account, an
additional branch condition may need to be added to the if-statement
above. However, if the old behavior was incorrect and the new behavior
is correct, such that the RAID controller is now showing the same bus
detection timeout issue that occurs with a lot of USB devices and some
RAID controllers today, this could be extremely problematic.
    So, while reboot looks better than before, the patch isn't ready
for prime time yet. I'd rather the bug be filed with the patch you
provided (after the typo is fixed), with the caveat above noted, and
NOT committed, because the adverse effect(s) seem a bit more annoying
than the panic described earlier.
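    To make the corrected check concrete, here is a minimal
standalone sketch of the guard with the typo fixed. It is not the
driver code: the flag values, `struct ctlr_ctxt`, `intr_should_bail()`
and `intr_guard_selfcheck()` are hypothetical names made up for
illustration, and only the shape of the condition comes from the patch
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical flag values, for illustration only. The real
 * TW_CLI_CTLR_STATE_* definitions live in the twa(4) sources.
 */
#define CTLR_STATE_ACTIVE		(1u << 0)
#define CTLR_STATE_RESET_IN_PROGRESS	(1u << 1)

/* Hypothetical stand-in for the controller context. */
struct ctlr_ctxt {
	uint32_t state;
};

/*
 * Returns true when the interrupt handler should return early,
 * before touching any lock: no context, a reset in progress, or a
 * controller that is not (or no longer) active.
 */
static bool
intr_should_bail(const struct ctlr_ctxt *ctlr)
{
	if (ctlr == NULL)
		return (true);
	if ((ctlr->state & CTLR_STATE_RESET_IN_PROGRESS) != 0 ||
	    (ctlr->state & CTLR_STATE_ACTIVE) == 0)
		return (true);
	return (false);
}

/* Walks through the four cases the guard has to distinguish. */
static void
intr_guard_selfcheck(void)
{
	struct ctlr_ctxt c = { .state = 0 };

	assert(intr_should_bail(NULL));		/* no context */
	assert(intr_should_bail(&c));		/* not active (shut down) */

	c.state = CTLR_STATE_ACTIVE;
	assert(!intr_should_bail(&c));		/* normal operation */

	c.state |= CTLR_STATE_RESET_IN_PROGRESS;
	assert(intr_should_bail(&c));		/* resetting */
}
```

    If bootup does need the extra branch condition mentioned above, it
would presumably go into this same predicate; what that condition
should be is still an open question.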
> I'll give the patch a try sometime before the weekend; I have a
> critical deadline that I need to work through and the machine can't be
> taken offline until then.

Thanks :),
-Garrett