From: Andrey Chernov
Date: Thu, 23 Jun 2011 16:54:44 +0400
To: "Kenneth D. Merry", will@FreeBSD.ORG
Cc: Kostik Belousov, Eir Nym, "Justin T. Gibbs", current@FreeBSD.ORG, will@FreeBSD.ORG
Subject: Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)
Message-ID: <20110623125443.GA42879@vniz.net>
In-Reply-To: <20110622200919.GA72504@nargothrond.kdm.org>

Apparently there is another problem, related to plain ATA CD/DVD drives.
With r223443 the nature of the hang has changed. Nothing waits in the
"caplck" state any more. Instead, xpt_thrd waits in the "ccb_scan" state
forever, and these messages keep repeating:

run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
... and so on.

On Wed, Jun 22, 2011 at 02:09:19PM -0600, Kenneth D. Merry wrote:
> On Wed, Jun 22, 2011 at 08:13:25 +0400, Andrey Chernov wrote:
> > On Tue, Jun 21, 2011 at 09:54:04PM -0600, Kenneth D. Merry wrote:
> > > These two are interesting:
> > >
> > > > http://img825.imageshack.us/img825/1249/21062011014m.jpg
> > > > http://img839.imageshack.us/img839/3791/21062011015.jpg
> > >
> > > It looks like the GEOM event thread is stuck inside the cd(4) driver.
> > > The cd(4) driver is trying to acquire the peripheral lock, and is
> > > sleeping until it gets it.
> > >
> > > What isn't clear is who is holding it. The ps output shows an idle
> > > thread running on CPU 1, and thread 100014 (taskq) running on CPU 0.
> > > Unfortunately I don't see a stack trace for that. (I might have
> > > missed it.)
> > >
> > > Do you happen to have the image with the stack trace for that thread?
> >
> > I don't have the image because no disks are mounted at that stage and
> > the swap slice is not attached. But I can issue more specific DDB
> > commands to narrow it down. Just tell me in detail what you need.
> >
> > BTW, the machine has 2 DVD drives, both attached to a Marvell IDE
> > plain ATA interface. They always worked before.
> >
> > Are you sure that something is holding the lock? 'show lock' shows
> > absolutely nothing. It is empty.
>
> Well, after looking at the code a little more, it looks like the "lock"
> that is being held is the periph lock, which is really just a flag.
> So 'show lock' wouldn't show anything relevant. Here's cam_periph_hold():
>
> int
> cam_periph_hold(struct cam_periph *periph, int priority)
> {
> 	int error;
>
> 	/*
> 	 * Increment the reference count on the peripheral
> 	 * while we wait for our lock attempt to succeed
> 	 * to ensure the peripheral doesn't disappear out
> 	 * from under us while we sleep.
> 	 */
>
> 	if (cam_periph_acquire(periph) != CAM_REQ_CMP)
> 		return (ENXIO);
>
> 	mtx_assert(periph->sim->mtx, MA_OWNED);
> 	while ((periph->flags & CAM_PERIPH_LOCKED) != 0) {
> 		periph->flags |= CAM_PERIPH_LOCK_WANTED;
> 		if ((error = mtx_sleep(periph, periph->sim->mtx, priority,
> 		    "caplck", 0)) != 0) {
> 			cam_periph_release_locked(periph);
> 			return (error);
> 		}
> 	}
>
> 	periph->flags |= CAM_PERIPH_LOCKED;
> 	return (0);
> }
>
> The GEOM event thread is stuck sleeping in the mtx_sleep() call above.
> That tells me that one of several things is going on:
>
> - There is a path in the cd(4) driver where it can call
>   cam_periph_hold() but not cam_periph_unhold().
>
> - There is another thread in the system that has called
>   cam_periph_hold(), and has gotten stuck before it can call
>   cam_periph_unhold().
>
> - The hold/unhold logic is broken, and there is a case where a thread
>   waiting for the lock can miss the wakeup. After looking at the code, I
>   don't think this is the case, but I may have missed something.
>
> So it is probably one of the first two cases. From the dmesg, I only see
> cd1 listed, not cd0. So it is possible that cd0 is stuck somewhere in the
> probe code, and the geom code just gets stuck trying to open it when the
> probe hasn't completed.
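Ken's reasoning hinges on the periph "lock" being only a flag that cam_periph_hold() sleeps on. The following is a minimal user-space sketch of that pattern, offered as an illustration under stated assumptions and not as CAM code. It may make the first two cases easier to picture. A POSIX mutex stands in for periph->sim->mtx, and a condition variable stands in for the mtx_sleep()/wakeup() channel. Reference counting is left out. PERIPH_LOCKED, PERIPH_LOCK_WANTED, periph_hold(), periph_unhold(), waiter_thread() and run_case() are all hypothetical names. The unhold side assumes that cam_periph_unhold() clears the flag and wakes sleepers when CAM_PERIPH_LOCK_WANTED is set. The sketch also adds a finite timeout, which the quoted code does not use (it passes 0 to mtx_sleep()). The timeout exists only so that the "nobody ever unholds" case returns instead of sleeping forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical user-space analogue of the CAM periph "lock" (not FreeBSD
 * code): a mutex-guarded flag; a condvar stands in for mtx_sleep()/wakeup(). */
#define PERIPH_LOCKED      0x01u
#define PERIPH_LOCK_WANTED 0x02u

struct periph {
	pthread_mutex_t mtx;	/* stands in for periph->sim->mtx */
	pthread_cond_t  cv;	/* stands in for the sleep channel */
	unsigned        flags;
};

/* Like cam_periph_hold(); caller owns p->mtx. timeout_ms == 0 never times out. */
static int periph_hold(struct periph *p, long timeout_ms)
{
	struct timespec deadline;
	int error;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	deadline.tv_sec += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;
	/* The flag is re-tested with the mutex held after every wakeup. */
	while ((p->flags & PERIPH_LOCKED) != 0) {
		p->flags |= PERIPH_LOCK_WANTED;
		error = timeout_ms > 0 ?
		    pthread_cond_timedwait(&p->cv, &p->mtx, &deadline) :
		    pthread_cond_wait(&p->cv, &p->mtx);
		if (error != 0)
			return (error);		/* e.g. ETIMEDOUT; hold not taken */
	}
	p->flags |= PERIPH_LOCKED;
	return (0);
}

/* Assumed shape of cam_periph_unhold(): clear the flag, wake any sleepers. */
static void periph_unhold(struct periph *p)
{
	assert((p->flags & PERIPH_LOCKED) != 0);
	p->flags &= ~PERIPH_LOCKED;
	if ((p->flags & PERIPH_LOCK_WANTED) != 0) {
		p->flags &= ~PERIPH_LOCK_WANTED;
		pthread_cond_broadcast(&p->cv);
	}
}

struct waiter { struct periph *p; long timeout_ms; int result; };

/* Second thread: tries to take the hold, as the GEOM event thread did. */
static void *waiter_thread(void *arg)
{
	struct waiter *w = arg;

	pthread_mutex_lock(&w->p->mtx);
	w->result = periph_hold(w->p, w->timeout_ms);
	if (w->result == 0)
		periph_unhold(w->p);
	pthread_mutex_unlock(&w->p->mtx);
	return (NULL);
}

/* Main thread holds first; with holder_unholds == 0 it never unholds, so the
 * waiter can only give up via its 200 ms timeout (otherwise: sleep forever). */
static int run_case(int holder_unholds)
{
	struct periph p = { .flags = 0 };
	struct waiter w = { &p, holder_unholds ? 5000 : 200, -1 };
	pthread_t t;

	pthread_mutex_init(&p.mtx, NULL);
	pthread_cond_init(&p.cv, NULL);
	pthread_mutex_lock(&p.mtx);
	periph_hold(&p, 0);
	pthread_mutex_unlock(&p.mtx);
	pthread_create(&t, NULL, waiter_thread, &w);
	if (holder_unholds) {
		for (;;) {	/* wait until the waiter sleeps on the flag */
			pthread_mutex_lock(&p.mtx);
			if ((p.flags & PERIPH_LOCK_WANTED) != 0)
				break;		/* keep p.mtx locked */
			pthread_mutex_unlock(&p.mtx);
			nanosleep(&(struct timespec){ 0, 1000000L }, NULL);
		}
		periph_unhold(&p);
		pthread_mutex_unlock(&p.mtx);
	}
	pthread_join(t, NULL);
	pthread_cond_destroy(&p.cv);
	pthread_mutex_destroy(&p.mtx);
	return (w.result);
}
```

In this sketch, run_case(0) should return ETIMEDOUT after about 200 ms. That is the user-space counterpart of a thread left sleeping in "caplck" because the holder never unholds. run_case(1) should return 0, because periph_unhold() clears PERIPH_LOCKED and broadcasts. Here a wakeup cannot be lost between the flag test and the sleep, because pthread_cond_wait() releases the mutex atomically and the loop re-tests the flag.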
>
> Seeing the stack trace for the taskq thread that is running on CPU 0
> (process 100014) might be enlightening; it's hard to say. That may or
> may not show the issue.
>
> It's possible that this issue is directly related to the commit in
> question. Perhaps an error is now being returned that wasn't returned
> before, and the cd(4) driver isn't handling it correctly. (The cd(4)
> driver wasn't touched in the commit.)
>
> It's also possible that the commit in question just changed the timing
> and your system is hitting a race that was there previously.
>
> Ken
> --
> Kenneth Merry
> ken@FreeBSD.ORG

--
http://ache.vniz.net/