From: Andriy Gapon
Date: Thu, 23 Jun 2011 15:51:36 +0300
To: "Kenneth D. Merry"
Cc: Andrey Chernov, current@FreeBSD.org, Eir Nym, Kostik Belousov, "Justin T. Gibbs", will@FreeBSD.org
Subject: Re: Exactly that commit (was Re: Latest -current 100% hang at the late boot stage)
Message-ID: <4E0336D8.80300@FreeBSD.org>
In-Reply-To: <20110622200919.GA72504@nargothrond.kdm.org>

on 22/06/2011 23:09 Kenneth D. Merry said the following:
> The GEOM event thread is stuck sleeping in the mtx_sleep() call above. So
> that tells me that one of several things is going on:
>
> - There is a path in the cd(4) driver where it can call cam_periph_hold()
>   but not cam_periph_unhold().
>
> - There is another thread in the system that has called cam_periph_hold(),
>   and has gotten stuck before it can call cam_periph_unhold().
>
> - The hold/unhold logic is broken, and there is a case where a thread
>   waiting for the lock can miss the wakeup. After looking at the code, I
>   don't think this is the case, but I may have missed something.
>
> So it is probably one of the first two cases. From the dmesg, I only see
> cd1 listed, not cd0. So it is possible that cd0 is stuck in the probe code
> somewhere, and the geom code just gets stuck trying to open it when the
> probe hasn't completed.
>
> Seeing the stack trace for the taskq thread that is running on CPU 0
> (process 100014) might be enlightening, it's hard to say. That may or may
> not show the issue.
>
> It's possible that this issue is directly related to the commit in
> question; perhaps there is an error being returned that wasn't returned
> before and it isn't being handled right in the cd(4) driver. (The cd(4)
> driver wasn't touched in the commit.)
>
> It's also possible that the commit in question just changed the timing and
> your system is hitting a race that was there previously.

I have a suspicion that this is actually the case.
More than once I've seen the kernel boot get stuck non-deterministically in
the cd driver under qemu. Other people have also bumped into this. E.g.,
here's one of the reports that I googled up; it's not exactly the same as what
ache has reported, but it is somewhat similar:
http://lists.freebsd.org/pipermail/freebsd-current/2010-October/020336.html

-- 
Andriy Gapon
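
The first scenario in the quoted text (a path that calls cam_periph_hold() but never cam_periph_unhold()) can be modelled in userland. What follows is only a toy Python sketch and not the CAM code: the Periph class, probe(), and open_device() are hypothetical names invented for illustration, a condition variable stands in for mtx_sleep()/wakeup(), and a timeout replaces the indefinite sleep so that the example terminates instead of hanging.

```python
import threading


class Periph:
    """Toy stand-in for a peripheral with a hold flag (hypothetical, not struct cam_periph)."""

    def __init__(self):
        self._cv = threading.Condition()  # lock plus sleep/wakeup channel
        self._held = False

    def hold(self, timeout=None):
        """Sleep until the hold flag is free, then take it. Returns False on timeout."""
        with self._cv:
            # wait_for() re-checks the predicate after every wakeup, so a
            # spurious or early wakeup cannot let two threads take the hold.
            if not self._cv.wait_for(lambda: not self._held, timeout):
                return False
            self._held = True
            return True

    def unhold(self):
        """Release the hold flag and wake one sleeper."""
        with self._cv:
            self._held = False
            self._cv.notify()


def probe(periph, fail):
    """Hypothetical probe path; the error branch returns without releasing the hold."""
    periph.hold()
    if fail:
        return "error"  # bug being modelled: early return skips unhold()
    periph.unhold()
    return "ok"


def open_device(periph, timeout=0.2):
    """Hypothetical open path; in a kernel this would sleep with no timeout."""
    if not periph.hold(timeout):
        return "stuck"
    periph.unhold()
    return "opened"


if __name__ == "__main__":
    good = Periph()
    probe(good, fail=False)
    print(open_device(good))

    leaked = Periph()
    probe(leaked, fail=True)
    print(open_device(leaked))
```

In this model the second open never acquires the hold because the failed probe leaked it, which is the shape of hang described for the GEOM event thread. Whether the real cd(4) probe has such a path is exactly what the thread is trying to establish; the sketch does not show that it does.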