From: John Baldwin
To: Gerrit Kühn
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick, d@delphij.net, jeff@freebsd.org
Date: Tue, 14 Oct 2008 13:12:04 -0400
Message-Id: <200810141312.05094.jhb@freebsd.org>
In-Reply-To: <20081014115333.3d3e41ab.gerrit@pmp.uni-hannover.de>
References: <20080807132947.061d24eb.gerrit@pmp.uni-hannover.de> <200810131027.40630.jhb@freebsd.org> <20081014115333.3d3e41ab.gerrit@pmp.uni-hannover.de>
Subject: Re: Regression 7.0R -> 7-stable?
List-Id: Production branch of FreeBSD source code

On Tuesday 14 October 2008 05:53:33 am Gerrit Kühn wrote:
> On Mon, 13 Oct 2008 10:27:40 -0400 John Baldwin wrote
> about Re: Regression 7.0R -> 7-stable?:
>
> JB> On Monday 13 October 2008 03:09:46 am Gerrit Kühn wrote:
>
> JB> > JB> Ok, can you run gdb on your kernel.debug and do
> JB> > JB> 'l *0xffffffff804608c0'
>
> JB> > 0xffffffff804608c0 is in scheduler (/usr/src/sys/vm/vm_glue.c:670).
> JB> > [...lines 665-674...]
>
> JB> I was afraid of that, it basically means that it finished the entire
> JB> boot process.
>
> I already thought so because I saw a grey (not white) cursor afterwards.

Are you sure you aren't using dual consoles somehow with serial being primary?
If you break into the loader, what does 'show console' show?

> JB> The next step is that init (pid 1) should be scheduled
> JB> and try to execute. You can maybe add some printf's to the code to
> JB> start up init to see how far it gets. The routine in question is
> JB> 'start_init()' in sys/kern/init_main.c.
>
> Let me see...
> I added my first printf in line 619 (and several after that), right after
> the "Need just enough stack..." comment. This was never reached, the
> system hangs before that.
>
> After that I added printf before and after vfs_mountroot(). Now the things
> runs just a bit further for the first time. I see my new printfs and
> between them the message "Trying to mount root from ufs:/dev/ad0s1a".
> After that come all my printfs I had added before, followed by
> "start_init: trying /sbin/init". Then it hangs again.
>
> I am a bit puzzled because I did not see the "Trying to mount..." and
> "start_init:..." messages before. Just trying again to boot with the
> same setup hangs in vfs_mountroot() (printf before is displayed, printf
> after not). It appears to me as if the hang is caused by some kind of
> "parallel task", and what I am seeing on the console stops a bit earlier
> or later depending on that.
> As I am seeing this only with the ULE-scheduler: Is the scheduler already
> in action at this point, and may the hang depend on what it is deciding
> to do?

Hmmm, I'm really not sure. I wonder if you are having some sort of interrupt
storm. What if you disable SMP via 'kern.smp.disabled=1' in the loader, does
that help at all?

-- 
John Baldwin
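
For reference, the two loader checks suggested above might look roughly like this at the boot loader prompt. This is only a sketch, not a transcript from the affected machine: it assumes the stock FreeBSD loader with its 'OK' prompt, reached by interrupting the boot countdown. It shows the commands only, because what 'show console' prints depends on the local configuration.

```
OK show console
OK set kern.smp.disabled=1
OK boot
```

A 'set' at the prompt applies to that one boot only, which is enough for a quick test. To keep the setting across reboots, the usual place would be a line such as kern.smp.disabled="1" in /boot/loader.conf.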