From owner-freebsd-stable@FreeBSD.ORG Sun Jun 4 01:20:02 2006
Date: Sat, 3 Jun 2006 18:18:34 -0700 (PDT)
From: Doug White
To: Brian Tao
Cc: FREEBSD-STABLE
Subject: Re: 6.1 kernel unable to find /dev ?
Message-ID: <20060603181136.C40001@carver.gumbysoft.com>
In-Reply-To: <20060603195754.Q15261-100000@as2.dm.egate.net>
References: <20060603195754.Q15261-100000@as2.dm.egate.net>
List-Id: Production branch of FreeBSD source code

On Sat, 3 Jun 2006, Brian Tao wrote:

> I had a very stable 6.1-R amd64 server (once I swapped out some
> bad RAM, that is) that needed a couple more hard drives installed.
> There were some problems with the upgrade (device renumbering woes,
> basically... topic of another thread), and it had to be rolled back.
>
> Upon rolling back, the previously-good kernel would no longer
> complete the boot after the device probe.
> I saw two types of panics
> on the serial console:
>
> | Trying to mount root from ufs:/dev/ad4s1a
> | Lookup of /dev for devfs, error: 20

Error 20 is ENOTDIR, which means something along the requested path
exists but is not a directory. From this output it looks like the root
directory entry is somehow corrupted or being misinterpreted.

> | exec /sbin/init: error 20
> | exec /sbin/oinit: error 20
> | exec /sbin/init.bak: error 20
> | exec /rescue/init: error 20
> | exec /stand/sysinstall: error 20
> | init: not found in path
> | /sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init:/stand/sysinstall
> | panic: no init
> | Uptime: 8s
> | Cannot dump. No dump device defined.
> | Automatic reboot in 15 seconds - press a key on the console to abort
> | --> Press a key on the console to reboot,
> | --> or switch off the system now.
>
> ... and:
>
> | Trying to mount root from ufs:/dev/ad4s1a
> | pid 47 (sh), uid 0: exited on signal 11
> | TPTE at 0xffff8000040028e0 IS ZERO @ VA 80051c000
> | panic: bad pte
> | Uptime: 8s

This is usually indicative of bad RAM or a faulty processor. Since you
seem to be having disk problems, it may just be due to the disk
returning faulty data. Or there is a bad kernel module in the mix that
is randomly corrupting data.

> The first one is suggesting that /dev does not exist (or is not a
> directory)... I'm thinking this means that devfs is somehow
> unavailable, but I did not think it is even possible to disable devfs
> via the kernel config file these days.
>
> The second one leaves me clueless... I have not been able to find
> any useful information on that panic during boot. Granted, I've only
> seen the "bad pte" panic twice... all other reboot attempts result in
> the first type of problem.
>
> Fortunately, I did happen to keep an old 6.0-RELEASE-p6 kernel
> around (Apr 15 2006 build). That kernel boots fine, using the same
> filesystem as newer kernels on that drive. I am up-to-date with the
> RELENG_6_1 tag.
> Should I perhaps do a make installkernel installworld
> before rebooting? The installed binaries on the server are from an
> early 6.1-RELEASE (which *was* successfully booted by this server). I
> am running into a few minor but surmountable problems because of the
> older kernel version, but I obviously would like to get my world and
> kernel back in sync ASAP.

My gut feeling is that there is still a disconnect on what the root
filesystem is. That, or there is hidden corruption that 6.1 is noticing
and 6.0 isn't. Here's what I'd do next:

1. Capture the boot output from both the working 6.0 kernel and your
   broken 6.1 kernel and compare the two. If there are differences or
   errors being returned from the ATA controller or disks, then those
   will need to be addressed.

2. Try a splat-over reinstall of 6.1-R from CD to force everything to
   match up. Mount the filesystems but don't mark them to be newfs'd.
   Install the GENERIC kernel only.

If you are going to be tracking a branch, please read the instructions
at the end of src/UPDATING on how to perform the build. There is a
specific procedure, and not following it can cause significant issues.
While unlikely, it is possible to irreparably damage the system by not
following the instructions to the letter.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
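
P.S. For the comparison in step 1, here is a minimal Python sketch,
assuming both boot logs have already been captured from the serial
console as plain text. The function name changed_lines and the sample
strings are illustrative only; the "device probe output" line is a
placeholder, not real kernel output. It uses the standard difflib
module to list only the lines that differ between the two boots.

```python
import difflib


def changed_lines(good_log: str, bad_log: str) -> list[str]:
    """Return only the lines that differ between two boot logs.

    Lines prefixed "- " appear only in the good (6.0) log,
    lines prefixed "+ " appear only in the bad (6.1) log.
    """
    diff = difflib.ndiff(good_log.splitlines(), bad_log.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Illustrative input: the first line is a placeholder, the other two
# are taken from the console excerpt quoted in this message.
good = (
    "device probe output (placeholder)\n"
    "Trying to mount root from ufs:/dev/ad4s1a\n"
)
bad = (
    "device probe output (placeholder)\n"
    "Trying to mount root from ufs:/dev/ad4s1a\n"
    "Lookup of /dev for devfs, error: 20\n"
)

for line in changed_lines(good, bad):
    print(line)
# prints: + Lookup of /dev for devfs, error: 20
```

In practice you would read the two captures from files instead of
inline strings. Any differences in the ATA controller or disk probe
lines are the ones worth looking at first.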