From owner-freebsd-stable@FreeBSD.ORG  Sun Jun  4 01:20:02 2006
Date: Sat, 3 Jun 2006 18:18:34 -0700 (PDT)
From: Doug White <dwhite@gumbysoft.com>
To: Brian Tao <taob@risc.org>
In-Reply-To: <20060603195754.Q15261-100000@as2.dm.egate.net>
Message-ID: <20060603181136.C40001@carver.gumbysoft.com>
References: <20060603195754.Q15261-100000@as2.dm.egate.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: FREEBSD-STABLE <freebsd-stable@freebsd.org>
Subject: Re: 6.1 kernel unable to find /dev ?

On Sat, 3 Jun 2006, Brian Tao wrote:

>    I had a very stable 6.1-R amd64 server (once I swapped out some
> bad RAM, that is) that needed a couple more hard drives installed.
> There were some problems with the upgrade (device renumbering woes,
> basically...  topic of another thread), and it had to be rolled back.
>
>    Upon rolling back, the previously-good kernel would no longer
> complete the boot after the device probe.  I saw two types of panics
> on the serial console:
>
> | Trying to mount root from ufs:/dev/ad4s1a
> | Lookup of /dev for devfs, error: 20

Error 20 is ENOTDIR, which means something along the requested path exists 
but is not a directory. From this output it looks like the root directory 
entry is somehow corrupted or being misinterpreted.
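
As a quick sanity check on the errno mapping, here is a minimal Python 
sketch. It is not taken from the system in question: the helper names 
explain_errno and lookup_through_file are made up for illustration, and 
the temporary file named "dev" only stands in for a path component that 
exists but is not a directory. It assumes a Unix-like host.

```python
import errno
import os
import tempfile


def explain_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def lookup_through_file() -> int:
    """Resolve a path whose middle component is a regular file.

    Returns the errno raised by the lookup, or 0 if it unexpectedly
    succeeds.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in: a plain file where a directory is expected.
        not_a_dir = os.path.join(tmp, "dev")
        with open(not_a_dir, "w") as fh:
            fh.write("plain file, not a directory\n")
        try:
            os.stat(os.path.join(not_a_dir, "console"))
        except OSError as exc:
            return exc.errno
    return 0


if __name__ == "__main__":
    print(explain_errno(20))
    print(lookup_through_file() == errno.ENOTDIR)
```

The second function reproduces the same class of failure in userland: 
the lookup fails because a component of the path is not a directory, 
not because the path is missing outright.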

> | exec /sbin/init: error 20
> | exec /sbin/oinit: error 20
> | exec /sbin/init.bak: error 20
> | exec /rescue/init: error 20
> | exec /stand/sysinstall: error 20
> | init: not found in path
> | /sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init:/stand/sysinstall
> | panic: no init
> | Uptime: 8s
> | Cannot dump. No dump device defined.
> | Automatic reboot in 15 seconds - press a key on the console to abort
> | --> Press a key on the console to reboot,
> | --> or switch off the system now.
>
> ... and:
>
> | Trying to mount root from ufs:/dev/ad4s1a
> | pid 47 (sh), uid 0: exited on signal 11
> | TPTE at 0xffff8000040028e0  IS ZERO @ VA 80051c000
> | panic: bad pte
> | Uptime: 8s

This usually indicates bad RAM or a faulty processor. Since you 
seem to be having disk problems, it may just be due to the disk returning 
faulty data. Alternatively, a bad kernel module in the mix may be randomly 
corrupting data.

>    The first one is suggesting that /dev does not exist (or is not a
> directory)... I'm thinking this means that devfs is somehow
> unavailable, but I did not think it is even possible to disable devfs
> via the kernel config file these days.
>
>    The second one leaves me clueless... I have not been able to find
> any useful information on that panic during boot.  Granted, I've only
> seen the "bad pte" panic twice... all other reboot attempts result in
> the first type of problem.
>
>    Fortunately, I did happen to keep an old 6.0-RELEASE-p6 kernel
> around (Apr 15 2006 build).  That kernel boots fine, using the same
> filesystem as newer kernels on that drive.  I am up-to-date with the
> RELENG_6_1 tag.  Should I perhaps do a make installkernel installworld
> before rebooting?  The installed binaries on the server are from an
> early 6.1-RELEASE (which *was* successfully booted by this server).  I
> am running into a few minor but surmountable problems because of the
> older kernel version, but I obviously would like to get my world and
> kernel back in sync ASAP.

My gut feeling is that there is still a disconnect about what the root 
filesystem is. Either that, or there is hidden corruption that 6.1 notices 
and 6.0 does not.  Here's what I'd do next:

1. Capture the boot output from both the working 6.0 kernel and your 
broken 6.1 kernel and compare the two. If there are differences or errors 
being returned from the ATA controller or disks then those will need to be 
addressed.

2. Try a splat-over reinstall of 6.1-R from CD to force everything to 
match up. Mount the filesystems but don't mark them to be newfs'd. Install 
the GENERIC kernel only.
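
For step 1, the comparison can be scripted once both boot logs are saved 
as text (for example from the serial console). The following is only a 
sketch under assumptions: the function name boot_log_differences is 
hypothetical, the "6.0"/"6.1" labels are just tags for the two captures, 
and the sample lines in the demo are invented placeholders, not output 
from the affected server.

```python
import difflib


def boot_log_differences(old_log: str, new_log: str) -> list[tuple[str, str]]:
    """Return (label, line) pairs for lines that appear in only one log.

    The label "6.0" marks lines present only in the working kernel's
    capture, "6.1" marks lines present only in the failing kernel's.
    """
    old_lines = [line.rstrip() for line in old_log.splitlines()]
    new_lines = [line.rstrip() for line in new_log.splitlines()]
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        differences.extend(("6.0", line) for line in old_lines[i1:i2])
        differences.extend(("6.1", line) for line in new_lines[j1:j2])
    return differences


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    working = "probe: disk found\nTrying to mount root from ufs:/dev/ad4s1a\n"
    failing = (
        "probe: disk found\n"
        "probe: controller reported an error\n"
        "Trying to mount root from ufs:/dev/ad4s1a\n"
    )
    for label, line in boot_log_differences(working, failing):
        print(f"{label}: {line}")
```

Lines that show up under only one label, especially anything from the 
ATA controller or the disks, are the ones worth looking at first.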

If you are going to be tracking a branch, please read the instructions at 
the end of src/UPDATING on how to perform the build. There is a specific 
procedure and not following it can cause significant issues. While 
unlikely, it is possible to irreparably damage the system by not following 
the instructions to the letter.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org