From owner-freebsd-mobile@FreeBSD.ORG  Sat Jul 10 18:22:40 2004
Return-Path: <owner-freebsd-mobile@FreeBSD.ORG>
Delivered-To: freebsd-mobile@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 073BA16A4CE
	for <freebsd-mobile@freebsd.org>;
	Sat, 10 Jul 2004 18:22:40 +0000 (GMT)
Received: from arginine.spc.org (arginine.spc.org [195.206.69.236])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F1A4743D49
	for <freebsd-mobile@freebsd.org>;
	Sat, 10 Jul 2004 18:22:38 +0000 (GMT)	(envelope-from bms@spc.org)
Received: from localhost (localhost [127.0.0.1])
	by arginine.spc.org (Postfix) with ESMTP
	id 15C0D653E8; Sat, 10 Jul 2004 19:22:37 +0100 (BST)
Received: from arginine.spc.org ([127.0.0.1])
 by localhost (arginine.spc.org [127.0.0.1]) (amavisd-new, port 10024)
 with LMTP id 83031-01; Sat, 10 Jul 2004 19:22:36 +0100 (BST)
Received: from empiric.dek.spc.org
	(host81-156-14-104.range81-156.btcentralplus.com [81.156.14.104])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by arginine.spc.org (Postfix) with ESMTP
	id 13661652FE; Sat, 10 Jul 2004 19:22:36 +0100 (BST)
Received: by empiric.dek.spc.org (Postfix, from userid 1001)
	id EE5F2615E; Sat, 10 Jul 2004 19:22:35 +0100 (BST)
Date: Sat, 10 Jul 2004 19:22:35 +0100
From: Bruce M Simpson <bms@spc.org>
To: freebsd-mobile@freebsd.org
Message-ID: <20040710182235.GA838@empiric.dek.spc.org>
Mail-Followup-To: freebsd-mobile@freebsd.org,
	Dan Langille <dan@langille.org>
References: <40EDB001.4311.E98BBCCC@localhost>
	<20040709013152.GR15368@empiric.dek.spc.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040709013152.GR15368@empiric.dek.spc.org>
cc: Dan Langille <dan@langille.org>
Subject: T41 CDRW page fault saga
X-BeenThere: freebsd-mobile@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Mobile computing with FreeBSD <freebsd-mobile.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-mobile>,
	<mailto:freebsd-mobile-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-mobile>
List-Post: <mailto:freebsd-mobile@freebsd.org>
List-Help: <mailto:freebsd-mobile-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-mobile>,
	<mailto:freebsd-mobile-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 10 Jul 2004 18:22:40 -0000

On Fri, Jul 09, 2004 at 02:31:52AM +0100, Bruce M Simpson wrote:
> If we can establish that the problem is isolated to a specific ATA
> controller revision, we may be getting somewhere....

I've got more data from the local user's affected machine. We had to
manually transcribe the messages as I don't have enough firewire kit
around to do dcons.

This is the kernel I'm using:
FreeBSD empiric.dek.spc.org 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Tue Jul  6 23:17:47 BST 2004     bms@kimchi.dek.spc.org:/usr/src/sys/i386/compile/EMPIRIC  i386

There isn't a panic per se. The page fault only manifests itself on the
affected T41 when the CDRW module is inserted; if it's removed during boot,
all is well.

We managed to pull a backtrace. It's clear this happens only during
mountroot and it could be a trashed stack. The addresses, of course,
are specific to my production -CURRENT kernel (I usually build kernel.debug).

I couldn't get a crash dump (it kept complaining of not having enough room
on my dumpdev, although I know for a fact I have enough blocks to cover
physical memory, which is 512MB on this box).
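
As a rough sketch of the sizing arithmetic behind that complaint: a dump
needs room for all of physical memory plus some header overhead, so a
device sized at exactly 512MB would come up short. The 512-byte block size
and the two-block header allowance below are assumptions for illustration,
not values taken from this machine.

```python
# Rough sizing check for a kernel crash dump device.
# Assumptions (not from the original message): 512-byte disk blocks and a
# small fixed allowance for dump headers; the real overhead may differ.

MIB = 1024 * 1024


def blocks_needed(phys_mem_bytes, block_size=512, header_blocks=2):
    """Return the number of device blocks needed to hold a full memory dump."""
    # Round up so a partial final block still counts as a whole block.
    data_blocks = (phys_mem_bytes + block_size - 1) // block_size
    return data_blocks + header_blocks


def dump_fits(dumpdev_blocks, phys_mem_bytes, block_size=512, header_blocks=2):
    """True if the dump device has room for memory plus the header allowance."""
    needed = blocks_needed(phys_mem_bytes, block_size, header_blocks)
    return dumpdev_blocks >= needed


if __name__ == "__main__":
    # 512MB of physical memory, as on the affected box.
    print(blocks_needed(512 * MIB))
    # A device of exactly 512MB (hypothetical size) versus a larger one.
    print(dump_fits(512 * MIB // 512, 512 * MIB))
    print(dump_fits(2000000, 512 * MIB))
```

Comparing the result against the block count of the configured dumpdev
should show whether the "not enough room" complaint is down to header
overhead or to something else.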

This message occurs immediately after mountroot is attempted (it finds
the root filesystem correctly) and after the ATAPI_IDENTIFY messages which
others have reported (inspection of the ata driver suggests these messages
are benign, but green@ has since posted patches which address the
'device atapicam' case):

---8<---8<---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x1ff01ff
fault code            = supervisor read, page not present
instruction pointer   = 0x08:0x1ff01ff
stack pointer         = 0x10:0xd3e9cb30
frame pointer         = 0x10:0xd3e9cb54
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, def32 1, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 1 (swapper)
kernel: type 12 trap, code = 0
Stopped at   0x1ff01ff
---8<---8<---

(On entry into DDB: eip = 0xc05d4808, esp = 0xd3e9c97c, fp = 0xd3e9c980)
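
One thing worth noting in the trap message is that the fault virtual
address and the instruction pointer are the same value, which is what one
would expect if the CPU jumped or returned to a bogus address (consistent
with the trashed-stack theory, though it does not prove it). A minimal
sketch of that check follows; the helper names are hypothetical, and it
only looks at the two fields quoted above.

```python
import re


def parse_trap(text):
    """Extract (fault address, instruction pointer) from trap message text."""
    fault = re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text)
    # The instruction pointer is printed as selector:offset; keep the offset.
    ip = re.search(
        r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)", text
    )
    if not fault or not ip:
        raise ValueError("trap text is missing the expected fields")
    return int(fault.group(1), 16), int(ip.group(1), 16)


def looks_like_wild_jump(text):
    """True if the fault happened while fetching the instruction itself."""
    fault_addr, eip = parse_trap(text)
    return fault_addr == eip


def repeated_halfword(addr):
    """True if the upper and lower 16 bits of a 32-bit value are identical."""
    return (addr >> 16) & 0xFFFF == addr & 0xFFFF


TRAP = """\
fault virtual address = 0x1ff01ff
instruction pointer   = 0x08:0x1ff01ff
"""

if __name__ == "__main__":
    fault_addr, eip = parse_trap(TRAP)
    print(hex(fault_addr), hex(eip), looks_like_wild_jump(TRAP))
    print(repeated_halfword(eip))
```

The second helper only observes that 0x01ff01ff is the 16-bit value 0x01ff
repeated twice; that may hint at data being used as a code address, but it
is a guess rather than a diagnosis.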

We managed to get a backtrace using "show thr" as follows (we didn't
transcribe the stack parameters, just the backtrace):-

---8<---8<---
kernload at 0x1ff01ff
devfs_allocv at devfs_allocv+0x13c
devfs_root at devfs_root+0x23
devfs_nmount at devfs_nmount+0xaf
getdiskbyname at getdiskbyname+0xb1
setrootbyname at setrootbyname+0xb
vfs_mountroot_try at vfs_mountroot_try+0xcf
vfs_mountroot at vfs_mountroot+0x6b
start_init at start_init+0x53
fork_exit
fork_trampoline
---8<---8<---

I'll try to pin down the exact opcode/line in devfs_allocv() where the
call stack appears to be getting screwed up.
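
For anyone wanting to do the same with a matching kernel.debug, the
symbol+offset pairs from the backtrace can be turned into gdb "list"
commands. This is only a sketch: the helper names are hypothetical, and the
offsets are only meaningful against the exact kernel build that produced
the trace.

```python
import re

# Matches transcribed frames such as "devfs_allocv at devfs_allocv+0x13c".
FRAME_RE = re.compile(r"^(\w+) at (\w+)\+0x([0-9a-fA-F]+)$")


def parse_frame(line):
    """Return (symbol, offset) for a frame line, or None if it has no offset."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return m.group(2), int(m.group(3), 16)


def gdb_list_command(symbol, offset):
    """Build a gdb command that shows the source line at symbol+offset."""
    return "list *(%s+%#x)" % (symbol, offset)


BACKTRACE = """\
devfs_allocv at devfs_allocv+0x13c
devfs_root at devfs_root+0x23
devfs_nmount at devfs_nmount+0xaf
fork_exit
"""

if __name__ == "__main__":
    for line in BACKTRACE.splitlines():
        frame = parse_frame(line)
        if frame is not None:
            print(gdb_list_command(*frame))
```

Frames without an offset (fork_exit, fork_trampoline) are skipped, since
there is nothing to resolve for them.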

Hopefully this helps the continuing efforts to debug this problem.

Regards,
BMS