From owner-freebsd-mobile@FreeBSD.ORG Sat Jul 10 18:22:40 2004
Return-Path:
Delivered-To: freebsd-mobile@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 073BA16A4CE
    for ; Sat, 10 Jul 2004 18:22:40 +0000 (GMT)
Received: from arginine.spc.org (arginine.spc.org [195.206.69.236])
    by mx1.FreeBSD.org (Postfix) with ESMTP id F1A4743D49
    for ; Sat, 10 Jul 2004 18:22:38 +0000 (GMT)
    (envelope-from bms@spc.org)
Received: from localhost (localhost [127.0.0.1])
    by arginine.spc.org (Postfix) with ESMTP id 15C0D653E8;
    Sat, 10 Jul 2004 19:22:37 +0100 (BST)
Received: from arginine.spc.org ([127.0.0.1])
    by localhost (arginine.spc.org [127.0.0.1]) (amavisd-new, port 10024)
    with LMTP id 83031-01; Sat, 10 Jul 2004 19:22:36 +0100 (BST)
Received: from empiric.dek.spc.org (host81-156-14-104.range81-156.btcentralplus.com [81.156.14.104])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by arginine.spc.org (Postfix) with ESMTP id 13661652FE;
    Sat, 10 Jul 2004 19:22:36 +0100 (BST)
Received: by empiric.dek.spc.org (Postfix, from userid 1001)
    id EE5F2615E; Sat, 10 Jul 2004 19:22:35 +0100 (BST)
Date: Sat, 10 Jul 2004 19:22:35 +0100
From: Bruce M Simpson
To: freebsd-mobile@freebsd.org
Message-ID: <20040710182235.GA838@empiric.dek.spc.org>
Mail-Followup-To: freebsd-mobile@freebsd.org, Dan Langille
References: <40EDB001.4311.E98BBCCC@localhost> <20040709013152.GR15368@empiric.dek.spc.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040709013152.GR15368@empiric.dek.spc.org>
cc: Dan Langille
Subject: T41 CDRW page fault saga
X-BeenThere: freebsd-mobile@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Mobile computing with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 10 Jul 2004 18:22:40 -0000

On Fri, Jul 09, 2004 at 02:31:52AM +0100, Bruce M
Simpson wrote:
> If we can establish that the problem is isolated to a specific ATA
> controller revision, we may be getting somewhere....

I've got more data from the local user's affected machine. We had to
manually transcribe the messages as I don't have enough firewire kit
around to do dcons.

This is the kernel I'm using:

FreeBSD empiric.dek.spc.org 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Tue Jul 6 23:17:47 BST 2004 bms@kimchi.dek.spc.org:/usr/src/sys/i386/compile/EMPIRIC i386

There isn't a panic per se. The page fault only manifests itself on the
affected T41 when the CDRW module is inserted; if it's removed during
boot, all is well.

We managed to pull a backtrace. It's clear this happens only during
mountroot, and it could be a trashed stack. The addresses, of course,
are specific to my production -CURRENT kernel (I usually build
kernel.debug). I couldn't get a crash dump (it kept complaining of not
having enough room on my dumpdev, although I know for a fact I have
enough blocks to cover physical memory, which is 512MB on this box).

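For anyone wanting to check the dumpdev arithmetic on their own box, here is a minimal sketch of the block count needed to cover 512MB of physical memory. It assumes 512-byte disk blocks; the function names and the optional overhead parameter are made up for illustration, and the kernel's real "not enough room" check may account for additional space that this sketch does not model.

```python
# Rough dump-device sizing check. Assumption: 512-byte disk blocks.
# blocks_needed() and dump_fits() are hypothetical helper names.
DEV_BSIZE = 512


def blocks_needed(phys_mem_bytes, overhead_bytes=0):
    """Return the number of blocks needed to hold memory plus any overhead."""
    total = phys_mem_bytes + overhead_bytes
    return -(-total // DEV_BSIZE)  # ceiling division


def dump_fits(dumpdev_blocks, phys_mem_bytes, overhead_bytes=0):
    """True if a dump device of the given size can hold the dump."""
    return dumpdev_blocks >= blocks_needed(phys_mem_bytes, overhead_bytes)


PHYS_MEM = 512 * 1024 * 1024  # 512MB, as on the affected machine

print(blocks_needed(PHYS_MEM))  # 1048576
```

If the dump device reports at least that many blocks and the kernel still refuses, the shortfall is presumably in whatever extra space the dump code reserves, which is outside the scope of this sketch.
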
This message occurs immediately after mountroot is attempted (it finds
the root filesystem correctly) and after the ATAPI_IDENTIFY messages
which others have reported (inspection of the ata driver suggests these
messages are benign, but green@ has since posted patches which address
the 'device atapicam' case):

---8<---8<---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x1ff01ff
fault code              = supervisor read, page not present
instruction pointer     = 0x08:0x1ff01ff
stack pointer           = 0x10:0xd3e9cb30
frame pointer           = 0x10:0xd3e9cb54
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1 (swapper)
kernel: type 12 trap, code = 0
Stopped at      0x1ff01ff
---8<---8<---

(On entry into DDB: eip = 0xc05d4808, esp = 0xd3e9c97c, fp = 0xd3e9c980)

We managed to get a backtrace using "show thr" as follows (we didn't
transcribe the stack parameters, just the backtrace):-

---8<---8<---
kernload at 0x1ff01ff
devfs_allocv at devfs_allocv+0x13c
devfs_root at devfs_root+0x23
devfs_nmount at devfs_nmount+0xaf
getdiskbyname at getdiskbyname+0xb1
setrootbyname at setrootbyname+0xb
vfs_mountroot_try at vfs_mountroot_try+0xcf
vfs_mountroot at vfs_mountroot+0x6b
start_init at start_init+0x53
fork_exit
fork_trampoline
---8<---8<---

I'll try to pin down the exact opcode/line in devfs_allocv() where the
call stack appears to be getting screwed up.

Hopefully this helps the continuing efforts to debug this problem.

Regards,
BMS
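
A note on reading the trap output above: the fault virtual address and the instruction pointer are the same value, and that value lies well below the kernel's own addresses (compare the DDB eip of 0xc05d4808). That pattern is consistent with a jump or return through a corrupted pointer, which fits the trashed-stack theory. A minimal sketch of that check follows; the helper names are made up for illustration, and KERNBASE = 0xc0000000 is an assumption about the i386 kernel address layout, not something taken from the transcript.

```python
# Sanity check on a transcribed trap message. Assumption: kernel text on
# i386 lives at or above 0xc0000000. Helper names here are hypothetical.
KERNBASE = 0xC0000000

TRAP_TEXT = """\
fault virtual address = 0x1ff01ff
fault code = supervisor read, page not present
instruction pointer = 0x08:0x1ff01ff
stack pointer = 0x10:0xd3e9cb30
frame pointer = 0x10:0xd3e9cb54
"""


def parse_trap(text):
    """Split 'name = value' lines into a dict, splitting on the first '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def addr(value):
    """Convert '0x08:0x1ff01ff' or '0x1ff01ff' to the integer offset."""
    return int(value.split(":")[-1], 16)


def looks_like_wild_jump(fields):
    """True if the CPU faulted fetching code from a non-kernel address."""
    fault = addr(fields["fault virtual address"])
    eip = addr(fields["instruction pointer"])
    return fault == eip and eip < KERNBASE


fields = parse_trap(TRAP_TEXT)
print(looks_like_wild_jump(fields))  # True
```

The stack and frame pointers, by contrast, still point into kernel space, so only the control transfer itself appears to have gone wrong.
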