Message-Id: <200401191139.i0JBdD7E055679@gw.catspoiler.org>
Date: Mon, 19 Jan 2004 03:39:13 -0800 (PST)
From: Don Lewis
To: mjs@cc.tut.fi
cc: freebsd-current@FreeBSD.org
Subject: Re: 5.2R: panic (syncer) on IBM x345 (SMP and Vinum)
List-Id: Discussions about the use of FreeBSD-current

On 19 Jan, Matti Saarinen wrote:
>
> I've been able to crash a server (usenet news server) running 5.2R.
> The crash happens with and without ACPI. The attached info is with
> ACPI enabled. I would be very pleased if someone could tell me why the
> box crashed and how to prevent it from happening. I tried searching
> the list archives and googling without any positive result.
>
> The hardware is IBM x345 with two CPUs (Pentium4), internal LSI
> SCSI/RAID controller and external IBM SCSI controller (which is really
> Adaptec SCSI Card 29320LP). There is IBM ESX400 disk array connected
> to the Adaptec controller. All the disks are U320 disk.
>
> The root filesystem is mirrored with the LSI adapter (which only
> supports mirroring of two drives). There are three other mirrored
> filesystems created with vinum. On all file systems except root, I've
> enabled soft updates. I've tested all the filesystems (mirrored root,
> vinum mirrors and filesystems created on single disks) with bonnie++
> and iozone and the server has behaved well.

> (da0:ahd0:0:0:0): Retrying Command
> (da0:ahd0:0:0:0): Queue Full
> (da0:ahd0:0:0:0): tagged openings now 128
> (da0:ahd0:0:0:0): Retrying Command

Try using the camcontrol modepage command to turn off write caching on
each of the drives (set the WCE bit to 0). This should eliminate the
need for the driver to crank down the number of tagged openings. Less
stress on the error recovery code may keep the bug from being triggered.

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x0
> fault code = supervisor write, page not present
> instruction pointer = 0x8:0xc07bcafe
> stack pointer = 0x10:0xe7b96784
> frame pointer = 0x10:0xe7b967c0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 79 (syncer)
>
>
>
> Attached below are the verbose boot logs from the server and the
> kernel debugger output.
> trap_fatal(e7b96744,0,c0837ed0,2cd,cafe9500) at trap_fatal+0x326
> trap_pfault(e7b96744,0,0,1ea30e7,0) at trap_pfault+0x1c2
> trap(e7b90018,10,e7b90010,0,d9a46000) at trap+0x2fd
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc07bcafe, esp = 0xe7b96784, ebp = 0xe7b967c0 ---
> generic_bcopy(d78de930,0,d78de930,e7b967e4,c06590e1) at generic_bcopy+0x1a
> vinumstrategy(d78de930,cafe9500,e7b9680c,c05da937,d78de930) at vinumstrategy+0xa6
> dev_strategy(d78de930,0,2ee,1,c077dc95) at dev_strategy+0x41
> spec_xstrategy(cb6d071c,d78de930,e7b96828,c05d9c38,e7b96854) at spec_xstrategy+0x1d7

Looks like vinum is passing a NULL pointer to bcopy.
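
For the write-caching suggestion above, here is a rough sketch of what the
camcontrol invocations might look like. It assumes the WCE bit lives in the
SCSI caching mode page (page 8) and that the drives accept changes to it.
The helper name wce_commands and the second device da1 are made up for
illustration; only da0 appears in the log. The script just prints the
commands (a dry run), so nothing is changed until they are run by hand as
root.

```shell
#!/bin/sh
# Dry-run sketch: prints camcontrol commands instead of executing them.
# Assumption: mode page 8 is the SCSI caching page, which holds the WCE bit.
# da0 is taken from the log; da1 is a hypothetical second drive.

wce_commands() {
    for disk in "$@"; do
        # Display the current caching page so the WCE value can be checked.
        printf 'camcontrol modepage %s -m 8\n' "$disk"
        # Open the page in $EDITOR; change the WCE field to 0 and save.
        printf 'camcontrol modepage %s -m 8 -e\n' "$disk"
    done
}

wce_commands da0 da1
```

Whether the setting survives a power cycle depends on the drive and on the
page control selected, so camcontrol(8) is worth checking before relying on
it.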