Message-ID: <4697F551.4090801@samsco.org>
Date: Fri, 13 Jul 2007 15:57:37 -0600
From: Scott Long
To: Matt Reimer
Cc: freebsd-current@freebsd.org, scottl@freebsd.org
Subject: Re: arcmsr crash
References: <200707131528.51396.jhb@freebsd.org>

Matt Reimer wrote:
> On 7/13/07, John Baldwin wrote:
>> On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
>> > Once a week or so we're seeing a panic with a -current kernel built
>> > just before the gcc 4.2 import (maybe three weeks ago). The box has a
>> > Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
>> > an Areca 1220 controller with eight 500G disks connected.
>> >
>> > Does this indicate that the arcmsr driver is at fault:
>> >
>> > Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
>> > cpustop_handler() at cpustop_handler+0x35
>> > ipi_nmi_handler() at ipi_nmi_handler+0x2e
>> > trap() at trap+0x365
>> > nmi_calltrap() at nmi_calltrap+0x8
>> > --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp
>> > = 0xffffffffac0a37d0 ---
>> > siocnclose() at siocnclose+0x21
>> > sio_cnputc() at sio_cnputc+0x89
>> > cnputc() at cnputc+0x6a
>> > putchar() at putchar+0x5f
>> > kvprintf() at kvprintf+0xd45
>> > printf() at printf+0xe1
>> > panic() at panic+0x145
>> > xpt_done() at xpt_done+0x14a
>> > arcmsr_interrupt() at arcmsr_interrupt+0x2df
>> > ithread_loop() at ithread_loop+0x108
>> > fork_exit() at fork_exit+0xaa
>> > fork_trampoline() at fork_trampoline+0xe
>> > --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
>>
>> Looks like it has panic'd here:
>>
>>         switch (done_ccb->ccb_h.path->periph->type) {
>>         case CAM_PERIPH_BIO:
>>                 mtx_lock(&cam_bioq_lock);
>>                 TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
>>                     sim_links.tqe);
>>                 done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
>>                 mtx_unlock(&cam_bioq_lock);
>>                 swi_sched(cambio_ih, 0);
>>                 break;
>>         default:
>>                 panic("unknown periph type %d",
>>                     done_ccb->ccb_h.path->periph->type);
>>         }
>>
>> which should seem to indicate that, yes, it is a driver bug.
>
> That code in -CURRENT looks a bit different (cam_simq_lock instead of
> cam_bioq_lock, etc.). Is that relevant to your analysis?
>
> Matt

The locking is different, but the problem is basically the same. Are
you using 7-CURRENT or 6.x?

Scott
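
For readers following the quoted switch statement, here is a minimal userspace sketch of the same dispatch pattern. It is not the kernel code: the struct layouts, `PERIPH_BIO`, and `model_xpt_done()` are hypothetical stand-ins invented for illustration, and the function returns -1 where the real code would call panic(). It only shows that the default branch is reached whenever the type field read through the CCB's path is anything other than the one recognized value. It says nothing about how the field ended up that way in this particular crash.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the structures named in the quoted code. */
enum periph_type { PERIPH_BIO = 1 };	/* models CAM_PERIPH_BIO */

struct periph { int type; };
struct path { struct periph *periph; };
struct ccb_hdr { struct path *path; };

/*
 * Models the quoted switch: returns 0 where the completion would be
 * queued, and -1 where the kernel code would panic instead.
 */
static int
model_xpt_done(const struct ccb_hdr *h)
{
	switch (h->path->periph->type) {
	case PERIPH_BIO:
		/* The real code queues the CCB and schedules the SWI here. */
		return (0);
	default:
		fprintf(stderr, "unknown periph type %d\n",
		    h->path->periph->type);
		return (-1);
	}
}
```

Calling `model_xpt_done()` on a header whose periph has type `PERIPH_BIO` returns 0. Any other value in that field, for example one read through a pointer that no longer refers to a valid periph, takes the default branch.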