From owner-freebsd-current@FreeBSD.ORG Fri Jul 13 23:38:15 2007
Date: Fri, 13 Jul 2007 16:38:14 -0700
From: "Matt Reimer"
To: "Scott Long"
In-Reply-To: <46980AE2.6070206@samsco.org>
References: <200707131528.51396.jhb@freebsd.org>
 <4697F551.4090801@samsco.org> <46980AE2.6070206@samsco.org>
Cc: freebsd-current@freebsd.org
Subject: Re: arcmsr crash
List-Id: Discussions about the use of FreeBSD-current

On 7/13/07, Scott Long wrote:
> Matt Reimer wrote:
> > On 7/13/07, Scott Long wrote:
> >> Matt Reimer wrote:
> >> > On 7/13/07, John Baldwin wrote:
> >> >> On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
> >> >> > Once a week or so we're seeing a panic with a -current kernel built
> >> >> > just before the gcc 4.2 import (maybe three weeks ago). The box has a
> >> >> > Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
> >> >> > an Areca 1220 controller with eight 500G disks connected.
> >> >> >
> >> >> > Does this indicate that the arcmsr driver is at fault:
> >> >> >
> >> >> > Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
> >> >> > cpustop_handler() at cpustop_handler+0x35
> >> >> > ipi_nmi_handler() at ipi_nmi_handler+0x2e
> >> >> > trap() at trap+0x365
> >> >> > nmi_calltrap() at nmi_calltrap+0x8
> >> >> > --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
> >> >> > siocnclose() at siocnclose+0x21
> >> >> > sio_cnputc() at sio_cnputc+0x89
> >> >> > cnputc() at cnputc+0x6a
> >> >> > putchar() at putchar+0x5f
> >> >> > kvprintf() at kvprintf+0xd45
> >> >> > printf() at printf+0xe1
> >> >> > panic() at panic+0x145
> >> >> > xpt_done() at xpt_done+0x14a
> >> >> > arcmsr_interrupt() at arcmsr_interrupt+0x2df
> >> >> > ithread_loop() at ithread_loop+0x108
> >> >> > fork_exit() at fork_exit+0xaa
> >> >> > fork_trampoline() at fork_trampoline+0xe
> >> >> > --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
> >> >>
> >> >> Looks like it has panic'd here:
> >> >>
> >> >> switch (done_ccb->ccb_h.path->periph->type) {
> >> >> case CAM_PERIPH_BIO:
> >> >>         mtx_lock(&cam_bioq_lock);
> >> >>         TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
> >> >>             sim_links.tqe);
> >> >>         done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
> >> >>         mtx_unlock(&cam_bioq_lock);
> >> >>         swi_sched(cambio_ih, 0);
> >> >>         break;
> >> >> default:
> >> >>         panic("unknown periph type %d",
> >> >>             done_ccb->ccb_h.path->periph->type);
> >> >> }
> >> >>
> >> >> which should seem to indicate that, yes, it is a driver bug.
> >> >
> >> > That code in -CURRENT looks a bit different (cam_simq_lock instead of
> >> > cam_bioq_lock, etc.). Is that relevant to your analysis?
> >> >
> >> > Matt
> >>
> >> The locking is different, but the problem is basically the same. Are
> >> you using 7-CURRENT or 6.x?
> >
> > 7-CURRENT from right before the gcc upgrade.
> >
> > Matt
>
> Crud.... now that I look closer, I can definitely see the locking
> problems in the driver. I think the locking will have to be completely
> overhauled. Can I use you as a guinea pig for testing?

Please do! What's the gist of the problem?

Matt