Date: Fri, 13 Jul 2007 15:57:37 -0600
From: Scott Long <scottl@samsco.org>
To: Matt Reimer <mattjreimer@gmail.com>
Cc: freebsd-current@freebsd.org, scottl@freebsd.org
Subject: Re: arcmsr crash
Message-ID: <4697F551.4090801@samsco.org>
In-Reply-To: <f383264b0707131336l2d552d56l4140a2521549bfdf@mail.gmail.com>
References: <f383264b0706051422s6579746ap53a9206c36491dae@mail.gmail.com> <200707131528.51396.jhb@freebsd.org> <f383264b0707131336l2d552d56l4140a2521549bfdf@mail.gmail.com>

Matt Reimer wrote:
> On 7/13/07, John Baldwin <jhb@freebsd.org> wrote:
>> On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
>> > Once a week or so we're seeing a panic with a -current kernel built
>> > just before the gcc 4.2 import (maybe three weeks ago). The box has a
>> > Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
>> > an Areca 1220 controller with eight 500G disks connected.
>> >
>> > Does this indicate that the arcmsr driver is at fault:
>> >
>> > Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
>> > cpustop_handler() at cpustop_handler+0x35
>> > ipi_nmi_handler() at ipi_nmi_handler+0x2e
>> > trap() at trap+0x365
>> > nmi_calltrap() at nmi_calltrap+0x8
>> > --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
>> > siocnclose() at siocnclose+0x21
>> > sio_cnputc() at sio_cnputc+0x89
>> > cnputc() at cnputc+0x6a
>> > putchar() at putchar+0x5f
>> > kvprintf() at kvprintf+0xd45
>> > printf() at printf+0xe1
>> > panic() at panic+0x145
>> > xpt_done() at xpt_done+0x14a
>> > arcmsr_interrupt() at arcmsr_interrupt+0x2df
>> > ithread_loop() at ithread_loop+0x108
>> > fork_exit() at fork_exit+0xaa
>> > fork_trampoline() at fork_trampoline+0xe
>> > --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
>>
>> Looks like it has panic'd here:
>>
>>     switch (done_ccb->ccb_h.path->periph->type) {
>>     case CAM_PERIPH_BIO:
>>         mtx_lock(&cam_bioq_lock);
>>         TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
>>                           sim_links.tqe);
>>         done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
>>         mtx_unlock(&cam_bioq_lock);
>>         swi_sched(cambio_ih, 0);
>>         break;
>>     default:
>>         panic("unknown periph type %d",
>>               done_ccb->ccb_h.path->periph->type);
>>     }
>>
>> which should seem to indicate that, yes, it is a driver bug.
>
> That code in -CURRENT looks a bit different (cam_simq_lock instead of
> cam_bioq_lock, etc.). Is that relevant to your analysis?
>
> Matt

The locking is different, but the problem is basically the same. Are
you using 7-CURRENT or 6.x?

Scott
