Date:      Fri, 13 Jul 2007 16:38:14 -0700
From:      "Matt Reimer" <mattjreimer@gmail.com>
To:        "Scott Long" <scottl@samsco.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: arcmsr crash
Message-ID:  <f383264b0707131638g6a9dcf84yaeb1ed233086717d@mail.gmail.com>
In-Reply-To: <46980AE2.6070206@samsco.org>
References:  <f383264b0706051422s6579746ap53a9206c36491dae@mail.gmail.com> <200707131528.51396.jhb@freebsd.org> <f383264b0707131336l2d552d56l4140a2521549bfdf@mail.gmail.com> <4697F551.4090801@samsco.org> <f383264b0707131621t7d167dbdw7ab94fe5fe1f4c58@mail.gmail.com> <46980AE2.6070206@samsco.org>

On 7/13/07, Scott Long <scottl@samsco.org> wrote:
> Matt Reimer wrote:
> > On 7/13/07, Scott Long <scottl@samsco.org> wrote:
> >> Matt Reimer wrote:
> >> > On 7/13/07, John Baldwin <jhb@freebsd.org> wrote:
> >> >> On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
> >> >> > Once a week or so we're seeing a panic with a -current kernel built
> >> >> > just before the gcc 4.2 import (maybe three weeks ago). The box has a
> >> >> > Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
> >> >> > an Areca 1220 controller with eight 500G disks connected.
> >> >> >
> >> >> > Does this indicate that the arcmsr driver is at fault:
> >> >> >
> >> >> > Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
> >> >> > cpustop_handler() at cpustop_handler+0x35
> >> >> > ipi_nmi_handler() at ipi_nmi_handler+0x2e
> >> >> > trap() at trap+0x365
> >> >> > nmi_calltrap() at nmi_calltrap+0x8
> >> >> > --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
> >> >> > siocnclose() at siocnclose+0x21
> >> >> > sio_cnputc() at sio_cnputc+0x89
> >> >> > cnputc() at cnputc+0x6a
> >> >> > putchar() at putchar+0x5f
> >> >> > kvprintf() at kvprintf+0xd45
> >> >> > printf() at printf+0xe1
> >> >> > panic() at panic+0x145
> >> >> > xpt_done() at xpt_done+0x14a
> >> >> > arcmsr_interrupt() at arcmsr_interrupt+0x2df
> >> >> > ithread_loop() at ithread_loop+0x108
> >> >> > fork_exit() at fork_exit+0xaa
> >> >> > fork_trampoline() at fork_trampoline+0xe
> >> >> > --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
> >> >>
> >> >> Looks like it has panic'd here:
> >> >>
> >> >>                 switch (done_ccb->ccb_h.path->periph->type) {
> >> >>                 case CAM_PERIPH_BIO:
> >> >>                         mtx_lock(&cam_bioq_lock);
> >> >>                         TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
> >> >>                                           sim_links.tqe);
> >> >>                         done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
> >> >>                         mtx_unlock(&cam_bioq_lock);
> >> >>                         swi_sched(cambio_ih, 0);
> >> >>                         break;
> >> >>                 default:
> >> >>                         panic("unknown periph type %d",
> >> >>                             done_ccb->ccb_h.path->periph->type);
> >> >>                 }
> >> >>
> >> >> which would seem to indicate that, yes, it is a driver bug.
> >> >
> >> > That code in -CURRENT looks a bit different (cam_simq_lock instead of
> >> > cam_bioq_lock, etc.). Is that relevant to your analysis?
> >> >
> >> > Matt
> >>
> >> The locking is different, but the problem is basically the same.  Are
> >> you using 7-CURRENT or 6.x?
> >
> > 7-CURRENT from right before the gcc upgrade.
> >
> > Matt
>
> Crud.... now that I look closer, I can definitely see the locking
> problems in the driver.  I think the locking will have to be completely
> overhauled.  Can I use you as a guinea pig for testing?

Please do!

What's the gist of the problem?

Matt


