Date:      Mon, 20 Mar 2000 21:55:27 -0500 (EST)
From:      "John W. DeBoskey" <jwd@unx.sas.com>
To:        Mike Smith <msmith@freebsd.org>
Cc:        freebsd-current@freebsd.org, Brad Chisholm <sasblc@unx.sas.com>
Subject:   Re: AMI MegaRAID lockup? not accepting commands.
Message-ID:  <200003210255.VAA24932@bb01f39.unx.sas.com>
In-Reply-To: <200003210146.RAA15576@mass.cdrom.com> from Mike Smith at "Mar 20, 2000 05:46:50 pm"

Hi,

   The controller is new. Dell calls it a Perc2/dc and it has 128Meg
of memory installed in it. I'm not sitting in front of the
machine right now. More detailed information is available
when the machine is booted and you enter the BIOS setup
on the adapter card.

> >    We have a system with a new AMI card in it controlling a pair
> > of shelves from Dell (fbsd dated: 4.0-20000313-SNAP).
> > 
> >    The relevant dmesg output is below: (complete dmesg at end)
> > 
> > amr0: <AMI MegaRAID> mem 0xf6c00000-0xf6ffffff irq 14 at device 10.1 on pci2
> > amr0: firmware 1.01 bios 1p00  128MB memory
> > amrd0: <MegaRAID logical drive> on amr0
> > amrd0: 172780MB (353853440 sectors) RAID 5 (optimal)
> > 
> >    The adapter does not lockup while testing with bonnie and such.
> 
> Try running 20 or so bonnie processes in parallel; I can usually get it 
> to lock up with this configuration.  I'm wondering which controller 
> you've got there though - I don't recognise the BIOS/firmware versions.
> 
> > However, we have a 50Gig CVS repository sitting on the raid
> > volume. When we do a 'cvs co' of -HEAD, it causes it to lockup.
> > The following messages are repeating continuously:
> > 
> > Mar 19 16:02:59 cvs /kernel: amr0: controller wedged (not taking commands)
> 
> I'm not sure why this happens; the controller isn't coming ready even 
> though we haven't hit any sort of limit that we're aware of.  I've been 
> considering some workarounds involving deferring the command until the 
> controller gives us back an interrupt, but I'm still surprised that we 
> get to this point at all.
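
   For reference, the "defer until the controller interrupts" idea
might look roughly like the user-space sketch below. This is not
driver code: struct sketch_ctrl, struct sketch_cmd and every function
name here are hypothetical stand-ins, and the "controller" is just a
flag that the simulated interrupt clears.

```c
#include <stddef.h>

/* Hypothetical command and controller state, NOT the real amr(4) structures. */
struct sketch_cmd {
    int id;
    struct sketch_cmd *next;
};

struct sketch_ctrl {
    int mailbox_busy;               /* 1 while the controller owns the mailbox */
    struct sketch_cmd *pend_head;   /* commands waiting for a free mailbox */
    struct sketch_cmd *pend_tail;
    int submitted;                  /* count of commands handed to the controller */
};

/* Append a command to the pending queue. */
static void sketch_enqueue(struct sketch_ctrl *sc, struct sketch_cmd *c)
{
    c->next = NULL;
    if (sc->pend_tail != NULL)
        sc->pend_tail->next = c;
    else
        sc->pend_head = c;
    sc->pend_tail = c;
}

/* Submit the head of the queue if the mailbox is free; return 1 if submitted. */
static int sketch_try_submit(struct sketch_ctrl *sc)
{
    struct sketch_cmd *c = sc->pend_head;

    if (c == NULL || sc->mailbox_busy)
        return 0;
    sc->pend_head = c->next;
    if (sc->pend_head == NULL)
        sc->pend_tail = NULL;
    sc->mailbox_busy = 1;           /* controller now owns the mailbox */
    sc->submitted++;
    return 1;
}

/* Start path: never spin, just queue and make one attempt. */
static void sketch_start(struct sketch_ctrl *sc, struct sketch_cmd *c)
{
    sketch_enqueue(sc, c);
    (void)sketch_try_submit(sc);
}

/* Simulated interrupt: the mailbox is released, so retry the deferred command. */
static void sketch_intr(struct sketch_ctrl *sc)
{
    sc->mailbox_busy = 0;
    (void)sketch_try_submit(sc);
}
```

   The point of the design is that the start path never gives up
with "controller wedged"; a command that finds the mailbox busy
simply waits in the queue until the next interrupt.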

   Well, we've been playing around in amr.c/amr_start in the following
code sequence:

    /* spin waiting for the mailbox */
    debug("wait for mailbox");
    for (i = 10000, done = 0, worked = 0; (i > 0) && !done; i--) {
        s = splbio();

        /* is the mailbox free? */
        if (sc->amr_mailbox->mb_busy == 0) {
            debug("got mailbox");
            sc->amr_mailbox64->mb64_segment = 0;
            bcopy(&ac->ac_mailbox, sc->amr_mailbox, AMR_MBOX_CMDSIZE);
            sc->amr_submit_command(sc);
            done = 1;
            sc->amr_workcount++;
            TAILQ_INSERT_TAIL(&sc->amr_work, ac, ac_link);

            /* not free, try to clean up while we wait */
        } else {
-->>       printf("%s: busy flag %x\n", __FUNCTION__, sc->amr_mailbox->mb_busy);
            debug("busy flag %x\n", sc->amr_mailbox->mb_busy);
            worked = amr_done(sc); 
        }
        splx(s);
    }

   Note the addition of the printf statement in the else clause. Two
interesting things happen. One, we are unable to cause the controller
to lock up. Two, the following messages show up in syslog:

Mar 20 12:55:15 cvsstage /kernel: amr_start: busy flag 1
Mar 20 12:55:46 cvsstage last message repeated 1057 times
Mar 20 12:57:47 cvsstage last message repeated 5574 times
Mar 20 12:59:26 cvsstage last message repeated 5431 times
Mar 20 12:59:26 cvsstage /kernel: amr_start: busy flag 0

   If I understand the sequence correctly, we enter splbio() and
then check the mailbox. Most of the time, we take the else clause
and the busy flag is 1 as it should be. However, once every 10 to 12
thousand loops, the if test sees mb_busy as nonzero, but by the time
the printf in the else clause reads it again, it's 0.
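
   To make the "busy flag 0" line concrete, here is a small
user-space sketch of the same pattern: the flag is read once for the
test and a second time for the message, and the two reads can
disagree. Everything in it is an assumption for illustration;
struct fake_mailbox, read_busy() and the reads_left counter only
simulate a controller that clears the flag between two reads.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shared mailbox, NOT the real amr(4) layout. */
struct fake_mailbox {
    volatile uint8_t mb_busy;   /* written by the simulated controller */
    int reads_left;             /* simulation: flag clears after this many reads */
};

/* Simulated device read: return the flag, then let the "controller" run. */
static uint8_t read_busy(struct fake_mailbox *mb)
{
    uint8_t v = mb->mb_busy;

    if (mb->reads_left > 0 && --mb->reads_left == 0)
        mb->mb_busy = 0;
    return v;
}

/*
 * Pattern from the loop above: test the flag, then read it again for the
 * message.  Returns the value that would be printed, or -1 if the mailbox
 * was free on the first read.
 */
static int report_double_read(struct fake_mailbox *mb)
{
    if (read_busy(mb) == 0)
        return -1;
    return read_busy(mb);       /* second read may no longer match the test */
}

/* Alternative: take one snapshot and use it for both the test and the message. */
static int report_single_read(struct fake_mailbox *mb)
{
    uint8_t busy = read_busy(mb);

    if (busy == 0)
        return -1;
    return busy;                /* always the value the branch was taken on */
}
```

   With the flag set and reads_left = 1, report_double_read()
takes the busy branch but reports 0, while report_single_read()
reports 1. So the odd syslog line by itself only shows that the
controller released the mailbox between the two reads.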

   I wonder if there is some sort of timing issue, since the
addition of the printf allows the card to operate correctly. I
haven't traced the kernel printf code, but it could change the
spl level, thus allowing the mb_busy flag to be modified.
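
   One way to test the timing theory would be to swap the printf
for an explicit pause between polls and see whether the lockup stays
away. The sketch below is a user-space simulation of that experiment
only; struct poll_mailbox, fake_delay() and wait_for_mailbox() are
made-up names, and in the kernel the pause would presumably be
something like DELAY() rather than this stand-in.

```c
#include <stdint.h>

/* Hypothetical simulated mailbox, NOT the real amr(4) structure. */
struct poll_mailbox {
    volatile uint8_t mb_busy;   /* 1 while the simulated controller is busy */
    int polls_until_free;       /* simulation: pauses needed before it frees up */
};

/* Stand-in for a real delay: time passing lets the "controller" finish. */
static void fake_delay(struct poll_mailbox *mb)
{
    if (mb->polls_until_free > 0 && --mb->polls_until_free == 0)
        mb->mb_busy = 0;
}

/*
 * Bounded poll with a pause between attempts.  Returns the number of
 * pauses taken before the mailbox was seen free, or -1 if it never
 * came free within max_tries attempts.
 */
static int wait_for_mailbox(struct poll_mailbox *mb, int max_tries)
{
    int i;

    for (i = 0; i < max_tries; i++) {
        if (mb->mb_busy == 0)
            return i;
        fake_delay(mb);
    }
    return -1;
}
```

   If a plain pause avoids the wedge as well as the printf does,
that would point at timing rather than at anything printf does to
the spl level.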

   Comments?

> 
> Unfortunately, I'm not able to spend any time on this at the moment; if 
> someone wants to do a little experimenting I'd be very happy to talk them 
> through what I think should be done (will require some programming 
> ability).

   We're more than willing to try. Just point us in the right
direction.

> -- 
> \\ Give a man a fish, and you feed him for a day. \\  Mike Smith
> \\ Tell him he should learn how to fish himself,  \\  msmith@freebsd.org
> \\ and he'll hate you for a lifetime.             \\  msmith@cdrom.com

-John



