Date:      Wed, 21 Mar 2001 19:46:30 -0600
From:      rand@meridian-enviro.com
To:        Mike Smith <msmith@FreeBSD.ORG>
Cc:        freebsd-stable@FreeBSD.ORG, Mike Tancsa <mike@sentex.net>, bryanh@meridian-enviro.com
Subject:   Re: 3ware problems 
Message-ID:  <87bsqu60eh.wl@localhost.meridian-enviro.com>
In-Reply-To: <200103220121.f2M1KwE00867@mass.dis.org>
References:  <87u24m7kc0.wl@delta.meridian-enviro.com> <200103220121.f2M1KwE00867@mass.dis.org>

Mike> If you can add another function like twe_printstate that invokes
Mike> twe_print_request on each of the requests on the busy queue and
Mike> let me know what they look like, that might give me some clues.
 
Doug> OK, I haven't written the twe_printstate function yet, but I
Doug> think I have the request. I got the filesystem wedged first, and
Doug> then browsing the datastructures with DDB, I think I've found
Doug> the busy queue. Here's the request:

Mike> Cool, this works just as well. 8)

Doug> db> call twe_print_request(0xc1529800)
Doug> twe0: CMD: request_id 89  opcode <READ>  size 7  unit 0  host_id 0
Doug> twe0:  status 0  flags 0x0  count 16  sgl_offset 3
Doug> twe0:  lba 264703
Doug> twe0:   0: 0xce4f000/4096
Doug> twe0:   1: 0x2ab0000/4096
Doug> twe0:  tr_command 0xc1529800/0x1749d800  tr_data 0xcb928000/0xce4f000,8192
Doug> twe0:  tr_status 2  tr_flags 0x1  tr_complete 0xc011f170  tr_private 0

Mike> Er.  This is bad; tr_status == 2 means that the command has been
Mike> completed; it shouldn't still be on the busy queue.  Can you
Mike> check to make sure you have the right queue here?

I am not at all positive I've got the right queue. I *think* I do. I'm
trying to break it again now, and I'll use the code below to verify
the queue. I'm also going to hit the kernel core with gdb to see if I
can verify that. 
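
(Aside: the invariant Mike describes, that a completed request should no
longer sit on the busy queue, can be modelled in userland. This is only
a sketch: `struct request`, `find_stale` and `STATUS_COMPLETE` are
hypothetical names, not the driver's, and the value 2 is taken solely
from Mike's remark about tr_status.)

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Assumed value: the thread only says tr_status == 2 means "completed". */
#define STATUS_COMPLETE 2

/* Hypothetical stand-in for struct twe_request. */
struct request {
    int id;
    int status;
    TAILQ_ENTRY(request) link;
};

TAILQ_HEAD(request_queue, request);

/*
 * Return the first request on the busy queue that is already marked
 * complete, or NULL if the queue is consistent.
 */
static struct request *
find_stale(struct request_queue *busy)
{
    struct request *r;

    assert(busy != NULL);
    TAILQ_FOREACH(r, busy, link) {
        if (r->status == STATUS_COMPLETE)
            return r;
    }
    return NULL;
}
```

A non-NULL result would point either at a bookkeeping bug or, as Mike
suspects here, at having looked at the wrong queue.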

Doug> I'm rebuilding the kernel now with the function twe_printstate,
Doug> after I figured it out with the debugger. (This reminds me of a
Doug> saying that has to do with horses and carriages, hmm.)

Mike> Hrm.  It *should* be pretty easy; I'm sorry I confused you with
Mike> the 'printstate' reference; you should be able to fix up
Mike> twe_report to just dump the busy queue:

Mike> 	struct twe_request	*tr;
Mike> ...

Mike> 	TAILQ_FOREACH(tr, TAILQ_FIRST(sc->twe_busy), tr_link)
Mike> 		twe_print_request(tr);

This doesn't compile for me. Every time I try to use 'sc->twe_busy' I
get a syntax error: invalid type argument of `->'

Here is what I'm using right now:

    s = splbio();
    for (i = 0; (sc = devclass_get_softc(twe_devclass, i)) != NULL; i++) {
        twe_print_controller(sc);
        /* TAILQ_FOREACH takes a pointer to the queue head, hence the '&' */
        printf("ready queue: %d entries\n", sc->twe_qstat[TWEQ_READY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_ready, tr_link) twe_print_request(tr);
        printf("busy queue: %d entries\n", sc->twe_qstat[TWEQ_BUSY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_busy, tr_link) twe_print_request(tr);
        printf("complete queue: %d entries\n", sc->twe_qstat[TWEQ_COMPLETE].q_length);
        TAILQ_FOREACH(tr, &sc->twe_complete, tr_link) twe_print_request(tr);
    }
    splx(s);

This compiles, and when I run it it doesn't crash!  :) In fact, it
says all the queues are empty.
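
(Aside on the compile error quoted above: as far as I can tell from
<sys/queue.h>, TAILQ_FOREACH dereferences its head argument with `->',
so it wants a pointer to the queue head rather than the head itself.
Below is a minimal userland sketch of that; `struct request',
`struct softc' and `count_busy' are made-up stand-ins, not the
driver's own definitions.)

```c
#include <assert.h>
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct twe_request; only what the sketch needs. */
struct request {
    int id;
    TAILQ_ENTRY(request) link;
};

/* Hypothetical stand-in for the softc: the queue head is embedded by value. */
struct softc {
    TAILQ_HEAD(, request) busy;
};

/* Print every request on the busy queue and return how many were seen. */
static int
count_busy(struct softc *sc)
{
    struct request *r;
    int n = 0;

    assert(sc != NULL);
    /* Note the '&': the macro expands to (head)->tqh_first internally. */
    TAILQ_FOREACH(r, &sc->busy, link) {
        printf("request_id %d\n", r->id);
        n++;
    }
    return n;
}
```

Passing `sc->busy' without the `&' hands the macro a struct instead of
a pointer, which is the kind of mistake that produces "invalid type
argument of `->'".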

Doug> Oh, btw, it took over 3 million rows to get it stuck this
Doug> time. Gotta love a test cycle of 6 hours or so.  Sigh.

Mike> This is obviously a really weird case; possibly either an
Mike> extremely narrow race, or some very borderline PCI issue.  One
Mike> question I should have asked, but don't recall whether you
Mike> answered; are you using an AMD K7 system by any chance?  We've
Mike> seen some *very* weird behaviour with these controllers in some
Mike> K7 systems.

Yes, it *is* really weird. I can only get it to break with MySQL. At
Mike Tancsa's suggestion, I tried lots of concurrent bonnies, and
also running a buildworld with a high -j value. I let both run for
about 12 hours each, with no failure. The only thing that'll kill it
is MySQL.  I'm confused.  :(

Nope, not a K7. It's a SuperMicro P6DBU, with dual 400MHz CPUs.

Mike> Thanks again for your help here.

My pleasure. (Calling what we are doing 'help' is complimentary. If
this is help, what you are doing for us must be close to divine
intervention!  :))
