Date: Wed, 21 Mar 2001 19:46:30 -0600
From: rand@meridian-enviro.com
To: Mike Smith <msmith@FreeBSD.ORG>
Cc: freebsd-stable@FreeBSD.ORG, Mike Tancsa <mike@sentex.net>, bryanh@meridian-enviro.com
Subject: Re: 3ware problems
Message-ID: <87bsqu60eh.wl@localhost.meridian-enviro.com>
In-Reply-To: <200103220121.f2M1KwE00867@mass.dis.org>
References: <87u24m7kc0.wl@delta.meridian-enviro.com> <200103220121.f2M1KwE00867@mass.dis.org>

Mike> If you can add another function like twe_printstate that invokes
Mike> twe_print_request on each of the requests on the busy queue and
Mike> let me know what they look like, that might give me some clues.

Doug> OK, I haven't written the twe_printstate function yet, but I
Doug> think I have the request. I got the filesystem wedged first, and
Doug> then browsing the datastructures with DDB, I think I've found
Doug> the busy queue. Here's the request:

Mike> Cool, this works just as well. 8)

Doug> db> call twe_print_request(0xc1529800)
Doug> twe0: CMD: request_id 89 opcode <READ> size 7 unit 0 host_id 0
Doug> twe0: status 0 flags 0x0 count 16 sgl_offset 3
Doug> twe0: lba 264703
Doug> twe0: 0: 0xce4f000/4096
Doug> twe0: 1: 0x2ab0000/4096
Doug> twe0: tr_command 0xc1529800/0x1749d800 tr_data 0xcb928000/0xce4f000,8192
Doug> twe0: tr_status 2 tr_flags 0x1 tr_complete 0xc011f170 tr_private 0

Mike> Er. This is bad; tr_status == 2 means that the command has been
Mike> completed; it shouldn't still be on the busy queue. Can you
Mike> check to make sure you have the right queue here?

I am not at all positive I've got the right queue. I *think* I do. I'm
trying to break it again now, and I'll use the code below to verify
the queue. I'm also going to hit the kernel core with gdb to see if I
can verify that.

Doug> I'm rebuilding the kernel now with the function twe_printstate,
Doug> after I figured it out with the debugger. (This reminds me of a
Doug> saying that has to do with horses and carriages, hmm.)

Mike> Hrm. It *should* be pretty easy; I'm sorry I confused you with
Mike> the 'printstate' reference; you should be able to fix up
Mike> twe_report to just dump the busy queue:

Mike>     struct twe_request *tr;
Mike>     ...
Mike>     TAILQ_FOREACH(tr, TAILQ_FIRST(sc->twe_busy), tr_link)
Mike>         twe_print_request(tr);

This doesn't compile for me. Every time I try to use 'sc->twe_busy' I
get a syntax error:

    invalid type argument of `->'

Here is what I'm using right now:

    s = splbio();
    for (i = 0; (sc = devclass_get_softc(twe_devclass, i)) != NULL; i++) {
        twe_print_controller(sc);

        printf("ready queue: %d entries\n", sc->twe_qstat[TWEQ_READY].q_length);
        TAILQ_FOREACH(tr, sc->twe_ready, tr_link)
            twe_print_request(tr);

        printf("busy queue: %d entries\n", sc->twe_qstat[TWEQ_BUSY].q_length);
        TAILQ_FOREACH(tr, sc->twe_busy, tr_link)
            twe_print_request(tr);

        printf("complete queue: %d entries\n", sc->twe_qstat[TWEQ_COMPLETE].q_length);
        TAILQ_FOREACH(tr, sc->twe_complete, tr_link)
            twe_print_request(tr);
    }
    splx(s);

This compiles, and when I run it, it doesn't crash! :) In fact, it says
all the queues are empty.

Doug> Oh, btw, it took over 3 million rows to get it stuck this
Doug> time. Gotta love a test cycle of 6 hours or so. Sigh.

Mike> This is obviously a really weird case; possibly either an
Mike> extremely narrow race, or some very borderline PCI issue. One
Mike> question I should have asked, but don't recall whether you
Mike> answered; are you using an AMD K7 system by any chance? We've
Mike> seen some *very* weird behaviour with these controllers in some
Mike> K7 systems.

Yes, it *is* really weird. I can only get it to break with MySQL. From
a suggestion of Mike Tancsa, I tried lots of concurrent bonnies, and
also running a buildworld with a high -j value. I let both run for
about 12 hours each, with no failure. The only thing that'll kill it
is MySQL. I'm confused. :(

Nope. It's a SuperMicro P6DBU, with dual 400MHz CPUs.

Mike> Thanks again for your help here.

My pleasure. (Calling what we are doing 'help' is complimentary. If
this is help, what you are doing for us must be close to divine
intervention! :))

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
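
Editor's note on the `invalid type argument of `->'` error: this is the message a C compiler typically gives when a struct is passed where a macro expects a pointer. The queue macros in <sys/queue.h> dereference their head argument, so a head that is embedded in another struct by value has to be passed by address. The sketch below illustrates that pattern only. The demo_* names are hypothetical stand-ins, and whether the twe driver's softc embeds its queue heads this way is an assumption, not something the message confirms.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a driver request; not the real twe_request. */
struct demo_request {
    int id;
    TAILQ_ENTRY(demo_request) link;   /* queue linkage */
};

/* Declares "struct demo_queue", a tail-queue head of demo_request. */
TAILQ_HEAD(demo_queue, demo_request);

/* Hypothetical softc: the queue head is embedded by value (assumption). */
struct demo_softc {
    struct demo_queue busy;
};

/* Print every request on the busy queue and return how many were seen. */
static int
demo_dump_busy(struct demo_softc *sc)
{
    struct demo_request *tr;
    int n = 0;

    assert(sc != NULL);

    /*
     * TAILQ_FOREACH expands to code that uses (head)->tqh_first, so the
     * head argument must be a pointer: &sc->busy, not sc->busy.
     */
    TAILQ_FOREACH(tr, &sc->busy, link) {
        printf("request_id %d\n", tr->id);
        n++;
    }
    return n;
}
```

A caller would initialize the head with TAILQ_INIT(&sc.busy), add entries with TAILQ_INSERT_TAIL(&sc.busy, &req, link), and then call demo_dump_busy(&sc). Note also that TAILQ_FOREACH takes the head itself as its second argument, so wrapping it in TAILQ_FIRST() is not needed.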