From owner-freebsd-stable Fri Apr 21 9:37:49 2000
Delivered-To: freebsd-stable@freebsd.org
Received: from mass.cdrom.com (adsl-63-202-176-112.dsl.snfc21.pacbell.net [63.202.176.112])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3CFED37BC6B; Fri, 21 Apr 2000 09:37:46 -0700 (PDT)
	(envelope-from msmith@mass.cdrom.com)
Received: from mass.cdrom.com (localhost [127.0.0.1])
	by mass.cdrom.com (8.9.3/8.9.3) with ESMTP id JAA02335;
	Fri, 21 Apr 2000 09:44:40 -0700 (PDT)
	(envelope-from msmith@mass.cdrom.com)
Message-Id: <200004211644.JAA02335@mass.cdrom.com>
X-Mailer: exmh version 2.1.1 10/15/1999
To: Alfred Perlstein
Cc: Mike Smith, stable@FreeBSD.ORG
Subject: Re: amr still seems to have issues.
In-reply-to: Your message of "Fri, 21 Apr 2000 00:32:05 PDT."
	<20000421003205.A25458@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 21 Apr 2000 09:44:35 -0700
From: Mike Smith
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> * Mike Smith [000420 11:39] wrote:
> > > Hi, we're running 4.0-stable as of Sat Apr 15 18:39:08 PDT 2000
> > > which includes the recent amr fixes which we were hoping would cure
> > > the lockups with amr.  Unfortunately we are now experiencing reboots,
> > > the messages file reveals this:
> > >
> > > Apr 15 13:31:06 abacus /kernel: amr0: command 31 wedged after 30 seconds
> >
> > This is extra-bad.  Without more feedback from the controller (no
> > documentation from AMI yet, sorry. 8() I can only wonder whether you're
> > getting a SCSI bus error of some sort that's causing the kernel to time
> > these commands out (because the controller is taking too long to respond).
> >
> > You could try increasing the timeout allowance in amr_periodic(), or just
> > disable the poll entirely.  This won't help if the controller is really
> > dropping commands, though.
> >
> > > Right now I'm attempting to log off a serial console to see what's
> > > going on, however this box has been in production (and doing miserably)
> > > for some time now so doing debugging is pretty difficult as well as
> > > time consuming where I really need to be working on other issues.
> >
> > At this point, I have no other ideas, sorry.
>
> Here's something I hope it helps:

Hmm.  Looks like I'm not retiring the wedged command correctly.  This is a
symptom, rather than the real problem, though.  See if disabling the
command timeout stuff makes the system happy - either these commands are
just taking a _long_ time to complete, or you have another problem.

I _still_ think you have disk, cable or enclosure issues, but now that
you've precipitated this case I can go look at what I'm doing wrong here.

Thanks!

> amr0: command 40 wedged after 30 seconds
> biodone: page busy < 0, pindex: 144, foff: 0x(0,90000), resid: 4096, index: 0
> iosize: 8192, lblkno: 72, flags: 0x30020aa0, npages: 2
> valid: 0xff, dirty: 0x0, wired: 1
> panic: biodone: page busy < 0
>
> mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
> boot() called on cpu#1
>
> syncing disks...
>
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> fault virtual address = 0x30
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc0226765
> stack pointer = 0x10:0xff80dd9c
> frame pointer = 0x10:0xff80dda0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = Idle
> interrupt mask = bio <- SMP: XXX
> trap number = 12
> panic: page fault
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> boot() called on cpu#1
> Uptime: 2d4h11m39s
> amrd0: still open, can't shutdown
>
> dumping to dev #da/0x20001, offset 128
> dump 1023 1022 Aborting dump due to I/O error.
> (da0:ahc1:0:6:0): WRITE(06). CDB: a 7 da f7 8 0
> (da0:ahc1:0:6:0): error code 0 at block no. -964632618 (decimal)
> failed, reason: i/o error
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
>
> Any ideas?
>
> --
> -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
> "I have the heart of a child; I keep it in a jar on my desk."
>

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  msmith@freebsd.org
\\ and he'll hate you for a lifetime.             \\  msmith@cdrom.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
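
Editorial illustration, not part of the original message: the thread talks about a periodic poll that reports commands as "wedged" after 30 seconds, and about raising that timeout or disabling the poll. The sketch below shows roughly what such a poll might look like. It is not the amr driver's code; `struct cmd_slot`, `scan_wedged()` and the `timeout` parameter are hypothetical names invented for this example, and the message text only imitates the log line quoted in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-command bookkeeping; the real driver's layout is unknown. */
struct cmd_slot {
	int  busy;       /* nonzero while the command is outstanding */
	int  wedged;     /* set once the command has been reported */
	long submitted;  /* submission time, in seconds */
};

/*
 * Scan all slots and flag busy commands that have been outstanding for
 * more than `timeout` seconds.  A timeout of zero or less disables the
 * poll entirely.  Returns the number of commands newly flagged.
 *
 * Each command is reported only once.  Nothing here retires or retries
 * the command; doing that correctly is the harder problem the thread
 * describes.
 */
static int
scan_wedged(struct cmd_slot *slots, int nslots, long now, long timeout)
{
	int i, flagged = 0;

	assert(slots != NULL);
	if (timeout <= 0)
		return (0);	/* poll disabled */

	for (i = 0; i < nslots; i++) {
		if (!slots[i].busy || slots[i].wedged)
			continue;
		if (now - slots[i].submitted > timeout) {
			slots[i].wedged = 1;
			printf("command %d wedged after %ld seconds\n",
			    i, timeout);
			flagged++;
		}
	}
	return (flagged);
}
```

In this sketch, "increasing the timeout allowance" means passing a larger `timeout`, and "disabling the poll" means passing zero. Neither helps if the controller really is dropping commands; they only stop slow commands from being reported as wedged.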