Date: Fri, 21 Apr 2000 00:32:05 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Mike Smith <msmith@FreeBSD.ORG>
Cc: stable@FreeBSD.ORG
Subject: Re: amr still seems to have issues.
Message-ID: <20000421003205.A25458@fw.wintelcom.net>
In-Reply-To: <200004201817.LAA00917@mass.cdrom.com>; from msmith@FreeBSD.ORG on Thu, Apr 20, 2000 at 11:17:21AM -0700
References: <20000420085841.G1838@fw.wintelcom.net> <200004201817.LAA00917@mass.cdrom.com>

* Mike Smith <msmith@FreeBSD.ORG> [000420 11:39] wrote:
> > Hi, we're running 4.0-stable as of Sat Apr 15 18:39:08 PDT 2000
> > which includes the recent amr fixes which we were hoping would cure
> > the lockups with amr.  Unfortunately we are now experiencing reboots,
> > the messages file reveals this:
> >
> > Apr 15 13:31:06 abacus /kernel: amr0: command 31 wedged after 30 seconds
>
> This is extra-bad.  Without more feedback from the controller (no
> documentation from AMI yet, sorry. 8() I can only wonder whether you're
> getting a SCSI bus error of some sort that's causing the kernel to time
> these commands out (because the controller is taking too long to respond).
>
> You could try increasing the timeout allowance in amr_periodic(), or just
> disable the poll entirely.  This won't help if the controller is really
> dropping commands, though.
>
> > Right now I'm attempting to log off a serial console to see what's
> > going on, however this box has been in production (and doing miserably)
> > for some time now so doing debugging is pretty difficult as well as
> > time consuming where I really need to be working on other issues.
>
> At this point, I have no other ideas, sorry.

Here's something I hope helps:

amr0: command 40 wedged after 30 seconds
biodone: page busy < 0, pindex: 144, foff: 0x(0,90000), resid: 4096, index: 0
iosize: 8192, lblkno: 72, flags: 0x30020aa0, npages: 2
valid: 0xff, dirty: 0x0, wired: 1
panic: biodone: page busy < 0
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1

syncing disks... 
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
fault virtual address   = 0x30
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0226765
stack pointer           = 0x10:0xff80dd9c
frame pointer           = 0x10:0xff80dda0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = bio  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Uptime: 2d4h11m39s
amrd0: still open, can't shutdown

dumping to dev #da/0x20001, offset 128
dump 1023 1022 Aborting dump due to I/O error.
(da0:ahc1:0:6:0): WRITE(06). CDB: a 7 da f7 8 0
(da0:ahc1:0:6:0): error code 0
        at block no. -964632618 (decimal)
        failed, reason: i/o error
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...

Any ideas?

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message