Date: Fri, 21 Apr 2000 00:32:05 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Mike Smith <msmith@FreeBSD.ORG>
Cc: stable@FreeBSD.ORG
Subject: Re: amr still seems to have issues.
Message-ID: <20000421003205.A25458@fw.wintelcom.net>
In-Reply-To: <200004201817.LAA00917@mass.cdrom.com>; from msmith@FreeBSD.ORG on Thu, Apr 20, 2000 at 11:17:21AM -0700
References: <20000420085841.G1838@fw.wintelcom.net> <200004201817.LAA00917@mass.cdrom.com>

* Mike Smith <msmith@FreeBSD.ORG> [000420 11:39] wrote:
> > Hi, we're running 4.0-stable as of Sat Apr 15 18:39:08 PDT 2000,
> > which includes the recent amr fixes that we were hoping would cure
> > the lockups with amr. Unfortunately, we are now experiencing reboots;
> > the messages file reveals this:
> >
> > Apr 15 13:31:06 abacus /kernel: amr0: command 31 wedged after 30 seconds
>
> This is extra-bad. Without more feedback from the controller (no
> documentation from AMI yet, sorry. 8() I can only wonder whether you're
> getting a SCSI bus error of some sort that's causing the kernel to time
> these commands out (because the controller is taking too long to respond).
>
> You could try increasing the timeout allowance in amr_periodic(), or just
> disable the poll entirely. This won't help if the controller is really
> dropping commands, though.
>
> > Right now I'm attempting to log output from a serial console to see
> > what's going on; however, this box has been in production (and doing
> > miserably) for some time now, so debugging is difficult as well as
> > time-consuming when I really need to be working on other issues.
>
> At this point, I have no other ideas, sorry.
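Mike's suggestion refers to a periodic poll that flags commands the controller has held for too long. The sketch below is a hypothetical model of that idea, not the amr(4) code in amr_periodic(): `struct cmd`, `periodic_check()` and its `timeout` parameter are invented for illustration. Raising `timeout` corresponds to increasing the allowance, and a non-positive value stands in for disabling the poll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical model of one outstanding controller command;
 * not the amr(4) driver's real structure. */
struct cmd {
    int    id;      /* command slot number, as in "command 31" */
    time_t issued;  /* time the command was handed to the controller */
    int    busy;    /* nonzero while the command is still outstanding */
};

/*
 * Hypothetical periodic poll: report every outstanding command that has
 * been busy for longer than `timeout` seconds and return how many were
 * found. A larger `timeout` is the "increase the allowance" option;
 * timeout <= 0 skips the scan, modelling "disable the poll entirely".
 */
static int periodic_check(const struct cmd *cmds, size_t n, time_t now,
                          int timeout)
{
    int wedged = 0;

    assert(cmds != NULL || n == 0);
    if (timeout <= 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].busy && now - cmds[i].issued > timeout) {
            printf("command %d wedged after %d seconds\n",
                   cmds[i].id, timeout);
            wedged++;
        }
    }
    return wedged;
}
```

With one command issued 40 seconds before the check, a 30-second allowance reports it as wedged, a 60-second allowance does not, and a timeout of 0 skips the scan. As Mike says, neither change helps if the controller really drops commands; it only changes when the driver stops waiting.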
Here's something; I hope it helps:
amr0: command 40 wedged after 30 seconds
biodone: page busy < 0, pindex: 144, foff: 0x(0,90000), resid: 4096, index: 0
iosize: 8192, lblkno: 72, flags: 0x30020aa0, npages: 2
valid: 0xff, dirty: 0x0, wired: 1
panic: biodone: page busy < 0
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
syncing disks...
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
fault virtual address = 0x30
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc0226765
stack pointer = 0x10:0xff80dd9c
frame pointer = 0x10:0xff80dda0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = bio <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Uptime: 2d4h11m39s
amrd0: still open, can't shutdown
dumping to dev #da/0x20001, offset 128
dump 1023 1022 Aborting dump due to I/O error.
(da0:ahc1:0:6:0): WRITE(06). CDB: a 7 da f7 8 0
(da0:ahc1:0:6:0): error code 0 at block no. -964632618 (decimal)
failed, reason: i/o error
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
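The failed dump write in the log includes its raw 6-byte CDB. A minimal sketch for decoding it, assuming the standard WRITE(6) layout from the SCSI block command set (opcode in byte 0, a 21-bit LBA formed from the low 5 bits of byte 1 plus bytes 2 and 3, transfer length in byte 4 with 0 meaning 256 blocks); `struct write6` and `decode_write6()` are hypothetical names, not part of CAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of a 6-byte WRITE(6) CDB. */
struct write6 {
    uint8_t  opcode;  /* 0x0A for WRITE(6) */
    uint32_t lba;     /* 21-bit logical block address */
    uint16_t blocks;  /* transfer length in blocks */
};

static struct write6 decode_write6(const uint8_t cdb[6])
{
    struct write6 w;

    assert(cdb != NULL);
    w.opcode = cdb[0];
    /* LBA: low 5 bits of byte 1, then bytes 2 and 3, big-endian. */
    w.lba = ((uint32_t)(cdb[1] & 0x1f) << 16) |
            ((uint32_t)cdb[2] << 8) | cdb[3];
    /* Byte 4 is the transfer length; a value of 0 encodes 256 blocks. */
    w.blocks = cdb[4] ? cdb[4] : 256;
    return w;
}
```

Fed the bytes from the log (0x0a 0x07 0xda 0xf7 0x08 0x00), it yields opcode 0x0a, LBA 514807 and a transfer length of 8 blocks. This write went to da0 on ahc1, the dump device, not to the amr array.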
Any ideas?
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20000421003205.A25458>
