Date: Tue, 3 Oct 1995 19:18:39 +0100
From: se@zpr.uni-koeln.de (Stefan Esser)
To: John Hay <jhay@mikom.csir.co.za>
Cc: current@freebsd.org
Subject: Re: stable panics while backup to ncr->DAT
Message-ID: <199510031818.AA28720@Sysiphos>
In-Reply-To: John Hay <jhay@mikom.csir.co.za> "stable panics while backup to ncr->DAT" (Oct 3, 9:21)

On Oct 3, 9:21, John Hay wrote:
} Subject: stable panics while backup to ncr->DAT
} Hi,
}
} I have a 100MHz Pentium with an ASUS motherboard (Sis chipset). The machine
} is running a small news feed, a web server and is a ftp site. It has the
} latest 2.1-stable as from ctm yesterday. There is normally no problems,
} everything is working fine. The machine has been running for a few weeks now.
}
} It is only when trying to make backups to the DAT tape that I often get
} problems. I would guess about 8 out 10 backups will end in a kernel panic.
} It seems that there is an error with the tape and while handling that the
} ncr code does a read to 0.

I don't understand that. What does "read to 0" mean?
(Do you mean the dereferencing of a NULL pointer by the exception
handler code that deals with an unspecific error condition?)

} Maybe just to make thing clear. The problem did not start now, it started
} when I got the EXABYTE DAT a few weeks ago. I had 2.0.5 running on the
} machine then, but got lots of problems with the DAT. I saw that there were
} some fixes to the NCR code and decided to try stable on that machine. It
} is better now because I can sometimes get the backup to go right through,
} while previously it would die with the first access to the tape.

Please send the error messages resulting from such an access!
This looks like some compatibility problem that I'll have to look at ...

If you had 2.0.5R running, then you probably just needed a one-line
fix (QUIRK_NOMSG must be set for your DAT, and has become the default
even for devices that don't need it, since it does no harm ...).

The NCR driver in FreeBSD-current has some fixes that did not get
applied to 2.1. For that reason, your best bet is to rebuild your 2.1
kernel with /sys/pci/ncr.c from FreeBSD-current ...

} Below is the boot probe messages, then a cutout of "nm /kernel | sort | more"
} and then the error and panic message.

} f01427c0 t _ncr_timeout
} f0142b28 t _ncr_exception <------
} f0142f40 t _ncr_int_sto

This is a known problem in the ncr exception handler, which was
fixed in FreeBSD-current some time ago. (The exception handler tries
to print some information, but can dereference an invalid pointer in
certain situations.)

} Tue Oct 3 07:23:39 SAT 1995
} ncr0 targ 6?: ERROR (0:110) (8-28-0) (88/13) @ (ed0:180003b0).

    dstat: 0     = dma fifo NOT empty, no other error condition
    sist:  0x110 = handshake timeout (+ reselected by another device)
    socl:  0x28  = ATN
    sicl:  0x08  = BSY + ATN
    ed0:   data_in + 86

There has been a handshake timeout while transferring data from
the EXABYTE to the NCR. This seems to be in a reselection phase
(some data has already been transferred, according to the data_in+86
position of the NCR instruction pointer).

} syncing disks... ncr0 targ 6?: ERROR (0:101) (8-28-0) (88/13) @ (ed0:180003b0).

    dstat: 0
    sist:  0x101 = handshake timeout + scsi parity error

This really looks like a cable problem to me ...

I have seen something like that a number of times before, with Sparc
and DEC boxes that had FAST SCSI and tapes on one SCSI bus. It seems
that accesses to only one device at a time hardly ever fail, but if
you have a high SCSI load using devices simultaneously, such random
failures occur ...

Please check your complete SCSI setup: cables (length and quality)
and terminators are the most common cause of the kind of problem
you describe.

You may want to try

    # ncrcontrol -s async

to disable synchronous transfers for some tests.

    # ncrcontrol -s sync=4

may be an even better test, since the NCR uses some glitch suppression
logic when operating at synchronous transfer rates of up to 5MHz.
(If I remember right, it can go beyond 5MHz doing asynchronous transfers!)
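
Editorial annotation (not part of the original message): the two sist values decoded in the message can be read as a bitmask. The sketch below is only an illustration of that reading. The bit-to-name table is inferred from the two decoded values quoted in this message (0x110 and 0x101) and is an assumption, not the driver's register definition; the names `SIST_BITS` and `decode_sist` are hypothetical.

```python
# Bit meanings inferred only from the two decodings quoted in the message:
#   0x110 = handshake timeout (+ reselected by another device)
#   0x101 = handshake timeout + scsi parity error
# This table is an assumption for illustration, not taken from ncr.c.
SIST_BITS = {
    0x100: "handshake timeout",
    0x010: "reselected by another device",
    0x001: "scsi parity error",
}


def decode_sist(value):
    """Return (names of set bits found in the table, leftover unknown bits)."""
    names = [name
             for bit, name in sorted(SIST_BITS.items(), reverse=True)
             if value & bit]
    unknown = value
    for bit in SIST_BITS:
        unknown &= ~bit  # clear every bit the table explains
    return names, unknown


if __name__ == "__main__":
    for raw in (0x110, 0x101):
        names, unknown = decode_sist(raw)
        line = f"sist 0x{raw:03x}: " + " + ".join(names)
        if unknown:
            line += f" (unexplained bits 0x{unknown:x})"
        print(line)
```

Any bits that the table does not cover are returned separately rather than guessed at, since the message only documents these three conditions.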

-- 
 Stefan Esser, Zentrum fuer Paralleles Rechnen    Tel: +49 221 4706021
 Universitaet zu Koeln, Weyertal 80, 50931 Koeln  FAX: +49 221 4705160
 ==============================================================================
 http://www.zpr.uni-koeln.de/staff/esser/esser.html  <se@ZPR.Uni-Koeln.DE>