Date:      Tue, 3 Oct 1995 19:18:39 +0100
From:      se@zpr.uni-koeln.de (Stefan Esser)
To:        John Hay <jhay@mikom.csir.co.za>
Cc:        current@freebsd.org
Subject:   Re: stable panics while backup to ncr->DAT
Message-ID:  <199510031818.AA28720@Sysiphos>
In-Reply-To: John Hay <jhay@mikom.csir.co.za> "stable panics while backup to ncr->DAT" (Oct  3,  9:21)

On Oct 3,  9:21, John Hay wrote:
} Subject: stable panics while backup to ncr->DAT
} Hi,
} 
} I have a 100MHz Pentium with an ASUS motherboard (Sis chipset). The machine
} is running a small news feed, a web server and is a ftp site. It has the
} latest 2.1-stable as from ctm yesterday. There is normally no problems,
} everything is working fine. The machine has been running for a few weeks now.
} 
} It is only when trying to make backups to the DAT tape that I often get
} problems. I would guess about 8 out 10 backups will end in a kernel panic.
} It seems that there is an error with the tape and while handling that the
} ncr code does a read to 0.

I don't understand that. What does "read to 0" mean?

(Do you mean the dereferencing of a NULL pointer by the
exception handler code that deals with an unspecific error
condition?)

} Maybe just to make thing clear. The problem did not start now, it started
} when I got the EXABYTE DAT a few weeks ago. I had 2.0.5 running on the
} machine then, but got lots of problems with the DAT. I saw that there were
} some fixes to the NCR code and decided to try stable on that machine. It
} is better now because I can sometimes get the backup to go right through,
} while previously it would die with the first access to the tape.

Please send the error messages resulting from such an access!
This looks like some compatibility problem that I'll have to
look at ...

If you had 2.0.5R running, then you probably just needed a
one-line fix (QUIRK_NOMSG must be set for your DAT; it has since
become the default even for devices that don't need it, since it
does no harm ...).


The NCR driver in FreeBSD-current has some fixes that did not
get applied to 2.1. For that reason, your best bet is to rebuild
your 2.1 kernel with /sys/pci/ncr.c from FreeBSD-current ...


} Below is the boot probe messages, then a cutout of "nm /kernel | sort | more"
} and then the error and panic message.

} f01427c0 t _ncr_timeout
} f0142b28 t _ncr_exception    <------
} f0142f40 t _ncr_int_sto

This is a known problem in the ncr exception handler, which was
fixed in FreeBSD-current some time ago.

(The exception handler tries to print some information, but can 
dereference an invalid pointer in certain situations.)

} Tue Oct  3 07:23:39 SAT 1995
} ncr0 targ 6?: ERROR (0:110) (8-28-0) (88/13) @ (ed0:180003b0).

dstat:	0	= dma fifo NOT empty, no other error condition
sist:	0x110	= handshake timeout (+ reselected by another device)

socl:	0x08	= ATN
sicl:	0x28	= BSY + ATN

ed0:	data_in + 86

There has been a handshake timeout while transferring 
data from the EXABYTE to the NCR. This seems to be in 
a reselection phase (there has already been some data
transferred, according to the data_in+86 position of
the NCR instruction pointer).

} syncing disks... ncr0 targ 6?: ERROR (0:101) (8-28-0) (88/13) @ (ed0:180003b0).

dstat:	0
sist:	0x101	= handshake timeout + scsi parity error
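
The two sist values above can also be decoded mechanically. The
following is only a minimal sketch: the bit table is an assumption
based on my recollection of the NCR 53C8xx register layout (SIST0
in the low byte, SIST1 in the high byte) and should be checked
against the chip manual. The name decode_sist is hypothetical and
is not part of the driver.

```python
# Assumed bit layout of the 16-bit sist value printed by the driver:
# low byte = SIST0, high byte = SIST1. Verify against the chip manual.
SIST_BITS = {
    0x001: "scsi parity error",
    0x002: "scsi bus reset",
    0x004: "unexpected disconnect",
    0x008: "scsi gross error",
    0x010: "reselected",
    0x020: "selected",
    0x040: "function complete",
    0x080: "phase mismatch",
    0x100: "handshake timeout",
    0x200: "general purpose timeout",
    0x400: "selection timeout",
}


def decode_sist(value):
    """Return the names of all assumed status bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SIST_BITS.items()) if value & bit]


if __name__ == "__main__":
    # The two values taken from the error messages quoted above.
    for sist in (0x110, 0x101):
        print(hex(sist), "=", " + ".join(decode_sist(sist)))
```

Under these assumptions, 0x110 decodes to "reselected" plus
"handshake timeout", and 0x101 to "scsi parity error" plus
"handshake timeout", which matches the reading given above.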

This really looks like a cable problem to me ...

I have seen something like that a number of times before,
with Sparc and DEC boxes that had FAST SCSI and tapes
on one SCSI bus. It seems that accesses to only one device
at a time hardly ever fail, but if you have a high SCSI
load using devices simultaneously, such random failures
occur ...


Please check your complete SCSI setup: Cables (length and
quality) and terminators are the most common cause of the
kind of problem you describe.

You may want to try 

# ncrcontrol -s async

to disable synchronous transfers for some tests.

# ncrcontrol -s sync=4

may be an even better test, since the NCR uses some glitch
suppression logic when operating at synchronous transfer 
rates of up to 5MHz. (If I remember right, it can go beyond 
5MHz doing asynchronous transfers!)


-- 
 Stefan Esser, Zentrum fuer Paralleles Rechnen		Tel:	+49 221 4706021
 Universitaet zu Koeln, Weyertal 80, 50931 Koeln	FAX:	+49 221 4705160
 ==============================================================================
 http://www.zpr.uni-koeln.de/staff/esser/esser.html	  <se@ZPR.Uni-Koeln.DE>


