FreeBSD Mail Archives

Date:      Wed, 13 Aug 1997 05:40:02 -0700 (PDT)
From:      Stefan Esser <se@FreeBSD.ORG>
To:        freebsd-bugs
Subject:   Re: misc/4293: strang disk error messages
Message-ID:  <199708131240.FAA14894@hub.freebsd.org>

The following reply was made to PR misc/4293; it has been noted by GNATS.

From: Stefan Esser <se@FreeBSD.ORG>
To: daniels@media.mit.edu
Cc: FreeBSD-gnats-submit@freebsd.org, Stefan Esser <se@freebsd.org>
Subject: Re: misc/4293: strang disk error messages
Date: Wed, 13 Aug 1997 14:19:41 +0200

 On Aug 13, daniels@media.mit.edu wrote:
 > The disk is a 2GB Quantum (SCSI) running from a PCI SCSI controller.

 Which Quantum drive is that?
 Their models vary quite a bit in quality ...

 > Every few hours or days, a series of error messages about the disk
 > (and maybe the controller) appear on the console. These messages last
 > about 2 minutes, and then stop. During that time, user activity may
 > freeze, but the Web server (the primary purpose of the system) seems
 > to be running well. My preliminary deciphering of the error messages
 > suggest something wrong with swap space (pager errors) but I can't
 > really tell.

 No, they mean that a disk request issued by
 the VM system returned an error.

 > Late last week, the computer lost power (as did most of Cambridge,
 > Mass.) which may have contributed to the problem, which only surfaced
 > over the weekend.

 So the problem did not exist before that power loss?

 > Here is a complete cycle of the /var/log/messages accounting of the
 > problem:
 > 
 > Aug 13 06:40:26 borg login: login on ttyv1 as daniels
 > Aug 13 06:41:30 borg /kernel: ncr0: restart (ncr dead ?).
 > Aug 13 06:44:13 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2 

 The drive returns a UNIT ATTENTION condition with
 ASC=29 and ASCQ=2. This is a little odd; I would
 have expected ASC=29 and ASCQ=0 ...
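
 A minimal Python sketch for decoding such lines, assuming the kernel prints the sense bytes in hex (so "asc:29,2" is ASC 0x29, ASCQ 0x02). The table is a small hand-copied subset of the ASC 0x29 ("reset") entries from the later SCSI (SPC) sense-code list; the regex and the function name decode_sense are illustrative, not FreeBSD code. In that list, 29/00 is the generic power-on/reset code, while 29/02 specifically reports a SCSI bus reset.

```python
import re

# Illustrative subset of the SCSI sense-code table (ASC 0x29 "reset" family).
ASC_ASCQ_NAMES = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x29, 0x01): "POWER ON OCCURRED",
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x29, 0x03): "BUS DEVICE RESET FUNCTION OCCURRED",
}

# Matches the "asc:29,2" fragment; both values are assumed to be hex.
SENSE_RE = re.compile(r"asc:([0-9a-fA-F]+),([0-9a-fA-F]+)")

def decode_sense(line):
    """Return (asc, ascq, description) for a kernel log line, or None."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
    return asc, ascq, ASC_ASCQ_NAMES.get((asc, ascq), "unknown")

print(decode_sense("sd0(ncr0:0:0): UNIT ATTENTION asc:29,2"))
```
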

 > Aug 13 06:44:13 borg /kernel: , retries:3
 > Aug 13 06:44:14 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).
 > Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).

 > Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > Aug 13 06:44:19 borg /kernel: ncr0: restart (ncr dead ?).
 > Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2 
 > Aug 13 06:44:19 borg /kernel: , retries:1
 > Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > Aug 13 06:44:19 borg /kernel: pid 3577 (httpd), uid 65534: exited on signal 6

 Hmmm, and does the system recover after some time?
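
 One way to answer that from /var/log/messages is to group the error lines into bursts and look at their start, end and duration. The Python sketch below assumes that lines containing "ncr dead" or "UNIT ATTENTION" mark an error episode and that a quiet gap of more than 60 seconds ends a burst; both choices, the function name error_bursts, and the supplied year (syslog timestamps have none) are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Markers taken from the log excerpt; any line containing one counts as an error.
MARKERS = ("ncr dead", "UNIT ATTENTION")

def error_bursts(lines, gap=timedelta(seconds=60), year=1997):
    """Group marker lines into bursts; a quiet spell longer than `gap` ends one.

    Returns a list of (start, end, duration) tuples.
    """
    bursts = []
    for line in lines:
        if not any(marker in line for marker in MARKERS):
            continue
        # Syslog timestamps ("Aug 13 06:41:30") carry no year, so one is supplied.
        ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if bursts and ts - bursts[-1][1] <= gap:
            bursts[-1][1] = ts          # extend the current burst
        else:
            bursts.append([ts, ts])     # start a new burst
    return [(start, end, end - start) for start, end in bursts]

log = [
    "Aug 13 06:41:30 borg /kernel: ncr0: restart (ncr dead ?).",
    "Aug 13 06:44:13 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2 ",
    "Aug 13 06:44:13 borg /kernel: , retries:3",
    "Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).",
    "Aug 13 06:44:19 borg /kernel: ncr0: restart (ncr dead ?).",
]
for start, end, duration in error_bursts(log):
    print(start.time(), end.time(), duration)
```

 With the excerpt above this yields two bursts: an isolated restart at 06:41:30 and a six-second episode starting at 06:44:13. Running it over a full day of logs would show whether the reported "about 2 minutes" episodes end on their own.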

 > >How-To-Repeat:
 > 
 > Just wait a few hours.

 Well, sorry, but this is not true. It may work 
 if *you* wait a few hours, but my system runs 
 fine for however long I let it ...

 So something else must be involved. The first
 obvious question, of course, is whether the drive
 worked fine up until some external event (as opposed
 to a kernel rebuild :)

 If you did not install a new kernel, then there
 is a high probability that your drive is going
 bad. Did you check whether it stops spinning
 while those errors are being reported?

 After a SCSI transfer fails, it is retried a
 limited number of times, but if the failure lasts
 for more than a few seconds, the read error is
 returned to the caller (which may be the VM code
 in the kernel, as you observed).
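
 The retry behaviour described above can be modelled roughly as follows. This is a minimal Python sketch, not the FreeBSD SCSI code: TransientError, do_io_with_retries, flaky_drive and the max_retries=4 default are hypothetical, and a real driver also deals with timeouts and bus resets. It shows why a brief glitch stays invisible while an outage that outlasts the retries surfaces as an I/O error in the caller.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a failed SCSI transfer."""

def do_io_with_retries(transfer, max_retries=4):
    """Run transfer(); retry on TransientError up to max_retries extra times.

    Once the retries are used up, the error is re-raised to the caller,
    analogous to a read error reaching the VM pager.
    """
    attempt = 0
    while True:
        try:
            return transfer()
        except TransientError:
            attempt += 1
            if attempt > max_retries:
                raise

def flaky_drive(failures):
    """Return a fake transfer() that fails `failures` times, then succeeds."""
    remaining = failures
    def transfer():
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise TransientError("transfer failed")
        return b"data"
    return transfer

print(do_io_with_retries(flaky_drive(2)))    # brief glitch: retries absorb it
try:
    do_io_with_retries(flaky_drive(10))      # long outage: retries run out
except TransientError as exc:
    print("error returned to caller:", exc)
```
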

 For now, I assume a hardware problem. Please let
 me know if you are certain that your hardware
 is not causing the failure ...

 Regards, Stefan

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199708131240.FAA14894>
