Date: Wed, 13 Aug 1997 05:40:02 -0700 (PDT)
From: Stefan Esser <se@FreeBSD.ORG>
To: freebsd-bugs
Subject: Re: misc/4293: strang disk error messages
Message-ID: <199708131240.FAA14894@hub.freebsd.org>
The following reply was made to PR misc/4293; it has been noted by GNATS.

From: Stefan Esser <se@FreeBSD.ORG>
To: daniels@media.mit.edu
Cc: FreeBSD-gnats-submit@freebsd.org, Stefan Esser <se@freebsd.org>
Subject: Re: misc/4293: strang disk error messages
Date: Wed, 13 Aug 1997 14:19:41 +0200

On Aug 13, daniels@media.mit.edu wrote:
> The disk is a 2GB Quantum (SCSI) running from a PCI SCSI controller.

Which Quantum drive is that? Their quality varies quite a lot ...

> Every few hours or days, a series of error messages about the disk
> (and maybe the controller) appear on the console. These messages last
> about 2 minutes, and then stop. During that time, user activity may
> freeze, but the Web server (the primary purpose of the system) seems
> to be running well. My preliminary deciphering of the error messages
> suggest something wrong with swap space (pager errors) but I can't
> really tell.

No, the pager errors mean that a disk request issued by the VM system returned an error.

> Late last week, the computer lost power (as did most of Cambridge,
> Mass.) which may have contributed to the problem, which only surfaced
> over the weekend.

So the problem did not exist before that power loss?

> Here is a complete cycle of the /var/log/messages accounting of the
> problem:
>
> Aug 13 06:40:26 borg login: login on ttyv1 as daniels
> Aug 13 06:41:30 borg /kernel: ncr0: restart (ncr dead ?).
> Aug 13 06:44:13 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2

The drive returns a UNIT ATTENTION condition with ASC=29 and ASCQ=2. This is a little odd; I would have expected ASC=29 and ASCQ=0 ...

> Aug 13 06:44:13 borg /kernel: , retries:3
> Aug 13 06:44:14 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
> Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).
> Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).
> Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
> Aug 13 06:44:19 borg /kernel: ncr0: restart (ncr dead ?).
> Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2
> Aug 13 06:44:19 borg /kernel: , retries:1
> Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
> Aug 13 06:44:19 borg /kernel: pid 3577 (httpd), uid 65534: exited on signal 6

Hmm, and the system recovers after some time?

> >How-To-Repeat:
>
> Just wait a few hours.

Well, sorry, but this is not true. It may work if *you* wait a few hours, but my system runs fine however long I let it run ... So there must be some other problem.

The first obvious question is, of course, whether the drive worked fine up to some external event (as opposed to a kernel rebuild :)

If you did not install a new kernel, then there is a high probability that your drive is going bad. Did you check whether it stops spinning while those errors are being reported?

There is a limited number of retries after a SCSI transfer fails, but if a failure lasts for more than a few seconds, read errors are passed back to the requester (which may be the VM code in the kernel, as you observed).

For now, I assume a hardware problem. Please let me know if you know for sure that your hardware is not causing the failure ...

Regards,

Stefan