Date:        Fri, 15 Sep 2000 00:21:35 -0600
From:        "Kenneth D. Merry" <ken@kdm.org>
To:          Jean-Francois Dockes <jean-francois.dockes@wanadoo.fr>
Cc:          freebsd-stable@FreeBSD.ORG
Subject:     Re: SCSI retries without errors in /var/log/messages?
Message-ID:  <20000915002135.A83469@panzer.kdm.org>
In-Reply-To: <14784.33648.251152.511680@localhost.dockes.com>; from jean-francois.dockes@wanadoo.fr on Thu, Sep 14, 2000 at 09:51:12AM +0200
References:  <20000911162530.34FC599C8C@waltz.rahul.net> <20000911130644.A50024@panzer.kdm.org> <14784.33648.251152.511680@localhost.dockes.com>

On Thu, Sep 14, 2000 at 09:51:12 +0200, Jean-Francois Dockes wrote:
> Kenneth D. Merry writes:
>  > SCSI errors that we recover from aren't logged.
>
> I think that most SCSI retries should be logged.
>
> Except in some specific cases of retries after a unit attention
> condition, retries are usually indicative of hardware trouble.

There are other reasons, like medium not present, and tape drives use
errors to report a lot of stuff.  (See saerror() in
sys/cam/scsi/scsi_sa.c.)

We already have something similar turned on for bootverbose (it logs error
messages that we wouldn't otherwise log), but people sometimes get confused
and concerned when they see the additional error messages.  They (not
unreasonably) think something is wrong with their hardware, when in fact
they're just seeing normal error messages.  (Like devices that don't
support the serial number inquiry, CDROM drives without media present,
etc.)

In any case, if you want to see error messages, even for retried commands,
boot with -v and comment out the following print_sense line in
scsi_interpret_sense() in sys/cam/scsi/scsi_all.c:

		default:
			/* decrement the number of retries */
			retry = ccb->ccb_h.retry_count > 0;
			if (retry) {
				ccb->ccb_h.retry_count--;
				error = ERESTART;
				print_sense = FALSE;
			} else
				error = EIO;
			break;
		}

> Don't most devices already use a number of internal retries which is
> appropriate when they are healthy ? When external retries become
> necessary and frequent, the situation is bad, and subsequent failures
> are quite probable.

Yeah, most devices have internal retry mechanisms, and often error
correction mechanisms.

> Better to be warned earlier.
>
> (There are also the retries caused by scsi protocol problems - bad
> bus - but these are usually followed by a bus reset which is logged ?)
>
> And, by the way, 'recovered errors' sense keys (problems solved
> internally by the device) should also be logged for the same reason,
> only more benign, (but I'm not too sure that many devices actually
> generate these).

I think there are better ways than printing out sense information to figure
out if a device is going bad.

Disk drives and tape drives keep statistics in log pages, many of which
could be monitored to see if they exceed a certain threshold, or change at
a certain rate.  Then the administrator could be notified of the problem.

You could write a script using camcontrol to dump the log pages, or a small
C program to do the same.

Another thing you can do with disks is monitor the number of grown defects.
(Assuming you've got read and write reallocation turned on.)

As far as recovered errors go, I've seen disks return them when they
automatically reallocate a block.

Ken
-- 
Kenneth Merry
ken@kdm.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message