Date: Sat, 10 Feb 2001 18:05:03 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
To: audit@freebsd.org
Cc: "Kenneth D. Merry" <ken@kdm.org>, "Justin T. Gibbs" <gibbs@scsiguy.com>, Gerard Roudier <groudier@club-internet.fr>
Subject: a couple of minor but important changes to SCSI error handling
Message-ID: <Pine.LNX.4.21.0102101753560.7694-100000@zeppo.feral.com>

First is scsi_all.c:

Index: scsi_all.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_all.c,v
retrieving revision 1.17
diff -u -r1.17 scsi_all.c
--- scsi_all.c	2000/10/30 08:08:00	1.17
+++ scsi_all.c	2001/02/11 02:03:01
@@ -2177,16 +2177,16 @@
 			/* These should be filtered by the peripheral drivers */
 			/* FALLTHROUGH */
 		case SSD_KEY_MISCOMPARE:
-			print_sense = FALSE;
-			/* FALLTHROUGH */
-		case SSD_KEY_RECOVERED_ERROR:
-
 			/* decrement the number of retries */
 			retry = ccb->ccb_h.retry_count > 0;
-			if (retry)
+			if (retry) {
+				error = ERESTART;
 				ccb->ccb_h.retry_count--;
-
-			error = 0;
+			} else {
+				error = EIO;
+			}
+		case SSD_KEY_RECOVERED_ERROR:
+			error = 0; /* not an error */
 			break;
 		case SSD_KEY_ILLEGAL_REQUEST:
 			if (((sense_flags & SF_QUIET_IR) != 0)
@@ -2241,6 +2241,7 @@
 				}
 			}
 			break;
+		case SSD_KEY_ABORTED_COMMAND:
 		default:
 			/* decrement the number of retries */
 			retry = ccb->ccb_h.retry_count > 0;
@@ -2255,6 +2256,13 @@
 				error = error_action & SS_ERRMASK;
 			}
 
+			/*
+			 * Make sure ABORTED COMMAND errors get
+			 * printed as they're indicative of marginal
+			 * SCSI busses that people should address.
+			 */
+			if (sense_key == SSD_KEY_ABORTED_COMMAND)
+				print_sense = TRUE;
 		}
 		break;
 	}

---------------------

1. The key SSD_KEY_RECOVERED_ERROR is not an error at all and should not be
retried. It is an indication that there was an error that was corrected
during the execution of the command. This is per ANSI SCSI2 spec. It's
possible that these should also be noted to the console (as indicative,
perhaps, of growing media defect lists in drives), but the default of
printing errors out if bootverbose in this case is probably enough. Also,
there'd been a missing ERESTART for that clause anyway.

2. If you have an ABORTED COMMAND, it's almost invariably a SCSI parity error.
You should never be silent about these since users should do something about
this if it occurs (moving that power cord *away* from the SCSI cable is
always a good first start). This should print irrespective of bootverbose
because it's an actual real error even if we retry a transmission.

Second is scsi_da.c:

Index: scsi_da.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_da.c,v
retrieving revision 1.65
diff -u -r1.65 scsi_da.c
--- scsi_da.c	2001/02/07 07:05:58	1.65
+++ scsi_da.c	2001/02/11 01:59:42
@@ -1127,7 +1127,7 @@
 				tag_code = MSG_SIMPLE_Q_TAG;
 			}
 			scsi_read_write(&start_ccb->csio,
-					/*retries*/4,
+					/*retries*/10,	/* retry a few times */
 					dadone,
 					tag_code,
 					bp->bio_cmd == BIO_READ,

------

10 retries with a .5 second delay between each is still only 5 seconds.
10 retries might be more appropriate to a SAN environment with at least a
couple of seconds of different initiators spasming the loop.

-matt
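
For readers who want the intended behaviour without reading the hunks, here is a minimal userland sketch of the policy described in points 1 and 2, plus the retry-window arithmetic. It is not the CAM code: classify_sense(), struct sense_result, SKETCH_ERESTART and retry_window_ms() are hypothetical names, the additional-sense table lookup done by the real default case is left out, and only the numeric sense key values are taken from the SCSI standard. Read literally, the first hunk has no break before the relocated case SSD_KEY_RECOVERED_ERROR label, so the retry clause would fall through and have its error overwritten with 0; the sketch assumes a break was intended there.

```c
#include <errno.h>
#include <stdbool.h>

/* Sense key values as defined by the SCSI standard. */
#define SK_RECOVERED_ERROR	0x01
#define SK_ABORTED_COMMAND	0x0B
#define SK_MISCOMPARE		0x0E

/* Stand-in for the kernel's ERESTART; userland <errno.h> may not define it. */
#define SKETCH_ERESTART		(-1)

struct sense_result {
	int	error;		/* 0, SKETCH_ERESTART, or EIO */
	bool	print_sense;	/* report on the console */
};

/*
 * Hypothetical helper: decide what to do with a failed command based
 * only on its sense key.  'verbose' plays the role of bootverbose.
 */
static struct sense_result
classify_sense(int sense_key, int *retry_count, bool verbose)
{
	struct sense_result r = { 0, verbose };

	switch (sense_key) {
	case SK_RECOVERED_ERROR:
		/* The device already corrected the problem: no retry. */
		r.error = 0;
		break;
	case SK_ABORTED_COMMAND:
		/* Likely a marginal bus: always print, then retry. */
		r.print_sense = true;
		/* FALLTHROUGH */
	case SK_MISCOMPARE:
	default:
		if (*retry_count > 0) {
			(*retry_count)--;
			r.error = SKETCH_ERESTART;
		} else {
			r.error = EIO;
		}
		break;
	}
	return (r);
}

/* Worst-case time spent retrying, in milliseconds. */
static int
retry_window_ms(int retries, int delay_ms)
{
	return (retries * delay_ms);
}
```

With these assumptions, a RECOVERED ERROR leaves the retry count untouched and returns 0, an ABORTED COMMAND is always reported and retried while retries remain, and retry_window_ms(10, 500) gives the 5 seconds quoted above.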