Date: Wed, 13 Sep 2000 13:08:11 -0700 (PDT)
From: dhesi@rahul.net (Rahul Dhesi)
To: freebsd-stable@freebsd.org
Subject: Re: SCSI retries without errors in /var/log/messages?
Message-ID: <20000913200811.56E267C63@yellow.rahul.net>
References: <freebsd-stable.20000911141718.A51045@panzer.kdm.org>
"Kenneth D. Merry" <ken@kdm.org> writes: >The timeout for read and write operations in the da(4) driver is 60 >seconds, and we retry things four times. And I understand that an error is logged only if all retries fail. So potentially we could try tree times, with a total 180 second delay, then succeed on the fourth try, and no error would be logged. So I thought about this, and wondered if we could have ongoing SCSI delays with no errors logged. Suppose there is a SCSI hardware problem such that every I/O operation has a 0.01 probability of timing out, which means it has a 0.99 probability of succeeding. As a first approximation, out of every 100 I/O operations typically one will time out, causing a 60-second delay. If we are doing 30 I/O operations per second, then we will encounter one 60-second delay every 3.3 seconds, on the average. Which really means that our rate of I/O operations will be reduced to 30 in 63.3 seconds, or an I/O operation every 2 seconds, approximately. A very, very slow computer system. Will this show up in syslog? Only when all 4 tries fail, the probability of which is (0.01)**4, which is 1 in 100000000. At 30 I/O operations per second, with no delays, we should see one syslog entry every 1157 days, i.e., 3 years. But if every 100th I/O operation is delayed by 60 seconds, then we are really averaging quite a lower rate of I/O operations per second, so we might not see a syslog entry for several decades. This is an aproximate calculation, but the orders of magnitudes should be about right. The most likely source of error in my logic above is that the probability of encountering an error on the first try of a specific I/O operation might not be independent of the probability of an error on a retry. Thus it might not be correct to use the term (0.01)**4 above. But this will very much depend on the exact reason for the error. -- Rahul To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-stable" in the body of the message
