Date: Wed, 13 May 1998 10:27:06 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
To: fireston@lexmark.com
Cc: gibbs@plutotech.com (Justin T. Gibbs), freebsd-scsi@FreeBSD.ORG
Subject: Re: SCSI timeouts in dataout mode
Message-ID: <199805131630.KAA09877@pluto.plutotech.com>
In-Reply-To: Your message of "Wed, 13 May 1998 09:53:45 EDT." <199805131416.IAA28579@pluto.plutotech.com>

>The problem occurs with only one LUN or many. The HP does not have a status
>monitor that we can access, although the blinky lights on the front of the
>unit are quite hypnotic.

Does it go "ping!"?? 8-)

>> Do you have access to a SCSI bus analyzer?
>
>Yes, and we will be using that tomorrow (our normal duties have gone ignored
>a bit too long ). I have no clue how to read the output from such a beast -
>what should we be looking for?

A transaction that doesn't reconnect, as well as the mean time between
requests and their completions.

>I will make the observation that this error is somehow related to the amount
>of time it takes to write the file. If it takes more than 10 seconds to
>write, we see this error. We have not yet tried to read a large file to see
>what problems that can cause.

Ahh. This is a big clue. Depending on how the array performs write
caching, it may well be that we are hitting it with so many transactions
that the cache isn't draining fast enough for them to complete within the
10 second window. 10 seconds seems like an eternity to me, but suppose you
hit a RAID5 array with 64 64k transactions and it performs write-back
caching. You can then have 64 "completed transactions" waiting for a cache
flush at the same time you have 64 "pending transactions", so it could be
a while before there is cache space to service the new guys.

You might want to bump up the timeout. Look for DA_DEFAULT_TIMEOUT in
sys/cam/scsi/scsi_da.c. You may also want to increase the frequency of
ordered tagged transactions, as this will impose a touch of write ordering
so the array doesn't starve "expensive writes" into causing a timeout.
See DA_ORDEREDTAG_INTERVAL in scsi_da.c.

>To hazard a guess, I really think this is some strange interaction between the
>scsi code and the cache on the HP. Either the HP is doing the right kind of
>handshaking or the FreeBSD box is missing it. But this is only a guess.

It's probably not a protocol error at all.
The timeout is somewhat arbitrary, but as it is difficult to ask the device
"hey are you still working on this?" you have to draw the line somewhere.
Perhaps we should choose a larger timeout value by default?

>--
>Mik Firestone fireston@lexmark.com
>If ever I become an Evil Overlord:
>All naive, busty tavern wenches in my realm will be replaced with surly,
>world-weary waitresses who will provide no unexpected reinforcement and/or
>romantic subplot for the hero or his sidekick.

--
Justin

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
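
The cache-drain reasoning in the reply above can be sketched as a
back-of-the-envelope calculation. This is only an illustration, not code
from scsi_da.c: the function names are hypothetical, and the drain rate is
an assumed input that the message does not give. It estimates how long the
last queued write waits when the array must flush both the already
acknowledged writes sitting in its cache and the writes still pending.

```c
#include <stdint.h>

/*
 * Illustrative sketch only. None of these names exist in scsi_da.c, and
 * the drain rate passed in is an assumption made by the caller.
 */

/*
 * Bytes the array has to flush before the newest pending write can
 * complete: writes already acknowledged but still in write-back cache,
 * plus the writes still queued.
 */
uint64_t
backlog_bytes(uint32_t cached_xfers, uint32_t pending_xfers,
    uint32_t xfer_bytes)
{
	return ((uint64_t)cached_xfers + pending_xfers) * xfer_bytes;
}

/*
 * Worst-case whole seconds until the backlog is flushed at the given
 * sustained drain rate. The division rounds up. A zero rate means the
 * cache never drains, reported as UINT64_MAX.
 */
uint64_t
worst_case_secs(uint64_t backlog, uint64_t drain_bytes_per_sec)
{
	if (drain_bytes_per_sec == 0)
		return UINT64_MAX;
	return (backlog + drain_bytes_per_sec - 1) / drain_bytes_per_sec;
}

/* Returns 1 if the worst-case wait exceeds the timeout, 0 otherwise. */
int
would_time_out(uint64_t backlog, uint64_t drain_bytes_per_sec,
    uint64_t timeout_secs)
{
	return worst_case_secs(backlog, drain_bytes_per_sec) > timeout_secs;
}
```

With the figures from the message, backlog_bytes(64, 64, 64 * 1024) is
8 MB. If the array drained at an assumed 512 KB per second,
worst_case_secs() would give 16 seconds, past the 10 second window; at an
assumed 1 MB per second it would give 8 seconds and the write would
complete in time. The real drain rate has to be measured, for example
from the bus analyzer trace.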