Date: Mon, 01 Jun 1998 12:15:17 -0400 (EDT)
From: Simon Shapiro <shimon@simon-shapiro.org>
To: Mark Gregory Salyzyn <mark@bohica.net>
Cc: freebsd-scsi@FreeBSD.ORG, tcobb@staff.circle.net
Subject: Re: DPT Redux
Message-ID: <XFMail.980601121517.shimon@simon-shapiro.org>
In-Reply-To: <9805311428.AA11443@deathstar.deathstar.dpt.com>

On 31-May-98 Mark Gregory Salyzyn wrote:

... Some excellent suggestions deleted ...

> 2) Simon, you may want to consider what happens when the controller is
> indicating busy, do you perform a timeout on the busy bit of the
> auxiliary status register, and if you do, what do you show to the OS
> (failed command? spawn a separate command issue thread to try again
> later? spin forever waiting for ready?).

Depends on what the O/S is doing:

a. If we are booting, then I poll the commands one at a time,
   sequentially.

b. If we are shutting down, I immediately fail the command.

c. In all other cases, I put the command in a queue, tell the O/S the
   command has been queued and check the busy bit. If the busy bit is
   set, I (essentially) sleep on the (driver's private) waiting
   commands queue until a command completes. When a command completes,
   I scan the wait queue and try to submit a command. Before
   submitting, I check the busy bit. If the busy bit is set, I
   (essentially) sleep until another command completes.

There is no timeout on any command in the wait queue. The reason is
that there is no reasonable value to use.

Commands that have been submitted to the DPT can be timed out. To do
that, you need the options DPT_HANDLE_TIMEOUTS and possibly
DPT_TIMEOUT_FACTOR defined in the kernel configuration. In this
scenario, we check the age of each transaction already committed to
the DPT. If it timed out, we abort it and return a failure status to
the OS. Please note that this timeout mechanism only works on commands
in the submitted queue.

> The BSDi BSD/OS driver, for example, simply `locks' waiting for the
> controller to get out of busy, which is the simplest approach to deal
> with, what should be a transitory situation. Also, you may wish to
> limit the number of outstanding commands to the controller (the
> UNIXWARE driver uses the lock on wait, and 32 CCB limit to reduce the
> chances of this problem affecting performance).
> The highest performance DPT driver in a Networking operating system
> (NETWARE) does the `spawn an issuing task' approach to allow
> processing of network card interrupts while waiting for the
> controller to come free. This may be your best approach considering
> you will no doubt be issuing `next' commands to the controller while
> in the context of the controller interrupt service routine.

This is essentially what is being done. As long as the DPT driver can
malloc memory for requests, it takes them from the OS. If commands
cannot be submitted to the DPT hardware, they simply wait. The
assumption is that if the DPT is too busy, it is too busy. The other
assumption is that every time a command completes, the DPT hardware is
a bit less busy. We then try to submit a request.

We do check if the controller is busy before submitting a command.
Now, if the DPT were to become too busy between our looking at the
external busy bit and completing the transfer of a command, the result
would be a corrupted transfer. AFAIK, from discussing it with you,
once the DPT is marked not-busy and the transfer of a command has
started, the DPT will not clobber the partial command.

>
> My assumption is that you timeout and send a fail up to the OS, which
> may explain the 0MB read capacity result shown in the log?

(Conditionally) Correct assumption. The report capacity bit is done
during boot. At this point the DPT driver is in polled mode.

In polled mode, if the number of commands submitted but not completed
is less than the hardware queue depth, we assume that the hardware
should not be busy and try to send a command. If this fails (due to
the auxiliary status busy bit being set), we send failure to the O/S.
If, in polled mode, we have submitted as many commands to the hardware
as the reported hardware queue length, and none completed (stalled
HBA), we immediately fail. If we submitted a command successfully, we
then wait 50us * the amount of wait the OS indicated (in xs->timeout).
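
As a rough illustration only (this is not the driver source), the
polled-mode rules just described could be sketched in C as below. The
names hba_state, polled_submit and polled_wait_us are hypothetical,
the HBA is reduced to three fields, and the unit in which the OS
expresses its timeout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what the driver inspects in polled mode. */
struct hba_state {
    unsigned queue_depth;  /* hardware queue depth reported by the HBA */
    unsigned outstanding;  /* commands submitted but not yet completed */
    bool     aux_busy;     /* busy bit of the auxiliary status register */
};

enum polled_result {
    POLLED_SUBMITTED,      /* command handed to the hardware */
    POLLED_FAIL_BUSY,      /* busy bit was set: report failure to the OS */
    POLLED_FAIL_STALLED    /* queue full, nothing completed: stalled HBA */
};

/* Decide what to do with one command while the driver is in polled mode. */
enum polled_result polled_submit(struct hba_state *hba)
{
    assert(hba != NULL);

    /* As many commands outstanding as the queue holds: fail immediately. */
    if (hba->outstanding >= hba->queue_depth)
        return POLLED_FAIL_STALLED;

    /* Below queue depth the hardware should not be busy; if it is, fail. */
    if (hba->aux_busy)
        return POLLED_FAIL_BUSY;

    hba->outstanding++;
    return POLLED_SUBMITTED;
}

/* Wait budget after a successful submit: 50us times the OS-supplied
 * timeout (xs->timeout in the text; its unit is assumed here). */
unsigned long polled_wait_us(unsigned long os_timeout)
{
    return 50UL * os_timeout;
}
```

A caller would wait up to polled_wait_us() for a completion after
POLLED_SUBMITTED, and return failure to the OS for either of the other
two results.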
Once a command completion is indicated (or we timed out), we process
the command completion and return to the OS, using normal processing.

From this description, it is clear that if a command takes
inordinately long during boot, it may time out and fail. Please
remember that this is ONLY TRUE DURING O/S BOOT. I am reluctant to
change the timeout logic as there is no telling what the correct
timeout is. Too short and we will timeout good but busy builds. Too
long and the system will stall on problems.

I am adding a bit more error reporting to the boot section. A patch
against 3.0-current and against 2.2-releng will be submitted today,
and will be available on my ftp server in about an hour or two. The
extent of testing on these patches will be to verify that they compile
and that a normal system equipped with these patches boots and
operates normally. Later on I will perform complete regression testing
on the code.

> I hope this helps -- Sincerely -- Mark Salyzyn

Thanx Mark.

Simon

---

Sincerely Yours,
Simon Shapiro
Shimon@Simon-Shapiro.ORG
770.265.7340

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?XFMail.980601121517.shimon>