Date: Tue, 09 Sep 1997 16:39:22 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
To: Doug Ledford <dledford@dialnet.net>
Cc: "Justin T. Gibbs" <gibbs@plutotech.com>, "Daniel M. Eischen" <deischen@iworks.InterWorks.org>, aic7xxx@freebsd.org
Subject: Re: Interesting anomoly with a 2940UW
Message-ID: <199709092242.QAA25554@pluto.plutotech.com>
In-Reply-To: Your message of "Tue, 09 Sep 1997 17:03:23 CDT." <199709092203.RAA31269@dledford.dialnet.net>

>Wouldn't matter. If we did pause things here, then when we unpaused them,
>the QOUTCNT register would get incremented as we are writing CLRCMDINT to
>CLRINT, then we would check QOUTCNT again, it would be non-zero, so we would
>re-run the loop, and we would re-write the CMDOUTCNT variable again.

Sure, but what causes high interrupt latency, Doug? Other interrupt
handlers running, or your interrupt handler taking a long time, is what
causes it. In FreeBSD, your interrupt handler can be interrupted by any
non-masked interrupt, which means that, during that window, you could
easily be diverted from running your loop, perhaps long enough for
multiple commands to pile up, which might just be long enough for you to
overflow the qoutfifo before your interrupt handler is resumed and you
can complete your work. And, since the sequencer might have stuffed
multiple commands into the QOUTFIFO since you last read it, the variance
between what you write to CMDOUTCNT and how full the fifo is could be
quite large. For example:

qoutcnt <- QOUTCNT == 5
Process 5 commands in interrupt handler
        10 Commands complete in sequencer
Set CMDOUTCNT to 0 (should be 10)
qoutcnt <- QOUTCNT == 10
Process 10 commands
        8 Commands complete in sequencer
OVERFLOW!!!!

>> Above and beyond this, the code you wrote is inefficient. If you have
>> good interrupt latency, you will pause the sequencer on every command
>> completion. If you use the algorithm I mentioned initially, pause and
>> clear the CMDOUTCNT value every fifodepth completions, you remove this
>> race and also pause the sequencer as little as possible.
>
>OK...look at this:
>
>news kernel: aic7xxx: Command complete near Qfull count, qoutcnt = 16.
>repeats 56 times...
>Now, tell me that we don't have high interrupt latency and that the
>efficiency of that code is as bad as one might think.

Okay. I'll tell you again. It's inefficient code. In the example you
cite, you're only able to fill the QOUTFIFO 56 times after performing
how many transactions???
Probably a few hundred thousand on a busy news server, if not more. I
never said that you don't have high interrupt latency. What I said was
that I don't have high interrupt latency, but of course, I don't run
Linux. In my system, the hardware interrupt handler for the aic7xxx card
simply removes the entry from the QOUTFIFO, sets a few status bits in
the generic SCSI structure associated with this transaction, and queues
it to a software interrupt handler.

>As I explained to Dan a few days ago in a private email, when I was messing
>around with using a bottom half completion routine, I ran into two problems.
>All of the bottom half and task queues are either run based upon the
>scheduler, which we *can't* base our completion upon or we risk a deadlock
>when the scheduler is blocked for a swap operation, or they are based on the
>timer interrupt. The timer interrupt based completion routine had horrible
>performance for char reads, namely because each and every read is small and
>done sequentially, so the added overhead of waiting for a timer interrupt to
>do completion processing was a killer. Now, in the standard isr routine, we
>leave interrupts disabled the whole time, including during our completion
>processing.

It's a shame that Linux doesn't offer a decent software interrupt
strategy, but that's not my problem. You should still be able to get
decent latency for setting the CMDOUTCNT back to 0 if you clear the
QOUTFIFO first, putting entries into a list, setting CMDOUTCNT to zero,
then processing the entries on the list. You are probably getting into
your interrupt handler plenty fast, but getting crushed by the overhead
of generic SCSI processing at interrupt time.

>The interrupt routine that produced the messages above was modified, it
>enables interrupts during the completion processing.
>Our isr won't get
>called re-entrantly due to the kernel irq mechanism, but it does allow other
>interrupts to run during completion processing (so things like mouse
>movement in X won't be so jerky during heavy load). The result of that, is
>that our interrupt latency can actually get worse as our completion
>processing may suffer intermittent interrupts, but we are generally speaking
>being friendlier to the system. It's a tradeoff, we give ourselves, with
>the spin lock in place, a little more latency since we already happen to
>have a lot, in exchange, we reduce the amount of time we run with interrupts
>off.

Wow. I never knew that you used to run your interrupt handler with all
other interrupts disabled. Don't your network servers drop packets like
crazy when you do this?

>The second reason I wrote it that way is because of this. Let's say your
>code answers an interrupt with two commands on the QOUTFIFO, and
>p->cmdoutcnt == 12, then cmdoutcnt will get incremented to 14 while the
>QOUTFIFO goes to zero. Now, if the next interrupt has a high latency, then
>you may end up using that spin lock far before you ever reach the QOUTFIFO
>depth since you didn't update the CMDOUTCNT variable during the last isr.
>So, which is more inefficient, allowing a high latency interrupt to block
>with only a command or two complete, or writing out the actual CMDOUTCNT on
>each interrupt routine when we are already writing to the card? Keep in
>mind the interrupt latency that we see sometimes.

I'm fully aware that CMDOUTCNT does not directly track the current state
of the FIFO. I wanted a lazy update, as it means I only have to do a
single write, which can be done with AAP. In order for your algorithm to
work, you have to perform a read and a write with the sequencer paused,
and having looked at what this does with a PCI bus analyzer, it's simply
not worth it.
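
To make the gap between CMDOUTCNT and the real FIFO occupancy concrete, here is a toy Python replay of the 5/10/8 example given earlier in this message. It is only a sketch under assumptions that are not in the original: a FIFO depth of 16 (borrowed from the "qoutcnt = 16" log line), a sequencer that spins once its count reaches that depth, and the hypothetical names `Card`, `complete`, `pop`, and `replay`. It is not driver code.

```python
DEPTH = 16  # assumed QOUTFIFO depth, borrowed from "qoutcnt = 16" in the quoted log


class Card:
    """Toy model: fifo is QOUTFIFO occupancy, cmdoutcnt is the sequencer's count."""

    def __init__(self):
        self.fifo = 0
        self.cmdoutcnt = 0
        self.overflowed = False

    def complete(self, n):
        """Sequencer posts up to n completions; it spins once cmdoutcnt hits DEPTH."""
        for _ in range(n):
            if self.cmdoutcnt >= DEPTH:
                return  # assumed spin lock: wait for the host to lower CMDOUTCNT
            self.cmdoutcnt += 1
            self.fifo += 1
            if self.fifo > DEPTH:
                self.overflowed = True

    def pop(self, n):
        """Host removes up to n entries from the FIFO."""
        self.fifo -= min(n, self.fifo)


def replay(write_true_count):
    """Replay the example; write either 0 or the real occupancy to CMDOUTCNT."""
    card = Card()
    card.complete(5)
    qoutcnt = card.fifo      # host reads QOUTCNT == 5
    card.pop(qoutcnt)        # host processes 5 commands
    card.complete(10)        # 10 commands complete while the handler is busy
    card.cmdoutcnt = card.fifo if write_true_count else 0
    card.complete(8)         # 8 more complete before the host pops again
    return card
```

In this model `replay(False)` mirrors writing 0 blindly and ends with the FIFO past its depth, while `replay(True)` writes the value that "should be 10" and the sequencer stalls at a full FIFO instead. The model says nothing about the bus cost of obtaining that true count, which is the objection raised above.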

>Also, who's to say the
>reason you don't see messages about the QOUTCNT isn't due to this very
>condition instead of interrupt latency? A better test to see if this
>algorithm does what you want would be not to check and print a message about
>the QOUTFIFO depth, but check to see if your sequencer is spin locking on
>CMDOUTCNT and holding up the bus.

Actually, I incremented a count in sequencer scratch ram for every time
I hit the lock. Either every time I went to look it had wrapped to 0 or
my lock was never hit. As I said before, you are probably getting into
your interrupt handler plenty fast, it's just that your interrupt
handler runs for a long time before you go back and clean out the queue.

>*****************************************************************************
>* Doug Ledford                      * Unix, Novell, Dos, Windows 3.x,       *
>* dledford@dialnet.net    873-DIAL  * WfW, Windows 95 & NT Technician       *
>* PPP access $14.95/month           *****************************************
>* Springfield, MO and surrounding   * Usenet news, e-mail and shell account.*
>* communities. Sign-up online at    * Web page creation and hosting, other  *
>* 873-9000 V.34                     * services available, call for info.    *
>*****************************************************************************

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================
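
The drain-first ordering suggested earlier in this message (clear the QOUTFIFO into a list, set CMDOUTCNT to zero, then process the list) might look roughly like the Python sketch below. Everything named here is hypothetical and for illustration only: `service_interrupt`, the `write_cmdoutcnt` callback standing in for the register write, and the `process` callback standing in for the generic SCSI completion work. The sketch does not model the sequencer posting new entries between the last pop and the register write.

```python
from collections import deque


def service_interrupt(qoutfifo, write_cmdoutcnt, process):
    """Drain the FIFO, reset the count, then do the slow per-command work."""
    pending = []
    while qoutfifo:                      # 1. empty the FIFO as fast as possible
        pending.append(qoutfifo.popleft())
    write_cmdoutcnt(0)                   # 2. tell the sequencer the FIFO is clear
    for entry in pending:                # 3. slow completion processing comes last
        process(entry)
    return len(pending)
```

A caller would pass a `deque` of completed entries plus the two callbacks. The point of the ordering is that the count is reset before any slow processing starts, so a long-running handler no longer delays the reset.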