Date: Tue, 12 Nov 1996 15:05:48 -0800
From: "Justin T. Gibbs" <gibbs@freefall.freebsd.org>
To: "Андрей Чернов" (Andrey A. Chernov) <ache@nagual.ru>
Cc: current@freebsd.org, scsi@freebsd.org
Subject: Re: SCB paging is most dangerous option now!
Message-ID: <199611122305.PAA02805@freefall.freebsd.org>
In-Reply-To: Your message of "Tue, 12 Nov 1996 18:41:16 +0300." <199611121541.SAA00746@nagual.ru>
>> What were the error messages?
>>
>
>They are not stored anywhere now, because it seems ANY disk write causes
>immediate destruction of the inode table, including syslog writes.
>As I remember, there was something like:
>
>data overrun of XXXX bytes detected
>
>followed by various failed retraining/resetting attempts.
>As I remember, not one successful write happened.

This sounds like a cache coherency bug with your motherboard. What kind is it?

The reason I believe this to be the case is that:

1) SCB paging causes the same piece of memory to be DMA'ed in and out in rapid succession, much more often than in the non-paging case. The amount of DMA increases dramatically when you go from one to two active targets.

2) After I saw your bug report last night, I again attempted to reproduce the error. I made my 2940 look as much like a 2842 as I could by making the driver believe that it only has 4 SCBs. After about 30 minutes of pounding my two disks with as many as 30 outstanding transactions at a time, I gave up. I will try again tonight with my aic7850 card (3 SCBs) as soon as I can rip the machine apart and rearrange my disks.

Now, I don't have access to a Rev E board anywhere, and the driver does take advantage of undocumented features of that revision of the aic7770. I can send you a little snippet of code that verifies the one important feature, being able to store full 8-bit values in the QINFIFO and QOUTFIFO, works on your card without you having to turn on SCB paging. I don't believe that feature is broken, though, since if it were, even a single drive would not work at all.

If someone has either a 2742A(T) or a 2842A that they'd like to send me, I may be able to debug this further.

If it is DMA related, it should be easy to confirm by changing your cache settings and trying to reproduce the problem. If you are going to do this, attempt to reproduce it *only in single user mode*, with your filesystems mounted read-only, by starting multiple processes accessing the disks.
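Below is a minimal Python sketch of the read-only load described above. Several of its details are assumptions, not part of the message: the device names `/dev/rsd0c` and `/dev/rsd1c`, the 64k block size, and the helper names `build_dd_commands` and `start_load`. Only the approach comes from the message: several dd readers per raw partition, all writing to /dev/null. The message suggests at least 8 readers per drive, and the sketch uses that as its default.

```python
import shlex
import subprocess
from typing import List

# Hypothetical raw-device paths in the style of FreeBSD 2.x SCSI disks;
# substitute the raw partitions of the disks actually under test.
RAW_DEVICES = ["/dev/rsd0c", "/dev/rsd1c"]


def build_dd_commands(devices: List[str], readers_per_drive: int = 8,
                      block_size: str = "64k") -> List[List[str]]:
    """Return one dd argv per reader: read a raw partition, discard to /dev/null.

    The block size is an assumption, not taken from the original message.
    """
    commands = []
    for dev in devices:
        for _ in range(readers_per_drive):
            commands.append(["dd", f"if={dev}", "of=/dev/null", f"bs={block_size}"])
    return commands


def start_load(devices: List[str], readers_per_drive: int = 8) -> List[subprocess.Popen]:
    """Launch every reader concurrently. The readers only read, so no disk data is written."""
    procs = []
    for argv in build_dd_commands(devices, readers_per_drive):
        print("starting:", shlex.join(argv))
        procs.append(subprocess.Popen(argv))
    return procs
```

For example, `for p in start_load(RAW_DEVICES): p.wait()` keeps the load running until every reader reaches the end of its partition. While it runs, watch the console for driver error messages. Run it only under the conditions the message gives: single user mode, with filesystems mounted read-only.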
I have yet to lose any disk data with this kind of testing, and it usually fails quickly if the problem you are reporting still exists. If the system starts to go south, note what the error messages are and hit the reset button. Multiple dd processes (at least 8 per drive) reading from the raw partitions of your disks to /dev/null will work nicely.

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================
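A toy model can illustrate the DMA argument in point 1. It is not the aic7xxx sequencer's actual algorithm. The random service order, the LRU eviction, and the name `simulate_scb_paging` are all assumptions. The model only counts SCB transfers between host memory and the card. The card holds `hw_slots` SCBs on-chip, while the host keeps `outstanding` transactions in flight.

```python
import random
from collections import OrderedDict


def simulate_scb_paging(hw_slots: int, outstanding: int, events: int, seed: int = 0) -> int:
    """Toy model: count SCB DMA transfers (page-ins plus page-outs).

    hw_slots    -- SCBs the controller can hold on-chip (e.g. 4, as in the 2842-like test)
    outstanding -- transactions the host keeps in flight
    events      -- how many times some transaction needs service
    Random service order and LRU eviction are assumptions of this model.
    """
    rng = random.Random(seed)
    resident = OrderedDict()  # SCB ids currently on-chip, least recently used first
    transfers = 0
    for _ in range(events):
        scb = rng.randrange(outstanding)  # which transaction needs service next
        if scb in resident:
            resident.move_to_end(scb)     # already on-chip: no DMA needed
            continue
        if len(resident) == hw_slots:
            resident.popitem(last=False)  # page the least recently used SCB out to host memory
            transfers += 1
        resident[scb] = True              # page the needed SCB in from host memory
        transfers += 1
    return transfers
```

When no more transactions are outstanding than there are on-chip slots, only the initial page-ins occur. With 30 outstanding against 4 slots, roughly the test load described in point 2, most service events cost a page-out plus a page-in. That is the rapid DMA in and out of host memory that would make a cache coherency problem far more likely to show up.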