From owner-aic7xxx Fri Jul 9 10:50:59 1999
Message-ID: <37863674.C14847BC@dataskill.co.uk>
Date: Fri, 09 Jul 1999 18:50:44 +0100
From: Nick Taylor
Organization: Dataskill
To: Doug Ledford
Cc: Stephan Loescher, AIC7xxx@FreeBSD.ORG
Subject: 3940W / Kernel 2.0.36+ fix
References: <37862400.921473D7@dataskill.co.uk> <378619DC.7E22CD80@redhat.com>
Sender: owner-aic7xxx@FreeBSD.ORG

Hi again

Have tried disabling the MMAPIO as suggested and that seems to have done the trick :-)  Many thanks.

I don't know whether you feel it's worth building in some checks so that these 3940s don't break. The BIOS on mine is 1.24; maybe this has been superseded by one that does work.

If there's any info that would help....

Nick
---

Doug Ledford wrote:

Nick Taylor wrote:
>
> Hi
>
> I am still seeing 3940 problems; I get a similar lockup. For me it happens
> as soon as I try to access two hard disks at the same time.
>
> However it appears that not all 3940Ws fail as some people have indicated
> that they are using them OK.
>
> My problem also appeared with kernel 2.0.36, 2.0.33 being OK. I am also
> convinced that something has been broken, but sadly am not a C hacker so
> don't know how this problem can be resolved.
>
> Nick
> ---
>
> Stephan Loescher wrote:
>
> > Hi!
> >
> > I have found a bug that first appears in the aic7xxx code in Linux
> > 2.0.34 (5.0.14/3.2.4) and is still there in recent 2.3.xx kernels! The
> > aic7xxx code in Linux 2.0.33 (4.1.1/3.2.1) runs stably for me.
> >
> > The symptoms:
> > When I copy a lot of large files from my hard disk (IBM DCAS-34330W) to
> > my magneto-optical (MO) drive, then after some time (5 seconds to
> > several minutes) the Linux kernel stops. The system is frozen (locked
> > up) and the SCSI-bus LED, the MO LED and the hard-disk LED are all lit. I
> > can't log into my system. Mouse and keyboard are "dead".
> > The source and target filesystems are ext2.
> > I can reproduce this behaviour.
> > I can copy files between all my harddisks without any error.
> > With kernel 2.0.33 there are no problems!
> >
> > Following linux/Documentation/BUG-HUNTING, I narrowed it down to the
> > aic7xxx code, because when I replace the aic7xxx files in 2.0.34 with
> > the files from 2.0.33, the system runs stably.
> >
> > I have tried the following kernels:
> > 2.0.34
> > 2.0.35
> > 2.0.36
> > 2.1.128
> > 2.2.2
> > 2.2.5
> > 2.2.6
> > 2.2.7
> > 2.2.10
> > 2.3.4
> > (with and without all AC-patches)
> >
> > Disabling all aic7xxx features does not help either.
> > I tried these options:
> > aic7xxx=verbose, aic7xxx=pci_parity, aic7xxx=verbose:0x1ffff
> > and disabled TAGGED_QUEUEING entirely.
> >
> > To help you find the bug, I tried all aic7xxx patches for Linux
> > 2.0.33 from the last 4.x.x up to 5.0.13. The results are:
> >
> > 5.0.0 /3.2.2: OK
> > 5.0.1 /3.2.2: does not boot, seems _very_ unstable
> > 5.0.10/3.2.2: OK
> > 5.0.11/3.2.2: Makes endless SCSI-resets after issuing commands like
> >               echo "scsi remove-single-device 0 0 1 0 " >/proc/scsi/scsi
> > 5.0.12/3.2.2: locks up the system as 5.0.14 does!
> > 5.0.13/3.2.2: locks up the system as 5.0.14 does!
> >
> > My system:
> > Pentium-200 (single-CPU)
> > SCSI-HA: Adaptec 3940U, BIOS 1.24
> > Channel A:
> > 0 : CD Sony CDU-76S
> > 1 : HD Seagate ST32430N
> > 3 : CDRW Yamaha CRW4416S 1.0f
> > 4 : Streamer Tandberg NS20 Pro
> > 5 : HD IBM DCAS-34330
> > 6 : HD IBM DCAS-34330W
> > (End of SCSI-bus with active termination, and AHA with auto-termination.)
> > Channel B:
> > 0 : Olympus Deltis-MOS320 (MO)
> > 3 : HP ScanJet
> > (End of SCSI-bus with passive termination, and AHA with auto-termination.)
> >
> > What changed in the aic7xxx code after 5.0.10/3.2.2?
> >
> > What can I do to help you find the bug?
> >
> > Stephan.

OK, I must have missed this report somehow.  Anyway, the big item of change
between the 5.0.10 and later versions is that all later versions default to
using MMAPIO instead of PIO.  So, if you want to test things out, go into the
aic7xxx.c file, find the line that reads:

#define MMAPIO

and comment that line out then recompile.  That should disable MMAPed I/O on
your system and that will let us know if your problem is related to
simultaneous I/O to different MMAP regions on the card.  Note, there may be
more than one #define MMAPIO line in your source code, but assuming you
are using an Intel-based machine, you need only find the one in the #ifdef
__i386__ block of code.  You can ignore the other architectures.  I would give
an exact line number, but depending on which patch you use this could be
very different, as that section of code was in a state of flux during the
5.0.10->14 days.
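
As an illustration of what that compile-time switch does, here is a minimal
userspace sketch. It is not taken from aic7xxx.c: the names reg_read,
reg_write, uses_mmapio, self_check and the two fake register arrays are
hypothetical, and the arrays only stand in for real hardware. The point is
that one #define selects which access path gets compiled in, so commenting
it out and recompiling switches the whole driver to the other path.

```c
#include <assert.h>
#include <stdint.h>

/* Comment this line out and recompile to build the PIO path instead. */
#define MMAPIO

#ifdef MMAPIO
/* Hypothetical stand-in for the card's memory-mapped register window. */
static volatile uint8_t fake_mmio_window[256];

/* Memory-mapped path: a register access is an ordinary load or store. */
uint8_t reg_read(unsigned off) { return fake_mmio_window[off & 0xff]; }
void reg_write(unsigned off, uint8_t val) { fake_mmio_window[off & 0xff] = val; }
int uses_mmapio(void) { return 1; }
#else
/* Hypothetical stand-in for I/O port space. A real x86 driver would
 * issue port instructions (inb/outb) here rather than touch an array. */
static uint8_t fake_io_ports[256];

uint8_t reg_read(unsigned off) { return fake_io_ports[off & 0xff]; }
void reg_write(unsigned off, uint8_t val) { fake_io_ports[off & 0xff] = val; }
int uses_mmapio(void) { return 0; }
#endif

/* Round-trip check: a value written to a register reads back unchanged,
 * whichever path was compiled in. */
void self_check(void)
{
    reg_write(0x10, 0xAB);
    assert(reg_read(0x10) == 0xAB);
}
```

Callers only ever see reg_read and reg_write, which is why disabling the
define needs no other source changes; in this sketch uses_mmapio() simply
reports which path the build selected.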

--
  Doug Ledford   <dledford@redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe aic7xxx" in the body of the message

-- 
Nick Taylor   mailto://nt@dataskill.co.uk   Dataskill, London, England
mailto://webmaster@reflexology.org
HOME OF REFLEXOLOGY   http://www.reflexology.org