Date:      Fri, 27 Jul 2001 18:52:05 +0200
From:      Martin Kraemer <Martin.Kraemer@Fujitsu-Siemens.com>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        konecny@web.markiza.sk, freebsd-stable@FreeBSD.ORG, gibbs@FreeBSD.ORG
Subject:   Re: Continuing ahc problems - also cause fxp failure
Message-ID:  <20010727185205.A892@deejai2.mch.fsc.net>
In-Reply-To: <200107251712.f6PHCCx45487@earth.backplane.com>; from dillon@earth.backplane.com on Wed, Jul 25, 2001 at 10:12:12AM -0700
References:  <20010720105115.A80517@deejai2.mch.fsc.net> <200107251712.f6PHCCx45487@earth.backplane.com>

On Wed, Jul 25, 2001 at 10:12:12AM -0700, Matt Dillon wrote:

>     Hmm.  Well, that last conversation seemed to come to a consensus
>     that a known thermal problem with a chip on my DELL motherboard
>     related to heavy use of the on-board adaptec and on-board
>     ethernet might have been the cause.  I replaced the motherboard
>     and moved away from the on-board ethernet (threw in another PCI
>     card), and the problem went away.
> 
>     I don't know if your problem below is the same problem or
>     a different problem.  It sounds like it may be a different
>     problem.

IMO it is quite different, as I changed the following parameters:

* opened the PC to allow free air circulation (*iff* that does anything)

* replaced the on-board 7880UW controller with a PCI AHA-2940UW card.
  Both offer the same functionality and are made by the same
  manufacturer, and both show the same timeout problems.

In my first mail I said I had seen 4.2-STABLE work and 4.3-STABLE fail,
but that was not true: the old system was 4.2-RELEASE, and I noticed the
error for the first time with 4.3-RELEASE.

I then upgraded to 4.3-STABLE, with no change. So I got the CVS
source tree of dev/aic7xxx/ to see the differences between 4.2-RELEASE
and 4.3-RELEASE. The greatest change seems to be in the sequencer
code, about which I don't understand very much... In the source file
aic7xxx_freebsd.c (that's where ahc_timeout() prints the messages)
I see that only little has changed since 4.2-RELEASE: a detach routine
was added, but IMO it is only invoked when the device is released
completely. In aic7xxx.c, a LOT has changed.

Can the changes in the sequencer code be the reason for the
still re-occurring "lost interrupts" on higher load -- or what else
can be causing the timeout?

Or can the presence of a second (non-wide) 2940 which is used for my DAT
cause any problems of this kind?

Puzzled,

   Martin

On-board 7880:
 ahc0: <Adaptec aic7880 Ultra SCSI adapter> port 0xf800-0xf8ff mem 0xfedfb000-0xfedfbfff irq 9 at device 6.0 on pci0
 ahc0: Using left over BIOS settings
 aic7880: Wide Channel A, SCSI Id=15, 16/255 SCBs
 da0 at ahc0 bus 0 target 0 lun 0
 da0: <IBM DDYS-T18350N S92A> Fixed Direct Access SCSI-3 device 
 da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
 da0: 17501MB (35843670 512 byte sectors: 64H 32S/T 17501C)
 Mounting root from ufs:/dev/da0s1a
 da1 at ahc0 bus 0 target 1 lun 0
 da1: <WDIGTL WDE9100 1.30> Fixed Direct Access SCSI-2 device 
 da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
 da1: 8683MB (17783204 512 byte sectors: 64H 32S/T 8683C)
--
Errors from the replacement 2940UW (same as with the 7880 originally):
 18:54:28 deejai2 /kernel: (da1:ahc1:0:1:0): SCB 0xe - timed out while idle, SEQADDR == 0x177
 18:54:30 deejai2 /kernel: STACK == 0x17f, 0x189, 0x0, 0xe
 18:54:30 deejai2 /kernel: SXFRCTL0 == 0x80
 18:54:30 deejai2 /kernel: ahc1: Dumping Card State at SEQADDR 0x177
 18:54:31 deejai2 /kernel: SCSISEQ = 0x12, SBLKCTL = 0x2, SSTAT0 0x5
 18:54:31 deejai2 /kernel: SCB count = 140
 18:54:32 deejai2 /kernel: Kernel NEXTQSCB = 111
 18:54:32 deejai2 /kernel: Card NEXTQSCB = 14
 18:54:32 deejai2 /kernel: QINFIFO entries: 14 125 2 22 122 83 64 98 
 18:54:32 deejai2 /kernel: Waiting Queue entries: 
 18:54:32 deejai2 /kernel: Disconnected Queue entries: 
 18:54:32 deejai2 /kernel: QOUTFIFO entries: 
 18:54:32 deejai2 /kernel: Sequencer Free SCB List: 11 3 12 6 9 4 5 0 2 13 15 14 1 8 7 
 18:54:32 deejai2 /kernel: Pending list: 98 64 83 122 22 2 125 14 
 18:54:32 deejai2 /kernel: Kernel Free SCB list: 128 115 20 38 109 11 32 27 107 76 85 108 47 95 35 58 129 60 70 101 96 87 19 66 102 112 10 81 61 59 46 23 65 114 63 50 78 82 30 62 54 86 31 43 8 15 48 25 56 127 113 21 12 105 72 121 28 100 49 103 106 51 6 90 41 84 29 119 74 68 13 17 135 94 5 52 104 123 42 9 24 75 39 73 88 77 53 55 40 97 4 92 33 79 37 18 67 126 16 44 57 0 71 26 1 110 124 36 69 93 117 7 118 34 120 45 3 91 89 80 116 136 137 138 139 99 134 133 132 131 130 
 18:54:32 deejai2 /kernel: Untagged Q(1): 14 
 18:54:32 deejai2 /kernel: sg[0] - Addr 0x34b2000 : Length 1024
 18:54:32 deejai2 /kernel: (da1:ahc1:0:1:0): SCB 14: Immediate reset.  Flags = 0x6040
 18:54:32 deejai2 /kernel: (da1:ahc1:0:1:0): no longer in timeout, status = 34b
 18:54:32 deejai2 /kernel: ahc1: Issued Channel A Bus Reset. 8 SCBs aborted
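
One cross-check that can be made on the dump above is whether every SCB
on the pending list is still sitting in the QINFIFO. The following is
only a minimal sketch in Python, not part of the driver: the helper name
parse_list() is made up for illustration, and the two log lines are
copied from the dump. For this dump the two lists turn out to hold the
same eight SCBs (the pending list is simply the QINFIFO in reverse
order). Reading that as "the sequencer never picked these commands up"
is an assumption, not something the log states.

```python
import re

def parse_list(dump, label):
    """Return the integers that follow 'label:' on the first matching line.

    parse_list is a hypothetical helper for this sketch only.
    """
    pattern = re.escape(label) + r":\s*([\d\s]*)$"
    for line in dump.splitlines():
        m = re.search(pattern, line)
        if m:
            return [int(tok) for tok in m.group(1).split()]
    return []

# Two lines copied from the kernel messages above.
DUMP = """\
18:54:32 deejai2 /kernel: QINFIFO entries: 14 125 2 22 122 83 64 98
18:54:32 deejai2 /kernel: Pending list: 98 64 83 122 22 2 125 14
"""

qinfifo = parse_list(DUMP, "QINFIFO entries")
pending = parse_list(DUMP, "Pending list")

# SCBs the kernel considers pending but that are no longer in the QINFIFO.
not_queued = sorted(set(pending) - set(qinfifo))

print("pending SCBs not in QINFIFO:", not_queued)
print("pending is QINFIFO reversed:", pending == qinfifo[::-1])
```

Feeding it the corresponding lines of a later dump would show whether
the pattern repeats on every timeout.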
-- 
<Martin.Kraemer@Fujitsu-Siemens.com>         |     Fujitsu Siemens
Fon: +49-89-636-46021, FAX: +49-89-636-41143 | 81730  Munich,  Germany




