From: Bill Paul
Message-Id: <199805091841.OAA08977@skynet.ctr.columbia.edu>
Subject: Re: Call for testers for ThunderLAN ethernet driver
To: se@FreeBSD.ORG (Stefan Esser)
Date: Sat, 9 May 1998 14:41:48 -0400 (EDT)
Cc: hackers@FreeBSD.ORG, cdillon@wolves.k12.mo.us, eivind@yes.no
In-Reply-To: <19980509123721.29564@mi.uni-koeln.de> from "Stefan Esser" at May 9, 98 12:37:21 pm

Of all the gin joints in all the towns in all the world, Stefan Esser had
to walk into mine and say:

> On 1998-05-05 11:27 -0400, Bill Paul wrote:
> > The Prosignia server I have uses an NCR SCSI card in one of its PCI
> > slots.
> > Dmesg says:
> >
> > Probing for devices on PCI bus 0:
> > chip0 rev 2 on pci0:0:0
> > vga0 rev 0 on pci0:11:0
> > ncr0 rev 4 int a irq 5 on pci0:12:0
> > (ncr0:0:0): WIDE SCSI (16 bit) enabled(ncr0:0:0): 10.0 MB/s (200 ns, offset 15)
> > (ncr0:0:0): "COMPAQ WDE4360W 1.52" type 0 fixed SCSI 2
> > sd0(ncr0:0:0): Direct-Access
> > sd0(ncr0:0:0): WIDE SCSI (16 bit) enabled
> > sd0(ncr0:0:0): 40.0 MB/s (50 ns, offset 15)
> > 4094MB (8386000 512 byte sectors)
>
> > tlc0 rev 16 int a irq 9 on pci0:16:0
> > tlc0: Ethernet address: 00:80:5f:7d:fb:b7
> > tl0 at tlc0 physical interface 1
> > tl0: 10/100Mbps full duplex autonegotiating
> > tl0: autoneg complete, link status good (full-duplex, 100Mb/s)
> > chip1 rev 1 on pci0:20:0
> > chip2 rev 0 on pci0:20:1
>
> I've got some other Compaq machine at work which
> can't be installed over the network currently for
> lack of a NetFlex / ThunderLan driver.
>
> When are you going to commit the driver ?

I'm trying to track down a bug in the receive list handling which is
proving extremely elusive. The way I've written it, the chip DMAs
received frames directly into the data areas of mbuf clusters. The
problem with this is that you have to be very careful not to allow
the chip to DMA into a cluster _after_ it's been freed.

The way the ThunderLAN works, you give it a linked chain of 'list'
structures which contain the physical addresses of the mbuf cluster
buffers. Once a frame is received, the chip DMAs it into one of the
clusters, then triggers a 'receive end of frame' interrupt to tell you
it's complete. You can then hand the mbuf directly to ether_input(),
but you also have to make sure to provide a new mbuf to the chip so
that it doesn't somehow keep a reference to the cluster that was just
used.

Every once in a while, under conditions that I can't reliably
reproduce, the chip DMAs to a buffer that's been put back on the free
list.
This corrupts its free list pointer and causes any subsequent cluster
allocations off the free list to trigger a page fault (mcl_next points
to garbage). The problem is that the page fault doesn't happen until
well after the suspect cluster has been corrupted, so by the time the
kernel panics the damage has already been done and much of the evidence
has been destroyed.

I've made several attempts to fix this but it keeps showing up. My
latest driver version is on ftp.ctr.columbia.edu:/pub/misc/freebsd
(thunderlan.tar.gz). I think the problem is in the way the 'end of
receive channel' interrupts are being handled, but I'm having trouble
reproducing the condition which makes testing difficult.

I don't want to commit the code until I'm sure I've fixed this problem
since it can crash the system without any warning and for no apparent
reason.

On the bright side, I have been able to test the code briefly with
3.0-current and it works as well as it does on 2.2.6 (except for the
bug I just described).

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================
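
The receive-buffer rule described in the message (hand the chip a
replacement cluster before the filled one goes up to ether_input(), so
that no list entry still points at a buffer that may later be freed)
can be sketched roughly as follows. This is a userland simulation under
stated assumptions, not code from the driver: struct cluster,
struct rx_list, rx_attach(), rx_eof(), rx_ring_references() and the two
allocators are all hypothetical names, a plain pointer value stands in
for the physical address, and the real ThunderLAN list layout and
interrupt handling are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RX_LIST_CNT  4     /* hypothetical number of receive lists */
#define CLUSTER_SIZE 2048  /* stand-in for an mbuf cluster data area */

/* Stand-in for an mbuf cluster; NOT the kernel's struct mbuf. */
struct cluster {
    unsigned char data[CLUSTER_SIZE];
};

/* Simplified receive list entry: what the chip sees plus the driver's handle. */
struct rx_list {
    uintptr_t       buf_addr;  /* address the chip would DMA into */
    struct cluster *owner;     /* driver-side pointer to the same buffer */
};

struct rx_ring {
    struct rx_list list[RX_LIST_CNT];
    unsigned       dropped;    /* frames dropped for lack of a replacement */
};

typedef struct cluster *(*alloc_fn)(void);

/* Hypothetical allocators: one that works, one that simulates exhaustion. */
static struct cluster *alloc_ok(void)   { return calloc(1, sizeof(struct cluster)); }
static struct cluster *alloc_fail(void) { return NULL; }

/* Point a list entry at a buffer the driver currently owns. */
static void rx_attach(struct rx_list *l, struct cluster *c)
{
    l->owner = c;
    l->buf_addr = (uintptr_t)c->data;
}

/* Debug check: does any list entry still point into cluster c? */
static int rx_ring_references(const struct rx_ring *r, const struct cluster *c)
{
    for (size_t i = 0; i < RX_LIST_CNT; i++)
        if (r->list[i].buf_addr == (uintptr_t)c->data)
            return 1;
    return 0;
}

/*
 * "Receive end of frame" handling for one list entry. Returns the filled
 * cluster to pass up the stack, or NULL if the frame had to be dropped.
 */
static struct cluster *rx_eof(struct rx_ring *r, size_t idx, alloc_fn alloc)
{
    struct rx_list *l = &r->list[idx];
    struct cluster *fresh = alloc();

    if (fresh == NULL) {
        /* No replacement available: the chip keeps the old buffer and
         * the frame is dropped, so nothing the chip can reach is freed. */
        r->dropped++;
        return NULL;
    }

    struct cluster *full = l->owner;
    rx_attach(l, fresh);

    /* Only now is it safe for the stack to consume and free 'full'. */
    assert(!rx_ring_references(r, full));
    return full;
}
```

The ordering is the point of the sketch: the replacement is allocated
first, and the filled cluster is released to the stack only after the
list entry has been repointed. When allocation fails, the old buffer
stays with the chip and the frame is counted as dropped. A check along
the lines of rx_ring_references() at the moment a cluster is freed
might also catch a stale reference earlier than the eventual page
fault, though it cannot see what the chip holds internally.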