From owner-freebsd-atm Wed Nov 12 17:23:58 1997
Return-Path:
Received: (from root@localhost)
	by hub.freebsd.org (8.8.7/8.8.7) id RAA28928
	for atm-outgoing; Wed, 12 Nov 1997 17:23:58 -0800 (PST)
	(envelope-from owner-freebsd-atm)
Received: from eden.dei.uc.pt (eden.dei.uc.pt [193.136.212.3])
	by hub.freebsd.org (8.8.7/8.8.7) with SMTP id RAA28921
	for ; Wed, 12 Nov 1997 17:23:50 -0800 (PST)
	(envelope-from aalves@dei.uc.pt)
Received: from zorg.dei.uc.pt by eden.dei.uc.pt
	(5.65v3.2/1.1.10.5/28Jun97-0144PM) id AA01698;
	Thu, 13 Nov 1997 01:30:04 GMT
Received: from localhost (aalves@localhost)
	by zorg.dei.uc.pt (8.8.5/8.8.5) with SMTP id BAA01437;
	Thu, 13 Nov 1997 01:24:06 GMT
Date: Thu, 13 Nov 1997 01:24:06 +0000 (WET)
From: Antonio Luis Alves
To: Kenjiro Cho
Cc: freebsd-atm@FreeBSD.ORG
Subject: Re: ATM driver & udp stream
In-Reply-To: <199711120445.NAA24120@hotaka.csl.sony.co.jp>
Message-Id:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-atm@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Wed, 12 Nov 1997, Kenjiro Cho wrote:

>
> >> All was working well until I made a UDP stream test. The sender was a
> >> Pentium 233mmx and on the receive side a Pentium 166. The test is a normal
> >> netperf UDP_STREAM with default values for the datagram, which is the max
> >> udp datagram 9216. The driver crashes at the pdu receive routine when
> >> accessing the segments which form the pdu. With the debugger I can see that
> >> the mbuf handles are correct but the data on the mbuf is not. Some times
> >> the m_type is MT_FREE and m->next as got a valid pointer on it, and other
> >> times all the members of the mbuf are zero. On a normal situation the
> >> mbufs would be of type MT_DATA, and correctly initialized with the
> >> pointers to the external free and reference function, m_len, ext_size and
> >> ext_buffer.
> >> From the firmware guide it says the card never touches the mbuf, it just
> >> pass the mbuf handle received from the supply routine to the segments on
> >> the received pdu descriptor.
>
> It sounds familiar to me..., so you might want to take a look at the
> following possibility.
>
> I had a similar experience with Chuck Cranor's ATM driver long time
> ago. It turns out that, under heavy load, a circular list holding
> pointers to mbufs gets completely full and wrapped around. As a
> result, a wrong mbuf in use gets freed. The system crashes after a
> while when it refers to the misplaced mbuf.
>
> IMO, it's a very bad idea to put a free list chain at the top of a
> mbuf cluster data area when a mbuf is on the free list. It is very
> hard to debug if something goes wrong.
>

After reading your mail, my first thought was to add a periodic test
that checks the mbufs in the circular list used by the buffer supply
protocol. That way I could probably get closer to the problem and trace
it. At the moment this is the only place I can think of that could
cause the problem, if mbufs were being freed while still owned by the
card. However, the supply protocol in the driver has not changed much
from the original distribution, which has been running on other
architectures for quite some time, so I also suspected that the kernel
mbuf handling code could be causing the problem under heavy load.

Anyway, today I found that release 2.2.5 has new reassembly code in
ip_input.c, and because the problem only showed up when there was
reassembly at the IP layer, I decided to give it a try and upgraded the
Pentium 166 machine to this release. It turns out that the driver does
not crash anymore under 2.2.5. I did this only a few hours ago, but
none of the tests I have run so far crashed the machine. I rebooted the
machine several times and ran around 10 UDP stream tests each time
without any problem.

So it seems the old reassembly code was the key to the problem.
However, it is still not clear to me whether there is a bug in the
driver that shows up only under heavy load with the old reassembly
code, or whether the old code itself was responsible for trashing the
mbufs. I say this because I don't remember seeing anyone on hackers
report problems with the reassembly code. Right now I am going to
upgrade all our machines to 2.2.5 and see whether any problems appear
in the next few weeks.

Antonio Alves
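
For reference, the periodic check on the supply ring mentioned above
could look roughly like the following C sketch. It is not the driver's
code: `struct buf` is a simplified stand-in for a kernel mbuf, and
`supply_ring`, `ring_supply`, `ring_reclaim`, `ring_check` and
`RING_SLOTS` are hypothetical names chosen for illustration. The supply
routine refuses to overwrite a slot when the ring is full, and the
check walks the slots still owned by the card and counts buffers that
no longer look like valid data buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mbuf; only the fields the check needs. */
enum buf_type { BUF_FREE = 0, BUF_DATA = 1 };

struct buf {
    enum buf_type type;   /* BUF_DATA while the card owns the buffer */
    size_t        size;   /* size of the external data area */
    char         *data;   /* external data area */
};

#define RING_SLOTS 8      /* hypothetical ring size */

struct supply_ring {
    struct buf *slot[RING_SLOTS];
    unsigned    head;     /* next slot to fill on supply */
    unsigned    tail;     /* oldest slot still owned by the card */
    unsigned    count;    /* buffers currently owned by the card */
};

/* Hand a buffer to the card. Refuses when the ring is full, so a slot
 * that is still in use is never overwritten by wraparound. */
static int ring_supply(struct supply_ring *r, struct buf *b)
{
    assert(b != NULL);
    if (r->count == RING_SLOTS)
        return -1;
    r->slot[r->head] = b;
    r->head = (r->head + 1) % RING_SLOTS;
    r->count++;
    return 0;
}

/* Take back the oldest buffer, or NULL if the card owns none. */
static struct buf *ring_reclaim(struct supply_ring *r)
{
    struct buf *b;

    if (r->count == 0)
        return NULL;
    b = r->slot[r->tail];
    r->slot[r->tail] = NULL;
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->count--;
    return b;
}

/* Periodic check: count outstanding buffers that no longer look valid. */
static unsigned ring_check(const struct supply_ring *r)
{
    unsigned i, bad = 0;

    for (i = 0; i < r->count; i++) {
        const struct buf *b = r->slot[(r->tail + i) % RING_SLOTS];

        if (b == NULL || b->type != BUF_DATA || b->data == NULL || b->size == 0)
            bad++;
    }
    return bad;
}
```

Calling `ring_check()` from a timer, or at the start of the PDU receive
routine, and logging a nonzero result would show whether buffers are
damaged while the card still owns them or only after they are handed
back.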