From: Chris Howells
Organization: K Desktop Environment
To: freebsd-net@freebsd.org
Date: Wed, 28 Sep 2005 18:52:48 +0100
Subject: Re: em(4) receive part wedging randomly at moderate load
Message-Id: <200509281852.53335.howells@kde.org>
In-Reply-To: <20050926142907.GI91328@cell.sick.ru>
References: <20050926142907.GI91328@cell.sick.ru>
List-Id: Networking and TCP/IP with FreeBSD

On Monday 26 September 2005 15:29, Gleb Smirnoff wrote:
> during last month we are experiencing a nasty problem with em(4)
> driver. Several times a day the receive path of the driver wedges
> for a minute or two. During wedge the transmit part works with
> no problems. The latter fact makes this problem very nasty, because
> the problematic router can't be backed up with help of CARP.

This sounds very much like the problem I've been having. It affects two
machines, one runs 5.4-STABLE and one runs 4.11-STABLE. Both are Duron 1800s
based on the Asus A7V8X motherboard.

The card in the 4.11 machine is:

em0: port 0xb000-0xb03f mem 0xf4800000-0xf481ffff,0xf5000000-0xf501ffff irq 11 at device 13.0 on pci0
em0@pci0:13:0: class=0x020000 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

The card in the 5.4 machine is:

em0: port 0x6400-0x643f mem 0xf0000000-0xf001ffff irq 3 at device 19.0 on pci0
em0@pci0:19:0: class=0x020000 card=0x10028086 chip=0x10268086 rev=0x04 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82545GM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

> The box is serving 8 - 15 kpps, 70 - 100 MBps. It runs stateful pf(4)
> firewall, with 50k - 80k states. The IP fastforwarding is enabled. The
> average state insert/removal ratio is 300 states per second, however
> sometimes several thousands of states can be removed in one pass. The
> state removal locks the network code for quite a long time, so I guess
> that wedge happens exactly when a lot of states are removed. The NIC
> interrupts aren't serviced for some time and it wedges.

It happens for me with no pf, serving a single client with Samba at a much
lower load -- only a few tens of KB a second.

> The NIC is plugged in Cisco Catalyst 6509 gigabit ethernet port. No
> errors are counted on switch port.

Mine is a simple unmanaged SMC 5 port GigE switch.

> To workaround the problem, I have made the following patch:

Interesting, I'll give that a go....

> I am asking developers, who work in Intel, to pay attention to this
> problem.

Have you tried the em driver directly from Intel? It can be found on the
Intel web site. A few people on freebsd-stable are claiming that it works
perfectly.

I have noticed that having something like this in sysctl.conf helps to
reduce how often it happens:

kern.ipc.somaxconn=1024
net.inet.udp.recvspace=65536
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536

Though sadly it still does happen...

-- 
Cheers, Chris Howells -- chris@chrishowells.co.uk, howells@kde.org
Web: http://www.chrishowells.co.uk, PGP ID: 0x33795A2C
KDE/Qt/C++/PHP Developer: http://www.kde.org
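
For anyone who wants to try the sysctl.conf values quoted in this message at runtime before putting them in /etc/sysctl.conf, a rough sketch follows. It is only an illustration: the helper names parse_sysctl_conf and sysctl_commands are made up here, the script prints the commands instead of running them, and it assumes the usual `sysctl name=value` form is what you want to run as root. Whether these settings actually help with the wedge is not established; the message only reports that they seemed to make it less frequent.

```python
# The four settings quoted in the message, in sysctl.conf syntax.
SYSCTL_CONF = """\
kern.ipc.somaxconn=1024
net.inet.udp.recvspace=65536
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
"""


def parse_sysctl_conf(text):
    """Return {name: value} from sysctl.conf-style text.

    Blank lines and '#' comments are skipped. (Hypothetical helper,
    not part of any FreeBSD tool.)
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value line: {raw!r}")
        settings[name.strip()] = value.strip()
    return settings


def sysctl_commands(settings):
    """Build one 'sysctl name=value' command string per setting.

    Dry run only: the commands are returned, never executed.
    """
    return [f"sysctl {name}={value}" for name, value in settings.items()]


if __name__ == "__main__":
    for cmd in sysctl_commands(parse_sysctl_conf(SYSCTL_CONF)):
        print(cmd)
```

Printing rather than executing keeps the sketch safe to run on any machine; the printed lines can be pasted into a root shell on the affected box, and a reboot (or removing the lines from sysctl.conf) restores the defaults.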